Topic Modeling for Discovering Drug-related Adverse Events from Social Media

Mengjun Xie, PhD1, Jiang Bian, PhD2, Umit Topaloglu, PhD2
1University of Arkansas at Little Rock, 2University of Arkansas for Medical Sciences

Abstract
Early detection of drug-related adverse events benefits both the drug regulators and manufacturers for pharmacovigilance. Existing adverse events detection mainly relies on patients’ “spontaneous” self-reports, in which many problems arise. With increasing popularity, social media such as Twitter opens a new horizon for detecting signals of adverse events. Here we present a new topic modeling based approach that can effectively analyze tweets and find the drug users and potential adverse events resulting from the drug of interest.

Introduction and Background
Social media, especially Twitter, given its deep roots in personal and social life, has great potential to become a new data feed for real-time pharmacovigilance. For example, from a Twitter message (i.e., tweet) such as “this warm weather + tamoxifen hot flushes is a nightmare!” we can infer the possible drug use (“tamoxifen”) and associated side effects (“hot flushes”) of the tweet's author. The knowledge discovered through social media mining can be used to identify novel adverse events (AEs) in a timely manner and to confirm adverse events reported through existing mechanisms.
In [1], we proposed an analytic framework for detecting drug-related AEs using Twitter messages. Specifically, we built Support Vector Machine (SVM) models with textual and ontological features extracted from tweets. However, the feature extraction methods we applied were not designed for tweet-like social media content, which often contains 1) misspellings, odd abbreviations and non-word terms, and 2) fragmented and ungrammatical sentences. As a result, the identification accuracy of the models was unsatisfactory (∼74%). In this study, we employ a topic modeling method for feature extraction.

Methods
The process of discovering AEs from tweets has two steps: 1) identifying the users of the drug of interest, and 2) finding possible side effects attributed to the use of the drug. In both steps, an SVM classification model is built on features extracted from tweet content. Specifically, we apply Latent Dirichlet Allocation (LDA) [2], a generative probabilistic model, to categorize the collection of tweets into latent topics. In the LDA model, each document (tweet(s) in our study), treated as a vector of word counts using the bag-of-words (BoW) approach, is viewed as a mixture of topics, where each topic is represented as a probability distribution over a set of words. We then use each document's probability distribution over topics as features to train the SVM prediction models. Before applying the LDA algorithm, we first sanitize the dataset by removing hyperlinks, hash-tags “#” and reply-tags “@”, and then apply the WordNet Lemmatizer in NLTK [3] to group the different inflected forms of a word into its lemma. By doing so, we create a “clean” dictionary for the LDA model.
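
The sanitization and bag-of-words steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names `sanitize` and `bag_of_words`, the regular expressions, and the sample tweet's handle and link are all hypothetical. We also assume that a hash-tag keeps its word and loses only the “#” marker, while an “@” reply-tag is dropped entirely; the text does not specify either choice. Lemmatization is omitted here to keep the sketch dependency-free.

```python
import re
from collections import Counter

# Assumed patterns (not taken from the paper).
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
MENTION_RE = re.compile(r"@\w+")                # reply-tags such as @user
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")    # lowercase word tokens


def sanitize(tweet: str) -> str:
    """Remove hyperlinks and reply-tags, strip the '#' marker, lowercase."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # assumption: keep the hash-tag word itself
    return " ".join(text.lower().split())


def bag_of_words(tweet: str) -> Counter:
    """Return the word-count vector of a sanitized tweet."""
    return Counter(TOKEN_RE.findall(sanitize(tweet)))


if __name__ == "__main__":
    # Hypothetical tweet built around the example quoted in the introduction.
    raw = "@anna this warm weather + #tamoxifen hot flushes is a nightmare! http://t.co/abc"
    print(sanitize(raw))
    print(bag_of_words(raw))
```

In a full pipeline, the tokens would additionally be lemmatized, the count vectors passed to an LDA implementation, and the resulting per-document topic proportions used as SVM features.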

Results and Discussion
The performance of both SVM models is much improved compared to our previous study. The mean accuracy of the model for identifying drug users is 79% (ROC-AUC of 0.87). For the AE classification model, the accuracy is 81% (ROC-AUC of 0.86). We believe that the performance improvement is mainly due to the improved features.
In the previous study, the ontological features were extracted using MetaMap based on UMLS Metathesaurus concepts. MetaMap’s NLP models are trained on biomedical literature (e.g., medical journal articles) that 1) contains fewer language errors and 2) is formulated with standard medical terminology. However, Twitter users have a very different vocabulary and their writing style is usually informal. In this study, we apply the LDA model directly to the tweets without introducing a different vocabulary. Furthermore, the methods for obtaining textual features in the previous study required manual analysis of the tweets to identify language features related to the tasks. In contrast, LDA is a data-driven method that requires neither prior knowledge of the topics nor any explicit “understanding” of the language.
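
For readers unfamiliar with the two metrics reported in this section, the following sketch shows how accuracy and ROC-AUC can be computed from a classifier's outputs. It is a plain-Python illustration with toy data, not the evaluation code or data of the study; the function names are our own. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and decision scores, invented for illustration only.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(labels, predictions))
    print(roc_auc(labels, scores))
```

Unlike accuracy, ROC-AUC does not depend on a particular decision threshold, which is why the two figures are reported together.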

1. J. Bian, U. Topaloglu and F. Yu. Towards large-scale twitter mining for drug-related adverse events. In Proceedings of the International Workshop on Smart Health and Wellbeing, SHB ’12, Sheraton, Maui, Hawaii, 2012. ACM.
2. D. M. Blei, A. Y. Ng and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March 2003. ISSN 1532-4435.
3. NLTK Project. Natural language toolkit. http://nltk.org/, Oct 2012.

Source: http://jiangbian.info/papers/2013/twitter-ae-lda.pdf
