An unsupervised machine learning model for discovering latent infectious diseases using social media data

Lim, Sunghoon; Tucker, Conrad S.; Kumara, Soundar

doi:10.1016/j.jbi.2016.12.007

Related Researcher

임성훈

Lim, Sunghoon: Industrial Intelligence Lab.

Full metadata record

DC Field	Value	Language
dc.citation.endPage	94	-
dc.citation.startPage	82	-
dc.citation.title	JOURNAL OF BIOMEDICAL INFORMATICS	-
dc.citation.volume	66	-
dc.contributor.author	Lim, Sunghoon	-
dc.contributor.author	Tucker, Conrad S.	-
dc.contributor.author	Kumara, Soundar	-
dc.date.accessioned	2023-12-21T22:39:31Z	-
dc.date.available	2023-12-21T22:39:31Z	-
dc.date.created	2018-08-21	-
dc.date.issued	2017-02	-
dc.description.abstract	Introduction: The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches that are based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. Methods: Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users' expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals' symptom weighting vectors. Datasets and results: Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, are used to serve as ground truth validation. The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. Conclusion: This work uses individuals' social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.	-
dc.identifier.bibliographicCitation	JOURNAL OF BIOMEDICAL INFORMATICS, v.66, pp.82 - 94	-
dc.identifier.doi	10.1016/j.jbi.2016.12.007	-
dc.identifier.issn	1532-0464	-
dc.identifier.scopusid	2-s2.0-85009133616	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/24675	-
dc.identifier.url	https://linkinghub.elsevier.com/retrieve/pii/S1532046416301812	-
dc.identifier.wosid	000409293100008	-
dc.language	English	-
dc.publisher	ACADEMIC PRESS INC ELSEVIER SCIENCE	-
dc.title	An unsupervised machine learning model for discovering latent infectious diseases using social media data	-
dc.type	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
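
The abstract reports highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724. As a quick sanity check, the sketch below computes F1 as the harmonic mean of precision and recall, which is the standard definition. It assumes the three reported values come from the same experimental configuration, which the abstract does not state, so it only shows that the figures are mutually consistent. The function and variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (assumed to belong to one configuration).
reported_precision = 0.773
reported_recall = 0.680
reported_f1 = 0.724

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.3f}")
```

Rounded to three decimal places, the computed value agrees with the reported F1 score.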