Application of Twitter and web news mining in infectious disease surveillance systems and prospects for public health

Research Article. Article ID: dgkh000334. DOI: 10.3205/dgkh000334. URN: urn:nbn:de:0183-dgkh0003343

Kia Jahanbin
Research Center for social determinants of health, Jahrom University of Medical Sciences, Jahrom, Iran

Fereshte Rahmanian
Research Center for social determinants of health, Jahrom University of Medical Sciences, Jahrom, Iran

Vahid Rahmanian
Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom, Iran. Phone: +98 9175985204. Email: vahid.rahmani1392@gmail.com

Abdolreza Sotoodeh Jahromi
Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom, Iran

Publisher: German Medical Science GMS Publishing House, Düsseldorf

Subject classification: 610

Keywords: fuzzy classification, surveillance system, Twitter, text mining, infectious disease

Published: 2019-12-02. Language: English.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.

GMS Hygiene and Infection Control (GMS Hyg Infect Control), ISSN 2196-5226, volume 14, article 19

Aims: With the advancement of communication technology and growing access to social networks, these networks now play an important role in disseminating information and news without going through the time-consuming channels of official news networks. Analysis of social networking data is a new and interesting branch of text mining. This study aimed to develop a text mining technique for extracting information about infectious diseases from tweets and news on social media.

Methods: A method called “Fuzzy Algorithm for Extraction, Monitoring, and Classification of Infectious Diseases” (FAEMC-ID) was developed using fuzzy modeling of the Takagi-Sugeno-Kang type. In addition to real-time classification, the method can update its vocabulary with new keywords and visualize the classified data on a world map to mark high-risk areas.

Results: As an example, measles-related news items were monitored over a 183-hour period from 01/03/2019 (01:00 am) to 08/03/2019 (12:00 pm), covering 2,870 tweets from 2,556 users. The number of tweets posted from each region ranged from 1 to 47, with the highest number, 47 tweets, coming from Canada. Most measles-related news originated in the Americas and Europe, mostly from the United States and Canada.

Conclusion: Compared with other algorithms in the literature, the developed method showed excellent precision, with a recall ratio of 88.41% and high inter-correlation of data in each class. The proposed algorithm can also be used to develop more effective monitoring and tracking systems for other human and even animal health hazards.

Introduction

Today, social media generate vast amounts of data on a daily basis in a wide variety of areas including technology, medicine, history, political and social news, sports, and many other fields. These data can be refined and analyzed to extract economically and scientifically valuable knowledge and have therefore piqued the interest of researchers in many areas.

In recent years, big data science has emerged as a powerful tool for collecting, storing, managing, and analyzing data on a large scale. Big data can be characterized by five features: volume, variety, velocity, variability, and veracity. Among these features, the most important is the volume or size, according to which data can be classified into three categories:

Structured: Data that is organized in a predefined schema.
Semi-structured: Data that does not require a predefined schema.
Unstructured: Data that is stored without any defined structure or schema.

A great portion of all data produced and consumed across the world is in textual form. The science of text mining is focused on the extraction of high-quality information from textual data. The major applications of text mining include text categorization, concept/entity extraction, text clustering, text summarization, sentiment analysis, and entity relation modeling.

Web-news mining from media and social networks is one of the major applications of text mining in social sciences. 
An automated news-mining-based system can monitor, analyze, and classify news according to its contents, which is useful not only for managing news articles but also for developing recommenders and security systems.

Twitter is one of the world’s most popular social networks. The highly interesting applications of this micro-blogging platform have attracted the attention of researchers. At present, Twitter has over 11 million active users, who post about 6 million tweets every day, including instant messages and comments. Given the easily accessible and extremely rich information contained in tweets, they can be used in a wide range of applications, including the analysis of political trends, product performance, and the monitoring of health-related events.

In the model proposed in this paper, unstructured data about infectious diseases such as influenza, HIV/AIDS, malaria, measles, poliomyelitis, tuberculosis, plague, Ebola and cholera are extracted from Twitter and then subjected to text cleanup, term filtering, and finally categorization. Since the focus of the work is on real-time application, the model is implemented with the help of an evolving fuzzy rule-based classifier called Eclass1-MIMO.

Literature review

In 2014, the term “social big data” was used for the first time to refer to the data generated by social networks. This includes, for example, the 30 million tweets posted every day, the 3,000 photos uploaded to Flickr every minute, and the 15 million blog posts written on a daily basis. These social networking data can have scientifically and economically significant uses in many fields including sociology, psychology, politics, commerce, and healthcare.

Text mining can be discussed from two perspectives: the type of knowledge extracted and its applications. 
Applications of text mining can be categorized as follows:Security applications: Text mining packages have exte</PlainText></TextGroup>ns<TextGroup><PlainText>ive</PlainText></TextGroup> use in security software, especially for analyzing online plain texts such as websites and weblogs for national security protection purposes <TextLink reference="8"></TextLink>.</ListItem><ListItem level="1"><Mark1>Biomedical applications: </Mark1>A wide range of text mining tools and software has been developed for biomedical <TextGroup><PlainText>applic</PlainText></TextGroup>at<TextGroup><PlainText>ions</PlainText></TextGroup> <TextLink reference="10"></TextLink>. For example, PubGene is a well-known Internet service that combines biomedical text mining with network visualization <TextLink reference="14"></TextLink>.</ListItem><ListItem level="1"><Mark1>Online media applications: </Mark1>Media corporations such as the Tribune Company have utilized text mining to achieve enhanced data clarity and create more interesting contents for readers. This science has also been used in the public sector to develop software for the monitoring and tracking of terrorist activities <TextLink reference="15"></TextLink>.</ListItem><ListItem level="1"><Mark1>Business and marketing applications:</Mark1> Text mining is finding extensive use in business and marketing intelligence and particularly in customer relations management <TextLink reference="16"></TextLink>, <TextLink reference="17"></TextLink>.</ListItem><ListItem level="1"><Mark1>Sentiment analysis: </Mark1>Sentiment analysis can be discussed from the perspective of the type of information extracted and its application. 
For example, sentiment analysis has been used for the analysis of movie reviews <TextLink reference="18"></TextLink> and also for comment recognition in the field of artificial emotional intelligence <TextLink reference="2"></TextLink>, <TextLink reference="19"></TextLink>.</ListItem><ListItem level="1"><Mark1>Academic applications: </Mark1>Text mining is one of the major tools that large publishers use for data categorization and retrieval from large databases <TextLink reference="8"></TextLink>.</ListItem><ListItem level="1"><Mark1>Text categorization: </Mark1>Text categorization is an automatic process whereby text data are organized into multiple predefined categories or classes. One of the applications of text categorization is the opinion categorization, which gives an insight into the opinion of users of social networks like Facebook or Twitter about a certain topic (e.g. a law, a treatment, a political view, etc.) <TextLink reference="20"></TextLink>.</ListItem><ListItem level="1"><Mark1>Text clustering: </Mark1>Unlike text categorization, text clustering is focused on the unsupervised management of text documents <TextLink reference="21"></TextLink>.</ListItem><ListItem level="1"><Mark1>Text summarization:</Mark1> Automatic text summarization algorithms are language-independent (multilingual) tools for generating a summary of a text <TextLink reference="5"></TextLink>, <TextLink reference="22"></TextLink>, <TextLink reference="23"></TextLink>.</ListItem></UnorderedList></Pgraph><Pgraph><LineBreak></LineBreak>This paper presents a method based on a Takagi-Sugeno-Kang (TSK) fuzzy system called the Eclass1-MIMO model for the categorization of news on Twitter about infectious diseases with epidemic potential. 
In developing the method, the authors aim to create an accurate text categorization system with real-time applicability for marking high risk areas based on tweets for improved monitoring and timely control of growing epidemics and related damage.</Pgraph></TextBlock> <TextBlock linked="yes" name="Methods"> <MainHeadline>Methods</MainHeadline><Pgraph>One of the most effective ways to prevent and control epidemics is to monitor and track the news about the spread of contagious diseases. This section explains the general frame and main structure of the proposed model for the collection of raw data about a select group of contagious diseases from related news and tweets and the analysis of these data. </Pgraph><Pgraph>The proposed method consists of 4 phases:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Data cleanup and integration and term extraction</ListItem><ListItem level="1" levelPosition="2" numString="2.">Web and tweet crawling </ListItem><ListItem level="1" levelPosition="3" numString="3.">Applying fuzzy rules and fuzzy classifier</ListItem><ListItem level="1" levelPosition="4" numString="4.">Visualization</ListItem></OrderedList></Pgraph><Pgraph>The first phase consists of data cleanup, data integration, and term extraction steps. The term extraction step consists of letter case homogenization (transforming all words to lowercase), tokenization, stemming, filtering stop words (removing pronouns, auxiliary verbs, and so on), and term filtering with the TF-IDF method.</Pgraph><Pgraph>In the proposed method, classification and evolving fuzzy rules are developed with the help of fuzzy rule-based classification package (FRBS) <TextLink reference="24"></TextLink>, <TextLink reference="25"></TextLink>. The evolving fuzzy system plays a fundamental role in the text analysis, i.e. updating the terms being extracted from the database <TextLink reference="26"></TextLink>. 
This is important because, given the large volume and unpredictable nature of the news and tweets related to infectious diseases and the likely emergence of new terms over time, the terms used in classification must be regularly updated. To resolve this issue, the proposed method makes use of evolving fuzzy rules and implements the text classification scheme with the Eclass1-MIMO method based on TSK rules <TextLink reference="27"></TextLink>, <TextLink reference="28"></TextLink>, <TextLink reference="29"></TextLink>, <TextLink reference="30"></TextLink>.</Pgraph><Pgraph>The visualization component of the proposed method aims to assist real-time monitoring and tracking of the onset and spread of epidemics, which can greatly contribute to the efficacy of active health and research systems in this area. Details of the proposed method are illustrated in Figure 1 <ImgLink imgNo="1" imgType="figure"/>.</Pgraph><SubHeadline>Data cleanup, data integration, and term extraction</SubHeadline><Pgraph>As shown in Figure 1 <ImgLink imgNo="1" imgType="figure"/>, the first phase of the proposed method consists of three steps:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Text cleanup: This step involves processing the tweet and news contents to remove redundant elements such as @ (“at” sign), # (hashtag), rt (for retweets), emoticons, metadata, links, etc., which must be removed before classification <TextLink reference="31"></TextLink>.</ListItem><ListItem level="1" levelPosition="2" numString="2.">Data integration: After text cleanup, tweets and news are grouped into their related classes.</ListItem><ListItem level="1" levelPosition="3" numString="3.">Term extraction: This step consists of the following processes:</ListItem></OrderedList></Pgraph><Pgraph><UnorderedList><UnorderedList><ListItem level="2"><Mark1>Tokenization:</Mark1> In this step, the streams of textual data are decomposed into words, symbols, phrases, and other 
meaningful elements as well as keywords that are valuable for classification, clustering and analysis of texts <TextLink reference="1"></TextLink>, <TextLink reference="31"></TextLink>.</ListItem></UnorderedList><UnorderedList><ListItem level="2"><Mark1>Homogenization:</Mark1> In this step, all words in the database are transformed to lowercase in order to prevent <TextGroup><PlainText>redu</PlainText></TextGroup>nd<TextGroup><PlainText>ant</PlainText></TextGroup> terms <TextLink reference="31"></TextLink>.</ListItem></UnorderedList><UnorderedList><ListItem level="2"><Mark1>Stopword filtering:</Mark1> This step involves finding and removing pronouns, prepositions, and “to be” verbs from the text <TextLink reference="32"></TextLink>.</ListItem></UnorderedList><UnorderedList><ListItem level="2"><Mark1>Stemming:</Mark1> In this step, inflected and derived words (with prefixes, suffixes, etc.) are converted to their base form in order to reduce the number of redundant terms <TextLink reference="1"></TextLink>, <TextLink reference="31"></TextLink>. In this work, stemming is done with the help of the Snowball algorithm <TextLink reference="32"></TextLink>.</ListItem></UnorderedList><UnorderedList><ListItem level="2"><Mark1>n-gram generation:</Mark1> An n-gram is a contiguous sequence of n items (characters, words, etc.); an <TextGroup><PlainText>n-gram</PlainText></TextGroup> is said to be a unigram if n=1, a bigram if <TextGroup><PlainText>n=2</PlainText></TextGroup>, and a trigram if n=3. n-gram generation is widely used in language identification <TextLink reference="20"></TextLink> and speech recognition, and helps identify multi-word keywords whose individual words are not informative by themselves. 
In this study, the learning accuracy of the model is improved by the use of bigrams <TextLink reference="33"></TextLink>, <TextLink reference="34"></TextLink>.</ListItem></UnorderedList><UnorderedList><ListItem level="2"><Mark1>Term filtering:</Mark1> The tokenization step extracts all terms of each tweet without considering the frequency of each term, which can reflect its importance. The term filtering step involves removing the terms that rarely appear in the text, the terms that have a constant distribution, and the terms that appear too frequently in the text in order to prevent the redundant growth of the term set <TextLink reference="1"></TextLink>.</ListItem></UnorderedList></UnorderedList></Pgraph><SubHeadline>Database collection method</SubHeadline><Pgraph>In the proposed method, the news about smallpox, influenza, malaria, measles, poliomyelitis, tuberculosis, plague, Ebola and cholera in various news sites is <TextGroup><PlainText>collec</PlainText></TextGroup>ted by a powerful API called Newsapi. This API collects the news of 54 countries from 134 major news organizations including CNN, BBC, CBC, Washington Post, etc. 
For example, the Ruby code used to extract news about measles published between 01/03/2019 and 08/03/2019 is presented below:<LineBreak></LineBreak><LineBreak></LineBreak></Pgraph><Pgraph>require 'open-uri'</Pgraph><Pgraph># Build the Newsapi query URL from its parameters</Pgraph><Pgraph>url = 'https://newsapi.org/v2/everything?' +</Pgraph><Pgraph><Indentation>'language=en&' +</Indentation></Pgraph><Pgraph><Indentation>'q=measles%20disease&' +</Indentation></Pgraph><Pgraph><Indentation>'from=2019-03-01&' +</Indentation></Pgraph><Pgraph><Indentation>'to=2019-03-08&' +</Indentation></Pgraph><Pgraph><Indentation>'sortBy=relevancy&' +</Indentation></Pgraph><Pgraph><Indentation>'apiKey=[write your api]'</Indentation></Pgraph><Pgraph># Send the request and print the raw response body</Pgraph><Pgraph>req = URI.open(url)</Pgraph><Pgraph>response_body = req.read</Pgraph><Pgraph>puts response_body<LineBreak></LineBreak><LineBreak></LineBreak></Pgraph><Pgraph>The tweet crawler was written in R. For example, the following code was used to crawl the HIV-related tweets from 01/03/2019 to 08/03/2019:<LineBreak></LineBreak><LineBreak></LineBreak></Pgraph><Pgraph>library(twitteR)</Pgraph><Pgraph>consumer_key = "[your consumer_key]"</Pgraph><Pgraph>consumer_secret = "[your consumer_secret]"</Pgraph><Pgraph>access_token = "[your access_token]"</Pgraph><Pgraph>access_secret = "[your access_secret]"</Pgraph><Pgraph>setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)</Pgraph><Pgraph>tw = searchTwitter("#HIV", n = 1e4, since = "2019-03-01")</Pgraph><SubHeadline>Application of fuzzy rules and fuzzy classifier</SubHeadline><Pgraph>The next step after extracting the terms related to each class involves the application of the fuzzy rules and the fuzzy classifier. 
The system developed for this phase consists of two steps:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Generation and updating of fuzzy rules</ListItem><ListItem level="1" levelPosition="2" numString="2.">Classification of news/tweet related data</ListItem></OrderedList></Pgraph><Pgraph>In the proposed system, fuzzy rules are generated and updated by a fuzzy model called Eclass1-MIMO, which is a multi-input-multi-output framework based on the rules of the TSK fuzzy system <TextLink reference="26"></TextLink>. In addition to using the TSK fuzzy system, the Eclass1-MIMO model can remove useless potential terms with the help of an “aging” <TextGroup><PlainText>mech</PlainText></TextGroup>an<TextGroup><PlainText>ism.</PlainText></TextGroup> Using this mechanism, the potential terms that have not been recently used to classify any text are removed from the list of keywords.</Pgraph><Pgraph>The rules of the TSK-based fuzzy model are defined as follows:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Rule<Subscript>i</Subscript> = IF (A<Subscript>1</Subscript> is around Prot<Subscript>1</Subscript>) AND …<LineBreak></LineBreak>AND (A<Subscript>n</Subscript> is around Prot<Subscript>n</Subscript>) THEN y<Subscript>i</Subscript> = Ā<Superscript>T</Superscript> Θ</ListItem></OrderedList></Pgraph><Pgraph>Where i is the rule number, n is the number of input variables (or terms) in Rule<Subscript>i</Subscript>, Prot<Subscript>j</Subscript> is the prototype value of the variable A<Subscript>j</Subscript> (obtained using tf-idf), Ā is the vector of input features, i.e. Ā =[1,x<Subscript>1</Subscript>,x<Subscript>2</Subscript>,…,x<Subscript>n</Subscript> ], Θ is the matrix of consequent parameters, and y<Subscript>i</Subscript> is the resulting output. 
The normalized output is obtained using the following equation <TextLink reference="2"></TextLink>:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="2" numString="2."><ImgLink imgNo="1" imgType="inlineFigure"/></ListItem></OrderedList></Pgraph><Pgraph>The y<Subscript>i</Subscript> values should sum up to 1: <ImgLink imgNo="2" imgType="inlineFigure"/><LineBreak></LineBreak><LineBreak></LineBreak></Pgraph><Pgraph>The normalized output can be interpreted to find a match with the existing classes. If classes are binary, “1” means the output is a member of the class, and “0” means it is not. If the objective function has more than two classes or multiple inputs and multiple outputs with (n+1)*k members (where k is the number of classes or classifications, and n is the number of terms), then:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="3" numString="3."><ImgLink imgNo="3" imgType="inlineFigure"/></ListItem></OrderedList></Pgraph><Pgraph>The output of the fuzzy rules related to the k<Superscript>th</Superscript> row of this vector is the normalized output for the class:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="4" numString="4."><ImgLink imgNo="4" imgType="inlineFigure"/></ListItem></OrderedList></Pgraph><Pgraph>In this study, the choice of using Eclass1-MIMO in the classification algorithm is made because of the dynamic adaptability of fuzzy rules to the changes in the input data stream.</Pgraph><SubHeadline>Fuzzy rule generation, removal, and updating</SubHeadline><Pgraph>Provided that the aging condition is met, the fuzzy rules for assigning the news or tweet A<Subscript>z</Subscript> to classes or categories C<Subscript>j</Subscript> are updated in the following steps:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Compute the potential terms of news or tweet A<Subscript>z</Subscript></ListItem><ListItem level="1" levelPosition="2" numString="2.">Update the patterns (list of 
all existing terms) according to the potential terms of news or tweet A<Subscript>z</Subscript></ListItem><ListItem level="1" levelPosition="3" numString="3.">Insert A<Subscript>z</Subscript> as a new pattern (new pattern of the class C<Subscript>j</Subscript>) if necessary.</ListItem><ListItem level="1" levelPosition="4" numString="4.">Remove duplicate patterns if necessary</ListItem></OrderedList></Pgraph><SubHeadline>Comparison of the proposed method with other algorithms</SubHeadline><Pgraph>For performance evaluation, the proposed method was compared with the conventional algorithms listed below. This comparison was made in terms of accuracy, misclassification, Kappa statistic, and absolute error.</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Naïve Bayes algorithm: This is a simple classifier based on Bayes' theorem that has no configurable parameters <TextLink reference="35"></TextLink>.</ListItem><ListItem level="1" levelPosition="2" numString="2.">Bayesian network algorithm: Bayesian networks (BNs) are a family of probabilistic graphical models (GMs) developed by combining graph theory, probability theory, and statistics. In this algorithm, each vertex of the graph represents a random variable and the edges between vertices represent the probabilistic dependence between the corresponding random variables <TextLink reference="36"></TextLink>.</ListItem><ListItem level="1" levelPosition="3" numString="3.">Deep learning: The deep learning model used here is a multi-layer feed-forward artificial neural network trained by stochastic gradient descent with back-propagation <TextLink reference="37"></TextLink>.</ListItem><ListItem level="1" levelPosition="4" numString="4.">K-nearest neighbors algorithm (KNN): In this algorithm, the parameter K is the number of nearest neighbors (closest training examples) in the feature space. 
Given K as an input, the algorithm assigns an object to the class that is most common among its K nearest neighbors. Distance is measured with a criterion such as Euclidean distance <TextLink reference="38"></TextLink>.</ListItem><ListItem level="1" levelPosition="5" numString="5.">Learning Vector Quantization (LVQ) neural network: The LVQ neural network is an artificial neural network based on local competitive learning. In this network, neurons are called codebook vectors or prototypes <TextLink reference="39"></TextLink>.</ListItem><ListItem level="1" levelPosition="6" numString="6.">Support Vector Machine (SVM): SVM algorithms are supervised learning models developed for classification and regression analysis <TextLink reference="40"></TextLink>. </ListItem></OrderedList></Pgraph><Pgraph>The classification methods were compared using two metrics, accuracy and the confusion matrix, which represent, respectively, the overall share of correctly classified texts and the breakdown of correct and incorrect classifications by class.</Pgraph></TextBlock> <TextBlock linked="yes" name="Results"> <MainHeadline>Results</MainHeadline><Pgraph>The collected database consisted of 10,000 selected news items and tweets, which, after data cleanup and integration, yielded 1,100 keywords in 9 classes. After applying a pruning technique (in the 10%–30% range) to remove the terms with a low tf-idf index, the results provided in Table 1 <ImgLink imgNo="1" imgType="table"/> were obtained. It should be noted that to improve the accuracy and speed of the process in real-time applications, pruning was performed with p<20%.</Pgraph><Pgraph>Figure 2 <ImgLink imgNo="2" imgType="figure"/> shows the accuracy and confusion matrix of the proposed algorithm, named Fuzzy Algorithm for Extraction, Monitoring and Classification of infectious Diseases or FAEMC-ID. With FAEMC-ID, the highest and lowest precision or recall ratios were obtained for cholera and plague, respectively. 
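</Pgraph><Pgraph>As an illustration of how figures of this kind are derived, the following Ruby sketch computes overall accuracy, per-class recall and precision, and Cohen's kappa from a confusion matrix. The 3-class matrix and all names in it (CLASSES, CONFUSION, metrics_from) are hypothetical and are not the study's data or code; rows are assumed to hold the true classes and columns the predicted classes.</Pgraph><Pgraph>

```ruby
# Hypothetical 3-class confusion matrix (illustrative only, not the study's data).
# Rows are true classes, columns are predicted classes.
CLASSES = %w[measles ebola cholera].freeze
CONFUSION = [
  [40, 5, 5],
  [4, 30, 6],
  [2, 3, 45]
].freeze

def metrics_from(matrix)
  total = matrix.flatten.sum.to_f
  row_sums = matrix.map { |row| row.sum }            # items truly in each class
  col_sums = matrix.transpose.map { |col| col.sum }  # items predicted as each class
  correct = matrix.each_index.map { |i| matrix[i][i] }

  accuracy = correct.sum / total
  # Recall: correctly classified items / all items that truly belong to the class.
  recall = correct.each_with_index.map { |c, i| c / row_sums[i].to_f }
  # Precision: correctly classified items / all items predicted as the class.
  precision = correct.each_with_index.map { |c, i| c / col_sums[i].to_f }
  # Cohen's kappa: observed agreement corrected for agreement expected by chance.
  expected = row_sums.zip(col_sums).sum { |r, c| r * c } / (total * total)
  kappa = (accuracy - expected) / (1 - expected)

  { accuracy: accuracy, recall: recall, precision: precision, kappa: kappa }
end

result = metrics_from(CONFUSION)
puts format('accuracy = %.3f, kappa = %.3f', result[:accuracy], result[:kappa])
CLASSES.each_with_index do |name, i|
  puts format('%-8s recall = %.3f, precision = %.3f',
              name, result[:recall][i], result[:precision][i])
end
```

Reading the matrix row-wise gives recall and column-wise gives precision, so the two can differ for the same class; kappa is reported alongside accuracy because it discounts the agreement that unequal class sizes would produce by chance.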
</Pgraph><Pgraph>As shown in Figure 2 <ImgLink imgNo="2" imgType="figure"/>, unlike in other works <TextLink reference="1"></TextLink>, the precision of the method increases with the sample size. This reflects the applicability of the method to large-scale databases and hence to real-world applications.</Pgraph><Pgraph>In Table 2 <ImgLink imgNo="2" imgType="table"/>, FAEMC-ID is compared with the conventional algorithms commonly used in previous works. This comparison is in terms of accuracy, misclassification, Kappa coefficient, and absolute error. As can be seen, the proposed method exhibits a higher accuracy in the classification of the test data. In addition, the high correlation of data in each class is reflected in the obtained Kappa coefficient.</Pgraph><Pgraph>With an automatic system for extraction of news and comments, one can rapidly build a large database of disease-related events. With the provided visualization process, it is also possible to track the geographical <TextGroup><PlainText>loc</PlainText></TextGroup>at<TextGroup><PlainText>ion</PlainText></TextGroup> of the sources of news or comments.</Pgraph><Pgraph>Figure 3 <ImgLink imgNo="3" imgType="figure"/> shows the results obtained by monitoring measles-related news in a continuous 183-hour period from 01/03/2019 (01:00 am) to 08/03/2019 <TextGroup><PlainText>(12:00 pm),</PlainText></TextGroup> which comprised 2,870 tweets from 2,556 users. The number of tweets posted from each region ranged from 1 to 47, with the highest number (47 tweets) from Canada. Most measles-related news originated in the Americas and Europe, mostly from the United States and Canada. 
</Pgraph><Pgraph>This is consistent with the map illustrated in Figure 4 <ImgLink imgNo="4" imgType="figure"/>, which was obtained from the United States Centers for Disease Control and Prevention (CDC).</Pgraph><Pgraph>Figure 5 <ImgLink imgNo="5" imgType="figure"/> displays the map of Ebola-related news and tweets obtained using the proposed method, and <TextGroup><PlainText>Figure 6 </PlainText></TextGroup><ImgLink imgNo="6" imgType="figure"/> shows the map of Ebola epidemics according to the WHO. As can be seen, there is a high degree of <TextGroup><PlainText>consi</PlainText></TextGroup>st<TextGroup><PlainText>ency</PlainText></TextGroup> between these maps. The map of HIV/AIDS-related news and tweets for the study period is shown in Figure 7 <ImgLink imgNo="7" imgType="figure"/>.</Pgraph></TextBlock> <TextBlock linked="yes" name="Discussion"> <MainHeadline>Discussion</MainHeadline><Pgraph>In this study, we developed a new method based on the evolving fuzzy algorithm of TSK type for the extraction, monitoring, storage and visualization of news and tweets about various infectious diseases. To implement the method, more than 10,000 tweets and news were cleaned, integrated and classified with the help of the Eclass1-MIMO method, then visualized on the world map in real-time. </Pgraph><Pgraph>In recent years, many researchers have worked on classification, clustering, sentiment analysis, opinion mining and development of recommenders based on social data, but most of these works have focused either on news websites or Twitter <TextLink reference="1"></TextLink>, <TextLink reference="41"></TextLink>, <TextLink reference="42"></TextLink>, <TextLink reference="43"></TextLink>.</Pgraph><Pgraph>The findings of the present study are consistent with those of Angelov PP and Zhou X <TextLink reference="26"></TextLink> and Bhattacharyya et al. 
<TextLink reference="25"></TextLink>, who reported the high efficacy of evolving fuzzy algorithms in real-time applications in terms of ensuring satisfactory precision, speed, and flexibility.</Pgraph><Pgraph>In the study by Iglesias et al., an evolving fuzzy algorithm with the Eclass1-MIMO method was used to classify the public news into 6 categories of science, health, technology, sports, arts, and commerce <TextLink reference="1"></TextLink>. But unlike this model, in the proposed method, increasing the data size not only does not reduce the accuracy but actually improves it. Another advantage of the proposed method over similar works <TextLink reference="44"></TextLink>, <TextLink reference="45"></TextLink>, <TextLink reference="46"></TextLink>, <TextLink reference="47"></TextLink> is the ability to visualize the results for improved monitoring and tracking of epidemics.</Pgraph><Pgraph>Also, the geographic origins of tweets posted about measles and Ebola were found to be consistent with official CDC and WHO reports about their incidence during the studied period. This reflects the efficacy of the proposed method in monitoring and tracking the targeted diseases.</Pgraph><Pgraph>The evolving fuzzy method has also been used by Del Jesus <TextLink reference="48"></TextLink> to enhance low-grade classification algorithms, by Lughofer <TextLink reference="49"></TextLink> to solve the problems of online multiclass classification, and Lughofer <TextLink reference="50"></TextLink> for online incremental feature dimension reduction. 
Our findings about the use of the evolving fuzzy method are in agreement with the results of these studies in terms of high accuracy, high correlation of data in each class (kappa coefficient), and efficacy in online multiclass data analysis.</Pgraph><SubHeadline>Study limitation</SubHeadline><Pgraph>A limitation of the suggested method is that it cannot be used to monitor and track infectious diseases in areas with poor or no access to social networks such as Twitter and Facebook. This includes low-income countries, where morbidity and mortality due to infectious diseases are noticeably higher.</Pgraph></TextBlock> <TextBlock linked="yes" name="Conclusions"> <MainHeadline>Conclusions</MainHeadline><Pgraph>This paper presented a method for extraction, monitoring, storage, and visualization of data related to certain infectious diseases through news mining and tweet crawling. The proposed framework consists of four phases: data collection with code written in the R programming language, text cleanup, classification with the evolving fuzzy model Eclass1-MIMO, and visualization. The fuzzy classification component was developed based on fuzzy TSK rules and an evolving fuzzy model, and hence is able to update its vocabulary and remain efficient and accurate upon encountering new terms. Moreover, unlike previous methods, the proposed method exhibits satisfactory flexibility regarding the size of input data and can handle large datasets without a decline in classification accuracy. Other notable features of this method include the simultaneous extraction of news from tweets and websites, real-time classification, data storage in one database, and visualization of data in real time. A comparison of the proposed method with other algorithms from the literature showed its high accuracy (88.41%) and the high correlation of data within each class. 
The proposed algorithm can also be used in the development of more effective monitoring and tracking systems for other human and even animal health hazards.</Pgraph></TextBlock> <TextBlock linked="yes" name="Notes"> <MainHeadline>Notes</MainHeadline><SubHeadline>Acknowledgments</SubHeadline><Pgraph>We would like to express our gratitude to Dr. Antonio <TextGroup><PlainText>Iglesias</PlainText></TextGroup> at the University of Madrid for the helpful comments, the instructors of the online course “Machine Learning for Data Science and Analytics” provided by Columbia University for giving us better insight into the area of data and text mining, and also the members of the Iran Data Mining Group, who patiently answered our questions.</Pgraph><SubHeadline>Competing interests</SubHeadline><Pgraph>The authors declare that they have no competing interests.</Pgraph></TextBlock> <References linked="yes"> <Reference refNo="1"> <RefAuthor>Iglesias JA</RefAuthor> <RefAuthor>Tiemblo A</RefAuthor> <RefAuthor>Ledezma A</RefAuthor> <RefAuthor>Sanchis A</RefAuthor> <RefTitle>Web news mining in an evolving framework</RefTitle> <RefYear>2016</RefYear> <RefJournal>Information Fusion</RefJournal> <RefPage>90-8</RefPage> <RefTotal>Iglesias JA, Tiemblo A, Ledezma A, Sanchis A. Web news mining in an evolving framework. Information Fusion. 2016;28:90-8. DOI: 10.1016/j.inffus.2015.07.004 </RefTotal> <RefLink>http://dx.doi.org/10.1016/j.inffus.2015.07.004</RefLink> </Reference> <Reference refNo="2"> <RefAuthor>Ravi K</RefAuthor> <RefAuthor>Ravi V</RefAuthor> <RefTitle>A survey on opinion mining and sentiment analysis: tasks, approaches and applications</RefTitle> <RefYear>2015</RefYear> <RefJournal>Knowledge-Based Systems</RefJournal> <RefPage>14-46</RefPage> <RefTotal>Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowledge-Based Systems. 2015;89:14-46. 
DOI: 10.1016/j.knosys.2015.06.015</RefTotal> <RefLink>http://dx.doi.org/10.1016/j.knosys.2015.06.015</RefLink> </Reference> <Reference refNo="3"> <RefAuthor>Guellil I</RefAuthor> <RefAuthor>Boukhalfa K</RefAuthor> <RefTitle>Social big data mining: A survey focused on opinion mining and sentiments analysis</RefTitle> <RefYear>2015</RefYear> <RefBookTitle>12th International Symposium on Programming and Systems (ISPS); 2015 Apr 28-30; Algiers, Algeria</RefBookTitle> <RefPage></RefPage> <RefTotal>Guellil I, Boukhalfa K. Social big data mining: A survey focused on opinion mining and sentiments analysis. In: 12th International Symposium on Programming and Systems (ISPS); 2015 Apr 28-30; Algiers, Algeria. IEEE; 2015. DOI: 10.1109/ISPS.2015.7244976</RefTotal> <RefLink>https://doi.org/10.1109/ISPS.2015.7244976</RefLink> </Reference> <Reference refNo="4"> <RefAuthor>Mukkamala RR</RefAuthor> <RefAuthor>Hussain A</RefAuthor> <RefAuthor>Vatrapu R</RefAuthor> <RefTitle>Fuzzy-set based sentiment analysis of big social data</RefTitle> <RefYear>2014</RefYear> <RefBookTitle>18th International Enterprise Distributed Object Computing Conference (EDOC); 2014 Sep 1-5; Ulm, Germany</RefBookTitle> <RefPage></RefPage> <RefTotal>Mukkamala RR, Hussain A, Vatrapu R. Fuzzy-set based sentiment analysis of big social data. In: 18th International Enterprise Distributed Object Computing Conference (EDOC); 2014 Sep 1-5; Ulm, Germany. IEEE; 2014. DOI: 10.1109/EDOC.2014.19</RefTotal> <RefLink>https://doi.org/10.1109/EDOC.2014.19</RefLink> </Reference> <Reference refNo="5"> <RefAuthor>Evans DK</RefAuthor> <RefAuthor>Klavans JL</RefAuthor> <RefAuthor>McKeown KR</RefAuthor> <RefTitle>Columbia newsblaster: Multilingual news summarization on the web</RefTitle> <RefYear>2004</RefYear> <RefBookTitle>HLT-NAACL Demonstrations '04; 2004 May 2-7; Boston, USA</RefBookTitle> <RefPage></RefPage> <RefTotal>Evans DK, Klavans JL, McKeown KR. 
Columbia newsblaster: Multilingual news summarization on the web. In: HLT-NAACL Demonstrations '04; 2004 May 2-7; Boston, USA. Stroudsburg, PA: Association for Computational Linguistics; 2004. DOI: 10.3115/1614025.1614026</RefTotal> <RefLink>https://doi.org/10.3115/1614025.1614026</RefLink> </Reference> <Reference refNo="6"> <RefAuthor>Tan AH</RefAuthor> <RefTitle>Text mining: The state of the art and the challenges</RefTitle> <RefYear>1999</RefYear> <RefBookTitle>Methodologies for Knowledge Discovery and Data Mining</RefBookTitle> <RefPage>65-70</RefPage> <RefTotal>Tan AH. Text mining: The state of the art and the challenges. In: Zhong N, Zhou L, editors. Methodologies for Knowledge Discovery and Data Mining. Third Pacific-Asia Conference PAKDD'99; 1999 Apr 26-28; Beijing, China. Springer; 1999. p. 65-70.</RefTotal> </Reference> <Reference refNo="7"> <RefAuthor>McCaig D</RefAuthor> <RefAuthor>Bhatia S</RefAuthor> <RefAuthor>Elliott MT</RefAuthor> <RefAuthor>Walasek L</RefAuthor> <RefAuthor>Meyer C</RefAuthor> <RefTitle>Text-mining as a methodology to assess eating disorder-relevant factors: Comparing mentions of fitness tracking technology across online communities</RefTitle> <RefYear>2018</RefYear> <RefJournal>Int J Eat Disord</RefJournal> <RefPage>647-55</RefPage> <RefTotal>McCaig D, Bhatia S, Elliott MT, Walasek L, Meyer C. Text-mining as a methodology to assess eating disorder-relevant factors: Comparing mentions of fitness tracking technology across online communities. Int J Eat Disord. 2018 07;51(7):647-55. DOI: 10.1002/eat.22882</RefTotal> <RefLink>https://doi.org/10.1002/eat.22882</RefLink> </Reference> <Reference refNo="8"> <RefAuthor>Russell MA</RefAuthor> <RefTitle></RefTitle> <RefYear>2013</RefYear> <RefBookTitle>Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More</RefBookTitle> <RefPage></RefPage> <RefTotal>Russell MA. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. 
Sebastopol, CA: O’Reilly Media Inc; 2013.</RefTotal> </Reference> <Reference refNo="9"> <RefAuthor>Tang J</RefAuthor> <RefAuthor>Chang Y</RefAuthor> <RefAuthor>Liu H</RefAuthor> <RefTitle>Mining social media with social theories: a survey</RefTitle> <RefYear>2014</RefYear> <RefJournal>ACM SIGKDD Explorations Newsletter</RefJournal> <RefPage>20-9</RefPage> <RefTotal>Tang J, Chang Y, Liu H. Mining social media with social theories: a survey. ACM SIGKDD Explorations Newsletter. 2014;15(2):20-9. DOI: 10.1145/2641190.2641195</RefTotal> <RefLink>http://dx.doi.org/10.1145/2641190.2641195</RefLink> </Reference> <Reference refNo="10"> <RefAuthor>Ngoc PT</RefAuthor> <RefAuthor>Yoo M</RefAuthor> <RefTitle>The lexicon-based sentiment analysis for fan page ranking in Facebook</RefTitle> <RefYear>2014</RefYear> <RefBookTitle>International Conference on Information Networking 2014 (ICOIN 2014); 2014 Feb 10-12; Phuket, Thailand</RefBookTitle> <RefPage></RefPage> <RefTotal>Ngoc PT, Yoo M. The lexicon-based sentiment analysis for fan page ranking in Facebook. In: International Conference on Information Networking 2014 (ICOIN 2014); 2014 Feb 10-12; Phuket, Thailand.</RefTotal> </Reference> <Reference refNo="11"> <RefAuthor>Ueda M</RefAuthor> <RefAuthor>Mori K</RefAuthor> <RefAuthor>Matsubayashi T</RefAuthor> <RefAuthor>Sawada Y</RefAuthor> <RefTitle>Tweeting celebrity suicides: Users’ reaction to prominent suicide deaths on Twitter and subsequent increases in actual suicides</RefTitle> <RefYear>2017</RefYear> <RefJournal>Soc Sci Med</RefJournal> <RefPage>158-66</RefPage> <RefTotal>Ueda M, Mori K, Matsubayashi T, Sawada Y. Tweeting celebrity suicides: Users’ reaction to prominent suicide deaths on Twitter and subsequent increases in actual suicides. Soc Sci Med. 2017 Sep;189:158-66. 
DOI: 10.1016/j.socscimed.2017.06.032</RefTotal> <RefLink>https://doi.org/10.1016/j.socscimed.2017.06.032</RefLink> </Reference> <Reference refNo="12"> <RefAuthor>O’Dea B</RefAuthor> <RefAuthor>Wan S</RefAuthor> <RefAuthor>Batterham PJ</RefAuthor> <RefAuthor>Calear AL</RefAuthor> <RefAuthor>Paris C</RefAuthor> <RefAuthor>Christensen H</RefAuthor> <RefTitle>Detecting suicidality on Twitter</RefTitle> <RefYear>2015</RefYear> <RefJournal>Internet Interv</RefJournal> <RefPage>183-8</RefPage> <RefTotal>O’Dea B, Wan S, Batterham PJ, Calear AL, Paris C, Christensen H. Detecting suicidality on Twitter. Internet Interv. 2015;2(2):183-8. DOI: 10.1016/j.invent.2015.03.005</RefTotal> <RefLink>https://doi.org/10.1016/j.invent.2015.03.005</RefLink> </Reference> <Reference refNo="13"> <RefAuthor>Colombo GB</RefAuthor> <RefAuthor>Burnap P</RefAuthor> <RefAuthor>Hodorog A</RefAuthor> <RefAuthor>Scourfield J</RefAuthor> <RefTitle>Analysing the connectivity and communication of suicidal users on twitter</RefTitle> <RefYear>2016</RefYear> <RefJournal>Comput Commun</RefJournal> <RefPage>291-300</RefPage> <RefTotal>Colombo GB, Burnap P, Hodorog A, Scourfield J. Analysing the connectivity and communication of suicidal users on twitter. Comput Commun. 2016 Jan;73(Pt B):291-300. DOI: 10.1016/j.comcom.2015.07.018</RefTotal> <RefLink>http://dx.doi.org/10.1016/j.comcom.2015.07.018</RefLink> </Reference> <Reference refNo="14"> <RefAuthor>Masys DR</RefAuthor> <RefTitle>Linking microarray data to the literature</RefTitle> <RefYear>2001</RefYear> <RefJournal>Nat Genet</RefJournal> <RefPage>9-10</RefPage> <RefTotal>Masys DR. Linking microarray data to the literature. Nat Genet. 2001 May;28(1):9-10. 
DOI: 10.1038/88324 </RefTotal> <RefLink>http://dx.doi.org/10.1038/88324</RefLink> </Reference> <Reference refNo="15"> <RefAuthor>Srivastava AN</RefAuthor> <RefAuthor>Sahami M</RefAuthor> <RefTitle></RefTitle> <RefYear>2009</RefYear> <RefBookTitle>Text mining: Classification, clustering, and applications</RefBookTitle> <RefPage></RefPage> <RefTotal>Srivastava AN, Sahami M, editors. Text mining: Classification, clustering, and applications. New York: Chapman and Hall/CRC; 2009. DOI: 10.1201/9781420059458</RefTotal> <RefLink>http://dx.doi.org/10.1201/9781420059458</RefLink> </Reference> <Reference refNo="16"> <RefAuthor>Coussement K</RefAuthor> <RefAuthor>Van den Poel D</RefAuthor> <RefTitle>Integrating the voice of customers through call center emails into a decision support system for churn prediction</RefTitle> <RefYear>2008</RefYear> <RefJournal>Inform Manag</RefJournal> <RefPage>164-74</RefPage> <RefTotal>Coussement K, Van den Poel D. Integrating the voice of customers through call center emails into a decision support system for churn prediction. Inform Manag. 2008;45(3):164-74. DOI: 10.1016/j.im.2008.01.005</RefTotal> <RefLink>http://dx.doi.org/10.1016/j.im.2008.01.005</RefLink> </Reference> <Reference refNo="17"> <RefAuthor>Nassirtoussi AK</RefAuthor> <RefAuthor>Aghabozorgi S</RefAuthor> <RefAuthor>Wah TY</RefAuthor> <RefAuthor>Ngo DCL</RefAuthor> <RefTitle>Text mining for market prediction: A systematic review</RefTitle> <RefYear>2014</RefYear> <RefJournal>Expert Syst Applications</RefJournal> <RefPage>7653-70</RefPage> <RefTotal>Nassirtoussi AK, Aghabozorgi S, Wah TY, Ngo DCL. Text mining for market prediction: A systematic review. Expert Syst Applications. 2014;41(16):7653-70. 
DOI: 10.1016/j.eswa.2014.06.009</RefTotal> <RefLink>http://dx.doi.org/10.1016/j.eswa.2014.06.009</RefLink> </Reference> <Reference refNo="18"> <RefAuthor>Gálvez RH</RefAuthor> <RefAuthor>Gravano A</RefAuthor> <RefTitle>Assessing the usefulness of online message board mining in automatic stock prediction systems</RefTitle> <RefYear>2017</RefYear> <RefJournal>J Comput Sci</RefJournal> <RefPage>43-56</RefPage> <RefTotal>Gálvez RH, Gravano A. Assessing the usefulness of online message board mining in automatic stock prediction systems. J Comput Sci. 2017;19:43-56. DOI: 10.1016/j.jocs.2017.01.001</RefTotal> <RefLink>https://doi.org/10.1016/j.jocs.2017.01.001</RefLink> </Reference> <Reference refNo="19"> <RefAuthor>Valitutti A</RefAuthor> <RefAuthor>Strapparava C</RefAuthor> <RefAuthor>Stock O</RefAuthor> <RefTitle>Developing affective lexical resources</RefTitle> <RefYear>2004</RefYear> <RefJournal>PsychNology J</RefJournal> <RefPage>61-83</RefPage> <RefTotal>Valitutti A, Strapparava C, Stock O. Developing affective lexical resources. PsychNology J. 2004;2(1):61-83.</RefTotal> </Reference> <Reference refNo="20"> <RefAuthor>Zhang Z</RefAuthor> <RefAuthor>Li X</RefAuthor> <RefAuthor>Chen Y</RefAuthor> <RefTitle>Deciphering word-of-mouth in social media: Text-based metrics of consumer reviews</RefTitle> <RefYear>2012</RefYear> <RefJournal>ACM Transactions on Management Information Systems (TMIS)</RefJournal> <RefPage>5</RefPage> <RefTotal>Zhang Z, Li X, Chen Y. Deciphering word-of-mouth in social media: Text-based metrics of consumer reviews. ACM Transactions on Management Information Systems (TMIS). 2012;3(1):5. 
DOI: 10.1145/2151163.2151168</RefTotal> <RefLink>http://dx.doi.org/10.1145/2151163.2151168</RefLink> </Reference> <Reference refNo="21"> <RefAuthor>Irfan R</RefAuthor> <RefAuthor>King CK</RefAuthor> <RefAuthor>Grages D</RefAuthor> <RefAuthor>Ewen S</RefAuthor> <RefAuthor>Khan SU</RefAuthor> <RefAuthor>Madani SA</RefAuthor> <RefTitle>A survey on text mining in social networks</RefTitle> <RefYear>2015</RefYear> <RefJournal>Knowl Eng Rev</RefJournal> <RefPage>157-70</RefPage> <RefTotal>Irfan R, King CK, Grages D, Ewen S, Khan SU, Madani SA. A survey on text mining in social networks. Knowl Eng Rev. 2015;30(2):157-70. DOI: 10.1017/S0269888914000277</RefTotal> <RefLink>https://doi.org/10.1017/S0269888914000277</RefLink> </Reference> <Reference refNo="22"> <RefAuthor>Mani S</RefAuthor> <RefTitle>UGT1A1 polymorphism predicts irinotecan toxicity: evolving proof</RefTitle> <RefYear>2001</RefYear> <RefJournal>AAPS PharmSci</RefJournal> <RefPage>2</RefPage> <RefTotal>Mani S. UGT1A1 polymorphism predicts irinotecan toxicity: evolving proof. AAPS PharmSci. 2001;3(3):2.</RefTotal> </Reference> <Reference refNo="23"> <RefAuthor>Litvak M</RefAuthor> <RefAuthor>Last M</RefAuthor> <RefAuthor>Friedman M</RefAuthor> <RefTitle>A new approach to improving multilingual summarization using a genetic algorithm</RefTitle> <RefYear>2010</RefYear> <RefBookTitle>Proceedings of the 48th annual meeting of the association for computational linguistics; 2010 Jul 11-16; Uppsala, Sweden</RefBookTitle> <RefPage></RefPage> <RefTotal>Litvak M, Last M, Friedman M. A new approach to improving multilingual summarization using a genetic algorithm. In: Proceedings of the 48th annual meeting of the association for computational linguistics; 2010 Jul 11-16; Uppsala, Sweden. 
Stroudsburg, PA: Association for Computational Linguistics; 2010.</RefTotal> </Reference> <Reference refNo="24"> <RefAuthor>Riza L</RefAuthor> <RefAuthor>Bergmeir C</RefAuthor> <RefAuthor>Herrera F</RefAuthor> <RefAuthor>Benítez J</RefAuthor> <RefTitle>frbs: Fuzzy Rule-Based Systems for Classification and Regression in R</RefTitle> <RefYear>2015</RefYear> <RefJournal>J Stat Softw</RefJournal> <RefPage>1-30</RefPage> <RefTotal>Riza L, Bergmeir C, Herrera F, Benítez J. frbs: Fuzzy Rule-Based Systems for Classification and Regression in R. J Stat Softw. 2015;65(6):1-30. DOI: 10.18637/jss.v065.i06</RefTotal> <RefLink>https://doi.org/10.18637/jss.v065.i06</RefLink> </Reference> <Reference refNo="25"> <RefAuthor>Bhattacharyya S</RefAuthor> <RefAuthor>Basu D</RefAuthor> <RefAuthor>Konar A</RefAuthor> <RefAuthor>Tibarewala D</RefAuthor> <RefTitle>Interval type-2 fuzzy logic based multiclass ANFIS algorithm for real-time EEG based movement control of a robot arm</RefTitle> <RefYear>2015</RefYear> <RefJournal>Rob Auton Syst</RefJournal> <RefPage>104-15</RefPage> <RefTotal>Bhattacharyya S, Basu D, Konar A, Tibarewala D. Interval type-2 fuzzy logic based multiclass ANFIS algorithm for real-time EEG based movement control of a robot arm. Rob Auton Syst. 2015;68:104-15. DOI: 10.1016/j.robot.2015.01.007</RefTotal> <RefLink>https://doi.org/10.1016/j.robot.2015.01.007</RefLink> </Reference> <Reference refNo="26"> <RefAuthor>Angelov PP</RefAuthor> <RefAuthor>Zhou X</RefAuthor> <RefTitle>Evolving fuzzy-rule-based classifiers from data streams</RefTitle> <RefYear>2008</RefYear> <RefJournal>IEEE Trans Fuzzy Syst</RefJournal> <RefPage>1462-75</RefPage> <RefTotal>Angelov PP, Zhou X. Evolving fuzzy-rule-based classifiers from data streams. IEEE Trans Fuzzy Syst. 2008;16(6):1462-75. 
DOI: 10.1109/TFUZZ.2008.925904</RefTotal> <RefLink>https://doi.org/10.1109/TFUZZ.2008.925904</RefLink> </Reference> <Reference refNo="27"> <RefAuthor>Takagi T</RefAuthor> <RefAuthor>Sugeno M</RefAuthor> <RefTitle>Fuzzy identification of systems and its applications to modeling and control</RefTitle> <RefYear>1985</RefYear> <RefJournal>IEEE Trans Syst Man Cybern</RefJournal> <RefPage>116-32</RefPage> <RefTotal>Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern. 1985;SMC-15(1):116-32. DOI: 10.1109/TSMC.1985.6313399</RefTotal> <RefLink>https://doi.org/10.1109/TSMC.1985.6313399</RefLink> </Reference> <Reference refNo="28"> <RefAuthor>Ishibuchi H</RefAuthor> <RefAuthor>Yamamoto T</RefAuthor> <RefAuthor>Nakashima T</RefAuthor> <RefTitle>Hybridization of fuzzy GBML approaches for pattern classification problems</RefTitle> <RefYear>2005</RefYear> <RefJournal>IEEE Trans Syst Man Cybern B Cybern</RefJournal> <RefPage>359-65</RefPage> <RefTotal>Ishibuchi H, Yamamoto T, Nakashima T. Hybridization of fuzzy GBML approaches for pattern classification problems. IEEE Trans Syst Man Cybern B Cybern. 2005;35(2):359-65. DOI: 10.1109/TSMCB.2004.842257</RefTotal> <RefLink>https://doi.org/10.1109/TSMCB.2004.842257</RefLink> </Reference> <Reference refNo="29"> <RefAuthor>Boyacioglu MA</RefAuthor> <RefAuthor>Avci D</RefAuthor> <RefTitle>An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul stock exchange</RefTitle> <RefYear>2010</RefYear> <RefJournal>Expert Syst Appl</RefJournal> <RefPage>7908-12</RefPage> <RefTotal>Boyacioglu MA, Avci D. An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul stock exchange. Expert Syst Appl. 2010;37(12):7908-12. 
DOI: 10.1016/j.eswa.2010.04.045</RefTotal> <RefLink>https://doi.org/10.1016/j.eswa.2010.04.045</RefLink> </Reference> <Reference refNo="30"> <RefAuthor>Bai Y</RefAuthor> <RefAuthor>Zhuang H</RefAuthor> <RefAuthor>Roth ZS</RefAuthor> <RefTitle>Fuzzy logic control to suppress noises and coupling effects in a laser tracking system</RefTitle> <RefYear>2005</RefYear> <RefJournal>IEEE Trans Control Syst Technol</RefJournal> <RefPage>113-21</RefPage> <RefTotal>Bai Y, Zhuang H, Roth ZS. Fuzzy logic control to suppress noises and coupling effects in a laser tracking system. IEEE Trans Control Syst Technol. 2005;13(1):113-21. DOI: 10.1109/TCST.2004.833653</RefTotal> <RefLink>https://doi.org/10.1109/TCST.2004.833653</RefLink> </Reference> <Reference refNo="31"> <RefAuthor>Basari ASH</RefAuthor> <RefAuthor>Hussin B</RefAuthor> <RefAuthor>Ananta IGP</RefAuthor> <RefAuthor>Zeniarja J</RefAuthor> <RefTitle>Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization</RefTitle> <RefYear>2013</RefYear> <RefJournal>Procedia Eng</RefJournal> <RefPage>453-62</RefPage> <RefTotal>Basari ASH, Hussin B, Ananta IGP, Zeniarja J. Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization. Procedia Eng. 2013;53:453-62. DOI: 10.1016/j.proeng.2013.02.059</RefTotal> <RefLink>https://doi.org/10.1016/j.proeng.2013.02.059</RefLink> </Reference> <Reference refNo="32"> <RefAuthor>Gupta V</RefAuthor> <RefAuthor>Lehal GS</RefAuthor> <RefTitle>A survey of text summarization extractive techniques</RefTitle> <RefYear>2010</RefYear> <RefJournal>J Emerg Technol Web Intell</RefJournal> <RefPage>258-68</RefPage> <RefTotal>Gupta V, Lehal GS. A survey of text summarization extractive techniques. J Emerg Technol Web Intell. 2010;2(3):258-68. 
DOI: 10.4304/jetwi.2.3.258-268</RefTotal> <RefLink>https://doi.org/10.4304/jetwi.2.3.258-268</RefLink> </Reference> <Reference refNo="33"> <RefAuthor>Broder AZ</RefAuthor> <RefAuthor>Glassman SC</RefAuthor> <RefAuthor>Manasse MS</RefAuthor> <RefAuthor>Zweig G</RefAuthor> <RefTitle>Syntactic clustering of the web</RefTitle> <RefYear>1997</RefYear> <RefJournal>Computer Networks and ISDN Systems</RefJournal> <RefPage>1157-66</RefPage> <RefTotal>Broder AZ, Glassman SC, Manasse MS, Zweig G. Syntactic clustering of the web. Computer Networks and ISDN Systems. 1997;29(8):1157-66. DOI: 10.1016/S0169-7552(97)00031-7</RefTotal> <RefLink>http://dx.doi.org/10.1016/S0169-7552(97)00031-7</RefLink> </Reference> <Reference refNo="34"> <RefAuthor>Cavnar WB</RefAuthor> <RefAuthor>Trenkle JM</RefAuthor> <RefTitle>N-gram-based text categorization</RefTitle> <RefYear>1994</RefYear> <RefBookTitle>Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval; 1994; Las Vegas, USA</RefBookTitle> <RefPage>161-75</RefPage> <RefTotal>Cavnar WB, Trenkle JM. N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval; 1994; Las Vegas, USA. p. 161-75.</RefTotal> </Reference> <Reference refNo="35"> <RefAuthor>Angelov P</RefAuthor> <RefAuthor>Filev D</RefAuthor> <RefTitle>Simpl_eTS: A simplified method for learning evolving Takagi-Sugeno fuzzy models</RefTitle> <RefYear>2005</RefYear> <RefBookTitle>The 14th IEEE International Conference on Fuzzy Systems FUZZ'05; 2005 May 25; Reno, USA</RefBookTitle> <RefPage></RefPage> <RefTotal>Angelov P, Filev D. Simpl_eTS: A simplified method for learning evolving Takagi-Sugeno fuzzy models. In: The 14th IEEE International Conference on Fuzzy Systems FUZZ'05; 2005 May 25; Reno, USA. IEEE; 2005. 
DOI: 10.1109/FUZZY.2005.1452543</RefTotal> <RefLink>https://doi.org/10.1109/FUZZY.2005.1452543</RefLink> </Reference> <Reference refNo="36"> <RefAuthor>Friedman N</RefAuthor> <RefAuthor>Geiger D</RefAuthor> <RefAuthor>Goldszmidt M</RefAuthor> <RefTitle>Bayesian network classifiers</RefTitle> <RefYear>1997</RefYear> <RefJournal>Mach Learn</RefJournal> <RefPage>131-63</RefPage> <RefTotal>Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Mach Learn. 1997;29(2-3):131-63. DOI: 10.1023/A:1007465528199</RefTotal> <RefLink>https://doi.org/10.1023/A:1007465528199</RefLink> </Reference> <Reference refNo="37"> <RefAuthor>LeCun Y</RefAuthor> <RefAuthor>Bengio Y</RefAuthor> <RefAuthor>Hinton G</RefAuthor> <RefTitle>Deep learning</RefTitle> <RefYear>2015</RefYear> <RefJournal>Nature</RefJournal> <RefPage>436-44</RefPage> <RefTotal>LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May;521(7553):436-44. DOI: 10.1038/nature14539</RefTotal> <RefLink>http://dx.doi.org/10.1038/nature14539</RefLink> </Reference> <Reference refNo="38"> <RefAuthor>Cover T</RefAuthor> <RefAuthor>Hart P</RefAuthor> <RefTitle>Nearest neighbor pattern classification</RefTitle> <RefYear>1967</RefYear> <RefJournal>IEEE Trans Inf Theory</RefJournal> <RefPage>21-7</RefPage> <RefTotal>Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21-7. DOI: 10.1109/TIT.1967.1053964</RefTotal> <RefLink>https://doi.org/10.1109/TIT.1967.1053964</RefLink> </Reference> <Reference refNo="39"> <RefAuthor>Kohonen T</RefAuthor> <RefTitle>Learning vector quantization</RefTitle> <RefYear>1995</RefYear> <RefBookTitle>Self-Organizing Maps</RefBookTitle> <RefPage>175-89</RefPage> <RefTotal>Kohonen T. Learning vector quantization. In: Kohonen T, editor. Self-Organizing Maps. Berlin, Heidelberg: Springer; 1995. (SSINFL; Vol. 30). p. 175-89. 
DOI: 10.1007/978-3-642-97610-0_6</RefTotal> <RefLink>https://doi.org/10.1007/978-3-642-97610-0_6</RefLink> </Reference> <Reference refNo="40"> <RefAuthor>Hearst MA</RefAuthor> <RefAuthor>Dumais ST</RefAuthor> <RefAuthor>Osuna E</RefAuthor> <RefAuthor>Platt J</RefAuthor> <RefAuthor>Scholkopf B</RefAuthor> <RefTitle>Support vector machines</RefTitle> <RefYear>1998</RefYear> <RefJournal>IEEE Intell Syst</RefJournal> <RefPage>18-28</RefPage> <RefTotal>Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst. 1998;13(4):18-28. DOI: 10.1109/5254.708428</RefTotal> <RefLink>https://doi.org/10.1109/5254.708428</RefLink> </Reference> <Reference refNo="41"> <RefAuthor>Amato F</RefAuthor> <RefAuthor>Moscato V</RefAuthor> <RefAuthor>Picariello A</RefAuthor> <RefAuthor>Piccialli F</RefAuthor> <RefTitle>SOS: A multimedia recommender System for Online Social networks</RefTitle> <RefYear>2019</RefYear> <RefJournal>Future Gener Comput Syst</RefJournal> <RefPage>914-23</RefPage> <RefTotal>Amato F, Moscato V, Picariello A, Piccialli F. SOS: A multimedia recommender System for Online Social networks. Future Gener Comput Syst. 2019;93:914-23. DOI: 10.1016/j.future.2017.04.028</RefTotal> <RefLink>https://doi.org/10.1016/j.future.2017.04.028</RefLink> </Reference> <Reference refNo="42"> <RefAuthor>Li J</RefAuthor> <RefAuthor>Li X</RefAuthor> <RefAuthor>Zhu B</RefAuthor> <RefTitle>User opinion classification in social media: A global consistency maximization approach</RefTitle> <RefYear>2016</RefYear> <RefJournal>Inform Manag</RefJournal> <RefPage>987-96</RefPage> <RefTotal>Li J, Li X, Zhu B. User opinion classification in social media: A global consistency maximization approach. Inform Manag. 2016;53(8):987-96. 
DOI: 10.1016/j.im.2016.06.004</RefTotal> <RefLink>http://dx.doi.org/10.1016/j.im.2016.06.004</RefLink> </Reference> <Reference refNo="43"> <RefAuthor>Mathioudakis M</RefAuthor> <RefAuthor>Koudas N</RefAuthor> <RefTitle>Twittermonitor: trend detection over the twitter stream</RefTitle> <RefYear>2010</RefYear> <RefBookTitle>Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data; 2010 Jun 6-10; Indianapolis, USA</RefBookTitle> <RefPage></RefPage> <RefTotal>Mathioudakis M, Koudas N. Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data; 2010 Jun 6-10; Indianapolis, USA. New York: ACM; 2010. DOI: 10.1145/1807167.1807306</RefTotal> <RefLink>https://doi.org/10.1145/1807167.1807306</RefLink> </Reference> <Reference refNo="44"> <RefAuthor>Al-Surimi K</RefAuthor> <RefAuthor>Khalifa M</RefAuthor> <RefAuthor>Bahkali S</RefAuthor> <RefAuthor>El-Metwally A</RefAuthor> <RefAuthor>Househ M</RefAuthor> <RefTitle></RefTitle> <RefYear>2016</RefYear> <RefBookTitle>The potential of social media and internet-based data in preventing and fighting infectious diseases: from internet to twitter</RefBookTitle> <RefPage>131-9</RefPage> <RefTotal>Al-Surimi K, Khalifa M, Bahkali S, El-Metwally A, Househ M. The potential of social media and internet-based data in preventing and fighting infectious diseases: from internet to twitter. Basel: Springer; 2016. p. 131-9. DOI: 10.1007/5584_2016_132</RefTotal> <RefLink>http://dx.doi.org/10.1007/5584_2016_132</RefLink> </Reference> <Reference refNo="45"> <RefAuthor>Ku LW</RefAuthor> <RefAuthor>Chen HH</RefAuthor> <RefTitle>Mining opinions from the Web: Beyond relevance retrieval</RefTitle> <RefYear>2007</RefYear> <RefJournal>J Am Soc Information Sci Technol</RefJournal> <RefPage>1838-50</RefPage> <RefTotal>Ku LW, Chen HH. Mining opinions from the Web: Beyond relevance retrieval. J Am Soc Information Sci Technol. 2007;58(12):1838-50. 
DOI: 10.1002/asi.20630</RefTotal> <RefLink>http://dx.doi.org/10.1002/asi.20630</RefLink> </Reference> <Reference refNo="46"> <RefAuthor>Krishnalal G</RefAuthor> <RefAuthor>Rengarajan SB</RefAuthor> <RefAuthor>Srinivasagan K</RefAuthor> <RefTitle>A new text mining approach based on HMM-SVM for web news classification</RefTitle> <RefYear>2010</RefYear> <RefJournal>Int J Comput Appl</RefJournal> <RefPage>98-104</RefPage> <RefTotal>Krishnalal G, Rengarajan SB, Srinivasagan K. A new text mining approach based on HMM-SVM for web news classification. Int J Comput Appl. 2010;1(19):98-104. DOI: 10.5120/395-589</RefTotal> <RefLink>https://doi.org/10.5120/395-589</RefLink> </Reference> <Reference refNo="47"> <RefAuthor>Maghdid HS</RefAuthor> <RefTitle>Web News Mining Using New Features: A Comparative Study</RefTitle> <RefYear>2019</RefYear> <RefJournal>IEEE Access</RefJournal> <RefPage>5626-41</RefPage> <RefTotal>Maghdid HS. Web News Mining Using New Features: A Comparative Study. IEEE Access. 2019;7:5626-41. DOI: 10.1109/ACCESS.2018.2890088</RefTotal> <RefLink>http://dx.doi.org/10.1109/ACCESS.2018.2890088</RefLink> </Reference> <Reference refNo="48"> <RefAuthor>Del Jesus MJ</RefAuthor> <RefAuthor>Hoffmann F</RefAuthor> <RefAuthor>Navascués LJ</RefAuthor> <RefAuthor>Sánchez L</RefAuthor> <RefTitle>Induction of fuzzy-rule-based classifiers with evolutionary boosting algorithms</RefTitle> <RefYear>2004</RefYear> <RefJournal>IEEE Trans Fuzzy Syst</RefJournal> <RefPage>296-308</RefPage> <RefTotal>Del Jesus MJ, Hoffmann F, Navascués LJ, Sánchez L. Induction of fuzzy-rule-based classifiers with evolutionary boosting algorithms. IEEE Trans Fuzzy Syst. 2004;12(3):296-308. 
DOI: 10.1109/TFUZZ.2004.825972</RefTotal> <RefLink>https://doi.org/10.1109/TFUZZ.2004.825972</RefLink> </Reference> <Reference refNo="49"> <RefAuthor>Lughofer E</RefAuthor> <RefAuthor>Buchtala O</RefAuthor> <RefTitle>Reliable all-pairs evolving fuzzy classifiers</RefTitle> <RefYear>2013</RefYear> <RefJournal>IEEE Trans Fuzzy Syst</RefJournal> <RefPage>625-41</RefPage> <RefTotal>Lughofer E, Buchtala O. Reliable all-pairs evolving fuzzy classifiers. IEEE Trans Fuzzy Syst. 2013;21(4):625-41. DOI: 10.1109/TFUZZ.2012.2226892</RefTotal> <RefLink>https://doi.org/10.1109/TFUZZ.2012.2226892</RefLink> </Reference> <Reference refNo="50"> <RefAuthor>Lughofer E</RefAuthor> <RefTitle>On-line incremental feature weighting in evolving fuzzy classifiers</RefTitle> <RefYear>2011</RefYear> <RefJournal>Fuzzy Sets Syst</RefJournal> <RefPage>1-23</RefPage> <RefTotal>Lughofer E. On-line incremental feature weighting in evolving fuzzy classifiers. Fuzzy Sets Syst. 2011;163(1):1-23. DOI: 10.1016/j.fss.2010.08.012</RefTotal> <RefLink>https://doi.org/10.1016/j.fss.2010.08.012</RefLink> </Reference> <Reference refNo="51"> <RefAuthor>Centers for Disease Control and Prevention</RefAuthor> <RefTitle></RefTitle> <RefYear></RefYear> <RefBookTitle>Measles Cases and Outbreaks 2019</RefBookTitle> <RefPage></RefPage> <RefTotal>Centers for Disease Control and Prevention. Measles Cases and Outbreaks 2019. [updated 2019 Jan 16; cited 2019 Mar 9]. Available from: https://www.cdc.gov/measles/cases-outbreaks.html</RefTotal> <RefLink>https://www.cdc.gov/measles/cases-outbreaks.html</RefLink> </Reference> <Reference refNo="52"> <RefAuthor>WHO</RefAuthor> <RefTitle></RefTitle> <RefYear>2015</RefYear> <RefBookTitle>Ebola outbreak 2014</RefBookTitle> <RefPage></RefPage> <RefTotal>WHO. Ebola outbreak 2014. [updated 2015 Jul 23; cited 2019 Mar 8]. 
Available from: https://www.who.int/features/ebola/storymap/en/</RefTotal> <RefLink>https://www.who.int/features/ebola/storymap/en/</RefLink> </Reference> </References> <Media> <Tables> <Table format="png"> <MediaNo>1</MediaNo> <MediaID>1</MediaID> <Caption><Pgraph><Mark1>Table 1: Number of keywords after application of the pruning technique</Mark1></Pgraph></Caption> </Table> <Table format="png"> <MediaNo>2</MediaNo> <MediaID>2</MediaID> <Caption><Pgraph><Mark1>Table 2: Performance of the proposed algorithms in comparison with other algorithms</Mark1></Pgraph></Caption> </Table> <NoOfTables>2</NoOfTables> </Tables> <Figures> <Figure format="png" height="766" width="561"> <MediaNo>1</MediaNo> <MediaID>1</MediaID> <Caption><Pgraph><Mark1>Figure 1: Framework of data collection and monitoring for infectious diseases</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="323" width="780"> <MediaNo>2</MediaNo> <MediaID>2</MediaID> <Caption><Pgraph><Mark1>Figure 2: Accuracy and confusion matrix of the proposed algorithm</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="612" width="819"> <MediaNo>3</MediaNo> <MediaID>3</MediaID> <Caption><Pgraph><Mark1>Figure 3: Monitoring of geographical distribution of the tweets about measles from 01:00 am 01/03/2019 to 12:00 pm 08/03/2019</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="590" width="803"> <MediaNo>4</MediaNo> <MediaID>4</MediaID> <Caption><Pgraph><Mark1>Figure 4: CDC report about the incidence of measles in the United States [51]</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="545" width="801"> <MediaNo>5</MediaNo> <MediaID>5</MediaID> <Caption><Pgraph><Mark1>Figure 5: Monitoring of geographical distribution of the tweets about Ebola from 01:00 am 01/03/2019 to 12:00 pm 08/03/2019</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="542" width="816"> <MediaNo>6</MediaNo> <MediaID>6</MediaID> <Caption><Pgraph><Mark1>Figure 6: WHO report about 
the incidence of Ebola [52]</Mark1></Pgraph></Caption> </Figure> <Figure format="png" height="515" width="813"> <MediaNo>7</MediaNo> <MediaID>7</MediaID> <Caption><Pgraph><Mark1>Figure 7: Monitoring of geographical distribution of the tweets about HIV from 01:00 am 01/03/2019 to 12:00 pm 08/03/2019</Mark1></Pgraph></Caption> </Figure> <NoOfPictures>7</NoOfPictures> </Figures> <InlineFigures> <Figure format="png" height="30" width="74"> <MediaNo>1</MediaNo> <MediaID>1</MediaID> <AltText>Equation 1</AltText> </Figure> <Figure format="png" height="22" width="74"> <MediaNo>2</MediaNo> <MediaID>2</MediaID> <AltText>Equation 2</AltText> </Figure> <Figure format="png" height="62" width="143"> <MediaNo>3</MediaNo> <MediaID>3</MediaID> <AltText>Equation 3</AltText> </Figure> <Figure format="png" height="20" width="131"> <MediaNo>4</MediaNo> <MediaID>4</MediaID> <AltText>Equation 4</AltText> </Figure> <NoOfPictures>4</NoOfPictures> </InlineFigures> <Attachments> <NoOfAttachments>0</NoOfAttachments> </Attachments> </Media> </OrigData> </GmsArticle>