<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<GmsArticle xmlns:xlink="http://www.w3.org/1999/xlink">
  <MetaData>
    <Identifier>mibe000166</Identifier>
    <IdentifierDoi>10.3205/mibe000166</IdentifierDoi>
    <IdentifierUrn>urn:nbn:de:0183-mibe0001665</IdentifierUrn>
    <ArticleType>Research Article</ArticleType>
    <TitleGroup>
      <Title language="en">Modality prediction of biomedical literature images using multimodal feature representation</Title>
      <TitleTranslated language="de">Klassifikation von Bildern der biomedizinischen Literatur unter Anwendung multimodaler Merkmale</TitleTranslated>
    </TitleGroup>
    <CreatorList>
      <Creator>
        <PersonNames>
          <Lastname>Pelka</Lastname>
          <LastnameHeading>Pelka</LastnameHeading>
          <Firstname>Obioma</Firstname>
          <Initials>O</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Department of Computer Science, University of Applied Sciences and Arts Dortmund, Germany</Affiliation>
        </Address>
        <Email>obioma.pelka&#64;googlemail.com</Email>
        <Creatorrole corresponding="yes" presenting="no">author</Creatorrole>
      </Creator>
      <Creator>
        <PersonNames>
          <Lastname>Friedrich</Lastname>
          <LastnameHeading>Friedrich</LastnameHeading>
          <Firstname>Christoph M.</Firstname>
          <Initials>CM</Initials>
        </PersonNames>
        <Address>
          <Affiliation>Department of Computer Science, University of Applied Sciences and Arts Dortmund, Germany</Affiliation>
        </Address>
        <Creatorrole corresponding="no" presenting="no">author</Creatorrole>
      </Creator>
    </CreatorList>
    <PublisherList>
      <Publisher>
        <Corporation>
          <Corporatename>German Medical Science GMS Publishing House</Corporatename>
        </Corporation>
        <Address>D&#252;sseldorf</Address>
      </Publisher>
    </PublisherList>
    <SubjectGroup>
      <SubjectheadingDDB>610</SubjectheadingDDB>
      <Keyword language="en">biomedical literature</Keyword>
      <Keyword language="en">multimodal imaging</Keyword>
      <Keyword language="en">principal component analysis</Keyword>
      <Keyword language="en">Random Forest</Keyword>
      <Keyword language="en">support vector machines</Keyword>
      <Keyword language="de">biomedizinische Literatur</Keyword>
      <Keyword language="de">multimodale Bildgebung</Keyword>
      <Keyword language="de">Hauptkomponentenanalyse</Keyword>
      <Keyword language="de">Random Forest</Keyword>
      <Keyword language="de">Support Vector Machines</Keyword>
    </SubjectGroup>
    <DatePublishedList>
      <DatePublished>20160824</DatePublished>
    </DatePublishedList>
    <Language>engl</Language>
    <License license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
      <AltText language="en">This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.</AltText>
      <AltText language="de">Dieser Artikel ist ein Open-Access-Artikel und steht unter den Lizenzbedingungen der Creative Commons Attribution 4.0 License (Namensnennung).</AltText>
    </License>
    <SourceGroup>
      <Journal>
        <ISSN>1860-9171</ISSN>
        <Volume>12</Volume>
        <Issue>2</Issue>
        <JournalTitle>GMS Medizinische Informatik, Biometrie und Epidemiologie</JournalTitle>
        <JournalTitleAbbr>GMS Med Inform Biom Epidemiol</JournalTitleAbbr>
        <IssueTitle>20 Jahre Medizinische Informatik in Dortmund</IssueTitle>
      </Journal>
    </SourceGroup>
    <ArticleNo>04</ArticleNo>
  </MetaData>
  <OrigData>
    <Abstract language="de" linked="yes"><Pgraph>Dieser Beitrag stellt Modellierungsans&#228;tze vor, die die Modalit&#228;t von Bildern der biomedizinischen Literatur automatisch vorhersagen. Verschiedene State-of-the-Art-Verfahren der visuellen Merkmalsextraktion, wie Bag-of-Keypoints mit Dense-SIFT-Deskriptoren, Joint Composite Descriptor und Fuzzy Color Histogram, wurden eingesetzt, um die charakteristischen Gegebenheiten der Bilder darzustellen. F&#252;r die textuelle Merkmalsextraktion wurde das Bag-of-Words-Verfahren angewandt. Die Reduktion des Lexikons erfolgte mittels des &#967;<Superscript>2</Superscript>-Tests. Die Anwendung der Principal Component Analysis f&#252;hrte zur Reduzierung der Merkmalsdimension. Eine Verbesserung der Klassifikationsrate wurde durch unterschiedliche Kombinationen von visuellen und textuellen Merkmalen erzielt. Die Lernverfahren Random Forest mit 100 bis 500 Entscheidungsb&#228;umen und SVM mit einer linearen Kernel-Funktion und dem C-Parameter (C&#61;0,05) kamen zur Vorhersage der Modalit&#228;ten zum Einsatz. Bessere Klassifikationsraten wurden mit dem Lernverfahren Random Forest erzielt. Mit der Anwendung von Dense-SIFT-Deskriptoren anstelle von Lowe-SIFT-Deskriptoren wird das Ergebnis zus&#228;tzlich verbessert.</Pgraph></Abstract>
    <Abstract language="en" linked="yes"><Pgraph>This paper presents the modelling approaches performed to automatically predict the modality of images found in biomedical literature. Various state-of-the-art visual features such as Bag-of-Keypoints computed with dense SIFT descriptors, texture features and Joint Composite Descriptors were used for visual image representation. Text representation was obtained by vector quantisation on a Bag-of-Words dictionary generated using attribute importance derived from a &#967;<Superscript>2</Superscript>-test. By computing the principal components separately on each feature, both dimension reduction and computational load reduction were achieved. Various multiple feature fusions were adopted to supplement visual image information with corresponding text information. The improvement obtained when using multimodal features instead of visual or text features alone was detected, analysed and evaluated. Random Forest models with 100 to 500 deep trees grown by resampling, a multiclass linear kernel SVM with C&#61;0.05 and a late fusion of the two classifiers were used for modality prediction. The Random Forest classifier achieved a higher accuracy, and Bag-of-Keypoints computed with dense SIFT descriptors proved to be a better approach than with Lowe SIFT descriptors.</Pgraph></Abstract>
    <TextBlock linked="yes" name="1 Introduction">
      <MainHeadline>1 Introduction</MainHeadline><Pgraph>Clinicians have emphasised the importance of the modality of an image in several user studies. Using modality information significantly increases retrieval efficiency; the image modality has thus become an essential and relevant factor in medical information retrieval, as it helps to filter out irrelevant information from the retrieval process <TextLink reference="1"></TextLink>. </Pgraph><Pgraph>This task was proposed at the ImageCLEF 2015 Medical Classification Task <TextLink reference="2"></TextLink>, and this paper describes the modelling approaches developed by the Biomedical Computer Science Group (BCSG) <TextLink reference="3"></TextLink>. Several approaches were used by participating research groups, such as the image tensor decomposition technique with maximum margin regression (MMR) in <TextLink reference="4"></TextLink>, 2D color feature based covariance descriptors proposed in <TextLink reference="5"></TextLink> and convolutional neural networks in <TextLink reference="6"></TextLink>.</Pgraph><Pgraph>In ImageCLEF 2010&#8211;2013, a similar task was proposed. The two differences to ImageCLEF 2015 were an additional class &#8216;COMP&#8217;, representing compound figures with at least two subfigures, and the size of the distributed collection. The number of figures distributed for the task was <TextGroup><PlainText>2,933</PlainText></TextGroup>. Various research teams proposed different approaches. In <TextLink reference="7"></TextLink>, a spatial pyramid on opponent SIFT with a &#967;<Superscript>2</Superscript> SVM was applied. Multiple color and texture features combined with different fusing techniques were used in <TextLink reference="8"></TextLink>. 
</Pgraph><Pgraph>The aim of the ImageCLEF 2015 Medical Task was to adjust the task by removing the compound class to avoid possible bias, and by increasing the number of images in the collection to observe the accuracy outcome on a larger database. The distributed image collection in ImageCLEF 2015 contains a total of 6,776 images: <TextGroup><PlainText>4,532 images</PlainText></TextGroup> in the training set and 2,244 images in the independent evaluation set. The proposed modality class hierarchy in Figure 1 <ImgLink imgNo="1" imgType="figure"/> was developed for images that occur in biomedical literature. The journal articles corresponding to the images were distributed as XML files, giving the opportunity of using text information such as captions and MeSH terms. </Pgraph><Pgraph>The objective of this work is to extract visual and text information from biomedical literature images in a large database, and to model and train classifiers that automatically predict the modality using the hierarchical classification scheme in Figure 1 <ImgLink imgNo="1" imgType="figure"/>. The classification scheme was proposed in <TextLink reference="9"></TextLink>. Dimension reduction is computed using principal component analysis, as this was not evaluated in previous approaches.</Pgraph><Pgraph>The proposed approach can be mapped to other classification problems in the medical field. For example, information from clinical reports found in the Picture Archiving and Communication System (PACS) can be used to index these reports to a defined classification scheme. This leads to more effective case retrieval, as text information from clinical reports can be combined with the modality of medical images to filter relevant cases.</Pgraph></TextBlock>
    <TextBlock linked="yes" name="2 Materials and methods">
      <MainHeadline>2 Materials and methods</MainHeadline><Pgraph>This section describes in subsection 2.1 the image database distributed for the proposed task. The methods used for visual feature extraction are detailed in subsection 2.2 and those for text feature extraction in subsection 2.3. In subsection 2.4, the setup used for classification is described together with the evaluated learning algorithms.</Pgraph><SubHeadline>2.1 Distributed image collection</SubHeadline><Pgraph>Before extracting features and training classifiers, an explorative analysis of the distributed training database was done. Some modality categories were represented by few annotated examples; thus, an expansion of the original collection was pursued in order to counteract the imbalanced dataset. The result of the analysis can be seen in Figure 2 <ImgLink imgNo="2" imgType="figure"/>. The modality &#8216;GFIG&#8217;, which represents &#8216;statistical figures, graphs and charts&#8217;, has over 2,000 annotated images, in comparison to &#8216;GPLI&#8217;, representing &#8216;program listings&#8217;, with one annotated example.</Pgraph><Pgraph>Hence, additional datasets were created using the images distributed at the ImageCLEF 2013 AMIA Medical Task <TextLink reference="1"></TextLink>. This was possible due to the similarity of the modality hierarchies used at both tasks. The four datasets detailed below were used for modality prediction:</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1."><Mark1>DataSet</Mark1><Mark1><Subscript>1</Subscript></Mark1> (DS<Subscript>1</Subscript>): The original training collection distributed for the subfigure classification task in ImageCLEF 2015 Medical Classification. </ListItem><ListItem level="1" levelPosition="2" numString="2."><Mark1>DataSet</Mark1><Mark1><Subscript>2</Subscript></Mark1> (DS<Subscript>2</Subscript>): Additive to DataSet<Subscript>1</Subscript>, the complete collection distributed in the ImageCLEF 2013 AMIA Medical Task. 
The collection contains over 300,000 annotated images from over 45,000 biomedical research articles of the PubMed Central repository (<Hyperlink href="http:&#47;&#47;www.ncbi.nlm.nih.gov&#47;pmc&#47;">http:&#47;&#47;www.ncbi.nlm.nih.gov&#47;pmc&#47;</Hyperlink>) hosted by the U.S. National Library of Medicine. </ListItem><ListItem level="1" levelPosition="3" numString="3."><Mark1>DataSet</Mark1><Mark1><Subscript>3</Subscript></Mark1> (DS<Subscript>3</Subscript>): Additive to DataSet<Subscript>1</Subscript>, the collection distributed for the Modality Classification subtask of the ImageCLEF 2013 AMIA Medical Task. This is a sub-collection of DataSet<Subscript>2</Subscript> and contains figures annotated into 31 categories. Figures belonging to the compound figure &#8216;COMP&#8217; category were eliminated to attain the same categories as in DataSet<Subscript>1</Subscript>. </ListItem><ListItem level="1" levelPosition="4" numString="4."><Mark1>DataSet</Mark1><Mark1><Subscript>4</Subscript></Mark1> (DS<Subscript>4</Subscript>): The sub-collection for the Modality Classification Task as in the ImageCLEF 2013 AMIA Medical Task, but without the &#8216;COMP&#8217; category. </ListItem></OrderedList></Pgraph><Pgraph>The distributed collection for training and evaluation contained grayscale as well as coloured images. <TextGroup><PlainText>Figure 3</PlainText></TextGroup> <ImgLink imgNo="3" imgType="figure"/> shows some examples of the images.</Pgraph><SubHeadline>2.2 Visual features</SubHeadline><Pgraph>Medical imaging techniques have advanced over the years, bringing additional possibilities for detailed diagnosis as well as several useful clinical applications. These techniques have different acquisition methods, and hence several feature extraction methods are needed to capture the various characteristics found in medical imaging <TextLink reference="10"></TextLink>. When analysing the modality classification hierarchy, it becomes clear that the images must be completely represented. 
Global features were extracted from the complete image and local features from subregions of the images.</Pgraph><Pgraph>The images were visually represented using various state-of-the-art feature extraction methods, which are common techniques used in computer vision. </Pgraph><SubHeadline2>2.2.1 Local features</SubHeadline2><SubHeadline3>Bag-of-Keypoints</SubHeadline3><Pgraph>In <TextLink reference="11"></TextLink>, <TextLink reference="12"></TextLink>, the Bag-of-Features approach was shown to achieve high prediction accuracy. This approach originates from the Bag-of-Words (BoW) approach, which is frequently used for text categorisation. The Bag-of-Keypoints (BoK) proposed in <TextLink reference="13"></TextLink> is based on vector quantisation of affine invariant descriptors of image patches. The advantages of this approach are computational efficiency, simplicity and its invariance to affine transformations. These features have been extracted and represented using a 12,000-dimensional vector. The Bag-of-Keypoints has become a common state-of-the-art technique for image classification.</Pgraph><Pgraph>The functions used for feature extraction are from the <Mark1><Mark2>VLFEAT</Mark2></Mark1> library <TextLink reference="14"></TextLink>. Dense SIFT descriptors were chosen as visual key-points. The key-points were uniformly extracted at several resolutions with an interval grid of 4 pixels. The <Mark1><Mark2>vl-phow</Mark2></Mark1> function of <TextLink reference="14"></TextLink> was used to extract the descriptors. To reduce computational time, <Mark1><Mark2>k-means</Mark2></Mark1> clustering with approximated nearest neighbours (ANN) <TextLink reference="12"></TextLink> was applied. 
The <Mark1><Mark2>k-means</Mark2></Mark1> technique was computed on randomly chosen dense SIFT descriptors using the <TextGroup><Mark1><Mark2>vl-kmeans</Mark2></Mark1></TextGroup> function to partition them into <Mark1><Mark2>k</Mark2></Mark1> clusters in order to minimise the within-cluster sum of squares <TextLink reference="15"></TextLink>. Several parameters were used to tune the extraction of the BoK feature with respect to this specific task and data collection. The tuning was done by applying the approach with parameter values other than the defaults and analysing the effect on prediction accuracy. An excerpt of these parameters is listed below:</Pgraph><Pgraph><UnorderedList><ListItem level="1">Initialisation: The centres of <Mark1><Mark2>k-means</Mark2></Mark1> clustering were initialised with random points.</ListItem><ListItem level="1">Codebook size: The number of keypoints used for vector quantisation was 12,000.</ListItem><ListItem level="1">Convergence: A maximum number of 20 iterations was applied to allow the <Mark1><Mark2>k-means</Mark2></Mark1> algorithm to converge.</ListItem></UnorderedList></Pgraph><Pgraph>A detailed description of the parameters for the VLFEAT functions can be found in <TextLink reference="14"></TextLink>. </Pgraph><SubHeadline3>Pyramid Histogram of Oriented Gradients</SubHeadline3><Pgraph>The Pyramid Histogram of Oriented Gradients was proposed in <TextLink reference="16"></TextLink>. The idea of this approach originates from two sources: the image pyramid representation of <TextLink reference="17"></TextLink> and the Histogram of Oriented Gradients (HOG) of <TextLink reference="18"></TextLink>. The proposed approach measures the shape correspondence between two images by the distance between their descriptors using the spatial pyramid kernel. Hence, the images can be represented by their local shapes as well as the spatial layout of the shapes. 
This feature was represented using a 630-dimensional vector.</Pgraph><SubHeadline2>2.2.2 Global features</SubHeadline2><SubHeadline3>Basic features</SubHeadline3><Pgraph>To obtain a global representation of the images, the following high-level features were extracted: brightness, clipping, contrast, hueCount, saturation, complexity, skew and energy. These features were extracted using the LIRE library <TextLink reference="19"></TextLink>. The basic features (BAF) were represented as an 8-dimensional vector.</Pgraph><SubHeadline3>Color and Edge Directivity Descriptor</SubHeadline3><Pgraph>The Color and Edge Directivity Descriptor (CEDD) is a low-level feature proposed in <TextLink reference="20"></TextLink>. Using a 3 bits&#47;bin quantisation, the feature incorporates color and texture information in a histogram. This feature is suitable for large image databases, as the CEDD size per image is limited to <TextGroup><PlainText>54 bytes</PlainText></TextGroup>. An advantage of the CEDD is its low computational cost compared to that of the MPEG-7 descriptors. The features were extracted using the <Mark1><Mark2>cedd-matlab</Mark2></Mark1> function and represented as a 144-dimensional vector. </Pgraph><SubHeadline3>Joint Composite Descriptor</SubHeadline3><Pgraph>The Joint Composite Descriptor (JCD) is a combination of two Compact Composite Descriptors: the Color and Edge Directivity Descriptor (CEDD) and the Fuzzy Color Texture Histogram (FCTH) <TextLink reference="20"></TextLink>. The color information extracted from the two descriptors is derived using the same fuzzy system; hence, combining the different texture areas is taken to be an optimised unification of both descriptors. 
The feature is represented as a 168-dimensional vector and was extracted using <TextLink reference="19"></TextLink>.</Pgraph><SubHeadline3>Tamura</SubHeadline3><Pgraph>The Tamura features are texture features which strive to correspond to human visual perception and are useful for feature selection and texture analyser design <TextLink reference="21"></TextLink>. They consist of the following six approximated basic texture features: coarseness, contrast, directionality, line-likeness, regularity and roughness, and were represented as an 18-dimensional vector using <TextLink reference="19"></TextLink>.</Pgraph><SubHeadline3>Gabor</SubHeadline3><Pgraph>The texture features based on Gabor functions were extracted and represented as a 60-dimensional vector <TextLink reference="22"></TextLink>.</Pgraph><SubHeadline3>Fuzzy Color Histogram</SubHeadline3><Pgraph>As the Conventional Color Histogram (CCH) neither considers color similarity across different bins nor color dissimilarity in the same bin, a new color histogram representation, the Fuzzy Color Histogram (FCH), was presented. The proposed approach considers the similarity of each pixel&#8217;s color associated with all histogram bins using the fuzzy-set membership function. Experimental results have shown that the FCH achieves better results than the CCH when applied for image indexing and retrieval <TextLink reference="23"></TextLink>, <TextLink reference="24"></TextLink>. A 10-dimensional vector was used to represent the FCH features.</Pgraph><SubHeadline>2.3 Text features</SubHeadline><Pgraph>For text representation, the figure caption belonging to each image was used. The images in the ImageCLEF 2015 Medical Classification task are figures from biomedical literature which were published in PubMed Central (<Hyperlink href="http:&#47;&#47;www.ncbi.nlm.nih.gov&#47;pmc&#47;">http:&#47;&#47;www.ncbi.nlm.nih.gov&#47;pmc&#47;</Hyperlink>) and are licensed for redistribution under a Creative Commons license. 
For each image in the collection, the corresponding journal article can be retrieved using the indexed image ID. The text representation of all images was computed using the Bag-of-Words approach. </Pgraph><SubHeadline3>Bag-of-Words</SubHeadline3><Pgraph>The Bag-of-Words (BoW) approach is a common method used for text classification. The text features are extracted by counting the frequency or presence of words in the text to be classified <TextLink reference="25"></TextLink>. Hence, a dictionary has to be defined first. The dictionary was generated <TextGroup><PlainText>using</PlainText></TextGroup> all words from all captions found in the distributed collection. Various text preprocessing procedures, such as removal of stop-words and stemming using the Porter stemmer <TextLink reference="26"></TextLink>, were applied. The Porter stemming technique aims to automatically remove suffixes in words to find terms with a common stem, which usually have similar meanings. Striving to generate a dictionary containing words relevant to each modality class, the attribute importance of all words was computed. This process was done by vector quantising all figures using the dictionary and then applying the &#967;<Superscript>2</Superscript>-test on the derived matrix. A final dictionary containing 438 words was obtained by selecting words with attribute importance over the fixed cutoff threshold of 0.36 (maximum attribute importance). Several dictionaries with other cutoff thresholds &#91;minimum attribute importance 0 and mean attribute importance 0.15&#93; were created in the development stage. However, the dictionary containing 438 words (cutoff 0.36) proved to achieve the best prediction results on the development set.</Pgraph><SubHeadline>2.4 Classifier setup</SubHeadline><Pgraph>To reduce computational time, the feature dimensions were reduced using principal component analysis <TextLink reference="27"></TextLink>. 
The Principal Component Analysis (PCA) replaces the original variables by a smaller number of derived variables, the principal components, which are linear combinations of the original variables <TextLink reference="28"></TextLink>. The PCA was applied to all features except the basic features. Table 1 <ImgLink imgNo="1" imgType="table"/> displays the original and reduced vector sizes after computing the principal component analysis on the features.</Pgraph><Pgraph>The PCA is separately computed on each feature vector group, which is displayed in Figure 4 <ImgLink imgNo="4" imgType="figure"/>. Subsequently, the best number of principal components needed to describe the various features was estimated iteratively by model selection. This step is described in Figure 5 <ImgLink imgNo="5" imgType="figure"/>. The PCA was computed on the complete data collection, i.e. training and test set. The pca function from the MATLAB software package <TextLink reference="29"></TextLink> was used with default values. The application of this unsupervised learning proved to be a better approach in comparison to a separate projection of the test data.</Pgraph><Pgraph>Several runs were submitted for evaluation of prediction accuracy. These runs are different concatenations of the derived principal components from the feature groups.</Pgraph><Pgraph>The reduced vector size for the Gabor features is 0, as they were not added to the final fused feature vector. This was done because the principal components of the Gabor features did not improve prediction accuracy.</Pgraph><Pgraph>During the development stage, the ImageCLEF 2015 training database was divided into 10 randomly generated training and validation datasets using the bootstrap algorithm <TextLink reference="30"></TextLink>. Approximately 68.2&#37; of the images were used for training and 31.8&#37; for validation. For the official evaluation, the complete training set was used for training and the distributed test set for prediction. 
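The per-feature-group PCA and subsequent concatenation into one fused vector described above can be sketched as follows. This is a minimal Python illustration using scikit-learn's PCA as a stand-in for the MATLAB pca function; the feature dimensions and component counts are illustrative, not the tuned values of Table 1:

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_with_group_pca(feature_groups, n_components):
    """Compute PCA separately per feature group and concatenate the
    projections into one early-fusion vector. Each group is stacked
    over training AND test images, since the paper fits the PCA on
    the complete data collection."""
    parts = []
    for X, k in zip(feature_groups, n_components):
        if k == 0:  # e.g. Gabor: principal components dropped entirely
            continue
        parts.append(PCA(n_components=k).fit_transform(X))
    return np.hstack(parts)

# Illustrative stand-ins for three feature groups of 50 images.
rng = np.random.default_rng(0)
bok = rng.random((50, 120))
cedd = rng.random((50, 144))
gabor = rng.random((50, 60))
fused = fuse_with_group_pca([bok, cedd, gabor], [20, 10, 0])
print(fused.shape)  # (50, 30)
```

The reduced groups are concatenated in the same way as the submitted runs, which are different concatenations of the derived principal components.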
A Random Forest classifier <TextLink reference="31"></TextLink> was used for modality prediction. The Random Forest approach combines several tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. </Pgraph><Pgraph>The classifier was modelled with the <Mark1><Mark2>fitensemble</Mark2></Mark1> function from the MATLAB software package <TextLink reference="29"></TextLink>. The list below shows an excerpt of the parameters of the <Mark1><Mark2>fitensemble</Mark2></Mark1> function that were tuned.</Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">Number of trees &#61; 500</ListItem><ListItem level="1" levelPosition="2" numString="2.">Number of leaf size &#61; &#91;0.04, 0.06, 0.3&#93;</ListItem><ListItem level="1" levelPosition="3" numString="3.">Split criterion &#61; Deviance (max. deviance reduction)</ListItem><ListItem level="1" levelPosition="4" numString="4.">Ensemble grown &#61; By resampling</ListItem></OrderedList></Pgraph><Pgraph>The parameters were tuned by first modelling the classifier with the <Mark1><Mark2>fitensemble</Mark2></Mark1> default values. Each parameter was then assigned a different value and the effect on prediction accuracy was noted. When a parameter value achieved better results than the default, the changed value was used. This step was done manually and repeatedly, and is similar to Figure 5 <ImgLink imgNo="5" imgType="figure"/>.</Pgraph><Pgraph>In addition to the Random Forest classifier, a multiclass linear kernel Support Vector Machine (SVM) was modelled using the LibSVM library <TextLink reference="32"></TextLink>. SVMs are a common machine learning method used for regression and classification tasks. This step was done in order to compare the prediction accuracies of the two classifiers. 
SVMs have been a popular approach in former ImageCLEF medical challenges <TextLink reference="1"></TextLink> and have proved to achieve good results. </Pgraph><Pgraph>Figure 4 <ImgLink imgNo="4" imgType="figure"/> shows the classifier setup for the early fusion prediction using either the Random Forest learning algorithm or the support vector machine learning algorithm. The late fusion classification setup is similar to that shown in Figure 4 <ImgLink imgNo="4" imgType="figure"/>. The final prediction is obtained by combining the predicted results of the Random Forest classifier and of the support vector machine classifier.</Pgraph><Pgraph>The SVM classifier was tuned similarly to the Random Forest classifier. Several parameters were assigned values other than the defaults and the effect on accuracy was noted. Different kernel functions were tested. The list below shows the best values for the svmtrain function. </Pgraph><Pgraph><OrderedList><ListItem level="1" levelPosition="1" numString="1.">SVM&#95;type &#61; Classification SVM</ListItem><ListItem level="1" levelPosition="2" numString="2.">Kernel function &#61; linear</ListItem><ListItem level="1" levelPosition="3" numString="3.">Cost parameter &#61; 0.05</ListItem><ListItem level="1" levelPosition="4" numString="4.">Probability estimates &#61; 1</ListItem></OrderedList></Pgraph></TextBlock>
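The classifier setup of this section can be sketched in Python with scikit-learn as an illustrative stand-in for the MATLAB fitensemble and LibSVM svmtrain setup. The number of trees (500), the linear kernel and the cost parameter C=0.05 are taken from the text above; the toy data and the averaging of probability estimates for late fusion are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Toy stand-ins for the fused feature vectors and modality labels.
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
svm = SVC(kernel="linear", C=0.05, probability=True,  # C = 0.05 as in the paper
          random_state=0).fit(X_train, y_train)

# Late fusion: average the per-class probability estimates of both
# classifiers and take the argmax (the classes_ orderings coincide
# because both models were fitted on the same label set).
proba = (rf.predict_proba(X_test) + svm.predict_proba(X_test)) / 2.0
fused_pred = rf.classes_[np.argmax(proba, axis=1)]
accuracy = float(np.mean(fused_pred == y_test))
print(round(accuracy, 2))
```

Averaging probability estimates is one common way to realise a late fusion of two classifiers; the paper does not specify the exact combination rule used.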
    <TextBlock linked="yes" name="3 Results">
      <MainHeadline>3 Results</MainHeadline><Pgraph>This section describes the results obtained with the proposed modelling approach at the ImageCLEF 2015 Medical Classification Task. The setups of all submitted runs and the official evaluation prediction accuracies achieved are described in subsections 3.1 and 3.2. The classification rate (&#37;) obtained using a different hierarchy level than that of the submitted runs and different parameters for the classifier is outlined in subsection 3.3. After the test set ground truth was distributed, an ex-post analysis was computed. This was done to detect the contribution of each feature to the overall prediction accuracy. The feature contribution is described in subsection 3.4. In subsection 3.5, other findings of this experimental modelling approach are listed.</Pgraph><SubHeadline>3.1 Runs</SubHeadline><Pgraph>Eight runs belonging to the three submission categories Visual, Textual and Mixed were submitted for evaluation. The submission category &#8216;Mixed&#8217; represents a combination of visual and text features. Six of the submitted runs belong to Mixed, one to Textual and one to Visual. This decision was made not only because better accuracies were obtained during development, but also because evaluation results presented by other ImageCLEF participant groups in previous years have proven to be better when the &#8216;Mixed&#8217; submission category is used <TextLink reference="1"></TextLink>, <TextLink reference="33"></TextLink>. </Pgraph><SubHeadline>3.2 Evaluation results</SubHeadline><Pgraph>In Figure 6 <ImgLink imgNo="6" imgType="figure"/>, the prediction accuracies (&#37;) achieved with the eight runs described in subsection 3.1 are displayed. Each bar represents a different feature combination as mentioned in Table 2 <ImgLink imgNo="2" imgType="table"/>. 
The blue parts show the official prediction accuracies obtained at evaluation, and the purple parts display the difference to the prediction accuracies obtained during the development stage. </Pgraph><Pgraph>Run1 and Run4 are significantly (p&#60;0.05) better than the other submitted runs, but not significantly different from each other.</Pgraph><Pgraph>The Biomedical Computer Science Group (BCSG) outperformed all other participants in all submission categories <TextLink reference="34"></TextLink>. This is displayed in Figure 7 <ImgLink imgNo="7" imgType="figure"/>. The coloured bars display the results of the submitted runs and the gray bars those of other participants <TextLink reference="5"></TextLink>, <TextLink reference="6"></TextLink>. The BCSG was the only participant to use both visual and text features.</Pgraph><SubHeadline>3.3 Development results</SubHeadline><SubHeadline3>Number of Random Forest trees</SubHeadline3><Pgraph>Several numbers of trees were used to tune the Random Forest classifier on the development set. Increasing the number of trees was shown to improve the accuracy: &#91;<TextGroup><PlainText>100 trees</PlainText></TextGroup> &#61; 87.9&#37;, 500 trees &#61; 90.12&#37; and 1000 trees &#61; 90.54&#37;&#93;. Increasing the number of trees beyond 1000 led to a slight further increase in accuracy, but also increased computational time.</Pgraph><SubHeadline3>Bag-of-Words Dictionary Generation</SubHeadline3><Pgraph>The effect of the &#967;<Superscript>2</Superscript>-test in the text preprocessing stage was evaluated. A dictionary for the BoW approach was generated by deliberately omitting the application of the &#967;<Superscript>2</Superscript>-test to calculate attribute importance. 
The prediction accuracy obtained was approximately 4&#37; lower.</Pgraph><SubHeadline3>Flat Hierarchy Classification Scheme</SubHeadline3><Pgraph>The Random Forest and SVM classifiers were trained using different hierarchy interpretations of the classification scheme proposed in <TextLink reference="2"></TextLink>. In Table 3 <ImgLink imgNo="3" imgType="table"/>, the results of the deep hierarchy interpretation are listed. In this interpretation, the first level is &#8220;Modality Classification&#8221; containing the two classes &#8220;Diagnostic images&#8221; and &#8220;Generic biomedical illustrations&#8221;. The final prediction accuracy obtained using this method is listed in the row &#8220;Complete Classification&#8221; and is 67.07 (&#37;).</Pgraph><Pgraph>This observation was computed using DataSet<Subscript>4</Subscript> and the Bag-of-Keypoints and Color and Edge Directivity Descriptor features. Table 3 <ImgLink imgNo="3" imgType="table"/> shows the prediction accuracies (&#37;) obtained at the various hierarchy levels. Random Forest was used as the learning algorithm.</Pgraph><SubHeadline>3.4 Feature contribution</SubHeadline><Pgraph>In an ex-post analysis, the contribution of all features was evaluated. The contribution of a feature to prediction performance is an important attribute that assists efficient feature selection. To compute each feature's contribution, the difference between the accuracy when all features are combined and the accuracy when that feature is omitted was calculated. This was done by applying the classifier model setup of Run1 on the original evaluation set.</Pgraph><Pgraph>The calculated contributions of all features are displayed in Figure 8 <ImgLink imgNo="8" imgType="figure"/>. It can be seen that omitting most of the extracted features had a negative effect on prediction performance.
In contrast, the omission of the PHOG feature had a positive effect on prediction performance and hence increased the evaluation accuracy by &#43;0.27 (&#37;).</Pgraph><Pgraph>As can be seen, the BoK, BoW and Basic Features contributed the most to the overall prediction performance. The extracted Gabor features were not added to the final fused feature vector used for classification, because the principal components from the Gabor image representation did not improve prediction accuracy at the development stage.</Pgraph><SubHeadline>3.5 Findings</SubHeadline><Pgraph>Negative differences in prediction performance were observed in the following cases:</Pgraph><Pgraph><UnorderedList><ListItem level="1">when the Bag-of-Keypoints visual representation was computed using Lowe SIFT descriptors <TextLink reference="35"></TextLink> instead of dense SIFT descriptors</ListItem><ListItem level="1">when feature vectors were not normalised before training the classifier</ListItem><ListItem level="1">when single precision format was used instead of double precision format to represent floating-point numbers</ListItem></UnorderedList></Pgraph><Pgraph>Computing the PCA on the complete data collection as described in section 2.4 proved to be the better approach. The prediction accuracy increased by approximately 4&#37; when the unsupervised learning information was added. The prediction accuracy for Run2 was 60.91 (&#37;) with unsupervised learning information and 56.63 (&#37;) without.</Pgraph></TextBlock>
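The leave-one-feature-group-out analysis described in section 3.4 can be sketched as follows. This is a minimal illustration with synthetic data; the group names, dimensions and classes are placeholders, not the paper's actual descriptors or 30-class setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three hypothetical feature groups.
rng = np.random.default_rng(0)
n = 300
groups = {"BoK": 20, "BoW": 15, "PHOG": 10}
parts = {name: rng.normal(size=(n, d)) for name, d in groups.items()}
y = (parts["BoK"][:, 0] + parts["BoW"][:, 0] > 0).astype(int)

def accuracy_without(omitted=None):
    """Train on the concatenation of all feature groups except the omitted one."""
    X = np.hstack([m for g, m in parts.items() if g != omitted])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return clf.fit(X_tr, y_tr).score(X_te, y_te)

baseline = accuracy_without(None)
# Contribution of a group = accuracy with all groups minus accuracy without it;
# a negative value (as observed for PHOG in the paper) means omission helps.
contribution = {g: baseline - accuracy_without(g) for g in groups}
print(contribution)
```

On the paper's data the same difference was computed with the Run1 model setup against the official evaluation set rather than a held-out split.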
    <TextBlock linked="yes" name="4 Discussion">
      <MainHeadline>4 Discussion</MainHeadline><Pgraph>Various classification prediction approaches using multiple feature fusion and combinations of learning algorithms were explored for predicting the modality of biomedical literature images. There is a discrepancy between the prediction performance on the evaluation set and on the sampled training and validation sets. This can be seen in Figure 6 <ImgLink imgNo="6" imgType="figure"/> and is taken to be an overfitting problem. The overfitting is assumed to be caused by the process of finding the efficient number of principal components: the number of principal components was chosen based on the positive effect on the results recorded during the development stage, and this choice did not generalise to the official test set. On the other hand, supplementing the visual image representation with the corresponding text representation proved to be a beneficial strategy regarding classification accuracy. Omitting any of the described features, apart from the PHOG feature, resulted in a decrease in the official evaluation accuracy.</Pgraph><SubHeadline>4.1 Future work</SubHeadline><Pgraph>Regarding the overfitting problem, a data augmentation approach will be applied to enlarge the training set and tackle the problem of the unbalanced dataset. Convolutional neural networks are currently a common approach for prediction tasks, hence an approach using this method will be implemented in order to compare accuracies and detect limitations. The process of finding efficient numbers of principal components should be made less dependent on the training set, leading to a more reliable and independent classification approach. In addition, more features for shape representation will be extracted.</Pgraph></TextBlock>
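One way to make the choice of the number of principal components less dependent on a single development split, as the discussion suggests, is to select it by cross-validation inside a pipeline. The following sketch uses synthetic stand-in data and illustrative candidate values; only the SVM cost C=0.05 is taken from the paper's setup.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a fused feature matrix; sizes are illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# Fitting the PCA inside a cross-validated pipeline evaluates each candidate
# component count on several folds instead of one development split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svm", LinearSVC(C=0.05)),  # C=0.05 as in the paper's SVM configuration
])
search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20]}, cv=3)
search.fit(X, y)
print(search.best_params_["pca__n_components"])
```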
    <TextBlock linked="yes" name="5 Conclusion">
      <MainHeadline>5 Conclusion</MainHeadline><Pgraph>In this work, the modelling approaches applied to predict the modality of biomedical literature images are presented. Several state-of-the-art features for visual and text representation of all images were extracted. These features were selected in order to distinguish between 30 modalities. To reduce computational load, principal component analysis was applied. Two classifiers, Random Forests with 100&#8211;500 deep trees and a multi-class linear kernel SVM with C&#61;0.05, were used for training and prediction. The proposed approach was applied to the ImageCLEF 2015 Medical Classification Task and outperformed all other participants.</Pgraph></TextBlock>
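The multimodal strategy summarised above, normalising the vectors of each modality and concatenating them before classification (the 'Mixed' category), can be sketched as follows. The matrices, dimensions and three classes are placeholders, not the paper's actual descriptors or 30-modality label set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import normalize

# Hypothetical visual and text feature matrices for the same set of images.
rng = np.random.default_rng(2)
n = 150
visual = rng.normal(size=(n, 40))
text = rng.normal(size=(n, 25))
y = rng.integers(0, 3, size=n)  # placeholder class labels

# Early fusion: normalise each modality's vectors, then concatenate them
# into one representation per image before training the Random Forest.
X = np.hstack([normalize(visual), normalize(text)])
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
predictions = clf.predict(X)
```

Normalising before fusion matters because, as noted in section 3.5, training on unnormalised feature vectors reduced prediction performance.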
    <TextBlock linked="yes" name="Notes">
      <MainHeadline>Notes</MainHeadline><SubHeadline>Competing interests</SubHeadline><Pgraph>The authors declare that they have no competing interests.</Pgraph></TextBlock>
    <References linked="yes">
      <Reference refNo="1">
        <RefAuthor>G. Seco de Herrera A</RefAuthor>
        <RefAuthor>Kalpathy Cramer J</RefAuthor>
        <RefAuthor>Demner Fushman D</RefAuthor>
        <RefAuthor>Antani S</RefAuthor>
        <RefAuthor>M&#252;ller H</RefAuthor>
        <RefTitle>Overview of the ImageCLEF 2013 Medical Tasks</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>G. Seco de Herrera A, Kalpathy Cramer J, Demner Fushman D, Antani S, M&#252;ller H. Overview of the ImageCLEF 2013 Medical Tasks. In: Forner P, Navigli R, Tufis D, Ferro N, editors. Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. (CEUR Workshop Proceedings; 1179). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1179&#47;CLEF2013wn-ImageCLEF-SecoDeHerreraEt2013b.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1179&#47;CLEF2013wn-ImageCLEF-SecoDeHerreraEt2013b.pdf</RefLink>
      </Reference>
      <Reference refNo="2">
        <RefAuthor>G. Seco de Herrera A</RefAuthor>
        <RefAuthor>M&#252;ller H</RefAuthor>
        <RefAuthor>Bromuri S</RefAuthor>
        <RefTitle>Overview of the ImageCLEF 2015 Medical Classification Task</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>G. Seco de Herrera A, M&#252;ller H, Bromuri S. Overview of the ImageCLEF 2015 Medical Classification Task. In: Cappellato L, Ferro N, Jones GJF, San Juan E, editors. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015. (CEUR Workshop Proceedings; 1391). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;172-CR.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;172-CR.pdf</RefLink>
      </Reference>
      <Reference refNo="3">
        <RefAuthor>Pelka O</RefAuthor>
        <RefAuthor>Friedrich CM</RefAuthor>
        <RefTitle>FHDO Biomedical Computer Science Group at Medical Classification Task of ImageCLEF 2015</RefTitle>
        <RefYear>2015</RefYear>
        <RefBookTitle>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Pelka O, Friedrich CM. FHDO Biomedical Computer Science Group at Medical Classification Task of ImageCLEF 2015. In: Cappellato L, Ferro N, Jones GJF, San Juan E, editors. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015. (CEUR Workshop Proceedings; 1391). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;14-CR.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;14-CR.pdf</RefLink>
      </Reference>
      <Reference refNo="4">
        <RefAuthor>Rodr&#237;guez-S&#225;nchez AJ</RefAuthor>
        <RefAuthor>Fontanella S</RefAuthor>
        <RefAuthor>Piater J</RefAuthor>
        <RefAuthor>Szedmak S</RefAuthor>
        <RefTitle>IIS at ImageCLEF 2015: Multi-label classification task</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Rodr&#237;guez-S&#225;nchez AJ, Fontanella S, Piater J, Szedmak S. IIS at ImageCLEF 2015: Multi-label classification task. In: Cappellato L, Ferro N, Jones GJF, San Juan E, editors. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015. (CEUR Workshop Proceedings; 1391). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;67-CR.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;67-CR.pdf</RefLink>
      </Reference>
      <Reference refNo="5">
        <RefAuthor>Cirujeda P</RefAuthor>
        <RefAuthor>Binefa X</RefAuthor>
        <RefTitle>Medical Image Classification via 2D color feature based Covariance Descriptors</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Cirujeda P, Binefa X. Medical Image Classification via 2D color feature based Covariance Descriptors. In: Cappellato L, Ferro N, Jones GJF, San Juan E, editors. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015. (CEUR Workshop Proceedings; 1391). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;44-CR.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;44-CR.pdf</RefLink>
      </Reference>
      <Reference refNo="6">
        <RefAuthor>Lyndon D</RefAuthor>
        <RefAuthor>Kumar A</RefAuthor>
        <RefAuthor>Kim J</RefAuthor>
        <RefAuthor>Leong P</RefAuthor>
        <RefAuthor>Feng D</RefAuthor>
        <RefTitle>Convolutional Neural Networks for Subfigure Classification</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Lyndon D, Kumar A, Kim J, Leong P, Feng D. Convolutional Neural Networks for Subfigure Classification. In: Cappellato L, Ferro N, Jones GJF, San Juan E, editors. Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, September 8-11, 2015. (CEUR Workshop Proceedings; 1391). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;53-CR.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1391&#47;53-CR.pdf</RefLink>
      </Reference>
      <Reference refNo="10">
        <RefAuthor>Chen Ch</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>2013</RefYear>
        <RefBookTitle>Computer vision in medical imaging</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Chen Ch. Computer vision in medical imaging. Singapore: World Scientific; 2013. (Series in Computer Vision; 2).</RefTotal>
      </Reference>
      <Reference refNo="7">
        <RefAuthor>Lazebnik S</RefAuthor>
        <RefAuthor>Schmid C</RefAuthor>
        <RefAuthor>Ponce J</RefAuthor>
        <RefTitle>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</RefTitle>
        <RefYear>2006</RefYear>
        <RefBookTitle>2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR &#8217;06); 2006 Jun 17-22; Volume 2</RefBookTitle>
        <RefPage>2169-78</RefPage>
        <RefTotal>Lazebnik S, Schmid C, Ponce J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR &#8217;06); 2006 Jun 17-22; Volume 2. 2006. p. 2169-78. DOI: 10.1109&#47;cvpr.2006.68</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;cvpr.2006.68</RefLink>
      </Reference>
      <Reference refNo="8">
        <RefAuthor>Kitanovski I</RefAuthor>
        <RefAuthor>Dimitrovski I</RefAuthor>
        <RefAuthor>Loskovska S</RefAuthor>
        <RefTitle>FCSE at medical tasks of ImageCLEF 2013</RefTitle>
        <RefYear>2013</RefYear>
        <RefBookTitle>Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. (CEUR Workshop Proceedings; 1179)</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Kitanovski I, Dimitrovski I, Loskovska S. FCSE at medical tasks of ImageCLEF 2013. In: Forner P, Navigli R, Tufis D, Ferro N, editors. Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. (CEUR Workshop Proceedings; 1179). Available from: http:&#47;&#47;ceur-ws.org&#47;Vol-1179&#47;CLEF2013wn-ImageCLEF-KitanovskiEt2013.pdf</RefTotal>
        <RefLink>http:&#47;&#47;ceur-ws.org&#47;Vol-1179&#47;CLEF2013wn-ImageCLEF-KitanovskiEt2013.pdf</RefLink>
      </Reference>
      <Reference refNo="9">
        <RefAuthor>Abedini M</RefAuthor>
        <RefAuthor>Cao L</RefAuthor>
        <RefAuthor>Codella N</RefAuthor>
        <RefAuthor>Connell JH</RefAuthor>
        <RefAuthor>Garnavi R</RefAuthor>
        <RefAuthor>Geva A</RefAuthor>
        <RefAuthor>Merler M</RefAuthor>
        <RefAuthor>Nguyen QB</RefAuthor>
        <RefAuthor>Pankanti SU</RefAuthor>
        <RefAuthor>Smith JR</RefAuthor>
        <RefAuthor>Sun X</RefAuthor>
        <RefAuthor>Tzadok A</RefAuthor>
        <RefTitle>IBM Research at ImageCLEF 2013 Medical Tasks</RefTitle>
        <RefYear>2013</RefYear>
        <RefBookTitle>American Medical Informatics Association (AMIA) ImageCLEF; Medical Image Retrieval Workshop; 2013</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Abedini M, Cao L, Codella N, Connell JH, Garnavi R, Geva A, Merler M, Nguyen QB, Pankanti SU, Smith JR, Sun X, Tzadok A. IBM Research at ImageCLEF 2013 Medical Tasks. In: American Medical Informatics Association (AMIA) ImageCLEF; Medical Image Retrieval Workshop; 2013. Available from: http:&#47;&#47;www.cs.columbia.edu&#47;&#126;mmerler&#47;IBM&#95;CLEF13&#95;WN.pdf</RefTotal>
        <RefLink>http:&#47;&#47;www.cs.columbia.edu&#47;&#126;mmerler&#47;IBM&#95;CLEF13&#95;WN.pdf</RefLink>
      </Reference>
      <Reference refNo="11">
        <RefAuthor>M&#252;ller H</RefAuthor>
        <RefAuthor>Kalpathy-Cramer J</RefAuthor>
        <RefAuthor>Demner-Fushman D</RefAuthor>
        <RefAuthor>Antani S</RefAuthor>
        <RefTitle>Creating a classification of image types in the medical literature for visual categorization</RefTitle>
        <RefYear>2012</RefYear>
        <RefBookTitle>Proc. SPIE 8319, Medical Imaging 2012: Advanced PACS-based Imaging Informatics and Therapeutic Applications, 83190P (February 23, 2012)</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>M&#252;ller H, Kalpathy-Cramer J, Demner-Fushman D, Antani S. Creating a classification of image types in the medical literature for visual categorization. In: Proc. SPIE 8319, Medical Imaging 2012: Advanced PACS-based Imaging Informatics and Therapeutic Applications, 83190P (February 23, 2012). DOI:10.1117&#47;12.911186</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1117&#47;12.911186</RefLink>
      </Reference>
      <Reference refNo="12">
        <RefAuthor>Zhang H</RefAuthor>
        <RefAuthor>Berg AC</RefAuthor>
        <RefAuthor>Maire M</RefAuthor>
        <RefAuthor>Malik J</RefAuthor>
        <RefTitle>SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition</RefTitle>
        <RefYear>2006</RefYear>
        <RefBookTitle>2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR &#8217;06); 2006 Jun 17-22; Volume 2</RefBookTitle>
        <RefPage>2126-36</RefPage>
        <RefTotal>Zhang H, Berg AC, Maire M, Malik J. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR &#8217;06); 2006 Jun 17-22; Volume 2. 2006. p. 2126-36. DOI: 10.1109&#47;CVPR.2006.301</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;CVPR.2006.301</RefLink>
      </Reference>
      <Reference refNo="13">
        <RefAuthor>Csurka G</RefAuthor>
        <RefAuthor>Dance CR</RefAuthor>
        <RefAuthor>Fan L</RefAuthor>
        <RefAuthor>Willamowski J</RefAuthor>
        <RefAuthor>Bray C</RefAuthor>
        <RefTitle>Visual categorization with bags of keypoints</RefTitle>
        <RefYear>2004</RefYear>
        <RefBookTitle>Workshop on Statistical Learning in Computer Vision, 8th European Conference on Computer Vision (ECCV); 2004; Prague, Czech Republic</RefBookTitle>
        <RefPage>1-22</RefPage>
        <RefTotal>Csurka G, Dance CR, Fan L, Willamowski J, Bray C. Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, 8th European Conference on Computer Vision (ECCV); 2004; Prague, Czech Republic. p. 1-22.</RefTotal>
      </Reference>
      <Reference refNo="14">
        <RefAuthor>Vedaldi A</RefAuthor>
        <RefAuthor>Fulkerson B</RefAuthor>
        <RefTitle>Vlfeat: An Open and Portable Library of Computer Vision Algorithms</RefTitle>
        <RefYear>2010</RefYear>
        <RefBookTitle>Proceedings of the 18th ACM International Conference on Multimedia; MM &#8217;10; New York, NY, USA: ACM; 2010</RefBookTitle>
        <RefPage>1469-72</RefPage>
        <RefTotal>Vedaldi A, Fulkerson B. Vlfeat: An Open and Portable Library of Computer Vision Algorithms. In: Proceedings of the 18th ACM International Conference on Multimedia; MM &#8217;10; New York, NY, USA: ACM; 2010. p. 1469-72. DOI: 10.1145&#47;1873951.1874249</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1145&#47;1873951.1874249</RefLink>
      </Reference>
      <Reference refNo="15">
        <RefAuthor>Indyk P</RefAuthor>
        <RefAuthor>Motwani R</RefAuthor>
        <RefTitle>Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality</RefTitle>
        <RefYear>1998</RefYear>
        <RefBookTitle>Proceedings of the 30th Annual ACM Symposium on Theory of Computing; STOC &#8217;98; New York, NY, USA: ACM; 1998</RefBookTitle>
        <RefPage>604-13</RefPage>
        <RefTotal>Indyk P, Motwani R. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing; STOC &#8217;98; New York, NY, USA: ACM; 1998. p. 604&#8211;613. DOI: 10.1145&#47;276698.276876</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1145&#47;276698.276876</RefLink>
      </Reference>
      <Reference refNo="16">
        <RefAuthor>Hartigan JA</RefAuthor>
        <RefAuthor>Wong MA</RefAuthor>
        <RefTitle>Algorithm AS 136: A K-Means Clustering Algorithm</RefTitle>
        <RefYear>1979</RefYear>
        <RefJournal>J R Stat Soc Ser C Appl Stat</RefJournal>
        <RefPage>100-8</RefPage>
        <RefTotal>Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. J R Stat Soc Ser C Appl Stat.1979;28(1):100-8. DOI: 10.2307&#47;2346830</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.2307&#47;2346830</RefLink>
      </Reference>
      <Reference refNo="17">
        <RefAuthor>Lux M</RefAuthor>
        <RefAuthor>Chatzichristofis SA</RefAuthor>
        <RefTitle>Lire: Lucene Image Retrieval An Extensible Java CBIR Library</RefTitle>
        <RefYear></RefYear>
        <RefBookTitle>Proceedings of the 16th ACM International Conference on Multimedia; MM &#8217;08; New York, NY, USA: ACM; 2008</RefBookTitle>
        <RefPage>1085-88</RefPage>
        <RefTotal>Lux M, Chatzichristofis SA. Lire: Lucene Image Retrieval An Extensible Java CBIR Library. In: Proceedings of the 16th ACM International Conference on Multimedia; MM &#8217;08; New York, NY, USA: ACM; 2008. p. 1085-88. DOI: 10.1145&#47;1459359.1459577</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1145&#47;1459359.1459577</RefLink>
      </Reference>
      <Reference refNo="18">
        <RefAuthor>Chatzichristofis SA</RefAuthor>
        <RefAuthor>Boutalis YS</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>2011</RefYear>
        <RefBookTitle>Compact Composite Descriptors for Content Based Image Retrieval: Basics, Concepts, Tools</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Chatzichristofis SA, Boutalis YS. Compact Composite Descriptors for Content Based Image Retrieval: Basics, Concepts, Tools. Saarbr&#252;cken, Germany: VDM Verlag Dr. M&#252;ller; 2011.</RefTotal>
      </Reference>
      <Reference refNo="19">
        <RefAuthor>Manjunath BS</RefAuthor>
        <RefAuthor>Ma WY</RefAuthor>
        <RefTitle>Texture features for browsing and retrieval of image data</RefTitle>
        <RefYear>1996</RefYear>
        <RefJournal>IEEE Trans Pattern Anal Mach Intell</RefJournal>
        <RefPage>837-42</RefPage>
        <RefTotal>Manjunath BS, Ma WY. Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Mach Intell. 1996;18(8):837-42. DOI: 10.1109&#47;34.531803</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;34.531803</RefLink>
      </Reference>
      <Reference refNo="20">
        <RefAuthor>Tamura H</RefAuthor>
        <RefAuthor>Mori S</RefAuthor>
        <RefAuthor>Yamawaki T</RefAuthor>
        <RefTitle>Textural features corresponding to visual perception</RefTitle>
        <RefYear>1978</RefYear>
        <RefJournal>IEEE Trans Syst Man Cybern</RefJournal>
        <RefPage>460-73</RefPage>
        <RefTotal>Tamura H, Mori S, Yamawaki T. Textural features corresponding to visual perception. IEEE Trans Syst Man Cybern. 1978;(8)6:460-73. DOI: 10.1109&#47;TSMC.1978.4309999</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;TSMC.1978.4309999</RefLink>
      </Reference>
      <Reference refNo="21">
        <RefAuthor>Han J</RefAuthor>
        <RefAuthor>Ma KK</RefAuthor>
        <RefTitle>Fuzzy color histogram and its use in color image retrieval</RefTitle>
        <RefYear>2002</RefYear>
        <RefJournal>IEEE Trans Image Process</RefJournal>
        <RefPage>944-52</RefPage>
        <RefTotal>Han J, Ma KK. Fuzzy color histogram and its use in color image retrieval. IEEE Trans Image Process. 2002;11(8):944-52. DOI: 10.1109&#47;TIP.2002.801585</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;TIP.2002.801585</RefLink>
      </Reference>
      <Reference refNo="22">
        <RefAuthor>Konstantinidis K</RefAuthor>
        <RefAuthor>Gasteratos A</RefAuthor>
        <RefAuthor>Andreadis I</RefAuthor>
        <RefTitle>Image retrieval based on fuzzy color histogram processing</RefTitle>
        <RefYear>2005</RefYear>
        <RefJournal>Opt Commun</RefJournal>
        <RefPage>375-86</RefPage>
        <RefTotal>Konstantinidis K, Gasteratos A, Andreadis I. Image retrieval based on fuzzy color histogram processing. Opt Commun. 2005; 248(4-6):375-86. DOI: 10.1016&#47;j.optcom.2004.12.029</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1016&#47;j.optcom.2004.12.029</RefLink>
      </Reference>
      <Reference refNo="23">
        <RefAuthor>Bosch A</RefAuthor>
        <RefAuthor>Zisserman A</RefAuthor>
        <RefAuthor>Munoz X</RefAuthor>
        <RefTitle>Representing Shape with a Spatial Pyramid Kernel</RefTitle>
        <RefYear>2007</RefYear>
        <RefBookTitle>Proceedings of the 6th ACM International Conference on Image and Video Retrieval; New York, NY, USA: ACM; CIVR &#39;07; 2007</RefBookTitle>
        <RefPage>401-8</RefPage>
        <RefTotal>Bosch A, Zisserman A, Munoz X. Representing Shape with a Spatial Pyramid Kernel. In: Proceedings of the 6th ACM International Conference on Image and Video Retrieval; New York, NY, USA: ACM; CIVR &#39;07; 2007. p. 401-8. DOI: 10.1145&#47;1282280.1282340</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1145&#47;1282280.1282340</RefLink>
      </Reference>
      <Reference refNo="24">
        <RefAuthor>Dalal N</RefAuthor>
        <RefAuthor>Triggs B</RefAuthor>
        <RefTitle>Histograms of oriented gradients for human detection</RefTitle>
        <RefYear>2005</RefYear>
        <RefBookTitle>2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR&#39;05); Volume 1; 2005 Jun 20-25; San Diego, CA, USA</RefBookTitle>
        <RefPage>886-93</RefPage>
        <RefTotal>Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR&#39;05); Volume 1; 2005 Jun 20-25; San Diego, CA, USA. p. 886-93. DOI: 10.1109&#47;cvpr.2005.177</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1109&#47;cvpr.2005.177</RefLink>
      </Reference>
      <Reference refNo="25">
        <RefAuthor>Salton G</RefAuthor>
        <RefAuthor>McGill MJ</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>1983</RefYear>
        <RefBookTitle>Introduction to modern information retrieval</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill; 1983. (McGraw-Hill computer science series).</RefTotal>
      </Reference>
      <Reference refNo="26">
        <RefAuthor>Porter MF</RefAuthor>
        <RefTitle>An algorithm for suffix stripping</RefTitle>
        <RefYear>1980</RefYear>
        <RefJournal>Program</RefJournal>
        <RefPage>130-7</RefPage>
        <RefTotal>Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130-7. DOI: 10.1108&#47;eb046814</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1108&#47;eb046814</RefLink>
      </Reference>
      <Reference refNo="27">
        <RefAuthor>Dunteman GH</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>1989</RefYear>
        <RefBookTitle>Principal Components Analysis. A Sage University paper</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Dunteman GH. Principal Components Analysis. A Sage University paper. Newbury Park, London, New Delhi: Sage publications; 1989. (Quantitative applications in the social sciences; vol. 69).</RefTotal>
      </Reference>
      <Reference refNo="28">
        <RefAuthor>Jolliffe IT</RefAuthor>
        <RefTitle>Principal component analysis</RefTitle>
        <RefYear>2002</RefYear>
        <RefTotal>Jolliffe IT. Principal component analysis. 2nd ed. New York: Springer-Verlag; 2002. (Springer Series in Statistics). DOI: 10.1007&#47;b98835</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;b98835</RefLink>
      </Reference>
      <Reference refNo="29">
        <RefAuthor>Efron B</RefAuthor>
        <RefAuthor>Tibshirani RJ</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>1993</RefYear>
        <RefBookTitle>An Introduction to the Bootstrap</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman &#38; Hall; 1993.</RefTotal>
      </Reference>
      <Reference refNo="30">
        <RefAuthor>Breiman L</RefAuthor>
        <RefTitle>Random Forests</RefTitle>
        <RefYear>2001</RefYear>
        <RefJournal>Mach Learn</RefJournal>
        <RefPage>5-32</RefPage>
        <RefTotal>Breiman L. Random Forests. Mach Learn. 2001;45(1):5-32. DOI: 10.1023&#47;A:1010933404324</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1023&#47;A:1010933404324</RefLink>
      </Reference>
      <Reference refNo="31">
        <RefAuthor>Anonym</RefAuthor>
        <RefTitle></RefTitle>
        <RefYear>2015</RefYear>
        <RefBookTitle>MATLAB. version 8.5.0.197613 (R2015a)</RefBookTitle>
        <RefPage></RefPage>
        <RefTotal>MATLAB. version 8.5.0.197613 (R2015a). Natick, Massachusetts: The MathWorks Inc.; 2015.</RefTotal>
      </Reference>
      <Reference refNo="32">
        <RefAuthor>Chang CC</RefAuthor>
        <RefAuthor>Lin CJ</RefAuthor>
        <RefTitle>LIBSVM: A library for support vector machines</RefTitle>
        <RefYear>2011</RefYear>
        <RefJournal>ACM Trans Intell Syst Technol</RefJournal>
        <RefArticleNo>27</RefArticleNo>
        <RefTotal>Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol. 2011;2(3):1-27. Article No. 27. DOI: 10.1145&#47;1961189.1961199</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1145&#47;1961189.1961199</RefLink>
      </Reference>
      <Reference refNo="33">
        <RefAuthor>Kalpathy-Cramer J</RefAuthor>
        <RefAuthor>de Herrera AG</RefAuthor>
        <RefAuthor>Demner-Fushman D</RefAuthor>
        <RefAuthor>Antani S</RefAuthor>
        <RefAuthor>Bedrick S</RefAuthor>
        <RefAuthor>M&#252;ller H</RefAuthor>
        <RefTitle>Evaluating performance of biomedical image retrieval systems &#8211; an overview of the medical image retrieval task at ImageCLEF 2004-2013</RefTitle>
        <RefYear>2015</RefYear>
        <RefJournal>Comput Med Imaging Graph</RefJournal>
        <RefPage>55-61</RefPage>
        <RefTotal>Kalpathy-Cramer J, de Herrera AG, Demner-Fushman D, Antani S, Bedrick S, M&#252;ller H. Evaluating performance of biomedical image retrieval systems &#8211; an overview of the medical image retrieval task at ImageCLEF 2004-2013. Comput Med Imaging Graph. 2015 Jan;39:55-61. DOI: 10.1016&#47;j.compmedimag.2014.03.004</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1016&#47;j.compmedimag.2014.03.004</RefLink>
      </Reference>
      <Reference refNo="34">
        <RefAuthor>Villegas M</RefAuthor>
        <RefAuthor>M&#252;ller H</RefAuthor>
        <RefAuthor>Gilbert A</RefAuthor>
        <RefAuthor>Piras L</RefAuthor>
        <RefAuthor>Wang J</RefAuthor>
        <RefAuthor>Mikolajczyk K</RefAuthor>
        <RefTitle>General Overview of ImageCLEF at the CLEF 2015 Labs</RefTitle>
        <RefYear>2015</RefYear>
        <RefBookTitle>Experimental IR Meets Multilinguality, Multimodality, and Interaction</RefBookTitle>
        <RefPage>444&#8211;61</RefPage>
        <RefTotal>Villegas M, M&#252;ller H, Gilbert A, Piras L, Wang J, Mikolajczyk K, et al. General Overview of ImageCLEF at the CLEF 2015 Labs. In: Mothe J, Savoy J, Kamps J, Pinel-Sauvagnat K, Jones GJF, SanJuan E, Cappellato L, Ferro N, editors. Experimental IR Meets Multilinguality, Multimodality, and Interaction. Springer International Publishing; 2015. p. 444&#8211;61. (Lecture Notes in Computer Science; Volume 9283). DOI: 10.1007&#47;978-3-319-24027-5&#95;45</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1007&#47;978-3-319-24027-5&#95;45</RefLink>
      </Reference>
      <Reference refNo="35">
        <RefAuthor>Lowe DG</RefAuthor>
        <RefTitle>Distinctive Image Features from Scale-Invariant Keypoints</RefTitle>
        <RefYear>2004</RefYear>
        <RefJournal>Int J Comput Vision</RefJournal>
        <RefPage>91-110</RefPage>
        <RefTotal>Lowe DG. Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vision. 2004;60(2):91-110. DOI: 10.1023&#47;B:VISI.0000029664.99615.94</RefTotal>
        <RefLink>http:&#47;&#47;dx.doi.org&#47;10.1023&#47;B:VISI.0000029664.99615.94</RefLink>
      </Reference>
    </References>
    <Media>
      <Tables>
        <Table format="png">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption><Pgraph><Mark1>Table 1: Descriptors with original and reduced vector sizes</Mark1></Pgraph></Caption>
        </Table>
        <Table format="png">
          <MediaNo>2</MediaNo>
          <MediaID>2</MediaID>
          <Caption><Pgraph><Mark1>Table 2: Submitted runs for evaluating prediction accuracy</Mark1></Pgraph></Caption>
        </Table>
        <Table format="png">
          <MediaNo>3</MediaNo>
          <MediaID>3</MediaID>
          <Caption><Pgraph><Mark1>Table 3: Prediction accuracy (&#37;) obtained using deep hierarchy interpretation on the DataSet</Mark1><Mark1><Subscript>4</Subscript></Mark1><Mark1> training set</Mark1></Pgraph></Caption>
        </Table>
        <NoOfTables>3</NoOfTables>
      </Tables>
      <Figures>
        <Figure format="png" height="530" width="746">
          <MediaNo>1</MediaNo>
          <MediaID>1</MediaID>
          <Caption><Pgraph><Mark1>Figure 1: Modality hierarchy to be used for prediction &#91;11&#93;</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="412" width="420">
          <MediaNo>2</MediaNo>
          <MediaID>2</MediaID>
          <Caption><Pgraph><Mark1>Figure 2: Explorative analysis on the training set of the distributed image collection</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="476" width="805">
          <MediaNo>3</MediaNo>
          <MediaID>3</MediaID>
          <Caption><Pgraph><Mark1>Figure 3: Examples of distributed images</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="621" width="829">
          <MediaNo>4</MediaNo>
          <MediaID>4</MediaID>
          <Caption><Pgraph><Mark1>Figure 4: Early fusion classifier setup using Random Forest or Support Vector Machines</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="427" width="512">
          <MediaNo>5</MediaNo>
          <MediaID>5</MediaID>
          <Caption><Pgraph><Mark1>Figure 5: Iterative step to detect efficient number of principal components</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="418" width="465">
          <MediaNo>6</MediaNo>
          <MediaID>6</MediaID>
          <Caption><Pgraph><Mark1>Figure 6: Official evaluation prediction performance of the submitted runs in blue bars and difference in performance at development stage in purple bars</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="490" width="711">
          <MediaNo>7</MediaNo>
          <MediaID>7</MediaID>
          <Caption><Pgraph><Mark1>Figure 7: Official prediction performance. Colored bars represent the performance of BCSG and gray bars of other participants.</Mark1></Pgraph></Caption>
        </Figure>
        <Figure format="png" height="500" width="683">
          <MediaNo>8</MediaNo>
          <MediaID>8</MediaID>
          <Caption><Pgraph><Mark1>Figure 8: Feature contribution of all extracted visual and text representation used for modality prediction</Mark1></Pgraph></Caption>
        </Figure>
        <NoOfPictures>8</NoOfPictures>
      </Figures>
      <InlineFigures>
        <NoOfPictures>0</NoOfPictures>
      </InlineFigures>
      <Attachments>
        <NoOfAttachments>0</NoOfAttachments>
      </Attachments>
    </Media>
  </OrigData>
</GmsArticle>