Overview
The Interactive Medical Miner is a feature-rich tool for classification and model drill-down, designed for the study of epidemiological data. The tool encompasses supervised learning (with decision trees and classification rules), utilities for data selection, and a rich panel with options for inspecting individual classification rules and single nodes, and for studying the distribution of variables in each of the target classes. The Interactive Medical Miner also supports the juxtaposition of labeled and unlabeled data, which can be insightful if some of the epidemiological data available to the medical researcher is still unlabeled. We presented the set of methods and the scientific workflow supported by our tool in an earlier article in the international journal Expert Systems with Applications (available at ScienceDirect).
Features
One of the major goals of personalized medicine is the discovery of subpopulations that share some risk factors or symptoms associated with a certain disease. The Interactive Medical Miner supports the medical researcher's mining task to identify these subgroups by offering the following features:
- a data-driven approach instead of the hypothesis-driven approach that is obligatory in medicine,
- an intuitive GUI and comprehensible specification of algorithm parameters,
- generation of classification rules and decision trees (more classification algorithms are planned),
- a histogram for visualizing the class distribution of labeled and unlabeled cohort probands,
- a table with summary statistics for the dataset and model elements,
- selection of cohort probands to investigate specific subgroups,
- filtering of variables that have already been identified as predictive of the target variable,
- closer study of single model aspects, e.g. a classification rule or a tree node, by clicking on the respective element in the tree view,
- user-defined selection of variables for histogram generation.
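The proband selection listed above can be pictured as a set of simple criteria applied to the cohort. The following Java sketch is illustrative only and is not taken from the tool's source code: the class SubpopulationFilter, its methods, the variables age and bmi, and the toy values are all hypothetical, and combining the criteria with a logical AND is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch only; all names and values here are hypothetical. */
final class SubpopulationFilter {

    /** Criterion "variable >= threshold"; probands lacking the variable do not match. */
    static Predicate<Map<String, Double>> atLeast(String variable, double threshold) {
        return p -> p.containsKey(variable) && p.get(variable) >= threshold;
    }

    /** Criterion "variable < threshold"; probands lacking the variable do not match. */
    static Predicate<Map<String, Double>> below(String variable, double threshold) {
        return p -> p.containsKey(variable) && p.get(variable) < threshold;
    }

    /** Keeps the probands that satisfy every criterion (assumed logical AND). */
    static List<Map<String, Double>> select(List<Map<String, Double>> cohort,
                                            List<Predicate<Map<String, Double>>> criteria) {
        List<Map<String, Double>> selected = new ArrayList<>();
        for (Map<String, Double> proband : cohort) {
            boolean matchesAll = true;
            for (Predicate<Map<String, Double>> criterion : criteria) {
                if (!criterion.test(proband)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                selected.add(proband);
            }
        }
        return selected;
    }

    /** Toy cohort with made-up values, for demonstration only. */
    static List<Map<String, Double>> demoCohort() {
        return List.of(
            Map.of("age", 34.0, "bmi", 22.5),
            Map.of("age", 58.0, "bmi", 31.0),
            Map.of("age", 61.0, "bmi", 24.0),
            Map.of("age", 47.0, "bmi", 29.5));
    }

    public static void main(String[] args) {
        // Subgroup: probands aged 45 or older with a BMI below 30.
        List<Map<String, Double>> subgroup = select(demoCohort(),
            List.of(atLeast("age", 45.0), below("bmi", 30.0)));
        System.out.println(subgroup.size() + " probands selected");
    }
}
```

Expressing each criterion as a predicate keeps the criteria independent of each other, so a subgroup can be narrowed or widened by adding or removing a single entry from the list.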
Publications
- Interactive Medical Miner - Interactively exploring subpopulations in epidemiological datasets (BibTeX, PDF of the author's manuscript)
- Uli Niemann, Myra Spiliopoulou, Henry Völzke, Jens-Peter Kühn
- Submitted as Demo paper at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Demo Track (ECMLPKDD 2014), September 15 - 19, 2014, Nancy, France. Accepted 06/2014.
Implementation
The Java implementation of our Interactive Medical Miner can be downloaded here. All required libraries, i.e. Weka and JFreeChart, are included in the download. To run the application, start InteractiveMedicalMiner.jar in the main directory (Java must be installed on your computer). For the best user experience, we recommend Windows as the operating system.
Components
- Algorithm tabs: Currently, the Interactive Medical Miner offers two tabs: one for classification rule discovery and another for decision tree generation.
- Algorithm settings: The algorithm parameters can be specified in the upper left panel. Hovering the mouse over an element shows additional information about that parameter.
- Load dataset: This button opens a file chooser where you can specify the input dataset. Currently, the Interactive Medical Miner is limited to ARFF files. We provide four datasets in the download archive. The Breast Cancer Wisconsin dataset and the Diabetes dataset are both available at the UCI machine learning repository. The suffix _unlabeled-instances-added indicates that ten artificial, unlabeled instances were added for demo purposes.
- Select class label: Specify the class label for which classification rules should be generated.
- Specify subpopulations: You can restrict the model learning to a certain subpopulation. When you click on the button "Specify Subpopulation...", a pop-up window opens where you can filter probands/instances according to one or more criteria.
- Generate model: Click on the button "Build Rules" to generate a set of classification rules for the given parameter values. Please note that before clicking on the button you have to specify a class label.
- Sorting preference: The list of classification rules can be sorted according to several criteria, namely confidence, support, alphabetical order, or min.value count.
- Summary statistics: The first row of the summary statistics table shows the class distribution of the total dataset; the second row shows the class distribution of the selected rule. Support and confidence values are given below the table.
- Variable Selection: Select a variable from the list to display the class distribution histogram for this particular variable.
- Tree view of classification rules: The list of discovered classification rules. Click on a single rule to update the summary statistics table and the histogram.
- Histogram: The histogram shows the class distribution of the total dataset or the selected rule while juxtaposing unlabeled and labeled instances.
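To make the summary statistics and the histogram counts more concrete, the following Java sketch computes a class distribution, support, and confidence for a single rule on toy data. It is not the tool's implementation: the class RuleSummary, the toy values, and the rule itself are hypothetical. The definitions are also an assumption and may differ from those used by the tool. Here, support is the number of labeled probands that match the rule and carry the target class, divided by all labeled probands, and confidence is the same count divided by the labeled probands that match the rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Illustrative sketch only; names, data, and definitions are assumptions. */
final class RuleSummary {

    /** A proband reduced to one numeric variable; label == null means unlabeled. */
    static final class Proband {
        final double value;
        final String label;

        Proband(double value, String label) {
            this.value = value;
            this.label = label;
        }
    }

    /** Class counts among probands matching the rule body; unlabeled ones get their own bucket. */
    static Map<String, Integer> classDistribution(List<Proband> data, DoublePredicate body) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Proband p : data) {
            if (body.test(p.value)) {
                counts.merge(p.label == null ? "unlabeled" : p.label, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Assumed definition: (labeled, matching body, target class) / (all labeled). */
    static double support(List<Proband> data, DoublePredicate body, String target) {
        int labeled = 0;
        int hits = 0;
        for (Proband p : data) {
            if (p.label == null) continue;
            labeled++;
            if (body.test(p.value) && p.label.equals(target)) hits++;
        }
        return labeled == 0 ? 0.0 : (double) hits / labeled;
    }

    /** Assumed definition: (labeled, matching body, target class) / (labeled, matching body). */
    static double confidence(List<Proband> data, DoublePredicate body, String target) {
        int covered = 0;
        int hits = 0;
        for (Proband p : data) {
            if (p.label == null || !body.test(p.value)) continue;
            covered++;
            if (p.label.equals(target)) hits++;
        }
        return covered == 0 ? 0.0 : (double) hits / covered;
    }

    /** Toy data with made-up values: five labeled and two unlabeled probands. */
    static List<Proband> demoData() {
        return List.of(
            new Proband(1.0, "pos"), new Proband(2.0, "pos"), new Proband(3.0, "neg"),
            new Proband(4.0, "neg"), new Proband(5.0, "pos"),
            new Proband(2.5, null), new Proband(4.5, null));
    }

    public static void main(String[] args) {
        List<Proband> data = demoData();
        DoublePredicate rule = v -> v <= 3.0; // hypothetical rule: value <= 3.0 -> "pos"
        System.out.println("total dataset: " + classDistribution(data, v -> true));
        System.out.println("selected rule: " + classDistribution(data, rule));
        System.out.println("support:       " + support(data, rule, "pos"));
        System.out.println("confidence:    " + confidence(data, rule, "pos"));
    }
}
```

In the toy data the rule covers three labeled probands, two of which are "pos", so the sketch yields a support of 2/5 and a confidence of 2/3. The unlabeled probands are counted in a separate bucket so that they can be shown next to the labeled ones without influencing support or confidence.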
People
- People who contributed to this work are:
- Uli Niemann, Research Assistant at KMD Lab, Otto-von-Guericke-University Magdeburg, Germany
- Myra Spiliopoulou, Head of KMD Lab, Otto-von-Guericke-University Magdeburg, Germany
- Henry Völzke, Institute for Community Medicine, Ernst-Moritz-Arndt-University of Greifswald, Germany
- Jens-Peter Kühn, Institute for Diagnostic Radiology and Neuroradiology, Ernst-Moritz-Arndt-University of Greifswald, Germany