ECML PKDD 2014 Tutorial on
"Medical Mining for Clinical Knowledge Discovery"
By: Pedro Pereira Rodrigues, Myra Spiliopoulou,
Ernestina Menasalvas
http://kmd.cs.ovgu.de/tutorial_ecmlpkdd2014.html
Short Description:
Medical data mining is a mature area of research,
characterized by both simple and very elaborate methods, mostly dedicated to
solving a concrete problem of disease diagnosis, disease description or success
prediction for a treatment. Clinical knowledge discovery encompasses analysis
of epidemiological data, and of clinical and administrative data on patients;
clinical decision support builds upon findings on these data. We elaborate on
how data mining can contribute to such findings, enumerate challenges of
model learning, data availability and data provenance, and identify challenges
posed by Big Medical Data.
AFFILIATIONS:
Prof. Ernestina Menasalvas
Centro de Tecnologia Biomedica, Universidad Politecnica de Madrid,
Campus de Montegancedo, Pozuelo de Alarcon, Spain
Email: ernestina.menasalvas@upm.es
URL: http://midas.ctb.upm.es/midas/ernestina-menasalvas
Prof. Pedro Pereira Rodrigues
CINTESIS & LIAAD, Health Information and Decision Sciences Department,
Faculty of Medicine of the University of Porto,
Alameda Prof. Hernani Monteiro, 4200-319 Porto, Portugal
Email: pprodrigues@med.up.pt
Web: http://users.med.up.pt/pprodrigues/
Prof. Myra Spiliopoulou
Research Group on Knowledge Management and Discovery (KMD),
Faculty of Computer Science,
Otto-von-Guericke-University Magdeburg,
PO Box 4120, 39016 Magdeburg, Germany
Email: myra@iti.cs.uni-magdeburg.de
URL: http://omen.cs.uni-magdeburg.de/itikmd/
Outline:
(+) Self-presentation of the Tutorial and Overview of the Domain (all)
(1) Knowledge Discovery from Epidemiological Data - Myra Spiliopoulou
(2) Knowledge Discovery from Clinical and Administrative Data - Pedro Pereira Rodrigues
(3) Knowledge Discovery Challenges on Big Medical Data - Ernestina Menasalvas
(4) Knowledge Discovery for Clinical Decision Support - Pedro Pereira Rodrigues
(+) Concluding Remarks
Part 1 - Knowledge discovery from epidemiological data
We start the tutorial with knowledge discovery from
epidemiological data: clinical diagnosis and treatment prescriptions are based
on the findings of epidemiological research. Epidemiological data come from
population-based studies with randomly selected participants, from
cross-sectional studies and from clinical trials. Epidemiological research is
largely hypothesis-driven; mining studies are rare. We elaborate on what
epidemiological data look like, discuss how mining can contribute to their
analysis and highlight inherent challenges of data provenance, big feature
spaces, data reliability and novel types of concept drift.
Download Slides of Part 1 (.pdf)
Part 2 - Knowledge discovery from clinical and administrative data
Electronic Health Records (EHR) and
Admission-Discharge-Transfer (ADT) systems are valuable data sources for
medical data mining focusing both on clinical research and health services
research. However, these sources are also usually prone to erroneous, bogus,
missing and default data. We will present and discuss case studies where these
data quality problems yielded incorrect data mining results. Furthermore, we
will present success cases where mining these sources resulted in relevant
knowledge discovery in the fields of clinical and health services research.
Download Slides of Part 2 (.pdf)
Part 3 - Knowledge discovery challenges on Big medical data
Big Data in the healthcare sector, used to improve the
overall efficiency and quality of care delivery, still has to address several
technical requirements, such as: i) the generalized use of
Electronic Health Records (EHR) and its implications; ii) preprocessing of natural-language text contained
in reports, notes, etc.; iii) annotation of images; iv) dealing with data silos
and building solutions that avoid them; and v) data quality mechanisms. On top of this, one important issue is
access to data, and the related legal aspects have to be taken into
account. We will analyze all these challenges, with special emphasis on text and
image processing.
Download Slides of Part 3 (.pdf)
Part 4 - Knowledge discovery for clinical decision support
Clinical decision support is usually seen as the final
goal of knowledge discovery and modeling for clinical practice, as it aims to
apply developed models to individual patients. However, the real-world
application of learning-based models for clinical decision support is hindered
by the need to integrate them with evidence-based medicine and by the need for
clinicians to accept that the model includes quality evidence regarding the particular
patient. We will discuss the main issues regarding this struggle, addressing
the advantages of probabilistic methods, and present success cases of
probabilistic learning-based decision support systems.
Download Slides of Part 4 (.pdf)
Target Audience:
The target groups are: postgraduate students with a
solid background in data mining; research scholars who are interested in
medical mining and need some guidance through the subfields of this huge
research area; research scholars who work on one of the medical mining areas
and are interested in transferring their methods to other areas.
The Presenters:
Ernestina Menasalvas is Professor at the Department of Computer Systems Languages
and Software Engineering, Faculty of Computer Science of Universidad Politecnica de
Madrid (UPM), and a member of MIDAS, the data mining and data simulation group
at the Center for Biomedical Technology at UPM. Her subject area is data mining,
most recently applied to medical data. She has also participated in a range of
projects related to data integration and mining on mobile devices. She has
published three international books on web mining (published by Springer in 2003,
2004 and 2009, respectively) as well as articles in several key international journals.
Pedro Pereira Rodrigues is Professor at the Department of Health Information
and Decision Sciences, Faculty of Medicine of the University of Porto, and a researcher
at the Biostatistics and Intelligent Data Analysis group of the Center for
Health Technologies and Services Research. His main research area is machine
learning, currently focused on applications of Bayesian networks to clinical
research and decision support. He has edited 4 conference proceedings and
published articles in indexed peer-reviewed journals and conference
proceedings. He has helped organize events as general chair (CBMS 2013) and PC
chair (CBMS 2014, ECMLPKDD 2015, and several thematic tracks and workshops
since 2007), is a member of the steering committee of CBMS, and has served on
the program committee for more than 20 editions of international conferences
(e.g. IJCAI, ECMLPKDD, ICML, CBMS). He has also co-organized a tutorial at IBERAMIA
2012.
Myra Spiliopoulou is Professor of Business Information Systems at the
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany.
Her main research interest is knowledge discovery and adaptation. She has
publications in international journals and conferences on web mining, text
mining, model monitoring and adaptation over evolving data. She served as PC
Co-Chair of ECML PKDD 2006 and NLDB 2008, as Tutorials Chair at ICDM 2010 and
Workshops Chair at ICDM 2011. In 2012, she was PC Chair of the 36th Annual
Conference of the German Classification Society (GfKl 2012, Hildesheim, August
2012). In addition to several tutorials at ECML PKDD, she has given tutorials at User
Modeling 2007 and at KDD 2009.
LITERATURE, as of April 2014 (own papers marked with a *):
The literature below comes
from the time of tutorial submission. For the updated literature list, please
consult the slides of the tutorial.
Part 1a - Mining Epidemiological Data
1. S.E. Baumeister,
H. Voelzke, P. Marschall, (...), C. Schmidt,
S. Flessa, D. Alte. Impact of fatty liver disease
on health care utilization and costs in a general population: A 5-year
observation. Gastroenterology 134 (1), 85-94, (2008)
2. * U. Niemann, H.
Voelzke, J.-P. Kuehn, M. Spiliopoulou. Learning and Inspecting Classification Rules from Longitudinal
Epidemiological Data to Identify Predictive Features on Hepatic Steatosis.
Journal of Expert Systems with Applications, accepted (02/2014)
3. B. Preim, P.
Klemm, H. Hauser, K. Hegenscheid, S. Oeltze, K. Toennies, H. Voelzke. Visualization in Medicine and Life Sciences III.
Springer, Ch. "Visual Analytics of Image-Centric Cohort Studies in
Epidemiology" (2014)
4. H. Y. Shi, S. L. Hwang, K. T. Lee, and C. L. Lin.
In-hospital mortality after traumatic brain injury surgery: a nationwide
population-based comparison of mortality predictors used in artificial neural
network and logistic regression models. Journal of Neurosurgery, 118, 746-752,
(2013)
5. C. Zhang, R.L. Kodell. Subpopulation-specific confidence designation for more
informative biomedical classification. Artificial Intelligence in Medicine 58
(3), 155-163, (2013)
Part 1b - Dealing with Evolution in Epidemiological Data
1. S. Ebadollahi, J. Sun, D. Gotz, J. Hu, D. Sow, and C.
Neti. Predicting patient trajectory of physiological data using temporal trends
in similar patients: A system for near-term prognostics. AMIA Annu. Symp.
Proc., vol. 2010, pp. 192-196, (2010)
2. * G. Krempl, Z. F. Siddiqui, and M. Spiliopoulou.
Online clustering of high-dimensional trajectories under concept drift. In
Proc. of ECML PKDD 2011, ser. LNAI, vol. 6912. Athens, Greece: Springer, (2011)
3. * Z. Siddiqui, M. Oliveira, J. Gama, and M.
Spiliopoulou. Where are we going? Predicting the evolution of individuals. In
Proc. of the IDA 2012 Conf. on Intelligent Data Analysis, vol. LNCS 7619.
Helsinki, Finland: Springer, Oct. 2012, pp. 357-368, (2012)
4. H. Wang, F. Nie,
H. Huang, J. Yan, S. Kim, S. Risacher, A. Saykin, and L. Shen. High-order multi-task feature learning to identify
longitudinal phenotypic markers for Alzheimer's disease progression prediction.
In Adv. in Neural Inf. Processing Systems 25, eds., P. Bartlett, F.C.N.
Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, 1286-1294, (2012)
5. J. Zhou, J. Liu, V. A. Narayan, and J. Ye. Modeling
disease progression via fused sparse group lasso. In Proc. of KDD 2012, pages
1095-1103. ACM, (2012)
Part 2 - Clinical and Administrative Data Mining
1. * Cruz-Correia, R., Rodrigues, P. P., Freitas, A.,
Almeida, F., Chen, R., & Costa-Pereira, A. (2009). Data Quality and Integration Issues in Electronic Health Records. In V.
Hristidis (Ed.), Information Discovery on Electronic Health Records (pp.
55-95). CRC Press.
2. Cismondi, F., Fialho, A. S., Vieira, S. M., Reti, S.
R., Sousa, J. M. C., & Finkelstein, S. N. (2013). Missing data in medical
databases: Impute, delete or classify? Artificial Intelligence in Medicine,
1–10. doi:10.1016/j.artmed.2013.01.003
3. Jiang, X., & Cooper, G. F. (2009). A real-time
temporal Bayesian architecture for event surveillance and its application to
patient-specific multiple disease outbreak detection. Data Mining and Knowledge
Discovery, 20(3), 328–360. doi:10.1007/s10618-009-0151-4
4. * Rodrigues, P. P., Dias, C. C., Rocha, D., Boldt, I.,
Teixeira-Pinto, A., & Cruz-Correia, R. (2013). Predicting visualization of hospital clinical reports using survival
analysis of access logs from a virtual patient record. In Proceedings of the
26th IEEE International Symposium on Computer-Based Medical Systems (pp.
461-464). Porto, Portugal. doi:10.1109/CBMS.2013.6627841
5. * Vasco, D., Rodrigues, P. P., & Gama, J. (2013). Contextual anomalies in medical data. In Proceedings
of the 26th IEEE International Symposium on Computer-Based Medical Systems (pp.
544-545). Porto, Portugal. doi:10.1109/CBMS.2013.6627869
6. Duan, L., Khoshneshin, M., Street, W. N., &
Liu, M. (2013). Adverse drug effect detection. IEEE Journal of Biomedical and
Health Informatics, 17(2), 305–11. doi:10.1109/TITB.2012.2227272
Part 3 - Big Medical Data
1. Cusack CM, H. G. (2012). The future state of clinical
data capture and documentation: a report from AMIA's 2011 Policy Meeting.
Journal of the American Medical Informatics Association, 1-7.
2. Hani Neuvirth, M. O.-F. (2012). Toward Personalized
Care Management of Patients at Risk--the Diabetes Case Study.
3. Raghupathi W: Data Mining in Health Care. In
Healthcare Informatics: Improving Efficiency and Productivity. Edited by Kudyba
S. Taylor & Francis; 2010:211-223.
4. Raghupathi W, Kesh S: Interoperable electronic health
records design: towards a service-oriented architecture. e-Service
Journal 2007, 53-57.
5. IBM: Data Driven Healthcare Organizations Use Big Data
Analytics for Big Gains; 2013.
http://www03.ibm.com/industries/ca/en/healthcare/documents/Data_driven_healthcare_organizations_use_big_data_analytics_for_big_gains.pdf.
6. Ikanow: Data
Analytics for Healthcare: Creating Understanding from Big Data.
http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf.
7. jStart: How Big Data Analytics Reduced Medicaid
Readmissions. A Start Case Study; 2012.
http://www-01.ibm.com/software/ebusiness/jstart/portfolio/uncMedicaidCaseStudy.pdf.
Part 4 - Clinical decision support
1. * Cardoso, T., Teixeira-Pinto, A., Rodrigues,
P. P., Aragao, I., Costa-Pereira, A., & Sarmento, A. E. (2013). Predisposition, Insult/Infection, Response and Organ Dysfunction (PIRO):
A Pilot Clinical Staging System for Hospital Mortality in Patients with
Infection. PLoS ONE, 8(7), e70806.
doi:10.1371/journal.pone.0070806
2. * Sebastiao, R., Gama, J., Rodrigues, P. P., & Bernardes,
J. (2010). Monitoring Incremental Histogram Distribution for
Change Detection in Data Streams. In M. M. Gaber, R.
R. Vatsavai, O. A. Omitaomu,
J. Gama, N. V Chawla, & A. R. Ganguly (Eds.),
Knowledge Discovery from Sensor Data (Vol. 5840, pp. 25-42). Springer Verlag. doi:10.1007/978-3-642-12519-5_2
3. Celi, L. A.,
Hinske, L. C., Alterovitz, G., & Szolovits, P. (2008). An artificial intelligence tool to predict fluid
requirement in the intensive care unit: a proof-of-concept study. Critical Care
(London, England), 12(6), R151. doi:10.1186/cc7140
4. Nee, O., & Hein, A. (2010). Clinical Decision
Support with Guidelines and Bayesian Networks. In Advances in Decision Support
Systems. INTECH.
5. Sesen, M. B., Nicholson, A. E., Banares-Alcantara,
R., Kadir, T., & Brady, M. (2013). Bayesian networks for clinical decision support in
lung cancer care. PLoS ONE, 8(12), e82349.
doi:10.1371/journal.pone.0082349