AI for oncology Program


Pr. Anita Burgun - Biomedical informatics, Prairie fellow
Dr. Bastien Rance - Biomedical informatics
Pr. Laure Fournier - Radiologist, Prairie fellow
Dr. Eric Letouzé - Researcher (DR Inserm), Prairie fellow

Steering Committee

Pr. Stéphanie Allassonniere - Mathematician, Prairie fellow
Pr. Marie-France Mamzer - Ethics
Pr. Sandrine Katsahian - Clinical research, Biostatistics
Dr. Sarah Zohar - Researcher (DR Inserm) in Biostatistics
Pr. Philippe Giraud - Radiation oncology
Dr. Raphael Porcher - Biostatistics, Prairie fellow
Pr. Cécile Badoual - Digital pathology
Pr. Guillaume Assié - Physician, Paris University AI educational program
Dr. Brigitte Sabatier - Pharmacist
Dr. Juliette Djadi-Prat - Clinical research
Dr. Anne-Sophie Jannot - Biostatistics


Starting in 2013 with the development of a translational research data platform dedicated to cancer research, the SIRIC CARPEM has gained expertise in several methodological and technical aspects. These developments have been transferred to the data repository developed within the Paris Cancer Institute CARPEM program, which integrates electronic health record (EHR) data from the participating AP-HP departments along with research data and biobank information to support studies on various kinds of cancer, including renal, lung, blood, colorectal, and gynecological cancers (Rance 2016). State-of-the-art approaches have been implemented to integrate heterogeneous data (e.g., Zapletal 2018), to link such data with open-access environmental information, to find patient cohorts for studies, and to identify subpopulations of patients. Solutions based on common data models have been proposed to harmonize data from disparate observational databases. The concept behind this approach is to transform the data contained in the sources into a common model, such as the OSIRIS model developed by the inter-SIRIC group and the French National Cancer Institute (INCa), or the OMOP Common Data Model developed by the Observational Health Data Sciences and Informatics (OHDSI) Oncology Subgroup (Belenkaya 2019) (Warner 2019). Common vocabularies and ontologies have also been proposed, such as the Radiation Oncology Structures ontology, designed to standardize descriptions in radiation therapy (Bibault 2018). All the research activities are structured within the Cordeliers Research Center, which gathers groups doing basic research in cancer and researchers with backgrounds in mathematics, statistics, and computer science from the “Information Science to support Personalized Medicine” team.

Moreover, following the data warehousing step, researchers and physicians involved in the Paris Cancer Institute CARPEM are now developing artificial intelligence for precision oncology. This initiative is supported by the PaRis Artificial Intelligence Research InstitutE (Prairie). Prairie is an institute for interdisciplinary research and education in AI, founded by academic and industrial members, with a focus on health applications, specifically in oncology. Four members of the AI for oncology program of the Cancer Institute are also fellows of Prairie (S. Allassonniere, A. Burgun, L. Fournier, E. Letouzé). They will develop AI to support cancer detection, optimize the care trajectory of cancer patients, suggest optimal therapies, reduce medical errors, and improve subject enrollment in clinical trials.

Clinical research

A major achievement of the Paris Cancer Institute CARPEM program has been the translational data warehouse, which can be used to accelerate prospective clinical research. Its data can be used to generate hypotheses for testing in traditional trials, identify potential biomarkers, perform feasibility studies, identify eligible patients, and assess the safety of drugs or devices after they are approved. With the objective of accelerating patient enrollment in clinical trials, eligibility criteria can be aligned with EHR data models and mapped to common terminologies. However, several studies have demonstrated that a significant percentage of those criteria cannot be mapped to structured EHR data because the corresponding information is present only in text or images (e.g., Girardeau 2017). By leveraging natural language processing and image processing, information technologies can substantially increase the efficiency with which oncologists screen patients for trials.
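
As an illustration of the alignment step described above, the sketch below checks a patient record against eligibility criteria expressed over structured fields, and sets aside any criterion that has no structured counterpart so that it can be reviewed manually or through text and image processing. The field names, the example criteria, and the `screen` helper are all hypothetical; they do not reflect the actual schema of the CARPEM data warehouse.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    label: str
    # Name of the structured field holding the information,
    # or None when it is only available in free text or images.
    field: Optional[str]
    test: Optional[Callable[[object], bool]] = None


def screen(patient: dict, criteria: list) -> dict:
    """Sort criteria into met, not met, and needing review (hypothetical helper)."""
    result = {"met": [], "not_met": [], "needs_review": []}
    for c in criteria:
        if c.field is None or c.test is None or c.field not in patient:
            # No structured data to evaluate: defer to manual or NLP review.
            result["needs_review"].append(c.label)
        elif c.test(patient[c.field]):
            result["met"].append(c.label)
        else:
            result["not_met"].append(c.label)
    return result


# Illustrative criteria only; not taken from any real trial protocol.
criteria = [
    Criterion("age >= 18", "age", lambda v: v >= 18),
    Criterion("colorectal cancer diagnosis (ICD-10 C18-C20)", "icd10",
              lambda v: v[:3] in {"C18", "C19", "C20"}),
    Criterion("ECOG performance status <= 1", None),
]

patient = {"age": 64, "icd10": "C18.7"}
report = screen(patient, criteria)
```

In this toy record the two structured criteria are met, while the performance-status criterion is routed to review because no structured field carries it, which mirrors the mapping gap reported in the studies cited above.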

In addition, data collected in the Paris Cancer Institute will be reused to generate real-world evidence (RWE) and, ultimately, to fill knowledge gaps related to the effectiveness, safety, and cost of treatment in “real life”. Regarding the medicines regulatory system, the European Medicines Agency (EMA) has recognized the importance of RWE, especially in oncology, where it claimed that RWE was “our only hope to come to grips with combinatorial complexity”. The “AI in oncology” group has developed expertise in this field.

Regarding drug safety, we plan to mine other data sources, such as social media, to analyze adverse events associated with cancer treatments. We will rely on the expertise gained during projects like ADR-PRISM (Adverse Drug Reactions from Patient Reports in Social Media). Although ADR-PRISM was not focused on cancer, its results were encouraging, and similar methods can be applied to detect adverse events and analyze compliance with drug treatment in oncology.

Research in AI for oncology

In oncology, several authors (including Bibault 2018; REF FOURNIER) showed that machine learning algorithms, especially convolutional neural networks and radiomics, could be used for detecting and evaluating cancer lesions, identifying subgroups of patients (REF LETOUZE), facilitating treatment, and predicting treatment response. Machine learning algorithms require large sets of high-quality data for training. The most successful applications of deep learning until now have been in image classification and text mining, with performance equivalent to that of experts on some tasks. More precisely, in the Institute, research has focused on:

  1. Deep phenotyping, including radiomics
  2. Identification of subpopulations
  3. Prediction of response to treatment
  4. AI for clinical decision making
  5. AI for patient safety

More recently, other topics have been investigated, including explainable hybrid models and guarantees for unbiased systems. The integrity of unbiased, clinically useful data depends upon the reliability of the data sources. Data quality must therefore be systematically assessed, and other risks related to sampling bias (data sets) and observation bias (measurement errors) must be evaluated. All these considerations show that an interdisciplinary approach is needed to realize the potential of AI for precision oncology.

The “AI for oncology” group has developed several courses on AI and health. All are part of the Université de Paris curriculum, and some benefit from the support of the Prairie Institute:

  1. Introduction to medical informatics and AI in the medical curriculum (Paris Descartes University)
  2. Master classes related to AI and oncology (Université de Paris AI and health program, G. Assié)
    • Master class on AI in oncology
    • Master class on AI in medical imaging
    • Master class on AI and genomics
    • Master class on AI and digital pathology
  3. Master Degree in biomedical informatics (A. Burgun)
  4. Master Degree in Big data in health (A.S. Jannot)
  5. Excellence program (double degree program in science and medicine) on AI and health (starting in 2020) (S. Allassonniere)

Besides data collection and data integration, the Cancer Institute program will develop algorithms based on these data as well as frameworks that enable external validation of AI.

  • Regarding AI development, ongoing projects will be reinforced, for example:
    • In radiation oncology to predict response to treatment
    • In imaging to support early diagnosis and refined classification of tumors based on radiomics (REF FOURNIER)
    • In personalized medicine to adapt the dose (dose finding based on RWE) (Boulet 2019) (Lee 2019)
  • New applications of AI will be explored, starting with:
    • Digital pathology
    • Patient safety and adverse events.
  • Transfer of AI algorithms to clinical settings requires rigorous clinical validation. Model design and training must be totally separated from clinical evaluation. Besides the data set on which the model was trained and the validation set used during the development phase, external validation requires data sets that are totally independent from the initial ones (Allen 2019). These recommendations must be implemented for all applications in oncology, in order to deliver high-quality AI models intended for use in clinical practice. Of note, this axis has been approved at the AP-HP level by the EDS steering committee. It will be developed through our participation in the PeCAN ERaPerMed project (starting in 2020, A. Burgun WP leader).


  • Moreover, unsupervised approaches such as similarity metrics computed on data warehouse cases can be used to support clinical decisions. Similarity metrics have already been tested to help diagnose rare diseases, with encouraging results (Garcelon 2017) (Chen 2019). The same approach could be developed in oncology to suggest personalized treatments in the context of tumor board meetings. However, this requires overcoming obstacles related to data reuse, including possible data quality issues and restrictive regulations.
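
The case-similarity idea in the last point above can be sketched as follows. This is a minimal illustration, assuming each case is represented as a sparse vector of coded features and compared with cosine similarity; the feature names, the case identifiers, and the `most_similar` helper are hypothetical, and the metric is not necessarily the one used in the cited studies.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index_case: dict, cases: dict, k: int = 2) -> list:
    """Return the k stored cases closest to the index case (hypothetical helper)."""
    scored = [(case_id, cosine_similarity(index_case, features))
              for case_id, features in cases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy cases with made-up features; real vectors would come from the data warehouse.
cases = {
    "case_A": {"rectal_adenocarcinoma": 1, "KRAS_mutation": 1, "liver_metastasis": 1},
    "case_B": {"rectal_adenocarcinoma": 1, "KRAS_mutation": 1},
    "case_C": {"renal_cell_carcinoma": 1, "lung_metastasis": 1},
}

index_case = {"rectal_adenocarcinoma": 1, "KRAS_mutation": 1}
ranking = most_similar(index_case, cases, k=2)
```

Here `ranking` lists `case_B` first and `case_A` second, while `case_C` shares no feature with the index case and scores zero. In practice the quality of such a ranking depends directly on the data quality issues mentioned above.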

Name Surname | Title/Position | Speciality | Research Unit | Research Team
Stéphanie Allassonniere | Full Prof | Applied mathematics | UMRS 1138 Centre de Recherche des Cordeliers – Prairie Institute | Information science to support personalized medicine
Guillaume Assié | Full Prof | Endocrinology | Paris Descartes AI program |
Cécile Badoual | Full Prof | Pathology | |
Anita Burgun | Full Prof | Biomedical informatics | UMRS 1138 Centre de Recherche des Cordeliers – Prairie Institute | Information science to support personalized medicine
Michaela Fontenay | Full Prof | Haematology | |
Laure Fournier | Full Prof | Radiology | Prairie Institute |
Philippe Giraud | Full Prof | Radiation oncology | |
Anne-Sophie Jannot | Associate Prof | Biostatistics | UMRS 1138 Centre de Recherche des Cordeliers | Information science to support personalized medicine
Sandrine Katsahian | Full Prof | Biostatistics | UMRS 1138 Centre de Recherche des Cordeliers | Information science to support personalized medicine
Eric Letouzé | Researcher | Bioinformatics | UMRS 1138 Centre de Recherche des Cordeliers – Prairie Institute | Functional Genomics of Solid Tumors (FunGeST)
Marie-France Mamzer | Full Prof | Medical ethics | UMRS 1138 Centre de Recherche des Cordeliers |
Raphael Porcher | Full Prof | Biostatistics | CRESS – Prairie Institute |
Bastien Rance | Associate Prof | Bioinformatics | UMRS 1138 Centre de Recherche des Cordeliers | Information science to support personalized medicine
Brigitte Sabatier | Full time Pharmacist | | UMRS 1138 Centre de Recherche des Cordeliers | Information science to support personalized medicine
Sarah Zohar | Researcher | Biostatistics | UMRS 1138 Centre de Recherche des Cordeliers | Information science to support personalized medicine
Juliette Djadi-Prat | Full time Physician | Clinical research | URC EGP |

The following selected publications highlight the strength of the clinical and translational research developed in the AI for Oncology program:

Clinical research:

  1. Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating Heterogeneous Biomedical Data for Cancer Research: the CARPEM infrastructure. Appl Clin Inform. 2016;7(2):260–274. Published 2016 May 4. doi:10.4338/ACI-2015-09-RA-0125
  2. Jannot AS, Zapletal E., Avillach P., Mamzer MF, Burgun A., Degoulet P. The Georges Pompidou University Hospital Clinical Data Warehouse: A 8 years follow-up experience. Int J Med Inform. 2017 Jun;102:21-28. doi: 10.1016/j.ijmedinf.2017.02.006 PMID: 28495345
  3. Canuel V, Rance B, Avillach P, Degoulet P, Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform. 2015 Mar;16(2):280-90. doi: 10.1093/bib/bbu006. Epub 2014 Mar 7.
  4. Girardeau Y, Doods J, Zapletal E, et al. Leveraging the EHR4CR platform to support patient inclusion in academic studies: challenges and lessons learned. BMC Med Res Methodol. 2017;17(1):36. Published 2017 Feb 28. doi:10.1186/s12874-017-0299-3
  5. Burgun A., Bernal-Delgado E., Kuchinke W, van Staa T., Cunningham J., Lettieri E, Mazzali C, Oksen D, Estupiñan F, Barone A, Chêne G. Health data for public health: towards new ways of combining data sources to support research efforts in Europe. Yearb Med Inform. 2017 Aug;26(1):235-240. doi: 10.15265/IY-2017-034. PMID: 29063571
  6. Digan W., Countouris H., Barritault M., Baudoin D., Laurent-Puig P., Blons H., Burgun A., Rance B. An architecture for genomics analysis in a clinical setting using Galaxy and Docker. Gigascience. 2017 Oct 18. doi: 10.1093/gigascience/gix099. PMID: 29048555
  7. Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, Jaulent MC, Beyens MN, Burgun A, Bousquet C. Adverse Drug Reaction Identification and Extraction in Social Media: A Scoping Review. J Med Internet Res. 2015 Jul 10;17(7):e171. doi: 10.2196/jmir.4304. Review. PubMed PMID: 26163365.
  8. Abdellaoui R., Schuck S., Texier N., Burgun A. Filtering entities to optimize ADR identification from social media: how can the number of words between entities in the messages help? JMIR Public Health Surveill. 2017 Jun 22;3(2):e36. doi: 10.2196/publichealth.6577. PubMed PMID: 28642212
  9. Abdellaoui R, Foulquie P., Texier N., Faviez C., Burgun A., Schuck S. Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach. J Med Internet Res. 2018 Mar 14;20(3):e85. doi: 10.2196/jmir.9222. PubMed PMID: 29540337.
  10. Arnoux-Guenegou A., Girardeau Y., Chen X., Deldossi M., Aboukhamis R., Faviez C., Dahama B., Karapetiantz P., Guillemin-Lanne S., Lillo-Le louet A., Texier N., Burgun A., Katsahian S. The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard. JMIR Res Protoc. 2019 May 7;8(5):e11448. doi: 10.2196/11448. PubMed PMID: 31066711.
  11. Zapletal E., Bibault JE., Giraud P., Burgun A. Integrating multimodal radiation therapy data into i2b2. Appl Clin Inform. 2018 Apr;9(2):377-390. doi: 10.1055/s-0038-1651497. Epub 2018 May 30. PMID: 29847842
  12. Bibault J.E., Zapletal E., Rance B., Giraud P, Burgun A. Labeling for Big Data in radiation oncology: The Radiation Oncology Structures Ontology. PLoS One. 2018 Jan 19;13(1):e0191263. doi: 10.1371/journal.pone.0191263. eCollection 2018. PubMed PMID: 29351341.

Translational Research

  1. Bibault J.E., Giraud P., Durdux C., Taieb J., Berger A., Coriat R., Chaussade S., Dousset B., Nordlinger B., Burgun A. Deep Learning and Radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep. 2018 Aug 22;8(1):12611. doi: 10.1038/s41598-018-30657-6. PMID: 30135549
  2. Mamzer MF, Duchange N, Sylviane D, Marvanne P, Rambaud C, Marsico G, Cerisey C, Scotté F, Burgun A, Badoual C, Laurent-Puig P, Hervé C. Partnering with patients in translational oncology research: ethical approach. J Transl Med. 2017 Apr 8;15(1):74. doi: 10.1186/s12967-017-1177-9. PubMed PMID: 28390420.
  3. Garcelon N, Neuraz A, Benoit V, Salomon R, Kracker S, Suarez F, Bahi-Buisson N, Hadj-Rabia S, Fischer A, Munnich A, Burgun A. Finding patients using similarity measures in a rare diseases-oriented clinical data warehouse: Dr.Warehouse and the needle in the needle stack. J Biomed Inform. 2017 Jul 25. pii: S1532-0464(17)30176-4. doi: 10.1016/j.jbi.2017.07.016. PubMed PMID: 28754522.
  4. Bibault JE, Giraud P, Burgun A. Big Data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 2016 Nov 1;382(1):110-117. pii: S0304-3835(16)30346-9. doi: 10.1016/j.canlet.2016.05.033. PMID: 27241666
  5. Pham AD, Névéol A, Lavergne T, Yasunaga D, Clement O, Meyer G, Morello R, Burgun A. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics. 2014 Aug 7;15:266. doi: 10.1186/1471-2105-15-266. PubMed PMID: 25099227; PubMed Central PMCID: PMC4133634.
  6. Looten V., Kong Win Chang L., Neuraz A., Landau-Loriot M.A., Vedie B., Paul J.L., Mauge L., Rivet N., Bonifati A., Chatellier G., Burgun A., Rance B. What can millions of laboratory test results tell us about the temporal aspect of data quality? Study of data spanning 17 years in a clinical data warehouse. Comput Methods Programs Biomed. 2018 Dec 29. pii: S0169-2607(18)30708-9. doi: 10.1016/j.cmpb.2018.12.030. [Epub ahead of print] PubMed PMID: 30612785.
  7. Gruson D, Petrelluzzi J, Mehl J, Burgun A, Garcelon N. [Ethical, legal and operational issues of artificial intelligence]. Rev Prat. 2018 Dec;68(10):1145-1148. French. PubMed PMID: 30869229.
  8. Giraud P., Giraud Ph., Gasnier A., El Ayachi R., Kreps S., Foy J.P., Durdux C., Huguet F., Burgun A., Bibault J.E. Radiomics and machine learning for radiotherapy in head and neck cancers. Front Oncol. 2019 Mar 27;9:174. doi: 10.3389/fonc.2019.00174. eCollection 2019. Review. PubMed PMID: 30972291; PubMed Central PMCID: PMC6445892.
  9. Boulet S., Ursino M., Thall P., Landi B., Lepère C., Pernot S., Burgun A., Taieb J., Zaanan A., Zohar S., Jannot AS. Integration of elicited expert information via a power prior in Bayesian variable selection: application to colon cancer data. Stat Methods Med Res. 2019 Apr 9:962280219841082. doi: 10.1177/0962280219841082. PubMed PMID: 30963815.
  10. Lee, SM; Ursino, M; Cheung, YK; Zohar, S. Dose-finding designs for cumulative toxicities using multiple constraints. Biostatistics, 2019 vol. 20(1) pp. 17-29
  11. Neuraz A., Rance B., Garcelon N., Llanos L.C., Burgun A., Rosset S. The impact of specialized corpora for word embeddings in natural language understanding. Stud Health Technol Inform. 2020. Accepted
  12. Neuraz A, Looten V, Rance B, Daniel N, Garcelon N, Llanos LC, Burgun A, Rosset S. Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task? Stud Health Technol Inform. 2019 Aug 21;264:1558-1559. doi: 10.3233/SHTI190533. PubMed PMID: 31438230.
  13. Digan W, Wack M, Looten V, Neuraz A, Burgun A, Rance B. Evaluating the Impact of Text Duplications on a Corpus of More than 600,000 Clinical Narratives in a French Hospital. Stud Health Technol Inform. 2019 Aug 21;264:103-107. doi: 10.3233/SHTI190192. PubMed PMID: 31437894.
  14. Lai M.C., Brun M., Mamzer M.F. Perceptions of Artificial Intelligence in healthcare: findings from a qualitative survey study among actors in France. J Transl Med. 2020 in press

If you are interested in this program and wish to apply for a PhD or postdoctoral position, please contact the program leader:
Pr. Stéphane Oudard

Contact us

Centre Universitaire des Saints-Pères, Etage 4 – Pièce 446B, 45 rue des Saints-Pères, 75006 Paris

Carina Binet: General Secretary of CARPEM
Tel.: 01 76 53 43 85

Aurore Hattabi, PhD: Scientific Coordinator of CARPEM
Tel.: 01 76 53 43 85