CFP

autods2021: ECMLPKDD Workshop on Automating Data Science 2021

Virtual

Bilbao, Spain, September 17, 2021

Conference website	https://sites.google.com/view/autods
Submission link	https://easychair.org/conferences/?conf=autods2021

Topics: predictive modeling, meta learning, data integration, data wrangling

ECMLPKDD WORKSHOP ON AUTOMATING DATA SCIENCE (ADS2021)

Virtual, 17 September 2021 (date TBC)

Progress in data science automation can have important implications for the democratisation of data science and related disciplines such as machine learning and statistics. This is especially critical because the diversity of data and techniques in these areas is growing rapidly, and data scientists urgently need more powerful tools to support them throughout the data science process. There has been significant progress in the core modelling stages of the process, exemplified by the success of AutoML. Other stages, such as data understanding, data preparation, and deployment, still need fundamental research breakthroughs before automation can make a significant impact on the data science process as a whole. The workshop will cover all areas of data science automation. It will especially welcome research that focuses on steps before and after modelling, deals with “messy data”, or extends the AutoML paradigm beyond supervised tasks.

The program will consist of invited talks, a panel discussion, contributed talks and spotlights, and a poster session. As the meeting will be virtual, we will encourage interaction through Q&A after talks, in the panel discussion, and in the poster session. The workshop is the third in a series, following the Dagstuhl seminar (2018) and the ECML-PKDD workshop (2019).

Submission Guidelines

The workshop will welcome submissions in the following formats:

1. Extended abstracts reporting novel or preliminary ideas, at most 6* pages in LNCS format.
2. Short position statements on automating data science, at most 6* pages in LNCS format.
3. Presentations of relevant work that has recently been published in, or accepted for publication by, journals such as DMKD, MLJ, JMLR, AIJ, and JAIR, or major conferences such as SIGKDD, NeurIPS, ICML, and IJCAI. In this case, the submission should consist only of a copy of the paper.

(*) References, and any optional supplementary material placed after the references, do not count towards the page limit.

The program committee will review all submissions and will decide which accepted submissions will be presented orally, as spotlights, and/or as posters. Extended abstracts and position statements (submission types 1 and 2) are non-archival.

We have also discussed a special issue on Automating Data Science with the editor-in-chief of the Machine Learning journal. More information will follow soon.

List of Topics

This ECMLPKDD workshop aims to bring together researchers from all areas concerned with data science to study whether, to what extent, and how data science can be automated. It will focus on the following data science topics:

Automating data wrangling
Data integration via AI techniques (e.g., NLP)
Integrating data preparation into statistical learning
Handling missing and anomalous values semi-automatically
Using NLP to generate explanations and reports
Incorporating domain knowledge into the automation of data science
Semi-automating visualization
Semi-automated machine learning
Learning with non-normalized data
Impact of data science automation on the work of data scientists

Committees

Program Committee

Oana Balalau (INRIA)
Felix Biessmann (Beuth Hochschule fuer Technik, Berlin)
Pavel Brazdil (U Porto)
Marcos Bueno (Eindhoven University of Technology)
Remco Chang (Tufts)
Jesse Davis (KU Leuven)
Luc De Raedt (KU Leuven)
Cèsar Ferri (Universitat Politècnica de València)
Peter Flach (U Bristol)
Ernesto Jimenez-Ruiz (City University)
Jefrey Lijffijt (Ghent University)
Pierre-Alexandre Mattei (INRIA)
Marine Le Morvan (INRIA)
Tomas Petricek (U Kent)
Padhraic Smyth (UC Irvine)
Isabel Valera (Saarland University)
Gerrit van den Burg (Alan Turing Institute)

More TBC

Organizing committee

Tijl De Bie (UGent, Belgium)
Jose Hernandez-Orallo (Universitat Politecnica de Valencia, Spain)
Joaquin Vanschoren (Eindhoven University of Technology)
Gaël Varoquaux (INRIA)
Chris Williams (University of Edinburgh)

Programme

Detailed programme TBD

Panel discussion: Messy data: More wrangling and cleaning, or more flexible modelling techniques?

Some modelling techniques are very powerful but require highly curated data (no missing values, full numerization, scaling, outlier elimination, consistency, data enhancement, etc.), while others are more versatile: they can handle low-quality data and still produce reasonably good models. In some areas, such as NLP, certain architectures (e.g., transformers) can deal with noisy, unstructured data and still perform well, albeit with limited robustness. In areas of machine learning dealing with images, audio, tabular data, or multimodal data, what is the best trade-off for automation: more data wrangling tools or more flexible models? Does this trade-off depend on the desired quality of the models and the expertise of the data scientists? We will ask panelists and attendees to discuss the pros and cons of the two approaches, that is, emphasising the automation of data wrangling and cleaning, or the automation of more flexible modelling techniques.

List of panelists: Michael Betancourt (Stan developer), Zachary Lipton (CMU), more TBC.

Keynote speakers

Neil Lawrence (Cambridge)
Madeleine Udell (Cornell)
Other keynote speakers TBC

Dates

Submission deadline: Wed June 23, 2021
Acceptance notification: Fri July 23, 2021
Camera-ready deadline: TBD

Contact

All questions about submissions should be emailed to Marcos Bueno (workflow master).