autods2021: ECMLPKDD Workshop on Automating Data Science 2021 Virtual Bilbao, Spain, September 17, 2021
Conference website: https://sites.google.com/view/autods
Submission link: https://easychair.org/conferences/?conf=autods2021
ECMLPKDD WORKSHOP ON AUTOMATING DATA SCIENCE (ADS2021)
Virtual, 17 (TBC) September 2021
Progress in data science automation can have important implications for the democratisation of data science and related disciplines such as machine learning and statistics. This is especially critical as the diversity of data and techniques in these areas is accelerating, and data scientists are in urgent need of more powerful tools to help them in the data science process. While there has been significant progress in the core stages of the process, exemplified by the success of AutoML, other areas, such as data understanding, data preparation, and deployment, still need fundamental research breakthroughs to make a significant impact on data science automation overall. The workshop will cover all areas of data science automation, but will especially welcome research that focuses on steps before and after modelling, deals with “messy data”, or extends the AutoML paradigm beyond supervised tasks.
The program will consist of invited talks, a panel discussion, contributed talks and spotlights, and a poster session. As the meeting will be virtual, we will encourage interaction via Q&A after talks, in the panel discussion, and in the poster session. The workshop is the third in a series, following on from Dagstuhl (2018) and ECML-PKDD (2019).
Submission Guidelines
The workshop will welcome submissions in the following formats:
- Extended abstracts that report on novel and preliminary ideas. Extended abstracts can be at most 6* pages in LNCS format.
- Short position statements on automating data science, at most 6* pages in LNCS format.
- Presentations of relevant work that has recently been published or has already been accepted for publication in journals such as DMKD, MLJ, JMLR, AIJ, JAIR, and major conferences such as SIGKDD, NeurIPS, ICML, IJCAI, etc. The submission should in this case only consist of a copy of the paper.
(*) References and optional supplementary material following the references do not count towards the page limit.
The program committee will review all submissions and decide which accepted submissions will be presented orally, as spotlights, and/or as posters. Submissions of the first two types (extended abstracts and position statements) are intended as non-archival.
We have also discussed a special issue on the topic of Automating Data Science with the editor-in-chief of the Machine Learning Journal. More information will follow soon.
List of Topics
This ECMLPKDD workshop aims to bring together researchers from all areas concerned with data science in order to study whether, to what extent, and how data science can be automated. It will focus on the following Data Science topics:
- Automating data wrangling
- Data integration via AI techniques (e.g., NLP)
- Merging the preparation of data into the statistical learning
- Handling missing and anomalous values semi-automatically
- Using NLP for generating explanations and reports
- Incorporating domain knowledge into the automation of data science
- Semi-automating visualization
- Semi-automated machine learning
- Learning with non-normalized data
- Impact of data science automation on the work of data scientists
Committees
Program Committee
- Oana Balalau (INRIA)
- Felix Biessmann (Beuth Hochschule fuer Technik, Berlin)
- Pavel Brazdil (U Porto)
- Marcos Bueno (Eindhoven University of Technology)
- Remco Chang (Tufts)
- Jesse Davis (KU Leuven)
- Luc De Raedt (KU Leuven)
- Cèsar Ferri (Universitat Politècnica de València)
- Peter Flach (U Bristol)
- Ernesto Jimenez-Ruiz (City University)
- Jefrey Lijffijt (Ghent University)
- Pierre-Alexandre Mattei (INRIA)
- Marine le Morvan (INRIA)
- Tomas Petricek (U Kent)
- Padhraic Smyth (UC Irvine)
- Isabel Valera (Saarland University)
- Gerrit van den Burg (Alan Turing Institute)
More TBC
Organizing committee
- Tijl De Bie (UGent, Belgium)
- Jose Hernandez-Orallo (Universitat Politecnica de Valencia, Spain)
- Joaquin Vanschoren (Eindhoven University of Technology)
- Gaël Varoquaux (INRIA)
- Chris Williams (University of Edinburgh)
Programme
Detailed programme TBD
Panel discussion: Messy data: More wrangling and cleaning, or more flexible modelling techniques?
Some modelling techniques are very powerful but require highly curated data (no missing values, full numerization, scaling, outlier elimination, consistency, data enhancement, etc.), while others are more versatile: they can deal with low-quality data and still produce reasonably good models. In some areas, such as NLP, some architectures (e.g., transformers) are able to deal with data that is noisy and unstructured and still display good functionality (although with limited robustness). In areas of machine learning dealing with images, audio, tabular data or multimodal data, what is the best trade-off for automation: more data wrangling tools or more flexible models? Does this trade-off depend on the desired quality of the models and the expertise of the data scientists? We will ask panelists and attendees to discuss the pros and cons of the two suggested approaches (with emphasis on the automation of data wrangling and cleaning, or on the automation of more flexible modelling techniques).
List of panelists: Michael Betancourt (Stan developer), Zachary Lipton (CMU), more TBC.
Keynote speakers
- Neil Lawrence (Cambridge)
- Madeleine Udell (Cornell)
- Other keynote speakers TBC
Dates
- Submission deadline: Wed June 23, 2021
- Acceptance notification: Fri July 23, 2021
- Camera-ready deadline: TBD
Contact
All questions about submissions should be emailed to Marcos Bueno (workflow master).