CFP

SafeML2019: Safe Machine Learning: Specification, Robustness, and Assurance

ICLR 2019, Room R6, Ernest N. Morial Convention Center.

New Orleans, LA, United States, May 6, 2019

Conference website: https://sites.google.com/view/safeml-iclr2019
Submission link: https://easychair.org/conferences/?conf=safeml2019
Submission deadline: February 22, 2019

Topics: machine learning, fairness, alignment, privacy

The ultimate goal of ML research should be to have a positive impact on society and the world. As the number of applications of ML increases, it becomes more important to address a variety of safety issues, both those that already arise with today's ML systems and those that may be exacerbated in the future with more advanced systems.

Current ML algorithms tend to be brittle and opaque, reflect undesired bias in the data, and often optimize for objectives that are misaligned with human preferences. We can expect many of these issues to get worse as our systems become more advanced (e.g. by finding more clever ways to optimize for a misspecified objective). This workshop aims to bring together researchers in diverse areas such as reinforcement learning, formal verification, value alignment, fairness, and security to further the field of safety in machine learning.

We will focus on three broad categories of ML safety problems: specification, robustness and assurance. Specification is defining the purpose of the system. Robustness is designing the system to withstand perturbations. Assurance is monitoring, understanding and controlling system activity before and during its operation.

Topics

We encourage all researchers to submit work that falls into one or more of the areas of the workshop: specification, robustness and/or assurance. Some example research topics within each area are:

Specification
- Reward hacking: Systems may behave in ways unintended by the designers because of discrepancies between the specified reward and the true intended reward. How can we design systems that don’t exploit these misspecifications, or figure out where the misspecifications are? (See http://tinyurl.com/specification-gaming for over 40 examples of specification gaming by AI systems.)
- Side effects: How can we give artificial agents an incentive to avoid unnecessary disruptions to their environment while pursuing the given objective? Can we do this in a way that generalizes across environments and tasks and does not introduce bad incentives for the agent in the process?
- Fairness: ML is increasingly used in core societal domains such as health care, hiring, lending, and criminal risk assessment. How can we make sure that historical prejudices, cultural stereotypes, and existing demographic inequalities contained in the data, as well as sampling bias and collection issues, are not reflected in the systems?
Robustness
- Adaptation: How can machine learning systems detect and adapt to changes in their environment (e.g. low overlap between train and test distributions, poor initial model assumptions, or shifts in the underlying prediction function)? How should an autonomous agent act when confronting radically new contexts, or identify that the context is new in the first place?
- Verification: How can we scalably verify meaningful properties of ML systems? What role can and should verification play in ensuring robustness of ML systems?
- Worst-case robustness: How can we train systems which never perform extremely poorly, even in the worst case? Given a trained system, can we ensure it never fails catastrophically, or bound this probability?
- Safe exploration: Can we design reinforcement learning algorithms which never fail catastrophically, even at training time?
Assurance
- Interpretability: How can we robustly determine whether a system is working as intended (i.e. is well specified and robust) before large-scale deployment, even when we do not have a formal specification of what it should do?
- Monitoring: How can we monitor large-scale systems to identify whether they are performing well? What tools can help diagnose and fix the found issues?
- Privacy: How can we ensure that the trained systems do not reveal sensitive information about individuals contained in the training set?
- Interruptibility: An artificial agent may learn to avoid interruptions by the human supervisor if such interruptions lead to receiving less reward. How can we ensure the system behaves safely even under the possibility of shutdown?

Submission Guidelines

Submissions should be extended abstracts of up to 4 pages, submitted as PDF files in ICLR format (use the relevant LaTeX style files). The references can take as many pages as necessary and do not count towards the 4-page limit. Submissions may be longer than 4 pages or include supplementary material, but reviewers aren't required to read past 4 pages.

The reviewing process is not double-blind, so submissions should contain author information and not be anonymised. If the authors' work has already been published in a journal, conference or workshop, their submission should meaningfully extend their previous work. However, parallel submission (to another conference or workshop) is allowed.

If your paper is accepted, you will be invited to present a poster at the workshop. Some of the accepted contributions will also be invited to give a talk. Accepted submissions will be shown on the workshop website, but there will be no formal published proceedings.

Posters should be in portrait orientation and size A0 (84 x 119 cm, or approximately 33 x 47 inches).

Important dates

Submission deadline (extended): Friday February 22nd, midnight Anywhere on Earth (AoE).
Acceptance notification: Friday March 22nd.
Workshop: Monday May 6th, 09:00–18:00 New Orleans time, in Room R6.

Invited speakers and panelists

Ian Goodfellow (Google Brain, adversarial examples and alignment)
Dylan Hadfield-Menell (UC Berkeley, reward design)
Catherine Olsson (Google Brain, security in machine learning)

Organizing committee

Silvia Chiappa (DeepMind)
Victoria Krakovna (DeepMind)
Adrià Garriga-Alonso (University of Cambridge)
Andrew Trask (University of Oxford)
Jonathan Uesato (DeepMind)
Christina Heinze-Deml (ETH Zürich)
Ray Jiang (DeepMind)
Adrian Weller (University of Cambridge, Alan Turing Institute)

Contact

All questions about submissions should be emailed to safe.ml.iclr2019@gmail.com