Machine learning (ML) models are increasingly being employed to make highly consequential decisions pertaining to employment, bail, parole, and lending. While such models can learn from large amounts of data and are often very scalable, their applicability is limited by certain safety challenges. A key challenge is identifying and correcting systematic patterns of mistakes made by ML models before deploying them in the real world.
The goal of this workshop, held at the 2019 International Conference on Learning Representations (ICLR), is to bring together researchers and practitioners with different perspectives on debugging ML models.
University of Toronto
University of Pennsylvania
Johns Hopkins University
|9.50||Opening remarks [video]|
|Session Chair: Julius Adebayo (MIT)|
|10.00||Invited talk – Aleksander Madry (MIT): A New Perspective on Adversarial Perturbations [video]|
|10.30||Contributed talk (Best Research Paper Award) – Simon Kornblith (Google): Similarity of Neural Network Representations Revisited [video]|
|10.40||Contributed talk (Best Demo Award) – Besmira Nushi (Microsoft Research): Error terrain analysis for machine learning: Tool and visualizations [video]|
|Session Chair: Julius Adebayo (MIT)|
|11.10||Invited talk – Osbert Bastani (University of Pennsylvania): Verifiable Reinforcement Learning via Policy Extraction [video]|
|11.40||Contributed talk (Best Student Research Paper Award) – Daniel Kang (Stanford): Debugging Machine Learning via Model Assertions [video]|
|11.50||Contributed talk – Benjamin Link (Indeed): Improving Jobseeker-Employer Match Models at Indeed Through Process, Visualization, and Exploration [video]|
|Session Chair: Sarah Tan (Cornell University / UCSF)|
|12.10||Invited talk – Sameer Singh (UC Irvine): Discovering Natural Bugs Using Adversarial Data Perturbations [video]|
|12.40||Invited talk – Deborah Raji (University of Toronto): “Debugging” Discriminatory ML Systems [video]|
|1.00||Contributed talk (Best Applied Paper Award) – Tomer Arnon and Christopher Lazarus: NeuralVerification.jl: Algorithms for Verifying Deep Neural Networks [video]|
|Session Chair: D Sculley (Google)|
|3.20||Welcome back remarks|
|3.30||Invited talk – Suchi Saria (Johns Hopkins University): Safe and Reliable Machine Learning: Preventing and Identifying Failures [video]|
|4.00||Invited talk – Dan Moldovan (Google): Better Code for Less Debugging with AutoGraph [video]|
|4.20||Posters, demos, and coffee break|
|Session Chair: Himabindu Lakkaraju (Harvard University)|
|5.20||Contributed position paper – Michela Paganini (Facebook): The Scientific Method in the Science of Machine Learning [video]|
|5.30||Invited opinion piece – Cynthia Rudin (Duke University): Don’t debug your black box, replace it [video]|
|6.00||Q&A and panel with all invited speakers – “The Future of ML Debugging” [video]. Moderator: Himabindu Lakkaraju (Harvard University). Panelists: Aleksander Madry, Cynthia Rudin, Dan Moldovan, Deborah Raji, Osbert Bastani, Sameer Singh, Suchi Saria|
Accepted posters (call for submissions: deadline has passed)
- Discovery of Intersectional Bias in Machine Learning Using Automatic Subgroup Generation. Angel Cabrera, Minsuk Kahng, Fred Hohman, Jamie Morgenstern and Duen Horng Chau.
- Calibration of Encoder Decoder Models for Neural Machine Translation. Aviral Kumar and Sunita Sarawagi.
- Step-wise Sensitivity Analysis: Identifying Partially Distributed Representations for Interpretable Deep Learning. Botty Dimanov and Mateja Jamnik.
- Handling Bias in AI Using Simulation. Daniel McDuff, Roger Cheng and Ashish Kapoor.
- Inverting Layers of a Large Generator. David Bau, Jun-Yan Zhu, William Peebles, Hendrik Strobelt, Jonas Wulff, Bolei Zhou and Antonio Torralba.
- MAST: A Tool for Visualizing CNN Model Architecture Searches. Dylan Cashman, Adam Perer and Hendrik Strobelt.
- Visualizations of Decision Regions in the Presence of Adversarial Examples. Grzegorz Swirszcz, Brendan O’Donoghue and Pushmeet Kohli.
- BertViz: A Tool for Visualizing Multi-Head Self-Attention in the BERT Model. Jesse Vig.
- Where To Be Adversarial Perturbations Added? Investigating and Manipulating Pixel Robustness Using Input Gradients. Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim and Dongyoon Han.
- Dissecting Pruned Neural Networks. Jonathan Frankle and David Bau.
- Monitoring Opaque Learning Systems. Leilani Gilpin.
- Model Agnostic Globally Interpretable Explanations. Piyush Gupta, Nikaash Puri, Sukriti Verma, Pratiksha Agarwal and Balaji Krishnamurthy.
- Debugging Trained Machine Learning Models Using Flip Points. Roozbeh Yousefzadeh and Dianne O’Leary.
- Universal Multi-Party Poisoning Attacks. Saeed Mahloujifar, Ameer Mohammed and Mohammad Mahmoody.
- Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. Stephan Rabanser, Stephan Guennemann and Zachary Lipton.
- Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. Xiao Zhang, Saeed Mahloujifar, Mohammad Mahmoody and David Evans.
- Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded. Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Dhruv Batra and Devi Parikh.
- Similarity of Neural Network Representations Revisited. Simon Kornblith, Mohammad Norouzi, Honglak Lee and Geoffrey Hinton (Contributed talk).
- NeuralVerification.jl: Algorithms for Verifying Deep Neural Networks. Changliu Liu, Tomer Arnon, Christopher Lazarus and Mykel Kochenderfer (Contributed talk).
- Debugging Machine Learning via Model Assertions. Daniel Kang, Deepti Raghavan, Peter Bailis and Matei Zaharia (Contributed talk).
- The Scientific Method in the Science of Machine Learning. Jessica Zosa Forde and Michela Paganini (Contributed talk).
Accepted demos (call for submissions: deadline has passed)
- Operationalising Risk Management for Machine Learning: Building a Protocol-Driven System for Performance, Explainability, and Fairness. Imran Ahmed, Giles L. Colclough, Daniel First and QuantumBlack contributors.
- Building Models for Mobile Video Understanding. Franck Ngamkan and Geneviève Patterson.
- Debugging Large Scale Deep Recommender Systems using uncertainty estimations and attention. Inbar Naor, Ofer Alper, Dan Friedman and Gil Chamiel.
- Adversarial Examples for Electrocardiograms. Xintian Han, Yuxuan Hu, Luca Foschini, Lior Jankelson and Rajesh Ranganath.
- Debuggable Machine Learning with ConX and Comet.ml. Cecelia Shao and Douglas Blank.
- Evidence Based Debugging with DRL-Monitor. Giang Dao and Minwoo Lee.
- Black Box Attacks on Transformer Language Models. Vedant Misra.
- Improving Jobseeker-Employer Match Models at Indeed Through Process, Visualization, and Exploration. Benjamin Link, Eric Lawrence, Rosemarie Scott, Aaron Pigeon and Jon Witte (Contributed talk).
- MODHILL: A framework for debugging gait in multi-factor authentication systems. Vinay Prabhu, John Whaley and Mihail D.
- Who learns? A microscope into neural network training by measuring per-parameter learning. Janice Lan, Rosanne Liu, Hattie Zhou and Jason Yosinski.
- TensorWatch: A Multifaceted System for the Deep Learning Debugging and Visualization. Shital Shah, Roland Fernandez and Steven Drucker.
- Error terrain analysis for machine learning: Tool and visualizations. Rick Barraza, Russell Eames, Yan Esteve Balducci, Josh Hinds, Scott Hoogerwerf, Eric Horvitz, Ece Kamar, Jacquelyn Krones, Josh Lovejoy, Parham Mohadjer, Ben Noah and Besmira Nushi (Contributed talk).
Debugging via interpretability: How can interpretable models and techniques aid us in effectively debugging ML models?
Program verification as a tool for model debugging: Are existing program verification frameworks readily applicable to ML models? If not, what are the gaps that exist and how do we bridge them?
Visualization tools for debugging ML models: What kind of visualization techniques would be most effective in exposing vulnerabilities of ML models?
Human-in-the-loop techniques for model debugging: What are some of the effective strategies for using human input and expertise for debugging ML models?
Novel adversarial attacks for highlighting errors in model behavior: How do we design adversarial attacks that highlight vulnerabilities in the functionality of ML models?
Theoretical correctness of model debugging techniques: How do we provide guarantees on the correctness of proposed debugging approaches? Can we take cues from statistical considerations such as multiple testing and uncertainty to ensure that debugging methodologies and tools actually detect ‘true’ errors?
Theoretical guarantees on the robustness of ML models: Given an ML model or system, how do we bound the probability of its failures?
Insights into errors or biases of real-world ML systems: What can we learn from the failures of widely deployed ML systems? What can we say about debugging for different types of biases, including discrimination?
Best practices for debugging large-scale ML systems: What are standardized best practices for debugging large-scale ML systems? What are existing tools, software, and hardware, and how might they be improved?
Domain-specific nuances of debugging ML models in healthcare, criminal justice, public policy, education, and other social good applications.
See a list of references.
Cornell University / UCSF
Open Philanthropy Project / OpenAI
Email firstname.lastname@example.org with any questions.
|Samira Abnar (University of Amsterdam)||Lezhi Li (Uber)|
|David Alvarez Melis (MIT)||Anqi Liu (Caltech)|
|Forough Arabshahi (Carnegie Mellon University)||Yin Lou (Ant Financial)|
|Kamyar Azizzadenesheli (UC Irvine)||David Madras (University of Toronto / Vector Institute)|
|Gagan Bansal (University of Washington)||Sara Magliacane (IBM Research)|
|Osbert Bastani (University of Pennsylvania)||Momin Malik (Berkman Klein Center)|
|Joost Bastings (University of Amsterdam)||Matthew McDermott (MIT)|
|Andrew Beam (Harvard University)||Smitha Milli (UC Berkeley)|
|Kush Bhatia (UC Berkeley)||Shira Mitchell|
|Umang Bhatt (Carnegie Mellon University)||Tristan Naumann (Microsoft Research)|
|Cristian Canton (Facebook)||Besmira Nushi (Microsoft Research)|
|Arthur Choi (UCLA)||Saswat Padhi (UCLA)|
|Grzegorz Chrupala (Tilburg University)||Emma Pierson (Stanford University)|
|Sam Corbett-Davies (Facebook)||Forough Poursabzi-Sangdeh (Microsoft Research)|
|Amit Dhurandhar (IBM Research)||Manish Raghavan (Cornell University)|
|Samuel Finlayson (Harvard Medical School, MIT)||Ramya Ramakrishnan (MIT)|
|Tian Gao (IBM Research)||Alexander Ratner (Stanford University)|
|Efstathios Gennatas (UCSF)||Andrew Ross (Harvard University)|
|Siongthye Goh (Singapore Management University)||Shibani Santurkar (MIT)|
|Albert Gordo (Facebook)||Prasanna Sattigeri (IBM Research)|
|Ben Green (Harvard University)||Peter Schulam (Johns Hopkins University)|
|Jayesh Gupta (Stanford University)||Ravi Shroff (NYU)|
|Satoshi Hara (Osaka University)||Camelia Simoiu (Stanford University)|
|Tatsunori Hashimoto (MIT)||Sameer Singh (UC Irvine)|
|He He (NYU)||Alison Smith-Renner (University of Maryland)|
|Fred Hohman (Georgia Institute of Technology)||Jina Suh (Microsoft Research)|
|Lily Hu (Harvard University)||Adith Swaminathan (Microsoft Research)|
|Xiaowei Huang (University of Liverpool)||Michael Tsang (University of Southern California)|
|Yannet Interian (University of San Francisco)||Dimitris Tsipras (MIT)|
|Saumya Jetley (University of Oxford)||Berk Ustun (Harvard University)|
|Shalmali Joshi (Vector Institute)||Gilmer Valdes (UCSF)|
|Yannis Kalantidis (Facebook)||Paroma Varma (Stanford University)|
|Ece Kamar (Microsoft Research)||Kush Varshney (IBM Research)|
|Madian Khabsa (Facebook)||Fulton Wang (Sandia National Labs)|
|Heidy Khlaaf (Adelard)||Yang Wang (Uber)|
|Pang Wei Koh (Stanford University)||Fanny Yang (ETH Zurich)|
|Josua Krause (Accern)||Jason Yosinski (Uber)|
|Ram Kumar (Microsoft / Berkman Klein Center)||Muhammad Bilal Zafar (Bosch Center for Artificial Intelligence)|
|Isaac Lage (Harvard University)||Xuezhou Zhang (University of Wisconsin-Madison)|
|Finnian Lattimore (Australian National University)||Xin Zhang (MIT)|
|Marco Tulio Ribeiro (Microsoft Research)|