Machine learning (ML) models are increasingly being employed to make highly consequential decisions pertaining to employment, bail, parole, and lending. While such models can learn from large amounts of data and are often very scalable, their applicability is limited by certain safety challenges. A key challenge is identifying and correcting systematic patterns of mistakes made by ML models before deploying them in the real world.
The goal of this workshop, held at the 2019 International Conference on Learning Representations (ICLR), is to bring together researchers and practitioners with different perspectives on debugging ML models.
- Aleksander Madry (MIT)
- Cynthia Rudin (Duke University)
- Dan Moldovan (Google – TensorFlow AutoGraph project)
- Deborah Raji (University of Toronto)
- Osbert Bastani (University of Pennsylvania)
- Sameer Singh (University of California, Irvine)
- Suchi Saria (Johns Hopkins University)
|9:50 - 10:00||Opening Remarks|
|10:00 - 10:30||Invited Talk: Aleksander Madry (MIT).|
|10:30 - 10:40||Contributed Talk: Similarity of Neural Network Representations Revisited. Simon Kornblith, Mohammad Norouzi, Honglak Lee and Geoffrey Hinton (Google).|
|10:40 - 10:50||Contributed Talk: Error terrain analysis for machine learning: Tool and visualizations. Rick Barraza, Russell Eames, Yan Esteve Balducci, Josh Hinds, Scott Hoogerwerf, Eric Horvitz, Ece Kamar, Jacquelyn Krones, Josh Lovejoy, Parham Mohadjer, Ben Noah and Besmira Nushi (Microsoft).|
|10:50 - 11:10||Coffee Break|
|11:10 - 11:40||Invited Talk: Osbert Bastani (University of Pennsylvania).|
|11:40 - 11:50||Contributed Talk: Debugging Machine Learning via Model Assertions. Daniel Kang, Deepti Raghavan, Peter Bailis and Matei Zaharia (Stanford).|
|11:50 - 12:00||Contributed Talk: Improving jobseeker-employer match models at Indeed through process, visualization, and exploration. Benjamin Link, Eric Lawrence and Rosemarie Scott (Indeed).|
|12:00 - 12:10||Break|
|12:10 - 12:40||Invited Talk: Sameer Singh (University of California, Irvine).|
|12:40 - 1:00||Invited Talk: Deborah Raji (University of Toronto).|
|1:00 - 1:10||Contributed Talk: NeuralVerification.jl: Algorithms for Verifying Deep Neural Networks. Changliu Liu (CMU), Tomer Arnon, Christopher Lazarus and Mykel Kochenderfer (Stanford).|
|1:10 - 2:20||Lunch|
|2:30 - 3:20||Break|
|3:20 - 3:30||Welcome back remarks|
|3:30 - 4:00||Invited Talk: Suchi Saria (Johns Hopkins University).|
|4:00 - 4:20||Invited Talk: Dan Moldovan (Google).|
|4:20 - 5:20||Posters & Demos & Coffee Break|
|5:20 - 5:30||Contributed Talk: The Scientific Method in the Science of Machine Learning. Jessica Zosa Forde (Project Jupyter) and Michela Paganini (Facebook).|
|5:30 - 6:00||Invited Talk: Cynthia Rudin (Duke University).|
|6:00 - 6:25||Q&A/Panel with all invited speakers: “The Future of ML Debugging.” Moderator: Rich Caruana (Microsoft Research). Panelists: Aleksander Madry, Cynthia Rudin, Dan Moldovan, Deborah Raji, Osbert Bastani, Sameer Singh, Suchi Saria.|
|6:25 - 6:30||Closing Remarks.|
Contributed Posters (Research Track)
Call for submissions (deadline has passed)
- Discovery of Intersectional Bias in Machine Learning Using Automatic Subgroup Generation. Angel Cabrera, Minsuk Kahng, Fred Hohman, Jamie Morgenstern and Duen Horng Chau
- Calibration of Encoder Decoder Models for Neural Machine Translation. Aviral Kumar and Sunita Sarawagi.
- Step-wise Sensitivity Analysis: Identifying Partially Distributed Representations for Interpretable Deep Learning. Botty Dimanov and Mateja Jamnik
- Handling Bias in AI Using Simulation. Daniel McDuff, Roger Cheng and Ashish Kapoor
- Inverting Layers of a Large Generator. David Bau, Jun-Yan Zhu, William Peebles, Hendrik Strobelt, Jonas Wulff, Bolei Zhou and Antonio Torralba
- MAST: A Tool for Visualizing CNN Model Architecture Searches. Dylan Cashman, Adam Perer and Hendrik Strobelt.
- Visualizations of Decision Regions in the Presence of Adversarial Examples. Grzegorz Swirszcz, Brendan O’Donoghue and Pushmeet Kohli.
- BertViz: A Tool for Visualizing Multi-Head Self-Attention in the BERT Model. Jesse Vig.
- Where To Be Adversarial Perturbations Added? Investigating and Manipulating Pixel Robustness Using Input Gradients. Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim and Dongyoon Han.
- Dissecting Pruned Neural Networks. Jonathan Frankle and David Bau.
- Monitoring Opaque Learning Systems. Leilani Gilpin.
- Detecting Deep Neural Network Data Corruption With Interpretability Methods. Maithra Raghu, Samy Bengio and Chris Olah.
- A Gray Box Interpretable Visual Debugging Approach for Deep Sequence Learning Model. Md. Mofijul Islam, Amar Debnath, Tahsin Al Sayeed, Jyotirmay Nag Setu, Md Sadman Sakib, Md Abdur Razzaque, Md. Mosaddek Khan, Swakkhar Shatabda, Anik Islam and Md Mahmudur Rahman.
- Model Agnostic Globally Interpretable Explanations. Piyush Gupta, Nikaash Puri, Sukriti Verma, Pratiksha Agarwal and Balaji Krishnamurthy.
- Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded. Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Dhruv Batra and Devi Parikh.
- Debugging Trained Machine Learning Models Using Flip Points. Roozbeh Yousefzadeh and Dianne O’Leary.
- Universal Multi-Party Poisoning Attacks. Saeed Mahloujifar, Ameer Mohammed and Mohammad Mahmoody.
- Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. Stephan Rabanser, Stephan Guennemann and Zachary Lipton.
- Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. Xiao Zhang, Saeed Mahloujifar, Mohammad Mahmoody and David Evans.
- Similarity of Neural Network Representations Revisited. Simon Kornblith*, Mohammad Norouzi, Honglak Lee and Geoffrey Hinton (Contributed talk).
- NeuralVerification.jl: Algorithms for Verifying Deep Neural Networks. Changliu Liu, Tomer Arnon, Christopher Lazarus and Mykel Kochenderfer (Contributed talk).
- Debugging Machine Learning via Model Assertions. Daniel Kang*, Deepti Raghavan, Peter Bailis and Matei Zaharia (Contributed talk).
- The Scientific Method in the Science of Machine Learning. Jessica Zosa Forde and Michela Paganini* (Contributed talk).
Contributed Demos (Debugging-in-Practice Track)
Call for submissions (deadline has passed)
- Who learns? A microscope into neural network training by measuring per-parameter learning. Janice Lan, Rosanne Liu, Hattie Zhou and Jason Yosinski.
- Operationalising Risk Management for Machine Learning. Daniel First (Quantum Black).
- TensorWatch: A Multifaceted System for the Deep Learning Debugging and Visualization. Shital Shah, Roland Fernandez and Steven Drucker.
- Building Models for Mobile Video Understanding. Franck Ngamkan and Geneviève Patterson.
- Debugging Large Scale Deep Recommender Systems using uncertainty estimations and attention. Inbar Naor, Ofer Alper, Dan Friedman and Gil Chamiel
- Adversarial Examples for Electrocardiograms. Xintian Han, Yuxuan Hu, Luca Foschini, Lior Jankelson and Rajesh Ranganath.
- Black Box Attacks with Shadow Transformers. Vedant Misra
- Debuggable Machine Learning with ConX and Comet.ml. Cecelia Shao and Douglas Blank.
- Evidence Based Debugging with DRL-Monitor. Giang Dao and Minwoo Lee.
- MODHILL: A framework for debugging gait in multi-factor authentication systems. Vinay Prabhu, John Whaley and Mihail D.
- Error terrain analysis for machine learning: Tool and visualizations. Rick Barraza, Russell Eames, Yan Esteve Balducci, Josh Hinds, Scott Hoogerwerf, Eric Horvitz, Ece Kamar, Jacquelyn Krones, Josh Lovejoy, Parham Mohadjer, Ben Noah and Besmira Nushi* (Contributed talk).
- Improving jobseeker-employer match models at Indeed through process, visualization, and exploration. Benjamin Link*, Eric Lawrence and Rosemarie Scott (Contributed talk).
Debugging via interpretability: How can interpretable models and techniques aid us in effectively debugging ML models?
Program verification as a tool for model debugging: Are existing program verification frameworks readily applicable to ML models? If not, what are the gaps that exist and how do we bridge them?
Visualization tools for debugging ML models: What kind of visualization techniques would be most effective in exposing vulnerabilities of ML models?
Human-in-the-loop techniques for model debugging: What are some of the effective strategies for using human input and expertise for debugging ML models?
Novel adversarial attacks for highlighting errors in model behavior: How do we design adversarial attacks that highlight vulnerabilities in the functionality of ML models?
Theoretical correctness of model debugging techniques: How do we provide guarantees on the correctness of proposed debugging approaches? Can we take cues from statistical considerations such as multiple testing and uncertainty to ensure that debugging methodologies and tools actually detect ‘true’ errors?
Theoretical guarantees on the robustness of ML models: Given an ML model or system, how do we bound the probability of its failures?
Insights into errors or biases of real-world ML systems: What can we learn from the failures of widely deployed ML systems? What can we say about debugging for different types of biases, including discrimination?
Best practices for debugging large-scale ML systems: What are standardized best practices for debugging large-scale ML systems? What are existing tools, software, and hardware, and how might they be improved?
Domain-specific nuances of debugging ML models in healthcare, criminal justice, public policy, education, and other social good applications.
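Several of these topics can be made concrete with a small check in the spirit of assertion-based debugging (see the contributed talk "Debugging Machine Learning via Model Assertions"). The sketch below is purely illustrative: the consistency rule, the `max_flips` threshold, and the per-frame labels are hypothetical, not taken from any of the accepted papers.

```python
# Illustrative "model assertion": flag a systematic error pattern in which
# a detector's label for the same tracked object oscillates between frames.
# The rule, threshold, and data here are hypothetical examples.

def assert_temporal_consistency(predictions, max_flips=1):
    """Return True if consecutive predictions flip labels at most max_flips times.

    For video-style data, a model's label for one object should rarely
    oscillate between adjacent frames; frequent flips suggest a systematic
    error worth surfacing for debugging rather than a one-off mistake.
    """
    flips = sum(1 for a, b in zip(predictions, predictions[1:]) if a != b)
    return flips <= max_flips

# Hypothetical per-frame labels from a detector tracking a single object.
stable = ["car", "car", "car", "car"]
unstable = ["car", "truck", "car", "truck"]

print(assert_temporal_consistency(stable))    # stable sequence passes
print(assert_temporal_consistency(unstable))  # oscillating sequence is flagged
```

A failing assertion does not prove the model is wrong on any single frame; it marks a span of predictions for human review or targeted retraining.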
- Himabindu Lakkaraju (Harvard University)
- Sarah Tan (Cornell University and UCSF)
- Julius Adebayo (MIT)
- Jacob Steinhardt (Open Philanthropy Project and OpenAI)
- D. Sculley (Google)
- Rich Caruana (Microsoft Research)
Email email@example.com with any questions.
|Samira Abnar (University of Amsterdam)||Lezhi Li (Uber)|
|David Alvarez Melis (MIT)||Anqi Liu (Caltech)|
|Forough Arabshahi (Carnegie Mellon University)||Yin Lou (Ant Financial)|
|Kamyar Azzizzadenesheli (UC Irvine)||David Madras (University of Toronto / Vector Institute)|
|Gagan Bansal (University of Washington)||Sara Magliacane (IBM Research)|
|Osbert Bastani (University of Pennsylvania)||Momin Malik (Berkman Klein Center)|
|Joost Bastings (University of Amsterdam)||Matthew Mcdermott (MIT)|
|Andrew Beam (Harvard University)||Smitha Milli (UC Berkeley)|
|Kush Bhatia (UC Berkeley)||Shira Mitchell|
|Umang Bhatt (Carnegie Mellon University)||Tristan Naumann (Microsoft Research)|
|Cristian Canton (Facebook)||Besmira Nushi (Microsoft Research)|
|Arthur Choi (UCLA)||Saswat Padhi (UCLA)|
|Grzegorz Chrupala (Tilburg University)||Emma Pierson (Stanford University)|
|Sam Corbett-Davies (Facebook)||Forough Poursabzi-Sangdeh (Microsoft Research)|
|Amit Dhurandhar (IBM Research)||Manish Raghavan (Cornell University)|
|Samuel Finlayson (Harvard Medical School, MIT)||Ramya Ramakrishnan (MIT)|
|Tian Gao (IBM Research)||Alexander Ratner (Stanford University)|
|Efstathios Gennatas (UCSF)||Andrew Ross (Harvard University)|
|Siongthye Goh (Singapore Management University)||Shibani Santurkar (MIT)|
|Albert Gordo (Facebook)||Prasanna Sattigeri (IBM Research)|
|Ben Green (Harvard University)||Peter Schulam (Johns Hopkins University)|
|Jayesh Gupta (Stanford University)||Ravi Shroff (NYU)|
|Satoshi Hara (Osaka University)||Camelia Simoiu (Stanford University)|
|Tatsunori Hashimoto (MIT)||Sameer Singh (UC Irvine)|
|He He (NYU)||Alison Smith-Renner (University of Maryland)|
|Fred Hohman (Georgia Institute of Technology)||Jina Suh (Microsoft Research)|
|Lily Hu (Harvard University)||Adith Swaminathan (Microsoft Research)|
|Xiaowei Huang (University of Liverpool)||Michael Tsang (University of Southern California)|
|Yannet Interian (University of San Francisco)||Dimitris Tsipras (MIT)|
|Saumya Jetley (University of Oxford)||Berk Ustun (Harvard University)|
|Shalmali Joshi (Vector Institute)||Gilmer Valdes (UCSF)|
|Yannis Kalantidis (Facebook)||Paroma Varma (Stanford University)|
|Ece Kamar (Microsoft Research)||Kush Varshney (IBM Research)|
|Madian Khabsa (Facebook)||Fulton Wang (Sandia National Labs)|
|Heidy Khlaaf (Adelard)||Yang Wang (Uber)|
|Pang Wei Koh (Stanford University)||Fanny Yang (ETH Zurich)|
|Josua Krause (Accern)||Jason Yosinski (Uber)|
|Ram Kumar (Microsoft / Berkman Klein Center)||Muhammad Bilal Zafar (Bosch Center for Artificial Intelligence)|
|Isaac Lage (Harvard University)||Xuezhou Zhang (University of Wisconsin-Madison)|
|Finnian Lattimore (Australian National University)||Xin Zhang (MIT)|
|Marco Tulio Ribeiro (Microsoft Research)|