Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have devised a way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions. New training technique would reveal the basis for machine-learning systems’ decisions.
In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for
patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in
digital images or to infer the topics of texts.
But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.
At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.
“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”
“It’s not only the medical domain,” adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei’s thesis advisor. “It’s in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it.”
“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”
A team, including researchers from MIT’s Computer Science and Artificial Intelligence Laboratory, has created a new set of algorithms that can efficiently fit probability distributions to high-dimensional data. New model-fitting technique is efficient even for data sets with hundreds of variables.
Data analysis — and particularly big-data analysis — is often a matter of fitting data to some sort of mathematical model. The most familiar
example of this might be linear regression, which finds a line that approximates a distribution of data points. But fitting data to probability distributions,
such as the familiar bell curve, is just as common.
If, however, a data set has just a few corrupted entries — say, outlandishly improbable measurements — standard data-fitting techniques can break down. This problem becomes much more acute with high-dimensional data, or data with many variables, which is ubiquitous in the digital age.
Since the early 1960s, it’s been known that there are algorithms for weeding corruptions out of high-dimensional data, but none of the algorithms proposed in the past 50 years are practical when the variable count gets above, say, 12. That’s about to change. Earlier this month, at the IEEE Symposium on Foundations of Computer Science, a team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory, the University of Southern California, and the University of California at San Diego presented a new set of algorithms that can efficiently fit probability distributions to high-dimensional data.
Remarkably, at the same conference, researchers from Georgia Tech presented a very similar algorithm.The pioneering work on “robust statistics,” or statistical methods that can tolerate corrupted data, was done by statisticians, but both new papers come from groups of computer scientists. That probably reflects a shift of attention within the field, toward the computational efficiency of model-fitting techniques.“From the vantage point of theoretical computer science, it’s much more apparent how rare it is for a problem to be efficiently solvable,” says Ankur Moitra, the Rockwell International Career Development Assistant Professor of Mathematics at MIT and one of the leaders of the MIT-USC-UCSD project. “If you start off with some hypothetical thing — ‘Man, I wish I could do this. If I could, it would be robust’ — you’re going to have a bad time, because it will be inefficient. You should start off with the things that you know that you can efficiently do, and figure out how to piece them together to get robustness.”
“The goal of all this is to present the interesting stuff to the data scientists so that they can more quickly address all these new data sets that are coming in,” says Max Kanter MEng ’15. With new algorithms, data scientists could accomplish in days what has traditionally taken months.
Last year, MIT researchers presented a system that automated a crucial step in big-data analysis: the selection of a “feature set,” or aspects
of the data that are useful for making predictions. The researchers entered the system in several data science contests, where it outperformed most of the human
competitors and took only hours instead of months to perform its analyses.
This week, in a pair of papers at the IEEE International Conference on Data Science and Advanced Analytics, the team described an approach to automating most of the rest of the process of big-data analysis — the preparation of the data for analysis and even the specification of problems that the analysis might be able to solve.
The researchers believe that, again, their systems could perform in days tasks that used to take data scientists months.“The goal of all this is to present the interesting stuff to the data scientists so that they can more quickly address all these new data sets that are coming in,” says Max Kanter MEng ’15, who is first author on last year’s paper and one of this year’s papers. “[Data scientists want to know], ‘Why don’t you show me the top 10 things that I can do the best, and then I’ll dig down into those?’ So [these methods are] shrinking the time between getting a data set and actually producing value out of it.”
Both papers focus on time-varying data, which reflects observations made over time, and they assume that the goal of analysis is to produce a probabilistic model that will predict future events on the basis of current observations.