- 19 Jan 2018
- Course Interview
What's so special about Fundamentals of Machine Learning?
Artemy Kolchinsky and Brendan Tracey are research partners working in scientific applied Machine Learning.
It's an area that's generated a lot of press and excitement, and in advance of releasing their upcoming tutorial, Fundamentals of Machine Learning, we sat down with them to find out why they love it, why everyone else loves it, and whether all this excitement is deserved.
So, why make a tutorial on machine learning in the first place?
AK: This tutorial started as a half day in-person workshop at SFI for a non-scientific audience. It seems like we got a good response; machine learning is in the news so much recently and people really want to know more about it. We hear about these breakthroughs like AlphaGo Zero, and we see machine learning in a lot of consumer technology - in smartphones and internet companies. People have a desire to understand what machine learning is, why it's becoming so prevalent and where it's going, and we wanted to provide something to address that desire.
BT: It's hard to tell which parts of the popular discourse on machine learning are scientific advances, which are journalistic hype and which are things that will change the world. So we're hoping to give people a nice base to work from with this tutorial. If you understand a couple of core concepts and know some of the buzzwords, machine learning becomes much easier to understand and talk about, both on the level of using it in problem solving and just interacting with all the news.
Why is machine learning getting so popular? Is it as useful as the hype makes it out to be?
BT: Machine learning works. Five years ago smartphones didn't have speech recognition, or a search in a maps app might return the wrong thing. But these days, even searching a misspoken or misspelled entry will return the value that you want. If you talk to Siri now it'll really understand what you're saying. The advances have been so great in speech and facial recognition, which were at one point really hard for computers to handle. Seeing this success, people are asking, 'What else can we solve with machine learning?'
AK: Machine learning has made a lot of progress in automating tasks that people have wanted to engineer that we thought would be very difficult for machines, and maybe even almost impossible.
I also think that we tend to see successes in new technology, so we're maybe blinded to areas where machine learning is not having as much success, and in some ways the technology drives what people invest in and build. So now machine learning is really good at certain kinds of image recognition, and there are a lot of things that are being built that utilize that capability. There are other tasks and other domains in technology and science where machine learning isn't doing very well, especially where there's not a lot of data.
What kind of tasks are still hard for machine learning to solve?
AK: Even though machine learning has gotten better with text-related tasks like summarizing or translating and even writing captions for photos, it still seems to lack a higher-level understanding of longer passages of text. We're still pretty far away from having machine learning that can understand and write longer pieces of text that have an underlying idea that's larger than a couple of sentences. This deeper-level understanding of patterns is something it has trouble with.
Machine learning also has a hard time doing something that kids do: seeing one or two examples of something, and immediately understanding some category or reasoning about those things. There's a lot of work being done and not a lot of success in common sense reasoning and generalizing in a conceptual sense, and other things like that.
What was your first major research project using machine learning?
AK: Actually, my very first project in grad school was a machine learning-related project. I was creating text-mining software for analyzing research in biomedical journals. There are so many articles in the field that it's almost impossible for a single person to scan them all, so one of my first projects was working on a system to detect whether an article was discussing drug-drug interactions. This seems sort of easy, but when you look at all the articles in the field they all of course contain the words 'drugs', 'drug studies' and 'interactions', so you can't just text mine for the subject. That was my first project, and it was when I learned about classifiers and so on.
BT: I'm an aerospace engineer, so my focus is less in 'traditional' machine learning and more in machine learning applied to engineering problems. There are a lot of similarities between the two, but there are key differences. Say you want to run a wind tunnel test. It's expensive to get data points, and while Google has petabytes of data for their programs, in a wind tunnel test you get maybe 50 points. I've done a lot of small data machine learning, as opposed to big data machine learning. My first project actually involved the Stacked Monte Carlo methodology I discuss in the tutorial, in order to try to get more out of the data sources that we have. Even if we ran the simulation and we only have some data points, can we use machine learning to extract even more from our data, where some of the more classic statistical methods might miss something?
It's pretty successful; in our Stacked Monte Carlo research we were able to find much smaller estimation errors, and our answers were more accurate using an ML procedure than a normal averaging process.
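For readers who want to see the flavor of this in code, here is a minimal sketch of how a fitted model can beat plain averaging when data points are scarce. It is not the Stacked Monte Carlo procedure from the tutorial; it is a simplified, cross-validated correction of the same general kind. The test function exp(x), the uniform sampling on [0, 1], the straight-line surrogate and every function name are assumptions made for illustration.

```python
import math
import random


def plain_average(ys):
    """Ordinary Monte Carlo estimate: the mean of the sampled values."""
    return sum(ys) / len(ys)


def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def surrogate_corrected_average(xs, ys, n_folds=5):
    """Fit a line on most of the data, then correct it with held-out residuals.

    Assumes the x values were drawn uniformly from [0, 1], so the exact mean
    of the fitted line is intercept + slope / 2.
    """
    n = len(xs)
    estimates = []
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        line_mean = intercept + slope * 0.5
        # Held-out points were not used in the fit, so their average residual
        # corrects whatever the line gets wrong.
        residual_mean = sum(
            ys[i] - (slope * xs[i] + intercept) for i in held_out
        ) / len(held_out)
        estimates.append(line_mean + residual_mean)
    return sum(estimates) / n_folds


def rms_errors(n_points=50, n_trials=200, seed=0):
    """Root-mean-square error of both estimators of the mean of exp(x) on [0, 1]."""
    rng = random.Random(seed)
    truth = math.e - 1.0  # exact integral of exp(x) over [0, 1]
    sq_plain = 0.0
    sq_corrected = 0.0
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [math.exp(x) for x in xs]
        sq_plain += (plain_average(ys) - truth) ** 2
        sq_corrected += (surrogate_corrected_average(xs, ys) - truth) ** 2
    return math.sqrt(sq_plain / n_trials), math.sqrt(sq_corrected / n_trials)


if __name__ == "__main__":
    plain, corrected = rms_errors()
    print("plain averaging RMS error:    ", plain)
    print("surrogate-corrected RMS error:", corrected)
```

With about 50 points per trial, echoing the wind tunnel example, the corrected estimator only has to average the small residuals the line leaves behind, which is why its error tends to be lower on a smooth function like this one.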
Why, as a complex systems scientist, do you like machine learning? Why is it appealing to you?
AK: I think machine learning and especially neural networks are a very central part of complex systems science. Neural networks are prototypical of systems with simple components following simple rules that iterate many times and give rise to emergent properties. In this case the simple components are the weights and neurons that we talk about, and the rules are the learning algorithms which slightly modify the weights over time.
They're really almost prototypical examples of a distributed system self-organizing into an adaptive pattern. In this way, on a high conceptual level, they're similar to evolution or genetic algorithms or other interesting things from a complex systems perspective. It's not just in the learning, but also in that they're very resilient to things like damage and cutting parts out, compared to older types of artificial intelligence architectures. They seem to fail in biological ways.
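The picture of simple units whose weights are nudged a little at a time can be sketched in a few lines. This is a minimal illustration, assuming a single sigmoid neuron trained by gradient descent on the logical OR function; the task, the learning rate, the number of passes and all names are choices made for the example, not anything taken from the tutorial.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """A single neuron: weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(examples, learning_rate=0.5, epochs=2000):
    """Repeatedly apply one simple rule: move each weight slightly against the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = neuron_output(weights, bias, inputs) - target
            # Each update is small; the learned behavior emerges from many of them.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Threshold the neuron's output to get a 0/1 answer."""
    return 1 if neuron_output(weights, bias, inputs) >= 0.5 else 0


# Truth table for logical OR, used here as a toy learning task.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_neuron(OR_EXAMPLES)
    for inputs, target in OR_EXAMPLES:
        print(inputs, "->", predict(weights, bias, inputs), "(target", target, ")")
```

No single update does much, and nothing in the rule mentions OR; the behavior comes from iterating the same small adjustment many times.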
It's also a scientific field that has blown up a lot, and compared to other fields that I interact with it's by far the fastest developing. It seems like every week or every month there's a new development. It's fun to follow because it moves so fast, and it's fun to keep track of.
BT: Machine learning is an extremely effective tool for the kind of work that I do. I got started on this Stacked Monte Carlo project, which turned out to work, and that built a lot of ML skills. I then transitioned to the turbulence modeling problem, which people have been working on since the forties. There hasn't been much progress since the late 80s, so new ideas were needed. Machine learning seemed like a totally different style from what had been tried before.
As I go on, it seems like I'm moving more towards the math and statistics side, and less on the aerospace side. My research has been more about using ML to improve design processes.
There's usable, business-grade ML that consumers can see - you can see facial recognition software all over the internet. What is business catching up on? How big is the lag between business and science?
AK: Well, one of the main reasons why machine learning is developing so fast is because the majority of the research is done in corporate labs by companies with big research facilities, which is unusual. They're obviously interested in the outcome because they use machine learning in their products, and they can turn it into a product that a billion people can use at once, just because they have such large infrastructure.
So, the lag between scientific and industrial machine learning is quite small, especially in dot-com businesses. They're widespread but shallow impacts - we're all using tags on our friends, but machine learning isn't exactly flying an airplane yet. It's a 'limited revolutionizing' of the world. You can imagine that if Google phones get so seamless that translation is immediate, it might have a bigger impact...
There's more of a lag scientifically. The tools themselves are interesting, and they could be used to attack scientific problems, but because they're so black box, complicated and hard to interpret, they're not really well understood. Scientists don't understand how they work. Neural networks are kind of like brains in that way.
I think that beyond dot-com companies there's a lot more skepticism about the hype. Certainly one can promise to revolutionize medicine or macroeconomics, but I don't see that happening yet.
BT: There are a couple different styles of research being done, and at corporate labs in general they're focusing on engineering problems. They have a product that they want to develop. These are more practically focused, like 'what is going on and how can we make it work better?'. So there the lag is actually pretty small, because machine learning has been so effective, and the scientists who work on modifying these algorithms have a good understanding of the problem space and can find new solutions.
There's another strain of research that's mathematical and philosophical in nature, in exploring new styles of machine learning. That's more of what we're currently working on.
And is that what you two are working on right now?
BT: We're looking at information bottlenecks. Imagine you're operating a spacecraft, and you want to know whether or not the atmosphere contains carbon, or what kind of rock is on the surface. You're very data-limited, so you can't really send all the information to a storage unit. We want to compress the huge amount of input data into an intermediate representation, so what we're trying to do is determine what information is valuable in an information bottleneck so that we can make determinations about what kind of rocks there are on this surface. Information bottleneck is a specific way of measuring how much data you're compressing, and how much data you're using to predict an outcome.
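The two quantities mentioned here, how much the data is compressed and how much of it is still useful for prediction, are usually expressed as mutual information. The sketch below is a toy calculation only, not the authors' research code: it assumes four equally likely sensor readings, two hypothetical rock types, and a deterministic compression rule, and it measures both quantities in bits.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information in bits for a joint distribution {(a, b): probability}."""
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def bottleneck_terms(joint_xy, encoder):
    """Return (bits kept about the input, bits kept about the outcome).

    joint_xy maps (reading, outcome) to a probability; encoder maps each
    reading to its compressed code. Both are hypothetical toy inputs.
    """
    joint_xt = defaultdict(float)
    joint_ty = defaultdict(float)
    for (x, y), p in joint_xy.items():
        t = encoder[x]
        joint_xt[(x, t)] += p
        joint_ty[(t, y)] += p
    return mutual_information(joint_xt), mutual_information(joint_ty)


# Toy world: readings 0 and 1 always mean "basalt", readings 2 and 3 "granite".
JOINT_XY = {
    (0, "basalt"): 0.25,
    (1, "basalt"): 0.25,
    (2, "granite"): 0.25,
    (3, "granite"): 0.25,
}

SEND_EVERYTHING = {0: 0, 1: 1, 2: 2, 3: 3}  # no compression
SEND_ONE_BIT = {0: 0, 1: 0, 2: 1, 3: 1}     # merge readings that mean the same rock

if __name__ == "__main__":
    print("send everything:", bottleneck_terms(JOINT_XY, SEND_EVERYTHING))
    print("send one bit:   ", bottleneck_terms(JOINT_XY, SEND_ONE_BIT))
```

In this toy case the one-bit code sends half as much about the raw reading while keeping everything that matters for identifying the rock, which is the kind of trade-off an information bottleneck is meant to quantify.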