Lucas Liebenwein wants a scalpel

The machine learning scientist thinks a lot about AI tooling—and failure as a guide to success

January 3, 2024

At MIT CSAIL, Lucas Liebenwein earned his PhD researching how to make deep learning algorithms more efficient. Now, as a machine learning manager at Nvidia, he is helping build a platform to deploy more efficient machine learning models in production. With a background in robotics and autonomous driving, his focus is making neural networks faster and more accurate through software alone: doing more with the same amount of resources.

This interview has been edited for length and clarity.

What are you working on right now?

Since OmniML was acquired by Nvidia, we’ve been building a platform to help people train machine learning models in a way that is aware of the inference environment, and basically to train models that are faster and that save energy. We already had a working platform at OmniML; now we’re revamping it for what Nvidia cares about: large language models, and deeper integration into the whole Nvidia ecosystem. They have a lot of very cool solutions for running models and putting them on their graphics processing units (GPUs). We’ve been trying to spend more time understanding not just how people can train and optimize their models, but also how they compile them and put them into production.

Not a small project.

It’s not! The underlying problem is: How do you ingest an AI model? How do you process it? How do you make modifications on top of that? You’re basically trying to extract the underlying standardized representation of the model. Once you detect the standardized pieces, that gives you the ability to build on top of them and improve them.

What does work mean to you? You are someone who has moved between academia and business, and between small and large companies.

In academia, you do all this work to tell a story. You write a paper. In industry, your deliverable—what you’re trying to achieve—is a piece of software or product that people can use. It’s tangible. It makes a difference in people’s lives and work. Going all the way from ideation, to creating something, and then distributing it: that’s what really motivates me. [In academia,] you look at the existing research and you ask “What can I add on top of that?” As I’ve moved to the industry, the same kind of mentality comes along with me. I’m really excited about projects where I see there’s a gap between how a system could work and where we are right now.

Do you have a goal for your career, or a set of ideals for the kind of work you do? Have those goals or ideals changed over time?

I want to build something that is lasting and that I can be very proud of. I don’t want to work like crazy, in the sense of going all the way to burnout, but I’m very excited about work. Having seen academia, startups, and now big tech, what I realized is that everyone is chasing impact. I think the way to have impact is to build something that other people can use. That’s my goal right now. How have my goals changed over time? Especially if you’re in academia, you do suffer financially a lot, right?

I’ve lived on those grad school stipends before.

I don’t consider myself rich, but I know I can afford an apartment, I can go on vacation, I can go have dinner and I don’t have to worry about the price tag. Doing the PhD, you don’t have [financial stability]—[money] is quite strongly on your mind. Now I’m on a level where I feel comfortable, where I don’t have to worry all the time. That’s a definite change: It’s not about the total compensation anymore. It’s an added benefit, but it’s not the primary motivation.

You’ve lived in several different cities, time zones, and continents: Austria, where you’re from, Switzerland, Singapore, and—in the U.S.—Boston, San Francisco, and New York. How has your work impacted where you live?

In Austria, where I come from, the quality of life is amazing. But with its very high incomes, its homogeneity, and its small size, it can lead to close-mindedness. For everyone who fits into that stereotype, it’s a pretty cool lifestyle, right? But the moment you don’t fit into that stereotype, it gets harder. I was always excited to get rid of [that aspect] of my cultural heritage. Living in different places, I learned so much, including a form of humility. Everyone should do it. Every time I go to a new place, I learn more things, and I benefit from it every day.

In Singapore, I was amazed to see how people live very closely together and are so respectful with one another. Something I kept with me is how, if someone gives you something, it could be like a piece of paper, you always take it with two hands as a sign of respect. In the U.S., people have an insane amount of ambition and confidence—interest and excitement to change things and invent something new. Europeans are just too comfortable in their centuries-old cities to really push. In the U.S., it matters less where you come from, what your background is. What matters is if you bring something to the table.

Your first job out of academia was as the founding engineer of OmniML, which was acquired by Nvidia in February 2023. What has your acquisition experience been like?

We were about 20 people; Nvidia has over 26,000. I like to say that transitioning from a startup to a big company was the first time in my life I experienced culture shock. It’s such a big ecosystem, but you can’t access it unless you’re an employee. When we joined, they just created a new team for us—but no one on the new team had company experience. We all had to immediately learn how to propose projects, how to move things from prototype to production line.

You started out studying mechanical engineering and robotics, including some time working on driverless cars. What’s your sense of where the technology is today?

They are essentially large language models (LLMs) on wheels. I mean, it’s not that they use LLMs, but they use the same technology, the same procedure to train these models. Data in, prediction out. You’re essentially learning distributions of possible outcomes.

Driverless car software is mostly based on neural networks, the underlying AI technology that LLMs are based on. They extract probability distributions from unstructured data. You give it a million images so it can label (predict) other cars. But once you get to the tail end, the corner cases, the data just doesn’t tell you much anymore. You can try to massage the tail end by enumerating all the corner cases you can think of, where you have 10 awkward unprotected left turns in San Francisco that you try to put into the training data. But it’s like you’ve done 99 percent [already], and then you’re going to do 99 percent again. You have this exponential increase in the amount of work it takes to increase the accuracy or reliability [of the software]. It requires the same effort to go from 90 to 99 as from 99 to 99.9 or 99.99.

We still don’t have a convincing solution to how we’re going to capture these corner cases. Do we want to build infrastructure where these vehicles can operate reliably? I don’t know if it’ll ever get to a stable state without investing in infrastructure.

AI efficiency was the subject of your PhD thesis, and it’s something on which you continue to work. How do you think about efficiency in AI today, especially vis-à-vis the climate impacts of these new technologies?

My PhD took on efficiency really literally: I want this AI model to run 70 percent faster without losing accuracy. If you think about it, the story of efficiency is also the story of how computing has developed. If you can solve this problem 100 times faster, you can try 100 versions of it in the time it used to take you to try one. Getting results 1,000 times faster with the same amount of resources unlocks problems that you potentially weren’t thinking about before.

A lot of people talk about the end of Moore’s law, in the sense that [where once] we were getting a chip for the same price that’s all of a sudden two times faster—that’s going to stop. [Software engineers] got a lot of efficiency for free. In the future, efficiency is going to be a combination of hardware and software.

What is going on at the level of the AI model itself that should garner more attention or conversation?

People essentially treat it as a black box. ChatGPT is like a hammer. It’s good enough as a proof of concept, but at some point—in a few years maybe—you’ll want a scalpel. You want your model to be specialized to the task you are trying to achieve, with specialized data that has nothing to do with the large amount of data found on the internet—like company knowledge that isn’t publicly available. If you went with the standard model, you’d be wasting a lot of resources.

How do we get to the point where we can build more specialized models?

You can’t build a specialized model without tooling, and tooling requires standardized components. It’s a question about the maturity of the field. Right now the speed is so intense—everyone is trying to build a demo, try out new features—that everything else gets lost along the way. Certain components will standardize as the speed of innovation slows down. But, while there’s a lot of new and exciting research in the field, the fundamentals haven’t changed much in the last couple of years.

What do you want that tooling to look like?

I spend a lot of time thinking about this. Data is the foundation of AI today. What if we could have a composable, modular approach to assembling your training data, training your custom machine learning model on that data, and deploying it?

Say you have a lot of domain expertise—maybe you are a lawyer, or you’re in biotechnology—and you know that machine learning can help you whether it makes you more productive, or it aids you in your research, or changes the type of outcomes you can expect. It’d be great to have tooling that enables you to easily extract data from different sources, to compose the data in different manners, to potentially add additional features on top of that.

Another big issue is infrastructure and algorithms—where and how you train your models. If you want to deploy an AI in a specific industry, or if you want to customize it, you really want to make sure that your data is correct. You want to make sure the model you choose can accurately represent the relationships that you’d expect in the data. It’d be great to be able to change the data composition or change the training algorithm of your model.

All of these ideas require an immense amount of tooling that we don’t have right now. For now, it’s really about standardizing these pieces so that we can build around them.

What about your field is most surprising to you right now?

Since ChatGPT, I’ve started seeing a lot of, not spam exactly, but very low-quality content about AI. All of a sudden you have an AI expert in, like, every field. It used to be that when you went on YouTube and typed in “neural networks” or “machine learning,” you’d find some grad student trying to start a blog: people who weren’t necessarily great presenters but whose information was really high quality. Ever since ChatGPT and the hype around the language models, it’s been harder and harder to find that kind of content. The surprising part to me is just how many people have started posting on the Internet about AI, all of it to hype up and promote the same thing.

Some of that content is also written by ChatGPT!

AI already felt like a huge community, but now it’s orders of magnitude different. I was talking to a bunch of people in venture capital—I was asking them about all the startups from the last two or three years working in the cryptocurrency space. What are they doing now? A lot of them ran out of money, but there’s also a lot pivoting to machine learning.

What’s the toughest challenge your field currently faces?

Seeing through all of the hype, the thin air.

But I’m an engineer, I’m a scientist, so I also want to point out one technical challenge, which I think is very important. I think generative AI or “copilots” will also face this tail-end challenge that robotics and autonomous driving are facing right now. The underlying fundamental problem here is that people have a really hard time thinking about the failure modes of these systems. Self-driving cars haven’t panned out in part because of a failure to understand the limitations of these models and systems.

I hope that with generative AI applications we avoid the mistakes made in robotics. [It will be easier to avoid] with copilot-like applications because it’s software that people use in front of a computer. If you program the system correctly, the computer can give you feedback, can prompt the user to enter more information or to take over. The failure mode [there] is a wrong but inconsequential operation like a hallucination, whereas the failure mode [in a car] is a wrong driving maneuver at best.

If you want to build a successful product, you have to think deeply about its limitations—and then think about how you can constrain the system (or the environment) around those limitations to minimize the chance of failure. Every technology has fundamental limits.
