Thursday Nights in AI
Recording: MosaicML's Chief Scientist, Jonathan Frankle


Jonathan's role in MosaicML, the company's acquisition by Databricks, and some candid thoughts.

MosaicML Chief Scientist Jonathan Frankle joined Thursday Nights in AI, where he shared some hot takes on Mosaic’s recent acquisition by Databricks, open-sourced versus closed-source models, and AI policy.

To watch the full conversation, click here. You can also listen to this recording on Spotify.


Our Key Takeaways:

On the danger of centralization, and his mandate to go toe-to-toe with OpenAI:

“I hope there’s an alternative to the world we’re being presented with now. The world right now is there are a few big monolithic models; you can take them or leave them. You better hope they behave the way you want them to. You better be okay with giving away all of your data, and you better be okay with whatever prices are set. And you better be okay that they may change under you on a day-to-day basis or get deprecated six months after they come out, or what have you. You have no control. It doesn’t reflect anything you want and you better live with it. I don’t like that world. I'm not a big fan of centralization in general, and I'm not a big fan of taking away control from people, and I'm excited to present an alternative to OpenAI. I was told in this deal to go toe to toe with OpenAI my own way, and I intend to do it my own way. That's to give people choice, customization, efficiency, and reduced cost. There shouldn't be four LLMs; there should be a hundred thousand. And everybody's data and everybody's opinion should get reflected in those models. It's really simple. Mosaic was pushing toward that. And now we've been given a ton more resources, a mandate, and a lot of amazing collaborators to go pull that off. So I hope there’s a choice now.”

On today’s nosebleed valuations in AI, and whether we’re going to see more $1B+ exits:

“Valuations and exiting are very different... I look at some companies that have very high valuations and I think, is that justified? How is it justified? And what happens next? How do you get acquired at a certain valuation if the investors are looking for 4x… A lot of valuations are getting to the point where I think it’s hard to contemplate acquiring a company that big…”

“The analog I look to is the AI chip space… I can name two exits in that space, both to Intel. I can’t think of any other exits in the AI chip space. Some incredible valuations and incredible technology, but so far it’s unclear whether there are going to be more exits. And it’s entirely possible we look back at the LLM space and say, wow, those were a lot of amazing companies with great technology and great valuations. Did anybody exit? I don’t know, but it’s certainly a question we should ask ourselves. It’s great to have a 4 billion valuation. It’s a lot better to actually be able to exit, and, you know, realize some valuation.”

“I definitely see a lot of pairing up. I mean, Snowflake acquired Neeva, and I think that was a really savvy move. But I think a lot of valuations, from what I understand, are getting to the point where it's hard to contemplate acquiring a company that big. It's a lot harder to acquire a company for 10 billion than for 1 billion. It was a very different animal to be fundraising on the VC market than to be talking about an acquisition.”

On why the goal shouldn’t be just trying to catch up with OpenAI:

“I would argue with the premise that the goal is to close the gap with OpenAI. My personal goal is not to close the gap, my personal goal is to do something different. I don’t think that just because a ladder has been defined means we all have to climb it. There are lots of other ways to solve problems and lots of other ways to come at this. My goal is not to go to Databricks and then build a GPT-4-scale model. That is decidedly not the plan. The plan is to continue doing exactly what we’ve been doing at Mosaic and try to find ways to build specialized models that can be useful to anyone.”

On why Mosaic’s $1.3B price tag is good for Databricks:

“I actually think it's fully justified. I think they got a good deal, to be completely frank. I mean that seriously. There were all these interesting articles that came out, like ‘$21 million per employee.’ Then it was, ‘Yeah, and they had three salespeople, and they were heading toward however many million, or tens of millions, in ARR.’ Yeah, we were, because when you're at a startup like this, everybody's on the sales team. I'm on the sales team. Our CEO is on the sales team. My researchers are on the sales team. And people really want custom models. So I honestly think the valuation was fully justified, and perhaps they got a good deal in the process, and good for them.”

On why Databricks’ acquisition of MosaicML is good for the world:

“I think it's really one of those situations where one plus one equals three. They are really good at data. We're really good at training models … In general, I think this is going to be really good for the world. Everybody knows that data is what powers deep learning these days. There were all these leaks of the GPT-4 architecture, and my response is: "Who cares?" You could have done it a thousand different ways and gotten a model just as good. The data was what mattered. And I'm really excited to hook up to data and take advantage of our data, have the world take advantage of their own data, and build really great models.”

On why transformers may continue to be the dominant architecture:

“Honestly, my belief about this is we love to think that the technology is changing really quickly, and in many ways it is, but by and large, the underlying good inductive biases and good models are really hard to come by… Good architectures seem to last for decades, not years, and transformers, I think, are going to be around for a while because it's really hard to replace them.”

On why Jonathan doesn’t believe in existential risk:

“I care about the world, and I care about actually making an impact on the world. And I think all this existential risk bullshit is a great way for a lot of people to make themselves feel like they’re doing something good and just keep doing what they were doing before…”

“There are real, tangible societal and policy issues that can be addressed today, and existential risk is a way to distract people whose time is valuable, and who actually make these decisions, from working on the right problems.”

“I’m a technologist. My job is to share what I know and help, not to try to make things up and reinvent the wheel for a bunch of people who have been working in this field and know how to actually operate the levers of policy and think about all of society.”

“In AI and tech in general, we have this belief that we can reinvent or disrupt anything and do it better than the people who have been doing it for a long time. When it comes to policy and society, we need to stop acting like we are experts in this space just to make ourselves feel good.”

On how technologists can get involved in policy:

“Ask to join a conversation and go where you’re invited. Don’t go shouting about it. White House meetings are very fun and pretty, but nothing gets done there; that’s not where it happens. I work with congressional staffers on this stuff. That's where the work gets done. And there's no recognition. You don't get in the New York Times for that, but it's useful.”

“The thing about policy is it is slow and boring and incremental, and you have to build relationships and build trust over time. I'm still building those very slowly, and I've been building them for almost 10 years. So the first step is to just have any conversation. Talk to me, and I'm happy to put you in touch with someone, and then earn their trust, and then they'll put you in touch with other people, but it is a gradual process. The thing about technical subjects is you tend to be able to measure results, for the most part. I mean, we all talk about how publications are impossible to replicate, but at the end of the day, you can kind of evaluate what a good piece of science is and what a good piece of science isn't, at least if you take a step back. Policy is not measurable. It's really completely in the eye of the beholder, and it comes down to values and these really fuzzy things. And so you can't measure "Is someone good at policy?" It's really about trust, and you have to earn that trust over time. You can't get trust immediately just by saying, "I'm a founder. Trust me." That's not how this works.”

On why it’s not as simple as open- versus closed-source:

“I get a lot of questions at these events about open source versus closed source, and I really don’t like that. It's about transparency versus lack of transparency, and control versus lack of control. Imagine that OpenAI told you literally everything they did in GPT-4. You still have no control over that. Imagine I released a model tomorrow and told you nothing about how I built it. There's no transparency. Those are really the two axes that matter. Open source can get you some amount of control, and get you transparency if someone's willing to share how they built it. So for me, open source is really about that. It's about transparency and control. I care a lot about control for my customers. I care a lot about transparency for my customers. So it's natural to open source things and tell people how I did it.”

“And honestly we take a lot from the community. Everything we do at MosaicML is built on what the community’s done, and I’m not a big fan of taking and not giving back. We need to sustain the community, especially right now, when all the big labs that have sustained the community are shutting down and closing up. Brain doesn't exist anymore. DeepMind's not going to publish anything. FAIR is very confused about whether they're going to be a very open lab or not, and we'll probably find out in the next couple of weeks which direction they're going to go, and I really hope they go the open route. I feel like overnight I was suddenly running the third-largest open industry research lab, and that's really scary given that I have a 20-person team.”

On why open-source models could provide a viable alternative to proprietary LLMs:

“I think it's going to be like any open-source project. Linux is going to be behind Windows, perpetually, on a lot of things. But the gap remains about the same. Given how quickly things are improving, the open-source models in the fall are going to be so much better than LLaMA-7B, which was the de facto best open-source model a few months ago, and even that's been left in the dust at this point by things like Falcon. So I think we're going to keep seeing that curve improve, and I hope we'll be a part of that… I'm sure OpenAI is doing a bunch of crazy things right now to try to keep that gap widening, but it's hard to bet against the community. Microsoft bought GitHub and Linux now runs inside Windows. It's really hard to beat the open-source community in the long run. And we're in it for the long game now.”

We would like to thank Jonathan Frankle for such an informative talk. If you are interested, join us for our upcoming firesides! See the full list here.


Transcript:

This transcript was edited for brevity.

Jonathan on MIT’s parchment shortage and negotiating with Databricks.

Ali Rohde: Okay, welcome. Jonathan, thank you so much for joining us.

Jonathan Frankle: Thank you for having me.

Ali Rohde: Some things have happened in your life recently, so let's start with the elephant in the room. Congratulations on this acquisition by Databricks.

Jonathan Frankle: Thank you. 

Ali Rohde: Which I think you were saying went through.

Jonathan Frankle: I thought you were going to talk about the fact that I finally finished my PhD, but... 

Ali Rohde: Yes, that too. I heard you got it in the mail in January. 

Jonathan Frankle: Actually, I defended in February, finally, and the diploma arrived last week because there was a supply chain shortage of parchment, apparently. That's what MIT told me. I finally have the diploma. I'm trying to get it from my parents’ place up to New York. But yeah, it's a big accomplishment. No, the acquisition is really exciting. The acquisition actually got signed the same day the diploma arrived. So, it was a really good day.

Ali Rohde: Which one was better?

Jonathan Frankle: Honestly, the diploma. The acquisition was weeks of exhaustive negotiations. The diploma was me emailing the registrar once a week for months asking, "Is the diploma coming this week?" And every week they said yes. So, if anyone plans to get a diploma from MIT, good luck. That was harder than the acquisition.

MosaicML Acquisition

Ali Rohde: I've heard Ali [Ghodsi], the CEO of Databricks, is a tough negotiator, so that's really saying something.

Jonathan Frankle: He is a very tough negotiator. I'm so glad he's on my team and not against me. I'm terrified for anyone who goes up against him. Any Snowflake folks in the audience, by the way? Figured I'd ask. We love you very much, but you know, now we apparently have enemies.

It's really exciting. The team worked incredibly hard. I think it's really one of those situations where one plus one equals three. They are really good at data. We're really good at training models and really bad at data. We have this one machine we call "the big boy", which has 128 cores and a lot of RAM, and that is our data machine.

I'm very excited to learn what [Databricks’] Spark is. In general, I think this is going to be really good for the world. Everybody knows that data is what powers deep learning these days. There were all these leaks of the GPT-4 architecture, and my response is: "Who cares?" You could have done it a thousand different ways and gotten a model just as good. The data was what mattered. And I'm really excited to hook up to data and take advantage of our data, the world's data, and build really great models. That's the secret these days.

Why Jonathan wants to build more LLMs and compete with major players.

Ali Rohde: Okay, so you have really great models coming. How else does the world change? You said you're excited for the world because of this acquisition. How else does it change things?

Jonathan Frankle: I hope there's an alternative to the world that we're being presented with now. The world right now is, there are a few big monolithic models. You can take them or leave them. You better hope they behave the way you want them to. You better be okay with giving away all of your data.

And you better be okay with whatever prices are set. And you better be okay that they may change under you on a day-to-day basis or get deprecated six months after they come out, or what have you. You have no control. It doesn't reflect anything that you want, and you better live with it. I don't like that world.

I'm not a big fan of centralization in general, and I'm not a big fan of taking away control from people, and I'm excited to present an alternative to OpenAI. I was told in this deal to go toe to toe with OpenAI my own way, and I intend to do it my own way. That's to give people choice, customization, efficiency, and reduced cost.

There shouldn't be four LLMs; there should be a hundred thousand. And everybody's data and everybody's opinion should get reflected in those models. It's really simple. Mosaic was pushing toward that. And now we've been given a ton more resources, a mandate, and a lot of amazing collaborators to go and pull that off. So, I hope there's a choice now.

On keeping models open-source

Josh Albrecht: It's really cool to see what you guys have done. I think it's great to see the MPT models, you know, the 7-billion-parameter one, the 30-billion-parameter one. How did you guys think about training those, and the decision to open-source those? How are you thinking about open-source models and making things more accessible?

Jonathan Frankle: Yeah, open-source is tricky. I get a lot of questions at these kinds of events about open-source versus closed-source, and I really don't like that. It's about transparency versus lack of transparency and control versus lack of control. Imagine that OpenAI told you literally everything they did in GPT-4.

You still have no control over that. Imagine I released a model tomorrow and told you nothing about how I built it. There's no transparency. Those are really the two axes that matter, and open-source can get you some amount of control, maybe. It can get you transparency if someone's willing to share how they built it.

So, for me, open-source is really about that; it's about transparency and control. I care a lot about control for my customers. I care a lot about transparency for my customers. So, it's natural to open-source things and tell people how I did it. And honestly, we take a lot from the community. Everything we do at Mosaic is built on what the community's done, and I'm not a big fan of taking and not giving back.

We need to sustain the community, especially right now, when all the big labs that have sustained the community are shutting down and closing up. [Google] Brain doesn't exist anymore. DeepMind's not going to publish anything. FAIR is very confused about whether they're going to be a very open lab or not, and we'll probably find out in the next couple of weeks which direction they're going to go, and I really hope they go the open route.

I feel like overnight I was suddenly running the third-largest open industry research lab, and that's really scary given that I have a 20-person team. So, it's important that we give back, we sustain the community, and the community does great things for us. We released the MPT models, and two days later, GGML was out with support for MPT and you could run it on your laptop.

That's incredible. That's not something we were planning to build, but it's out there now. It's something we can give to our customers, and my belief is really the more we give, the more we get back. And so, we're going to keep open-sourcing and keep sharing. There are limits to that. I can't open-source a hundred-million-dollar model. It's just hard to justify from a business perspective. But I can always justify open-sourcing a million-dollar model. And with our work at Mosaic, that million-dollar model is going to be what a $10 million model would have been six months ago.

And so, I want to keep getting better and better artifacts out in the community. I want to leave behind the generation of models we have and get something a lot better out there. And hopefully somebody else will get there first. And if not, hopefully we'll get there. 

Josh Albrecht: Yeah, I think that makes a lot of sense. How do you think about the gap between the performance of the open-source models and the closed-source models? If you're building on top of things and prototyping stuff, like a lot of people here, you can build on top of OPT or MPT or whatever, or Vicuna or something, but you can also build on top of Claude or one of these other more proprietary ones. And if you're looking at the performance and you're thinking about prototyping, do you want to be prototyping on the one that's at the cutting edge, or this other one?

Especially if we're not necessarily going to have open-source models that are those 100-million-dollar ones, how is it going to end up being widely used by people who are prototyping? Or do you think it's more for people who, after they productize it, are like, "Oh, I'm spending a lot on this, I want to switch over"?

Jonathan Frankle: I think there are a lot of paths. We see a lot of customers who prototyped on OpenAI and really want to get off of OpenAI, but GPT-4 is the best model out there, definitively. And if you're prototyping a product and you want to ask, can this work? You should do it on GPT-4, because if it doesn't work on GPT-4, or it doesn't work on Claude in certain cases, it's not going to work on any other model.

So it's just not going to work with the current technology that exists right now publicly. So you should prototype on GPT-4. That doesn't mean you should build your product on GPT-4. Work your way down. Figure out: is it going to work on one of the older GPT models? Is it going to work on Claude 2, Claude 1, Cohere? And eventually you work your way down from there to the open-source models.

But I think it's going to be like any open-source project. Linux is going to be behind Windows, perpetually, on a lot of things. But the gap remains about the same. And I think we're going to see that. Given how quickly things are improving, the open-source models in the fall are going to be so much better than LLaMA-7B, which was the de facto best open-source model a few months ago, and even that's been left in the dust at this point by things like Falcon.

So, I think we're going to keep seeing that curve improve, and I hope we'll be a part of that. But there are a lot of other great efforts out there. I'm looking at the folks at AI2. They've announced they're training a 70-something-billion-parameter model. Surge has announced that they're going to produce a bunch of instruction fine-tuning and chat data.

And I have to believe either the model will be open source and we can scrape that from it, or the data may be open-source and available. That's going to be huge. That starts to close the gap. I'm sure OpenAI is doing a bunch of crazy things right now to try to keep that gap widening, but it's hard to bet against the community.

Microsoft bought GitHub and Linux now runs inside Windows. It's really hard to beat the open-source community in the long run. And we're in it for the long game now. 

How open-source models get compute resource

Josh Albrecht: I think my last question on the open-source part is, I fully agree with you that it's difficult to beat the open-source community for software.

I think one of the things I'm confused about is: okay, but for training these things, we can all go home to our laptops and say, yeah, we're gonna do a distributed training run, but that doesn't really work in the same way. Where does open source get the compute resources to make these larger and larger models? Or are they held back by hoping that somebody else will open-source the base model they can build on top of?

Jonathan Frankle: It's a little bit of both. I mean, we're all hoping that Facebook announces some giant open-source model in a month or something like that and helps the whole community forward.

I'd be very grateful for that right now. But at the same time, I think it's really that those of us who care about openness have to stick together. Collectively, we have a lot of compute. I look to my friends at Eleuther a lot. They have access to good computing resources, and they're fantastic at what they do, despite the fact that nobody works on it full time. That should be insanely impressive: nobody's full-time job at Eleuther is to do Eleuther, and yet they're publishing papers and training some of the best open-source models out there.

The folks at AI2 are training great stuff. We have a lot of compute. A lot of us are talking to each other, and hopefully we're going to work together and share resources. The folks at AI2 have been phenomenally helpful for us in figuring out tokenizers. Tokenizers are really hard, I learned, when I thought: "Oh, I'll just download a new tokenizer and train my model tomorrow." Turns out that's not how you do tokenizers. Lesson learned on that. Oops. But the folks at AI2 really helped with that. We helped them a lot with their compute efficiency. We're all really sharing resources, keeping things open source, and working together. That's not going to replace the kind of centralized compute that you could have, but I don't think we're as far behind as we think. I've got a lot of H100s floating around now, and the folks at AI2 have access to some insane clusters, and I'll let you talk about your own compute situation, but I'm pretty optimistic about it.

The folks at Eleuther seem to continue to train really great models. So, I'm pretty optimistic, actually. I think we're not as far behind as we think, even if we're an order of magnitude smaller, in terms of compute. I don't think an order of magnitude is a huge difference in terms of model quality at this point.

An order of magnitude might be an incremental improvement in the model, not a game changer. Data is the thing that worries me much more. 

On open-source datasets

Josh Albrecht: Yeah, can you speak a little bit more about data and kind of some of the ways you're thinking that that ends up being really, really useful? 

Jonathan Frankle: Yeah, let's start with pre-training data. Our pre-training data in the open source is garbage. I would challenge anyone here: go read one of the datasets that these models are trained on. Literally just sit down and read the dataset. I recommend reading the Wikipedia datasets because they're very easy to compare to the source material, and they're very easy to compare to each other.

You can look at Red Pajama. You can look at The Pile. They're both incredible open-source efforts, but there are a lot of issues. Red Pajama, I think, for the most part, only has the first paragraph of every Wikipedia article. The Pile is missing all lists and tables and knowledge cards, and is now three or four years out of date.

It was a visionary dataset to make at that time in the open source, but it's time for some revisions, and that's just the beginning. If you read a web crawl, not only is it a mess, all JSON and minified JavaScript, but some of the content is... look, I'm now thinking about how to take care of my team as they look at the data, because I think there are actual employment issues with what some people have seen.

And those are all huge issues that we need to take on. So one of my jobs at Databricks, now that I have access to this magic thing called Spark and a lot of really smart data people, is I want to build some new open-source datasets. That's something that we're planning to do, because the existing ones are really incredible efforts, but I really want to bring industrial-strength data practices to bear and see how much better we can do.

That alone would be a huge start. Then we get into instruction data, and chat data, and RLHF data. That's going to be the biggest gap to close. We have datasets like Dolly, which now, to some extent, I guess I own, which was a fantastic effort, but it's really hard to get diversity and it's really hard to get the right stuff if you're doing it in that way.

We have all the GPT-4 scrapes. I'll let you and your lawyer figure out what to do with those. I have my own personal opinions on that, and you won't see me release a model under a commercial license with that data right now, at least until the ambiguity is settled. So that's, I think, the next frontier we need to solve.

I think the folks at AI2 are probably the furthest along there in terms of something that might be open sourced and I hope we, as a community, get to work on that. That's the big problem now. That's where I don't have a clear solution. And I don't think we can just crowdsource it. The community of people who are willing to crowdsource instruction data do not represent the distribution of instruction data we want or the world, so I think we're gonna have to do better than that. 
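Jonathan's challenge to literally sit down and read the data is easy to act on. Here is a minimal sketch of doing so in Python; it assumes a JSONL shard with a `text` field (the common layout for open pre-training sets such as The Pile and Red Pajama), and the `looks_like_prose` heuristic is our own rough invention for flagging the minified-JavaScript-style debris he describes:

```python
import json

def looks_like_prose(text, threshold=0.7):
    """Crude heuristic: prose is mostly letters and spaces, while
    minified JavaScript or raw JSON debris is heavy on punctuation."""
    if not text:
        return False
    alphaish = sum(1 for c in text if c.isalpha() or c.isspace())
    return alphaish / len(text) >= threshold

def preview_shard(lines, text_key="text", n=5, width=120):
    """Pair a truncated excerpt of each of the first n JSONL records
    with the heuristic's verdict, so you can eyeball data quality."""
    out = []
    for i, line in enumerate(lines):
        if i >= n:
            break
        text = json.loads(line).get(text_key, "")
        out.append((text[:width], looks_like_prose(text)))
    return out
```

On a real shard you would call something like `preview_shard(open("shard.jsonl"))` and skim the excerpts; the heuristic only roughly separates prose from markup, but even this crude pass surfaces the kind of garbage Jonathan is talking about.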

More on MosaicML Acquisition 

Ali Rohde: All right. Let's change course a little bit and talk about the acquisition. How old is MosaicML?

Jonathan Frankle: We are about two and a half years old. I think official incorporation date was December of 2020. 

Ali Rohde: Okay. So, two and a half years old, just acquired for $1.3 billion. And not just, like, a crazy valuation, but actually acquired for that amount. That's kind of insane.

Jonathan Frankle: I actually don't think it's that insane. You can look at our sales numbers and our revenue. I actually think it's fully justified. I think they got a good deal, to be completely frank. I mean that seriously. There were all these interesting articles that came out, like "$21 million per employee."

And yes, when you get acquired, the money is distributed exactly evenly among the employees. And definitely the investors get nothing; that's definitely how it works. But also, then it was, "Yeah, and they had three salespeople, and they were heading toward however many million, or tens of millions, in ARR."

Yeah, we were, because when you're at a startup like this, everybody's on the sales team. I'm on the sales team. Our CEO is on the sales team. My researchers are on the sales team. And people really want custom models. So, I honestly think the valuation was fully justified, and perhaps they got a good deal in the process, and good for them.

Current status of the AI world

Ali Rohde: Fair enough. Are we going to see more good deals like this soon? What do you think about this crazy AI world and investors right now?

Jonathan Frankle: Far be it from me to speak on this space, given that I'm an amateur who's been out of a PhD for five months, and this is my first job out of school.

So, I'm not exactly the expert on this, but one of the biggest things I learned is that valuation and exiting are very different, certainly for us they were. I think the last valuation we had was from a Series A we raised in the spring of 2021. Our valuation looked very, very different than 1.3 billion, but a lot changed.

And I then look at some companies that have very high valuations and I think, is that justified? How is it justified? And what happens next? How do you get acquired at a certain valuation if the investors are looking for 4x, or more than 4x? You can tell me, you're an investor, but my understanding is 4x is kind of the minimum that folks are looking for on return.

At least that's what smarter people than me have told me. I think the analog I look to, and this is biased given my founders, is the AI chip space. I can probably name 10 companies off the top of my head. I respect the hell out of what everybody has done. I can name two exits in that space, both to Intel: one Nervana, one Habana. I can't think of any other exits in the AI chip space. Some incredible valuations and incredible technology, but so far, it's unclear whether there are going to be more exits. And it's entirely possible we look back at the LLM space and say, wow, those were a lot of amazing companies with great technology and great valuations.

Did anybody exit? I don't know, but it's certainly a question we should ask ourselves. It's great to have a 4 billion valuation. It's a lot better to actually be able to exit and, you know, realize some valuation.

Ali Rohde: Okay, so I think what you're implying here is then you're not necessarily expecting to see other acquisitions of this size any time soon.

Jonathan Frankle: Maybe. I definitely see a lot of pairing up. I mean, Snowflake acquired Neeva, and I think that was a really savvy move. But I think a lot of valuations, from what I understand, are getting to the point where it's hard to contemplate acquiring a company that big. It's a lot harder to acquire a company for 10 billion than for 1 billion. And again, I'm an amateur and I don't really know anything about how this works beyond parroting what my VCs have told me, but it was a very different animal to be fundraising on the VC market than to be talking about an acquisition. Those were very different conversations, and the things that got brought up were really different.

So, I don't know, I think there are two possible worlds. Either this looks like a really dumb decision on our part, and everybody else is a hundred-billion-dollar company in a few years, and we're like, "Oh wow, we're the Instagram that sold for a half billion or a billion dollars in a world where Snapchat is now worth however much." Or we look at the flip side and go, "Wow, Mosaic was able to exit and make an impact on the world by joining forces with someone really big, and a lot of companies were really valuable for a while, but only a few survived." Those are both possibilities. I have no idea which one's going to happen, but I keep both in my head. Either we're going to look brilliant or we're going to look like complete idiots, and I don't know, I'm happy to have an impact on the world and know that I took care of my team.

AI Policy

Ali Rohde: All right. Last subject, and then we'll open it up to the audience. Let's talk about AI policy, because for a researcher, you are unusually involved in the policy world, talking to lawyers and journalists and policymakers. Obviously, there's a pretty high opportunity cost to your time, and you've decided to devote a good chunk of it to working with policymakers, working with the OECD. Why is that?

Jonathan Frankle: I care about the world, and I care about actually making an impact on the world. And I think all this existential risk bullshit is, well, bullshit. And I think it's a great way for a lot of people to make themselves feel like they're doing something good and just keep doing what they were doing before.

I think that when it comes to policy, our job as technical folks is not to go and lead the way. Our job is to talk to the people who actually touch society, who actually see these problems happening, and share our expertise. It's not to claim that we have all the answers or all the solutions, or to make up problems that just justify throwing more money at the things we already wanted to do.

Not to point fingers at any organizations or any superintelligence teams or what have you, but I think that's a waste of time, a waste of money, and honestly, we're just distracting a lot of people whose time is valuable and who actually make these decisions from working on the right problems. So my job here, at least the job I learned when I was in DC, is this: I'm a technologist. My job is to share what I know and help, not to make things up and reinvent the wheel for a bunch of people who have been working in this field and know how to actually operate the levers of policy and think about all of society. And God, San Francisco is not all of society. There's a reason I don't live here. Quite frankly, there's a reason I'm on the East Coast. It's a different world. I go to dinners and people ask what I do, and I say tech, and they don't ask any more questions. That is a good place to be. But I think there's just a weird monoculture out here, especially in the AI world. In tech in general, we have this belief that we can reinvent or disrupt anything and do it better than the people who have been doing it for a long time. When it comes to policy and society, that's garbage, and it's just distracting the press, the world, everybody else from actually doing the work.

I try to work with the people who actually get work done consistently. It's boring, it's slow, it's incremental, but it gets results. And all this shouting about existential risk and all that stuff, what's it going to amount to?

Ali Rohde: Does it get results? Just to play devil's advocate a bit. So you worked with the OECD to come up with AI principles which you're now helping them implement. What did that do? 

Jonathan Frankle: Now we have a bunch of regulatory guidance and suggestions for policymakers. The OECD is a very long, slow process, but their privacy principles are now the basis for pretty much all privacy legislation around the world. The AI principles got adopted by the G20 and the G7 and are now the basis of a lot of national legislation.

So, the OECD doesn't set law, although any new member state has to make sure that all of their laws are compatible with OECD principles and rules. So for any new OECD member, it is legally binding. But beyond that, it sets the standard that everything else downstream comes from. There are a lot of AI principles out there; in my view, these are some of the ones that have had the most impact. But I look back at my work on facial recognition. Does everybody remember, two or three years ago, when all the companies finally stopped selling facial recognition to law enforcement and a bunch of jurisdictions banned it, including San Francisco, if I remember right?

It was on the basis of that work and what followed. Policy is slow, change is incremental, and then it happens all at once, but if we're looking to do things really quickly, that's just not how policy works. It is a slow process purposefully. It would be really bad if the rules could change really suddenly all the time with every whim and every technology. We'd have all sorts of fascinating blockchain regulation that now doesn't matter because crypto is dead and everybody cares about AI these days. And who knows if AI is actually the thing we'll be worried about. Personally, I'm still waiting for privacy legislation in the US.

That, to me, is where AI regulation begins. It's not with an AI law. It's with regulating the data and the data flows, the data that was collected in one context and is now being used in a context that wasn't even conceivable when it was collected. Let's do that first. That's easy. It's incremental and boring, but you know, it does get results over time.

Ali Rohde: Do you recommend to other founders that they try to get involved? 

Jonathan Frankle: Only if they're willing to be humble and recognize the role they have to play. Ask to join the conversation and go where you're invited. Don't go and start shouting about it. White House meetings are very fun and pretty, but nothing gets done there; that's not where it happens. I work with congressional staffers on this stuff; that's where the work gets done. And there's no recognition. You don't get in the New York Times for that, but it's useful. We have to be willing to think about more than just our image and more than just hype, and that's a very hard thing for people who are in the AI space. It is a big distraction. It's something I do because, honestly, when I look back at the most impactful work I've done, it's the work on facial recognition, not Mosaic, not the lottery ticket work, not anything else. That's the thing that actually affected people's lives. That's what meant the most to me.

Ali Rohde: Hmm. Well, then I guess to play devil's advocate again, why don't you just go do that work full time? 

Jonathan Frankle: Give me a little bit to go and wrap up this acquisition, quite frankly. It's near and dear to my heart. I mean, one of the things I learned when I was at Georgetown was that I could stay in the policy world forever, but I wouldn't stay technically sharp.

There's something to living in both worlds. I don't like to do interdisciplinary work. I've never been good at it. Some people are good at it. I've never been able to pull it off, but I've always been really good at being technically deep and translating. And I think you have to be in both worlds.

Being at Mosaic makes me better for the policy world. That was why I went back to do my PhD at all. I wanted to be better at machine learning so I could be better for policy. And I think there's constantly going to be the back and forth. I'm really hoping I'll get another tour of duty actually doing policy full time, but I couldn't stay forever.

Because I think at that point I'd be useless. And there are plenty of people running around in AI policy right now who are full time policy, still talking about symbolic logic and rule based systems because that's what they knew when they left the technical world and went to policy. It's really hard to stay sharp in both.

I've chosen to live in this world, visit that one, and I think that's where I'm most useful. 

Ali Rohde: Got it. For that tour of duty, I'm just curious, where could you end up?

Jonathan Frankle: There are all sorts of things you can do. I'd love to go back to a think tank and be an advocate again. That was a really good, useful time.

I'd love to be in government. That would be really meaningful. My mom was at the FTC for 40 years. She was a bureaucrat, but she did a lot. It was fun to be there. My old boss at Georgetown is now an FTC commissioner. He's doing really great work. He's not loud about it, but he's doing really great work.

I don't know, that means a lot to me personally. It would be fun to get to go back. Databricks has kind of delayed that by a little bit, but hopefully not forever. 

Advice for founders who want to get involved in AI policy

Ali Rohde: Got it. All right. Last question for founders that are ready to have those long conversations and want to be involved, how do they get involved?

Jonathan Frankle: You won't get invited to go testify, so don't bother trying. The thing about policy is that it is slow and boring and incremental, and you have to build relationships and build trust over time. I'm still building those very slowly, and I've been building them for almost 10 years.

So, the first step is to just have any conversation. Talk to me, and I'm happy to put you in touch with someone. Then earn their trust, and they'll put you in touch with other people, but it is a gradual process. The thing about technical subjects is that you tend to be able to measure results, for the most part. I mean, we all talk about how publications are impossible to replicate, but at the end of the day, you can kind of evaluate what a good piece of science is and what it isn't, at least if you take a step back. Policy is not measurable. It's completely in the eye of the beholder, and it comes down to values and these really fuzzy things. So you can't measure whether someone is good at policy. It's really about trust, and you have to earn that trust over time. You can't get trust immediately just by saying, "I'm a founder. Trust me." That's not how this works.

Fireside chats with leaders in AI, co-hosted by Outset Capital and Generally Intelligent