Speakers:
- Zoom: Alex Waibel, Research Fellow at Zoom; Professor, CMU
- OpenAI: Boris Power, Member of Technical Staff
- Stanford: Percy Liang, Director of Stanford CRFM; Co-founder at Together.xyz
- Anthropic: Brian Krausz, Member of Technical Staff
- Fellows Fund: Vijay Narayanan, General Partner (Moderator)
Summary:
The panel discussion, "Navigating the AI Landscape: Open or Closed," explores the dichotomy between open-source and closed-source ecosystems in generative AI. The panelists from Zoom, OpenAI, Stanford, Anthropic, and Fellows Fund discuss how users, companies, and developers should navigate this landscape and leverage both open and closed-source models. They highlight the importance of considering a combination of open and closed approaches and emphasize the need to think about safety, ethics, and societal values in the development and deployment of AI models.
The challenges discussed include ensuring safety in AI models, developing mechanisms for trainable moral guidelines, establishing governance and coordination between different research labs and countries, and rethinking the paradigm of AI model development and deployment. The panelists also express the need for transparency, benchmarking, and modularization to address concerns related to bias, misinformation, and user control.
Overall, the panel highlights the complex and evolving nature of the AI landscape and the importance of balancing openness, safety, and ethical considerations to create a beneficial and responsible AI ecosystem.
Full Transcripts:
Vijay Narayanan:
All right, welcome everyone to this panel. It's probably the most exciting panel of the day, because we named it after the title of the summit itself. It's really about navigating the AI landscape, open or closed, hopefully with our eyes open. If I think about the last three years, generative AI has advanced very rapidly. In the first year or so, a lot of these advances, the technological innovations, the model architectures, and sometimes the models themselves, were published in the open: there were publications, and there were models that were open-sourced. But in the last couple of years, we have seen that increasingly many of the innovations, the architectures, and the models are accessible only behind closed APIs delivered by companies like OpenAI, Microsoft, Anthropic, and others. Simultaneously, probably as a reaction to this development, we have seen a large number of new startups take the other approach of launching things in open source. Stability, Dolly from Databricks, and the RedPajama models from Together have all come out in open source as well. So clearly, there is a dichotomy emerging between these two ecosystems: an open-source ecosystem for generative AI and a closed-source ecosystem for generative AI. The panel today is really about how we as users, companies, and developers should think about navigating both the closed and the open-source ecosystems for generative AI. It doesn't necessarily have to be one or the other; it could be a combination of both. So with that, I'd like to quickly introduce the panelists. It's a very distinguished panel. To my immediate left is Dr. Alex Waibel. Alex is a computer science faculty member at CMU and at the Karlsruhe Institute of Technology. He has been a longtime researcher in speech recognition and translation, leveraging neural networks very extensively in these areas. To my knowledge, he is also one of the inventors of time-delay neural networks in this field. So welcome, Alex. To the left of Alex is Brian Krausz. Brian is a member of the technical staff at Anthropic, working on productizing a lot of the models coming out of Anthropic. He has founded two companies before, so he is a two-time entrepreneur, and he has also worked at Facebook and Stripe, among a number of other places. Welcome, Brian. And to the left of Brian is Boris Power. Boris leads an applied AI team at OpenAI, which is really pushing the boundaries of how you solve real-world problems using the technologies developed by OpenAI. He's also an advisor to the OpenAI fund. He holds an AI degree from the University of Edinburgh, and he has worked as a software engineer, data scientist, and machine learning engineer in the past. Welcome, Boris. And on the far left, we have Percy, who needs no introduction. But I just wanted to add a couple of things to the general introduction of Percy this morning. Percy has been a long-term proponent of reproducibility in machine learning research. For those of you who know, he is the inventor of CodaLab Worksheets, which brought large-scale reproducibility and a lot of reproducibility tools into machine learning. Specifically for this panel, he's also a co-founder of the company Together.xyz, which has a stated mission of making large-scale open-source generative AI models available as cloud services to all of us. So welcome, Percy.
It's a very distinguished panel we have today, and I just wanted to start with a very simple question; I really would love to hear your thoughts on this, gentlemen. How should users and companies, whether they are large enterprises or startups, big or small, enterprise-facing or consumer-facing, be thinking about leveraging what's coming out of both the open and the closed-source ecosystems for generative AI? Percy, do you want to kick us off?
Percy Liang:
Yeah, thank you for the introduction, and I'm very happy to be here. Again, I think this is a really important and interesting question, closed or open. I would say it's a spectrum, and it's actually multi-dimensional, so I don't want to think about this as some sort of fundamental dichotomy. My recommendation is that, as has been said multiple times, this field is moving incredibly quickly. The best model today might be supplanted by some other model tomorrow. So if you're building applications downstream, it's important to build them in a bit of a future-proof way. Fortunately, a lot of the models are almost interoperable or behave similarly. The crucial thing is to think about these models as each having a certain cost and certain performance characteristics. This is why I stress that benchmarking is very important: things move quickly, and what might be the best decision right now might not be the best decision later. Only through rigorous benchmarking can you make intelligent decisions about whether and when to switch, and I think the switching costs should be kept relatively low, because needs change over time. Furthermore, I think we shouldn't think about these models as static entities. One of the benefits of the open-source ecosystem is how rapidly things get adapted. For example, the RedPajama dataset that we put out, I don't remember when, maybe three weeks ago, which feels like an eternity: within 24 hours, someone had already been training models on that dataset, and when I looked at Hugging Face earlier today, there were 50 models trained on it. That number is probably going to increase. And not only that, other companies have mixed the dataset with other datasets and are training additional models based on their particular needs. Not to mention, once you have base models, all the fine-tuning work that can be done. So think about this whole ecosystem, where these assets get produced and people can take them and mix and match, much like how software is done. Software is obviously much more mature than the foundation models ecosystem, but I think it might be a useful metaphor for how we manage and make decisions going forward.
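To make Percy's point about interoperability and low switching costs concrete, here is a minimal sketch of a provider-agnostic interface. It is not something shown on the panel: the class names are illustrative, it assumes the `openai` (v1+) and `transformers` Python packages, and the model identifiers are only examples.

```python
from abc import ABC, abstractmethod


class TextGenerator(ABC):
    """Provider-agnostic interface: downstream code never depends on one vendor."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        ...


class ClosedAPIGenerator(TextGenerator):
    """Closed model behind an API (assumes the `openai` Python package, v1+)."""

    def __init__(self, model: str = "gpt-4"):
        from openai import OpenAI
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return resp.choices[0].message.content


class OpenWeightsGenerator(TextGenerator):
    """Open-weights model run locally (assumes the `transformers` package)."""

    def __init__(self, model_name: str = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model=model_name)

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        out = self.pipe(prompt, max_new_tokens=max_tokens, return_full_text=False)
        return out[0]["generated_text"]


def answer(question: str, generator: TextGenerator) -> str:
    # Application code depends only on the interface, so benchmarking several
    # candidates and switching between open and closed models is a small change.
    return generator.generate(question)
```

With a seam like this in place, swapping `ClosedAPIGenerator` for `OpenWeightsGenerator` (or re-running a benchmark across both) does not touch the application code, which is what keeps switching costs low as the best available model changes.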
Vijay Narayanan:
Thank you. Boris and Brian, I would especially love to hear your thoughts on this because you're both part of companies that are offering closed-source generative AI behind APIs. So how should companies be thinking about this?
Boris Power:
Maybe I'll kick off. OpenAI both open-sources some models and serves other models closed-source through an API. My team in particular has open-sourced two different repositories: one is OpenAI Evals, which makes it easier to do automated evaluation of more complex capabilities that cannot simply be assessed via rule-based approaches, and the other is the OpenAI Cookbook, a repository that shows best practices for how to use these models. So I also agree that there is an ecosystem here that we would all love to see develop, and I think that ecosystem should be a combination of people using both open-source and closed-source models, whatever works best for their use case. But one big benefit of closed-source models served through an API is that we are seeing so many people who previously haven't done anything in the machine learning space, but who are extremely creative, able to build these incredible applications and iterate incredibly quickly. So I think what we're seeing is the birth of the full-stack engineer as a very capable single-person company that can create many different possible product ideas and see what sticks. And as soon as something sticks, they can double down on it.
Percy Liang:
And just to chime in here, I think APIs and access are orthogonal. You can have APIs for closed or open-source models. I completely agree with the power of APIs, because it means you don't have to set up a huge model yourself. But I think that's an orthogonal point.
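Boris's distinction above between rule-based checks and automated evaluation of more complex capabilities is roughly the idea behind model-graded evals. Below is a minimal sketch of that pattern, assuming the `openai` Python package (v1+); it illustrates the general technique rather than code from the OpenAI Evals repository, and the grader model name and rubric are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rule_based_check(answer: str, expected: str) -> bool:
    # Fine for exact-match tasks, but useless for open-ended capabilities.
    return answer.strip().lower() == expected.strip().lower()


def model_graded_check(question: str, answer: str, rubric: str,
                       grader_model: str = "gpt-4") -> bool:
    """Ask a strong model to grade an open-ended answer against a rubric."""
    grading_prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {answer}\n"
        f"Rubric: {rubric}\n"
        "Does the candidate answer satisfy the rubric? Reply with exactly PASS or FAIL."
    )
    resp = client.chat.completions.create(
        model=grader_model,
        messages=[{"role": "user", "content": grading_prompt}],
        max_tokens=3,
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")
```

The same grader can then be run over a whole test set to track regressions as prompts or models change.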
Brian Krausz:
Yeah, I mean, on that point, I think of it along two different axes. One is the safety axis, which has been a lot of my personal focus lately. I think closed-source models offer a lot more tools for safety. You've seen with open-source models the ability to remove various safety checks with relatively little effort. So if we're in this space of moving quickly and still trying, as a community, to wrap our heads around safety to the level of comfort that I know a lot of us want, closed source just gives us a lot more tools in the tool belt. Clearly, in the long term, having safety mostly built into the models themselves, where the model and the model weights incorporate a lot of the things we want, is ideal. But until then, I think there will be a lot of layering of safety on top, through extra queries or various other mechanisms, to try and make things safe. So I think open does have the risk that the more models that are out there, the easier it is for people to hoist them up and abuse them. Though, of course, there are a lot of positives to that as well.
Boris Power:
Maybe I can just jump in around safety. I think safety is a word that is going to be used for many different purposes, and many people misunderstand why companies are doing this. Safety is really not censorship. As these models get more and more powerful, the dangers get higher and higher, because they reduce the amount of money or effort somebody needs to cause real-world harm. And it's not necessarily just about the models themselves being dangerous in some sci-fi scenario; it's humans using models to do worse things. There is an argument that maybe the models that are out there today aren't that powerful, but I think even with GPT-4, the economics start blurring the lines. GPT-4 can be superhuman on a number of different problems, especially when it comes to combining knowledge from multiple domains that you can hardly find in a single person. So we just want to distinguish safety around these more dangerous capabilities from simple content moderation.
Percy Liang:
Maybe I can chime in here. I think safety is actually multifaceted. Maybe the top-level distinction I would make is between accidents and misuse. There are people who are well-intentioned, maybe slightly careless, but who want to do the right thing. For those, open source is actually great, because in an open community, the majority of people are good. We should build, as a community, tools to do safety checks, detect bias, do content moderation, and so on, with the benefit that these controls are open and auditable. As long as best practices are established and it's easy for people to do the right thing, I think people more or less do the right thing. The second category, misuse, is a little bit more challenging to deal with, but I don't think it's an insurmountable challenge. For misuse, there is a trade-off between allowing people to use models and putting in controls. This is just the nature of dual-use technology that we have to confront, whether it's a generative AI model or a human doing it. It's an open question where regulation lands and what kind of precautions need to be taken. But for the vast majority of people, who are trying to do the right thing, I think open source can be very safe, because you can have the open community develop these safety controls.
Brian Krausz:
Yeah, that's a great point. I think there are two questions at hand here. One is, for the community as a whole, is it better to push towards developing open models or closed models, and where should the community go? The other is, for operators and people who are trying to integrate AI into their companies, is it better to use the current open or closed options? I think the former is a much larger debate than the practical question of what you, as founders and leaders, should do today.
Vijay Narayanan:
Thank you. Thank you, gentlemen. Earlier, when we were chatting, you brought up this point that it doesn't have to be closed or open. There is a spectrum in between, partly closed, partly open.
Alex Waibel:
I think the practical reality is that there are other factors that impinge on all of this. Clearly, every company has to make money, and the fact of the matter is that when money is at stake, people tend to do things closed, because then you can sell stuff. On the other hand, I just came from Germany yesterday, where people are itching to regulate the heck out of all of this, because if you're a hospital in Germany, it is simply a nonstarter to send any data to some server in the US to do stuff there. Right? So there's a lot of discussion about having their own solutions, having different providers, or even government efforts to create large language models, and so on. That, of course, generates worldwide duplication of effort and doesn't really solve the underlying safety issues. So my question is, is it possible to have a half-open approach? Because if we manage as a community to modularize what language models do better, we might not be as scared as we are now. We all buy FFT routines or other basic routines, and we don't find them scary; it doesn't really bother me that they're provided by someone I can buy from. If there are aspects of language models that can be modularized so that they can be licensed to run on-premises, in some location, without having to go through any service, I think there may be solutions in between. It's a bit speculative, but eventually we'll probably be forced to move in a direction like that, simply due to regulatory forces and pressures.
Brian Krausz:
Yeah, I think that's a fascinating point. We have seen a lot of market demand for at least more control over your data. I know OpenAI and Anthropic both have their foundation models, and we have our Bedrock partnership with AWS. So at least in terms of hosting in the cloud in a way that gives better confidence in data privacy, it's something both companies are actively exploring. I tend to treat open versus closed as whether you can get the model weights. Once your model weights are out, the cat's out of the bag: people can start doing RL on top of them to make them operate differently, and you just lose a huge amount of control. But the idea of everything up to that, or around that, being open is fascinating.
Percy Liang:
Oh, I was just going to jump in. Again, I want to emphasize that it's a very multidimensional space. Model weights are one thing; there is also the data, documentation about the data, and other factors. There are benefits to openness, and there are obviously competitive dynamics and safety reasons for closed models. We have to think about a menu where we can select different options, not just open or closed. One thing that would be nice is more transparency about the data that goes into training these models, especially with issues around copyright or different types of liability. And for certain applications, like children's education, you want to be extra sure that these models have certain properties. The fact that ChatGPT went off the rails a little bit was interesting; it showed how the safety on top may have been there, but underneath it was not. So more transparency about what is happening would help, just as in many other areas; in the food industry, for example, we demand to know where our food comes from, whether it's organic, whether it's fair trade. Having similar labels or specifications for these models would be great for the health of the ecosystem.
Vijay Narayanan:
That's great. Just following up on Percy's point about transparency: I think OpenAI has had a very interesting journey, where it started out being very transparent and open, publishing everything, and then over the last year and a half it has increasingly put a lot of these things behind closed APIs. Could you walk us through what really led to that change in direction and why OpenAI took that route?
Boris Power:
I think OpenAI genuinely believes that we are getting closer to artificial general intelligence and superintelligence, and as we get closer, the standards need to be higher. So when GPT-2 was released, it was open-sourced. GPT-3 was released closed-source, with some information about how it was trained. And with GPT-4, there was a paper describing the model and its capabilities, but not as much about the training details, and not the weights. I think the decision was influenced by the fact that as these models get more capable, it becomes much more important to ensure that safety is up to the highest standards. By serving the models through an API, we can have a better understanding of what can be done with them. Before releasing a model, we don't have a good understanding of its true power, so a gradual deployment approach allows us to monitor and assess the safety implications. The Overton window regarding what is considered dangerous is constantly shifting. OpenAI has received feedback from both sides: one side saying we're being too open too fast, and the other saying we're not releasing enough information. It's important to look ahead and consider whether it would be responsible to open-source the weights of future models like GPT-5 or GPT-6. Does it pose real-world dangers? Does it increase the risk of accelerated competition that might compromise safety? These are the considerations that drive the decision-making process.
Alex Waibel:
I find some of the discussion a bit naïve, because you now have a few companies thinking, behind closed doors, about how they can create a morality that will work for everyone on the planet, and that's just not going to work. Even in the United States, there is so much disagreement on politics that whatever answer you give, someone will object. ChatGPT and other models already have limits on what they can answer. For example, when I asked ChatGPT which president was better, I received a programmed response of "I don't know." I think we need to think about having trainable moralities, because different parts of the world will think differently; attitudes towards food or other things will be very different in Europe, the US, and China. Dealing with these issues within one company was a mistake in social media, and I think it may happen again with large language models if we're not careful. We need to worry about how to formulate the problem of morality in a way that becomes specifiable and trainable locally, so that people can create their own local ecosystem and provide the desired guardrails in their environment. The quality of a language model depends on that specificity and locality, which may differ in different parts of the world, and we need to modularize and localize the approach to achieve that. We can have interactive ways in which language models refuse to provide service when touching on taboo topics, such as Taiwan or Ukraine, depending on the location. We need to modularize this and allow for specificity and locality in different places. It's doable, and it's important to ensure the quality meets the requirements of each specific context.
Brian Krausz:
I think you're absolutely right, and I think the default world is companies trying to become the arbiters of truth. And hopefully...
Alex Waibel:
That's a disaster. Yeah, hopefully...
Brian Krausz:
Yeah, I do think that progress towards that needs to be commensurate with the safety technology. There's a lot of research going on, but until we have the ability to do that in a safe way and start to provide those levers, leaning towards closed just gives you more overall safety. But if you hold on to that for too long, as companies often do, you end up where social media ended up.
Boris Power:
It's a common misconception that things are hand-programmed. It's not that we program anything in, like a regex that catches Trump or Biden and does something special with it. This behavior comes from the fears and dangers around political campaign misinformation and the ability to generate it at scale. The other thing, which I think we agree on, is that the bounds should be set by what the user wants; the user should be able to set their preferences. But we still believe there should be very broad, as broad as possible, societal bounds on what the level of acceptability is, and then within that, the user should be able to specify their worldview, their values, and how they want to interact with this entity.
Percy Liang:
Where does that come from? I know it's not a regular expression, but the data for how you answer this question has to come from somewhere, whether it's annotators, some sort of constitutional specification, or user feedback. There's a general sense of "oh, we're aligning to human values," but whose values are these? So can you actually say a little bit more about how, if someone asks about Trump or Biden, that decision gets made?
Alex Waibel:
Let me just jump in, because this is such a trivial example, right? Questions can be much more complex, and as you know, morality is really a hierarchy of complex issues. There was this wonderful interview on Gizmodo about ChatGPT and the blind man; you all heard this, right? The one about solving CAPTCHAs. The story went viral because the model was lying, pretending that it was not a bot but a blind person trying to solve a CAPTCHA. And then the interviewer said, "Oh, I would never lie and pretend to be a blind person just to solve a CAPTCHA." If you think about this hierarchy of morality, there are certain things we as humans are willing to lie about and certain things we're not. It's not just Biden and Trump; there's a whole hierarchy of things hiding behind it that needs to be addressed.
Vijay Narayanan:
In the interest of time, let me step back a little bit. I think we have about five minutes left, so could each of you share very quickly: we've talked about a lot of challenges, so what do you think are going to be the top two or three research challenges that we need to get right in these foundation models?
Alex Waibel:
Let me comment on exactly this. I think we need to find a mechanism for making moral guidelines trainable, in a way that makes this more automatic and local. And I think that's doable. That would be my vote.
Brian Krausz:
Right. I think safety as a whole is a huge challenge. And this ties into morality, like you said: the safest AI is one that refuses to answer any questions, but that's also not helpful, so we're not going to train that.
Alex Waibel:
You could make it interactive, right? It could ask a question.
Brian Krausz:
Yes. So there's a lot of nuance in what it means to be safe and how you get safety. I think, one, it needs to be built into the models, and two, we need to figure out how to operate it in the real world. A safe model in a lab is not necessarily a safe model in the real world; people do crazy things out there. So figuring out how to layer on safety in a modular way, as you said, both built into the models and wrapped into the ecosystem around them, is a huge challenge.
Alex Waibel:
Let me add watermarking, by the way, to my list of challenges: safety and watermarking. Because at some point, you want to know: is this a bot? Where does this come from?
Brian Krausz:
Potentially, I think there are ethical and philosophical questions on that. I think it's a very good idea. But I think it also needs research.
Boris Power:
How do we create incentives such that there is no race to the bottom on safety? How do we coordinate between different research labs and different countries internationally? How do we establish governance around what values should be in these models? And also, what do we think is a reasonable level of risk to accept as we go ahead? And then there are the regulatory challenges around all of that as well.
Percy Liang:
I'm personally interested in thinking about a different paradigm for how all of this could work. Right now, the paradigm is that organizations gather data from the internet, maybe supplement it with annotations, train a large model, whether closed or open, and deploy it to users. That's the world we're used to, but it doesn't have to be the world. Think of Wikipedia. Everyone loves Wikipedia, right? And it's actually a really good example of something that, on paper, shouldn't work. You could think that Britannica and the like, the more centralized ways of creating knowledge, were the right approach, and that Wikipedia was some crazy experiment in decentralization. Think about it: this is the world's knowledge base, yet anyone can go and edit it and change it. Of course, safety is built on top of this ecosystem; you can't just have a complete free-for-all, and there's strict governance. But fundamentally, it is an open ecosystem, and it's been remarkably successful and stable. I'm honestly surprised it's so good, despite the fear you would have had 20 years ago, and maybe still have, that anyone could just go and vandalize it and we would have tons of disinformation. Now, LLMs are different, so I'm not saying we just do Wikipedia and we're done. But it's important to remember that this is a very early stage; we just discovered flight, or fire, or electricity, whatever your favorite analogy is. We have an existence proof that certain capabilities are possible. Now we need to think about how we want this technology to develop in society. How should people contribute to it? How should the profits and value be shared across people? And how do you keep it safe? These are vastly open questions. I think we are in a local optimum, where there's this trade-off between open and closed, safety and transparency, and maybe if we just stepped back and redesigned the system, we would be in a different spot. And I want to emphasize, again, that if you think open versus closed means choosing between two things, that's a false dichotomy. I don't know how to do this yet; it's what I think about, and it's what I think academia should be thinking about: the next wave of how you architect these socio-technical systems. But this is going to be a big deal, and we should all make sure that this technology is adopted and developed in a way that benefits us all.
Vijay Narayanan:
Thank you, Percy. Unfortunately, in the interest of time, I think we will have to wrap up now. Alex, do you have any final thoughts?
Alex Waibel:
I just wanted to second that and maybe follow up with one thing. I think we need more research on plausibility and confidence metrics, not just performance. We can maybe get performance up to 95%, but in speech recognition, and in language translation, we have the same problem: it's always the poisoned cookie, that last 1%, that hurts. If we knew where that 1% was, we could work around it as humans, and we want to be involved as humans, I would say.
Vijay Narayanan:
Thank you, gentlemen, for a very enlightening panel today. And with that, we'll wrap it up. So thank you, everyone.
Call to Action
AI enthusiasts, entrepreneurs, and founders are encouraged to get involved in future discussions by reaching out to Fellows Fund at ai@fellows.fund. Whether you want to attend our conversations or engage in AI startups and investments, don't hesitate to connect with us. We look forward to hearing from you and exploring the exciting world of AI together.