Speakers:
- Snorkel AI: Alex Ratner, Cofounder and CEO
- Spectrum Labs: Ryan Treichler, VP of Product
- Fiddler AI: Krishna Gade, Co-founder & CEO
- Jasper: Shane Orlick, President
- Coatue: Lucas Swisher, Co-Head of Growth & General Partner (Moderator)
Summary:
During the panel discussion featuring leaders from Snorkel AI, Spectrum Labs, Fiddler AI, and Jasper, moderated by Coatue, various aspects of AI were explored, including the AI stack, areas for improvement, where value will accrue across the layers, and challenges in AI deployment. The participants emphasized the significance of the data layer in achieving better performance and value in both generative and predictive AI. They also discussed the importance of trust and safety in AI, addressing concerns about biases and the need for safeguards.
The panelists highlighted the importance of solving real problems, building trust, and creating differentiated experiences to keep AI teams motivated and attract top talent. They emphasized the need to balance innovation with the practical goals of enterprise customers. The discussion also touched on prioritization strategies, with an emphasis on product-led growth (PLG) and demonstrating value to customers quickly. The panelists agreed that success can be found both in a PLG approach and in focusing on enterprise solutions, with the strongest companies pairing a product-led growth engine with a robust enterprise motion.
Overall, the panel discussion provided valuable insights into hyper-scaling in the AI field, identifying areas for improvement, addressing deployment challenges, and discussing strategies for growth and innovation.
Full Transcripts:
Lucas Swisher:
Hello, everyone. Thank you for joining all of us. We are very excited to have a number of great founders and execs here on this panel to discuss hyper-scaling in this current environment. Obviously, things move very quickly in AI these days. Every 30 seconds, we get a new press release or a new product announcement. And I think we have some of the world's experts in and around this category that can help us think through some of the challenges around hyper scaling. So just to kick things off, I figured I'd open up a question for the entire group. There's obviously a stack emerging around AI. You have the infrastructure layer, the language models, the tooling layer, all the way up to the application layer. I'm curious what folks think are some of the areas that need the most improvement or where there are the most opportunities, and where folks think the value might accrue at the end of the day.
Alex Ratner:
I guess I can take a first crack at this. I'll try to be a little bit more inflammatory to make this interesting because, you know, the last few panels I've attended had too much agreement. So let me see if I can strike up some interesting disagreement. I think there's an infrastructure layer, which includes everything from the hardware to the platforms for doing what we think of as model-centered development, training, and so on. And then I think there's a critical data layer. Am I supposed to introduce myself or not? Okay. If you look me up, you'll see where my biases are on what we call data-centric AI. That data layer is relevant for both generative and predictive AI, and it includes the labeling, classifying, and tagging on which a lot of traditional business value is built. And then you've got the application layer, which for predictive AI often looks like traditional MLOps, and for generative AI is a whole new set of copilot-style stacks and other patterns we're still inventing. But to make a more declarative claim, I believe that everything but the data layer, and maybe the UI layer for generative, is effectively commoditized. The models, the algorithms, the infrastructure are commoditized for all but the most specialized, bespoke edge use cases. It's all about the data. These days we're seeing tons of examples in an environment where a new language model is released every five seconds. We're now contributing to some of the open-source ones. You can spend a couple of hundred bucks and siphon off all the learnings from a closed-source model into open-source. Everyone's using the same models, everyone's using the same algorithms. Through my faculty affiliation at the University of Washington, along with collaborators at Apple, Google, and Stability AI, among others, we just released a benchmark called DataComp. This is my last point before I wrap up. We fixed everything in the code, from basic data loading to model architecture, for training a CLIP model, which is a multimodal image-text model for those who aren't familiar. And just by playing around with how the data is curated, filtered, and sampled, we achieved a new state of the art that beats OpenAI and everyone else at compute parity. So what's the point here? My claim would be that all the layers other than the data layer, especially if you care about enterprise and domain-specific, private enterprise data and knowledge, are either already or rapidly becoming commoditized. That's awesome for the space, but it makes things interesting, particularly for hyper-scaling, when you don't have some kind of advantage around unique data or knowledge.
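To make the data-curation claim concrete, here is a minimal, hypothetical sketch of the kind of filtering a DataComp-style pipeline applies: score each candidate image-text pair with a pretrained CLIP model and keep only the best-aligned fraction of the pool before training. The model choice, field names, and keep fraction are illustrative assumptions, not the actual DataComp code.

```python
# Hypothetical sketch: CLIP-score filtering of an image-text pool before training.
# Everything downstream (architecture, training loop) stays fixed; only the data changes.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, caption: str) -> float:
    """Alignment score between an image and its caption under the pretrained CLIP model."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits_per_image.item()  # scaled image-text similarity

def curate(pool: list[tuple[str, str]], keep_fraction: float = 0.3) -> list[tuple[str, str]]:
    """Keep only the best-aligned fraction of (image_path, caption) pairs."""
    ranked = sorted(pool, key=lambda pair: clip_score(*pair), reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]
```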
Krishna Gade:
Yeah, I think I second that. I think the other missing link in the generative AI world is the layer that can provide safeguards and guardrails for responsible AI usage. That's our mission at Fiddler, building trust and transparency in AI. Generative AI is even more magical than predictive AI, and we need to ensure that AI is serving society and customers in a trustworthy manner. I believe this area is underinvested and needs attention because companies won't productize AI apps without addressing this issue. We've been working on this at Fiddler for the past four years. In fact, we recently released an AI auditor to audit large language models and third-party models, looking for correctness, consistency, and safety issues. That's what we believe is the missing link.
Shane Orlick:
Cool. Yeah, I agree with everything you said as well. Today's a great day to be NVIDIA at the infrastructure layer, but I think only a handful of companies are going to make a lot of money in that space. For us at Jasper, we're biased toward the application layer. We have the opportunity to revolutionize the way enterprise software, particularly the UI, works for customers. We've been talking about productivity gains in enterprise software for the last 20 years, but in reality we haven't really delivered on that. We've made incremental improvements, but AI now has the opportunity to transform the way people interact with the software they use every day in their job. That's where I get most excited: how AI will transform the way people work. It's also the noisiest and most challenging layer, because choosing the right solution and the right products to adopt is a big challenge right now. But that's where I see the most potential.
Ryan Treichler:
Let me bridge the first two comments. I also run a Trust and Safety company that has been in the space for a while. I definitely see the need for that in the market. But the thing that actually drives the ability to do that effectively is data. The only way to solve a lot of these problems is with data. So I think it really does come back to that very first point. Without data, everything else is moot.
Lucas Swisher:
Yeah, I think that's a really good point. Krishna, you played into Ryan's point a little bit as well on the trust and safety piece, which I thought was interesting. Shane, to touch on Alex's point and question, you obviously have a lot of models on the backend of Jasper that you either play with or put into production. I'm curious, what are your thoughts on the open-source models versus the proprietary models? What do you think about the similarities and differences? What do you all prefer? Do you think that Alex is right about the commoditization aspect?
Shane Orlick:
Yeah, I'm a huge fan of innovation, obviously. So anything that brings additional technology into the world, as long as it's safe, unbiased, and differentiated from all the other models, I'm a huge fan of that. At Jasper, we quickly realized that it wouldn't be just a single model that would win. We initially started as a wrapper on OpenAI, but then our customers started asking for different experiences and use cases. So we built a very open AI engine that can plug into all the different models. For us, the more models, the better, as long as they're safe and provide differentiated experiences. We even built our own small models that are fine-tuned and outperform the large models. So we're totally open. The more, the better, as long as they're headed in the right direction.
Lucas Swisher:
Great, thank you. What do you all think are the most overrated growth opportunities and underrated growth opportunities in the generative AI stack?
Krishna Gade:
I can go first. Maybe this is a controversial thing. I do think I somewhat align with what Alex was saying. I'm doubtful about the large language models that companies have been investing millions of dollars in training. Eventually, I think smaller language models trained on domain-specific datasets can achieve comparable or even better performance. So I think a lot of investments have happened in the last few months on building these large language models, and I'm not sure if there will be long-term value there.
Alex Ratner:
I'll second that. I don't say this with sadness because the companies that have been building these large language models deserve credit for pushing the field forward. But I do think there's an overestimation of how much value they will capture. I believe there will be one or two winners for general commodity or consumer use cases, soaking up data from the internet and user feedback. It will likely be a winner-takes-all situation because it's all about the data and the exponential flywheel effect. For specialized tasks, we've already reached a saturation point where taking an open-source base and fine-tuning with your own data and knowledge will surpass the performance of large models. Even logistic regression models a million times smaller can outperform a GPT if trained properly. So, in my opinion, the value lies in specialized models that are tuned and built based on user feedback. I would say the most overrated are all but maybe one or two of the big closed-source models, and the most underrated are the incumbents—large companies with specialized data and existing workflows that can leverage open-source progress and operationalize their data and knowledge. These incumbents are underrated in terms of who will benefit from this tech shift.
Lucas Swisher:
Great, thank you. Alex, you've talked a lot about how data bottlenecks can hinder enterprises from harnessing the power of AI models. In general, what do you consider to be the major barriers and the missing link in the last mile of AI deployment? We often discuss AI in production, but we haven't seen a lot of it yet. There are many impressive demos out there, but how do we actually get AI into production and live for millions of users? What are the biggest bottlenecks?
Alex Ratner:
It's a great question, and I want to start by saying that, despite some of our marketing phrasing, data is not the only bottleneck. The others are actually well represented on this panel. Explainability and interpretability, the right UI, especially for these new generative paradigms where we're inventing new interfaces every day beyond just running model predictions, and trust and safety are all practical bottlenecks, especially in larger regulated industries like banks, government, healthcare, and so on. However, data is definitely one of the bottlenecks that often prevents enterprises from getting from the proverbial 80% to the 90%, 95%, or even 99% needed for production. My co-founder, Chris Re, is a professor at Stanford and helped found the Stanford Center for Research on Foundation Models, which gets at the reasons we say "foundation models" in the first place. One, they're not just about language; they can be built on any data, any graph structure, not just chains of tokens. Two, it's not just gen AI; plenty of boring, old predictive AI use cases get built on top of these. But number three is that these are usually first-mile tools: they get you to roughly 80% for most nontrivial tasks, not the easy tasks on web data, but things like extracting information from documents at a bank or from satellite images at a government agency. For all these bespoke use cases, these models are an order-of-magnitude improvement in the first mile. However, the last mile is what often blocks enterprises from achieving production-level deployment. Most enterprises cannot deploy a model that only achieves 80% accuracy, except perhaps in a copilot-type paradigm, which is still relatively new. So how do we go from 80% to 90%, 95%, or even higher? It all comes down to teaching the model with data. You may hear terms like fine-tuning, instruction tuning, RLHF, or prompting, but it's essentially about using labeled and curated data to teach the model about a specific task. This is where many teams get blocked, because data science teams often rely on line-of-business partners at banks, pharma companies, or health insurers to label more documents, which can cause significant delays. This is where our mission comes in: to accelerate that process. In general, data is an underappreciated bottleneck in AI deployment.
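As a concrete, hypothetical illustration of that last-mile step, the sketch below fine-tunes an open-source base model on a handful of labeled, task-specific examples, in the spirit of what Alex describes. The model name, label scheme, and hyperparameters are placeholder assumptions, not Snorkel's actual pipeline.

```python
# Minimal sketch: supervised fine-tuning of an open-source model on labeled,
# domain-specific data to push task accuracy past the "first mile".
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A toy labeled dataset; in practice this would be thousands of examples,
# often produced programmatically rather than hand-labeled one at a time.
examples = Dataset.from_dict({
    "text": ["The borrower waived the prepayment penalty.",
             "Quarterly interest accrues at the reference rate plus 2%."],
    "label": [1, 0],  # e.g. 1 = clause of interest, 0 = other
})

model_name = "distilbert-base-uncased"  # any open-source base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = examples.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8, logging_steps=10),
    train_dataset=train_ds,
)
trainer.train()
```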
Lucas Swisher:
That makes sense. Another bottleneck, of course, is trust and safety. With the explosion of content generated by AI in various modalities like text, images, and videos, how do you think about trust and safety in this new era of content generation?
Ryan Treichler:
It's been really interesting to see and discuss how challenging trust and safety has become. It used to be about detecting if someone was saying something harmful, but now we have to consider if the model can be prompted to do something harmful. The difference is that we used to look for patterns indicating if an adult was trying to groom a child, and we had tools to detect that. But now, people are using generative AI to create harmful imagery or toxic content by manipulating existing content, and bypassing established protocols. In the trust and safety space, generative AI has made it easier for people to circumvent existing tools. However, generative AI can also be used to combat these issues. We can use generative AI to create synthetic datasets and larger models that help detect harmful content. While we haven't seen a significant explosion of harmful content yet, many are concerned about its rapid growth. Customers often approach us looking for ways to identify and detect harmful content generated by AI. It's a challenge that we need to address, especially considering the potential for spear phishing, radicalization, and other harmful activities. Additionally, there is a deeper level of trust and safety that involves addressing bias in the training data used for these models. Biases in data can lead to unintended harm, and it's important to proactively remove biases and ensure fairness in AI systems.
Lucas Swisher:
Ryan, I think you just made a pitch for Krishna's startup. Krishna, how do you approach the problem of trust and safety and the potential bias in generative AI?
Krishna Gade:
Yeah, absolutely. There are two things here, right? One is the productization of trustworthy generative AI apps, or any AI apps, which requires companies to invest in ML infrastructure. When we started Fiddler, we were evangelizing explainability and observability for traditional machine learning models, and companies had to build all of this ML infrastructure for model versioning, governance, and performance monitoring. Fast forward four years, and there are still many companies that haven't embarked on this journey. Now, with the advent of LLMs and the concept of LLMOps, companies need to build the necessary infrastructure for productization. Without addressing observability, setting up perimeter controls, and tackling bias and safety concerns, productizing will be challenging, particularly in segments like financial services, healthcare, or HR, where regulations are likely to emerge to ensure trustworthy AI applications. There's no silver bullet; it requires hard work. As Peter Norvig mentioned in a recent panel discussion, building trustworthy ML applications requires ensuring the model performs well on test data, understanding how it works, monitoring it continuously, and having alerts in place for when things go wrong. Achieving a positive return on investment is not a free lunch.
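To ground the "continuous monitoring and alerts" point, here is a small, generic sketch of one common check: comparing the live distribution of a model score against a training-time baseline with the population stability index (PSI) and alerting when drift exceeds a rule-of-thumb threshold. This illustrates the general idea only; it is not Fiddler's implementation.

```python
# Generic drift check: PSI between a baseline sample and a live sample of a model metric.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population stability index between baseline and live distributions."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    base_counts, _ = np.histogram(np.clip(baseline, edges[0], edges[-1]), edges)
    curr_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), edges)
    base_pct = base_counts / len(baseline) + 1e-6
    curr_pct = curr_counts / len(current) + 1e-6
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

def check_drift(baseline_scores, live_scores, threshold: float = 0.2) -> None:
    value = psi(np.asarray(baseline_scores), np.asarray(live_scores))
    if value > threshold:  # 0.2 is a commonly cited alerting level for PSI
        print(f"ALERT: score drift PSI={value:.3f} exceeds {threshold}")
```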
Alex Ratner:
And just to tack on quickly, first of all, plus one to Ryan's points about the danger here. I think some of the more pedestrian dangers, well below the Skynet-takes-over scenario, are things like mass-scale disinformation. It's really exploding, and it seems like it has the potential to be exploited far more than it has been. I was on a panel a year and a half ago at an internal Google Ads AI conference, and I kept getting questions from folks there about what happens when the web is filled with auto-generated content. I was like, what are these people asking me about? It seemed very sci-fi; I thought I'd be getting nerdy questions about loss functions and stuff like that. But no, there it is, right? This stuff is real. And then I think the point also is that it does come down to the data. I have to take the bait and stick to our shtick, but it is something that's critical. One thing I'd like to point out, which I think is a little helpful, is that people talk about biases and hallucinations as the terms du jour. And it's true. There was a paper that came out a night or two ago at one of the NLP conferences that looks at factuality. If you ask ChatGPT or GPT-4 a question, cross-check the answer against Wikipedia, and look at how many of the atomic statements it makes are right, GPT-4 is at something like 48%. You can help with retrieval, and there are all kinds of things there. But still, biases are real. This is real. But I think it's worth accentuating that one of the reasons I don't like the term hallucination is that it makes this sound mystical or emergent or surprising. It's not at all surprising. These models were never trained to do some kind of task, or answer some kind of question, in a factual and bias-free way. They were trained to produce statistically plausible completions of prompts based on whatever dataset was cobbled together and soaked up. So it's not at all surprising that they don't work bias-free and error-free in specific settings out of the box. Surprise, surprise: you need to actually put some intentionality into the data at every stage that goes into these models. They're just counting machines on top of that data; it's been the same for decades. So that's a plus one on those points, but also, there's no mysticism around it. It's just a missing step, and the thing you have to do to get serious with AI deployment is get the data right.
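The factuality check Alex mentions can be sketched roughly as follows: split an answer into atomic claims, retrieve a reference passage, and test whether the reference entails each claim. This is a loose, hypothetical approximation of that kind of evaluation; the naive sentence splitter, the NLI model, and the single-passage retrieval are all simplifying assumptions.

```python
# Rough sketch: score the factual precision of a model answer against Wikipedia.
import torch
import wikipedia
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(nli_name)
nli = AutoModelForSequenceClassification.from_pretrained(nli_name)

def supported(claim: str, topic: str) -> bool:
    """Does the Wikipedia summary for `topic` entail `claim`?"""
    premise = wikipedia.summary(topic, sentences=10)
    inputs = tokenizer(premise, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**inputs).logits.softmax(-1)[0]
    return bool(probs[2] > 0.5)  # index 2 = entailment for this model

def factual_precision(answer: str, topic: str) -> float:
    """Fraction of (naively split) atomic claims in `answer` supported by the reference."""
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    return sum(supported(c, topic) for c in claims) / max(1, len(claims))
```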
Lucas Swisher:
Yep. Changing gears just a little bit. You're all part of companies that are extraordinarily fast-growing, both in terms of headcount and metrics and all of these things. How do you keep your teams focused? How do you keep attracting the best talent, retaining the best talent? And maybe most importantly, in this environment, how do you keep innovating when things are moving so quickly?
Shane Orlick:
Yeah, it's so exciting to hear everybody talking about this final-mile problem. It's like autonomous vehicles, where the last 10% is 90% of the work. We've been doing generative AI for the last two and a half years, which doesn't seem like a long time, but in reality that was two years before ChatGPT. We rolled out generative AI, and content creation was this big "wow" moment for a lot of people. But then it played out for a year, we got bored of it, and our customers got bored of it. So we started really focusing on what our customers want and need. How do we actually make their lives better? It turns out, it's not just content creation. It's also distributing it through the right mechanism, analyzing it, improving it: the whole chain around the problem. By solving that problem, we're able to attract the most talented people. We're not just solving this high-level, generic problem of putting an API in our product and claiming to be an AI company. We're actually solving real problems for real customers and making their lives better. Right now, talent is the key. It's not about having a million ideas or money; it's about having the right people who will execute that final mile and care passionately about the problem they're solving. By focusing on our customers, we've been able to attract good people. Once you have good people and you listen to them, the rest falls into place. But it's exciting to see companies go beyond just putting an API link in their product and watching their stock go up. The real question is, how will people actually improve the way they work or live because of this?
Alex Ratner:
Just a quick plus one. It brings up a point: Even evaluating how accurate we are with these new-gen AI models, let alone how that maps to real ROI or business value, is a legitimate technical challenge these days. I think that says a lot about the state of where we are. We're still struggling to understand how to measure that, at least. It's something we have to do before we can really get serious about improving it. We're in a phase where we have to move towards that. I'll give a quick answer. There are all the standard challenges of growing quickly, about organizational maturation. Others can speak more about it—the maturing of sales, transitioning from founder-led to a more rigorous process for assessing technical and business value and alignment, and building up the product engineering org. People like to hack, and there's a lot of excitement about doing a three-person generative AI seed startup where you can hack together a demo that looks awesome in 15 minutes or less. But how do you keep people like that excited working on real systems that take more than 15 minutes of hacking? Not everything is about glory on Twitter. It's a challenge we think about a lot, especially since we sell to enterprise customers, working with top banks, government agencies, insurers, and healthcare organizations. How do you bridge the gap between our internal teams that want to be hacking on cool stuff and our customers, who are data scientists watching the same Twitter threads? It's about finding a balance between innovation in a fast-moving space and meeting the pragmatic goals of enterprise customers. We have to stay grounded in the things that actually matter while doing the new stuff. 80% of our customer base in the Fortune 1000 isn't using foundation models in production, and they probably won't for their critical high-value applications for at least a year or two, if not longer in some regulated industries. So how do you serve those basic, pragmatic enterprise goals while being exciting enough internally and externally with all the new stuff? It's all about balance, one of the challenges for AI companies.
Krishna Gade:
I was talking to one of our banking customers, and on one side we are looking into how to evaluate and audit LLMs. On the other side, this same customer has a simple NLP-based chatbot that uses a decision-tree-based routing mechanism, and they're asking whether the feedback from users can be monitored. They don't have even basic monitoring to know how often the chatbot is actually working well for them. So yes, there is a vast difference in AI maturity across the board. Especially for AI startups, there's the innovation part, but there's also the real bread-and-butter business where you have to do a lot of basic things.
Ryan Treichler:
Building on what Alex said, one of the things that we run into is that it's really easy, particularly with LLMs, to get an experiment out and do something in 5-10 minutes. But when we start looking at the results, it may look really good on the surface, but then we start probing and realize that there are issues. We find things that are wrong, especially when we go across languages. So the ability to take that initial experiment and turn it into something valuable can sometimes be difficult because people get sidetracked. They say, "Hey, you did the thing in five minutes and showed that it worked. Now it's going to take a month and a half to get it into production." There's not always an appetite for that. We also struggle with distinguishing between a shiny object and a paradigm shift. We need to determine if something is a major change that will be valuable and worth investing time in, or if it's just a novel thing that won't move the needle. When you have creative and talented people who like to hack, they may run away with shiny object stuff that may not be a paradigm shift. So it's important to stay focused on the big things while allowing creativity and exploration.
Lucas Swisher:
That's an important point. I'm curious, how do you prioritize from a product perspective? How do you balance your own roadmap and vision with what's happening in the ecosystem? How do those things interact and how do you prioritize?
Shane Orlick:
For us, we want to move quickly but deliberately. It doesn't have to be perfect; it just has to be fast. We look at where adoption is coming from among our customers, and we invest more there. We've shifted into a more structured and predictable environment. Before, we would run a bunch of experiments because it's so easy and quick, but then we didn't know which ones worked. Now we have a dedicated growth team that sits cross-functionally across product, engineering, sales, marketing, and more. They look at funnel metrics. We have a spreadsheet with over 20 experiments, and we make people commit to what problem each experiment is trying to solve, whether it's reducing churn, increasing expansion, or improving engagement. They assign a number to project the impact, and we prioritize from there. Success mapping helps with these decisions. In this environment, as an AI company, it's easy to grow quickly and get customers, money, and traction. But it's important to go back to the fundamentals and not get too carried away with the hype. We had to slow down and focus on the basics, and it made a big impact on the business.
Alex Ratner:
I think that point about experimentation, and anchoring those experiments on real metrics that tie to some end outcome or business value, is really, really great, if I'm paraphrasing it in an OK way. But you know, we've tried to think about how we prioritize which use cases we build around. We have a platform called Snorkel Flow, which is for doing labeling and other data operations programmatically and in a first-class way, rather than treating them as second-class manual processes. It serves a broad surface area, because any model you want to build requires some aspect of data labeling or curation to get it right. So for us, it's about being experimental. We see if someone will pay for it and learn from that. We have teams that sit between go-to-market and engineering, and they continually explore and experiment. We put things out there through academic labs or open-source demos and run proofs of concept. Then we work with a few design partners for each new use case to validate whether they'll pay for it. We only work with people who are willing to pay. This approach gives us better data on where we should build, considering the broad surface area of AI technology. It may be slower and more painful, but it helps us make informed decisions.
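As a rough illustration of what labeling "programmatically" can look like, the toy sketch below uses the open-source Snorkel library (which predates, and is distinct from, the commercial Snorkel Flow platform) to write labeling functions and combine their noisy votes into training labels. The rules and data here are hypothetical examples of the general technique, not a Snorkel Flow workflow.

```python
# Toy programmatic labeling: labeling functions plus a label model over their noisy votes.
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function
from snorkel.labeling.model import LabelModel

SPAM, HAM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_contains_link(x):
    # Heuristic rule: messages with links are likely spam.
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Heuristic rule: very short messages are likely benign.
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df = pd.DataFrame({"text": ["check out http://spam.example", "thanks, see you tomorrow"]})
applier = PandasLFApplier(lfs=[lf_contains_link, lf_short_message])
L_train = applier.apply(df)

# Combine the labeling-function votes into probabilistic training labels.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100)
df["label"] = label_model.predict(L_train)
```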
Ryan Treichler:
There's a marketing technique called a painted door test, where you create a fake version of the product and build web pages for it. You see whether people get all the way through the purchase funnel and click purchase, and then it says the product is out of stock. It helps validate whether people will actually buy it. We use similar techniques to validate interest and willingness to purchase before actually building the product.
Alex Ratner:
I love that example. You can design your product to facilitate this kind of experimentation. We build our product with an SDK, a Python SDK, and a UI layer. We can do marketing tests with the Python SDK for more advanced users and graduate it based on data. Designing your product in this way allows for experimentation and learning.
Lucas Swisher:
This has been fantastic. We have time for one, maybe two more questions.
Audience Question 1:
Hi, my name is Konstantin Bayandin, and I'm the founder of Tomi.ai. I have a question about hyper-growth in gen AI. Where should a founder start in this field? I love the concept of creating value with general models for the mass market, rather than building a vertical application for existing incumbent players. Data collection and ownership seem to be the key advantages in this game. There's a spectrum of things a founder can do, from quick product-led growth to long experiments within large companies. Do you believe one side of the spectrum will see more unicorns and value creation, or are both sides equal?
Alex Ratner:
It's a great question. I think both sides have advantages and disadvantages. It really depends on the specific circumstances and the opportunities available. Companies like Jasper that can demonstrate value quickly and have a product-led growth model can certainly see success. On the other hand, large companies with vertical expertise and extensive data may have longer cycles but also possess valuable assets. Ultimately, it's about finding the right fit for your product and market. It's difficult to predict which side of the spectrum will see more unicorns or value creation. Each approach has its own unique opportunities and challenges, and success can be found in both.
Shane Orlick:
So I'll take a stab at this first. Before Jasper, I was historically an enterprise person, so the organizations I was part of mostly had an enterprise motion. Regarding the long sales cycle you mentioned: with Jasper, we have a product-led growth engine that now feeds into our enterprise motion. So my short answer is both. If you can create a flywheel effect, even if it's not your full product but just a way to generate leads and get people in the door, that's powerful, and I love what you said about the trial. That's exactly what our freemium and self-service products do. The product on our website is self-serve, priced from about $29 up to $99. If you want to use that product, it's a single-player motion, and it may extend to four or five seats. Once it reaches ten seats, it transitions into the departmental enterprise motion and goes through a sales organization with a customer success team and dedicated marketing support. So if you can feed the enterprise motion that way, it's much better. At my previous company, we had 90 BDRs making cold calls to set up meetings for the sales team, and it was expensive and slow. By the time you get a motion like that going today, the AI landscape will have advanced even further. Therefore, I believe you need both approaches.
Ryan Treichler:
I agree with that. In my recent research, I found that there were five or six unicorns in this space, excluding OpenAI, and they all had a product-led growth (PLG) component. If you try to focus solely on enterprise without product-led growth, you'll be selling yourself short and struggling to capture market share. Many companies have made it easy for customers to try their products before committing to an enterprise motion. It's challenging to put customers through a lengthy enterprise process when they can try a competitor's offering for a low price or even free. Speed is crucial, as by the time you build your enterprise solution, someone might have already developed a similar out-of-the-box solution that customers can download and use. Being a unicorn doesn't guarantee relevance today. The focus should be on how to remain relevant in the coming years. That's why we prioritize PLG, which keeps us connected to our customers and their needs. Listening to them helps us determine what to build. However, you still need a robust enterprise team to support customers who require a more comprehensive solution.
Lucas Swisher:
Got it. Thank you. We only had time for one audience question, but we appreciate everyone's participation. Thank you to the panelists as well. Thank you.
Call to Action
AI enthusiasts, entrepreneurs, and founders are encouraged to get involved in future discussions by reaching out to Fellows Fund at ai@fellows.fund. Whether you want to attend our conversations or engage in AI startups and investments, don't hesitate to connect with us. We look forward to hearing from you and exploring the exciting world of AI together.