Speakers:
- Zoom: Vijay Parthasarathy, Head of AI
- Fireworks.ai: Lin Qiao, Cofounder and CEO, Co-creator of PyTorch at Meta
- ServiceNow: Ravi Krishnamurthy, Sr. Director of Product
- Google: Jianchang Mao, VP of Engineering
- Instacart: Haixun Wang, VP of Engineering (Moderator)
Summary:
A panel of tech experts from Zoom, Fireworks.ai, Google, ServiceNow, and Instacart explores the field of Machine Learning Operations (MLOps) and the emerging practice of operating large language models, known as foundational model operations (FMOps). They emphasize the increasing significance of MLOps in optimizing machine learning life cycles and productivity across industries. The panelists share their experiences in implementing and managing MLOps, highlighting the challenges of integrating new components seamlessly and the complexities of foundational model operations.
The discussion focuses on understanding foundational model operations and how they differ from traditional MLOps. The panelists address challenges such as the size and complexity of foundational models, their frequent updates, version control, and the importance of governance. They also discuss the coexistence of MLOps and FMOps in a company's ML stack, with the choice depending on factors like use case, performance requirements, computing resources, and cost considerations. The panelists predict that foundational models may gradually replace specialized models over time as they become more efficient and capable.
The panel concludes by noting the importance of flexibility and continuous innovation in managing MLOps to meet evolving demands and challenges. They anticipate that the AI landscape will find a balance between a single foundational model and numerous task-specific models, with enterprises selecting cost-efficient and high-quality solutions based on their needs.
The full transcript:
Haixun Wang:
Welcome back, everyone. At this event, we have an exceptional panel that will delve into a fascinating problem. Just a few years ago, many companies started recognizing the significance of MLOps. MLOps helps in developing and managing the ML lifecycle, greatly improving productivity in companies. As someone who has managed ML and MLOps, I understand the immense effort put into this domain. Every company has a unique ML stack, and we work with various service providers to integrate components into our stack. The challenge lies in making these components work seamlessly with the rest of the system. Just as we were about to reap the benefits of MLOps, a new concept emerged in the ML world, known as large language model Ops or foundational model Ops. This raises questions like: Where does this fit in? Does it mean discarding all of MLOps and starting from scratch? And, most importantly, what is the definition of large language model Ops or foundational model Ops? It's not entirely clear. That's why we have organized this distinguished panel, consisting of seasoned leaders and technical experts from established tech giants like Google as well as innovative startups. Instead of me introducing each of them, I'll let them briefly introduce themselves before we begin the panel.
Vijay Parthasarathy:
Hello, I'm Vijay Parthasarathy, Head of AI at Zoom. At Zoom, we focus on various problems ranging from computer vision to NLU and large language models. Excited to be here.
Lin Qiao:
Thank you for having me. My name is Lin. I worked at Facebook for seven years, primarily focused on building our platform and fully productionizing PyTorch. We supported all product needs across different surface areas, including deployment in data centers, on mobile phones, and on AR/VR devices. Our main challenge was connecting AI innovation with business tasks at Facebook. Now, as I look at the industry, we are going through a tidal wave of AI-first transition. Last year, we started a new company called Fireworks with the mission to connect AI innovation with enterprise products, making them better and transforming the enterprise. If you have any questions, I'd be happy to chat about them.
JC Mao:
I'm JC Mao. Thank you for having me here. Previously, I was the VP of Engineering for Google Assistant, responsible for developing the Google Assistant product and foundational technologies, including speech and natural language processing. These technologies power not only Google Assistant but also other Google services, including the cloud. Before that, I served as a corporate VP at Microsoft, responsible for advertising, including product management, engineering, and operations.
Ravi Krishnamurthy:
Thank you for having me. I'm Ravi Krishnamurthy, and I run product for the platform at ServiceNow. Our MLOps stack faces a lot of complexity because we serve a platform with numerous customers, each with their own specific training needs. We've been utilizing language models for some time, but the recent pace and scale of change have accelerated the evolution of our stack. Our focus lies on specific use cases within our domains, as well as on enabling customers to incorporate AI into their workflows and applications. Operationalizing AI and ensuring responsible governance are key aspects we tackle in our MLOps. I look forward to sharing our experiences with all of you.
Haixun Wang:
Thank you very much. I'm from Instacart, and it seems we are about to become clients of each of the companies represented here. It's an interesting setting. Now, let's move on to the first question. We believe foundational models will be a crucial component of the modern AI stack. They offer the potential to enable a wide range of applications with a single model, through fine-tuning and other tuning methods. So, the question is, how do you see foundational model Ops differing from MLOps? What changes come into play? Anyone?
JC Mao:
I can start by addressing some of the obvious differences between FMOps and traditional MLOps. The deployment and management of foundational models in production and their lifecycle is the primary focus of FMOps. This domain presents unique challenges that set it apart from traditional MLOps. Let me highlight a couple of these challenges, and others can chime in as well. First and foremost is the size and complexity of foundational models. These models are typically very large and complex, often consisting of hundreds of billions of parameters and even reaching trillions of parameters in the near future. In fact, there are already models in the labs with over a trillion parameters. This requires substantial computational resources for deployment and management. FMOps must tackle the challenge of deploying such large models across distributed systems, leveraging specialized hardware accelerators like GPUs and TPUs.
The second challenge lies in model update and version control, which is crucial because foundational models are frequently updated. OpenAI and Google, for example, have already released multiple updates since the introduction of GPT and ChatGPT. This rapid update pace is likely to continue, especially within the open-source community. While this is positive for the ecosystem, it poses challenges for FMOps. They need to keep the production systems up to date, track versions, and manage the relationships between foundational models and fine-tuned derived models. These are just a few of the challenges FMOps faces. Does anyone else have additional insights to share?
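To make the version-control point concrete, here is a minimal sketch (an illustration, not something described on the panel) of a registry that records the lineage between a foundational model and the fine-tuned models derived from it, so that a base-model update can flag which derivatives need re-tuning or re-evaluation. All model names and fields below are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ModelVersion:
    """One registered model version; base_* is None for a foundational (root) model."""
    name: str                              # e.g. "acme-foundation" (hypothetical name)
    version: str                           # e.g. "v1"
    base_name: Optional[str] = None        # foundational model this was fine-tuned from
    base_version: Optional[str] = None
    registered_on: date = field(default_factory=date.today)


class ModelRegistry:
    """Tracks lineage so a base-model update can surface stale fine-tuned derivatives."""

    def __init__(self) -> None:
        self._versions: List[ModelVersion] = []

    def register(self, mv: ModelVersion) -> None:
        self._versions.append(mv)

    def derivatives_of(self, base_name: str, base_version: str) -> List[ModelVersion]:
        return [m for m in self._versions
                if m.base_name == base_name and m.base_version == base_version]


# When the foundational model moves from v1 to v2, every fine-tuned model still
# pinned to v1 can be listed for re-tuning or re-evaluation.
registry = ModelRegistry()
registry.register(ModelVersion("acme-foundation", "v1"))
registry.register(ModelVersion("ticket-summarizer", "v1",
                               base_name="acme-foundation", base_version="v1"))
needs_review = registry.derivatives_of("acme-foundation", "v1")
```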
Lin Qiao:
I fully agree with JC. As someone who has worked extensively with PyTorch, I have a deep appreciation for the open-source community. The current pace of innovation in the foundational model space is unprecedented. From an operational perspective, the traditional approach of putting constraints on production to ensure reliability and meet SLAs may not be applicable in this case. The focus should shift towards enabling innovation across different product areas that can leverage the rapid advancements in AI technology. Operational considerations should strike a balance between flexibility and traditional production operations. They should not hinder the integration of constantly evolving patterns, model qualities, trade-offs between quality and performance, or different task optimizations. The operational framework must allow for these possibilities, empowering product teams across different enterprises to continue innovating. I believe this current wave of foundational models will ultimately drive product innovation. However, it is essential to justify the higher operating costs associated with these models by delivering a significantly higher ROI compared to traditional machine learning algorithms. In summary, accelerating the velocity of innovation, maintaining flexibility, and enabling rapid product experimentation based on foundational models will be the next wave of challenges.
Haixun Wang:
Do any of the other panelists have additional insights to share?
Ravi Krishnamurthy:
From my perspective, when we think about MLOps, there is a hierarchy of models involved in operations. At the base, we have the foundational model, which can be fine-tuned. For example, for code assist we have a foundational model called BigCode, which we fine-tune for ServiceNow's own JavaScript-based scripting language. There are also task-specific models and evaluation frameworks tailored to specific tasks. We face significant challenges in benchmarking, especially in the context of larger-scale AI models. Additionally, we have customer-specific models, and on top of it all, there are applications built using these models. This hierarchy makes the stack substantially different, with each part requiring distinct considerations. We shouldn't forget the importance of governance across this entire stack: ensuring that changes are tracked, that data is used appropriately, and that overall accountability is maintained. Depending on where value is derived, the stack may vary for different companies. Not everyone will build a large foundational model; some may only use one through an API. Therefore, flexibility and agility in MLOps will be crucial. Previously, we aimed to run models for several years, but with the current pace of change, it might be a matter of running them for just a few months or even a few days. These are some of the key differences that I believe set FMOps apart.
Vijay Parthasarathy:
I'd like to add to what has already been mentioned. Measurement is crucial because it enables improvement. However, measuring these models presents additional challenges. We need to ensure speed while being mindful that replacing an existing model can introduce unexpected issues elsewhere. Therefore, more work needs to be done on how to measure and provide feedback effectively, as well as how to run the service seamlessly. We can no longer rely on a single metric as we did for classification tasks. Additionally, as models grow larger, training times increase, and fine-tuning models poses its own research challenges. It often feels like we're taking a step back, even when using prompt engineering or third-party services; overcoming these problems will bring us closer to the performance levels we had before. Thus, addressing these challenges is crucial.
JC Mao:
I completely agree with what has been mentioned. I would like to add one more point regarding reproducibility and benchmarking. Professor Liang highlighted the importance of benchmarking earlier today. However, if you cannot reliably reproduce results from large foundational models, benchmarking becomes challenging. Ensuring the reproducibility of foundational models is a key task for FMOps. This can be achieved through mechanisms that track experiment settings, capture model configurations, and establish relationships between foundation models and their derived models, with careful bookkeeping. Such practices will enable developers to reliably reproduce results.
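As a sketch of the bookkeeping JC describes (capturing experiment settings, model configuration, and lineage so that a fine-tuning run can be reproduced later), the snippet below writes a simple JSON manifest. The fields and file layout are assumptions for illustration, not an established FMOps standard:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a checkpoint file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_manifest(run_dir: Path, base_model: str, base_checkpoint: Path,
                     hyperparameters: dict, seed: int, dataset_version: str) -> Path:
    """Write a JSON manifest recording what is needed to reproduce a fine-tuning run."""
    manifest = {
        "base_model": base_model,                       # foundational model this run derives from
        "base_checkpoint_sha256": sha256_of(base_checkpoint),
        "dataset_version": dataset_version,             # pin the data, not just the code
        "hyperparameters": hyperparameters,
        "random_seed": seed,
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out
```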
Haixun Wang:
Yes, these are all excellent points. The uniqueness of foundational models necessitates a different approach from what we have been doing. Now, let's move on to our second question. Many companies, especially smaller ones, have a small team of about 10 to 20 people managing the entire ML stack for the company. They already have MLOps in place to support their traditional machine learning models, ranging from 20 to 200 models. The question is, how do these two components come together? How do the existing ML stack and the new and important component of foundational models converge? Many companies, even those with data lakes, are now exploring whether to provide a complete ML stack, like MLflow, for managing the ML lifecycle, while also delving into foundational models. From a user perspective, how do you envision these two components eventually merging?
Ravi Krishnamurthy:
I can provide some insights. What hasn't changed are the customer problems we're solving. The fundamental problems remain the same: improving productivity, service delivery, quality, and managing workflows. At ServiceNow, we continue to address these fundamental customer problems. The question arises as to whether the new advancements solve these problems better. For instance, some of our older discriminative models cost only fractions of a cent to serve, and we tend to use them indiscriminately. Should we replace them with models that are 10 or 100 times more expensive? Are we achieving a higher ROI? These are the questions we need to answer. As Vijay and JC mentioned, benchmarking and understanding when to replace models are crucial. There will likely be a period of coexistence for these two components, and the duration of this period depends on factors such as cost curves. Predicting the exact timeframe is challenging. In the near future, I anticipate running both traditional specialized models and fine-tuned foundational models together. We are developing pipeline architectures that combine the best of both. However, these are internal developments, and there are no existing market tools or solutions available.
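To illustrate the kind of coexistence Ravi describes, here is a minimal routing sketch (an illustration, not ServiceNow's actual pipeline): a request is served by a cheap specialized classifier when it is confident, and escalated to the more expensive fine-tuned foundational model only otherwise. The threshold and both callables are hypothetical:

```python
from typing import Callable, Tuple


def route_request(text: str,
                  cheap_classifier: Callable[[str], Tuple[str, float]],
                  llm_fallback: Callable[[str], str],
                  confidence_threshold: float = 0.9) -> str:
    """Serve with the low-cost specialized model when it is confident enough,
    and escalate to the fine-tuned foundational model only otherwise."""
    label, confidence = cheap_classifier(text)
    if confidence >= confidence_threshold:
        return label               # fractions of a cent per request
    return llm_fallback(text)      # 10-100x more expensive, so used sparingly
```

The appeal of this pattern is economic: the expensive model's cost is paid only on the fraction of traffic where the cheap model is unsure, which is one way the two paradigms can coexist while the cost curves evolve.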
JC Mao:
I fully agree that these two paradigms will coexist, at least for the near future. The choice between traditional specialized models and fine-tuned foundational models depends on several factors, including the nature of the use cases, performance requirements, the availability of computing resources, and the cost of serving. As foundational models improve and become more capable, they can replace specialized models through fine-tuning. One significant advantage of foundational models is their horizontal scalability across a diverse set of tasks. They can scale horizontally more easily than specialized models, which reduces development costs. Currently, the computational cost is high, but it will likely decrease over time. In the short term, I see four scenarios. First, companies can leverage the best of both worlds, depending on design choices and other factors. Second, foundational models can serve as building blocks and starting points for fine-tuning and customization. Third, reliability and interpretability can play a crucial role in the choice. And fourth, foundational models can gradually replace traditional specialized models as they become more capable and efficient.
Lin Qiao:
That's a great analogy. To make this conversation more interesting, let's explore two extreme possibilities. On the one hand, there's the idea of a single foundational model dominating all AI tasks, solving every problem at hand. On the other hand, we have task-specific smaller models, potentially numbering in the hundreds or thousands. Users, businesses, and enterprises can select the most suitable model to address their specific needs. To illustrate this, let's draw an analogy from the consumer market. How many cars do you have at home? Three? Perfect. Now, think about cars. You can have a truck that can handle heavy lifting and multiple tasks efficiently. At the same time, you can have small commuter cars for daily commutes of 10 miles between home and work. If you're a student on campus, a bike may be all you need. In this analogy, you have a variety of vehicles at your disposal, and you can choose the best one for each situation. However, it wouldn't make sense to have only a truck at home and use it for every task. The same principle applies to the AI world. One size does not fit all, considering the economic and efficiency aspects. Consumers and enterprises are driven by the need for extreme efficiency and cost-effectiveness. Let's take a look at our kitchens. How many specialized pans or cookware sets do we have? We often purchase sets that include a small pan specifically designed for cooking eggs, even though we may only use it once a month. This is an example of per-task specific optimization taken to the extreme. I don't believe the AI landscape will evolve into either of the two extremes, but rather find a balance where enterprises can leverage the innovations in the field and choose the most cost-efficient and high-quality solutions for their products.
Call to Action
AI enthusiasts, entrepreneurs, and founders are encouraged to get involved in future discussions by reaching out to Fellows Fund at ai@fellows.fund. Whether you want to attend our conversations or engage in AI startups and investments, don't hesitate to connect with us. We look forward to hearing from you and exploring the exciting world of AI together.