News & Media

AI2: Insights 10 – Conversational Programming, AI Assistants, Foundation Model

Via AI2 Incubator, by Vu Ha

AI Generated Cover Image

This is the 10th edition of the AI2 Incubator’s newsletter, aka Insights. We started this newsletter last summer to share our perspectives on AI-focused startups. Nine issues later, we are witnessing perhaps the most exciting time to start an AI company that harnesses the explosive advances in foundation models (FMs) and their generative capabilities. As 2022 wraps up, we share our thoughts on opportunities to build the next iconic AI companies in 2023 and beyond. Areas we find promising include conversational programming, task-oriented virtual assistants, and foundation model operations (FMOps). We cover these opportunities in detail, highlighting the importance of building your own foundation model stack that can be customized to a domain or even to a specific person, and the critical role of open, community-driven development of AI technologies.

A recap of Fall 2022 in Startups & AI

In our last issue (September 8th), we wrote that the summer of 2022 was the summer of AI image generation, with DALL-E 2, Stable Diffusion, and MidJourney drawing immense interest. On September 16th, FastAI announced its latest course on Stable Diffusion. On September 19th, Sonya Huang and Pat Grady of Sequoia Capital published a widely shared post titled Generative AI: A Creative New World.
They wrote:
“The fields that generative AI addresses—knowledge work and creative work—comprise billions of workers. Generative AI can make these workers at least 10% more efficient and/or creative: they become not only faster and more efficient, but more capable than before. Therefore, Generative AI has the potential to generate trillions of dollars of economic value.”
NFX’s James Currier largely agreed with Sequoia in his post titled Generative Tech Begins. On October 17th, Stability AI, the company behind Stable Diffusion, announced a $101M round. A day later, Jasper announced a $125M round at a $1.5B valuation. Stability was our startup pick in our last issue. Jasper, which went from launch to this raise in a mere 18 months, is our pick for this issue. In our incubator’s universe, generative AI is the topic of the day, almost every day, among entrepreneurs and investors alike. On the technology front, progress continues relentlessly across the big players, from NVIDIA (MineDojo, Get3D) to Meta (Galactica, Cicero), Google (DreamFusion), and others. NeurIPS, the AI research community’s annual gala, was held in New Orleans (for a glimpse of what AI researchers are working on, see Salesforce Research’s Jim Fan’s tweet summary of its 15 outstanding papers). Our AI research paper pick is Holistic Evaluation of Language Models, by the Stanford Center for Research on Foundation Models, which compares the performance of various large language models. Arguably, however, Fall 2022’s biggest AI news belongs to OpenAI’s ChatGPT, announced right in the middle of NeurIPS on November 30th. Dubbed GPT 3.5 by some, ChatGPT is simply mind-blowing, generating a level of excitement that surpasses even GPT-3’s. If you have not heard about ChatGPT, start with this tweet thread. The jury is still out on whether this is the beginning of the AI Singularity, but there are definitely days when I wake up to yet another groundbreaking development before I have fully digested the previous day’s.

Now is the best time to start an AI company.

As we start 2023, the stars are aligned for entrepreneurs to start AI companies. Despite the economic downturn, venture capitalists are looking to deploy a mammoth $290B worth of funding, according to The Information’s Kate Clark. The recent FTX/SBF debacle lowered the temperature of the crypto/web3 winter by a few more degrees, so much of that dry powder should land on AI. In addition, the unfortunate, widespread layoffs across the tech industry have created a much better climate for startups to recruit (we are witnessing this trend firsthand at our portfolio companies). More importantly, tech workers should ask themselves whether they want to stay on the sidelines of potentially the biggest technology opportunity since the cloud/mobile era. With AI increasingly used to automate all sorts of workflows, including those of highly paid software engineers (think ChatGPT writing code), there’s added impetus to participate in inventing the future of applied AI instead of being a sitting duck in its crosshairs.
Is generative AI (GAI) overhyped? Skeptics include Sam Lessin. His hot take (“default out” on investing in GAI) has some valid points, but misses the main one: startups and venture investing are about bold dreams (Doug Leone, in the aftermath of the FTX implosion, maintains that Sequoia is in the dream business). As at various times in the past, the vast majority of funded GAI startups, perhaps 95%, will not win big. We believe that the 5% that do will become era-defining companies. At the AI2 Incubator, we work hard to give our companies the best chance of being in that 5% bucket. Read on for our take on where the opportunities are, and reach out to us to chat about partnering with the AI2 Incubator.

 

Some Opportunity Areas Around GAI/FMs for Founders:

First, we want to clarify the term foundation models and its relationship to large language models (LLMs).

1) Foundation models, a term coined by the Stanford Center for Research on Foundation Models, are large neural networks that have been trained on a large amount of data. Our rule of thumb is that foundation models have at least one billion parameters.

2) The vast majority of the training data is not manually labeled by humans.

3) Foundation models are adept at solving multiple tasks which they have not been specifically trained for, using little or no task-specific labeled data. The technical terms are few-shot and zero-shot learning, respectively.

4) Foundation models may be multimodal, trained on data of different types: text, image, audio, etc. For this reason, foundation models are a superset of LLMs, which deal mostly with text.
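To make the zero-shot/few-shot distinction in 3) concrete, here is a minimal sketch of how the two kinds of prompts differ for a simple classification task. The sentiment task, the example texts, and the `build_prompt` helper are hypothetical illustrations; the resulting string would be sent to whatever foundation model one happens to use.

```python
def build_prompt(instruction, query, examples=()):
    """Build a text prompt for a foundation model.

    With no examples this is a zero-shot prompt; with a handful of
    labeled (input, label) pairs it becomes a few-shot prompt.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)


instruction = "Classify the sentiment of each input as positive or negative."

# Zero-shot: the model sees only the instruction and the query.
zero_shot = build_prompt(instruction, "The demo blew me away.")

# Few-shot: two labeled demonstrations precede the query.
few_shot = build_prompt(
    instruction,
    "The demo blew me away.",
    examples=[
        ("Setup took five minutes and just worked.", "positive"),
        ("It crashed twice during onboarding.", "negative"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

Note that no task-specific training happens in either case: the only difference is whether a few labeled demonstrations are included in the prompt.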

 

Foundation Models for Automation: Conversational Programming

Before ChatGPT showed up, the limelight belonged to image generation tools such as DALL-E, Stable Diffusion, and MidJourney. Runway, the AI content creation tools startup and a collaborator with Stability AI on Stable Diffusion, raised a $50M Series C. It is very interesting to watch the competition between new AI-native tools such as Runway and traditional ones such as Adobe and Canva, which wasted no time incorporating Stable Diffusion and other GAI capabilities into their products. Lensa, the viral photo editing app with Stable Diffusion under the hood, reportedly generated $1M in daily revenue in the first week of December. Suhail Doshi, the well-respected entrepreneur of Mixpanel fame, abandoned his previous project, Mighty App, to work on image generation (playgroundai.com), declaring that he is all-in on AI. Twitter is teeming with stunning examples of AI-generated visual art.
While AI-generated visuals are cool, we believe foundation models have a vast surface area for applications that automate work across industries and job functions, work that has traditionally been considered creative and not automatable. Most of these new types of automation are not visual. Writing sales and marketing content is one such example, as demonstrated by Jasper.ai and Copy.ai. We believe, however, that the most fascinating and impactful area for innovation is writing code (GAI code generation) rather than writing text, a topic we started to cover in our last issue. Writing code is certainly a creative process, but there’s so much repetitiveness across the software industry, from building websites to writing verification software for chip design to creating data analytics dashboards. We have seen early indications of strong code generation capabilities in proof-of-concept systems such as OpenAI Codex and ChatGPT. We believe that these are just the tip of the iceberg, and that an increasingly large percentage of software engineers’ and programmers’ work will be automated in the coming years. (There is a running joke in the tech industry about well-compensated Google engineers spending a good chunk of their development time manipulating protocol buffer (protobuf) payloads, a decidedly uncreative task.) Think about a future where, say, 10% of technology work is automated, and this percentage keeps increasing for a long time. That future is very exciting.
AI is unlikely to be able to produce complex, sophisticated, unique pieces of software such as Adobe Photoshop or Microsoft Excel, at least not any time soon. However, AI has shown great potential for writing repetitive, less differentiated code given only natural language requirements and specifications. The interaction between the user and the AI may be conversational, as hinted by the ChatGPT demo. This has led companies such as Salesforce to use the term conversational programming (CP).
Below are some early examples, from big tech companies to early stage startups:
  • CP for devops. Red Hat and IBM are training an AI model to infuse Ansible with new capabilities. Project Wisdom will make it easier for anyone to write Ansible Playbooks with AI-generated recommendations—think pair programming with an AI in the “navigator” seat.

  • CP for cloud programming. Amazon CodeWhisperer recently added features that can help developers more easily work with its cloud services’ application programming interfaces. CodeWhisperer can generate code for interacting with the APIs of Amazon EC2, AWS Lambda and other popular AWS services.

  • CP for workflow automation. Microsoft’s Power Platform is a service that helps users create workflows between applications using natural language.

  • CP for data apps. Hal9, an AI incubator startup, is working on a product that allows data scientists to create interactive data applications without having to hand-write code in tools such as Streamlit or Shiny.

What does it take to build a CP product? ChatGPT is impressive as a proof of concept, not as a full-featured, polished product. It can produce diverse, anecdotally compelling code examples, from devops to data science apps, but it needs to be optimized much further for a specific use case to solve a customer’s problem. Entrepreneurs need to start by training foundation models to automate narrow, highly repetitive programming tasks, then gradually expand the scope of automation over time with proper data engines. They will need to train and fine-tune their own foundation models instead of using black-box ones via APIs. We explain why later in this post.
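One way to picture a data engine for a narrow code-generation task is the loop below: every suggestion is logged, user edits are treated as implicit corrections, and corrected pairs accumulate into a fine-tuning set until a retraining threshold is reached. This is a hypothetical sketch, not a description of any specific product; `DataEngine`, the threshold of 3, and the accept/edit signal are illustrative assumptions, and the actual fine-tuning step is only a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DataEngine:
    """Hypothetical data engine: turns user corrections into training data."""

    retrain_threshold: int = 3  # illustrative value; tune per product
    finetune_set: list = field(default_factory=list)
    model_version: int = 1

    def record(self, request: str, suggestion: str, final_code: str) -> None:
        """Log one interaction; an edited suggestion becomes a training pair."""
        if final_code != suggestion:
            # The user's edited code is treated as the corrected target.
            self.finetune_set.append((request, final_code))
        if len(self.finetune_set) >= self.retrain_threshold:
            self.retrain()

    def retrain(self) -> None:
        """Placeholder for fine-tuning the model on the accumulated pairs."""
        self.model_version += 1
        self.finetune_set.clear()


engine = DataEngine()
# Accepted as-is: no new training pair.
engine.record("count rows in orders",
              "SELECT COUNT(*) FROM orders;",
              "SELECT COUNT(*) FROM orders;")
# Edited by the user: becomes a training pair.
engine.record("orders from 2022",
              "SELECT * FROM orders;",
              "SELECT * FROM orders WHERE year = 2022;")
print(engine.model_version, len(engine.finetune_set))
```

The design choice worth noting is that corrections come from normal product usage, which is what lets the engine improve the model with minimal dedicated human labeling.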

 

Task-Oriented Virtual AI Assistants: The Next Wave

The mid-2010s witnessed strong interest in chatbots in business settings (e.g., customer support) and virtual assistants in consumer settings (Siri, Alexa, Google Home, etc.). ChatGPT’s widely praised demo suggests that we have achieved a step-function upgrade over the previous generation’s conversational AI technology. Coupled with a similarly significant improvement in analytical AI enabled by foundation models, from text categorization to information extraction, we now have the ingredients to build powerful virtual assistants that can acquire relevant information, analyze and distill it into succinct summaries, and communicate those summaries back to the user via a conversational interface, gathering requirements and asking the user clarifying questions along the way.
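The gather-requirements, ask-clarifying-questions, then act loop described above can be sketched as a simple slot-filling dialogue. In this hypothetical shopping-research example, the slot names, questions, and the `search` stand-in are assumptions; in a real assistant, a foundation model would extract slot values from free-form user messages and write the final summary.

```python
REQUIRED_SLOTS = {
    "product": "What kind of product are you looking for?",
    "budget": "What is your budget?",
    "use_case": "What will you mainly use it for?",
}


def next_turn(slots: dict) -> str:
    """Return a clarifying question for the first missing slot, or a summary."""
    for name, question in REQUIRED_SLOTS.items():
        if not slots.get(name):
            return question
    return search(slots)


def search(slots: dict) -> str:
    """Hypothetical stand-in for retrieval plus FM-written summarization."""
    return (
        f"Here are {slots['product']} options under {slots['budget']} "
        f"suited to {slots['use_case']}."
    )


# Simulated conversation: each user reply fills one slot.
slots = {"product": "laptop"}
print(next_turn(slots))  # asks about the budget
slots["budget"] = "$1,500"
print(next_turn(slots))  # asks about the use case
slots["use_case"] = "video editing"
print(next_turn(slots))  # all slots filled: returns the summary
```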

In conversations with entrepreneurs, we advise them to make much bolder assumptions about what can be automated in the everyday work of their target customers, from sales to accounting, from student learning to shopping research. If ChatGPT can show a wide range of talents, from composing song lyrics to generating devops code, imagine a version of it that is highly fine-tuned to a specific task. Originally envisioned in Apple’s Newton project, virtual assistants are close to being realized nearly thirty years later.

 

Foundation model operations (FMOps)

Foundation models are large neural networks that have been trained on large amounts of data of different types such as text, audio, images, etc. The keyword is large: large amounts of data contain a large number of patterns, which FMs can learn and encode with their large number of neurons/parameters. Foundation models are effectively encyclopedic databases of patterns that can be harnessed to solve a specific ML problem, which in many practical applications corresponds to a subset of those patterns. Using techniques such as prompting and task demonstration with a handful of examples, we can instruct or query FMs to locate the subset of patterns that is relevant to the problem at hand.
Foundation models are efficient learners, and thus attractive in practice: they can reduce, and sometimes largely eliminate, the data bottleneck, the heretofore expensive and time-consuming process of collecting training data for ML solutions. This learning efficiency is a game changer. Reducing the data bottleneck shortens the time it takes to get from task requirements to a V1 solution. This lets us take on a much larger number of tasks and deal effectively with changing task requirements. We call this emerging paradigm task-centric ML, characterized by a large number of tasks with few training examples per task, in contrast to the currently prevailing data-centric ML, where there is a small number of tasks with many training examples per task. With task-centric ML, existing ML teams become more productive, and organizations can build ML solutions without relying on (expensive) ML teams. Over time, foundation models may become the foundation (pun unintended) of a company’s ML stack, reducing the need for fragmented, bespoke ML solutions that are expensive and difficult to maintain.
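A rough sketch of what task-centric ML might look like in code: many tasks, each defined only by an instruction and a few examples, all served by one shared foundation model. The `Task` class, the `TASKS` registry, and the `call_model` stub are hypothetical; the stub stands in for a real FM call so that the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Task:
    instruction: str
    examples: list  # a handful of (input, output) pairs, not thousands


# Adding a task means writing an instruction and a few examples,
# not collecting a large labeled dataset and training a new model.
TASKS = {
    "route_ticket": Task(
        "Route the support ticket to billing or technical.",
        [("I was charged twice.", "billing"), ("The app won't start.", "technical")],
    ),
    "extract_city": Task(
        "Extract the city mentioned in the text.",
        [("Ship it to our Seattle office.", "Seattle")],
    ),
}


def make_prompt(task: Task, text: str) -> str:
    """Turn a task definition plus a new input into a few-shot prompt."""
    demos = "\n".join(f"{x} => {y}" for x, y in task.examples)
    return f"{task.instruction}\n{demos}\n{text} =>"


def call_model(prompt: str) -> str:
    """Stub for a shared foundation model; replace with a real FM call."""
    return "<model output>"


def run(task_name: str, text: str) -> str:
    """Every task goes through the same model; only the prompt differs."""
    return call_model(make_prompt(TASKS[task_name], text))


print(make_prompt(TASKS["extract_city"], "Meet me in Boston."))
```

Contrast this with data-centric ML, where each entry in `TASKS` would typically be its own separately trained and maintained model.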
Enterprises with significant use of machine learning and/or potential for task-centric ML will likely take a serious look at adopting foundation models. These companies will likely want to own and operate them instead of using packaged APIs from OpenAI, Cohere, and other FM providers, due to requirements around customization, optimization, performance, (data) security, managing surprises and edge cases, etc. However, foundation models are truly cutting-edge technologies that require the type of expertise most companies don’t have. This suggests an opening for a new type of FM infrastructure startup to fill this need. This group may increasingly become an integral part of the data and ML ecosystem, complementing companies such as Databricks, Snowflake, DataRobot, WhyLabs, etc. We use the phrase foundation model operations (FMOps) to describe this group, and believe that FMOps will significantly influence the existing MLOps space in the future.

 

AI-as-a-service? Can startups outsource AI?

Enterprises need FMOps, but how about FM-focused startups? Can they outsource AI?

Jasper is on track to hit $75M in revenue this year. This is amazing for a product that launched 20 months ago. Jasper’s rise is also the latest chapter in the story of how AI over the last decade has become exponentially more widespread and easier to use. Ten years ago, only a handful of people, such as the three authors of the AlexNet paper, knew how to train a neural network on GPUs using CUDA. Five years later, in 2017, any sufficiently motivated engineer could learn to navigate the complicated world of TensorFlow, fight PyTorch bugs, or deploy a neural network to production in the cloud (Jeremy Howard and his FastAI initiative played an important role here). When Dave Rogenmoser launched Jasper in early 2021, he simply used GPT-3 via the APIs provided by OpenAI. It’s no surprise that OpenAI has been active in supporting the startup ecosystem. Through its $100M startup fund, OpenAI has invested in Descript, Harvey AI, Mem, Speak, and likely more companies. OpenAI also announced Converge, a five-week program for AI startup founders that offers early access to OpenAI’s technology, support, and community, in addition to a $1M investment.
The road to building the next Jasper could then look something like this. First, find a niche area that involves a lot of manual labor (e.g., writing marketing materials). Second, use OpenAI’s APIs to automate away a big chunk of that labor. Third, charge for this automation. Along the way, focus on serving customers in your niche the best beer possible, and rely on OpenAI for this age’s electricity. This week alone, we have seen several new companies pop up describing themselves as powered by OpenAI. OpenAI is reportedly on track for $1B in revenue in 2024.
Andrew Ng’s comparison of AI to electricity is, however, imperfect in this context. Today, not all AI is created equal, and OpenAI is not a utility company. To compete effectively, Jasper and the next wave of GAI startups need to invest in developing their own AI models that are optimized for their domains, users, and other operational considerations such as performance, latency, costs, etc. (Good news: it’s OK to delay this until product-market fit, using off-the-shelf AI for the MVP phase.) We believe it’s not tenable in the long run to simply put a nice UX on top of a general-purpose model that everyone has access to. From this standpoint, AI is not like electricity yet. Similarly, we should perhaps not view AI the same way as cloud computing, the kind of technology infrastructure provided by a handful of providers. (The beer/electricity analogy is credited to Jeff Bezos, who used it in 2008 to articulate the vision of AWS to the startup world.)
What does it take to fine-tune foundation models to a domain? As covered in our past issues, community-driven efforts spearheaded by companies such as Hugging Face, Stability AI, and others play an important role here. As of now, open-source models trail OpenAI’s in performance. Stanford’s Center for Research on Foundation Models recently released a report comparing the performance of a number of foundation models (it is our AI research paper pick for this issue). One of its conclusions is that this gap exists but has been shrinking, though it’s worth noting that the report was published before OpenAI dropped the ChatGPT bomb. To match and then surpass OpenAI’s models, startups need not only open-source/open-weight pre-trained models but also the expertise to build their own data engines that continuously fine-tune their models with minimal human intervention. To maximize their effectiveness, foundation models may need to be trained from scratch on domain-specific data (as demonstrated by Meta’s CommerceMM) or fine-tuned to individual users. We call these approaches domain-specific foundation models (DSFMs) and personalized foundation models (PFMs), respectively. This is challenging, as the AI experts who can build such data engines and who understand the intricacies of techniques such as reinforcement learning from human feedback (RLHF) remain a very small group in great demand.
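To give a flavor of those intricacies: a common first step in RLHF is training a reward model on human preference pairs with a pairwise (Bradley-Terry style) loss, -log(sigmoid(r_preferred - r_rejected)), which pushes the score of the response a labeler preferred above the one they rejected. The sketch below only computes that loss for hand-picked scores; the numbers are illustrative, and a real setup would produce the scores with a neural reward model and then use that model to fine-tune the policy (for example, with PPO).

```python
import math


def pairwise_reward_loss(r_preferred: float, r_rejected: float) -> float:
    """Preference loss: -log(sigmoid(r_preferred - r_rejected)).

    Small when the reward model already ranks the human-preferred
    response higher; large when it ranks the rejected response higher.
    """
    margin = r_preferred - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of m
    # so that exp() never overflows for large-magnitude margins.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative scores for two (preferred, rejected) response pairs.
print(pairwise_reward_loss(2.0, -1.0))  # reward model agrees with the labeler
print(pairwise_reward_loss(-1.0, 2.0))  # reward model disagrees
```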
In summary, will we see AI-as-a-service for startups in the future? Startups that need to keep developing differentiated, cutting-edge capabilities based on foundation models will need to build their own FM stacks, creating their unique Process Power. In some cases, that means training their own FMs instead of only fine-tuning pretrained ones. Startups that need to add only a limited set of FM/GAI capabilities can rely on AI providers such as OpenAI.

 

AI2 Incubator Updates

On the fundraising front, Ophir Ronen and the CalmWave team raised a $4M seed round led by Bonfire Ventures, with participation from Tau Ventures, Seachange Fund, Hike Ventures, and the co-founders of PagerDuty. CalmWave uses advanced analytics and artificial intelligence to empower providers with the insights they need to deliver more data-driven, more efficient, and quieter care. Congratulations, Ophir!
On the technology side, Michael Carlon has been experimenting with a neural network approach to multimedia compression, dubbed Compressinator, starting with images. A key requirement of this exploration is that the neural network decoder must run in a browser with a very limited memory footprint (less than 5MB download size). Michael was able to obtain compression efficiency up to 1500% better than JPEG at similar perceptual quality.
Under the hood, Compressinator has a few interesting details. It is a bottlenecked autoencoder trained with a variety of perceptual and adversarial losses. It uses residually arranged depthwise-separable convolutions with clipping non-linearities. The quantizer is built up in stages. First, the network is trained without any quantizer. Then, a value bound is placed within the bottleneck and uniform noise is added to the bottleneck values. This noise effectively acts as a quantizer, but with the benefit of more robust training feedback than an inline vector quantizer. Once the network has converged under the noise regime, an inline quantizer is trained, initialized with K-means. Michael created a demo of the decoder running in a browser.
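The staged quantizer described above can be illustrated on scalar bottleneck values. This is a toy, stdlib-only sketch of the general technique, not Compressinator’s code: it simplifies the vector quantizer to a scalar one, and the value bound of 1.0, the noise step of 0.5, the four-entry codebook, and the Gaussian stand-in latents are all assumptions. In a real system these operations would act on tensors inside the training framework, and the K-means codebook would only be the starting point for further training.

```python
import random


def clip(values, bound):
    """Place a value bound on the bottleneck."""
    return [max(-bound, min(bound, v)) for v in values]


def noisy_bottleneck(values, bound, step):
    """Training-time quantization proxy: bounded values plus uniform noise."""
    half = step / 2
    return [v + random.uniform(-half, half) for v in clip(values, bound)]


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (k >= 2), used to initialize the codebook."""
    lo, hi = min(values), max(values)
    # Spread initial centroids evenly across the observed range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return sorted(centroids)


def quantize(values, codebook):
    """Inline quantizer: snap each value to its nearest codebook entry."""
    return [min(codebook, key=lambda c: abs(v - c)) for v in values]


random.seed(0)
bound = 1.0
latents = [random.gauss(0.0, 0.6) for _ in range(1000)]  # stand-in bottleneck
train_view = noisy_bottleneck(latents, bound, step=0.5)  # noise regime
codebook = kmeans_1d(clip(latents, bound), k=4)          # K-means init
print(codebook)
print(quantize(clip(latents[:5], bound), codebook))
```

The appeal of the noise stage is that it is differentiable everywhere, so the encoder gets smooth gradients while still learning to tolerate quantization error; hard assignment to codebook entries is only introduced once training has settled.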
Compressinator could potentially deliver a large amount of value to many areas of the internet data chain. It could help CDNs optimize their customers’ media, help content providers like Netflix or Hulu save a great deal of money on bandwidth while enabling a wider customer base to enjoy HD/4K content, or simply make your mobile data plan less expensive. Basically, it could help anyone who downloads, shares, or views content on the internet (which is everyone!).

 

Additional Readings That We Found Interesting