In 2022, generative AI generated a lot of excitement (and plenty of hype). Images created with DALL-E and Stable Diffusion flooded social media platforms such as Twitter and Reddit. Big Tech companies are integrating generative models into their mainstream products, while startups building products on top of generative models are attracting funding.
With a few notable exceptions, most of the technologies we’re seeing today have existed for many years. What changed is that the convergence of several trends has recently made it possible to productize generative models and bring them to everyday applications. Although generative AI faces many challenges in 2023, there is little doubt it will continue to grow.
Improvements in generative artificial intelligence
Generative AI took off in 2014 with the advent of generative adversarial networks (GANs), a type of deep learning architecture that can create photorealistic images, such as human faces. Scientists later created other variants of GANs for tasks such as transferring the style of one image to another. GANs and variational autoencoders (VAEs), another deep learning architecture, ushered in the era of deepfakes, an AI technique that modifies images and videos to swap one person’s face for another.
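The adversarial idea behind GANs can be sketched numerically: a discriminator learns to tell real samples from generated ones, while a generator learns to fool it. The toy example below (illustrative values, not a real training loop) shows the two competing loss functions:

```python
import numpy as np

# Toy illustration of the GAN objective: the discriminator D maximizes
# log D(x) + log(1 - D(G(z))), while the generator G tries to make
# D(G(z)) large. Values below are made-up discriminator probabilities.
def discriminator_loss(d_real, d_fake):
    # d_real: D's probability that real samples are real
    # d_fake: D's probability that generated samples are real
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def generator_loss(d_fake):
    # Non-saturating variant used in practice: maximize log D(G(z))
    return -np.log(d_fake).mean()

d_real = np.array([0.9, 0.8])  # D is confident on real data
d_fake = np.array([0.2, 0.1])  # and correctly rejects the fakes

print(discriminator_loss(d_real, d_fake))  # low: D is winning
print(generator_loss(d_fake))              # high: G must improve
```

Training alternates between the two updates until the generator's samples become indistinguishable from real data.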
The transformer is a deep learning architecture that underpins large language models (LLMs) like GPT-3, LaMDA, and Gopher. Beyond generating text, software code, and protein structures, the transformer is also used for visual tasks like image classification, via a variant known as the vision transformer. Transformers were likewise used to generate images from text, as seen in an earlier version of OpenAI’s DALL-E.
When transformers are made larger and fed more data, their performance and accuracy improve. Through unsupervised or self-supervised learning, they can also be trained with little or no human annotation, which has long been one of the main bottlenecks of deep learning.
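Self-supervised training needs no labels because raw text supervises itself: the target for each position is simply the next token. A minimal sketch (using whitespace splitting as a stand-in for a real subword tokenizer):

```python
# Self-supervised training pairs for a language model: the "labels"
# are just the input shifted by one token, so no human annotation
# is needed.
text = "generative models learn from raw text"
tokens = text.split()  # stand-in for a real subword tokenizer

# Each pair asks the model to predict the next token from its context.
inputs = tokens[:-1]
targets = tokens[1:]

for x, y in zip(inputs, targets):
    print(f"context ends with {x!r} -> predict {y!r}")
```

Scaling this recipe to internet-sized corpora is what lets LLMs learn without annotated datasets.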
Contrastive Language-Image Pre-training (CLIP), introduced by OpenAI in 2021, became pivotal in text-to-image generators. Trained on image-caption pairs collected from the internet, CLIP learns a shared embedding space for images and text, making it well suited to matching text prompts with pictures. DALL-E 2 combines CLIP with diffusion models, another deep learning technique, to generate high-quality images from text.
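The core of CLIP's shared embedding space can be illustrated with a toy example: once images and captions are encoded as normalized vectors (hand-crafted below, standing in for real encoder outputs), cosine similarity retrieves the matching caption for each image.

```python
import numpy as np

# Toy sketch of CLIP-style matching. The 4-d vectors below are
# hand-crafted stand-ins for real image/text encoder outputs;
# matching image-caption pairs point in similar directions.
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

image_emb = normalize(np.array([
    [1.0, 0.1, 0.0, 0.0],   # image of a dog
    [0.0, 1.0, 0.1, 0.0],   # image of a city skyline
    [0.0, 0.0, 1.0, 0.1],   # image of a bowl of ramen
]))
text_emb = normalize(np.array([
    [1.0, 0.0, 0.1, 0.0],   # "a photo of a dog"
    [0.1, 1.0, 0.0, 0.0],   # "a city skyline at night"
    [0.0, 0.1, 1.0, 0.0],   # "a bowl of ramen"
]))

# Cosine similarity of every image against every caption.
similarity = image_emb @ text_emb.T       # shape (3, 3)
best_caption = similarity.argmax(axis=1)  # retrieve caption per image
print(best_caption)  # [0 1 2]: each image matches its own caption
```

Contrastive training pushes the diagonal of this similarity matrix up (matched pairs) and the off-diagonal entries down, which is what makes the embeddings useful for guiding text-to-image generation.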
Finding the right applications
Generative models were initially presented as systems that could take over large chunks of creative work. GANs became famous for generating complete images from little input. LLMs like GPT-3 made headlines for writing full articles.
But as the field has evolved, it has become evident that generative models are unreliable when left on their own. Many scientists agree that current deep neural networks — no matter how large they are — lack some of the basic components of intelligence, which makes them prone to committing unpredictable mistakes.
Several products have used generative models in smart, human-centered ways in the past year. Product teams are learning that generative models work best when they’re implemented in a way that gives users greater control. In Copy AI, a tool that generates blog posts using GPT-3, the writer and the LLM collaborate on creating the article outline and fleshing it out.
Stable Diffusion and DALL-E 2 applications also give users control, letting them edit, regenerate, or configure the generative model’s output. During a recent AI conference, Douglas Eck, director of Google Research, said, “It’s not just about creating a realistic picture with a generative model anymore. Making something you created yourself is what matters. Technology should allow us to exercise agency and creativity.”
Developing the right tools and infrastructure
In tandem with the algorithms and applications, the computational infrastructure and platforms for generative models have evolved. This has allowed many companies to integrate generative AI into their applications without the need for specialized skills to set up and run generative models.
Product teams with experienced machine learning engineers can run open-source generative models such as BLOOM and Stable Diffusion themselves. For teams without in-house machine learning talent, the OpenAI API, Microsoft Azure, and Hugging Face Inference Endpoints are options. These platforms abstract away the complexity of setting up and running models at scale.
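To show how little machine-learning code a hosted platform requires, the sketch below builds a request following the shape of OpenAI’s completions REST endpoint (the model name and prompt are illustrative; actually sending the request needs an API key, so that part is left as a comment):

```python
import json

# Illustrative request for a hosted LLM. The payload shape follows
# OpenAI's /v1/completions REST API; the model name is an example.
payload = {
    "model": "text-davinci-003",
    "prompt": "Write a one-sentence tagline for a note-taking app.",
    "max_tokens": 60,
    "temperature": 0.7,
}
body = json.dumps(payload)

# Sending it is a single HTTP POST (requires an OPENAI_API_KEY):
# import os, requests
# resp = requests.post(
#     "https://api.openai.com/v1/completions",
#     headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
#     json=payload,
# )
# print(resp.json()["choices"][0]["text"])

print(body)
```

No GPUs, model weights, or serving infrastructure are involved on the caller’s side; the platform handles all of it.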
The evolution of MLops platforms also makes it possible to set up complete pipelines for gathering feedback data, versioning datasets and models, and fine-tuning models for specific uses.
How will generative AI develop in the future?
The generative AI industry still faces challenges, including ethical and copyright complications. However, it is interesting to watch the space evolve. For the moment, the main winners are large tech companies that have the data, established markets, and existing products through which to deliver the extra value of generative models. For example, Microsoft is leveraging its cloud infrastructure, its exclusive access to OpenAI’s technology, and the huge market for its office and creativity tools to bring the power of generative models to its users.
Finally, it’s worth noting that generative AI has the potential to democratize certain fields that have traditionally been difficult to break into. By automating certain tasks or allowing individuals to create high-quality content without needing specialized training or equipment, generative AI could level the playing field and allow more people to participate in these industries. This could be particularly exciting for fields like music and art, where the barriers to entry can be high for those without access to expensive studios or equipment.
Overall, it’s clear that generative AI is poised to have a major impact in 2023 and beyond. Whether you’re a business owner, artist, or simply someone who loves technology, it’s worth keeping an eye on this exciting field as it evolves.