A quick scan of the headlines makes it seem like generative AI is everywhere these days. In fact, some of those headlines may actually have been written by generative AI, such as OpenAI’s ChatGPT, a chatbot that has demonstrated an uncanny ability to produce text that appears to have been written by a human.
But what do people really mean when they say “generative AI”?
Before the generative AI boom of the past few years, when people talked about AI, they were usually talking about machine learning models that can learn to make a prediction based on data. For example, such models are trained, using millions of examples, to predict whether a particular X-ray shows signs of a tumor or whether a particular borrower is likely to default on a loan.
Generative AI can be thought of as a machine learning model that is trained to generate new data, rather than making a prediction about a specific dataset. A generative AI system is one that learns to create more objects that resemble the data it has been trained on.
“When it comes to the actual mechanism underlying generative AI and other types of artificial intelligence, the distinctions can be a little blurry. Often, the same algorithms can be used for both,” says Phillip Isola, an associate professor of electrical engineering and computer science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
And despite the hype that came with the launch of ChatGPT and its counterparts, the technology itself is not new. These powerful machine learning models are based on research and computational advances dating back more than 50 years.
Increasing complexity
An early example of generative AI is a much simpler model known as a Markov chain. The technique is named after Andrey Markov, a Russian mathematician who in 1906 introduced this statistical method to model the behavior of random processes. In machine learning, Markov models have long been used for next-word prediction tasks, such as the autocomplete function in an email program.
In text prediction, a Markov model generates the next word in a sentence by looking at the previous word or a few previous words. But because these simple models can only look back so far, they’re not good at generating plausible text, says Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science at MIT, who is also a member of CSAIL and the Institute for Data, Systems, and Society (IDSS).
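The next-word idea Jaakkola describes can be sketched in a few lines of Python. The toy corpus and the helper names below are made up for illustration; a real autocomplete system would be trained on far more text and could look back more than one word.

```python
import random
from collections import defaultdict

def build_markov_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain

def predict_next(chain, word, rng=random):
    """Suggest a next word by sampling from the words seen after `word`."""
    candidates = chain.get(word)
    return rng.choice(candidates) if candidates else None

corpus = "the cat sat on the mat and the cat slept"
chain = build_markov_chain(corpus)
# After "the", the corpus contains "cat" twice and "mat" once,
# so "cat" is twice as likely to be sampled.
print(sorted(chain["the"]))  # ['cat', 'cat', 'mat']
```

Because the model only conditions on the single previous word, it can produce locally plausible but globally incoherent text, which is exactly the limitation described above.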
“We’ve been creating things well before the last decade, but the main distinction here is in terms of the complexity of the objects we can create and the scale at which we can train these models,” he explains.
Just a few years ago, researchers tended to focus on finding a machine learning algorithm that makes the best use of a given data set. But that focus has shifted a bit, and many researchers are now using larger data sets, perhaps with hundreds of millions or even billions of data points, to train models that can achieve impressive results.
The base models underpinning ChatGPT and similar systems work in much the same way as a Markov model. But one big difference is that ChatGPT is far larger and more complex, with billions of parameters. And it has been trained on an enormous amount of data — in this case, much of the publicly available text on the internet.
In this huge body of text, words and sentences appear in sequences with certain dependencies. This recurrence helps the model understand how to cut text into statistical chunks that have some predictability. It learns the patterns of these blocks of text and uses this knowledge to propose what might come next.
More powerful architectures
While larger datasets are one catalyst that led to the generative AI boom, a variety of major research advances have also led to more complex deep-learning architectures.
In 2014, a machine learning architecture known as a generative adversarial network (GAN) was proposed by researchers at the University of Montreal. GANs use two models that work in tandem: One learns to generate a target output (such as an image) and the other learns to discriminate true data from the generator’s output. The generator tries to fool the discriminator, and in the process learns to make more realistic outputs. The StyleGAN image generator is based on these types of models.
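The two-player setup can be sketched with a deliberately tiny example: both networks shrink to single linear units, the “images” are just numbers drawn from a Gaussian, and the gradients are written out by hand. This is an illustrative sketch of the adversarial training loop, not the 2014 architecture; the learning rate, data distribution, and step counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

g_w, g_b = 0.1, 0.0   # generator: x_fake = g_w * z + g_b
d_w, d_b = 0.1, 0.0   # discriminator: p(real) = sigmoid(d_w * x + d_b)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

lr = 0.05
for _ in range(2000):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    x_real = rng.normal(4.0, 0.5)   # one "real" sample, centered at 4
    z = rng.normal()                # noise input to the generator
    x_fake = g_w * z + g_b
    p_real = sigmoid(d_w * x_real + d_b)
    p_fake = sigmoid(d_w * x_fake + d_b)
    # Gradients of -log D(real) - log(1 - D(fake)) w.r.t. d_w, d_b:
    d_w += lr * ((1 - p_real) * x_real - p_fake * x_fake)
    d_b += lr * ((1 - p_real) - p_fake)

    # Generator step: push D(fake) toward 1, i.e., try to fool D.
    z = rng.normal()
    x_fake = g_w * z + g_b
    p_fake = sigmoid(d_w * x_fake + d_b)
    grad_x = (1 - p_fake) * d_w     # gradient flows back through D
    g_w += lr * grad_x * z
    g_b += lr * grad_x

samples = g_w * rng.normal(size=1000) + g_b
print(f"generator sample mean after training: {samples.mean():.2f}")
```

Even in this toy form, the generator’s outputs drift toward the real data distribution purely because the discriminator keeps penalizing the difference — the same pressure that drives full-scale GANs toward realistic images.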
Diffusion models were introduced a year later by researchers at Stanford University and the University of California, Berkeley. By iteratively improving their output, these models learn to generate new data samples that resemble samples in a training dataset and have been used to create realistic-looking images. A diffusion model is at the heart of the Stable Diffusion text-to-image generation system.
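The iterative-refinement idea behind diffusion models can be illustrated without training a denoiser: for a simple 1-D Gaussian “dataset,” the direction in which the data density increases is known in closed form, so a Langevin-style sampler can stand in for the learned reverse process. This is a sketch of the principle only, not the actual diffusion-model algorithm; the target distribution, step size, and iteration count are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "dataset": 1-D samples from a Gaussian N(3, 1). For this simple
# distribution the score (gradient of the log-density) has a closed form,
# so we can skip training a neural denoiser.
mu, sigma = 3.0, 1.0

def score(x):
    return (mu - x) / sigma**2

# Start from pure noise and iteratively refine: each step nudges the
# samples along the score and re-injects a little fresh noise.
x = rng.normal(0.0, 5.0, size=5000)
step = 0.1
for _ in range(500):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

print(f"refined samples: mean={x.mean():.2f}, std={x.std():.2f}")
```

Starting from formless noise, repeated small denoising steps pull the samples onto the target distribution — the same principle, scaled up with a learned denoiser, underlies image generators like Stable Diffusion.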
In 2017, Google researchers introduced the transformer architecture, which has been used to develop large language models such as those powering ChatGPT. In natural language processing, a transformer encodes each word in a body of text as a token and then creates an attention map, which records the relationships of each token to all other tokens. This attention map helps the transformer understand the context when creating new text.
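The attention map described above can be sketched directly. The token embeddings below are random and the query/key/value projections are random rather than learned; only the mechanics of scoring every token against every other are faithful to the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat", "down"]
d = 3
X = rng.normal(size=(len(tokens), d))   # one embedding vector per token

# In a real transformer, queries, keys, and values come from learned
# projection matrices; here they are random for illustration.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)             # similarity of every token pair
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
output = weights @ V                      # each token becomes a weighted mix

print(weights.shape)  # (4, 4): the attention map relates every token to every other
```

Each row of `weights` records how much one token attends to every other token, which is what lets the model weigh context when generating new text.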
These are just a few of the many approaches that can be used for generative AI.
A host of applications
What all these approaches have in common is that they convert the inputs into a set of tokens, which are numerical representations of pieces of data. As long as your data can be converted to this standard token format, then in theory, you could apply these methods to create new data that looks like it.
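The tokens-in, tokens-out idea can be sketched in a few lines: the same trivial tokenizer maps words or quantized pixel intensities to integer IDs. The vocabularies and data below are made up for illustration; real systems use far more sophisticated tokenizers.

```python
def tokenize(items, vocab):
    """Map raw pieces of data to integer token IDs, growing the vocab as needed."""
    ids = []
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
        ids.append(vocab[item])
    return ids

# Text: words become tokens...
text_vocab = {}
print(tokenize("the cat sat on the mat".split(), text_vocab))  # [0, 1, 2, 3, 0, 4]

# ...and so can any other discretized data, e.g. quantized pixel intensities.
pixel_vocab = {}
print(tokenize([0, 128, 255, 128], pixel_vocab))  # [0, 1, 2, 1]
```

Once either kind of data is a sequence of integer IDs, the same sequence model can, in principle, be trained on it — which is the unification Isola describes next.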
“Your mileage might vary, depending on how noisy your data are and how difficult the signal is to extract, but it is really getting closer to the way a general-purpose CPU can take in any kind of data and start processing it in a unified way,” Isola says.
This opens up a huge range of applications for generative AI.
For example, Isola’s team is using generative AI to create synthetic image data that could be used to train another intelligent system, such as by teaching a computer vision model how to recognize objects.
Jaakkola’s team is using generative AI to design novel protein structures or valid crystal structures that specify new materials. The same way a generative model learns the dependencies of language, if it is shown crystal structures instead, it can learn the relationships that make structures stable and realizable, he explains.
But while generative models can achieve incredible results, they are not the best choice for all types of data. For tasks that involve making predictions on structured data, such as the tabular data in a spreadsheet, traditional machine learning methods tend to outperform generative AI models, says Devavrat Shah, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and a member of IDSS and the Laboratory for Information and Decision Systems (LIDS).
“The biggest value they have, in my mind, is to become this amazing human-friendly machine interface. Previously, humans had to speak to machines in machine language to make things happen. Now, this interface has figured out how to talk to both humans and machines,” says Shah.
Raising red flags
Generative AI chatbots are now being used in call centers to field questions from human customers, but this application underscores one potential red flag of implementing these models: worker displacement.
In addition, generative AI can inherit and propagate biases that exist in training data, or amplify hate speech and false statements. The models have the capacity to plagiarize, and can generate content that looks like it was produced by a specific human creator, raising potential copyright issues.
On the other hand, Shah suggests that generative AI could empower artists, who could use generative tools to help them create creative content they might not otherwise have the means to produce.
In the future, he sees generative AI changing the economics in many industries.
One promising future direction Isola sees for generative AI is its use in manufacturing. Instead of having a model make an image of a chair, perhaps it could generate a plan for a chair that could be produced.
He also sees future uses for generative AI systems in developing more generally intelligent AI agents.
“There are differences in how these models work and how we think the human brain works, but I think there are also similarities. We have the ability to think and dream in our heads, to come up with interesting ideas or plans, and I think generative AI is one of the tools that will empower agents to do that, as well,” Isola says.