Natural language conveys ideas, actions, information and intent through context and syntax. Additionally, there are volumes of it contained in databases. This makes it an excellent data source for training machine learning systems. Two engineering graduate students in the 6A MEng Thesis Program at MIT, Irene Terpstra ’23 and Rujul Gandhi ’22, are working with mentors in the MIT-IBM Watson AI Lab to use this power of natural language to build AI systems.
As computing becomes more and more advanced, researchers strive to improve the hardware they work on, which means innovating to create new computer chips. And since there is already a body of literature on the modifications that can be made to achieve certain parameters and performance, Terpstra and her mentors and advisors, Anantha Chandrakasan (dean of MIT’s School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science) and IBM researcher Xin Zhang, are developing an AI algorithm that helps design chips.
“I’m creating a workflow to systematically analyze how these language models can help the circuit design process. What reasoning powers do they have, and how can they be incorporated into the chip design process?” says Terpstra. “And on the other hand, if this proves useful enough, [we’ll] see if they can automatically design the chips themselves by attaching them to a reinforcement learning algorithm.”
To do this, Terpstra’s team is creating an AI system that can iterate over different designs. That means experimenting with various pre-trained large language models (such as ChatGPT, Llama 2, and Bard), an open-source circuit simulator language called NGspice, which holds the chip parameters in code form, and a reinforcement learning algorithm. With text prompts, researchers will be able to ask the language model how the physical chip should be modified to achieve a certain goal and have it generate instructions for adjustments. These are then fed into a reinforcement learning algorithm that updates the circuit design and derives new physical parameters for the chip.
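The loop described above can be sketched in miniature. In this toy Python version, both the "language model" and the "simulator" are stand-in stubs (the team's actual system uses real LLMs, NGspice, and a reinforcement learning algorithm, none of which appear here); the point is only the shape of the iteration: simulate, ask for an adjustment, apply it, repeat.

```python
# Hypothetical sketch of the LLM-in-the-loop design idea.
# All function names and the linear "circuit" are illustrative stubs,
# not the team's actual models, NGspice bindings, or RL algorithm.

def llm_suggest_adjustment(gain: float, target: float) -> dict:
    """Stub for a language-model call: turns a goal into a
    structured adjustment instruction."""
    direction = 1.0 if gain < target else -1.0
    return {"parameter": "transistor_width", "delta": 0.1 * direction}

def simulate_gain(width: float) -> float:
    """Stub for a circuit simulator: maps a physical parameter
    to a performance metric (toy linear relationship)."""
    return 2.0 * width

def design_loop(target_gain: float, width: float = 1.0, steps: int = 50) -> float:
    """Iteratively update the chip parameter using the suggested
    adjustments until the simulated gain reaches the target."""
    for _ in range(steps):
        gain = simulate_gain(width)
        if abs(gain - target_gain) < 1e-3:
            break
        instruction = llm_suggest_adjustment(gain, target_gain)
        width += instruction["delta"]
    return width
```

In the real system, the feedback signal from the simulator would drive a reinforcement learning update rather than this fixed-step rule.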
“The ultimate goal would be to combine the reasoning powers and the knowledge base built on these large language models and combine that with the optimization power of reinforcement learning algorithms and design the chip itself,” says Terpstra.
Rujul Gandhi works with raw language itself. As an undergraduate at MIT, Gandhi explored linguistics and computer science, bringing them together in her MEng work. “I was interested in communication, both between humans and between humans and computers,” Gandhi says.
Robots and other interactive artificial intelligence systems are one area where communication must be understood by both humans and machines. Researchers often write instructions for robots using formal logic. This helps ensure that commands are followed safely and as intended, but formal logic can be difficult for users to understand, while natural language comes easily. To ensure this smooth communication, Gandhi and her advisors, Yang Zhang of IBM and MIT assistant professor Chuchu Fan, are building a parser that converts natural language instructions into a machine-friendly form. Leveraging the linguistic structure encoded by the pre-trained T5 encoder-decoder model and a dataset of annotated, basic English commands for performing certain tasks, Gandhi’s system identifies the smallest logical units, or clauses, present in a given command.
“Once you give your instructions, the model figures out all the smaller subtasks you want it to perform,” Gandhi says. “Then, using a large language model, each subtask can be compared to the available actions and objects in the robot’s world, and if a subtask cannot be performed because a certain object is not recognized or an action is not possible, the system can stop there to ask the user for help.”
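The decompose-then-ground step Gandhi describes can be sketched as follows. This is a deliberately naive stand-in: the splitter is simple string matching rather than a trained T5-based parser, and the action and object vocabularies are invented for illustration. It shows only the control flow, in which each subtask is checked against what the robot knows, and an ungrounded subtask is where the system would stop and ask the user for help.

```python
# Hypothetical sketch of subtask grounding. The vocabularies and the
# naive splitter are illustrative; the actual system uses a trained
# parser, not string matching.

KNOWN_ACTIONS = {"pick up", "put down", "go to"}
KNOWN_OBJECTS = {"cup", "table", "kitchen"}

def split_subtasks(command: str) -> list[str]:
    """Naive stand-in for the parser: split on 'and' / 'then'."""
    parts = command.replace(" then ", " and ").split(" and ")
    return [p.strip() for p in parts if p.strip()]

def ground_subtask(subtask: str) -> bool:
    """Check whether the subtask mentions a known action and object."""
    has_action = any(a in subtask for a in KNOWN_ACTIONS)
    has_object = any(o in subtask for o in KNOWN_OBJECTS)
    return has_action and has_object

def check_command(command: str) -> list[tuple[str, bool]]:
    """Return each subtask with a flag for whether it can be grounded;
    an ungrounded subtask is where the system would ask for help."""
    return [(s, ground_subtask(s)) for s in split_subtasks(command)]
```

For example, "go to the kitchen and pick up the cup" yields two grounded subtasks, while "polish the vase" fails the check because neither the action nor the object is in the robot's world.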
This approach of breaking commands down into subtasks also allows her system to understand logical dependencies expressed in English, such as, “do task X until event Y occurs.” Gandhi uses a dataset of step-by-step instructions across robot task domains such as navigation and manipulation, with an emphasis on household tasks. Using data written the way people would actually talk to each other has several advantages, she says, because it means users can be more flexible about how they phrase their instructions.
Another of Gandhi’s projects involves developing speech models. In the context of speech recognition, some languages are considered “low resource” because they may not have much transcribed speech available, or may not have a written form at all. “One of the reasons I applied to this internship at the MIT-IBM Watson AI Lab was my interest in language processing for low-resource languages,” she says. “A lot of language models today rely on lots of data, and when it’s not that easy to get all of that data, you have to use the limited data you have effectively.”
Speech is just a stream of sound waves, but people having a conversation can easily tell where words and thoughts begin and end. In speech processing, both humans and language models use their existing vocabulary to recognize word boundaries and understand meaning. In low- or no-resource languages, a written vocabulary may not exist at all, so researchers cannot provide one in the model. Instead, the model can note which sound sequences occur together more often than others and infer that these may be individual words or concepts. In Gandhi’s research group, these inferred words are then collected into a pseudo-dictionary that serves as a tagging method for low-resource language, generating labeled data for further applications.
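The co-occurrence idea above can be illustrated with a byte-pair-encoding-style toy (our framing; the group's actual speech models may work differently). Repeatedly merging the most frequent adjacent pair of units causes sound sequences that occur together often to fuse into candidate "words," with no written vocabulary required.

```python
# Hypothetical sketch of inferring word-like units from co-occurrence.
# This is a BPE-style toy over symbol sequences, not the group's
# actual speech model.
from collections import Counter

def merge_most_frequent_pair(seq: list[str]) -> list[str]:
    """Merge every occurrence of the most frequent adjacent pair."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return seq
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            merged.append(a + b)  # fuse the frequent pair into one unit
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def infer_units(seq: list[str], merges: int = 3) -> list[str]:
    """Apply a fixed number of merges to surface candidate 'words'."""
    for _ in range(merges):
        seq = merge_most_frequent_pair(seq)
    return seq
```

On the sequence d-o-g-d-o-g-c-a-t, two merges fuse the recurring "dog" into a single unit while the one-off "cat" stays split, which is exactly the frequency signal the pseudo-dictionary builds on.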
Applications for language technology are “pretty much everywhere,” Gandhi says. “You could imagine people being able to interact with software and devices in their native language, their native dialect. You could imagine the improvement of all the voice assistants we use. You could imagine it being used for translation or interpretation.”