3. Convert the model to GGUF FP16 format:
4. Quantize the model to 4 bits (using the Q4_K_M method):
5. Run the quantized model (example commands for these three steps are sketched below):
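A minimal sketch of these three steps with llama.cpp; script and binary names vary between llama.cpp releases, and the model paths here are placeholders:

```bash
# 3. Convert the Hugging Face model directory to GGUF FP16
#    (the converter is convert_hf_to_gguf.py in recent llama.cpp releases, convert.py in older ones)
python convert_hf_to_gguf.py ./models/my-model --outtype f16 --outfile ./models/my-model-f16.gguf

# 4. Quantize the FP16 GGUF file to 4 bits with the Q4_K_M method
./llama-quantize ./models/my-model-f16.gguf ./models/my-model-Q4_K_M.gguf Q4_K_M

# 5. Run the quantized model interactively
./llama-cli -m ./models/my-model-Q4_K_M.gguf -p "What is quantization?" -n 128
```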
Apart from some loss of quality, quantization can also affect inference speed: depending on the quantization method and the hardware it runs on, a quantized model may be slower than expected, so benchmark before settling on a format.
3. Start the server:
When “model loaded” appears at the end of the terminal output, you can POST requests to the server with curl from another terminal window (see the sketch below).
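As a minimal sketch, assuming the llama.cpp HTTP server built from the same repository and its default port (binary name and endpoint may differ for other servers):

```bash
# Start the server with the quantized model (the binary is named ./server in older llama.cpp builds)
./llama-server -m ./models/my-model-Q4_K_M.gguf --port 8080

# From another terminal, POST a completion request
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is quantization?", "n_predict": 64}'
```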
2. Run the command in a terminal:
4. To customize the prompt, first pull the model:
5. Create the model (a sketch of the pull/create/run flow follows this list):
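A sketch of the Ollama flow, using llama2 and a custom model name purely as examples (any model from the Ollama library works the same way):

```bash
# Pull the base model
ollama pull llama2

# Customize the prompt in a Modelfile
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a helpful assistant that answers questions about LLM deployment."
EOF

# Create the customized model and run it
ollama create my-assistant -f ./Modelfile
ollama run my-assistant
```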
GPT4All has a server mode. Check the official documentation for more info.
GPT4All server mode in Settings
Major cloud providers
AWS
Amazon SageMaker — A comprehensive platform for machine learning and managing the entire life-cycle of LLMs. It supports custom model development, deployment and scaling with access to pre-trained models.
AWS Lambda — Serverless compute for workloads that can scale down to zero, or use Step Functions state machines to orchestrate event-driven pipelines.
AWS Elastic Kubernetes Service (EKS) — A managed Kubernetes service to orchestrate containers for the microservices of your LLM-based applications.
Azure
Azure Machine Learning — Offers tools for deploying LLMs such as model management, endpoints, batch scoring, and managed inference, letting you set up scalable, managed infrastructure for real-time or batch inference.
Azure Kubernetes Service (AKS) — A managed Kubernetes service from Microsoft that runs your LLM in containerized form for you.
Azure Functions — Serverless functions to deploy your LLM for lightweight, event-driven interactions.
Google Cloud
Vertex AI — Provides purpose-built MLOps tools for data scientists and ML engineers to automate, standardize, and manage ML projects. Some of the functionalities include model management, managed inference, batch inference, and custom containers.
Cloud Run and Cloud Functions — Serverless platforms to deploy LLMs as lightweight, event-driven applications, ideal for smaller models or microservices.
Note: all of them provide NVIDIA GPUs at on-demand prices. Your new or personal account may not qualify for high-grade GPUs (thanks to bitcoin miners)!
Deploy LLMs from HuggingFace on a SageMaker endpoint
The easiest way to quickly deploy a HuggingFace model
If you have a new model you want to test quickly, go to its model card page on HuggingFace:
Click on the “Deploy” button
Typically SageMaker will be listed as an option; click on it
Copy the boilerplate code and paste it into either SageMaker Studio or your notebook
Deploy HuggingFace model on SageMaker endpoint
Now you must change a few of the variables (a sketch of the adjusted code follows this list), like:
Add your AWS account's SageMaker execution role. If you are running it from SageMaker Studio, then sagemaker.get_execution_role() will suffice.
Adjust some of the model configurations (which will be part of the endpoint configuration), for example the instance type, number of GPUs (if supported), and instance count.
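A sketch of what the adjusted boilerplate typically looks like; the model ID, container versions, and instance type below are illustrative, so use the values generated on the model card:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or paste your execution role ARN

hub = {
    "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model from the Hub
    "HF_TASK": "text-generation",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.37",   # pick a version combination supported by the
    pytorch_version="2.1",         # HuggingFace Deep Learning Containers
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "What is quantization?"}))
```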
This will deploy a dedicated endpoint in your SageMaker domain.
To invoke any SageMaker endpoint you will need an environment with boto3 installed (except when using database triggers like the AWS Aurora PostgreSQL-to-SageMaker integration).
Invoke SageMaker endpoint
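A minimal boto3 sketch; the endpoint name is hypothetical and the payload schema depends on the container serving your model:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-llm-endpoint",        # replace with your endpoint name
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "What is quantization?",
        "parameters": {"max_new_tokens": 64},
    }),
)
print(json.loads(response["Body"].read()))
```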
SageMaker JumpStart
SageMaker JumpStart model deployment code
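A sketch of deploying a JumpStart model with the SageMaker Python SDK; the model ID and instance type are examples, so browse the JumpStart catalog for current IDs:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Example JumpStart model id; gated models additionally require accept_eula=True on deploy()
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "What is SageMaker JumpStart?"}))
```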
SageMaker deployment of LLMs that you have pretrained or fine-tuned
To deploy custom LLMs, first check whether the libraries the model uses are available in the built-in framework containers, and use script mode to pass your custom scripts. If your LLM uses custom packages, then use Bring Your Own Container (BYOC) mode with the SageMaker inference toolkit to create a custom inference container.
Check out more SageMaker examples: https://github.com/aws/amazon-sagemaker-examples/tree/main
Benefits of using containers
In the world of Service-Oriented Architecture (SOA), containers are a blessing. Orchestrating a large number of containers is a challenge, but a containerized service has numerous benefits compared with an app running on virtual machines.
Large Language Models have higher memory requirements than a classical web service, so we have to understand these requirements before containerizing LLMs or LLM-based endpoints. Barring a small number of cases, such as when your generative model fits perfectly on one server and only one server is ever needed, containerizing your LLM is advisable for production use cases.
Scalability and infrastructure optimization — fine-grained dynamic and elastic provisioning of resources (CPU, GPU, memory, persistent volumes), dynamic scaling and maximized component/resource density to make best use of infrastructure resources.
Operational consistency and component portability — automation of build and deployment, reducing the range of skillsets required to operate many different environments. Images can be built and run on any container platform, giving portability across nodes, environments, and clouds, and letting you rely on open containerization standards such as Docker and Kubernetes.
Service resiliency — rapid restart, the ability to implement clean reinstatement, safe independent deployment that removes the risk of destabilizing existing components, and fine-grained roll-out using rolling upgrades, canary releases, and A/B testing.
GPU and containers
You can use your dedicated GPU or cloud GPU with containers. If you have a laptop then check the GPU memory and model size before containerizing your app.
Containers, whether run with Docker or with a different runtime like containerd or CRI-O, use the NVIDIA Container Toolkit, which installs the NVIDIA Container Runtime (nvidia-container-runtime) on the host machine.
Image source: https://developer.nvidia.com/blog/nvidia-docker-gpu-server-application-deployment-made-easy/
For containerd runtime, NVIDIA Container Runtime is configured as an OCI-compliant runtime and uses NVIDIA CUDA, NVML drivers at the lowest level via NVIDIA Container Runtime Hook (`nvidia-container-runtime-hook`), with the flow through the various components as shown in the following diagram:
Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/arch-overview.html
After installing the NVIDIA Container Toolkit, you can run a sample container to test the NVIDIA GPU driver. Official documentation to run a sample workload: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
My system is an Alienware 15 (2014) with a discrete GPU (NVIDIA GeForce 960M with 3GB GDDR5 memory) and 8GB of DDR3L 1600 MHz RAM. After running the sample container, I got the following response:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
smi — System Management Interface
To use custom containers with a GPU at runtime, you pass the --gpus parameter to the docker run command. For example:
docker run --gpus all tensorflow/tensorflow:latest-gpu
To utilise GPU during build time, you have two options:
1. Modify the daemon.json file inside the /etc/docker directory and change the default runtime to nvidia.
{ "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
2. Or, use one of the nvidia/cuda images. Following is a sample Dockerfile that I have taken from here; it uses nvidia/cuda:11.4.0-base-ubuntu20.04 as the base image to check PyTorch GPU support from inside the container.
```dockerfile
FROM nvidia/cuda:11.4.0-base-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive

# Install python
RUN apt-get update && \
    apt-get install -y \
    git \
    python3-pip \
    python3-dev \
    python3-opencv \
    libglib2.0-0

# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html

WORKDIR /app

# COPY necessary files for inference

ENTRYPOINT [ "python3" ]
```
After building the image, verify PyTorch installation:
docker exec -it <Container name> /bin/bash
From inside the container run the commands one-by-one:
```
python3
import torch
torch.cuda.current_device()
torch.cuda.get_device_name(0)  # Change to your desired GPU if your machine has multiple
```
By using one of the nvidia/cuda base images you can execute LLM inference on many platforms, like:
LLM with GPU on Docker container locally
GPU on EC2
GPU on AWS Fargate
GPU on Kubernetes (https://thenewstack.io/install-a-nvidia-gpu-operator-on-rke2-kubernetes-cluster/ )
Another example of a sample Dockerfile with an nvidia/cuda base image, from here:
```dockerfile
FROM --platform=amd64 nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 as base

ARG MAX_JOBS

WORKDIR /workspace

RUN apt update && \
    apt install -y python3-pip python3-packaging \
    git ninja-build && \
    pip3 install -U pip

# Tweak this list to reduce build time
# https://developer.nvidia.com/cuda-gpus
ENV TORCH_CUDA_ARCH_LIST "7.0;7.2;7.5;8.0;8.6;8.9;9.0"

# We have to manually install Torch otherwise apex & xformers won't build
RUN pip3 install "torch>=2.0.0"

# This build is slow but NVIDIA does not provide binaries. Increase MAX_JOBS as needed.
RUN git clone https://github.com/NVIDIA/apex && \
    cd apex && git checkout 2386a912164b0c5cfcd8be7a2b890fbac5607c82 && \
    sed -i '/check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)/d' setup.py && \
    python3 setup.py install --cpp_ext --cuda_ext

RUN pip3 install "xformers==0.0.22" "transformers==4.34.0" "vllm==0.2.0" "fschat[model_worker]==0.2.30"

# COPY YOUR MODEL FILES AND SCRIPTS
# COPY

# SET ENTRYPOINT TO SERVER
# ENTRYPOINT
```
Using Ollama
In an earlier section we saw how to start the Ollama server and execute curl commands via the command line. You can write the Ollama installation and server execution commands in a Dockerfile to use Ollama inside a container.
You can also use the official Ollama Docker image, which is available on Docker Hub. Make sure to install the NVIDIA Container Toolkit to use the GPU (see the sketch below).
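For example, pulling and running the official image with GPU access; the model name is just an example:

```bash
# Start the Ollama server container with GPU access (requires the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and run a model inside the running container
docker exec -it ollama ollama run llama2
```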
AWS Inferentia
The ml.inf2 family of instances is designed for deep learning and generative model inference. AWS claims these instances deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable Amazon EC2 instances.
AWS Trainium can be used for training generative models, while Inferentia is used for inference.
You can use ml.inf2 instances to deploy SageMaker JumpStart models, or any LLM deployed on a SageMaker endpoint.
```python
from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-2-13b-f"
model = JumpStartModel(
    model_id=model_id,
    env={
        "OPTION_DTYPE": "fp16",
        "OPTION_N_POSITIONS": "4096",
        "OPTION_TENSOR_PARALLEL_DEGREE": "12",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "4",
    },
    instance_type="ml.inf2.24xlarge",
)
pretrained_predictor = model.deploy(accept_eula=True)

payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
    },
}
response = pretrained_predictor.predict(payload)
```
Apple Neural Engine
Apple's Neural Engine (ANE) is the marketing name for a group of specialized cores functioning as a neural processing unit (NPU) dedicated to the acceleration of artificial intelligence operations and machine learning tasks. Source
Source: Apple 2020
The ANE isn’t the only NPU out there. Besides the Neural Engine, the most famous NPU is Google’s TPU (or Tensor Processing Unit).
Source: https://apple.fandom.com/wiki/Neural_Engine
To do inference with the ANE you will have to install the ane-transformers package from pip (and then pray that it works, because Apple hasn't updated it in the last 2 years).
GitHub repo of Apple's ml-ane-transformers.
Initialize baseline model
```python
import transformers

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
baseline_model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_name,
    return_dict=False,
    torchscript=True,
).eval()
```
Initialize the mathematically equivalent but optimized model, and restore its parameters from the baseline model:
```python
from ane_transformers.huggingface import distilbert as ane_distilbert

optimized_model = ane_distilbert.DistilBertForSequenceClassification(
    baseline_model.config).eval()
optimized_model.load_state_dict(baseline_model.state_dict())
```
Create sample inputs for the model:
```python
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
tokenized = tokenizer(
    ["Sample input text to trace the model"],
    return_tensors="pt",
    max_length=128,  # token sequence length
    padding="max_length",
)

import torch
traced_optimized_model = torch.jit.trace(
    optimized_model,
    (tokenized["input_ids"], tokenized["attention_mask"])
)
```
Use coremltools to generate the Core ML model package file and save it:
```python
import coremltools as ct
import numpy as np

ane_mlpackage_obj = ct.convert(
    traced_optimized_model,
    convert_to="mlprogram",
    inputs=[
        ct.TensorType(
            f"input_{name}",
            shape=tensor.shape,
            dtype=np.int32,
        ) for name, tensor in tokenized.items()
    ],
    compute_units=ct.ComputeUnit.ALL,
)
out_path = "HuggingFace_ane_transformers_distilbert_seqLen128_batchSize1.mlpackage"
ane_mlpackage_obj.save(out_path)
```
See the installation and troubleshooting instructions in the official GitHub repo.
Others
https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/flexpod_c480m5l_aiml_design.html
Different types of edge devices
There are different types of edge computing; here we will discuss the Internet of Things (IoT) edge. Some of the common IoT devices include:
Mobile devices
Connected cameras
Retail Kiosks
Sensors
Smart devices like smart parking meters
Cars and other similar products
TensorFlow Lite
For mobile devices with On-Device Machine Learning (ODML) capabilities, or even edge devices like the Raspberry Pi, you can convert your existing LLM to a .tflite (TensorFlow Lite) model and run inference in mobile apps. TensorFlow Lite is a mobile library for deploying models on mobile phones, microcontrollers, and other edge devices.
Conceptual architecture for TensorFlow Lite. Image source: https://github.com/tensorflow/codelabs/blob/main/KerasNLP/io2023_workshop.ipynb
The high level developer workflow for using TensorFlow Lite is: first convert a TensorFlow model to the more compact TensorFlow Lite format using the TensorFlow Lite converter , and then use the TensorFlow Lite interpreter , which is highly optimized for mobile devices, to run the converted model. During the conversion process, you can also leverage several techniques, such as quantization, to further optimize the model and accelerate inference.
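As a small sketch of that workflow, assuming an exported SavedModel directory (the path is a placeholder) and default dynamic-range quantization:

```python
import tensorflow as tf

# Convert an exported TensorFlow SavedModel to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")

# Apply default (dynamic-range) quantization during conversion
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the .tflite file that the TensorFlow Lite interpreter loads on-device
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```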
Image source: https://github.com/tensorflow/codelabs/blob/main/KerasNLP/io2023_workshop.ipynb
Copy this official Google Colab to play with the GPT2CausalLM model and TensorFlow Lite.
See more TensorFlow Lite examples (for iOS, Android, and Raspberry Pi): https://www.tensorflow.org/lite/examples
SageMaker Neo
Amazon SageMaker Neo enables developers to optimize machine learning models for inference on SageMaker in the cloud and supported devices at the edge.
Steps to optimize ML models with SageMaker Neo:
Build and train an ML model using any of the frameworks SageMaker Neo supports.
Or upload an existing model's artefacts to an S3 bucket.
Use SageMaker Neo to create an optimized deployment package for the ML model framework and target hardware, such as EC2 instances and edge devices. This is the only additional task compared to the usual ML deployment process.
Deploy the optimized ML model generated by SageMaker Neo on the target cloud or edge infrastructure (a compilation sketch follows this list).
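A sketch of the compilation step using the SageMaker Python SDK; the framework, input shape, target instance family, and S3 paths are illustrative and depend on your model:

```python
import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()

# Wrap existing model artifacts already uploaded to S3 (paths are placeholders)
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.13",
    py_version="py39",
)

# Ask SageMaker Neo to compile the model for the target hardware
compiled_model = pytorch_model.compile(
    target_instance_family="ml_c5",
    input_shape={"input_ids": [1, 128], "attention_mask": [1, 128]},
    output_path="s3://my-bucket/neo-compiled/",
    role=role,
    framework="pytorch",
    framework_version="1.13",
)

predictor = compiled_model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")
```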
Example of model compilation for some of the edge devices using SageMaker Neo: https://github.com/neo-ai/neo-ai-dlr/tree/main/sagemaker-neo-notebooks/edge
Deploy LLM with SageMaker Neo. Source: https://d1.awsstatic.com/events/Summits/reinvent2022/AIM405_Train-and-deploy-large-language-models-on-Amazon-SageMaker.pdf
ONNX
ONNX is a community project, a format built to represent machine learning models. ONNX defines a common set of operators — the building blocks of machine learning and deep learning models — and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
If you have a model in one of the ONNX supported frameworks, which include all major ML frameworks, then it can be optimized to maximize performance across hardware using one of the supported accelerators like Mace, NVIDIA, Optimum, Qualcomm, Synopsys, TensorFlow, Windows, Vespa, and more.
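For example, a sketch of exporting a HuggingFace model to ONNX with Optimum and running it on ONNX Runtime; the model ID and task are illustrative:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to ONNX on the fly
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("ONNX Runtime makes inference portable.", return_tensors="pt")
outputs = ort_model(**inputs)
print(outputs.logits)
```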
Using other tools
If your edge device has an OS kernel and supports containers, then people have successfully run models like Code Llama with llama.cpp for generative model inference.
https://github.com/ggerganov/whisper.cpp
If the edge device has its own developer kit like NVIDIA IGX Orin , then see the official documentation for edge deployment.
Your model pipeline will vary depending on the architecture. For a RAG architecture, you will want to update your vector storage with new knowledge bases or updated articles. Updating the embeddings of only the changed articles is a better choice than re-embedding the whole corpus every time any article is updated, as in the sketch below.
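A minimal sketch of that incremental update, assuming a hypothetical vector_store client with an upsert method and an embed function of your choice:

```python
import hashlib

def sync_articles(articles, stored_hashes, embed, vector_store):
    """Re-embed and upsert only the articles whose content changed.

    articles:      dict of article_id -> article text
    stored_hashes: dict of article_id -> content hash from the last sync
    embed:         callable that returns an embedding vector for a text
    vector_store:  client exposing upsert(id, vector, metadata) (hypothetical interface)
    """
    for article_id, text in articles.items():
        content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if stored_hashes.get(article_id) == content_hash:
            continue  # unchanged article, skip re-embedding
        vector_store.upsert(article_id, embed(text), {"hash": content_hash})
        stored_hashes[article_id] = content_hash
```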
In a continuous-pretraining architecture, the foundation model is continuously pretrained on new data. To keep the model from degrading due to bad data, you need a robust pipeline with data checks, endpoint drift detection, and rule/model-based evaluations.
In an architecture with a fine-tuned generative model, you can add rule-based checks that are triggered by pre-commit hooks every time developers commit code changes.
We discussed rule based and model based evaluations in Evaluating in CI/CD section of Evaluating LLMs chapter.
Fine-tuning Pipeline
In a classical SageMaker model pipeline we use ScriptProcessor to execute custom scripts that rely on custom libraries. We also use it when we have to install a bunch of packages ourselves in the container and host the container image in our ECR.
SageMaker already has public images with packages installed to support training, processing, and hosting deep learning models. These images include common packages like PyTorch, TensorFlow, transformers, HuggingFace libraries, and many more.
HuggingFace Processor:
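A sketch of a processing step using the HuggingFace processor from the SageMaker SDK; the container versions, instance type, script name, and S3 paths are illustrative:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

role = sagemaker.get_execution_role()

hf_processor = HuggingFaceProcessor(
    role=role,
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    transformers_version="4.28",   # pick a combination supported by the
    pytorch_version="2.0",         # HuggingFace Deep Learning Containers
    py_version="py310",
    base_job_name="llm-data-processing",
)

hf_processor.run(
    code="preprocess.py",   # your custom processing script
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```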
In a production pipeline your LLM, or any other model, will typically do more than just return predictions. For that, SageMaker provides different toolkits. Remember that we used the inference and training toolkits for our custom images in the classical-model example; there is also the SageMaker HuggingFace Inference Toolkit.
HuggingFace inference toolkit:
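A sketch of a custom inference.py using the handler functions the HuggingFace Inference Toolkit looks for; the pipeline task and parameter handling here are illustrative:

```python
# inference.py
from transformers import pipeline

def model_fn(model_dir):
    # Load the model artifacts that SageMaker unpacked into model_dir
    return pipeline("text-generation", model=model_dir)

def predict_fn(data, model):
    # data is the deserialized request payload
    prompt = data.pop("inputs", "")
    parameters = data.pop("parameters", {})
    return model(prompt, **parameters)
```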
Capturing endpoint statistics and processing them is paramount to check for model degradation, scaling, and continuous improvement of service.
Your DevOps team will typically have a dashboard with all the hot data related to the operational metrics of the infrastructure: metrics like network bandwidth, CPU usage across all nodes, RAM usage, response time, number of nodes up/down, number of containers, and more.
An MLOps dashboard will usually have feature distributions, KL divergence, prediction value distribution, embedding distribution, memory usage, number of endpoints, and other model-related metrics like recall, F1, AUC, RMSE, MAP, BLEU, etc.
For an LLM endpoint, you will have the relevant MLOps metrics plus a few LLM-specific sub-metrics.
Time to first token (TTFT): This is how quickly users start seeing the model’s output after entering their query.
Time per output token (TPOT): Time to generate an output token for each user that is querying the system.
Based on the above metrics:
Latency = TTFT + TPOT × (number of output tokens generated)
For example, with a TTFT of 0.5 seconds, a TPOT of 50 ms, and 200 generated tokens, latency ≈ 0.5 + 0.05 × 200 = 10.5 seconds.
Ways to capture endpoint statistics
For applications where latency is not crucial, you can add the inference output along with endpoint metrics to persistent storage before returning the inference from your endpoint. If you are using serverless infrastructure like AWS Lambda, you can extend the inference Lambda so that it also writes its output to an RDS database, a key-value store like DynamoDB, or object storage like S3, as in the sketch below.
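A sketch of that pattern; run_inference and the table name are hypothetical placeholders for your existing inference call and metrics table:

```python
import json
import time
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("llm-endpoint-metrics")   # hypothetical DynamoDB table

def lambda_handler(event, context):
    payload = json.loads(event["body"])
    start = time.time()
    result = run_inference(payload["inputs"])    # your existing inference call (hypothetical)
    latency_ms = int((time.time() - start) * 1000)

    # Persist the output and basic metrics before returning the response
    table.put_item(Item={
        "request_id": context.aws_request_id,
        "timestamp": int(start),
        "prompt": payload["inputs"],
        "completion": result,
        "latency_ms": latency_ms,
    })
    return {"statusCode": 200, "body": json.dumps({"completion": result, "latency_ms": latency_ms})}
```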
If calculating endpoint metrics within the endpoint code is not feasible, then simply store the raw outputs and process them in batches later on.
For low-latency applications, adding logic that appends the outputs to persistent storage before returning the final predictions is not feasible. In such cases you can log/print the predictions and then process the logs asynchronously. If you are using a log aggregator like Loki, you can calculate the endpoint statistics after the logs are indexed.
To decouple endpoint metric calculation from your main inference logic, you can use a data stream. The inference code logs the outputs, another service indexes the logs and adds them to a data stream, and you then either process the logs in the stream or deliver them to persistent storage and process them in batches.
Apache Kafka or AWS Kinesis Data Streams can be used for the data streams. Apache Flink is my favourite stream-processing tool. If you use AWS and Lambda for inference, you can stream CloudWatch logs to Kinesis in a few clicks. Once you have the logs in Kinesis, you can either use a stream processor like Flink or add a stream consumer to calculate the endpoint metrics.
Cloud provider endpoints
Major cloud providers like Google Cloud, AWS, and Azure provide a pre-defined set of endpoint metrics out of the box. The metrics include latency, model initialisation time, 4XX errors, 5XX errors, invocations per instance, CPU utilisation, memory usage, disk usage, and other general metrics. These are all good operational metrics and are used for activities like auto-scaling your endpoint and health determination (i.e. HA and scalability).
The cloud providers also give an option to store the input and output data of the endpoint to persistent storage like S3.
SageMaker endpoint configuration Data Capture option
Utilise this option if you can process the logs in batches and don't require hot data on metrics. You can also add triggers so that when new data arrives in S3 it is processed immediately to calculate your endpoint metrics. I recommend estimating the cost of this whole pipeline before committing to this approach.
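If you enable data capture from the SDK rather than the console, a sketch looks like the following; the sampling percentage and S3 path are illustrative, and model refers to a SageMaker Model object such as the HuggingFaceModel created earlier:

```python
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,                       # capture every request/response pair
    destination_s3_uri="s3://my-bucket/endpoint-data-capture/",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    data_capture_config=data_capture_config,
)
```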