Llama Models
Meta's Llama family is one of the most widely used sets of openly available large language models, released with downloadable weights under Meta's community license.
Llama 3.x Series
Available Sizes
| Model | Parameters | Context (tokens) | Best For |
| Llama 3.1 8B | 8B | 128K | Fast inference, edge deployment |
| Llama 3.1 70B | 70B | 128K | General purpose, high quality |
| Llama 3.1 405B | 405B | 128K | Research, maximum capability |
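To get a feel for what these sizes mean for hardware, the sketch below estimates the memory needed just to hold the weights. It assumes the nominal parameter counts from the table and a simple "parameters times bits per weight" rule; it ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weight_memory_gb` is illustrative only.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# Assumption: nominal parameter counts; KV cache and runtime overhead are ignored.

PARAMS_BILLIONS = {
    "Llama 3.1 8B": 8,
    "Llama 3.1 70B": 70,
    "Llama 3.1 405B": 405,
}


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in gigabytes (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        fp16 = weight_memory_gb(size, 16)
        four_bit = weight_memory_gb(size, 4)
        print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{four_bit:.0f} GB at 4-bit")
```

Under these assumptions the 8B model is the only one whose 16-bit weights fit on a single consumer GPU, which is consistent with its "edge deployment" positioning.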
Key Features
- 128K context window - Handle long documents
- Multilingual - 8 languages supported
- Code generation - Strong programming ability
- Tool use - Native function calling
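As an illustration of the tool-use feature, here is a minimal sketch of the application side of function calling: describing a tool to the model and executing the call it requests. It assumes the JSON-schema tool format used by OpenAI-compatible chat APIs, where arguments arrive as a JSON string. `get_weather`, `REGISTRY`, and `run_tool_call` are hypothetical names chosen for this example, and the weather result is a placeholder.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local function the model may ask us to call."""
    return f"Sunny in {city}"  # placeholder result, not a real lookup


# Tool description in the JSON-schema style used by OpenAI-compatible chat APIs.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may return to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call_id: str, name: str, arguments: str) -> dict:
    """Execute one requested tool call and build the reply message for the model."""
    func = REGISTRY.get(name)
    if func is None:
        content = f"Unknown tool: {name}"
    else:
        content = func(**json.loads(arguments))
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```

In a full loop, `TOOLS` would be passed as the `tools` argument of a chat completion request, and the message returned by `run_tool_call` would be appended to the conversation before asking the model for its final answer.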
Running Llama Locally
With Ollama
# Pull model
ollama pull llama3.1:8b
# Run interactively
ollama run llama3.1:8b
# Use specific variant
ollama run llama3.1:70b
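Once a model is pulled, Ollama also serves a local HTTP API, so the model can be called from a script as well as interactively. The sketch below uses only the Python standard library and assumes Ollama is running on its default address (`http://localhost:11434`). The helper names `build_chat_payload` and `ask_ollama` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumes Ollama's default local address


def build_chat_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Build a non-streaming chat request body for a single user prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }


def ask_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["message"]["content"]


# Example (requires a running Ollama server with the model pulled):
# print(ask_ollama("Hello!"))
```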
With llama.cpp
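# Clone and build (recent llama.cpp versions build with CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
# Convert Hugging Face weights to GGUF (the model directory is a placeholder path)
pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/Llama-3.1-8B-Instruct --outfile llama-3.1-8b.gguf
# Run
./build/bin/llama-cli -m llama-3.1-8b.gguf -p "Hello, world"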
API Usage
Python with Together AI
from openai import OpenAI

# Together AI exposes an OpenAI-compatible endpoint, so the OpenAI client works as-is
client = OpenAI(
    api_key="your-together-api-key",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
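For longer answers it is often nicer to show text as it is generated. The sketch below assumes the OpenAI-compatible streaming interface (`stream=True`, with text arriving in `chunk.choices[0].delta.content`). `collect_stream` and `stream_reply` are hypothetical helper names; `client` is any OpenAI-style client such as the one created above.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices (e.g. usage info)
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip empty or None deltas
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


def stream_reply(client, model: str, prompt: str) -> str:
    """Request a streamed chat completion and return the assembled text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Separating `collect_stream` from the network call keeps the accumulation logic easy to check without an API key.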
With Groq (Fast Inference)
from groq import Groq

client = Groq(api_key="your-groq-api-key")

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
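Because both providers accept OpenAI-style chat requests, one way to switch between them is a small settings table. This is a sketch under assumptions: the environment variable names are conventions chosen here, the Groq URL is its OpenAI-compatible endpoint, and `PROVIDERS`, `client_settings`, and `make_client` are hypothetical names.

```python
import os

# Base URLs for the OpenAI-compatible endpoints; env var names are assumptions.
PROVIDERS = {
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "api_key_env": "TOGETHER_API_KEY",
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "api_key_env": "GROQ_API_KEY",
    },
}


def client_settings(provider: str, env=os.environ) -> dict:
    """Return the keyword arguments needed to construct a client for a provider."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key": env[cfg["api_key_env"]]}


def make_client(provider: str):
    """Create an OpenAI-style client for the chosen provider."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`
    return OpenAI(**client_settings(provider))
```

Reading keys from the environment also avoids committing API keys to source control.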
Llama 3.2 Multimodal
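# Vision model usage (pull the model first with: ollama pull llama3.2-vision)
from ollama import chat

response = chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'Describe this image',
        'images': ['./image.jpg'],  # path to a local image file
    }],
)
print(response['message']['content'])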
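The Python client accepts file paths, but when calling Ollama's HTTP API directly, images are sent as base64-encoded strings inside the message. The following sketch shows that encoding step; `encode_image_bytes` and `build_vision_message` are illustrative names, not part of any library.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes for use in a JSON request body."""
    return base64.b64encode(data).decode("ascii")


def build_vision_message(prompt: str, images: list[bytes]) -> dict:
    """Build a user message that carries one or more base64-encoded images."""
    return {
        "role": "user",
        "content": prompt,
        "images": [encode_image_bytes(img) for img in images],
    }


# Example with a local file (the path is a placeholder):
# from pathlib import Path
# message = build_vision_message("Describe this image", [Path("./image.jpg").read_bytes()])
```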
Best Practices
| Practice | Recommendation |
| Quantization | Use Q4_K_M for a balance of size and quality |
| Context | Start with a 4K-token context, expand as needed |
| System prompt | Be explicit about the expected output format |
| Temperature | 0.1-0.3 for factual tasks, 0.7+ for creative tasks |
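The system prompt and temperature recommendations can be folded into a small request builder. This is a sketch only: the specific values 0.2 and 0.8 are picked from within the ranges in the table, and `build_request` and `TEMPERATURE_BY_TASK` are hypothetical names. The resulting dictionary is shaped for an OpenAI-style `chat.completions.create(**request)` call.

```python
# Example temperatures chosen from the recommended ranges (assumed values).
TEMPERATURE_BY_TASK = {"factual": 0.2, "creative": 0.8}


def build_request(model: str, user_prompt: str, task: str = "factual",
                  output_format: str = "plain text") -> dict:
    """Build chat request arguments with an explicit format and a task-based temperature."""
    system_prompt = f"You are a helpful assistant. Respond only in {output_format}."
    return {
        "model": model,
        "temperature": TEMPERATURE_BY_TASK[task],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


# Example (client is any OpenAI-style client):
# request = build_request("llama-3.1-70b-versatile", "List three facts about Mars",
#                         output_format="a JSON array of strings")
# response = client.chat.completions.create(**request)
```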
intermediate | LLM Comparison | Updated 2024-12-18