HxHippy

Llama Models Guide

Complete guide to Meta's Llama family of open-source language models.

Last updated: 2024-12-18

Llama Models

Meta's Llama family is among the most widely used open large language models: the weights can be downloaded and run locally under Meta's Llama license.

Llama 3.x Series

Available Sizes

| Model | Parameters | Context window | Best for |
|---|---|---|---|
| Llama 3.1 8B | 8B | 128K tokens | Fast inference, edge deployment |
| Llama 3.1 70B | 70B | 128K tokens | General purpose, high quality |
| Llama 3.1 405B | 405B | 128K tokens | Research, maximum capability |
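Parameter count is the main factor in how much memory a model needs. The sketch below gives a rough, weights-only estimate (parameters × bits per weight / 8). It ignores the KV cache, activations, and runtime overhead, and the ~4.5 bits/weight figure for 4-bit quantizations is an assumed ballpark, not an exact GGUF size.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: memory ≈ parameters × bits-per-weight / 8. This ignores the
# KV cache, activations, and runtime overhead, which all add more on top.

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Return approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


SIZES = {"Llama 3.1 8B": 8e9, "Llama 3.1 70B": 70e9, "Llama 3.1 405B": 405e9}

if __name__ == "__main__":
    for name, params in SIZES.items():
        fp16 = weight_memory_gb(params, 16)  # 16-bit (FP16/BF16) weights
        q4 = weight_memory_gb(params, 4.5)   # assumed ~4.5 bits/weight for a 4-bit quant
        print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{q4:.0f} GB at ~4.5 bits/weight")
```

At 16-bit precision this works out to about 16 GB for 8B, 140 GB for 70B, and 810 GB for 405B, which is why the smaller models are the practical choice for single-GPU or edge deployment.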

Key Features

  • 128K-token context window: handles long documents
  • Multilingual: 8 officially supported languages (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai)
  • Code generation: strong programming ability
  • Tool use: native function calling
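With native function calling, the model returns a structured request naming a tool and its JSON arguments, and your code runs the tool. The sketch below shows the application side only: a hypothetical `get_weather` tool, an OpenAI-style tool schema (the format accepted by OpenAI-compatible endpoints that support tools), and a dispatcher. The tool call is simulated as a plain dict; with the `openai` SDK the same fields are attributes (`tool_call.function.name`, `tool_call.function.arguments`).

```python
import json

# Hypothetical local tool; a real app would call an actual weather service.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# OpenAI-style tool schema, passed as `tools=TOOLS` to a chat completion request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Map tool names the model may request to the Python functions that implement them.
REGISTRY = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> str:
    """Execute one tool call shaped like {"function": {"name": ..., "arguments": "<JSON>"}}."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

if __name__ == "__main__":
    # Simulated model output; in practice this comes from the API response.
    call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    print(run_tool_call(call))
```

The tool's return value is then sent back to the model as a `"role": "tool"` message so it can write the final answer.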

Running Llama Locally

With Ollama

# Pull model
ollama pull llama3.1:8b

# Run interactively
ollama run llama3.1:8b

# Run the larger 70B model (needs far more RAM/VRAM than 8B)
ollama run llama3.1:70b
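Besides the CLI, a running Ollama instance serves a local REST API (by default on port 11434). The sketch below calls the `/api/generate` endpoint using only the Python standard library, with `"stream": False` so the reply arrives as one JSON object whose `response` field holds the text. It assumes the default host and port and that the model has already been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address

def make_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (requires Ollama to be running)."""
    with urllib.request.urlopen(make_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `print(generate("llama3.1:8b", "Why is the sky blue?"))`. Splitting request construction from sending keeps the request easy to inspect without a running server.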

With llama.cpp

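# Clone and build with CMake (binaries end up in build/bin)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Convert Hugging Face weights to GGUF (pass the path to the downloaded model directory)
pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/Llama-3.1-8B-Instruct --outfile llama-3.1-8b.gguf

# Run (older builds named this binary ./main)
./build/bin/llama-cli -m llama-3.1-8b.gguf -p "Hello, world"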
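To drive llama.cpp from a script, it helps to build the command line as a list. The sketch below assumes a current CMake build where the binary is `build/bin/llama-cli` (override `binary` otherwise) and uses the common flags `-m` (model), `-p` (prompt), `-n` (max tokens to generate), `-c` (context size), and `--temp` (temperature). The defaults are illustrative, and flag behavior can change between releases, so check `llama-cli --help` for your build.

```python
import shlex
import subprocess

def llama_cli_args(model_path: str, prompt: str, n_predict: int = 128,
                   ctx_size: int = 4096, temp: float = 0.2,
                   binary: str = "./build/bin/llama-cli") -> list[str]:
    """Build an argument list for llama.cpp's CLI.

    -m: model file, -p: prompt, -n: max tokens to generate,
    -c: context size in tokens, --temp: sampling temperature.
    """
    return [binary, "-m", model_path, "-p", prompt,
            "-n", str(n_predict), "-c", str(ctx_size), "--temp", str(temp)]

def run_llama_cli(args: list[str]) -> None:
    """Run the command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(args, check=True)

if __name__ == "__main__":
    # Print the equivalent shell command without running it.
    print(shlex.join(llama_cli_args("llama-3.1-8b.gguf", "Hello, world")))
```

Passing a list (rather than one shell string) to `subprocess.run` avoids quoting problems with prompts that contain spaces or quotes.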

API Usage

Python with Together AI

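import os
from openai import OpenAI

# Together AI exposes an OpenAI-compatible API, so the openai SDK works with a custom base_url
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],  # read from the environment instead of hard-coding
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)
print(response.choices[0].message.content)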
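For long answers, streaming shows text as it is generated. With the `openai` SDK, passing `stream=True` returns an iterator of chunks whose text is in `chunk.choices[0].delta.content` (sometimes `None`, and some chunks may have no choices). The sketch below collects that text; `fake_chunk` is a hypothetical helper that mimics the chunk shape so the function can be tried offline.

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Print and concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None (e.g. role-only or final chunks)
            parts.append(delta)
            print(delta, end="", flush=True)  # show text as it arrives
    print()
    return "".join(parts)

def fake_chunk(text):
    """Hypothetical offline stand-in shaped like an SDK stream chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

# Real usage, with the Together client configured as above:
# stream = client.chat.completions.create(model=..., messages=..., stream=True)
# text = collect_stream(stream)
```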

With Groq (Fast Inference)

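import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Groq periodically renames and retires model IDs; check its model list if this one is rejected
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)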
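Groq also offers an OpenAI-compatible endpoint, so one client class can target either hosted provider by swapping the base URL, API key, and model ID. The sketch below keeps that per-provider configuration in one table. The environment variable names are conventions assumed here, and the model IDs are examples that providers may rename or retire.

```python
import os

# Per-provider settings for OpenAI-compatible chat endpoints.
PROVIDERS = {
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "key_env": "TOGETHER_API_KEY",
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "key_env": "GROQ_API_KEY",
        "model": "llama-3.1-70b-versatile",
    },
}

def client_kwargs(provider: str) -> dict:
    """Keyword arguments for openai.OpenAI(...); api_key is None if the variable is unset."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key": os.environ.get(cfg["key_env"])}
```

Usage: `client = OpenAI(**client_kwargs("groq"))`, then call `client.chat.completions.create(model=PROVIDERS["groq"]["model"], messages=...)` exactly as in the Together example.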

Llama 3.2 Multimodal

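# Vision model usage (run `ollama pull llama3.2-vision` first)
from ollama import chat

response = chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'Describe this image',
        'images': ['./image.jpg']  # path to a local image file
    }]
)
print(response['message']['content'])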
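The Python client accepts image file paths, but when calling Ollama's REST API directly, images go in the message's `images` list as base64-encoded strings. The sketch below builds such a request body for `POST http://localhost:11434/api/chat`, assuming the default local address; sending it works the same way as any other JSON POST.

```python
import base64
import json
from pathlib import Path

def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")

def encode_image(path: str) -> str:
    """Read an image file and return it base64-encoded."""
    return encode_image_bytes(Path(path).read_bytes())

def vision_chat_body(model: str, prompt: str, image_b64: str) -> bytes:
    """JSON body for Ollama's /api/chat with one image attached to the user message."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
        "stream": False,  # return one complete JSON reply
    }).encode()
```

Usage: `vision_chat_body("llama3.2-vision", "Describe this image", encode_image("./image.jpg"))`.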

Best Practices

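| Practice | Recommendation |
|---|---|
| Quantization | Use Q4_K_M for a good balance of file size, speed, and output quality |
| Context length | Start with 4K tokens and increase only as needed (longer contexts use more memory) |
| System prompt | State the expected output format explicitly |
| Temperature | 0.1-0.3 for factual tasks, 0.7 or higher for creative writing |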
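Several of these practices map onto per-request settings. The sketch below turns them into an Ollama `options` dict (`temperature` and `num_ctx` are Ollama option names) plus a message list with an explicit, format-stating system prompt. The task categories, exact temperature values, and prompt wording are illustrative choices within the ranges above, not fixed rules.

```python
# Illustrative presets within the recommended temperature ranges.
TEMPERATURES = {"factual": 0.2, "creative": 0.8}

def ollama_options(task: str, num_ctx: int = 4096) -> dict:
    """Options dict: low temperature for factual work, higher for creative; 4K context by default."""
    if task not in TEMPERATURES:
        raise ValueError(f"unknown task type: {task!r}")
    return {"temperature": TEMPERATURES[task], "num_ctx": num_ctx}

# Example system prompt that states the expected output format explicitly.
SYSTEM_PROMPT = (
    "Answer in exactly three bullet points. "
    "If you are unsure, say so instead of guessing."
)

def build_messages(user_text: str) -> list[dict]:
    """Prepend the system prompt to a single user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]
```

Usage: `chat(model="llama3.1:8b", messages=build_messages("Summarize this report"), options=ollama_options("factual"))`, using `chat` from the `ollama` package. Raise `num_ctx` only when inputs actually exceed the default budget.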