Llama 2
StaleDescription
Meta's open-source Llama 2 foundational LLM with pretrained and fine-tuned models from 7B to 70B parameters, supporting chat and text completion as a cornerstone of the open LLM ecosystem.
Key Features
- Multiple parameter sizes — 7B, 13B, 34B, and 70B parameter variants
- Base and chat models — Both pretrained base and instruction-fine-tuned Chat versions available
- Commercial-friendly license — Free for both research and commercial use
- HuggingFace compatible — Full support for Transformers library loading and inference
- llama.cpp deployment — Efficiently run on consumer hardware via llama.cpp
- Code Llama — Specialized fine-tune for code generation
Use Cases
Categories
Quick Start
# Install dependencies
pip install torch transformers
# Load from HuggingFace
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
# Inference
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Or use llama.cpp
# git clone https://github.com/ggerganov/llama.cpp && make
# ./main -m models/llama-2-7b-chat.gguf -p "Hello"