Llama 2

Stale
GitHub Python NOASSERTION

Description

Meta's open-source Llama 2 foundational LLM with pretrained and fine-tuned models from 7B to 70B parameters, supporting chat and text completion as a cornerstone of the open LLM ecosystem.

Key Features

  • Multiple parameter sizes — 7B, 13B, 34B, and 70B parameter variants
  • Base and chat models — Both pretrained base and instruction-fine-tuned Chat versions available
  • Commercial-friendly license — Free for both research and commercial use
  • HuggingFace compatible — Full support for Transformers library loading and inference
  • llama.cpp deployment — Efficiently run on consumer hardware via llama.cpp
  • Code Llama — Specialized fine-tune for code generation

Use Cases

💡 Open-source LLM foundation: Use as base weights for downstream fine-tuning and research
💡 On-premise deployment: Privately deploy Llama 2 on local GPUs for inference
💡 Instruction tuning research: Conduct RLHF, SFT, and other instruction tuning research on base models
💡 Code generation: Use Code Llama for code understanding and generation tasks

Quick Start

# Install dependencies
pip install torch transformers

# Load from HuggingFace
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")

# Inference
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Or use llama.cpp
# git clone https://github.com/ggerganov/llama.cpp && make
# ./main -m models/llama-2-7b-chat.gguf -p "Hello"

Related Projects