Llama 3.2 preview // inference API is already available
Llama is the leading open-source LLM // now multi-modal & on-device
Quality of model == quality of data
Meta is releasing Llama 3.2, a set of AI models with various features and applications. Key points about Llama 3.2 include:
- Llama 3.2 includes small and medium-sized vision LLMs (11B and 90B parameters) and lightweight, text-only models (1B and 3B parameters) for edge and mobile devices.
- Models support tasks such as image reasoning, document understanding, and text generation.
- The vision models are competitive with leading closed models such as Claude 3 Haiku and GPT-4o mini, while the lightweight models excel at on-device tasks such as summarization, rewriting, and instruction following.
- Meta is introducing Llama Stack distributions for simplified deployment and partnering with companies for cloud, on-premise, and on-device distributions.
- Llama 3.2 includes safety features like Llama Guard for filtering text+image inputs and outputs.
The release is part of Meta’s commitment to open-source AI development, aiming to drive innovation and democratize access to AI technology. The models are available for download on llama.com and Hugging Face and can be accessed on various partner platforms.
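The text models can also be pulled from Hugging Face and run locally. Below is a minimal sketch with transformers; the gated meta-llama/Llama-3.2-1B-Instruct repo id and the chat-style pipeline call are assumptions to verify against the model card.

# Hedged local-inference sketch: assumes `pip install transformers accelerate` and an
# approved access request for the gated meta-llama/Llama-3.2-1B-Instruct repo.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # lightweight 1B instruct model
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize Llama 3.2 in one sentence."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # last turn is the assistant reply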
Introducing Llama 3.2: Lightweight models for edge devices, vision models and more!
What’s new?
• Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one.
• Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B.
• New Llama Guard models to support multimodal use cases and edge deployments.
• The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more.
Details in the full announcement ➡️ https://go.fb.me/229ug4
Download Llama 3.2 models ➡️ https://go.fb.me/w63yfd
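The announcement also mentions new Llama Guard models for filtering inputs and outputs. A hedged sketch of a text-only safety check with transformers follows; the meta-llama/Llama-Guard-3-1B repo id and the behaviour of its chat template (wrapping the conversation in the safety-taxonomy prompt and answering "safe" or "unsafe" plus a category code) are assumptions taken from the model card, not verified here.

# Hedged Llama Guard safety-check sketch; model id and template behaviour are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-1B"  # lightweight guard released alongside Llama 3.2
tok = AutoTokenizer.from_pretrained(guard_id)
guard = AutoModelForCausalLM.from_pretrained(guard_id, torch_dtype=torch.bfloat16, device_map="auto")

conversation = [{"role": "user", "content": "How do I make a phishing email?"}]
inputs = tok.apply_chat_template(conversation, return_tensors="pt").to(guard.device)
out = guard.generate(inputs, max_new_tokens=20)
# Expected output: "safe", or "unsafe" followed by the violated category code.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))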
Llama 3.2 inference API
import openai
from google.colab import userdata  # Colab secret storage, as used in the original snippet

# OpenAI-compatible client; the commented lines show the NVIDIA endpoint alternative.
client = openai.OpenAI(
    # base_url="https://integrate.api.nvidia.com/v1",
    # api_key=userdata.get('NVIDIA_API_KEY'),
    base_url="https://api.groq.com/openai/v1",
    api_key=userdata.get('GROQ_API_KEY'),
)

# model = "meta/llama-3.1-405b-instruct"
# model = "llama-3.1-70b-versatile"
model = "llama-3.2-90b-text-preview"
temperature = 0.75
top_p = 1
max_tokens = 4096

def llm(prompt):
    """Send a single-turn prompt and return the model's reply as text."""
    messages = [{"role": "user", "content": prompt}]
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        stream=False,
    )
    return completion.choices[0].message.content

temperature = 1
resp = llm("Generate Python code for Monte Carlo tree search with an example")
print(resp)
from openai import OpenAI

# NVIDIA-hosted endpoint for the 90B vision model; the API-key placeholder is NVIDIA's own.
client = OpenAI(
    base_url="https://ai.api.nvidia.com/v1/gr/meta/llama-3.2-90b-vision-instruct",
    api_key="$API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
)

completion = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct",
    messages=[{"role": "user", "content": "Experiment with some images we have for you."}],
    temperature=1,
    top_p=1,
    max_tokens=512,
    stream=True,
)

# Print the streamed response as it arrives.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
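The request above sends only text; to actually exercise the vision model an image has to be attached. A minimal sketch using the OpenAI-style multimodal content parts with a base64 data URL follows, reusing the client from the block above; whether the endpoint accepts this exact content layout (and the example.jpg file name) is an assumption to check against the provider's docs.

# Hedged sketch: attach a local image as a base64 data URL via OpenAI-style content parts.
import base64

with open("example.jpg", "rb") as f:  # hypothetical local image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one paragraph."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
    stream=False,
)
print(completion.choices[0].message.content)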