Llama 3.2 preview // inference API is already available

Llama is the leading open-source LLM // now multimodal & on-device

sbagency
3 min read · Sep 26, 2024

Quality of model == quality of data

https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

Meta is releasing Llama 3.2, a family of vision and lightweight text-only models. Key points about Llama 3.2 include:

- Llama 3.2 includes small and medium-sized vision LLMs (11B and 90B parameters) and lightweight, text-only models (1B and 3B parameters) for edge and mobile devices.
- Models support tasks such as image reasoning, document understanding, and text generation.
- The vision models are competitive with leading closed models such as Claude 3 Haiku and GPT-4o mini, while the lightweight 1B and 3B models target on-device applications.
- Meta is introducing Llama Stack distributions for simplified deployment and partnering with companies for cloud, on-premise, and on-device distributions.
- Llama 3.2 includes safety features like Llama Guard for filtering text+image inputs and outputs.

The release is part of Meta’s commitment to open-source AI development, aiming to drive innovation and democratize access to AI technology. The models are available for download on llama.com and Hugging Face and can be accessed on various partner platforms.

https://x.com/AIatMeta/status/1838993953502515702

Introducing Llama 3.2: Lightweight models for edge devices, vision models and more!

What’s new?
• Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one.
• Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B.
• New Llama Guard models to support multimodal use cases and edge deployments.
• The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more.

Details in the full announcement ➡️ https://go.fb.me/229ug4
Download Llama 3.2 models ➡️ https://go.fb.me/w63yfd

Llama 3.2 inference API

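import openai
from google.colab import userdata  # assumes a Google Colab notebook; elsewhere, read the key from an environment variable

# Groq exposes an OpenAI-compatible endpoint; the commented lines switch to NVIDIA instead.
client = openai.OpenAI(
    # base_url="https://integrate.api.nvidia.com/v1",
    # api_key=userdata.get('NVIDIA_API_KEY'),
    base_url="https://api.groq.com/openai/v1",
    api_key=userdata.get('GROQ_API_KEY'),
)

# model = "meta/llama-3.1-405b-instruct"
# model = "llama-3.1-70b-versatile"
model = "llama-3.2-90b-text-preview"
temperature = 0.75
top_p = 1
max_tokens = 4096

def llm(prompt):
    """Send a single user prompt and return the reply text."""
    messages = [{"role": "user", "content": prompt}]
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        stream=False,
    )
    return completion.choices[0].message.content

# llm() reads the module-level settings at call time, so this overrides the 0.75 set above.
temperature = 1
resp = llm("Generate Python code for Monte Carlo tree search with an example")
print(resp)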
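
The helper above picks up `model`, `temperature`, `top_p` and `max_tokens` from notebook globals, so reassigning `temperature` changes every later call. A minimal sketch of the same request with the settings passed explicitly is shown here. The names `ask` and `EchoClient` are hypothetical and not part of the original notebook; `EchoClient` is only an offline stand-in so the snippet runs without an API key. In a real session you would pass the `openai.OpenAI` client instead.

```python
from types import SimpleNamespace


def ask(client, prompt, *, model, temperature=0.75, top_p=1.0, max_tokens=4096):
    """Send one user prompt through an OpenAI-style client and return the reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        stream=False,
    )
    return completion.choices[0].message.content


class EchoClient:
    """Hypothetical offline stand-in with the client.chat.completions.create shape."""

    def __init__(self):
        self.last_request = None
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **kwargs):
        # Record the request and echo the prompt back as the "model" reply.
        self.last_request = kwargs
        text = "echo: " + kwargs["messages"][0]["content"]
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


if __name__ == "__main__":
    fake = EchoClient()
    print(ask(fake, "hello", model="llama-3.2-90b-text-preview", temperature=1))
```

With explicit keyword arguments, each call states its own sampling settings, which makes it easier to compare outputs at different temperatures side by side.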

https://build.nvidia.com/meta/llama-3.2-90b-vision-instruct

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.api.nvidia.com/v1/gr/meta/llama-3.2-90b-vision-instruct",
    api_key="$API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
)

completion = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct",
    messages=[{"role": "user", "content": "Experiment with some images we have for you."}],
    temperature=1,
    top_p=1,
    max_tokens=512,
    stream=True,
)

# With stream=True the reply arrives in chunks; print each text delta as it comes in.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
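
The sample above sends only text, even though the model is a vision model. As a rough sketch, this is how an image could be attached using the OpenAI-style multimodal message format (a `content` list with `text` and `image_url` parts). The helper name `image_message` is hypothetical, and whether a given provider's endpoint accepts this exact format is an assumption, so check that provider's documentation before relying on it.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build one OpenAI-style user message carrying text plus an inline image."""
    # Encode the raw image bytes as a base64 data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes for illustration; in practice read a real image file in "rb" mode.
    msg = image_message("What is in this image?", b"placeholder image bytes")
    print(msg["content"][0]["text"])
    print(msg["content"][1]["image_url"]["url"][:30])
```

The returned dictionary would go in the `messages` list in place of the plain-text message, assuming the endpoint supports image parts.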
