Groq API Cheat Sheet #
Welcome to the Groq API cheat sheet! This guide provides a comprehensive overview of how to use the Groq API for ultra-fast large language model (LLM) inference. Whether you’re building a chatbot, a content generation tool, or any other application that requires real-time language processing, Groq offers the speed and performance you need.
Getting Started #
1. Get Your API Key #
First, you’ll need a Groq API key. You can get one by signing up on the Groq Console.
2. Set Up Your API Key #
It’s recommended to set your API key as an environment variable to avoid hardcoding it in your application. This is more secure and makes it easier to manage your keys.
export GROQ_API_KEY='YOUR_API_KEY'
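As a minimal sketch, an application could read the key once at startup and fail early when it is missing. This assumes Python and the GROQ_API_KEY variable exported above; the helper name `load_api_key` is hypothetical and not part of any Groq SDK.

```python
import os


def load_api_key(env=None):
    """Return the Groq API key from the environment, or raise a clear error."""
    # `env` is injectable so the function can be exercised without touching os.environ
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; export it before running.")
    return key
```

Failing at startup tends to be easier to diagnose than a 401 response from the first API call.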
Core Concepts #
API Endpoint #
The base URL for the Groq API is:
https://api.groq.com/openai/v1/
The API is designed to be compatible with the OpenAI API, which makes it easy to switch between the two.
Authentication #
Authentication is handled via the Authorization header, using a Bearer token with your API key.
Authorization: Bearer $GROQ_API_KEY
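The header above can be attached without an SDK. The sketch below uses only the Python standard library and builds the request without sending it; `build_request` is a hypothetical helper, and the model ID is simply the one used elsewhere in this cheat sheet.

```python
import json
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"


def build_request(path, payload, api_key):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/chat/completions",
    {"model": "llama3-8b-8192", "messages": []},
    "YOUR_API_KEY",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```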
Available Models #
Groq offers a variety of models across different capabilities:
Reasoning Models (with explicit reasoning chains):
- openai/gpt-oss-20b: OpenAI GPT-OSS 20B with reasoning
- openai/gpt-oss-120b: OpenAI GPT-OSS 120B flagship model
- qwen/qwen3-32b: Qwen 3 32B with reasoning
- deepseek-r1-distill-llama-70b: DeepSeek R1 Distill Llama 70B
Vision Models (multimodal image processing):
- llava-v1.5-7b-4096-preview: LLaVA vision model
Audio Models (transcription):
- whisper-large-v3: Whisper Large V3 for audio transcription
Text Models (traditional chat):
- llama3-8b-8192: Llama 3 8B
- llama-3.1-70b-versatile: Llama 3.1 70B
- And many more…
Find the complete list in the Groq documentation.
Chat Completions API #
The most common use case for the Groq API is creating chat completions. This is done by sending a POST request to the /chat/completions endpoint.
Request Parameters #
Here are some of the most important request parameters:
| Parameter | Type | Description |
|---|---|---|
| model | string | The ID of the model to use. |
| messages | array | A list of message objects, representing the conversation history. |
| temperature | number | Controls the randomness of the output. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic. |
| max_completion_tokens | integer | The maximum number of tokens to generate (replaces max_tokens). |
| stream | boolean | If true, the response will be streamed as a series of server-sent events. |
| tool_choice | string or object | Controls which (if any) tool is called by the model. |
| tools | array | A list of tools the model may call. Built-in: code_interpreter, browser_search. |
| response_format | object | An object specifying the format that the model must output. Use { "type": "json_object" } to enable JSON mode. |
| reasoning_effort | string | Controls reasoning level: low, medium, high (GPT-OSS) or none, default (Qwen). |
| reasoning_format | string | How reasoning is presented: raw, parsed, hidden. |
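To show how these parameters fit together in one request body, here is a small sketch that assembles a payload and enables JSON mode. The helper `build_chat_payload` is hypothetical, and its allow-list covers only the parameters in the table above; the real API accepts more.

```python
import json

# Only the optional parameters listed in the table above
ALLOWED_OPTIONS = {
    "temperature", "max_completion_tokens", "stream", "tool_choice",
    "tools", "response_format", "reasoning_effort", "reasoning_format",
}


def build_chat_payload(model, messages, **options):
    """Assemble a /chat/completions request body, dropping options set to None."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"Unsupported parameter(s): {sorted(unknown)}")
    payload = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_chat_payload(
    "llama3-8b-8192",
    [{"role": "user", "content": "Reply with a JSON object listing three colors."}],
    temperature=0.2,                          # focused, more deterministic output
    max_completion_tokens=256,                # cap on generated tokens
    response_format={"type": "json_object"},  # JSON mode
    stream=None,                              # unset, so omitted from the body
)
print(json.dumps(payload, indent=2))
```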
Example Request #
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b-8192",
    "messages": [
      {
        "role": "user",
        "content": "Explain the importance of low latency LLMs"
      }
    ]
  }'
Response Object #
The response object contains the generated message, as well as other information such as the model used, the number of tokens, and the finish reason.
{
  "id": "chatcmpl-f51b2cd2-bef7-417e-964e-a08f0b513c22",
  "object": "chat.completion",
  "created": 1730241104,
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "queue_time": 0.037493756,
    "prompt_tokens": 18,
    "prompt_time": 0.000680594,
    "completion_tokens": 556,
    "completion_time": 0.463333333,
    "total_tokens": 574,
    "total_time": 0.464013927
  },
  "system_fingerprint": "fp_179b0f92c9",
  "x_groq": {
    "id": "req_01jbd6g2qdfw2adyrt2az8hz4w"
  }
}
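The fields most applications need can be pulled out of this object with a few lookups. The sketch below parses a trimmed copy of the sample response above and derives a generation speed from the usage block; the `summarize` helper is hypothetical.

```python
import json

# Trimmed copy of the sample response shown above
raw = """
{
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 556,
    "completion_time": 0.463333333,
    "total_tokens": 574
  }
}
"""


def summarize(response):
    """Extract the reply text, finish reason, and throughput from a response dict."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # completion_time is in seconds, so this is generated tokens per second
        "tokens_per_second": usage["completion_tokens"] / usage["completion_time"],
    }


summary = summarize(json.loads(raw))
print(f"{summary['tokens_per_second']:.0f} tokens/s, finish_reason={summary['finish_reason']}")
```

A finish_reason other than "stop" (for example when the token limit is reached) is usually worth checking before using the content.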
Using the Groq TypeScript/Node.js SDK #
The official Groq TypeScript/Node.js SDK makes it easy to interact with the Groq API in your Node.js applications.
Installation #
npm install groq-sdk
Basic Usage #
import Groq from 'groq-sdk';

// Create a client using the key from the environment
const client = new Groq({
  apiKey: process.env.GROQ_API_KEY,
});

async function main() {
  const chatCompletion = await client.chat.completions.create({
    messages: [
      {
        role: 'user',
        content: 'Explain the importance of low latency LLMs',
      },
    ],
    model: 'llama3-8b-8192',
  });
  // Print the text of the first (and only) choice
  console.log(chatCompletion.choices[0].message.content);
}

main();
Streaming #
To stream the response, set the stream parameter to true.
import Groq from 'groq-sdk';

const client = new Groq();

async function main() {
  const stream = await client.chat.completions.create({
    messages: [
      {
        role: 'user',
        content: 'Explain the importance of low latency LLMs',
      },
    ],
    model: 'llama3-8b-8192',
    stream: true,
  });
  // Each chunk carries an incremental delta; write it as soon as it arrives
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

main();
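The SDK hides the wire format, but a streamed response is a sequence of server-sent events. The following Python sketch shows how the deltas could be reassembled by hand. It assumes the OpenAI-style framing (`data: {json}` lines terminated by `data: [DONE]`), which is inferred here from the API's stated OpenAI compatibility; the sample lines are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield content deltas from an iterable of server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel (assumed, OpenAI-style)
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hypothetical event lines, shaped like OpenAI-compatible stream chunks
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Low "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "latency"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
print(text)
```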
Error Handling #
The SDK will throw an APIError if the API returns a non-success status code.
import Groq from 'groq-sdk';

const client = new Groq();

try {
  await client.chat.completions.create({
    messages: [
      {
        role: 'user',
        content: '...',
      },
    ],
    model: 'llama3-8b-8192',
  });
} catch (err) {
  if (err instanceof Groq.APIError) {
    console.error(err.status); // 400
    console.error(err.name); // BadRequestError
    console.error(err.headers); // {server: 'nginx', ...}
  } else {
    // Not an API error (e.g. a programming bug), so let it propagate
    throw err;
  }
}
Reasoning Models #
Groq’s reasoning models expose explicit reasoning chains for complex problem-solving.
Quick Start with Reasoning #
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "system",
            "content": "You are a genius problem solver!"
        },
        {
            "role": "user",
            "content": "Solve this step by step: What's 15% of 240?"
        }
    ],
    reasoning_effort="medium",
    reasoning_format="parsed",
    max_completion_tokens=1000
)
print(completion.choices[0].message.content)
Reasoning Parameters #
- reasoning_effort: low, medium, high (GPT-OSS) | none, default (Qwen)
- reasoning_format: raw, parsed, hidden
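Because the accepted reasoning_effort values differ by model family, a small client-side check can catch mismatches before a request is sent. This is a sketch only: the value sets are copied from the list above, matching families by model ID prefix is an assumption, and `reasoning_options` is a hypothetical helper.

```python
# Allowed values per model family, taken from the list above.
# Matching by model ID prefix is an assumption made for this sketch.
EFFORT_BY_FAMILY = {
    "openai/gpt-oss": {"low", "medium", "high"},
    "qwen/": {"none", "default"},
}
FORMATS = {"raw", "parsed", "hidden"}


def reasoning_options(model, effort=None, fmt=None):
    """Return validated reasoning keyword arguments for a chat completion call."""
    options = {}
    if effort is not None:
        for prefix, allowed in EFFORT_BY_FAMILY.items():
            if model.startswith(prefix) and effort not in allowed:
                raise ValueError(
                    f"{effort!r} is not valid for {model}; expected one of {sorted(allowed)}"
                )
        options["reasoning_effort"] = effort
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"{fmt!r} is not one of {sorted(FORMATS)}")
        options["reasoning_format"] = fmt
    return options


opts = reasoning_options("openai/gpt-oss-120b", effort="medium", fmt="parsed")
# Usage: client.chat.completions.create(model=..., messages=..., **opts)
```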
Vision & Multimodal #
Process images with LLaVA models:
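import base64

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment


def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("path/to/image.jpg")

completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # The image is passed inline as a base64 data URL
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                },
                {"type": "text", "text": "Describe this image in detail"}
            ]
        }
    ]
)
print(completion.choices[0].message.content)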
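The example above hardcodes image/jpeg in the data URL. A variant that guesses the MIME type from the filename might look like the sketch below; `to_data_url` and `image_message` are hypothetical helpers, and only the standard library is used.

```python
import base64
import mimetypes


def to_data_url(filename, data):
    """Encode raw image bytes as a data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(filename, data, prompt):
    """Build a user message pairing one image with a text prompt."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_url(filename, data)}},
            {"type": "text", "text": prompt},
        ],
    }


# Four placeholder bytes stand in for real file contents here
msg = image_message("photo.png", b"\x89PNG", "Describe this image in detail")
```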
Audio Transcription #
Transcribe audio with Whisper:
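from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Open the file in binary mode and send it with its filename and MIME type
with open("audio.mp3", "rb") as file:
    transcription = client.audio.transcriptions.create(
        file=("audio.mp3", file, "audio/mpeg"),
        model="whisper-large-v3",
        language="en",
        response_format="verbose_json"
    )
print(transcription.text)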
Tool Calling & Function Calling #
Use built-in tools or custom functions:
# Assumes `client = Groq()` as in the earlier examples

# Built-in tools
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "user",
            "content": "Search for the latest Python news and write some code to parse JSON"
        }
    ],
    tools=[
        {"type": "browser_search"},
        {"type": "code_interpreter"}
    ],
    max_completion_tokens=4096
)

# Custom function calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto"
)
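The custom function example stops at the request. When the model decides to call a tool, the application has to run it and send the result back. The sketch below shows that dispatch step on a plain dict. It assumes the OpenAI-style response shape (a tool_calls list whose function.arguments field is a JSON string), which follows from the stated OpenAI compatibility but is not shown in this cheat sheet; the sample message, `REGISTRY`, and the stub `get_weather` are hypothetical.

```python
import json


def get_weather(location):
    """Stand-in implementation; a real version would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names declared in `tools` to local Python callables
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each requested tool call and return the follow-up tool messages."""
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        fn = REGISTRY.get(name)
        output = fn(**args) if fn else {"error": f"unknown tool: {name}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(output),
        })
    return results


# Hypothetical assistant message, shaped like an OpenAI-compatible tool call
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
    }],
}
results = run_tool_calls(message)
```

The returned messages would then be appended to the conversation, after the assistant message, and sent in a second chat completion request so the model can produce its final answer.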
Rate Limits #
The Groq API has rate limits to ensure fair usage. You can find more information about rate limits in the Groq documentation.
Best Practices #
Security & Setup:
- Use environment variables for your API key. This is more secure than hardcoding it in your application.
- Use the official Groq SDKs. They handle many of the complexities of interacting with the API, such as authentication and error handling.
Performance & Reliability:
- Implement retries with exponential backoff. This will help your application recover from transient errors.
- Use max_completion_tokens instead of max_tokens for newer models.
- Leverage streaming for real-time applications and better UX.
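The retry advice above can be sketched as a small wrapper. This is one common formulation (delay doubling per attempt, capped, with random jitter), not a Groq-specified policy; `with_retries` and its parameters are hypothetical, and which errors count as retryable is left to the caller.

```python
import random
import time


def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                 is_retryable=lambda exc: True, sleep=time.sleep):
    """Run `call`, retrying failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors the caller marks as permanent
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 50% extra jitter so concurrent clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay / 2))


# Usage sketch (not executed here):
# result = with_retries(lambda: client.chat.completions.create(...))
```

With the official SDKs it may be worth checking their own retry settings first, so that the two layers do not multiply each other.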
Model Selection:
- Use reasoning models (GPT-OSS, Qwen, DeepSeek) for complex problem-solving.
- Choose vision models (LLaVA) for image analysis tasks.
- Select audio models (Whisper) for transcription needs.
Tool Usage:
- Combine built-in tools (browser_search, code_interpreter) for enhanced capabilities.
- Design clear function schemas for custom tool calling.
- Test reasoning parameters to find optimal reasoning_effort levels.
Monitoring:
- Monitor your usage. Keep an eye on your token usage to avoid unexpected costs.
- Track reasoning token consumption as it can be higher than standard completions.
This cheat sheet should give you a solid foundation for using the Groq API. For more detailed information, please refer to the official Groq documentation.