| Term | What it actually means | Why it matters for running locally |
|---|---|---|
| Token | A chunk of text (roughly a word fragment, ~4 characters of English on average) that the model reads and generates one at a time | Generation speed is measured in tokens/second; prompts and replies both consume tokens |
| Parameter | A learned weight inside the model; names like "3b" mean ~3 billion parameters | More parameters generally means better output, but more RAM and slower generation |
| Training | The expensive process of learning weights from huge datasets, done once by the model's creators | You never do this locally; you only download the finished weights |
| Inference | Running a trained model to generate text | This is what your machine does, and it is bound by RAM and CPU/GPU speed |
| Context window | The maximum number of tokens the model can attend to at once, prompt plus reply | Long prompts or chat histories that exceed it get truncated; larger windows also use more RAM |
| Quantization (Q4) | Compressing weights to lower precision, e.g. 4 bits instead of 16 | Shrinks the download and RAM footprint roughly 4x, at a small cost in quality |
| Training cutoff | The date after which the model has seen no data | The model knows nothing about events past that date |

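
Tokens are the unit everything else is priced in: context windows and generation speeds are both counted in tokens. A common rule of thumb (a heuristic assumption, not exact — every model's tokenizer differs) is roughly four characters of English per token:

```python
def rough_token_count(text: str) -> int:
    """Estimate token count with the ~4 characters/token rule of thumb.

    Heuristic only; the model's actual tokenizer will give a
    somewhat different number.
    """
    return max(1, len(text) // 4)

# ~10,000 characters of text is on the order of 2,500 tokens
print(rough_token_count("a" * 10000))
```

This is handy for sanity-checking whether a prompt will fit in a model's context window before you send it.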
| Free RAM | Recommended model | Size |
|---|---|---|
| 4 GB | gemma2:2b | ~1.6 GB |
| 6 GB | llama3.2:3b | ~2.0 GB |
| 8 GB | llama3.2:3b | ~2.0 GB |
| 12+ GB | gemma2:9b | ~5.4 GB |
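
The sizes in the table follow from simple arithmetic: at 4-bit quantization each parameter costs half a byte. A back-of-envelope sketch (the on-disk files run somewhat larger because some tensors are kept at higher precision and the file carries metadata):

```python
def q4_size_gb(params_billions: float) -> float:
    """Rough weight size at 4 bits per parameter (0.5 bytes each)."""
    return params_billions * 0.5

# llama3.2:3b -> ~1.5 GB of raw Q4 weights; the ~2.0 GB download
# also includes higher-precision tensors and metadata.
print(q4_size_gb(3))
```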
```bash
free -h
```

```bash
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.2:3b",
    "prompt": "your prompt here",
    "stream": false
  }' | jq -r '.response'
```

```bash
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2:3b",
    "stream": false,
    "messages": [
      {"role":"system","content":"..."},
      {"role":"user","content":"..."}
    ]
  }'
```
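
The same generate call works from any language. A minimal Python sketch using only the standard library — the parse step mirrors the `jq -r '.response'` filter, and the sample JSON at the bottom is an illustrative made-up reply, not real server output:

```python
import json
from urllib import request

def build_payload(model: str, prompt: str) -> bytes:
    """JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def extract_response(body: str) -> str:
    """Equivalent of `jq -r '.response'` on the API reply."""
    return json.loads(body)["response"]

def generate(prompt: str, model: str = "llama3.2:3b") -> str:
    """POST to a local Ollama server and return the generated text."""
    req = request.Request(
        "http://localhost:11434/api/generate",
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return extract_response(resp.read().decode())

# The parsing step works without a running server; this reply is made up:
print(extract_response('{"model": "llama3.2:3b", "response": "Hello!", "done": true}'))
```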