Model Downloads¶
Recall uses AI models for semantic search, text generation, and writing assistance. Several models are bundled with the app, while others can be downloaded for enhanced capabilities.
Bundled Models¶
These models are included with Recall and ready to use immediately:
| Model | Purpose |
|---|---|
| Llama 3.1 8B Instruct | Text generation for Clerk Edit and Assist |
| Phi-3.5 Mini Instruct | Lightweight text generation alternative |
| ModernBERT | Semantic search embeddings |
| E5-Large-Instruct | Semantic search embeddings (alternative) |
| all-MiniLM-L6-v2 | Lightweight semantic search embeddings |
Optional Downloads¶
ChatQA¶
ChatQA is an advanced model for query planning and evidence selection. It improves the quality of semantic search results and Clerk responses.
To download:
- In the setup wizard, locate the ChatQA section
- Click Download next to your preferred variant:
    - Static Q4_K_M — Standard quantized version
    - iMatrix Q4_K_M — Quantized version tuned with an importance matrix for better quality at the same size
- Wait for the download to complete (progress is shown)
Download Location¶
Downloaded models are stored in Recall's models folder — the same folder used for custom models (see below).
Selecting Models¶
For Semantic Search¶
- Open Settings → Model Settings
- Under Semantic Search, choose your preferred embedder
- Click Rebuild Index if you change embedders (switching embedders requires re-indexing all documents)
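Rebuilding is required because vectors produced by different embedders are not comparable: they typically differ in dimensionality and always live in different vector spaces. A minimal sketch (the dimensions below are illustrative — all-MiniLM-L6-v2 produces 384-dimensional embeddings, while larger embedders often produce 768 or more):

```python
# Hypothetical embedding sizes for illustration.
doc_vec = [0.1] * 768    # stored in the index by the old embedder
query_vec = [0.1] * 384  # produced by the newly selected embedder

def dot(a, b):
    # Similarity scores only make sense when both vectors come from
    # the same embedder (same dimensionality and same vector space).
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

try:
    dot(doc_vec, query_vec)
except ValueError as e:
    print("cannot compare:", e)  # the old index is unusable with the new embedder
```

Even when two embedders happen to share a dimension count, their vector spaces are unrelated, so every document must be re-embedded with the new model.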
For Text Generation¶
- Open Settings → Model Settings
- Under Generation, select your preferred generator
- Options include bundled models or any custom GGUF models you've added
Custom Models (Advanced)¶
Recall supports custom GGUF models for both embedding and generation.
Adding Custom Embedders¶
- Place your GGUF embedding model in the models folder
- Open Settings → Model Settings
- Click Refresh Custom Models
- Select your custom embedder from the dropdown
Adding Custom Generators¶
- Place your GGUF generation model in the models folder
- Open Settings → Model Settings
- Click Refresh Custom Models
- Select your custom generator from the dropdown
Model Status¶
The status bar in Recall shows the current state of your AI models:
- Loading — Model is being loaded into memory
- Ready — Model is loaded and available
- Error — Model failed to load (check Settings for details)
Performance Considerations¶
- Larger models produce better results but require more RAM and processing time
- On Macs with Apple Silicon, models run efficiently using Metal acceleration
- If you experience slowness, try switching to a smaller model variant
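As a rough rule of thumb (not Recall's exact numbers), a quantized model's memory footprint is its parameter count times the bits per weight, plus runtime overhead. Q4_K_M averages roughly 4.5 bits per weight; the 1 GB overhead figure for the KV cache and runtime buffers below is an assumption:

```python
def estimate_ram_gb(n_params_billion: float,
                    bits_per_weight: float = 4.5,  # ~Q4_K_M average (approximation)
                    overhead_gb: float = 1.0) -> float:
    """Back-of-envelope RAM estimate for a quantized GGUF model.

    The overhead term loosely covers the KV cache and runtime
    buffers; actual usage varies with context length and runtime.
    """
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# An 8B model at ~4.5 bits/weight: 8 * 4.5 / 8 + 1 = 5.5 GB, roughly
print(round(estimate_ram_gb(8.0), 1))
```

By this estimate, Llama 3.1 8B needs several gigabytes of free RAM, while a smaller model such as Phi-3.5 Mini fits comfortably in much less — which is why switching to a smaller variant helps on memory-constrained machines.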