Model Downloads¶
Recall uses AI models for semantic search, text generation, and writing assistance. Several models are bundled with the app, while others can be downloaded for enhanced capabilities.
Bundled Models¶
These models are included with Recall and ready to use immediately:
| Model | Purpose |
|---|---|
| Llama 3.1 8B Instruct | Text generation for Clerk Edit and Assist |
| Phi-3.5 Mini Instruct | Lightweight text generation alternative |
| ModernBERT | Semantic search embeddings |
| E5-Large-Instruct | Semantic search embeddings (alternative) |
| all-MiniLM-L6-v2 | Lightweight semantic search embeddings |
Optional Downloads¶
ChatQA¶
ChatQA is an advanced model for query planning and evidence selection. It improves the quality of semantic search results and Clerk responses.
To download:
- In the setup wizard, locate the ChatQA section
- Click Download next to your preferred variant:
    - Static Q4_K_M — Standard quantized version
    - iMatrix Q4_K_M — Quantized version tuned with an importance matrix for better quality at the same size
- Wait for the download to complete (progress is shown)
Download Location¶
Downloaded models are stored in Recall's models folder — the same folder used for custom models (see below).
Selecting Models¶
For Semantic Search¶
- Open Settings → Model Settings
- Under Semantic Search, choose your preferred embedder
- Click Rebuild Index if you change embedders (switching embedders requires re-indexing all documents)
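Rebuilding is required because vectors produced by different embedders are not comparable: they typically differ in dimensionality and always live in different vector spaces. A minimal sketch (the dimensions below are illustrative — all-MiniLM-L6-v2 produces 384-dimensional embeddings, while larger embedders often produce 768 or more):

```python
# Hypothetical embedding sizes for illustration.
doc_vec = [0.1] * 768    # stored in the index by the old embedder
query_vec = [0.1] * 384  # produced by the newly selected embedder

def dot(a, b):
    # Similarity scores only make sense when both vectors come from
    # the same embedder (same dimensionality and same vector space).
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

try:
    dot(doc_vec, query_vec)
except ValueError as e:
    print("cannot compare:", e)  # the old index is unusable with the new embedder
```

Even when two embedders happen to share a dimension count, their vector spaces are unrelated, so every document must be re-embedded with the new model.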
For Text Generation¶
- Open Settings → Model Settings
- Under Generation, select your preferred generator
- Options include bundled models or any custom GGUF models you've added
Custom Models (Advanced)¶
Recall supports custom GGUF models for both embedding and generation.
Adding Custom Embedders¶
- Place your GGUF embedding model in the models folder
- Open Settings → Model Settings
- Click Refresh Custom Models
- Select your custom embedder from the dropdown
Adding Custom Generators¶
- Place your GGUF generation model in the models folder
- Open Settings → Model Settings
- Click Refresh Custom Models
- Select your custom generator from the dropdown
Model Status¶
The status bar in Recall shows the current state of your AI models:
- Loading — Model is being loaded into memory
- Ready — Model is loaded and available
- Error — Model failed to load (check Settings for details)
Performance Considerations¶
- Larger models produce better results but require more RAM and processing time
- On Macs with Apple Silicon, models run efficiently using Metal acceleration
- If you experience slowness, try switching to a smaller model variant
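As a rough rule of thumb (not Recall's exact numbers), a quantized model's memory footprint is its parameter count times the bits per weight, plus runtime overhead. Q4_K_M averages roughly 4.5 bits per weight; the 1 GB overhead figure for the KV cache and runtime buffers below is an assumption:

```python
def estimate_ram_gb(n_params_billion: float,
                    bits_per_weight: float = 4.5,  # ~Q4_K_M average (approximation)
                    overhead_gb: float = 1.0) -> float:
    """Back-of-envelope RAM estimate for a quantized GGUF model.

    The overhead term loosely covers the KV cache and runtime
    buffers; actual usage varies with context length and runtime.
    """
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# An 8B model at ~4.5 bits/weight: 8 * 4.5 / 8 + 1 = 5.5 GB, roughly
print(round(estimate_ram_gb(8.0), 1))
```

By this estimate, Llama 3.1 8B needs several gigabytes of free RAM, while a smaller model such as Phi-3.5 Mini fits comfortably in much less — which is why switching to a smaller variant helps on memory-constrained machines.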