
Model Downloads

Recall uses AI models for semantic search, text generation, and writing assistance. Several models are bundled with the app, while others can be downloaded for enhanced capabilities.

Bundled Models

These models are included with Recall and ready to use immediately:

Model                   Purpose
Llama 3.1 8B Instruct   Text generation for Clerk Edit and Assist
Phi-3.5 Mini Instruct   Lightweight text generation alternative
ModernBERT              Semantic search embeddings
E5-Large-Instruct       Semantic search embeddings (alternative)
all-MiniLM-L6-v2        Lightweight semantic search embeddings

Optional Downloads

ChatQA

ChatQA is an advanced model for query planning and evidence selection. It improves the quality of semantic search results and Clerk responses.

To download:

  1. In the setup wizard, locate the ChatQA section
  2. Click Download next to your preferred variant:
     • Static Q4_K_M – standard quantization
     • iMatrix Q4_K_M – importance-matrix quantization (usually better quality at the same size)
  3. Wait for the download to complete (progress is shown)

Download Location

Models are stored in:

~/Library/Application Support/Clericus/Models/
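
If you want to confirm which model files are present, for instance after a download finishes or before adding a custom model, you can simply list this folder. The snippet below is a minimal Python sketch, not part of Recall; it assumes the default path shown above and prints each file with its size.

from pathlib import Path

# Default Recall models folder from the documentation above.
# Adjust the path if your installation stores models elsewhere.
MODELS_DIR = Path.home() / "Library" / "Application Support" / "Clericus" / "Models"

def list_models(models_dir: Path = MODELS_DIR) -> None:
    """Print every file in the models folder with its size in GB."""
    if not models_dir.exists():
        print(f"Models folder not found: {models_dir}")
        return
    for entry in sorted(models_dir.iterdir()):
        if entry.is_file():
            size_gb = entry.stat().st_size / 1e9
            print(f"{entry.name}  ({size_gb:.2f} GB)")

if __name__ == "__main__":
    list_models()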

Selecting Models

For Semantic Search

  1. Open Settings → Model Settings
  2. Under Semantic Search, choose your preferred embedder
  3. If you change embedders, click Rebuild Index so your documents are re-indexed with the new embedder

For Text Generation

  1. Open Settings → Model Settings
  2. Under Generation, select your preferred generator
  3. Options include bundled models or any custom GGUF models you've added

Custom Models (Advanced)

Recall supports custom GGUF models for both embedding and generation.
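
The first step in both workflows below amounts to copying a .gguf file into the models folder shown under Download Location. The sketch below is a hypothetical helper (not part of Recall) that checks for the 4-byte GGUF magic before copying, which catches renamed or truncated downloads; the example file name is made up.

from pathlib import Path
import shutil

# Default Recall models folder (same path listed under "Download Location").
MODELS_DIR = Path.home() / "Library" / "Application Support" / "Clericus" / "Models"

def add_custom_model(model_path: str, models_dir: Path = MODELS_DIR) -> Path:
    """Copy a GGUF model into the models folder after a basic sanity check."""
    source = Path(model_path).expanduser()
    # Valid GGUF files start with the 4-byte magic "GGUF"; reject anything
    # else early to catch renamed or truncated downloads.
    with source.open("rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{source.name} does not look like a GGUF file")
    models_dir.mkdir(parents=True, exist_ok=True)
    destination = models_dir / source.name
    shutil.copy2(source, destination)
    return destination

# Example (hypothetical file name):
# add_custom_model("~/Downloads/my-embedder-Q4_K_M.gguf")
# Then click Refresh Custom Models in Settings → Model Settings.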

Adding Custom Embedders

  1. Place your GGUF embedding model in the models folder
  2. Open Settings → Model Settings
  3. Click Refresh Custom Models
  4. Select your custom embedder from the dropdown

Adding Custom Generators

  1. Place your GGUF generation model in the models folder
  2. Open Settings → Model Settings
  3. Click Refresh Custom Models
  4. Select your custom generator from the dropdown

Model Status

The status bar in Recall shows the current state of your AI models:

  • Loading — Model is being loaded into memory
  • Ready — Model is loaded and available
  • Error — Model failed to load (check Settings for details)

Performance Considerations

  • Larger models produce better results but need more RAM and processing time; for example, an 8B model quantized to 4 bits occupies roughly 4–5 GB on disk and needs at least that much free memory to load
  • On Macs with Apple Silicon, models run efficiently using Metal acceleration
  • If you experience slowness, try switching to a smaller model variant