# Docker Compose

To facilitate, speed up, and standardize the Cat's user experience, the Cat lives inside a Docker container. This allows for maximum power, flexibility, and portability, but requires a basic understanding of how Docker works.
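If you are new to Docker, you can quickly verify that both the engine and the Compose plugin are available before going on:

```bash
# Both commands should print a version if Docker is installed correctly
docker --version
docker compose version
```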
## Cat standalone
```yaml
services:

  cheshire-cat-core:
    image: ghcr.io/cheshire-cat-ai/core:latest
    container_name: cheshire_cat_core
    ports:
      - 1865:80
    volumes:
      - ./static:/app/cat/static
      - ./plugins:/app/cat/plugins
      - ./data:/app/cat/data
```
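With this `compose.yml` in your project folder, starting the Cat is a single command:

```bash
# Start the container in the background; the admin UI is then reachable
# at http://localhost:1865/admin (container port 80 is published as 1865)
docker compose up -d
```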
## Cat + Ollama
The Cat is model agnostic, meaning you can attach your preferred LLM and embedder model/provider. The Cat supports the most used ones out of the box, and you can add more models/providers via plugins. Here is a list of the main ones:
- OpenAI and Azure OpenAI
- Cohere
- Ollama
- HuggingFace TextInference API (LLM model only)
- Google Gemini
- Qdrant FastEmbed (Embedder model only)
- (custom via plugin)
Let's see a full local setup for Cat + Ollama.

To make this work properly, be sure your Docker engine can see the GPU via NVIDIA Docker; a quick check is shown below. As an alternative, you can install Ollama directly on your machine and make the Cat aware of it in the Ollama LLM settings, inserting your local network IP or using `host.docker.internal`.
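A quick way to verify GPU visibility (the NVIDIA Container Toolkit mounts `nvidia-smi` into the container for you):

```bash
# If the toolkit is set up correctly, this prints your GPU table
docker run --rm --gpus all ubuntu nvidia-smi
```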
Example `compose.yml`:
```yaml
services:

  cheshire-cat-core:
    image: ghcr.io/cheshire-cat-ai/core:latest
    container_name: cheshire_cat_core
    depends_on:
      - ollama
    ports:
      - 1865:80
    volumes:
      - ./static:/app/cat/static
      - ./plugins:/app/cat/plugins
      - ./data:/app/cat/data
    restart: unless-stopped

  ollama:
    container_name: ollama_cat
    image: ollama/ollama:latest
    volumes:
      - ./ollama:/root/.ollama
    expose:
      - 11434
    environment:
      - gpus=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
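Once the stack is up, Ollama still needs a model to serve. A minimal sketch, assuming the `llama3` model (any model from the Ollama library works):

```bash
# Pull a model inside the Ollama container (the model name is an example)
docker exec -it ollama_cat ollama pull llama3
```

Then, in the Cat's Ollama LLM settings, set the base URL to `http://ollama_cat:11434`: containers on the same Compose network resolve each other by name.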
For a solid local embedder, choose one from the FastEmbed list. The Cat will download the embedder into `cat/data` and use it to embed text.
Congratulations, you are now free from commercial LLM providers.
## Cat + Qdrant
By default the Core uses an embedded version of Qdrant, based on SQLite.
It is highly recommended to connect the Cat to a full Qdrant instance to increase performance and scalability!
```yaml
services:

  cheshire-cat-core:
    image: ghcr.io/cheshire-cat-ai/core:latest
    container_name: cheshire_cat_core
    depends_on:
      - cheshire-cat-vector-memory
    env_file:
      - .env
    ports:
      - 1865:80
    volumes:
      - ./static:/app/cat/static
      - ./plugins:/app/cat/plugins
      - ./data:/app/cat/data
    restart: unless-stopped

  cheshire-cat-vector-memory:
    image: qdrant/qdrant:latest
    container_name: cheshire_cat_vector_memory
    expose:
      - 6333
    volumes:
      - ./long_term_memory/vector:/qdrant/storage
    restart: unless-stopped
```
Add these environment variables to your `.env` file:
```
# Qdrant server
CCAT_QDRANT_HOST=cheshire_cat_vector_memory # <url of the cluster>
CCAT_QDRANT_PORT=6333 # <port of the cluster, usually 6333>
CCAT_QDRANT_API_KEY="" # optional <api-key>
```
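A quick way to confirm the Cat reaches Qdrant (service names come from the compose file above):

```bash
# Recreate the containers so the new .env values are picked up
docker compose up -d --force-recreate

# Watch the core logs while it connects to the vector database
docker compose logs -f cheshire-cat-core
```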
## Cat + Reverse proxy

### Cat with Caddy (automatic HTTPS)

This is your `compose.yml`:
```yaml
services:

  cat:
    image: ghcr.io/cheshire-cat-ai/core:latest
    container_name: cheshire_cat_core
    env_file:
      - .env
    expose:
      - 80
    volumes:
      - ./cat/static:/app/cat/static
      - ./cat/plugins:/app/cat/plugins
      - ./cat/data:/app/cat/data

  caddy:
    image: caddy:latest
    depends_on:
      - cat
    ports:
      - 80:80
      - 443:443
      - 443:443/udp
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile
      - ./caddy/data:/data
      - ./caddy/config:/config
    restart: unless-stopped
```
Add this environment variable to your `.env` file, so the Cat knows it is served behind an HTTPS proxy (assuming your core version exposes this flag):
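```
# Assumption: this flag tells the core it runs behind an HTTPS reverse proxy
CCAT_HTTPS_PROXY_MODE=true
```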
And in `caddy/Caddyfile`:
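A minimal sketch, proxying local HTTPS traffic to the `cat` service defined above:

```
localhost {
    # Forward every request to the Cat container on its internal port 80
    reverse_proxy cat:80
}
```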
Now you can reach your cat at `https://localhost`!
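To verify from the command line (Caddy signs `localhost` with its own local CA, which your system may not trust yet, hence `-k`):

```bash
# Skip certificate verification unless Caddy's local root CA is installed
curl -k https://localhost/
```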