Getting Started
This guide takes you from zero to your first measured evaluation run. By the end you'll have Releval running locally, a search endpoint configured, a query set executing against it, and NDCG / MAP / Precision metrics on the results.
Releval is self-hosted. The quickest install path is Docker Compose, which is what this guide uses.
Prerequisites
- Docker and Docker Compose, recent versions; Compose v2 syntax is assumed throughout
- A search system to evaluate. Releval supports Elasticsearch, OpenSearch, Solr, Vespa, any HTTP-based search API, and any rendered search results page. See Search Endpoints for the full list.
- Roughly 2 GB of free disk for the database volumes after initial setup, plus headroom for ClickHouse if you intend to ingest user behaviour events.
Run with Docker Compose
Save the following as docker-compose.yaml:
services:
  releval:
    image: releval/releval:latest
    container_name: releval
    ports:
      - "8080:8080"
    environment:
      # Must be set to Y to indicate acceptance of the EULA
      - ACCEPT_EULA=Y
      - CONNECTIONSTRINGS__POSTGRES=Host=postgres;Port=5432;Database=releval;Username=postgres;Password=password
      # ClickHouse powers User Behaviour Insights and is optional. Omit this
      # connection string (and the clickhouse service) to run without UBI: its
      # APIs return 404 and the UI hides the Insights section, while the rest of
      # Releval is unaffected.
      - CONNECTIONSTRINGS__CLICKHOUSE=Host=clickhouse;Port=8123;Username=default;password=;Database=default;Compression=false
      - DATAPROTECTION__APPLICATIONNAME=Releval
      - DATAPROTECTION__KEYSDIRECTORY=/app/keys
      # First-run only. Used to create the initial Admin user. Unset both
      # values once an Admin exists and you've changed the password.
      - RELEVAL_INITIAL_ADMIN_EMAIL=you@example.com
      - RELEVAL_INITIAL_ADMIN_PASSWORD=ChangeMeNow1234!
    volumes:
      - releval-files:/app/files
      - releval-keys:/app/keys
    depends_on:
      postgres:
        condition: service_healthy
      clickhouse:
        condition: service_healthy

  postgres:
    image: postgres:18.3
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=releval
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    shm_size: '2gb'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  clickhouse:
    image: clickhouse/clickhouse-server:25.8.22
    container_name: clickhouse
    hostname: clickhouse
    ports:
      - "8123:8123"
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD-SHELL", "clickhouse-client --query 'SELECT 1'"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  releval-files:
  releval-keys:
  postgres-data:
  clickhouse-data:
The stack consists of three services:
- PostgreSQL stores evaluations, query sets, judgments, and member accounts.
- ClickHouse stores high-volume user behaviour insights data. It is optional: drop the clickhouse service, its depends_on entry, and its connection string to run without User Behaviour Insights, and the rest of Releval is unaffected.
- Releval is the API + UI server. The schema is created automatically on first startup and migrated automatically on later upgrades. An initial Admin user is created from RELEVAL_INITIAL_ADMIN_EMAIL and RELEVAL_INITIAL_ADMIN_PASSWORD if no Admin yet exists; the variables are ignored thereafter.
Bring everything up:
docker compose up -d
The app waits for both databases' healthchecks before it starts. To watch progress, run
docker compose ps and confirm every service reaches healthy. Releval is then available
at http://localhost:8080.
Releval also exposes its own health probes for orchestration: a liveness probe at
/health/live and a readiness probe at /health/ready (the latter also checks the
PostgreSQL connection). The container image uses /health/live as its Docker healthcheck.
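For scripted setups, a small poller can wait on the readiness probe before moving on. The following is a minimal sketch, assuming the probe answers with HTTP 200 once Releval is ready (a common convention, not confirmed here); the function names and retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all land here.
        return False


def wait_until_ready(url, attempts=30, delay=2.0, check=probe, sleep=time.sleep):
    """Poll `url` until `check` succeeds; return the attempt count or None."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

With the compose file from this guide, `wait_until_ready("http://localhost:8080/health/ready")` would poll the readiness probe for up to about a minute. The `check` and `sleep` parameters exist only so the loop can be exercised without a running server.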
Sign in
Use the email and password you set in RELEVAL_INITIAL_ADMIN_EMAIL and
RELEVAL_INITIAL_ADMIN_PASSWORD. The Admin is created on the first startup where no
Admin yet exists; on later startups the bootstrap variables are ignored.
If you forget to set the variables, Releval logs a warning on every startup until an
Admin exists. Set them and restart.
Unset RELEVAL_INITIAL_ADMIN_PASSWORD once the Admin is created: container env vars
are visible via docker inspect and /proc/PID/environ. While you're at it, configure
an authentication provider (GitHub, Google, OIDC) and invite your team so day-to-day
work doesn't rely on the bootstrap Admin. Inviting additional members requires a Team
or Enterprise license; the default embedded license is Individual, which permits a
single member.
The home page
New members land on a home page that doubles as a setup checklist: it walks you through connecting a search endpoint, defining a query set, creating a query template, and running your first evaluation, ticking items off as you complete them. Once at least one evaluation run exists, the home page becomes a workspace dashboard — recent runs and their status, a per-status run summary, your current usage against the license limits, and the latest metric trend.
The five steps below mirror that checklist.
Your First Evaluation
The shortest path from a fresh install to numbers you can act on is five steps. Each one links to the page that covers it in depth; skim those if you want context, or follow the shortcut version here.
Connect a search endpoint
A search endpoint tells Releval where to send queries. Create one from Endpoints → New with:
- A name (anything memorable)
- The base URL of your search system
- The endpoint type: elasticsearch, opensearch, solr, vespa, api, or page
- An authentication method if your system requires it
- A candidates mapping describing how to extract result IDs and titles from the response
Use Test Endpoint to send a sample query and inspect the parsed candidates before saving.
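To make the idea of a candidates mapping concrete, here is a rough Python sketch of what such a mapping has to accomplish for an Elasticsearch-style response: walk hits.hits, take _id as the result ID, and read a title from _source. This is not Releval's mapping syntax; the function name, the title field name, and the sample response are assumptions for illustration.

```python
from typing import Any


def extract_candidates(response: dict[str, Any], title_field: str = "title") -> list[dict[str, Any]]:
    """Pull rank, ID, and title out of an Elasticsearch-style search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        {
            "rank": rank,
            "id": hit["_id"],
            # A missing title becomes None rather than raising.
            "title": hit.get("_source", {}).get(title_field),
        }
        for rank, hit in enumerate(hits, start=1)
    ]


# Hypothetical response, trimmed to the fields the extraction reads.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "doc-7", "_score": 3.2, "_source": {"title": "Running shoes"}},
            {"_id": "doc-2", "_score": 1.9, "_source": {"title": "Trail shoes"}},
        ]
    }
}

candidates = extract_candidates(sample_response)
```

If the parsed candidates shown by Test Endpoint are empty or lack titles, the mapping is probably pointing at the wrong path in the response.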
Define your queries
A query set is the list of searches you want to score. Create one from Query Sets → New and paste real searches users run (or upload them as JSONL).
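If your searches come from a log export, a short script can turn them into a JSONL file for upload. The sketch below assumes one JSON object per line with a "query" field; that field name is a guess, so check the Query Sets page for the schema Releval actually expects.

```python
import json


def to_jsonl(queries: list[str], field: str = "query") -> str:
    """Normalise whitespace, drop blanks and case-insensitive duplicates, emit JSONL."""
    seen: set[str] = set()
    lines: list[str] = []
    for raw in queries:
        text = " ".join(raw.split())  # collapse runs of whitespace
        key = text.lower()
        if not text or key in seen:
            continue
        seen.add(key)
        # ensure_ascii=False keeps non-ASCII queries readable in the file.
        lines.append(json.dumps({field: text}, ensure_ascii=False))
    return "\n".join(lines) + "\n" if lines else ""


jsonl = to_jsonl(["red shoes", "  Red   Shoes ", "", "café table"])
```

Deduplicating case-insensitively is a design choice; skip it if your search system treats differently cased queries differently.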
Then create a query template: a parameterised request body that
embeds each query into the format your endpoint expects (e.g. an Elasticsearch
multi_match, a Solr q parameter). Templates are reusable across endpoints, so you can
A/B-test ranking changes by swapping templates without touching the queries themselves.
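The idea behind a template can be sketched in Python as follows. The `$query` and `$size` placeholders use Python's `string.Template` and are not Releval's placeholder syntax, which this guide does not show; the field names and boosts are likewise assumptions. The request bodies themselves are ordinary Elasticsearch multi_match and match queries.

```python
import json
from string import Template

# Two hypothetical templates to compare: boosted multi-field vs. title-only.
MULTI_MATCH = Template(
    '{"query": {"multi_match": {"query": $query, "fields": ["title^2", "body"]}}, "size": $size}'
)
TITLE_ONLY = Template('{"query": {"match": {"title": $query}}, "size": $size}')


def render(template: Template, query: str, size: int = 10) -> dict:
    """Embed one query into a template and return the request body as a dict."""
    # json.dumps quotes and escapes the query so the result stays valid JSON.
    return json.loads(template.substitute(query=json.dumps(query), size=size))


queries = ["red shoes", 'say "hi"']
variant_a = [render(MULTI_MATCH, q) for q in queries]
variant_b = [render(TITLE_ONLY, q) for q in queries]
```

Both variants are built from the same list of queries, so any metric difference between them comes from the template alone.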
Create and run the evaluation
An evaluation ties an endpoint, a query set, and a template together. Create one, then start a run; Releval executes every query in parallel, captures the responses, and prepares them for judgment.
Judge the results
Releval can't compute relevance metrics without knowing which results are actually relevant. There are two ways to provide that signal:
- Manually: open the run, walk through the candidates, and rate each one. See Judging results.
- With an AI Judge: configure a provider (OpenAI, Anthropic, Bedrock, Azure OpenAI, Ollama, or any OpenAI-compatible endpoint) and have an LLM rate the candidates against your prompt template. Useful when query sets are too large to judge by hand.
Pick a scale (binary, graded, or detailed) that matches
how nuanced your judgments need to be.
Review the metrics
Once judgments are in, the run shows your chosen metrics (NDCG, MAP, MRR, ERR, Precision, Recall, …) at the run level and per-query. Clone the run to compare it against a tweaked endpoint or template; this is the main loop for measuring ranking changes.
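Releval computes these metrics for you; the sketch below only illustrates what the numbers mean. It uses common textbook definitions with two simplifying assumptions: gain is the raw grade (not 2^grade - 1), and both average precision and the ideal ranking are computed from the judged results in the list rather than from every relevant document in the corpus. Releval's exact formulas and cutoffs may differ.

```python
import math


def precision_at_k(grades: list[int], k: int) -> float:
    """Fraction of the top k results with a grade above zero."""
    return sum(1 for g in grades[:k] if g > 0) / k


def average_precision(grades: list[int]) -> float:
    """Mean of precision values taken at each relevant result's rank."""
    hits, total = 0, 0.0
    for rank, grade in enumerate(grades, start=1):
        if grade > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(grades: list[int], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades[:k], start=1))


def ndcg(grades: list[int], k: int) -> float:
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal else 0.0


# Hypothetical graded judgments (0-3), listed in ranked order for each query.
judged = {"q1": [3, 0, 2, 1], "q2": [0, 1, 0, 0]}

per_query = {q: {"p@3": precision_at_k(g, 3), "ndcg@4": ndcg(g, 4)} for q, g in judged.items()}
mean_ap = sum(average_precision(g) for g in judged.values()) / len(judged)
```

In this example q2 scores lower on every metric because its only relevant result sits at rank 2 instead of rank 1; per-query breakdowns like this are usually where to start when a run-level number moves.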
Beyond the UI
Everything you can do in the UI is also available via the REST API, the gRPC API, and an MCP server that exposes the same operations to AI agents. Set up an App Client to authenticate API calls without sharing your member credentials, and you can drive evaluations entirely from CI to gate ranking changes on relevance regressions.
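A CI gate ultimately reduces to comparing a candidate run's metrics against a baseline and failing the build on a drop. The sketch below shows only that comparison; fetching the metrics from Releval's API is deliberately left out because the endpoints are not described in this guide, and the metric names, dict shape, and tolerance are assumptions.

```python
def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.01) -> dict[str, tuple]:
    """Return every baseline metric that is missing or dropped by more than `tolerance`."""
    regressions = {}
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None or base - new > tolerance:
            regressions[name] = (base, new)
    return regressions


def gate(baseline: dict[str, float], candidate: dict[str, float]) -> int:
    """Print any regressions and return a process exit code (0 = pass, 1 = fail)."""
    regressions = find_regressions(baseline, candidate)
    for name, (base, new) in sorted(regressions.items()):
        print(f"REGRESSION {name}: {base} -> {new}")
    return 1 if regressions else 0


# Hypothetical run-level metrics for a baseline run and a candidate run.
exit_code = gate({"ndcg": 0.72, "map": 0.60}, {"ndcg": 0.65, "map": 0.61})
```

In a pipeline, passing the returned code to `sys.exit` fails the job. A small tolerance helps keep judgment noise from blocking merges.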
Next Steps
- Get production-ready: authentication providers, email, data protection keys, and data storage configuration.
- Pick the right scale and metrics for your evaluation methodology.
- Capture real user clicks via User Behaviour Insights and use them as implicit signals.
- Add teammates from Administration → Members and Roles.