Getting Started

This guide takes you from zero to your first measured evaluation run. By the end you'll have Releval running locally, a search endpoint configured, a query set executing against it, and NDCG / MAP / Precision metrics on the results.

Releval is self-hosted. The quickest install path is Docker Compose, which is what this guide uses.

Prerequisites

Recent versions of Docker and Docker Compose. Compose v2 syntax is assumed throughout.
A search system to evaluate. Releval supports Elasticsearch, OpenSearch, Solr, Vespa, any HTTP-based search API, and any rendered search results page. See Search Endpoints for the full list.
Roughly 2 GB of free disk for the database volumes after initial setup, plus headroom for ClickHouse if you intend to ingest user behaviour events.

Run with Docker Compose

Save the following as docker-compose.yaml:

services:
  releval:
    image: releval/releval:latest
    container_name: releval
    ports:
      - "8080:8080"
    environment:
      # Must be set to Y to indicate acceptance of the EULA
      - ACCEPT_EULA=Y
      - CONNECTIONSTRINGS__POSTGRES=Host=postgres;Port=5432;Database=releval;Username=postgres;Password=password
      # ClickHouse powers User Behaviour Insights and is optional. To run without
      # UBI, omit this connection string, the clickhouse service, and the
      # clickhouse entry under depends_on: the UBI APIs return 404 and the UI
      # hides the Insights section, while the rest of Releval is unaffected.
      - CONNECTIONSTRINGS__CLICKHOUSE=Host=clickhouse;Port=8123;Username=default;password=;Database=default;Compression=false
      - DATAPROTECTION__APPLICATIONNAME=Releval
      - DATAPROTECTION__KEYSDIRECTORY=/app/keys
      # First-run only. Used to create the initial Admin user. Unset both
      # values once an Admin exists and you've changed the password.
      - RELEVAL_INITIAL_ADMIN_EMAIL=you@example.com
      - RELEVAL_INITIAL_ADMIN_PASSWORD=ChangeMeNow1234!
    volumes:
      - releval-files:/app/files
      - releval-keys:/app/keys
    depends_on:
      postgres:
        condition: service_healthy
      clickhouse:
        condition: service_healthy

  postgres:
    image: postgres:18.3
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=releval
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    shm_size: '2gb'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  clickhouse:
    image: clickhouse/clickhouse-server:25.8.22
    container_name: clickhouse
    hostname: clickhouse
    ports:
      - "8123:8123"
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD-SHELL", "clickhouse-client --query 'SELECT 1'"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  releval-files:
  releval-keys:
  postgres-data:
  clickhouse-data:

Three services back the app:

PostgreSQL stores evaluations, query sets, judgments, and member accounts.
ClickHouse stores high-volume User Behaviour Insights data. It is optional: drop the clickhouse service, its connection string, and the clickhouse entry under depends_on to run without User Behaviour Insights, and the rest of Releval is unaffected.
Releval is the API + UI server. The schema is created (and on later upgrades, migrated) automatically on first startup. An initial Admin user is created from RELEVAL_INITIAL_ADMIN_EMAIL and RELEVAL_INITIAL_ADMIN_PASSWORD if no Admin yet exists; the variables are ignored thereafter.

Bring everything up:

docker compose up -d

The app waits for both databases' healthchecks before it starts. To watch progress, run docker compose ps and confirm every service reaches healthy. Releval is then available at http://localhost:8080.

Releval also exposes its own health probes for orchestration: a liveness probe at /health/live and a readiness probe at /health/ready (the latter also checks the PostgreSQL connection). The container image uses /health/live as its Docker healthcheck.
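
If you script the startup, for example in CI, the readiness probe can be polled until the app is up. The following is a minimal sketch, not part of Releval: it assumes the probe answers with HTTP 200 once ready (the status code is not stated in this guide), and the function names, attempt count, and delay are illustrative.

```python
import time
import urllib.error
import urllib.request


def probe_ready(url="http://localhost:8080/health/ready", timeout=2.0):
    """Return True if the readiness probe answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status: not ready yet.
        return False


def wait_until_ready(probe=probe_ready, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Typical use after `docker compose up -d`:
#     if not wait_until_ready():
#         raise SystemExit("Releval did not become ready in time")
```

The probe and sleep functions are passed in as parameters so the retry loop can be exercised without a running container.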

Sign in

Open http://localhost:8080 and sign in with the email and password you set in RELEVAL_INITIAL_ADMIN_EMAIL and RELEVAL_INITIAL_ADMIN_PASSWORD. The Admin is created on the first startup where no Admin yet exists; on later startups the bootstrap variables are ignored.

If you forget to set the variables, Releval logs a warning on every startup until an Admin exists. To recover, set them, restart, and then unset RELEVAL_INITIAL_ADMIN_PASSWORD.

Caution

Unset RELEVAL_INITIAL_ADMIN_PASSWORD after the Admin is created. Container env vars are visible via docker inspect and /proc/PID/environ. While you're at it, configure an authentication provider (GitHub, Google, OIDC) and invite your team so day-to-day work doesn't rely on the bootstrap Admin. Inviting additional members requires a Team or Enterprise license; the default embedded license is Individual, which permits a single member.

The home page

New members land on a home page that doubles as a setup checklist: it walks you through connecting a search endpoint, defining a query set, creating a query template, and running your first evaluation, ticking items off as you complete them. Once at least one evaluation run exists, the home page becomes a workspace dashboard — recent runs and their status, a per-status run summary, your current usage against the license limits, and the latest metric trend.

The five steps below mirror that checklist.

Your First Evaluation

The shortest path from a fresh install to numbers you can act on is five steps. Each one links to the page that covers it in depth; skim those if you want context, or follow the shortcut version here.

Connect a search endpoint

A search endpoint tells Releval where to send queries. Create one from Endpoints → New with:

A name (anything memorable)
The base URL of your search system
The endpoint type: elasticsearch, opensearch, solr, vespa, api, or page
An authentication method if your system requires it
A candidates mapping describing how to extract result IDs and titles from the response

Use Test Endpoint to send a sample query and inspect the parsed candidates before saving.
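
To make the idea of a candidates mapping concrete, here is a rough sketch of the extraction it describes, applied to an Elasticsearch-style response (hits under hits.hits, with the document ID in _id and the stored fields in _source). This is not Releval's mapping syntax; the function name, the path tuples, and the title field are assumptions for illustration.

```python
def extract_candidates(response, id_path=("_id",), title_path=("_source", "title")):
    """Pull ranked id/title pairs out of an Elasticsearch-style search response."""

    def dig(document, path):
        # Walk nested keys; return None if any key along the path is missing.
        for key in path:
            if not isinstance(document, dict) or key not in document:
                return None
            document = document[key]
        return document

    hits = response.get("hits", {}).get("hits", [])
    return [
        {"rank": rank, "id": dig(hit, id_path), "title": dig(hit, title_path)}
        for rank, hit in enumerate(hits, start=1)
    ]


# Made-up sample response, trimmed to the fields the mapping reads.
sample = {
    "hits": {
        "hits": [
            {"_id": "sku-1", "_score": 7.2, "_source": {"title": "Trail running shoes"}},
            {"_id": "sku-9", "_score": 5.1, "_source": {"title": "Road running shoes"}},
        ]
    }
}
candidates = extract_candidates(sample)
```

A missing title yields None rather than an error, which is the kind of problem Test Endpoint lets you spot before saving.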

Define your queries

A query set is the list of searches you want to score. Create one from Query Sets → New and paste real searches users run (or upload them as JSONL).

Then create a query template: a parameterised request body that embeds each query into the format your endpoint expects (e.g. an Elasticsearch multi_match, a Solr q parameter). Templates are reusable across endpoints, so you can A/B-test ranking changes by swapping templates without touching the queries themselves.
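
Conceptually, a template is a request body with a slot for the query text. The sketch below illustrates that idea for an Elasticsearch multi_match request. The {{query}} marker, the field names, and the render function are hypothetical; Releval's actual template syntax is covered on the query templates page and may differ.

```python
import json

PLACEHOLDER = "{{query}}"  # hypothetical marker, not Releval's documented syntax

TEMPLATE = {
    "query": {
        "multi_match": {
            "query": PLACEHOLDER,
            "fields": ["title^2", "description"],  # assumed field names
        }
    },
    "size": 10,
}


def render(template, query_text):
    """Return a copy of `template` with the placeholder replaced in every string."""
    if isinstance(template, dict):
        return {key: render(value, query_text) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, query_text) for item in template]
    if isinstance(template, str):
        return template.replace(PLACEHOLDER, query_text)
    return template


# One request body per query in the set; the template itself is left unchanged.
queries = ["running shoes", 'the "north" face']
bodies = [render(TEMPLATE, query) for query in queries]
print(json.dumps(bodies[0], indent=2))
```

Substituting into the parsed structure, rather than into raw JSON text, means quotes and other special characters in a query cannot break the request body.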

Create and run the evaluation

An evaluation ties an endpoint, a query set, and a template together. Create one, then start a run; Releval executes every query in parallel, captures the responses, and prepares them for judgment.

Judge the results

Releval can't compute relevance metrics without knowing which results are actually relevant. There are two ways to provide that signal:

Manually: open the run, walk through the candidates, and rate each one. See Judging results.
With an AI Judge: configure a provider (OpenAI, Anthropic, Bedrock, Azure OpenAI, Ollama, or any OpenAI-compatible endpoint) and have an LLM rate the candidates against your prompt template. Useful when query sets are too large to judge by hand.

Pick a scale (binary, graded, or detailed) that matches how nuanced your judgments need to be.

Review the metrics

Once judgments are in, the run shows your chosen metrics (NDCG, MAP, MRR, ERR, Precision, Recall, …) at the run level and per-query. Clone the run to compare it against a tweaked endpoint or template; this is the main loop for measuring ranking changes.
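
For intuition about what these numbers mean, the sketch below computes three of them for a single query from a list of graded judgments in rank order. It uses common textbook formulations (exponential gain for NDCG, average precision normalised by the relevant results that were retrieved); Releval's exact variants and options are not described in this guide and may differ.

```python
import math


def precision_at_k(grades, k, threshold=1):
    """Fraction of the top-k results whose grade is at least `threshold`."""
    return sum(1 for grade in grades[:k] if grade >= threshold) / k


def average_precision(grades, threshold=1):
    """Mean of precision@rank over the ranks that hold a relevant result."""
    hits, total = 0, 0.0
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(grades, k):
    """Discounted cumulative gain with gain 2^grade - 1 and a log2 discount."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades, k):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal else 0.0


# Made-up judgments for one query, top result first (0 = irrelevant, 3 = best).
grades = [3, 0, 2, 1, 0]
print(precision_at_k(grades, 5), average_precision(grades), ndcg_at_k(grades, 5))
```

MAP is the mean of average_precision across all queries in the set. NDCG rewards putting the highest-graded results first, which is why it is the usual choice with graded scales.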

Beyond the UI

Everything you can do in the UI is also available via the REST API, the gRPC API, and an MCP server that exposes the same operations to AI agents. Set up an App Client to authenticate API calls without sharing your member credentials, and you can drive evaluations entirely from CI to gate ranking changes on relevance regressions.
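
The gating logic itself can be small. The sketch below shows one way to compare a candidate run's metrics against a baseline and fail the build on a drop. Fetching the metrics through the API is left out because the endpoints are not described in this guide; the function name, metric keys, values, and tolerance are all made up for illustration.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return {metric: (old, new)} for metrics that dropped by more than `tolerance`.

    A metric missing from the candidate run is also reported, with new=None.
    """
    regressions = {}
    for metric, old in baseline.items():
        new = candidate.get(metric)
        if new is None or old - new > tolerance:
            regressions[metric] = (old, new)
    return regressions


# Illustrative values only; in CI these would come from two evaluation runs.
baseline = {"ndcg@10": 0.71, "map": 0.55, "precision@10": 0.40}
candidate = {"ndcg@10": 0.73, "map": 0.52, "precision@10": 0.40}

regressions = find_regressions(baseline, candidate)
for metric, (old, new) in regressions.items():
    print(f"{metric} regressed: {old} -> {new}")

# A CI step would exit non-zero when anything regressed.
exit_code = 1 if regressions else 0
```

A small tolerance keeps the gate from failing on noise, which matters most when judgments come from an AI Judge and can vary between runs.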

Next Steps

Get production-ready: authentication providers, email, data protection keys, and data storage configuration.
Pick the right scale and metrics for your evaluation methodology.
Capture real user clicks via User Behaviour Insights and use them as implicit signals.
Add teammates from Administration → Members and Roles.
