Judgment Lists

A Judgment List is a pre-existing set of relevance judgments that you upload to Releval. Unlike inline judgments that are made interactively during an evaluation run, judgment lists come from external sources such as:

Click-through logs or analytics
Previous evaluation campaigns
Crowdsourced relevance assessments
Expert annotations

Once uploaded, a list's grades feed straight into the evaluation metrics for matching queries on the same corpus, so you can score a run against judgments you already have instead of re-judging from scratch.

File format

A judgment list is uploaded as a file with one judgment per row. The file extension selects the parser: JSON Lines (.jsonl / .json / .ndjson), CSV (.csv / .tsv), and Parquet (.parquet) are all supported.

Each row describes a single relevance judgment: a query, the document being judged, and the grade. The grade is an integer that must fall within the range of the scale you choose on upload: binary, graded, or detailed. Optionally, a row can also carry a snapshot of the document itself, which is useful when the originating system may not return the same document text later.

JSON Lines example

{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1001", "grade": 4 }
{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1002", "grade": 2 }
{ "query_id": "q2", "query": "winter coat", "doc_id": "sku-2050", "grade": 3 }

CSV example

query_id,query,doc_id,grade
q1,running shoes,sku-1001,4
q1,running shoes,sku-1002,2
q2,winter coat,sku-2050,3

The query_id is only a label you choose for grouping rows within the file. It plays no part in matching grades to evaluation queries; matching is governed by the corpus's match mode, which keys off the query text. When round-tripping judgments exported from a Releval evaluation, a row may also carry explicit qid and/or oid columns to target a specific query object; if they are omitted, Releval derives them from the query text.
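Because out-of-range grades are rejected, it can help to check a JSON Lines file locally before uploading it. The following is a minimal sketch, not Releval's own validation: the function name validate_jsonl, the set of required fields, and the 0 to 4 range used in the usage lines are assumptions made for illustration. Pass whatever bounds the scale you intend to select actually uses.

```python
import json

# Assumed minimum set of fields per row; query_id is treated as an optional label.
REQUIRED_FIELDS = ("query", "doc_id", "grade")


def validate_jsonl(lines, min_grade, max_grade):
    """Return (line_number, problem) tuples for rows that look invalid."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row is not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            problems.append((number, "missing field(s): " + ", ".join(missing)))
            continue
        grade = row["grade"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(grade, bool) or not isinstance(grade, int):
            problems.append((number, "grade is not an integer"))
        elif not min_grade <= grade <= max_grade:
            problems.append((number, f"grade {grade} outside {min_grade}..{max_grade}"))
    return problems


# Usage: the 0..4 bounds are an assumed range, not a documented one.
sample = [
    '{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1001", "grade": 4 }',
    '{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1002", "grade": 7 }',
]
print(validate_jsonl(sample, min_grade=0, max_grade=4))
# prints [(2, 'grade 7 outside 0..4')]
```

In practice you would pass an open file object instead of a list, since iterating over a file yields its lines.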

Uploading a Judgment List

A judgment list belongs to a corpus, so create the corpus first if it doesn't already exist. On upload you also pick the type of the judgments (whether they are explicit ratings, implicit signals derived from behaviour, golden examples, or crowdsourced) and the scale the grades use.

In the UI

1. Navigate to Judgment Lists and click Upload.
2. Select the file to upload.
3. Enter a Name, then choose the Corpus the list belongs to, the Type, and the Scale the grades use.
4. Click Upload.

Using the API

curl -X POST "https://${RELEVAL_HOST}/api/v1/judgment-lists/upload" \
-H "Authorization: Bearer ${TOKEN}" \
-F 'name=My Judgment List' \
-F "corpus_id=${CORPUS_ID}" \
-F 'type=explicit' \
-F 'scale=graded' \
-F 'judgments=@judgments.jsonl'

How judgment lists feed evaluation metrics

An uploaded judgment list contributes its grades to evaluation metrics automatically. There is no attach step: once a list is uploaded against a corpus, any evaluation run on that corpus picks up the grades for every query that matches, and its NDCG, MAP, Precision, and the other metrics are computed from them — exactly as if the grades had been entered by hand.

Matching follows the corpus's match mode: by default, a row matches every evaluation query with the same query text. Two conditions must hold for a grade to count. First, the row's grade must fall within the scale you picked on upload (out-of-range grades are rejected). Second, the document id (doc_id) must equal the candidate id the endpoint returns; otherwise the row has nothing to attach to.
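The default matching behaviour can be pictured with a small sketch. It assumes that "same query text" means exact string equality; the real match mode may normalise text differently, and grades_for_query is a hypothetical helper, not part of Releval.

```python
def grades_for_query(rows, query_text, candidate_ids):
    """Map each returned candidate id to its list grade, or None if the list has no grade for it."""
    # Keep only rows whose query text equals the evaluation query (assumed exact match).
    by_doc = {row["doc_id"]: row["grade"] for row in rows if row["query"] == query_text}
    # A row counts only when its doc_id equals a candidate id the endpoint returned.
    return {candidate_id: by_doc.get(candidate_id) for candidate_id in candidate_ids}


rows = [
    {"query_id": "q1", "query": "running shoes", "doc_id": "sku-1001", "grade": 4},
    {"query_id": "q1", "query": "running shoes", "doc_id": "sku-1002", "grade": 2},
    {"query_id": "q2", "query": "winter coat", "doc_id": "sku-2050", "grade": 3},
]
print(grades_for_query(rows, "running shoes", ["sku-1001", "sku-9999"]))
# prints {'sku-1001': 4, 'sku-9999': None}
```

Here sku-1002 is graded in the list but was not returned by the endpoint, so its grade has nothing to attach to, and sku-9999 was returned but is unjudged.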

When a query also has interactive or AI judgments

Uploaded grades fill gaps; they never override curated judgments. For any one candidate the highest-priority source present wins, in this order:

1. Human: an interactive judgment made on the Evaluate page.
2. AI: a grade from an AI judging run.
3. List: a grade from an uploaded judgment list.

So a candidate a person judged keeps the person's grade, while a candidate no one has judged inherits the uploaded grade. Deleting a list recomputes the affected metrics back to what they were without it.
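The precedence rule above can be sketched as a lookup over sources in priority order. This illustrates the described behaviour and is not Releval's implementation; the source keys "human", "ai", and "list" and the function effective_grade are names chosen for this example.

```python
# Highest priority first, following the order described above.
PRIORITY = ("human", "ai", "list")


def effective_grade(grades_by_source):
    """Return (source, grade) for the highest-priority source present, or None if unjudged."""
    for source in PRIORITY:
        grade = grades_by_source.get(source)
        if grade is not None:  # a grade of 0 is still a valid judgment
            return source, grade
    return None


print(effective_grade({"list": 3, "human": 1}))  # prints ('human', 1)
print(effective_grade({"list": 3}))              # prints ('list', 3)
```

Removing the "list" entry from a candidate that also has a human grade leaves the result unchanged, which matches the statement that uploaded grades only fill gaps.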

Managing Judgment Lists

List All Judgment Lists

curl "https://${RELEVAL_HOST}/api/v1/judgment-lists" \
-H "Authorization: Bearer ${TOKEN}"

You can filter by endpoint type:

curl "https://${RELEVAL_HOST}/api/v1/judgment-lists?endpoint_type=elasticsearch" \
-H "Authorization: Bearer ${TOKEN}"

Delete Judgment Lists

curl -X DELETE "https://${RELEVAL_HOST}/api/v1/judgment-lists?judgment_list_id=${ID}" \
-H "Authorization: Bearer ${TOKEN}"

Training Data Generation

Judgment lists can be used to generate training data for learning-to-rank models. This feature is available for Elasticsearch and OpenSearch endpoints.

curl -X POST "https://${RELEVAL_HOST}/api/v1/features/training" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
--data @- <<EOF
{
  "judgment_list_id": "${JUDGMENT_LIST_ID}",
  "index": "products",
  "query_template_ids": ["${TEMPLATE_ID_1}", "${TEMPLATE_ID_2}"]
}
EOF

This executes each query from the judgment list against the endpoint using the specified templates, extracting features for each judged document. The resulting training data pairs features with relevance grades, suitable for training ranking models.
