Judgment Lists
A Judgment List is a pre-existing set of relevance judgments that you upload to Releval. Unlike inline judgments that are made interactively during an evaluation run, judgment lists come from external sources such as:
- Click-through logs or analytics
- Previous evaluation campaigns
- Crowdsourced relevance assessments
- Expert annotations
Once uploaded, a list's grades feed straight into the evaluation metrics for matching queries on the same corpus, so you can score a run against judgments you already have instead of re-judging from scratch.
File format
A judgment list is uploaded as a file with one judgment per row. The file extension
selects the parser: JSON Lines (.jsonl / .json / .ndjson), CSV (.csv / .tsv),
and Parquet (.parquet) are all supported.
Each row describes a single relevance judgment: a query, the document being judged, and the grade. The grade is an integer that must fall within the range of the scale you choose on upload: binary, graded, or detailed. Optionally, a row can also carry a snapshot of the document itself, which is useful when the originating system may not return the same document text later.
JSON Lines example
{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1001", "grade": 4 }
{ "query_id": "q1", "query": "running shoes", "doc_id": "sku-1002", "grade": 2 }
{ "query_id": "q2", "query": "winter coat", "doc_id": "sku-2050", "grade": 3 }
CSV example
query_id,query,doc_id,grade
q1,running shoes,sku-1001,4
q1,running shoes,sku-1002,2
q2,winter coat,sku-2050,3
The query_id is only a label you choose for grouping rows in the file. It is not how grades are
matched to evaluation queries: matching is governed by the corpus's match mode, which
keys off the query text. When round-tripping judgments exported from a Releval evaluation, a row
may also carry explicit qid and/or oid columns to target a specific query object; if you omit them,
Releval derives them from the query text.
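Before uploading, it can help to check a file locally. The following is a minimal, unofficial sketch in Python that reads JSON Lines text, checks that each row has the fields shown in the examples above, and confirms the grade is an integer within a range you supply. The function name check_rows, the REQUIRED tuple, and the min_grade/max_grade parameters are hypothetical; the actual bounds depend on the scale you pick on upload and are not taken from Releval.

```python
import json

# Assumption: these are the fields every row needs, based on the examples above.
REQUIRED = ("query", "doc_id", "grade")


def check_rows(jsonl_text, max_grade, min_grade=0):
    """Return a list of (line_number, problem) tuples; empty means no problems found.

    min_grade and max_grade are supplied by the caller and are assumptions,
    not values defined by Releval.
    """
    problems = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row is not a JSON object"))
            continue
        missing = [name for name in REQUIRED if name not in row]
        if missing:
            problems.append((number, "missing " + ", ".join(missing)))
            continue
        grade = row["grade"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(grade, bool) or not isinstance(grade, int):
            problems.append((number, "grade is not an integer"))
        elif not min_grade <= grade <= max_grade:
            problems.append((number, f"grade {grade} outside {min_grade}..{max_grade}"))
    return problems
```

Calling check_rows(open("judgments.jsonl").read(), max_grade=4) returns an empty list when every row passes these local checks. Passing them does not guarantee the upload will be accepted.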
Uploading a Judgment List
A judgment list belongs to a corpus, so create the corpus first if it doesn't already exist. On upload you also pick the type of the judgments (whether they are explicit ratings, implicit signals derived from behaviour, golden examples, or crowdsourced) and the scale the grades use.
In the UI
- Navigate to Judgment Lists and click Upload.
- Select the file to upload.
- Enter a Name, choose the Corpus it belongs to, the Type, and the Scale the grades use.
- Click Upload.
Using the API
curl -X POST "https://${RELEVAL_HOST}/api/v1/judgment-lists/upload" \
-H "Authorization: Bearer ${TOKEN}" \
-F 'name=My Judgment List' \
-F "corpus_id=${CORPUS_ID}" \
-F 'type=explicit' \
-F 'scale=graded' \
-F 'judgments=@judgments.jsonl'
How judgment lists feed evaluation metrics
An uploaded judgment list contributes its grades to evaluation metrics automatically. There is no attach step: once a list is uploaded against a corpus, any evaluation run on that corpus picks up the grades for every query that matches, and its NDCG, MAP, Precision, and the other metrics are computed from them — exactly as if the grades had been entered by hand.
Matching follows the corpus's match mode: by default a row matches every
evaluation query with the same query text. Two things have to line up for a grade to count — the
row's grade must fall within the scale you pick on upload (out-of-range
grades are rejected), and the document id (doc_id) must equal the candidate id the endpoint
returns, or the row has nothing to attach to.
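The default matching rule described above can be pictured as a lookup keyed on query text and document id. The Python sketch below is illustrative only: index_rows and grade_for are hypothetical helpers, exact string comparison is an assumption (this page does not say whether Releval normalises case or whitespace), and other match modes are not modelled.

```python
def index_rows(rows):
    """Build a {(query text, doc_id): grade} lookup from judgment rows."""
    return {(row["query"], row["doc_id"]): row["grade"] for row in rows}


def grade_for(lookup, query_text, candidate_id):
    """Return the uploaded grade for a returned candidate, or None if no row matches."""
    return lookup.get((query_text, candidate_id))


rows = [
    {"query_id": "q1", "query": "running shoes", "doc_id": "sku-1001", "grade": 4},
    {"query_id": "q1", "query": "running shoes", "doc_id": "sku-1002", "grade": 2},
]
lookup = index_rows(rows)

# Candidate ids as an endpoint might return them for "running shoes".
candidates = ["sku-1002", "sku-9999", "sku-1001"]
grades = [grade_for(lookup, "running shoes", c) for c in candidates]
print(grades)  # [2, None, 4]
```

The unmatched sku-9999 shows why doc_id has to equal the candidate id: a row whose id never appears in the results contributes nothing, and a candidate with no matching row stays unjudged.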
When a query also has interactive or AI judgments
Uploaded grades fill gaps; they never override curated judgments. For any one candidate the highest-priority source present wins, in this order:
- Human — an interactive judgment made on the Evaluate page.
- AI — a grade from an AI judging run.
- List — a grade from an uploaded judgment list.
So a candidate a person judged keeps the person's grade, while a candidate no one has judged inherits the uploaded grade. Deleting a list recomputes the affected metrics back to what they were without it.
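The precedence order above amounts to taking the first source that has a grade. As a minimal sketch in Python, with the source labels "human", "ai", and "list" and the function effective_grade chosen here for illustration rather than taken from Releval:

```python
# Highest priority first, in the order listed above.
PRIORITY = ("human", "ai", "list")


def effective_grade(grades_by_source):
    """Pick the grade from the highest-priority source that has one.

    grades_by_source maps a source label to a grade. The labels are
    illustrative and are not Releval identifiers.
    """
    for source in PRIORITY:
        if source in grades_by_source:
            return source, grades_by_source[source]
    return None  # nobody has judged this candidate


print(effective_grade({"list": 2, "human": 4}))  # ('human', 4)
print(effective_grade({"list": 2, "ai": 3}))     # ('ai', 3)
print(effective_grade({"list": 2}))              # ('list', 2)
print(effective_grade({}))                       # None
```

In this model, deleting a list corresponds to dropping the "list" entry and resolving again, which is why candidates with a human or AI grade are unaffected.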
Managing Judgment Lists
List All Judgment Lists
curl "https://${RELEVAL_HOST}/api/v1/judgment-lists" \
-H "Authorization: Bearer ${TOKEN}"
You can filter by endpoint type:
curl "https://${RELEVAL_HOST}/api/v1/judgment-lists?endpoint_type=elasticsearch" \
-H "Authorization: Bearer ${TOKEN}"
Delete Judgment Lists
curl -X DELETE "https://${RELEVAL_HOST}/api/v1/judgment-lists?judgment_list_id=${ID}" \
-H "Authorization: Bearer ${TOKEN}"
Training Data Generation
Judgment lists can be used to generate training data for learning-to-rank models. This feature is available for Elasticsearch and OpenSearch endpoints.
curl -X POST "https://${RELEVAL_HOST}/api/v1/features/training" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
--data @- <<EOF
{
"judgment_list_id": "${JUDGMENT_LIST_ID}",
"index": "products",
"query_template_ids": ["${TEMPLATE_ID_1}", "${TEMPLATE_ID_2}"]
}
EOF
This request executes each query from the judgment list against the endpoint using the specified templates and extracts features for each judged document. The resulting training data pairs those features with the relevance grades, ready for training ranking models.
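Conceptually, the pairing step is a join between judged documents and their extracted feature vectors. The Python sketch below shows that idea only. The function build_training_rows, the row layout, and the feature values are invented for illustration, and none of it describes the format Releval actually returns.

```python
def build_training_rows(judgments, features):
    """Join grades with feature vectors on (query, doc_id).

    judgments: iterable of dicts with "query", "doc_id", and "grade".
    features:  dict mapping (query, doc_id) to a list of floats.
    Judged documents with no extracted features are skipped.
    """
    rows = []
    for judgment in judgments:
        key = (judgment["query"], judgment["doc_id"])
        if key in features:
            rows.append({
                "query": judgment["query"],
                "doc_id": judgment["doc_id"],
                "grade": judgment["grade"],
                "features": features[key],
            })
    return rows


judgments = [
    {"query": "running shoes", "doc_id": "sku-1001", "grade": 4},
    {"query": "running shoes", "doc_id": "sku-1002", "grade": 2},
]
# Made-up feature values, standing in for whatever the query templates produce.
features = {("running shoes", "sku-1001"): [12.5, 0.8]}

training = build_training_rows(judgments, features)
print(len(training))  # 1
```

Each resulting row carries one grade and one feature vector, which is the shape most learning-to-rank trainers expect once rows are grouped by query.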