1. Questions
2. Rubrics
3. Review & Grading
4. Bot URL
5. Tools Config
6. Review & Run
Step 1: Add Test Questions
Enter questions to test your chatbot
The CSV can contain questions only, or questions plus answers; the questions are picked out automatically.
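The CSV handling described above could look roughly like the following sketch. It is an illustration only, not the tool's actual parser: the function name `load_questions`, the accepted header names ("question", "answer", "expected answer"), and the fallback to the first two columns when no header is recognized are all assumptions.

```python
import csv
import io


def load_questions(csv_text):
    """Return (question, answer_or_None) pairs from CSV text (hypothetical helper)."""
    # Skip rows that are completely blank.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if any(c.strip() for c in r)]
    if not rows:
        return []

    header = [c.strip().lower() for c in rows[0]]
    if "question" in header:
        # Assumption: a header row is recognized only if it names a "question" column.
        q_idx = header.index("question")
        a_idx = next(
            (i for i, h in enumerate(header) if h in ("answer", "expected answer")),
            None,
        )
        rows = rows[1:]
    else:
        # Assumption: without a header, column 0 is the question and column 1 the answer.
        q_idx, a_idx = 0, 1

    pairs = []
    for row in rows:
        if q_idx >= len(row) or not row[q_idx].strip():
            continue  # no usable question in this row
        answer = ""
        if a_idx is not None and a_idx < len(row):
            answer = row[a_idx].strip()
        pairs.append((row[q_idx].strip(), answer or None))
    return pairs
```

A questions-only file yields pairs whose answer is `None`, so the same structure can feed both the evaluation steps and any later answer-comparison step.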
Step 2: Select Evaluation Rubrics
Choose how to evaluate responses
Core Quality
Quality & Style
Safety & Compliance
Conversation Flow
0 rubrics selected
Need something specific?
If the rubrics above are not sufficient, define your own criteria below.
Tip: You can combine predefined rubrics with custom ones.
No custom rubrics yet.
Step 3: Review & Customize Rubrics
Fine-tune your evaluation criteria and select grading strictness
Step 4: Chatbot URL
Enter your chatbot's URL
Step 5: Configure Tools & Settings
Select enabled tools and preferences
Step 6: Review & Run Evaluation
Review your configuration before running
Questions
Rubrics
Bot URL
Tools
Preview Evaluation Prompt
See exactly what criteria and examples will be sent to the AI judge
No Results Yet
Run an evaluation to see results here
Hallucination Inspector
Evaluate your chatbot for hallucinations by comparing answers against expected truth
This will auto-fill if you entered a URL in the Eval Workshop
Strict: Minor = -15pts, Major = -35pts, Critical = -65pts
Balanced: Minor = -10pts, Major = -25pts, Critical = -50pts
Lenient: Minor = -5pts, Major = -15pts, Critical = -30pts
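To make the strictness levels concrete, here is a minimal sketch of how the listed penalties might combine into a score. The penalty values come from the list above; the starting score of 100, the additive combination of penalties, the floor at 0, and the names `PENALTIES` and `score_response` are assumptions for illustration, not the tool's documented behavior.

```python
# Penalty points per issue severity, taken from the strictness descriptions.
PENALTIES = {
    "strict": {"minor": 15, "major": 35, "critical": 65},
    "balanced": {"minor": 10, "major": 25, "critical": 50},
    "lenient": {"minor": 5, "major": 15, "critical": 30},
}


def score_response(issues, strictness="balanced", start=100):
    """Subtract one penalty per issue from `start` (assumed 100), never going below 0."""
    table = PENALTIES[strictness]
    total = start - sum(table[severity] for severity in issues)
    return max(0, total)  # assumption: scores are clamped at zero


# One minor and one major issue under each level:
print(score_response(["minor", "major"], "strict"))    # 50
print(score_response(["minor", "major"], "balanced"))  # 65
print(score_response(["minor", "major"], "lenient"))   # 80
```

Under these assumptions the same pair of issues costs 50 points on Strict but only 20 on Lenient, which is the practical difference between the three levels.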
| Question | Expected Answer |
|---|---|
| No Q/A pairs yet. Upload a CSV or add pairs manually. | |
GEPA Optimizer
Optimize prompts for smaller models using GEPA
These questions will be used to generate a dataset for optimization (minimum 3 recommended)
Select the small/efficient model you want to optimize the prompt for
Maximum number of optimization iterations (default: 5)
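The optimizer settings above can be summarized as a small configuration object. This is a sketch under stated assumptions: `OptimizerConfig`, its field names, and `check_config` are hypothetical and do not correspond to any real GEPA API; only the default of 5 iterations and the recommended minimum of 3 questions come from the text.

```python
from dataclasses import dataclass

RECOMMENDED_MIN_QUESTIONS = 3  # from the "minimum 3 recommended" hint


@dataclass
class OptimizerConfig:
    """Hypothetical container for the optimizer form fields."""
    questions: list
    target_model: str          # the small/efficient model to optimize the prompt for
    max_iterations: int = 5    # default stated in the form


def check_config(config):
    """Return a list of human-readable warnings; an empty list means no concerns."""
    warnings = []
    if len(config.questions) < RECOMMENDED_MIN_QUESTIONS:
        warnings.append(
            f"Only {len(config.questions)} question(s); "
            f"at least {RECOMMENDED_MIN_QUESTIONS} are recommended."
        )
    if config.max_iterations < 1:
        warnings.append("max_iterations must be at least 1.")
    if not config.target_model.strip():
        warnings.append("No target model selected.")
    return warnings
```

Because the minimum is a recommendation rather than a hard limit, the sketch returns warnings instead of raising an error, leaving the caller free to proceed anyway.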