Add LLM Output Evaluator evals.py README

sahilds1 · sahilds1 · commit a2a9b1ac38b2 · 2025-06-05T15:33:02.000-04:00
diff --git a/server/api/services/README.md b/server/api/services/README.md
@@ -0,0 +1,36 @@
+
+# LLM Output Evaluator
+
+This script evaluates the outputs of Large Language Models (LLMs) and estimates the associated token usage and cost.
+
+It supports batch evaluation via a configuration CSV and produces a detailed metrics report in CSV format.
+
+## Usage
+
+Ensure the `lighteval` library is installed and any model SDKs you use (e.g., OpenAI, Anthropic) are configured properly.
+
+
+```bash
+python evals.py --config path/to/config.csv --reference path/to/reference.csv --output path/to/results.csv
+```
+
+The arguments to the script are:
+
+- `--config`: Path to the config CSV file (model, query, context)
+- `--reference`: Path to the reference CSV file
+- `--output`: Path where the evaluation results will be saved
+
+
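+The script outputs a CSV report containing the following metrics:
+
+* Extractiveness of the LLM outputs:
+
+  * Extractiveness Coverage
+  * Extractiveness Density
+  * Extractiveness Compression
+
+* Usage and performance:
+
+  * Token usage (input/output)
+  * Estimated cost in USD
+  * Duration (in seconds)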
+
+  * Token usage (input/output)
+  * Estimated cost in USD
+  * Duration (in seconds)
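
The README does not define the three extractiveness metrics. The sketch below shows one way they could be computed, assuming they follow the extractive-fragment definitions of Grusky et al. (2018): coverage is the fraction of output tokens copied from the source, density is the average squared length of the copied fragments, and compression is the source length divided by the output length. The function names and the lowercase whitespace tokenization are illustrative assumptions, not the code in `evals.py` or the `lighteval` API.

```python
def extractive_fragments(source_tokens, summary_tokens):
    """Greedily collect the longest token runs of the summary found in the source."""
    fragments = []
    i = 0
    while i < len(summary_tokens):
        best = 0  # length of the longest source match starting at summary position i
        j = 0
        while j < len(source_tokens):
            if summary_tokens[i] == source_tokens[j]:
                # Extend the match as far as both sequences agree.
                k = 0
                while (
                    i + k < len(summary_tokens)
                    and j + k < len(source_tokens)
                    and summary_tokens[i + k] == source_tokens[j + k]
                ):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary_tokens[i:i + best])
        # Skip past the matched fragment, or one token if nothing matched.
        i += max(best, 1)
    return fragments


def extractiveness(source, summary):
    """Return coverage, density, and compression for one (source, summary) pair."""
    # Assumption: lowercase whitespace tokenization; a real tokenizer may differ.
    src = source.lower().split()
    summ = summary.lower().split()
    if not summ:
        return {"coverage": 0.0, "density": 0.0, "compression": 0.0}
    frags = extractive_fragments(src, summ)
    n = len(summ)
    return {
        "coverage": sum(len(f) for f in frags) / n,
        "density": sum(len(f) ** 2 for f in frags) / n,
        "compression": len(src) / n,
    }


if __name__ == "__main__":
    scores = extractiveness(
        "the quick brown fox jumps over the lazy dog",
        "the quick fox jumps high",
    )
    print(scores)
```

In this example the output shares the fragments "the quick" and "fox jumps" with the source, so four of its five tokens are covered. Higher coverage and density indicate an output that copies more, and in longer runs, from the context.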