Title
Background
Set of representative questions:
#345 (comment)
#411 (comment)
https://github.com/CodeForPhilly/balancer-main/tree/develop/evaluation
Current State
Acceptance Criteria
Approach
Start with error analysis, not infrastructure. Spend 30 minutes manually reviewing 20-50 LLM outputs whenever you make significant changes. Use one domain expert who understands your users as your quality decision maker (a “benevolent dictator”).
OpenAI API Dashboard has total duration and cost metrics
References
Risks and Rollback
Screenshots / Recordings
Related PR
Title
Background
Set of representative questions:
#345 (comment)
#411 (comment)
https://github.com/CodeForPhilly/balancer-main/tree/develop/evaluation
Current State
Acceptance Criteria
Approach
Start with error analysis, not infrastructure. Spend 30 minutes manually reviewing 20-50 LLM outputs whenever you make significant changes. Use one domain expert who understands your users as your quality decision maker (a “benevolent dictator”).
OpenAI API Dashboard has total duration and cost metrics
References
Risks and Rollback
Screenshots / Recordings
Related PR