LLM Evaluation Basics

This course, led by Laurie Voss (Head of Developer Relations at Arize AI and co-founder of npm), explains why evals matter in AI systems: they act as a testing mechanism for outputs that are inherently variable. It covers two types of evals: code evals for deterministic checks and LLM-as-a-Judge evals for more nuanced assessments. It then walks you through setting up tracing, writing code evals, and configuring LLM judges, emphasizing the need for clear criteria and structured prompts. You'll start small, with one code eval and one LLM eval, to identify failure patterns and improve your outputs. Finally, you'll explore the Arize-Phoenix documentation and evaluation tutorials to apply these strategies in your own projects.
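To make the two eval styles concrete, here is a minimal sketch of what each might look like. It is not taken from the course material: the use of plain Python, the `openai` client, the "gpt-4o-mini" model name, and the function names are all illustrative assumptions; the lessons use Arize-Phoenix's own tracing and evaluation helpers.

```python
# Illustrative sketch (not from the course) of the two eval types it describes:
# a code eval (deterministic check) and an LLM-as-a-Judge eval (nuanced check).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def code_eval_is_valid_json(output: str) -> bool:
    """Code eval: a deterministic pass/fail check on a model output."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


# A structured judge prompt with clear criteria and a constrained output format.
JUDGE_PROMPT = """You are grading an answer for relevance.
Question: {question}
Answer: {answer}
Respond with exactly one word: "relevant" or "irrelevant"."""


def llm_judge_relevance(question: str, answer: str) -> str:
    """LLM-as-a-Judge eval: a second model scores a quality that code can't check."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # keep the judge as repeatable as possible
    )
    return response.choices[0].message.content.strip().lower()
```

Running one check of each kind over a small batch of traced outputs, as the course suggests, is usually enough to surface recurring failure patterns before you invest in a larger eval suite.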

Course content

2 sections | 2 lessons