The Crucial Role of AI Evaluations

Building applications with AI and large language models (LLMs) adds a new layer of variability that makes testing difficult. Because responses can differ subtly from run to run and still be correct, traditional testing tools that expect exact, repeatable output often fail. In this talk, we'll share our journey using the open-source tool deepeval to evaluate multi-turn conversations, tune prompts, and make our AI applications reliable enough that we can sleep at night.

Michael Dawson

Red Hat

Michael Dawson is a Senior Principal Software Engineer at Red Hat, with a focus on building AI and LLM applications as part of the ecosystem engineering team. Before moving into the AI space, he was a key contributor to Node.js as the technical lead for IBM and Red Hat's team and a member of the Node.js Technical Steering Committee.
