Notebooks

Colab notebooks

In these notebooks, we will build a simple AutoFeedback model with ~20 examples and use it to scale human feedback.

Intro to Log10 AutoFeedback: Build a custom evaluation model (AutoFeedback) for summary grading. The notebook walks through creating Feedback (via API or GUI), creating an AutoFeedback model (locally or in the cloud), and running inference with the AutoFeedback model on new LLM calls.

In the tutorials above, you will learn how to:

  1. Use Log10 AutoFeedback to scale human review of LLM outputs (for online monitoring and alerting, or for offline dataset curation for evals, fine-tuning or prompt improvement)
  2. Use Log10 AutoFeedback to curate a fine-tuning (or prompt optimization) dataset and drive accuracy improvements

An exciting extrapolation of this approach is self-improvement through alternating accuracy gains in base models and AutoFeedback models: a self-improving AutoFeedback model (with dynamically selected few-shot examples, drawn from a pool that grows over time) is used to curate data for fine-tuning (or prompt-optimizing) the base model and to assess the resulting accuracy improvement. As more data comes into the AI application, AutoFeedback helps filter the high-quality examples that are most likely to yield accuracy improvements.
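The filtering step in this loop can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Log10 API: `Completion`, `curate`, and `toy_score` are hypothetical names, and `score_fn` stands in for whatever AutoFeedback model produces a predicted feedback score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Completion:
    """One logged LLM call (hypothetical record shape)."""
    prompt: str
    output: str


def curate(
    completions: List[Completion],
    score_fn: Callable[[Completion], float],
    threshold: float,
) -> List[Completion]:
    """Keep only completions whose predicted score meets the threshold."""
    return [c for c in completions if score_fn(c) >= threshold]


def toy_score(c: Completion) -> float:
    """Toy stand-in scorer (assumption): favors summaries of five words or fewer."""
    return 1.0 if len(c.output.split()) <= 5 else 0.0
```

In practice the scorer would be the AutoFeedback model itself, and the curated list would feed the next round of fine-tuning or prompt optimization; the threshold controls how selective the curation is.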

Alternatively, both positive and negative (or high- and low-scoring) examples could be used to fine-tune the base model with approaches such as DPO (or RLHF).
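As a rough sketch of how scored examples might be turned into preference data for a DPO-style method: for each prompt, the highest-scoring output becomes the preferred response and the lowest-scoring one the rejected response, provided the score gap is large enough. The function name, the `min_gap` parameter, and the `prompt`/`chosen`/`rejected` field names are assumptions made for illustration; the layout a given training library expects may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_preference_pairs(
    scored: List[Tuple[str, str, float]],
    min_gap: float = 1.0,
) -> List[Dict[str, str]]:
    """Turn (prompt, output, score) triples into chosen/rejected pairs."""
    # Group candidate outputs by the prompt that produced them.
    by_prompt = defaultdict(list)
    for prompt, output, score in scored:
        by_prompt[prompt].append((score, output))

    pairs = []
    for prompt, candidates in by_prompt.items():
        if len(candidates) < 2:
            continue  # a pair needs at least two outputs for the same prompt
        best = max(candidates, key=lambda c: c[0])
        worst = min(candidates, key=lambda c: c[0])
        # Skip prompts where the scores are too close to express a preference.
        if best[0] - worst[0] >= min_gap:
            pairs.append(
                {"prompt": prompt, "chosen": best[1], "rejected": worst[1]}
            )
    return pairs
```

Requiring a minimum score gap is one simple way to avoid training on pairs where the feedback signal is ambiguous.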

[Figure: cycle]

Where to go next

Depending on whether you want to start with LLM observability or LLM evaluation for your projects, click one of the following: