Colab notebooks
In these notebooks, we will build a simple AutoFeedback model with ~20 examples and use it to scale human feedback.
- Intro to Log10 AutoFeedback: Build a custom evaluation model (AutoFeedback) for summary grading. Walks through the flow of creating Feedback (via API or GUI), creating an AutoFeedback model (locally or in the cloud), and running inference with the AutoFeedback model on new LLM calls (a conceptual sketch follows this list).
- How to use Log10 for accuracy improvement using the OpenAI Fine-tuning API: We'll use the scaled feedback to fine-tune a base model and improve the accuracy of our LLM application (see the fine-tuning sketch after this list).
- How to use Log10 for accuracy improvement using DSPy for automated prompt optimization: We'll use the scaled feedback with DSPy's automated prompt optimizers to improve the accuracy of our LLM application (see the DSPy sketch after this list).
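To give a feel for what an AutoFeedback model does, here is a minimal conceptual sketch: few-shot prompting an LLM grader with human feedback examples. This is illustrative only and not the Log10 API; the notebook uses Log10's Feedback API and hosted AutoFeedback models, and the grading scale, data, and model name below are assumptions.

```python
# Conceptual sketch of the AutoFeedback idea: few-shot prompt an LLM grader
# with human-graded examples. Illustrative only; not the Log10 API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A handful of human-graded summaries (hypothetical data, hypothetical 1-7 scale).
feedback_examples = [
    {"summary": "Covers all key points concisely.", "grade": 7},
    {"summary": "Misses the main conclusion entirely.", "grade": 2},
]

def autofeedback_grade(summary: str) -> str:
    """Grade a new summary by showing the grader the feedback examples."""
    shots = "\n".join(
        f"Summary: {ex['summary']}\nGrade: {ex['grade']}" for ex in feedback_examples
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "You grade summaries from 1 (worst) to 7 (best). Reply with the grade only."},
            {"role": "user", "content": f"{shots}\nSummary: {summary}\nGrade:"},
        ],
    )
    return resp.choices[0].message.content

print(autofeedback_grade("A terse but complete summary of the article."))
```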
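For the fine-tuning notebook, the core OpenAI Fine-tuning API flow looks roughly like this: upload a JSONL of chat-format examples curated from high-scoring calls, then create a fine-tuning job. The file name and base model below are assumptions, and the curation step is shown as already done.

```python
# Sketch of the OpenAI Fine-tuning API flow, assuming train.jsonl already
# holds chat-format examples curated from high-scoring AutoFeedback calls.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Upload the curated dataset (one {"messages": [...]} object per line).
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Start a fine-tuning job on a base model (model name is an assumption).
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```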
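For the DSPy notebook, a prompt-optimization pass might look like the sketch below: compile a summarization module against a metric derived from feedback scores. The signature fields, metric, trainset, and model name here are assumptions for illustration; the notebook wires the metric to the AutoFeedback model instead.

```python
# Sketch of DSPy automated prompt optimization on feedback-derived examples.
# Field names, the metric, the data, and the model are assumptions.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

summarize = dspy.ChainOfThought("document -> summary")

# Training examples curated from highly rated LLM calls (hypothetical data).
trainset = [
    dspy.Example(
        document="Long article text...",
        summary="Good reference summary.",
    ).with_inputs("document"),
]

def feedback_metric(example, prediction, trace=None):
    # Placeholder metric: in the notebook this would call the AutoFeedback
    # model and threshold its score; here we use a crude non-empty check.
    return len(prediction.summary) > 0

optimizer = dspy.teleprompt.BootstrapFewShot(metric=feedback_metric)
optimized = optimizer.compile(summarize, trainset=trainset)
print(optimized(document="Another long article...").summary)
```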
In the above tutorials, you will learn how to:
- Use Log10 AutoFeedback to scale human review of LLM outputs (for online monitoring and alerting, or for offline dataset curation for evals, fine-tuning or prompt improvement)
- Use Log10 AutoFeedback to curate a fine-tuning (or prompt optimization) dataset and drive accuracy improvements
An exciting extrapolation of this approach is the idea of self-improvement by alternating improvements in the accuracy of the base model and the AutoFeedback model: a self-improving AutoFeedback model (with dynamically selected few-shot examples, drawn from a superset that grows over time) curates data to fine-tune (or prompt-optimize) the base model and then assesses its accuracy improvement. As more data comes into the AI application, AutoFeedback helps filter the high-quality examples that are most likely to yield accuracy improvements.
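As a rough sketch of that alternating loop, under the assumption that every helper below is a trivial placeholder (none of these are Log10 or OpenAI APIs):

```python
# Hypothetical sketch of the alternating self-improvement loop.
# Every helper is a placeholder stub, not a Log10 or OpenAI call.
import random

def autofeedback_score(call: str, examples: list) -> float:
    """Placeholder: grade one LLM output using few-shot feedback examples."""
    return random.uniform(1, 7)

def fine_tune(model: str, dataset: list) -> str:
    """Placeholder: fine-tune (or prompt-optimize) the base model."""
    return model + "+ft"

def collect_new_llm_calls(model: str) -> list:
    """Placeholder: sample new production outputs from the current model."""
    return [f"output-{i} from {model}" for i in range(5)]

feedback_examples = ["~20 seed human-graded examples"]  # superset grows over time
base_model, threshold = "base-llm", 5.0

for _ in range(3):  # alternate improving the base and AutoFeedback models
    scored = [(c, autofeedback_score(c, feedback_examples))
              for c in collect_new_llm_calls(base_model)]
    curated = [c for c, s in scored if s >= threshold]  # keep high scorers
    base_model = fine_tune(base_model, curated)
    feedback_examples += [c for c, _ in scored]  # grow the few-shot superset
print(base_model)
```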
Alternatively, both positive and negative (or high- and low-score) examples could be used to fine-tune with approaches such as DPO (or RLHF), as in the sketch below.
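A minimal sketch of turning scored examples into DPO-style preference pairs: for each prompt, pair a high-scoring response (chosen) with a low-scoring one (rejected). The data and score thresholds are hypothetical; the prompt/chosen/rejected field names follow the convention used by common DPO trainers.

```python
# Build DPO-style preference pairs from scored examples (hypothetical data).
import json

scored = [
    {"prompt": "Summarize the article.", "response": "Complete, faithful summary.", "score": 7},
    {"prompt": "Summarize the article.", "response": "Misses the conclusion.", "score": 2},
]

# Group responses by prompt, then cross high scorers with low scorers.
by_prompt: dict[str, list] = {}
for ex in scored:
    by_prompt.setdefault(ex["prompt"], []).append(ex)

pairs = []
for prompt, exs in by_prompt.items():
    highs = [e for e in exs if e["score"] >= 6]  # thresholds are assumptions
    lows = [e for e in exs if e["score"] <= 3]
    for h in highs:
        for r in lows:
            pairs.append({"prompt": prompt, "chosen": h["response"], "rejected": r["response"]})

with open("dpo_pairs.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```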
Where to go next
Depending on whether you want to start with LLM observability or LLM evaluation for your projects, click one of the following: