Prompt Order Experiment

Results

This sortable table lists our experiments and their results.


Performance Star Chart

The chart below shows how each model performed on the 10 most common topics by row count. Each line represents a model, and the radial axis shows accuracy.
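As a rough illustration of the numbers behind such a chart, the sketch below picks the top topics by row count and computes per-model accuracy on each. The row layout (model, topic, is_correct), the function name topic_accuracy, and the sample data are all assumptions made for this example; they are not taken from the experiment itself.

```python
from collections import Counter, defaultdict


def topic_accuracy(rows, top_n=10):
    """Return {model: {topic: accuracy}} for the top_n topics by row count.

    rows is an iterable of (model, topic, is_correct) tuples. This layout is
    a hypothetical one chosen for illustration.
    """
    rows = list(rows)

    # Rank topics by how many rows mention them, across all models.
    topic_counts = Counter(topic for _, topic, _ in rows)
    top_topics = [topic for topic, _ in topic_counts.most_common(top_n)]
    keep = set(top_topics)

    totals = defaultdict(Counter)   # totals[model][topic] -> rows seen
    correct = defaultdict(Counter)  # correct[model][topic] -> rows answered correctly
    for model, topic, is_correct in rows:
        if topic in keep:
            totals[model][topic] += 1
            correct[model][topic] += int(bool(is_correct))

    # Accuracy is correct / total; topics a model never saw are skipped.
    return {
        model: {
            t: correct[model][t] / totals[model][t]
            for t in top_topics
            if totals[model][t]
        }
        for model in totals
    }


# Made-up rows, for illustration only.
sample = [
    ("base", "math", True),
    ("base", "math", False),
    ("base", "history", True),
    ("reasoning_first", "math", True),
    ("reasoning_first", "math", True),
    ("reasoning_first", "history", False),
    ("base", "art", True),
]
result = topic_accuracy(sample, top_n=2)
```

Each inner dictionary holds one accuracy value per topic, which is the set of values a single model's line would trace around the radial axis.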

Make sure you explore what happened between:

  • Base Model -> Final Answer
  • Base Model -> Reasoning, then Final Answer (both models)
  • Base Model -> Final Answer, then Reasoning (both models)