Prompt Order Experiment

Overview

Results

Prompt Order Experiment

This experiment aims to explore various scenarios for prompt fine-tuning using structured generation. We'll test how the order of elements in a prompt affects model performance. The elements we consider are:

(Q): Question
(AC): Answer Choices
(R): Reasoning
(FA): Final Answer

Scenarios

We will evaluate the following prompt orders:

Scenario 1: Q - AC - R - FA (Falcon and GPT3.5)

This is the most natural order. The model generates reasoning before the final answer, providing the most information prior to making a selection. This order leverages decoding mechanics effectively.

This is our user message, we can see the question and answer choices

Click to show prompt!

{'content': 'Answer the Question and include your reasoning and the final answer in a json like: {"reasoning": <reasoning about the answer>, "final_answer": <letter corresponding to the answer>}.', 'role': 'system'}

This is our assistant message, you can see that we are forcing a JSON (note I added spacing for visual purposes), and we are putting the reasoning first. Using a JSON in fine-tuning will improve our structured generation results as the model will get used to responding in that "space".

{'content': 'Question: What can genetic material have?\nAnswer Choices: (a) Resistance (b) Mutations (c) Clorophyll (d) Nucleotide (e) Symmetry (f) Allow growth (g) Contamination (h) Warmth', 'role': 'user'}

Scenario 2: Q - AC - FA - R (Falcon and GPT3.5)

An awkward order, placing reasoning after the final answer. While it is faster, it assumes the model can "know" reasoning internally before generating it. This approach saves tokens but is a skeptical case worth testing.

Click to show prompt!

{'content': 'Answer the Question and include your Final Answer and the Reasoning in a json like: {"final_answer": <letter corresponding to the answer>, "reasoning": <reasoning about the answer>}.', 'role': 'system'}

{'content': 'Question: What can genetic material have?\nAnswer Choices: (a) Resistance (b) Mutations (c) Clorophyll (d) Nucleotide (e) Symmetry (f) Allow growth (g) Contamination (h) Warmth', 'role': 'user'}

Scenario 3: Q - AC - FA

This serves as a fine-tuning control. No reasoning is provided in the output.

Scenario 4: Base

An un-fine-tuned control for comparison purposes.

Structured Generation

Structured generation ensures consistent response formats, which is crucial for reliable fine-tuning. Initial experiments faced difficulties with response consistency and structured generation can solve this.