Large Language Models (LLMs) like GPT-4, or open-source models such as Mistral, have unlocked incredible ways to generate text, solve problems, and even reason through complex queries. But to make the most of their power, we need to learn the art of prompt engineering.
In this post, I will go through:
- What prompt engineering means
- Popular techniques:
  - Basic prompt
  - Zero-shot prompt
  - One-shot prompt
  - Few-shot prompt
  - Chain-of-Thought prompting
  - Self-consistency
- How to implement them using open-source tools, entirely free
Thanks to Hugging Face’s transformers library and excellent open models, you can run everything on Google Colab, explore a wide range of models such as FLAN-T5, Llama, and Mistral, and learn without worrying about costs.
Note: I am using Google Colab to run the Python scripts in this blog. You also need to sign up for Hugging Face (https://huggingface.co/) to follow along.
Why Mistral?
mistralai/Mistral-7B-Instruct-v0.3 is a cutting-edge open instruct-tuned LLM:
- 7 billion parameters, strong general reasoning.
- Fine-tuned on instruction data; follows your prompts precisely.
- Perfect for chain-of-thought style questions.
- Available on Hugging Face with a free account and a simple read token.
Note: I tried smaller open-source LLMs like google/flan-t5-small, but they failed to reason properly in the Chain-of-Thought (CoT) and self-consistency tests, so I had to use a more capable model for this blog.
Getting started
Run this to install the tools:
!pip install transformers
!pip install torch
!pip install huggingface-hub
Then authenticate (simple read token):
from huggingface_hub import login
login("your_hugging_face_read_token_here")
Load Mistral pipeline:
from transformers import pipeline
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
Prompt Engineering Concepts with Mistral
Basic Prompt
This is the simplest way to interact with an LLM: all you have to do is ask a question.
Python script:
#Basic Prompt
output = generator("What is machine learning?", max_new_tokens=50)
print(output[0]['generated_text'])
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
What is machine learning?
Machine learning is a subfield of artificial intelligence that focuses on designing algorithms and models that can learn from data and make predictions or decisions without being explicitly programmed. Machine learning algorithms use statistical techniques to identify patterns in data and make predictions based on
What happens here?
- The LLM reads the question and provides a direct answer.
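One thing to notice in the output above: the text-generation pipeline returns the prompt followed by the model's continuation. You can pass `return_full_text=False` in the pipeline call to get only the new text, or strip the echo yourself. A minimal sketch (the helper name is my own):

```python
def strip_prompt(prompt, generated_text):
    # The pipeline echoes the prompt at the start of generated_text;
    # drop it to keep only the model's continuation.
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

answer = strip_prompt("What is machine learning?",
                      "What is machine learning?\nMachine learning is a subfield...")
```

This keeps downstream parsing (for example, in the self-consistency section later) focused on the model's answer alone.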
Zero-shot Prompt
Zero-shot prompting is when you ask the model to perform a task it has never seen examples for; it relies on its training alone. Here we ask it to classify sentiment without giving any examples.
Python script:
#Zero-shot Prompt
output = generator("Classify the sentiment: 'I absolutely love this product!'", max_new_tokens=50)
print(output[0]['generated_text'])
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Classify the sentiment: 'I absolutely love this product!'
positive
Explanation:
The sentiment of the statement 'I absolutely love this product!' is positive. The words 'love' and 'absolutely' show strong positive emotions towards the product.
No examples are given; the model infers the label from its training alone.
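If the label set matters, it helps to name the allowed labels directly in the prompt so the model doesn't invent its own categories. A hypothetical template (the function name and wording are my own, not part of any library):

```python
# Hypothetical zero-shot template that constrains the answer space
def zero_shot_sentiment_prompt(text):
    return (
        "Classify the sentiment of the following text as "
        "Positive, Negative, or Neutral.\n"
        f"Text: '{text}'\n"
        "Sentiment:"
    )

prompt = zero_shot_sentiment_prompt("I absolutely love this product!")
```

The resulting string can be passed straight to `generator(...)` just like the prompt above.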
One-shot Prompt
With One-shot prompting, you give a single example to guide the model.
Python script:
#One-shot Prompt
prompt = """
Classify sentiment.
Example:
Text: 'I hate waiting.'
Sentiment: Negative
Now classify this:
Text: 'This is amazing!'
Sentiment:
"""
output = generator(prompt, max_new_tokens=50)
print(output[0]['generated_text'])
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Classify sentiment.
Example:
Text: 'I hate waiting.'
Sentiment: Negative
Now classify this:
Text: 'This is amazing!'
Sentiment:
Answer: Positive
The LLM sees one example, learns the format, and applies it.
Few-shot Prompt
Few-shot prompting gives the model multiple examples, improving accuracy on structured tasks.
Python script:
#Few-shot Prompt
prompt = """
Classify sentiment.
Examples:
Text: 'Horrible service.'
Sentiment: Negative
Text: 'I am so happy!'
Sentiment: Positive
Text: 'It was okay.'
Sentiment: Neutral
Now classify this:
Text: 'Absolutely fantastic job.'
Sentiment:
"""
output = generator(prompt, max_new_tokens=500)
print(output[0]['generated_text'])
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Classify sentiment.
Examples:
Text: 'Horrible service.'
Sentiment: Negative
Text: 'I am so happy!'
Sentiment: Positive
Text: 'It was okay.'
Sentiment: Neutral
Now classify this:
Text: 'Absolutely fantastic job.'
Sentiment:
Answer: Positive
This is very effective for short classification tasks.
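Writing these example blocks by hand gets tedious as you add more examples. A small helper (hypothetical, not part of transformers) can assemble a few-shot prompt from (text, label) pairs:

```python
# Hypothetical helper: builds a few-shot sentiment prompt from (text, label) pairs
def build_few_shot_prompt(examples, query):
    lines = ["Classify sentiment.", "Examples:"]
    for text, label in examples:
        lines.append(f"Text: '{text}'")
        lines.append(f"Sentiment: {label}")
    lines += ["Now classify this:", f"Text: '{query}'", "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("Horrible service.", "Negative"),
    ("I am so happy!", "Positive"),
    ("It was okay.", "Neutral"),
]
prompt = build_few_shot_prompt(examples, "Absolutely fantastic job.")
```

The resulting string matches the prompt above and can be passed straight to `generator(...)`.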
Chain-of-Thought (CoT) Prompting
With CoT, we explicitly ask the model to think step by step.
Python script:
#Chain-of-Thought (CoT)
prompt = """
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
"""
output = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.7)
print(output[0]['generated_text'])
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
1. First, you give half of the chocolates to your friend. In this case, half of 60 is 30 (60 / 2 = 30). So, you have 60 - 30 = 30 chocolates left.
2. Then, you give 10 more chocolates to another friend. So, you have 30 - 10 = 20 chocolates left.
You have 20 chocolates left.
This makes the model “show its work”, reducing logical mistakes.
Self-consistency
This goes a step further. We can run multiple CoT prompts, then pick the most common final answer.
Python script:
#Self-consistency with Mistral
import re
from collections import Counter
answers = []
for i in range(5):
    out = generator(prompt, max_new_tokens=500, do_sample=True, temperature=0.7)
    text = out[0]['generated_text']
    answers.append(text)
    print(f"Run {i+1}:\n{text}\n")

def extract_number(text):
    matches = re.findall(r"\d+", text)
    return matches[-1] if matches else None

final_numbers = [extract_number(ans) for ans in answers]
print("Most consistent final answer:", Counter(final_numbers).most_common(1))
Output:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Run 1:
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
First, let's calculate how many chocolates you gave away:
1. Half of 60 chocolates: 60 / 2 = 30 chocolates given to the first friend.
2. Then you gave 10 more chocolates: 30 + 10 = 40 chocolates given away in total.
Now, to find out how many chocolates you have left:
1. Subtract the number of chocolates given away from the original amount: 60 - 40 = 20.
So, you have 20 chocolates left.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Run 2:
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
Let's solve this step by step:
1. You initially had 60 chocolates.
2. You gave half of them to your friend, which means you gave 60 / 2 = 30 chocolates. This leaves you with 60 - 30 = 30 chocolates.
3. Then, you gave 10 more to another friend, so you subtract these 10 from the remaining chocolates. This leaves you with 30 - 10 = 20 chocolates.
So, you have 20 chocolates left.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Run 3:
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
Let's break this down step by step:
1. You start with 60 chocolates.
2. You give half of them to your friend, which means you give away 60 / 2 = 30 chocolates.
3. After giving 30 chocolates to your first friend, you have 60 - 30 = 30 chocolates left.
4. Then, you give 10 more chocolates to another friend, so you subtract 10 from the remaining chocolates: 30 - 10 = 20 chocolates.
So, you have 20 chocolates left after giving some to your friends.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Run 4:
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
1. Start with the total number of chocolates you had: 60
2. Give half to your friend, which is 60 / 2 = 30 chocolates. You are left with 60 - 30 = 30 chocolates.
3. Give 10 more to another friend. Subtract 10 from the remaining chocolates: 30 - 10 = 20 chocolates.
Therefore, you have 20 chocolates left.
Run 5:
Solve step by step:
If I had 60 chocolates and gave half to my friend, then gave 10 more to another friend, how many chocolates do I have left?
Answer:
1. Initially, you had 60 chocolates.
2. You gave half to your friend, which means you gave away 60 / 2 = 30 chocolates.
3. After giving 30 chocolates, you had 60 - 30 = 30 chocolates left.
4. Then, you gave 10 more chocolates to another friend, so you gave away an additional 10 chocolates.
5. After giving away the additional 10 chocolates, you had 30 - 10 = 20 chocolates left.
So, you have 20 chocolates left.
Most consistent final answer: [('20', 5)]
This is powerful:
- Different “chains of thought” emerge on each run.
- You then take the most agreed-upon final answer, reducing errors.
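The voting step can also be isolated into a small standalone helper (a sketch, independent of the model) that extracts the last number in each completion and takes the majority:

```python
import re
from collections import Counter

def majority_answer(completions):
    """Extract the final number from each completion and return the most common one."""
    def extract_number(text):
        matches = re.findall(r"\d+", text)
        return matches[-1] if matches else None
    numbers = [n for n in map(extract_number, completions) if n is not None]
    return Counter(numbers).most_common(1)[0][0] if numbers else None

# Works on any list of generated texts, e.g. abbreviated versions of the runs above:
runs = [
    "So, you have 20 chocolates left.",
    "60 - 30 = 30, then 30 - 10 = 20 chocolates left.",
    "The answer is 20.",
]
result = majority_answer(runs)  # -> "20"
```

Because the extraction is separate from generation, you can swap in a stricter parser (for example, matching only the text after "Answer:") without touching the sampling loop.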
Summary Table
| Technique | Example |
| --- | --- |
| Basic prompt | “What is ML?” |
| Zero-shot | “Classify: ‘I love this!’” |
| One-shot | Give one example. |
| Few-shot | Give multiple examples. |
| Chain-of-Thought | “Solve step by step…” |
| Self-consistency | Run multiple times, take the majority answer. |