Intro to Prompt Engineering and Chatting with LLMs

When working with large language models (LLMs), your input matters just as much as the model itself. This post introduces the core ideas behind prompting, interacting with LLMs, and optimizing conversations effectively.

🧠 GIGO: Garbage In, Garbage Out

A fundamental principle to remember: the quality of an LLM's output depends on the quality of the input you provide.
Poor prompts produce poor responses. Simple as that.

🍓 Strawberry (example) Problem in LLMs

Model used: GPT-4.1 Mini

Let's revisit a seemingly simple prompt:

Prompt:
How many ‘r’ letters are there in the word "strawberry"?

Expected Answer:

There are 3 'r' letters in "strawberry".

What Actually Happens:

Some LLMs — especially smaller or improperly prompted ones — may incorrectly respond with 2, 4, or other numbers.

🧠 Why Does This Happen?

This illustrates a classic LLM limitation: the model processes text as tokens, while the question asks about individual characters.

Here’s why errors can occur:

LLMs don’t "see" letters the same way humans do. They work on tokens, which are often chunks of words or characters — not single letters.
For smaller or less capable models, basic counting tasks aren’t trivial unless the prompt is extremely clear or structured.
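To make the token-versus-character gap concrete, here is a minimal sketch. The character count is exact, but the token split is only an illustrative assumption: real tokenizers differ from model to model and may cut the word differently.

```python
word = "strawberry"

# Character-level view: what a human (or ordinary code) works with.
letters = list(word)
r_count = word.count("r")

# Token-level view: a HYPOTHETICAL split for illustration only.
# Real tokenizers are model-specific and may produce different chunks.
hypothetical_tokens = ["str", "aw", "berry"]

print(letters)              # individual letters, easy to count
print(r_count)              # 3
print(hypothetical_tokens)  # chunks in which the letters are not separated
```

The model receives something closer to the last list than the first, so the individual 'r's are never presented to it as separate items.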

🔍 Fixing It with Better Prompting

Try prompting like this instead:

“Count how many times the letter ‘r’ appears in the word ‘strawberry’. Show your steps.”

This encourages the model to use step-by-step reasoning, e.g.:

“The word 'strawberry' has the letters: s, t, r, a, w, b, e, r, r, y.
I can see the letter 'r' appears 3 times.”

This small example — counting the 'r's in "strawberry" — highlights how Prompt Engineering plays a crucial role in accuracy and trust when working with LLMs.

🗣️ What is Prompting?

Prompting is how you interact with an LLM. It defines what kind of output you get, and how reliably.

Different models may require different prompting styles. Here's a breakdown of commonly used ones.

🧾 Prompting Styles

1. Alpaca Format (Used by Stanford Alpaca, a fine-tune of Meta's LLaMA)

This follows an instruction-input-response format:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:

ref: https://github.com/tatsu-lab/stanford_alpaca
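Filling in the template is plain string formatting. The sketch below uses the template text shown above; the helper name `format_alpaca` and the sample instruction are made up for illustration.

```python
# Template text as shown above. The prompt ends right after "### Response:"
# so that the model continues from there.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def format_alpaca(instruction: str, input_text: str) -> str:
    """Fill the Alpaca template (hypothetical helper, not part of any library)."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)


prompt = format_alpaca("Translate the sentence to French.", "I love programming.")
print(prompt)
```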

2. ChatML (Used by OpenAI)

This is a message-based format used by OpenAI's chat models:

[
  {"role": "system", "content": "You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible."},
  {"role": "user", "content": "How are you?"},
  {"role": "assistant", "content": "I am doing well."}
]

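This role-based message structure (system, user, assistant) has become a de facto standard for interacting with LLMs, and many other providers and open-source tools accept the same shape.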
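The JSON list is what you send to the API. ChatML itself is the text serialization of such a list, with each message wrapped in special delimiter tokens. Here is a rough sketch of that rendering, assuming the `<|im_start|>` and `<|im_end|>` delimiters from OpenAI's ChatML description; the `to_chatml` helper is hypothetical, and hosted APIs do this step for you.

```python
def to_chatml(messages: list[dict[str, str]]) -> str:
    """Render role/content messages as ChatML-style text (illustrative helper)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    # Leave an open assistant turn so the model knows it should answer next.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


chatml_text = to_chatml([
    {"role": "system", "content": "Answer as concisely as possible."},
    {"role": "user", "content": "How are you?"},
])
print(chatml_text)
```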

3. INST Format

Used by some instruction-tuned models, such as Llama 2 Chat and Mistral Instruct:

[INST] What is the capital of France? [/INST]

Setting Up OpenAI SDK

To interact with OpenAI's models via code, follow these steps:

Install the SDK:

pip install openai python-dotenv

Secure Your API Key:

Store your key in a .env file like this:

OPENAI_API_KEY=<your_api_key>

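Then load the key and call the API:

from dotenv import load_dotenv
from openai import OpenAI

# Read OPENAI_API_KEY from the .env file into the environment.
load_dotenv()

# The client picks up OPENAI_API_KEY from the environment automatically.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    # Messages are sent in order; earlier replies can be included as history.
    messages=[
        {"role": "user", "content": "Hey, My name is Vivek"},
        {"role": "assistant", "content": "Hey Vivek! How can I help you today?"},
    ],
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)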

Chatting with LLMs: Understanding Statelessness

LLMs are stateless — they don't remember previous messages unless you include them again in the next prompt.
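In practice this means your code keeps the history and resends it on every turn. Here is a minimal sketch of that loop. `chat_turn` and `fake_complete` are hypothetical names, and `fake_complete` is a stand-in for a real model call so the example runs without an API key.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(
    history: list[Message],
    user_text: str,
    complete: Callable[[list[Message]], str],
) -> str:
    """Append the user message, send the FULL history, and store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the model only knows what is in this list
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: list[Message]) -> str:
    """Stand-in for an LLM call: it can only use what is in `messages`."""
    for m in messages:
        if m["role"] == "user" and "My name is " in m["content"]:
            name = m["content"].split("My name is ")[1].strip(" .!")
            return f"Your name is {name}."
    return "I don't know your name."


history: list[Message] = []
chat_turn(history, "Hey, My name is Vivek", fake_complete)
with_history = chat_turn(history, "What is my name?", fake_complete)

# Same question, but sent alone with no earlier messages.
without_history = fake_complete([{"role": "user", "content": "What is my name?"}])

print(with_history)
print(without_history)
```

With a real model, `complete` would wrap a call such as `client.chat.completions.create(...)` and return the reply text.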

Managing Context with Long Conversations

Context window: every model has a limit on how much text it can read in a single request. GPT-4.1 Mini, for example, supports up to 1 million tokens.
Even so, resending all 400 past messages on every turn would be slow and expensive.
A common technique is a sliding window combined with summarization:
- Retain only the last 100 messages verbatim.
- Summarize the previous 300 messages and include the summary as a single message.
- Total: 101 messages per request, which reduces token usage, cost, and latency.
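The steps above can be sketched as a small function. The names `compact_history` and `placeholder_summarize` are hypothetical, and the summarizer here only reports a count; in a real application you would likely ask an LLM to write the summary.

```python
from typing import Callable

Message = dict[str, str]


def placeholder_summarize(older: list[Message]) -> str:
    """Placeholder summarizer (assumption): a real one would call an LLM."""
    return f"{len(older)} earlier messages omitted."


def compact_history(
    messages: list[Message],
    keep_last: int = 100,
    summarize: Callable[[list[Message]], str] = placeholder_summarize,
) -> list[Message]:
    """Keep the last `keep_last` messages and fold the rest into one summary."""
    if len(messages) <= keep_last:
        return list(messages)
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent


# 400 alternating user/assistant messages, as in the example above.
conversation = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(400)
]
compacted = compact_history(conversation)
print(len(compacted))  # 101
```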

Using System Prompts to Control LLM Behavior

If you want to control the personality, tone, or behavior of an LLM, use a System Prompt.

Example use cases include:

- Making the assistant formal or informal
- Instructing it to behave like a tutor, programmer, or creative writer
- Enforcing constraints on the type of responses it generates

Tools like v0.dev, Vercel AI SDK, and Cursor rely heavily on system prompts.

👉 Example prompt used in v0.dev’s open-source system prompt
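In the message format shown earlier, a system prompt is simply the first message, with the role "system". Here is a minimal sketch; the helper name and the tutor prompt are made up for illustration.

```python
def with_system_prompt(system_prompt: str, user_text: str) -> list[dict[str, str]]:
    """Build a message list whose first entry sets the assistant's behavior."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


# Hypothetical persona: a patient tutor with a length constraint.
tutor_messages = with_system_prompt(
    "You are a patient math tutor. Explain each step in at most two sentences.",
    "How do I solve 2x + 3 = 11?",
)
print(tutor_messages)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.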

Prompting Techniques

1. Zero-Shot Prompting

The model is given a task without any examples.
Often paired with a system prompt to ensure consistent behavior.
Example:

“Translate the following sentence to French: 'I love programming.'”

2. Few-Shot Prompting

A few examples are provided before the actual task.
Helps guide the model toward the desired format or style.
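One way to supply the examples is as earlier user/assistant turns placed before the real question. Here is a sketch, with a hypothetical helper name and made-up translation examples:

```python
def few_shot_messages(
    examples: list[tuple[str, str]], query: str
) -> list[dict[str, str]]:
    """Turn (input, expected output) pairs into example turns, then add the query."""
    messages: list[dict[str, str]] = []
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The actual task goes last, in the same style as the examples.
    messages.append({"role": "user", "content": query})
    return messages


shots = few_shot_messages(
    [
        ("Translate to French: 'Good morning.'", "Bonjour."),
        ("Translate to French: 'Thank you.'", "Merci."),
    ],
    "Translate to French: 'I love programming.'",
)
print(shots)
```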

3. Chain-of-Thought (CoT) Prompting

Encourages the model to "think aloud" by breaking down its reasoning process.
Useful for math problems, logic, and multi-step reasoning.
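A simple way to apply this is to ask for the steps plus a clearly marked final line, then pull that line out in code. The helper names and the "Answer:" convention below are assumptions made for this sketch, and the sample response is hand-written, not real model output.

```python
from typing import Optional


def cot_prompt(question: str) -> str:
    """Wrap a question with a request for step-by-step reasoning."""
    return (
        f"{question}\n"
        "Show your steps, then give the final result on its own line "
        "in the form 'Answer: <value>'."
    )


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None


# Hand-written sample response, used only to exercise the parser.
sample_response = (
    "The word 'strawberry' has the letters: s, t, r, a, w, b, e, r, r, y.\n"
    "The letter 'r' appears 3 times.\n"
    "Answer: 3"
)
print(cot_prompt("How many times does 'r' appear in 'strawberry'?"))
print(extract_answer(sample_response))  # 3
```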

4. Self-Consistency Prompting

The same question is posed to the LLM multiple times.
Outputs are compared to find the most consistent or common response.
This can improve the accuracy of complex queries.
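The comparison step is often just a majority vote. Below is a sketch with hypothetical names. `fake_sample` returns canned answers in place of real model calls, which would normally be sampled with some randomness so that the answers can differ.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(
    question: str, sample: Callable[[str], str], n: int = 5
) -> tuple[str, int]:
    """Ask the same question n times and return (most common answer, votes)."""
    answers = [sample(question) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes


# Canned answers standing in for five separate model calls (assumption).
canned = iter(["3", "2", "3", "3", "4"])


def fake_sample(question: str) -> str:
    return next(canned)


result = self_consistent_answer(
    "How many times does 'r' appear in 'strawberry'?", fake_sample
)
print(result)  # ('3', 3)
```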
