Intro to Prompt Engineering and Chatting with LLMs
When working with large language models (LLMs), your input matters just as much as the model itself. This post introduces the core ideas behind prompting, interacting with LLMs, and optimizing conversations effectively.
🧠 GIGO: Garbage In, Garbage Out
A fundamental principle to remember — the quality of output from an LLM depends on the quality of the input you provide.
Poor prompts = poor responses. Simple as that.
🍓 The Strawberry Problem in LLMs (Example)
Model used: GPT-4.1 Mini
Let's revisit a seemingly simple prompt:
Prompt:
How many ‘r’ letters are there in the word "strawberry"?

Expected Answer:
There are 3 'r' letters in "strawberry".
What Actually Happens:
Some LLMs — especially smaller or improperly prompted ones — may incorrectly respond with 2, 4, or other numbers.
🧠 Why Does This Happen?
This illustrates a classic limitation of LLMs: they operate on tokens rather than individual characters, so character-level questions such as counting letters are harder for them than they look.
Here’s why errors can occur:
LLMs don’t "see" letters the same way humans do. They work on tokens, which are often chunks of words or characters — not single letters.
For smaller or less capable models, basic counting tasks aren’t trivial unless the prompt is extremely clear or structured.
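To make the token-versus-character gap concrete, here is a minimal sketch. The token split below is hypothetical and chosen only for illustration; a real tokenizer may split "strawberry" differently.

```python
# Hypothetical token split, for illustration only.
# A real tokenizer may produce different chunks.
tokens = ["str", "aw", "berry"]

# Roughly what the model works with: a short sequence of opaque chunks.
token_count = len(tokens)

# What the question is actually about: individual characters.
word = "".join(tokens)
letters = list(word)
r_count = word.count("r")

print(f"{token_count} tokens, {len(letters)} letters, {r_count} occurrences of 'r'")
```

The letter count is trivial once the word is split into characters, which is exactly the view the model does not get by default.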
🔍 Fixing It with Better Prompting
Try prompting like this instead:
“Count how many times the letter ‘r’ appears in the word ‘strawberry’. Show your steps.”
This encourages the model to use step-by-step reasoning, e.g.:
“The word 'strawberry' has the letters: s, t, r, a, w, b, e, r, r, y.
I can see the letter 'r' appears 3 times.”
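If you want to check a model's answer against ground truth, the same steps can be reproduced in a few lines of Python. This is a small sketch; the helper name `count_letter_steps` is made up for this example.

```python
def count_letter_steps(word, letter):
    """Spell the word out, then record the 1-based positions of `letter`."""
    positions = [i for i, ch in enumerate(word, start=1) if ch == letter]
    return list(word), positions

letters, positions = count_letter_steps("strawberry", "r")
print(", ".join(letters))
print(f"'r' appears {len(positions)} times, at positions {positions}")
```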

This small example — counting the 'r's in "strawberry" — highlights how Prompt Engineering plays a crucial role in accuracy and trust when working with LLMs.
🗣️ What is Prompting?
Prompting is how you interact with an LLM. It defines what kind of output you get, and how reliably.
Different models may require different prompting styles. Here's a breakdown of commonly used ones.
🧾 Prompting Styles
1. Alpaca Format (Used by Stanford Alpaca, a fine-tune of Meta's LLaMA)
This follows an instruction-input-response format. The model is expected to continue the text after "### Response:":
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
ref: https://github.com/tatsu-lab/stanford_alpaca
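A minimal sketch of filling in this template from code. The function name `build_alpaca_prompt` and the blank lines between sections are choices made for this example; check the linked repository for the exact template if you are fine-tuning.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction, input_text):
    """Fill the instruction and input slots; the model continues after '### Response:'."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)

prompt = build_alpaca_prompt(
    "Translate the sentence to French.",
    "I love programming.",
)
print(prompt)
```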
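2. ChatML (Used by OpenAI)
OpenAI's chat models take a conversation as a list of messages, each with a role (system, user, or assistant) and content. ChatML is the markup OpenAI introduced to serialize such messages for the model; through the API you pass the list as JSON: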
[
{"role": "system", "content": "You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible."},
{"role": "user", "content": "How are you?"},
{"role": "assistant", "content": "I am doing well."}
]
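This role-based message format is widely supported: many other providers and open-source tools accept the same structure, which makes it a de facto standard for interacting with LLMs.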
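For reference, ChatML itself is a plain-text serialization of such a message list, using `<|im_start|>` and `<|im_end|>` markers around each turn. The sketch below shows the idea; the helper `to_chatml` is written for this example, and hosted APIs do this step for you.

```python
def to_chatml(messages):
    """Render role/content messages as ChatML-style text."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # An open assistant turn at the end cues the model to write its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "Answer as concisely as possible."},
    {"role": "user", "content": "How are you?"},
]
chatml_text = to_chatml(messages)
print(chatml_text)
```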
3. INST Format
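Used by some instruction-tuned models, such as Llama 2 Chat and Mistral Instruct. The user's message is wrapped in [INST] tags and the model's reply follows the closing tag: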
[INST] What is the capital of France? [/INST]
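A rough sketch of building an INST-style prompt, including earlier turns. The helper `build_inst_prompt` is hypothetical, and it leaves out the begin/end-of-sequence tokens that real models expect; exact spacing and special tokens vary by model, so treat the model card as the source of truth.

```python
def build_inst_prompt(turns, new_user_message):
    """Wrap each user message in [INST] tags; assistant replies follow the closing tag."""
    parts = [
        f"[INST] {user_text} [/INST] {assistant_text}"
        for user_text, assistant_text in turns
    ]
    # The final user message is left open for the model to answer.
    parts.append(f"[INST] {new_user_message} [/INST]")
    return " ".join(parts)

single_turn = build_inst_prompt([], "What is the capital of France?")
multi_turn = build_inst_prompt(
    [("What is the capital of France?", "Paris.")],
    "And of Italy?",
)
print(single_turn)
print(multi_turn)
```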
Setting Up OpenAI SDK
To interact with OpenAI's models via code, follow these steps:
Install the SDK (plus python-dotenv, which the code below uses to load the .env file):
pip install openai python-dotenv
Secure Your API Key:
Store your key in a .env file like this:
OPENAI_API_KEY=<your_api_key>
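Load the key and send a first request:
from dotenv import load_dotenv
from openai import OpenAI

# Read OPENAI_API_KEY from the .env file into the environment.
load_dotenv()

# The client picks up OPENAI_API_KEY from the environment.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "user", "content": "Hey, my name is Vivek"},
        {"role": "assistant", "content": "Hey Vivek! How can I help you today?"},
        # End on a user turn so the model has something to answer.
        {"role": "user", "content": "What is my name?"},
    ],
)

# Print the text of the model's reply.
print(response.choices[0].message.content)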
Chatting with LLMs: Understanding Statelessness
LLMs are stateless — they don't remember previous messages unless you include them again in the next prompt.
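The sketch below illustrates statelessness without calling a real API. `fake_llm` is a hypothetical stand-in for a model call: like a real model, it can only use what is in the `messages` it receives on that request.

```python
def fake_llm(messages):
    """Hypothetical stand-in for a model: it only sees the messages passed in."""
    name = None
    marker = "my name is "
    for m in messages:
        text = m["content"]
        if m["role"] == "user" and marker in text.lower():
            start = text.lower().index(marker) + len(marker)
            name = text[start:].split()[0].strip(".!,")
    if "what is my name" in messages[-1]["content"].lower():
        return f"Your name is {name}." if name else "I don't know your name."
    return "Hello!"

# Turn 1: the user introduces themselves; we store both sides of the exchange.
history = [{"role": "user", "content": "Hey, my name is Vivek"}]
history.append({"role": "assistant", "content": fake_llm(history)})

# Turn 2: the answer depends on whether we resend the earlier messages.
question = {"role": "user", "content": "What is my name?"}
without_history = fake_llm([question])
with_history = fake_llm(history + [question])

print(without_history)
print(with_history)
```

The application, not the model, is responsible for keeping `history` and sending it again with each new request.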
Managing Context with Long Conversations
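Context window: every model has a maximum number of tokens it can process in one request. GPT-4.1 Mini, for example, supports up to about 1 million tokens.
Even with a large window, resending an entire long conversation (say, 400 past messages) on every turn is slow and expensive.
A common technique is the Sliding Window, combined with a summary of older messages:
Retain only the last 100 messages for context.
Summarize the previous 300 messages and include the summary as a single message.
Total: 101 messages per request, which cuts token cost and latency while keeping the gist of the earlier conversation.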
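The sliding window described above can be sketched as follows. `summarize` is a hypothetical placeholder; in practice it would typically be another LLM call that condenses the older messages.

```python
def summarize(messages):
    """Hypothetical placeholder for a real summarization step."""
    return f"Summary of {len(messages)} earlier messages."

def build_context(messages, keep_last=100):
    """Keep the most recent messages verbatim and fold the rest into one summary."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + recent

# A 400-message conversation, alternating user and assistant turns.
conversation = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(400)
]
context = build_context(conversation)
print(len(context))           # 101
print(context[0]["content"])  # Summary of 300 earlier messages.
```

Putting the summary in a system message is one option among several; some applications prepend it to the first user message instead.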
Using System Prompts to Control LLM Behavior
If you want to control the personality, tone, or behavior of an LLM, use a System Prompt.
Example use cases include:
Making the assistant formal/informal
Instructing it to behave like a tutor, programmer, or creative writer
Enforcing constraints on the type of responses it generates
Products and tools such as v0.dev, the Vercel AI SDK, and Cursor rely heavily on system prompts.
👉 Example prompt used in v0.dev’s open-source system prompt
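In the message format shown earlier, a system prompt is simply the first message with the role "system". A minimal sketch, where the helper `build_messages` and the prompt text are invented for this example:

```python
def build_messages(system_prompt, user_message):
    """Put the system prompt first so it frames every later turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

tutor_messages = build_messages(
    "You are a patient Python tutor. Explain concepts in simple terms "
    "and always end with one short practice exercise.",
    "What is a list comprehension?",
)
for message in tutor_messages:
    print(f"{message['role']}: {message['content']}")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create(...)`.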
Prompting Techniques
1. Zero-Shot Prompting
The model is given a task without any examples.
Often paired with a system prompt to ensure consistent behavior.
Example:
“Translate the following sentence to French: 'I love programming.'”
2. Few-Shot Prompting
A few examples are provided before the actual task.
Helps guide the model toward the desired format or style.
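One common way to do few-shot prompting with chat models is to supply the examples as earlier user/assistant turns. The sketch below assumes that approach; `build_few_shot_messages` and the example translations are illustrative, not part of any SDK.

```python
def build_few_shot_messages(examples, query, system_prompt=None):
    """Turn (input, output) example pairs into prior turns, then add the real query."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("I love programming.", "J'adore programmer."),
    ("Good morning.", "Bonjour."),
]
few_shot = build_few_shot_messages(
    examples,
    "See you tomorrow.",
    system_prompt="Translate each user message to French. Reply with the translation only.",
)
print(len(few_shot))  # 6: one system message, two example pairs, one query
```

Passing an empty `examples` list gives the zero-shot version of the same request.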
3. Chain-of-Thought (CoT) Prompting
Encourages the model to "think aloud" by breaking down its reasoning process.
Useful for math problems, logic, and multi-step reasoning.
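A small sketch of a chain-of-thought prompt plus a parser for the final answer. The "Answer:" convention, both helper names, and the sample reply are assumptions made for this example; the reply is hand-written, not real model output.

```python
def build_cot_prompt(question):
    """Ask for visible reasoning, with the final result on a marked line."""
    return (
        f"{question}\n"
        "Think step by step, then give the final result on its own line "
        "as 'Answer: <value>'."
    )

def extract_answer(reply):
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(reply.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None

prompt = build_cot_prompt("How many times does the letter 'r' appear in 'strawberry'?")

# Hand-written sample reply, used only to exercise the parser.
sample_reply = (
    "The letters are s, t, r, a, w, b, e, r, r, y.\n"
    "The letter 'r' appears at positions 3, 8 and 9.\n"
    "Answer: 3"
)
print(extract_answer(sample_reply))  # 3
```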
4. Self-Consistency Prompting
The same question is posed to the LLM multiple times.
Outputs are compared to find the most consistent or common response.
This can improve the accuracy of complex queries.
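The comparison step can be as simple as a majority vote. In the sketch below the sampled answers are hard-coded for illustration; in practice each would come from a separate model call, usually with a non-zero temperature so the responses differ.

```python
from collections import Counter

def most_consistent_answer(answers):
    """Return the most common answer and the share of samples that agree with it."""
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hard-coded stand-ins for five independent model responses.
sampled_answers = ["3", "3", "2", "3", "4"]
answer, agreement = most_consistent_answer(sampled_answers)
print(answer, agreement)  # 3 0.6
```

A low agreement score is itself useful: it signals that the question may need a clearer prompt or a more capable model.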
