Soft Prompting: Trainable Embeddings as Prompts
Replace text prompts with learned continuous vectors. Understand prompt tuning, prefix tuning, and p-tuning for open models where fine-tuning is impractical.
Hard Prompts vs Soft Prompts
| | Hard Prompt (Text) | Soft Prompt (Embeddings) |
|---|---|---|
| What it is | Human-written text instructions | Learned continuous vectors |
| How it's created | Written and iterated by humans | Trained via backpropagation |
| Interpretable? | Yes — you can read it | No — vector salad |
| Requires training? | No | Yes — needs labeled data |
| Model modification? | No — inference only | No — model weights frozen |
| Provider support | All providers | Open models only (Llama, Mistral) |
Soft prompting (Lester et al. 2021) trades interpretability for efficiency: train a tiny set of prompt embeddings while keeping the model frozen. Useful when you have labeled data but fine-tuning a billion-parameter model is impractical.
How It Works
Instead of prepending text, prepend learnable embedding vectors:
Hard prompt:
"Classify the sentiment: [input text]"
Soft prompt:
[vect_1] [vect_2] [vect_3] ... [vect_N] [input embedding]
↑
Trained via backprop, model frozen
During training, only the prompt vectors are updated. The model processes the prompt vectors concatenated in front of the input embeddings. The loss backpropagates through the frozen model, but the optimizer updates only the prompt vectors.
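To make the training step concrete, here is a minimal PyTorch sketch. It assumes a toy stand-in for the frozen model (a small embedding table, a GRU, and a linear head with made-up sizes), not a real LLM, and every name in it is illustrative. It shows only the mechanics: concatenate the prompt in front of the input embeddings, backpropagate through frozen layers, and step an optimizer that holds only the prompt.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a frozen language model (hypothetical sizes, not a real LLM).
vocab_size, hidden_dim, num_classes = 50, 16, 2
embed = nn.Embedding(vocab_size, hidden_dim)
body = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, num_classes)
for module in (embed, body, head):
    for param in module.parameters():
        param.requires_grad_(False)  # freeze every model weight

# The soft prompt is the only trainable tensor: N vectors of size hidden_dim.
num_virtual_tokens = 4
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_dim) * 0.1)

def forward(input_ids: torch.Tensor) -> torch.Tensor:
    token_embeds = embed(input_ids)  # (batch, seq_len, hidden_dim)
    prompt = soft_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
    full_input = torch.cat([prompt, token_embeds], dim=1)  # prompt goes first
    _, last_hidden = body(full_input)  # last_hidden: (1, batch, hidden_dim)
    return head(last_hidden.squeeze(0))  # (batch, num_classes)

# Random toy batch standing in for labeled task data.
input_ids = torch.randint(0, vocab_size, (8, 6))
labels = torch.randint(0, num_classes, (8,))
optimizer = torch.optim.Adam([soft_prompt], lr=0.1)

prompt_before = soft_prompt.detach().clone()
head_before = head.weight.detach().clone()

loss = nn.functional.cross_entropy(forward(input_ids), labels)
loss.backward()   # gradients flow through the frozen layers to soft_prompt
optimizer.step()  # only soft_prompt is in the optimizer, so only it changes
```

After the step, `soft_prompt` differs from `prompt_before` while `head.weight` still equals `head_before`, which is the whole point of the method.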
Variants
| Method | What It Tunes | Where It Goes | Key Paper |
|---|---|---|---|
| Prompt Tuning | Virtual token embeddings at the input layer only | Prepended to the input embeddings | Lester et al. 2021 |
| Prefix Tuning | Prefix key/value vectors at every transformer layer | Prepended to keys/values at each layer | Li & Liang 2021 |
| P-Tuning v2 | Deep prompt tokens at every layer | Continuous prompts throughout model depth | Liu et al. 2022 |
| LoRA | Low-rank adapter matrices (not technically soft prompting) | Injected into attention layers | Hu et al. 2022 |
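The "prepended to keys/values" entry for prefix tuning is easier to see in code. The following is a rough sketch of a single attention layer, assuming one head, no causal mask, and random frozen projection matrices; all sizes and names are hypothetical. Unlike prompt tuning, the sequence length of the output does not grow, because the prefix only adds extra positions to attend to.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, d_model, n_prefix = 2, 5, 16, 3

# Frozen projections of one toy attention layer (single head, no mask).
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))

# Trainable prefix for this layer: extra key and value vectors.
prefix_k = torch.randn(n_prefix, d_model, requires_grad=True)
prefix_v = torch.randn(n_prefix, d_model, requires_grad=True)

x = torch.randn(batch, seq_len, d_model)  # hidden states entering the layer
q, k, v = x @ W_q, x @ W_k, x @ W_v

# Prepend the learned prefix to keys and values only; queries are untouched.
k = torch.cat([prefix_k.expand(batch, -1, -1), k], dim=1)  # (batch, n_prefix + seq_len, d_model)
v = torch.cat([prefix_v.expand(batch, -1, -1), v], dim=1)

attn = F.softmax(q @ k.transpose(1, 2) / d_model ** 0.5, dim=-1)  # (batch, seq_len, n_prefix + seq_len)
out = attn @ v  # (batch, seq_len, d_model): same length as the input

out.sum().backward()  # gradients reach only prefix_k and prefix_v
```

In a real prefix-tuned model each transformer layer would have its own `prefix_k` and `prefix_v`, which is why the method trains more parameters than input-only prompt tuning.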
Parameter Efficiency
A soft prompt is tiny compared to the model:
| Component | Parameters (Llama 2 7B) |
|---|---|
| Full model | 7 billion |
| Full fine-tuning | 7 billion (all updated) |
| LoRA | ~8 million (varies with rank and target modules) |
| Soft prompt (100 tokens) | ~409,600 |
| Soft prompt (20 tokens) | ~81,920 |
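You can often train a soft prompt on a single GPU in minutes, versus days for full fine-tuning, because no gradients or optimizer state are kept for the base weights. Each training step still runs a forward and backward pass through the full frozen model.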
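The counts in the table follow from simple arithmetic on Llama 2 7B's hidden size (4096) and layer count (32). The sketch below reproduces them. The LoRA figure assumes rank 16 applied to two 4096x4096 projection matrices per layer, which is one configuration that lands near 8 million and is not stated in the table. The prefix-tuning count is an extra comparison that ignores any reparameterization network.

```python
# Llama 2 7B architecture constants.
HIDDEN_DIM = 4096
NUM_LAYERS = 32

def prompt_tuning_params(num_virtual_tokens: int) -> int:
    # One hidden_dim-sized vector per virtual token, at the input layer only.
    return num_virtual_tokens * HIDDEN_DIM

def prefix_tuning_params(num_virtual_tokens: int) -> int:
    # One key vector and one value vector per virtual token, per layer.
    return 2 * NUM_LAYERS * num_virtual_tokens * HIDDEN_DIM

def lora_params(rank: int, matrices_per_layer: int) -> int:
    # Each adapted d x d matrix gets A (rank x d) and B (d x rank).
    return NUM_LAYERS * matrices_per_layer * 2 * rank * HIDDEN_DIM

print(prompt_tuning_params(20))                     # 81920
print(prompt_tuning_params(100))                    # 409600
print(lora_params(rank=16, matrices_per_layer=2))   # 8388608 (assumed config)
print(prefix_tuning_params(20))                     # 5242880
```

At 4 bytes per float32 value, the 20-token prompt occupies about 320 KiB, which is why a saved soft prompt is so small next to the base checkpoint.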
When Soft Prompting Makes Sense
Use soft prompting when:
- You have labeled task data (100-1000+ examples)
- You need to run the same task repeatedly (classification, extraction at scale)
- You're using open-source models (Llama, Mistral, Qwen) where you control inference
- You want to avoid modifying model weights (the base model stays frozen, so training cannot overwrite its existing capabilities the way fine-tuning can)
Don't use soft prompting when:
- You're using OpenAI, Anthropic, or Google APIs (they don't expose embedding injection)
- You have no training data (soft prompts must be trained)
- The task changes frequently (retraining overhead defeats the purpose)
- You need interpretable prompts (soft prompts are opaque vectors)
Implementation with HuggingFace PEFT
```python
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Define soft prompt config
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,  # 20 learnable prompt tokens
    prompt_tuning_init="TEXT",  # Initialize from the embeddings of the text below
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path="meta-llama/Llama-2-7b-hf",
)

# Wrap the base model: its weights are frozen, only the soft prompt is trainable
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# Expect 81,920 trainable params (20 tokens x 4096 hidden dim), about 0.0012% of the total

# Train normally. `dataset` is not defined here: supply your own tokenized
# dataset with input_ids, attention_mask, and labels for your task.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./soft-prompt", num_train_epochs=10),
    train_dataset=dataset,
)
trainer.train()

# Save just the soft prompt (a tiny file compared to the base model)
model.save_pretrained("./my-soft-prompt")
```
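The training snippet relies on a `dataset` that is never built. One common way to prepare examples for a causal LM is to concatenate the input text and the target label, then mask the input positions in `labels` so the loss is computed only on the label tokens. The helper below is a hypothetical sketch of that idea; `build_example` and its `encode` argument are illustrative names, and the character-level encoder exists only so the example runs without downloading a tokenizer.

```python
from typing import Callable, Dict, List

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def build_example(
    encode: Callable[[str], List[int]],
    text: str,
    label: str,
    eos_id: int,
) -> Dict[str, List[int]]:
    """Turn one (text, label) pair into causal-LM training features."""
    prompt_ids = encode(text)
    target_ids = encode(label) + [eos_id]  # end-of-sequence teaches the model to stop
    input_ids = prompt_ids + target_ids
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        # Loss is computed only on the label tokens, not on the input text.
        "labels": [IGNORE_INDEX] * len(prompt_ids) + target_ids,
    }

# Toy character-level encoder so the sketch runs standalone. With a Hugging Face
# tokenizer you might instead pass something like:
#   lambda s: tokenizer(s, add_special_tokens=False)["input_ids"]
toy_encode = lambda s: [ord(c) for c in s]

example = build_example(toy_encode, "Great product! Sentiment:", " positive", eos_id=0)
print(len(example["input_ids"]), example["labels"][-3:])
```

Mapping a function like this over your labeled examples, then padding per batch, would give the kind of dataset the `Trainer` call expects. Whether to add a beginning-of-sequence token depends on the model and is left out here.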
Loading and Using a Trained Soft Prompt
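```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the saved soft prompt to the unmodified base model
model = PeftModel.from_pretrained(base_model, "./my-soft-prompt")

# Inference: the soft prompt is automatically prepended
inputs = tokenizer("This product exceeded my expectations.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # Should output sentiment classification
```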
Limitations
No API support. OpenAI, Anthropic, and Google do not expose model internals for embedding injection. Soft prompting is only viable with self-hosted open models.
Not interpretable. You can't read a soft prompt to understand what it learned. The tradeoff for efficiency is opacity.
Task-specific. Each soft prompt is trained for one task. Changing tasks means training a new one. You can swap prompt files quickly, but you can't generalize across tasks.
Requires training data. Soft prompts need labeled examples. If you have zero training data, stick with hard prompt engineering.
Training instability. Small prompt sizes can be sensitive to initialization and hyperparameters. Start with prompt_tuning_init="TEXT" for stable initialization.
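Text initialization helps because the virtual tokens start from embeddings the model already understands instead of random noise. The sketch below illustrates the idea with a toy embedding table and made-up token ids. It assumes that when the init text has fewer tokens than the prompt, its ids are repeated and then truncated to fill `num_virtual_tokens`. Treat it as an illustration of the concept, not as PEFT's actual code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(100, 8)  # toy frozen embedding table (hypothetical sizes)
init_token_ids = [5, 17, 42]      # hypothetical token ids of the init text
num_virtual_tokens = 8

# Repeat the init ids until there are enough, then cut to the prompt length.
repeats = -(-num_virtual_tokens // len(init_token_ids))  # ceiling division
ids = (init_token_ids * repeats)[:num_virtual_tokens]

# Copy the corresponding embeddings; the copy becomes the trainable soft prompt.
with torch.no_grad():
    init_vectors = embedding(torch.tensor(ids)).clone()
soft_prompt = nn.Parameter(init_vectors)

print(ids)                # [5, 17, 42, 5, 17, 42, 5, 17]
print(soft_prompt.shape)  # torch.Size([8, 8])
```

From this starting point training moves the vectors away from the original token embeddings, so the learned prompt generally no longer corresponds to readable text.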