Multimodal Prompting
Learn to prompt AI models with text, images, audio, and video. Combine modalities for richer interactions and better results.
Multimodal Prompting
Multimodal prompting combines text with images, audio, or video to give AI models richer context. Modern models like GPT-4o, Claude 3.5, and Gemini can process multiple input types simultaneously, enabling more natural and capable interactions.
Text + Image Prompting
Image Analysis
[Attach image]
What objects are in this image? List them with their approximate positions.
Image Comparison
[Attach image 1]
[Attach image 2]
Compare these two designs. Identify:
1. Key differences
2. Which follows better UX principles
3. Specific improvements for each
Code from Screenshot
[Attach screenshot of code or UI]
Convert this to working code. Include:
- Exact layout structure
- All text content
- Styling details
Text + Audio Prompting
Transcription + Analysis
[Attach audio file]
1. Transcribe the audio
2. Identify key points discussed
3. Extract action items with owners
4. Note any decisions made
Voice Instructions
[Attach voice memo]
Based on these voice notes:
1. Create a structured outline
2. Fill in missing details where unclear
3. Suggest additional points to consider
Best Practices
Image Prompting
- Be specific about what you want analyzed
- Reference specific parts of the image when needed
- Provide context for ambiguous images
- Use high-quality, clear images
Audio Prompting
- Specify if you need verbatim or summary
- Note the language if not English
- Indicate speaker identification needs
- Mention background noise handling
Modality Combinations
| Combination | Use Cases |
|---|---|
| Text + Image | Design review, code conversion, visual Q&A |
| Text + Audio | Meeting notes, voice memos, transcription |
| Text + Video | Content analysis, tutorial creation |
| Image + Text + Audio | Comprehensive documentation |
Prompt Templates
Image Description:
Describe this image in detail, covering:
- Main subjects and their attributes
- Setting and background
- Colors, lighting, and mood
- Any text visible in the image
Visual Comparison:
Compare these two images focusing on:
1. Structural differences
2. Color and style variations
3. Quality and clarity
4. Which better achieves [stated goal]
Audio Summary:
From this audio recording:
1. Provide a 3-sentence summary
2. List key topics discussed
3. Extract direct quotes for important points
4. Identify any unresolved questions
Related Articles
Nano Banana Prompting Guide: Craft Effective Prompts
Master the art of prompting Nano Banana. Learn prompt structure, use case patterns, and techniques for professional results.
Character Design with Nano Banana: Generation Guide
Create original characters, mascots, and concept art from scratch using Nano Banana's text-to-image generation capabilities.
Mastering Artifact Creation in Midjourney: Mystical Objects, Relics & Ancient Treasures
Create stunning mystical and historical artifacts with Midjourney using advanced prompts, material techniques, and magical effects. Explore ancient relics, sacred objects, enchanted items, and legendary treasures.