Last month we generated 247 brand-consistent visuals across 15 client accounts. Total human design time: zero. No Figma. No Canva. No freelance designers in a Slack channel asking for feedback on revision 14. Just a Python script, the Gemini API, and a system we have been refining for months.
When I tell other agency owners this, they assume the output is garbage — generic AI art that makes every brand look the same. It is not. Every visual matches the specific client's color palette, typography style, and aesthetic identity. The visuals carry information, not decoration. They stop the scroll because they teach something, not because they are pretty.
This article is the complete system. The API setup, the style preamble architecture, the production pipeline with real Python code from our scripts, the visual types that perform on LinkedIn, the failure modes we have hit and how we fixed them, and an LLM prompt you can copy-paste to build your own visual system today. Everything we run in production, published in full.
Why AI Image Generation Changes B2B Marketing
The economics of visual content production have been broken for years. A skilled freelance designer charges $50 to $150 per custom visual, depending on complexity. Turnaround is 2 to 5 business days. Revisions add another 1 to 3 days per round. If you need 20 visuals per month for a single LinkedIn content program, you are looking at $1,000 to $3,000 per month in design costs and a constant back-and-forth that slows your entire content pipeline to the speed of your slowest designer.
AI image generation changes every variable in that equation. The cost per image with Gemini's API is effectively $0.01. Generation time is under 30 seconds. Brand consistency is not a function of whether your designer remembers the hex code for your accent color — it is baked into the prompt template that runs every single time. There are no revision cycles because regeneration is instant and free.
The gap is not in the technology. The gap is in knowing how to prompt for B2B. Most founders who try AI image generation end up with consumer-grade art — dreamy landscapes, abstract gradients, photorealistic people shaking hands in front of a whiteboard. That is not what stops the scroll on LinkedIn. What stops the scroll is information-dense, brand-specific visuals that teach something in under 3 seconds. Data comparisons. Framework diagrams. Process flows. Quote cards with a specific point of view. That requires a system, not a prompt.
Our data across 15 clients: 200+ visuals per month, 3 distinct brand systems running simultaneously, zero designer on staff. The visuals consistently outperform stock photography by 2 to 3x on LinkedIn engagement. The system pays for itself in the first week.
The Stack: Gemini 3 Pro Image Preview
We evaluated every major AI image generation model before settling on Gemini. Midjourney produces beautiful consumer art but has no API, no reference image support for brand consistency, and no reliable text rendering. DALL-E 3 has an API but struggles with consistent style adherence across batches and renders text poorly. Stable Diffusion requires GPU infrastructure and fine-tuning per client, which does not scale to 15 accounts.
Gemini 3 Pro Image Preview solves the three problems that matter for B2B visual production. First, native text rendering — it can render headlines, labels, and data points directly in the image with reasonable accuracy. Second, reference image support — you can pass a client's logo as an input image and Gemini will reproduce it in the output, which means brand consistency without Photoshop overlays. Third, consistent style adherence — when you inject a detailed style description into every prompt, Gemini maintains that aesthetic across hundreds of generations with minimal drift.
The model identifier is gemini-3-pro-image-preview. This is critical. A common mistake is to reach for gemini-2.0-flash-exp on the assumption that it supports image generation. It does not; that is a text model. The image generation model is specifically gemini-3-pro-image-preview, and you must set response_modalities=["IMAGE"] in the configuration.
Here is the core API call pattern from our production scripts:
```python
from pathlib import Path

from google import genai
from google.genai import types

client = genai.Client(api_key=GEMINI_API_KEY)
MODEL = "gemini-3-pro-image-preview"

# Load client logo as reference image
logo_bytes = Path("client-logo.png").read_bytes()

# Build the request with logo + prompt
contents = [
    types.Part.from_bytes(data=logo_bytes, mime_type="image/png"),
    types.Part.from_text(text=(
        "The image above is the client logo. Reproduce it "
        "faithfully in the bottom-right corner, small (~120px). "
        "Now generate this visual:\n\n"
        f"{STYLE_PREAMBLE}\n\n{visual_prompt}"
    )),
]

response = client.models.generate_content(
    model=MODEL,
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"]
    ),
)

# Extract and save image bytes directly (no PIL)
for part in response.candidates[0].content.parts:
    if hasattr(part, "inline_data") and part.inline_data:
        Path("output.png").write_bytes(part.inline_data.data)
```
Two things to note in that code. First, the logo is passed as a Part.from_bytes with its MIME type, not described in text. Describing a logo in words produces inconsistent results. Passing the actual image file produces faithful reproduction. Second, the output is saved as raw bytes directly to disk. We never use PIL (Pillow) for any post-processing. Every time we have tried post-processing — resizing, overlaying, compositing — it introduced artifacts or broke the visual. Save the raw bytes. Trust the model output.
The Style Preamble System
The style preamble is the single most important component in the entire system. It is a 150 to 250 word description that gets injected at the top of every visual generation prompt for a given client. It locks the visual identity so that every image Gemini produces looks like it came from the same designer, whether it is the first visual or the 200th.
Without a preamble, Gemini defaults to a generic aesthetic. It picks colors arbitrarily. It chooses typography that feels "AI-generated." The output looks like every other AI image on LinkedIn. With a strong preamble, Gemini produces output that is recognizably yours — distinctive enough that your audience starts to associate the visual style with your brand before they even read the caption.
Here is the actual style preamble we use for alphavant's own visuals:
ABSOLUTE STYLE RULES (apply to EVERY visual):
- Background: clean white (#FFFFFF)
- Text color: black (#000000) for headlines, dark gray (#444444) for body
- Accent color: coral (#FF6B6B) — used ONLY for highlights, underlines, numbered circles, key words
- Typography: modern geometric sans-serif (Clash Display style), MEDIUM weight (never heavy/bold)
- Aesthetic: ultra-clean, premium, airy — Linear / Vercel / Stripe vibes
- NO icons, NO clipart, NO emoji, NO stock imagery, NO gradients, NO shadows
- Generous whitespace — let the design breathe
- Hand-drawn style elements: sketchy arrows, organic underlines, scribble annotations where noted
- Fine lines (1-2px), delicate strokes for diagrams
- The alphavant logo (provided as reference image) should appear small in the bottom-center of each slide
Every effective preamble contains 7 elements. Miss any one of them and the output drifts.
- Background. Specify the exact hex code. "White" is ambiguous. #FFFFFF is not. If you use dark backgrounds, specify both light and dark variants and when each applies.
- Text colors. Primary color for headlines, secondary color for body text. Always hex codes. If you have text on dark backgrounds, specify the light-on-dark variants too.
- Accent color. One color, maximum two. This is the color that makes your visuals recognizable. Ours is coral #FF6B6B. Specify exactly where it should appear: highlights, underlines, numbered circles, key phrases. If you do not constrain it, Gemini will overuse it.
- Typography style. Gemini cannot load custom fonts, so describe the style: "modern geometric sans-serif" or "clean humanist sans-serif" or "condensed technical monospace." Include weight preference. Our preamble specifies "MEDIUM weight (never heavy/bold)" because Gemini defaults to heavy weights that feel aggressive.
- Aesthetic reference. Name 2 to 3 brands whose visual style resembles your target. "Linear / Vercel / Stripe vibes" gives Gemini a concrete reference point that is more useful than abstract descriptions like "modern and clean." Your audience does not need to know these references — Gemini does.
- What to avoid (negative constraints). This is as important as the positive rules. "NO icons, NO clipart, NO emoji, NO stock imagery, NO gradients, NO shadows" prevents the most common AI generation failure modes. Without explicit negatives, Gemini will add decorative elements that make the output look generic.
- Logo placement. Where, how big, which version. "Small in the bottom-center" or "bottom-right corner, roughly 5-8% of image width." If you have light and dark logo variants, specify which one to use on which background.
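To make the seven elements concrete, here is a minimal sketch of assembling a preamble from structured brand data. The helper name and dict keys are hypothetical, not from our production scripts; adapt the wording per client.

```python
def build_style_preamble(brand: dict) -> str:
    """Assemble a style preamble string from the 7 brand elements.

    `brand` is a hypothetical dict with one key per element described
    above; each value is already written as prompt-ready prose.
    """
    lines = [
        "ABSOLUTE STYLE RULES (apply to EVERY visual):",
        f"- Background: {brand['background']}",
        f"- Text colors: {brand['text_colors']}",
        f"- Accent color: {brand['accent']}",
        f"- Typography: {brand['typography']}",
        f"- Aesthetic: {brand['aesthetic']}",
        f"- {brand['negatives']}",
        f"- Logo: {brand['logo_placement']}",
    ]
    return "\n".join(lines)

preamble = build_style_preamble({
    "background": "clean white (#FFFFFF)",
    "text_colors": "black (#000000) headlines, dark gray (#444444) body",
    "accent": "coral (#FF6B6B), ONLY for highlights and underlines",
    "typography": "modern geometric sans-serif, MEDIUM weight",
    "aesthetic": "ultra-clean, premium, airy",
    "negatives": "NO icons, NO clipart, NO emoji, NO gradients, NO shadows",
    "logo_placement": "small, bottom-center (reference image provided)",
})
```

Keeping the elements structured like this makes it harder to silently drop one of the seven when drafting a new client's preamble.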
Creating a Preamble for a New Client
The process takes 1 to 2 hours. We analyze the client's existing brand assets — their website, pitch deck, LinkedIn posts, any brand guidelines they have. We extract the rules into the 7-element framework above. Then we test the preamble by generating 5 sample visuals and evaluating whether they "feel" like the client's brand. We refine based on those test outputs. After the first 2 to 3 weeks of production, the preamble stabilizes and rarely needs updates.
The critical insight: every preamble we create draws from patterns we have already validated across other clients. A robotics company that wants a "technical, precise" aesthetic gets a preamble derived from the template we refined for 3 previous hardware companies. A SaaS founder who wants "minimal, modern" gets a variant of the template we validated across 5 software clients. The pattern library compounds. Client 15 takes a fraction of the time that client 1 took because the preamble templates are already proven.
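One way to encode that pattern library is a small set of validated base templates with per-client placeholders. This is an illustrative sketch; the template names and placeholder fields are hypothetical.

```python
# Hypothetical base templates, one per validated aesthetic family
PREAMBLE_TEMPLATES = {
    "technical-precise": (
        "ABSOLUTE STYLE RULES:\n"
        "- Background: {background}\n"
        "- Accent color: {accent} for callouts and measurements only\n"
        "- Typography: condensed technical sans-serif, MEDIUM weight\n"
        "- Aesthetic: engineering-grade, blueprint-like, precise\n"
        "- NO icons, NO clipart, NO gradients, NO shadows"
    ),
    "minimal-modern": (
        "ABSOLUTE STYLE RULES:\n"
        "- Background: {background}\n"
        "- Accent color: {accent} for highlights and underlines only\n"
        "- Typography: modern geometric sans-serif, MEDIUM weight\n"
        "- Aesthetic: ultra-clean, airy (Linear / Vercel / Stripe vibes)\n"
        "- NO icons, NO clipart, NO gradients, NO shadows"
    ),
}

def preamble_for_client(template: str, background: str, accent: str) -> str:
    """Instantiate a validated base template with a client's brand colors."""
    return PREAMBLE_TEMPLATES[template].format(
        background=background, accent=accent
    )

p = preamble_for_client("minimal-modern", "white (#FFFFFF)", "coral (#FF6B6B)")
```

New clients then start from a template that has already survived production, and the 1 to 2 hours go into tuning, not drafting from scratch.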
The Production Pipeline
A style preamble gives you consistency. A production pipeline gives you scale. Here is the exact sequence we run to go from content brief to finished visuals, with the real code structure from our scripts.
Step 1: Brief to Structured Data
Every visual starts as a structured definition — a JSON-like dictionary with the image name, the full prompt (preamble plus content-specific instructions), and the output dimensions. We define these in a Python list:
```python
VISUALS = [
    {
        "name": "data-comparison-deploy-time",
        "prompt": f"""
{STYLE_PREAMBLE}

Create a clean data comparison infographic.
Two columns with a thin vertical separator:
- Left: "Traditional Deployment" — "6 months" in large text
- Right: "With Our System" — "3 weeks" in coral accent
- Bottom summary: "12x faster time to production"

Aspect ratio: 4:5 (1080x1350px)
""",
    },
    {
        "name": "framework-evaluation-criteria",
        "prompt": f"""
{STYLE_PREAMBLE}

Create a framework diagram: "The 4-Gate Evaluation Framework"
2x2 matrix layout:
- Top-left: "Technical Fit" — "Does it solve the core problem?"
- Top-right: "Integration Cost" — "What does adoption actually cost?"
- Bottom-left: "Team Readiness" — "Can your team operate it?"
- Bottom-right: "Timeline Risk" — "Will it ship on schedule?"

Aspect ratio: 4:5 (1080x1350px)
""",
    },
    # ... more visual definitions
]
```
The structured format matters. It means the content team can define visual needs without writing prompts. They fill in a brief (what the visual should show, what data it contains, what type it is), and the pipeline handles the prompt engineering by injecting the client's preamble and applying the appropriate template structure.
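A sketch of that hand-off, assuming a hypothetical brief schema (the `name`, `visual_type`, `headline`, and `points` field names are illustrative, not our production schema):

```python
def brief_to_prompt(brief: dict, style_preamble: str) -> dict:
    """Turn a content-team brief into a full visual definition.

    The content team only fills in the brief dict; preamble injection
    and template boilerplate happen here, in one place.
    """
    headline = brief["headline"]
    body_lines = "\n".join(f"- {point}" for point in brief["points"])
    prompt = (
        f"{style_preamble}\n\n"
        f"Create a clean {brief['visual_type']}.\n"
        f'Headline: "{headline}"\n'
        f"{body_lines}\n\n"
        "Aspect ratio: 4:5 (1080x1350px)"
    )
    return {"name": brief["name"], "prompt": prompt}

visual = brief_to_prompt(
    {
        "name": "data-comparison-deploy-time",
        "visual_type": "data comparison infographic",
        "headline": "Deployment time, before and after",
        "points": ["Traditional: 6 months", "With our system: 3 weeks"],
    },
    style_preamble="ABSOLUTE STYLE RULES: ...",  # client preamble goes here
)
```

The returned dict slots straight into the VISUALS list, so the prompt engineering lives in one function instead of being copied into every brief.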
Step 2: Parallel Generation with ThreadPoolExecutor
Generating visuals sequentially is slow. A single Gemini image generation call takes 15 to 30 seconds. If you need 25 visuals across 5 blog posts, sequential generation takes 6 to 12 minutes. Parallel generation with 3 workers cuts that to 2 to 4 minutes.
```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_visual(visual):
    """Generate a single visual with retry logic."""
    name = visual["name"]
    prompt = visual["prompt"]
    output_path = OUTPUT_DIR / f"{name}.png"

    for attempt in range(3):
        try:
            response = client.models.generate_content(
                model=MODEL,
                contents=[
                    types.Content(
                        parts=[
                            types.Part.from_bytes(
                                data=logo_bytes,
                                mime_type="image/png",
                            ),
                            types.Part.from_text(text=prompt),
                        ]
                    )
                ],
                config=types.GenerateContentConfig(
                    response_modalities=["IMAGE"],
                ),
            )
            for part in response.candidates[0].content.parts:
                if (part.inline_data and
                        part.inline_data.mime_type.startswith("image/")):
                    output_path.write_bytes(part.inline_data.data)
                    return True, name
        except Exception:
            # Backoff grows with each attempt: 5s, then 10s
            time.sleep(5 * (attempt + 1))
    return False, name

# Run 3 workers in parallel
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(generate_visual, v): v["name"]
        for v in VISUALS
    }
    for future in as_completed(futures):
        ok, name = future.result()
        status = "OK" if ok else "FAIL"
        print(f"  {status}: {name}")
```
Three workers is the sweet spot. More than three and you start hitting Gemini's rate limits, which causes retries and actually slows the pipeline down. Fewer than three and you are leaving speed on the table.
Step 3: Logo as Reference Image
Every generation call includes the client's logo as the first content part. This is not optional. Without the reference image, Gemini will either omit the logo entirely or hallucinate something that vaguely resembles a logo but is not the client's actual mark. With the reference image, Gemini reproduces the logo faithfully in the specified location.
The instruction is explicit: "The image above is the client logo. Reproduce it faithfully in the bottom-right corner, small (~120px)." We found that vague instructions like "include the logo" produce inconsistent results. Specific size and placement instructions produce consistent results.
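We keep that instruction in one place so every call uses identical wording. A minimal sketch (the helper name and parameters are ours, hypothetical):

```python
def logo_instruction(corner: str = "bottom-right", size_px: int = 120) -> str:
    """Build the explicit logo placement instruction prepended to every prompt."""
    return (
        "The image above is the client logo. Reproduce it "
        f"faithfully in the {corner} corner, small (~{size_px}px). "
        "Now generate this visual:\n\n"
    )

text = logo_instruction()
```

Centralizing the string means a placement change for one client is a one-line edit, not a hunt through every prompt template.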
Step 4: Raw Bytes to Disk
The output handling is deliberately simple. We extract the image bytes from the API response and write them directly to a PNG file. No PIL. No post-processing. No resizing. No compositing.
We learned this the hard way. Early in our system development, we used PIL to overlay logos, adjust colors, and resize outputs. Every post-processing step introduced artifacts — color shifts from RGB/RGBA conversion, compression artifacts from re-encoding, aliasing from resizing. The visual quality degraded with every transformation. The solution was to eliminate all post-processing and let Gemini handle everything in a single generation pass. The output is the final asset.
Step 5: Retry Logic and Safety Filter Handling
Gemini's image generation is not deterministic. The same prompt can produce a perfect visual on the first attempt and fail on the second. Safety filters occasionally block prompts that contain words like "crash," "failure," or "risk" even in a B2B context. Our retry logic handles both cases:
- 3 attempts per image. If the first generation returns no image data (which happens roughly 5% of the time), we wait 5 seconds and try again. If the second fails, we wait 10 seconds. If all three fail, we log it and move on.
- Exponential backoff. The time.sleep(5 * (attempt + 1)) pattern prevents us from hammering the API when it is rate-limiting or experiencing transient issues.
- Safety filter handling. If a prompt gets blocked, we do not retry the same prompt. We adjust the language — replacing "crash" with "incident," "failure" with "challenge," "risk" with "consideration" — and resubmit. This happens manually because automated prompt rewriting can change the meaning.
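Because the rewrite stays manual, the only automation worth having is a flagger that points out which words likely triggered the block and suggests the substitutions we usually reach for. A sketch, using only the three substitutions mentioned above:

```python
SAFE_SUBSTITUTIONS = {
    "crash": "incident",
    "failure": "challenge",
    "risk": "consideration",
}

def flag_blocked_terms(prompt: str) -> list[tuple[str, str]]:
    """Return (term, suggested replacement) pairs found in a blocked prompt.

    Suggestions only: the actual rewrite stays manual, because an
    automated swap can change the meaning of the visual.
    """
    lowered = prompt.lower()
    return [
        (term, replacement)
        for term, replacement in SAFE_SUBSTITUTIONS.items()
        if term in lowered
    ]

flags = flag_blocked_terms("Show the cost of a deployment failure")
# flags == [("failure", "challenge")]
```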
Across our production runs, the first-attempt success rate is approximately 85%. With retry logic, the overall success rate is 97%. The remaining 3% are typically safety filter blocks that require prompt adjustment.
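A sanity check on those numbers: if every failure were transient, three independent attempts at an 85% per-attempt success rate would succeed about 99.7% of the time. The observed 97% sits below that ceiling precisely because safety filter blocks are deterministic, and retrying the same prompt cannot fix them.

```python
p_first = 0.85
# Success ceiling if all failures were transient and attempts independent
p_three_independent = 1 - (1 - p_first) ** 3  # about 0.9966
# Observed overall rate is ~0.97, so roughly 3% of prompts fail
# deterministically (safety blocks) regardless of retries.
```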