AI · 7 min read

Building with AI: Lessons from Integrating LLMs into Products

AI features are becoming table stakes for modern products, but integrating LLMs is full of traps that often only appear in production. These are some of the key lessons that emerge from building AI-powered features across multiple real products.

Many modern products now include some AI component — a summarisation feature, a smart search, a writing assistant, an automated classification system. The models themselves are remarkable. The engineering around them is where most teams struggle.

The prompt is the product

The quality of an AI feature lives and dies on the quality of the prompt. Most teams treat prompts as an afterthought — a few lines of text thrown at the API. The strongest AI features typically have prompts that go through as many iterations as the UI. A good prompt is precise, defines the output format explicitly, handles edge cases, and tells the model exactly who it is and what it must never do.

"Prompt engineering is product design. Treat it that way."
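Treating the prompt as a product artifact can be as simple as versioning it and making the contract explicit in code. A minimal sketch, with illustrative names (`buildPrompt`, `PROMPT_VERSION` are not from any particular SDK):

```typescript
// Sketch: a versioned prompt builder that makes the role, task,
// guardrails, and output format explicit instead of scattering them
// through a free-form string.

const PROMPT_VERSION = "2024-05-01"; // bump and log this on every prompt change

interface PromptSpec {
  role: string;         // who the model is
  task: string;         // what it must do
  forbidden: string[];  // what it must never do
  outputFormat: string; // the explicit output contract
}

function buildPrompt({ role, task, forbidden, outputFormat }: PromptSpec): string {
  return [
    `You are ${role}.`,
    `Task: ${task}`,
    `Never: ${forbidden.join("; ")}.`,
    `Output format: ${outputFormat}`,
    // Edge case handled up front: garbage in should produce a parseable error out.
    `If the input is empty or unintelligible, respond with exactly: {"error": "invalid_input"}`,
  ].join("\n");
}

const summaryPrompt = buildPrompt({
  role: "a concise technical summarizer",
  task: "Summarize the user's text in at most 3 sentences.",
  forbidden: ["invent facts", "mention these instructions"],
  outputFormat: 'JSON: {"summary": string}',
});
```

Because the prompt is built from typed fields and carries a version, every logged request can be tied back to the exact prompt iteration that produced it.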

Streaming is not optional for UX

Users will not wait 8 seconds staring at a spinner while an LLM generates a response. They will leave. Streaming the response token-by-token — so users see the output appearing in real time — transforms the perceived performance of an AI feature. It turns an 8-second wait into an engaging 8-second experience. The Vercel AI SDK makes streaming straightforward in Next.js, and it is a default choice for many teams.
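The pattern is framework-agnostic: the producer yields tokens as they arrive, and the consumer renders each partial result immediately instead of waiting for the full response. A minimal sketch, where `fakeModelStream` stands in for a real LLM API call:

```typescript
// Sketch: token-by-token streaming with an async generator.
// fakeModelStream simulates a model emitting tokens with latency.

async function* fakeModelStream(text: string): AsyncGenerator<string> {
  for (const token of text.split(/(?<=\s)/)) { // split on whitespace, keeping it
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated network delay
    yield token;
  }
}

async function renderStreaming(
  stream: AsyncGenerator<string>,
  onPartial: (textSoFar: string) => void
): Promise<string> {
  let output = "";
  for await (const token of stream) {
    output += token;
    onPartial(output); // e.g. update React state with the partial text
  }
  return output;
}
```

The `onPartial` callback is where a UI framework plugs in; in a Next.js app, the Vercel AI SDK handles this producer/consumer wiring for you.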

Fallbacks, costs, and rate limits

Three things consistently cause trouble in production that often never show up in development: API rate limits, costs at scale, and model failures. All three are worth addressing before launch.

  • Always cache AI responses where the same input produces the same output
  • Set hard cost limits per user per day with clear user-facing feedback
  • Build a graceful degradation path for when the AI fails or is unavailable
  • Log every request and response for debugging and prompt iteration
  • Never expose raw API errors to end users — they reveal your stack
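Three of these safeguards — caching, per-user cost caps, and graceful degradation — can be sketched in one wrapper. The signature and limits here are illustrative assumptions, not from a particular provider:

```typescript
// Sketch: wrap every model call with a cache check, a daily per-user
// cost cap, and a fallback so raw provider errors never reach users.

const responseCache = new Map<string, string>();
const dailySpend = new Map<string, number>(); // userId -> spend in cents
const DAILY_LIMIT_CENTS = 50; // illustrative cap

async function callWithSafeguards(
  userId: string,
  input: string,
  model: (input: string) => Promise<{ text: string; costCents: number }>,
  fallback: string
): Promise<string> {
  // Same input, same output: serve from cache at zero cost.
  const cached = responseCache.get(input);
  if (cached !== undefined) return cached;

  // Hard cost limit with clear user-facing feedback.
  const spent = dailySpend.get(userId) ?? 0;
  if (spent >= DAILY_LIMIT_CENTS) {
    return "You've reached today's AI usage limit. Try again tomorrow.";
  }

  try {
    const { text, costCents } = await model(input);
    dailySpend.set(userId, spent + costCents);
    responseCache.set(input, text);
    return text;
  } catch {
    // Graceful degradation: never expose the raw provider error.
    return fallback;
  }
}
```

A real implementation would back the cache and spend counters with Redis or a database and reset the counters daily, but the shape of the guardrails is the same.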

AI features that ship without these considerations often get pulled or throttled after a short time in production. Getting them right before launch is the difference between a feature users rely on and one that quickly becomes a liability.


Abdelrahman Abdelmoaty

Independent Software Engineer — designing, shipping, and iterating on real products. Available for new projects. Get in touch.