Codesota · Building blocks · The composable operations of a pipeline · Issue: April 22, 2026
Editorial · Building blocks

The composable
operations of AI.

Every AI pipeline reduces to a sequence of typed transformations: something goes in, something else comes out. This index catalogues those transformations by input modality, lists the implementations that currently matter, and links each block to its registry page.

Start from what you have — image, text, audio, video, a document — and follow the arrows to what you need.

§ From text

Text, in and out.

§ From image

Image, in and out.

§ From audio

Audio, in and out.

§ From video

Video, in and out.

§ From document

Document, in and out.

§ Common pipelines

Blocks, composed.

Frequently-assembled chains. Each is a reading of several blocks as one operation; the individual blocks remain linked for substitution.
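One way to picture "several blocks as one operation" is a small typed wrapper whose output type must match the next block's input type. The sketch below is illustrative only: `Block`, `then`, and the three `fake_*` functions are hypothetical names standing in for real speech, language, and synthesis models, and nothing here reflects how the registry itself is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Block(Generic[A, B]):
    """A named transformation from an input of type A to an output of type B."""
    name: str
    run: Callable[[A], B]

    def then(self, nxt: "Block[B, C]") -> "Block[A, C]":
        # Compose two blocks into one; the chain is itself a Block.
        return Block(f"{self.name} -> {nxt.name}", lambda x: nxt.run(self.run(x)))


# Placeholder implementations (assumptions): real blocks would call models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


audio_to_text = Block("Audio to Text", fake_transcribe)
text_to_text = Block("Text to Text", fake_reply)
text_to_audio = Block("Text to Audio", fake_synthesize)

# The chain reads as a single Audio-to-Audio operation.
voice_assistant = audio_to_text.then(text_to_text).then(text_to_audio)

# Substitution: swap the middle block without touching its neighbours.
shouting = Block("Text to Text", lambda prompt: prompt.upper())
variant = audio_to_text.then(shouting).then(text_to_audio)
```

Because a composed chain is just another `Block`, any block with the same input and output types can replace one stage while the others stay as they are, which is the substitution property the linked blocks are meant to preserve.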

  • Direct Visual Search

    Embed images directly with CLIP or SigLIP, then search by text or image query.

    Image to Vector (Image → Vector) · Text to Vector (Text → Vector)
    Good for
    • Photo library search
    • E-commerce visual search
    Strengths
    • Real-time indexing
    • Text-to-image search
    • Simple pipeline
    Trade-offs
    • May miss fine details
    • Abstract concepts can be weak
  • Caption + RAG Visual Search

    Generate captions for images, embed the captions, then search them with text retrieval.

    Image to Text (Image → Text) · Text to Vector (Text → Vector)
    Good for
    • Detailed scene search
    • Accessibility-first apps
    Strengths
    • Human-readable index
    • Can describe complex scenes
    • Debuggable
    Trade-offs
    • Slower indexing
    • Caption quality limits retrieval
    • Higher cost
  • Document RAG Pipeline

    Extract text from documents, chunk and embed it, retrieve the relevant chunks, then generate an answer with an LLM.

    Document to Structured (Document → Structured Data) · Text to Vector (Text → Vector) · Text to Text (Text → Text)
    Good for
    • Enterprise search
    • Legal document QA
    • Knowledge base
    Strengths
    • Grounds LLM in your data
    • Citable sources
    Trade-offs
    • Chunking strategy matters
    • Multi-step latency
  • Voice Assistant Pipeline

    Transcribe speech to text, process it with an LLM, then return a text-to-speech response.

    Audio to Text (Audio → Text) · Text to Text (Text → Text) · Text to Audio (Text → Audio)
    Good for
    • Voice assistants
    • Call center bots
    • Accessibility
    Strengths
    • Natural interaction
    • Hands-free
    Trade-offs
    • Latency stacks up
    • Error propagation
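The chunk, embed, and retrieve steps shared by the search and RAG pipelines above can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a word-count vector standing in for a real Text to Vector model, `chunk` splits on a fixed word count rather than any recommended strategy, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): sparse word counts, not a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[tuple[int, str]]:
    """Return the k most similar chunks with their indices, so sources stay citable."""
    q = embed(query)
    ranked = sorted(enumerate(chunks), key=lambda ic: cosine(q, embed(ic[1])), reverse=True)
    return ranked[:k]


docs = [
    "The lease term is twelve months.",
    "Rent is due on the first day of each month.",
    "Pets are not allowed on the premises.",
]
top = retrieve("when is rent due", docs, k=1)
```

Here `top` holds the index and text of the rent clause, which a Text to Text block could then receive as grounding context. Swapping `embed` for an image or caption encoder gives the two visual search variants; the ranking step stays the same.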
Related · Further reading

Where each block lands.

All routes verified live · April 2026