
Text→Text

Code Generation & Repair

Generate, refactor, or fix code with language models specialized for programming.

How Code Generation and Repair Works

From autocomplete to autonomous coding agents. How modern AI writes, fixes, and tests code.


Code Completion vs Generation

Completion predicts the next few tokens from context. Generation creates entire functions or files from a description. The same models can often do both; the difference is mainly in how you prompt them.
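To make the difference concrete, here is a minimal sketch of the two prompt shapes for a circle-area snippet. It assumes StarCoder-style fill-in-the-middle (FIM) tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) for completion; other model families use different special tokens, so check your model's documentation. Generation uses a plain chat-style message list. No model is called, and both helper names are hypothetical.

```python
# Hypothetical helpers that only build prompts; no model is called here.

# StarCoder-style fill-in-the-middle tokens. Other model families use
# different special tokens, so treat these as an assumption.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_completion_prompt(before_cursor: str, after_cursor: str) -> str:
    """Completion: the model fills the gap between the prefix and suffix."""
    return f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"


def build_generation_messages(description: str) -> list[dict[str, str]]:
    """Generation: a natural-language request in chat format."""
    return [
        {"role": "system", "content": "You are an expert Python developer."},
        {"role": "user", "content": description},
    ]


code_before = 'def calculate_area(radius):\n    """Calculate the area of a circle."""\n    return '
code_after = "\n"

completion_prompt = build_completion_prompt(code_before, code_after)
generation_messages = build_generation_messages(
    "Write a function that calculates the area of a circle from its radius."
)
print(completion_prompt)
print(generation_messages[1]["content"])
```

The completion prompt is a single string that ends where the model should start writing; the generation request describes the goal and leaves the whole body to the model.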

Context (What the model sees)

code.py

def calculate_area(radius):
    """Calculate the area of a circle."""
    return |

Possible Completions (ranked by confidence; scores are illustrative)

Option 1 (85%): 3.14159 * radius ** 2

Option 2 (72%): math.pi * radius * radius  (requires import math)

Option 3 (45%): radius ** 2 * 3.14
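One common way to produce a ranking like this is to score each candidate by the mean log-probability of its tokens. The sketch below assumes you already have per-token log-probabilities (many inference APIs can return them); the numbers are made up for illustration and are not the scores shown above.

```python
import math

# Made-up per-token log-probabilities for three candidate completions.
candidates = {
    "3.14159 * radius ** 2": [-0.05, -0.10, -0.20, -0.05],
    "math.pi * radius * radius": [-0.30, -0.20, -0.40, -0.10],
    "radius ** 2 * 3.14": [-0.60, -0.90, -0.70],
}


def mean_logprob(logprobs: list[float]) -> float:
    """Average log-probability per token (a length-normalized score)."""
    return sum(logprobs) / len(logprobs)


ranked = sorted(candidates, key=lambda c: mean_logprob(candidates[c]), reverse=True)

for completion in ranked:
    score = mean_logprob(candidates[completion])
    # exp(mean log-prob) is the geometric-mean probability per token.
    print(f"{math.exp(score):.2f}  {completion}")
```

Length normalization keeps longer candidates from being penalized just for having more tokens. Because each score is a per-token average rather than the probability of one outcome among the alternatives, the scores need not sum to 100%.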

How It Works

Given partial code, predict the most likely next tokens

The Key Insight

Code models are trained on billions of lines of code. They learn common patterns, idioms, and conventions, including common mistakes. Given partial code, they predict what typically comes next: essentially pattern matching at scale.
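Here is a toy illustration of "predict what typically comes next": a bigram counter over a tiny, made-up corpus of code tokens. Real code models use neural networks over subword tokens and far longer context, so treat this only as an intuition aid, not as how production models work.

```python
from collections import Counter, defaultdict

# A tiny, made-up "training corpus" of whitespace-separated code tokens.
corpus = [
    "for i in range ( n ) :",
    "for j in range ( m ) :",
    "for item in items :",
    "if x in seen :",
]

# Count which token follows each token.
next_counts: dict[str, Counter] = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for current, following in zip(tokens, tokens[1:]):
        next_counts[current][following] += 1


def predict_next(token: str) -> list[tuple[str, float]]:
    """Return candidate next tokens with their relative frequencies."""
    counts = next_counts[token]
    total = sum(counts.values())
    return [(tok, n / total) for tok, n in counts.most_common()]


print(predict_next("in"))  # [('range', 0.5), ('items', 0.25), ('seen', 0.25)]
```

"range" wins after "in" simply because it was seen most often; scale the same idea up by many orders of magnitude, with learned representations instead of raw counts, and you get the intuition behind code completion.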

Bug Detection and Repair

Models identify bugs by recognizing patterns that deviate from typical correct code. Their training data usually includes many bug-fix commits, so they can apply that knowledge to new code.

Buggy Code

def get_last_n_elements(arr, n):
    """Return the last n elements of an array."""
    return arr[len(arr) - n - 1:]

Fixed Code

def get_last_n_elements(arr, n):
    """Return the last n elements of an array."""
    # Removed the -1; max(..., 0) also stops a negative start index
    # from wrapping around when n > len(arr).
    return arr[max(len(arr) - n, 0):]

What the model detected:

Classic off-by-one: -n-1 starts one position too early
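A quick way to confirm a suggested fix like this is to run both versions on a small input. Both functions are redefined here so the check is self-contained; the fixed version includes a max(..., 0) guard for n > len(arr), where a negative start index would otherwise wrap around in Python slicing.

```python
def get_last_n_buggy(arr, n):
    return arr[len(arr) - n - 1:]


def get_last_n_fixed(arr, n):
    # max(..., 0) keeps the start index from going negative when n > len(arr).
    return arr[max(len(arr) - n, 0):]


data = [10, 20, 30, 40, 50]

print(get_last_n_buggy(data, 2))   # [30, 40, 50]: one element too many
print(get_last_n_fixed(data, 2))   # [40, 50]
print(get_last_n_fixed(data, 0))   # []
print(get_last_n_fixed(data, 9))   # [10, 20, 30, 40, 50]
```

Checking n = 0 and n > len(arr) matters because slicing with a negative start index does not fail; it silently returns the wrong elements.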

Logic Errors

Off-by-one, wrong operators, incorrect conditions

Type Errors

Wrong types, missing conversions, null references

Security Issues

SQL injection, XSS, improper input validation

Performance

N+1 queries, unnecessary loops, memory leaks
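As an example from the security category, here is the kind of SQL injection fix a model might propose: replacing string formatting with a parameterized query. The sketch uses Python's built-in sqlite3 module with an in-memory database; the table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(name):
    # Vulnerable: user input is pasted directly into the SQL string.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Fixed: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "x' OR '1'='1"
print(find_user_unsafe(malicious))  # every row in the table
print(find_user_safe(malicious))    # []
```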

Test Generation

Given source code, models can draft test suites. They can identify edge cases, error conditions, and integration points that need testing, though generated tests still need human review: a test can pass while asserting the wrong behavior.

Source Code

module.py

def calculate_discount(price, percentage):
    """Apply a percentage discount to a price."""
    if percentage < 0 or percentage > 100:
        raise ValueError("Percentage must be 0-100")
    return price * (1 - percentage / 100)

Generated Tests

test_module.py

import pytest

from module import calculate_discount

# pytest.approx compares floats with a tolerance instead of exact equality.

def test_calculate_discount_basic():
    assert calculate_discount(100, 10) == pytest.approx(90)

def test_calculate_discount_zero():
    assert calculate_discount(100, 0) == pytest.approx(100)

def test_calculate_discount_full():
    assert calculate_discount(100, 100) == pytest.approx(0)

def test_calculate_discount_invalid_negative():
    with pytest.raises(ValueError):
        calculate_discount(100, -10)

def test_calculate_discount_invalid_over_100():
    with pytest.raises(ValueError):
        calculate_discount(100, 150)

Test Strategy

Generated tests cover: happy path, edge cases, and error conditions

Happy Path

Normal inputs that should work correctly. The basic functionality.

Edge Cases

Boundary values: zero, empty, max values, Unicode, special characters.

Error Handling

Invalid inputs, exceptions, timeouts, resource exhaustion.
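The edge-case and error-handling categories can be partly mechanized. A minimal sketch, assuming the calculate_discount function from the example above (redefined here so the block runs on its own): generate values on, just inside, and just outside each boundary of the valid range, then record whether each one returns a price or raises ValueError. The boundary_values helper is hypothetical.

```python
def calculate_discount(price, percentage):
    """Apply a percentage discount to a price."""
    if percentage < 0 or percentage > 100:
        raise ValueError("Percentage must be 0-100")
    return price * (1 - percentage / 100)


def boundary_values(low, high, step=1):
    """Values just outside, on, and just inside a closed range [low, high]."""
    return [low - step, low, low + step, high - step, high, high + step]


results = {}
for pct in boundary_values(0, 100):
    try:
        results[pct] = calculate_discount(100, pct)
    except ValueError:
        results[pct] = "ValueError"

print(results)
```

Some of these results are floats that may not be exactly equal to the "obvious" value (for example, a 99% discount on 100), which is why assertions on computed prices should use a tolerance.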

The Code Generation Pipeline

Production code generation is not just one model call. It is a pipeline: generate, validate, test, and refine until the code works.

Stage 1: Prompt

Natural language or code context

The prompt includes:

Task Description

"Write a function that sorts a list of users by age"

Context

Existing code, types, imports, related functions

Constraints

Language, style guide, performance requirements
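The three prompt components can be assembled with a small template. This is a minimal sketch; the section labels and the build_prompt helper are assumptions, not a required format.

```python
from typing import Optional


def build_prompt(task: str, context: str = "", constraints: Optional[list[str]] = None) -> str:
    """Assemble task, context, and constraints into one prompt string."""
    sections = [f"## Task\n{task}"]
    if context:
        # Existing code the model should build on (types, imports, helpers).
        sections.append(f"## Context\n{context}")
    if constraints:
        sections.append("## Constraints\n" + "\n".join(f"- {c}" for c in constraints))
    return "\n\n".join(sections)


prompt = build_prompt(
    task="Write a function that sorts a list of users by age.",
    context="from dataclasses import dataclass\n\n@dataclass\nclass User:\n    name: str\n    age: int",
    constraints=["Python 3.10+", "Follow PEP 8", "Do not mutate the input list"],
)
print(prompt)
```

Keeping the parts separate makes it easy to swap in different context (for example, the file currently open in an editor) without rewriting the task description.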

Code Generation Models

From API-based giants to self-hosted specialists. Each model trades off capability, cost, and control.

When to Use What

API Models (GPT-4, Claude)

Best for: Complex reasoning, large refactors, architecture decisions. When accuracy matters more than cost.

Self-Hosted (CodeLlama, DeepSeek)

Best for: Privacy requirements, high volume, custom fine-tuning. When you need control.

Small Models (StarCoder2, Codestral)

Best for: IDE integration, real-time autocomplete. When latency matters most.

Implementation Examples

Working code for the OpenAI API. The same chat-style request carries over to local models served with Ollama or self-hosted with Hugging Face, though client setup differs.

openai_example.py (install with: pip install openai)

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an expert Python developer. Generate clean, well-documented code."
        },
        {
            "role": "user",
            "content": """Write a function that:
1. Takes a list of URLs
2. Fetches them concurrently with aiohttp
3. Returns a dict mapping URL to response status"""
        }
    ],
    temperature=0.2,  # Lower = more deterministic
    max_tokens=1000
)

# The reply is text, usually with the code inside a Markdown fenced block.
code = response.choices[0].message.content
print(code)
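Chat models usually wrap code in Markdown fences, so the reply above is prose plus code rather than a runnable file. A minimal sketch of the "parse and validate" step, assuming the reply uses a fence optionally tagged python or py; extract_code and is_valid_python are hypothetical helpers, and a syntax check says nothing about whether the code is correct.

```python
import ast
import re

FENCE = "`" * 3  # built this way so the example itself nests cleanly in Markdown
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the first fenced code block, or the whole reply if none is found."""
    match = CODE_BLOCK.search(reply)
    return match.group(1) if match else reply


def is_valid_python(source: str) -> tuple[bool, str]:
    """Check syntax only; this does not prove the code is correct."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, ""


reply = (
    "Here is the function:\n\n"
    + FENCE + "python\n"
    + "def add(a, b):\n    return a + b\n"
    + FENCE + "\n"
)

code = extract_code(reply)
print(code)
print(is_valid_python(code))            # (True, '')
print(is_valid_python("def broken(:"))  # (False, 'line 1: ...')
```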

The Complete Picture

Prompt → Generate → Parse + Validate → Test → Fix (if needed) → Working Code
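The flow above can be sketched as a loop. To keep it self-contained and runnable, the model call is replaced by a hypothetical fake_generate stub that returns canned candidates (the first one buggy) and ignores its prompt; in practice you would call a model API and feed the failure message back into the next prompt. The exec call is for brevity only: real systems should run generated code in an isolated process or sandbox.

```python
import ast


def fake_generate(prompt: str, attempt: int) -> str:
    """Hypothetical stand-in for a model call; ignores the prompt."""
    candidates = [
        "def get_last_n(arr, n):\n    return arr[len(arr) - n - 1:]\n",  # buggy
        "def get_last_n(arr, n):\n    return arr[max(len(arr) - n, 0):]\n",
    ]
    return candidates[min(attempt, len(candidates) - 1)]


def run_tests(namespace: dict) -> str:
    """Return an empty string on success, else a failure message."""
    fn = namespace["get_last_n"]
    for args, expected in [(([1, 2, 3], 2), [2, 3]), (([1, 2, 3], 5), [1, 2, 3])]:
        got = fn(*args)
        if got != expected:
            return f"get_last_n{args} returned {got}, expected {expected}"
    return ""


def generate_until_passing(task: str, max_attempts: int = 3) -> tuple[str, int]:
    prompt = task
    for attempt in range(max_attempts):
        code = fake_generate(prompt, attempt)          # Generate
        try:
            ast.parse(code)                            # Parse + validate
        except SyntaxError as exc:
            prompt = f"{task}\nYour code had a syntax error: {exc}"
            continue
        namespace: dict = {}
        exec(code, namespace)  # Demo only: do not exec untrusted code in-process.
        failure = run_tests(namespace)                 # Test
        if not failure:
            return code, attempt + 1                   # Working code
        prompt = f"{task}\nA test failed: {failure}\nPlease fix it."  # Fix
    raise RuntimeError("No passing candidate within the attempt budget")


code, attempts = generate_until_passing("Return the last n elements of a list.")
print(attempts)  # 2
print(code)
```

The attempt budget matters: without it, a model that keeps producing failing code would loop forever.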

Code generation is not magic; it is sophisticated pattern matching trained on billions of lines of code. The key to reliable results is treating the model as a junior developer: provide clear context, verify the output, and iterate on failures.

For Autocomplete

Use fast, small models (StarCoder2, Codestral) with low latency.

For Complex Tasks

Use capable models (GPT-4, Claude) with iterative refinement.

For Production

Always include: parsing, syntax checks, tests, and human review.
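For the test step, one common precaution is to run generated code in a separate process with a timeout instead of executing it inside your own interpreter. A minimal sketch using the standard library's subprocess module; run_isolated is a hypothetical helper. A subprocess alone is not a security sandbox (it still has your file system and network access), so production systems typically add containers or similar isolation.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run code in a fresh Python process; return (passed, combined output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    return proc.returncode == 0, proc.stdout + proc.stderr


ok, output = run_isolated("assert sum([1, 2, 3]) == 6\nprint('tests passed')")
print(ok, output.strip())  # True tests passed

ok, output = run_isolated("while True:\n    pass", timeout=1.0)
print(ok, output)  # False timed out after 1.0s
```

The timeout catches infinite loops, a failure mode that syntax checks cannot detect.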