Code Generation & Repair

Generate, refactor, or fix code with language models specialized for programming.

How Code Generation and Repair Works

From autocomplete to autonomous coding agents. How modern AI writes, fixes, and tests code.

1. Code Completion vs Generation

Completion predicts the next few tokens from context. Generation creates entire functions or files from a description. The same models can do both - the difference is in how you prompt them.

Context (What the model sees)
code.py
def calculate_area(radius):
    """Calculate the area of a circle."""
    return |
Possible Completions (Ranked by confidence)
Option 1 (85%): 3.14159 * radius ** 2
Option 2 (72%): math.pi * radius * radius
Option 3 (45%): radius ** 2 * 3.14
How It Works

Given partial code, predict the most likely next tokens

The Key Insight

Code models are trained on billions of lines of code. They learn patterns, idioms, and best practices. Given partial code, they predict what typically comes next - essentially pattern matching at scale.
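
To make "predict what typically comes next" concrete, here is a toy sketch that ranks next tokens by how often they followed the previous token in a tiny corpus. It is only a stand-in: real code models are neural networks over subword tokens, and the corpus, train_bigrams, and rank_next_tokens are hypothetical names invented for this illustration.

```python
from collections import Counter, defaultdict

# Tiny illustrative "training corpus"; a real code model sees billions of lines.
CORPUS = [
    "return math.pi * radius ** 2",
    "return math.pi * radius ** 2",
    "return math.pi * radius * radius",
    "return 3.14159 * radius ** 2",
    "return width * height",
]


def train_bigrams(lines):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def rank_next_tokens(counts, prev_token):
    """Return (token, relative frequency) pairs, most frequent first."""
    followers = counts[prev_token]
    total = sum(followers.values())
    return [(tok, n / total) for tok, n in followers.most_common()]


model = train_bigrams(CORPUS)
ranked = rank_next_tokens(model, "return")
for token, share in ranked:
    print(f"{share:.0%}  {token}")
```

Unlike the confidence scores in the widget above, these shares are relative frequencies and sum to 1. The point is only that ranking falls out of counting what was seen during training.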

2. Bug Detection and Repair

Models identify bugs by recognizing patterns that deviate from correct code. They have seen millions of bug fixes during training and can apply that knowledge to new code.

Buggy Code
def get_last_n_elements(arr, n):
    """Return the last n elements of an array."""
    return arr[len(arr) - n - 1:]
Fixed Code
def get_last_n_elements(arr, n):
    """Return the last n elements of an array."""
    return arr[len(arr) - n:]  # Removed the -1
What the model detected:

Classic off-by-one: the extra -1 starts the slice one position too early, so it returns n + 1 elements
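
The difference is easy to confirm by running both versions side by side. In this small sketch the two functions are renamed (get_last_n_buggy, get_last_n_fixed) only so they can coexist in one file; the bodies match the code above.

```python
def get_last_n_buggy(arr, n):
    """Buggy version: the slice starts one element too early."""
    return arr[len(arr) - n - 1:]


def get_last_n_fixed(arr, n):
    """Fixed version: the slice starts exactly n elements from the end."""
    return arr[len(arr) - n:]


data = [10, 20, 30, 40, 50]
buggy = get_last_n_buggy(data, 2)
fixed = get_last_n_fixed(data, 2)
print("buggy:", buggy)  # three elements instead of two
print("fixed:", fixed)

# The explicit len(arr) - n form also behaves for n = 0 ...
print("fixed, n=0:", get_last_n_fixed(data, 0))
# ... whereas the shorthand arr[-n:] would return the whole list for n = 0.
print("shorthand, n=0:", data[-0:])
```

Note that neither version guards against n larger than len(arr): the start index goes negative and wraps around, so a reviewer may still want an explicit check there.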

Logic Errors: Off-by-one, wrong operators, incorrect conditions
Type Errors: Wrong types, missing conversions, null references
Security Issues: SQL injection, XSS, improper input validation
Performance: N+1 queries, unnecessary loops, memory leaks

3. Test Generation

Given source code, models can generate test suites that target edge cases, error conditions, and integration points that need testing.

Source Code
module.py
def calculate_discount(price, percentage):
    """Apply a percentage discount to a price."""
    if percentage < 0 or percentage > 100:
        raise ValueError("Percentage must be 0-100")
    return price * (1 - percentage / 100)
Generated Tests
test_module.py
import pytest

from module import calculate_discount

def test_calculate_discount_basic():
    assert calculate_discount(100, 10) == 90

def test_calculate_discount_zero():
    assert calculate_discount(100, 0) == 100

def test_calculate_discount_full():
    assert calculate_discount(100, 100) == 0

def test_calculate_discount_invalid_negative():
    with pytest.raises(ValueError):
        calculate_discount(100, -10)

def test_calculate_discount_invalid_over_100():
    with pytest.raises(ValueError):
        calculate_discount(100, 150)
Test Strategy

Generated tests cover: happy path, edge cases, and error conditions

Happy Path

Normal inputs that should work correctly. The basic functionality.

Edge Cases

Boundary values: zero, empty, max values, unicode, special characters.

Error Handling

Invalid inputs, exceptions, timeouts, resource exhaustion.
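
The three categories can also be written as one table of cases, which makes it easy to see which category is thin. The sketch below uses plain Python rather than pytest so it runs without extra packages; VALUE_CASES, ERROR_CASES, and run_cases are hypothetical names, and the extra rows (80 at 25%, a price of 0) are illustrative additions, not tests from the generated suite above.

```python
import math


def calculate_discount(price, percentage):
    """Apply a percentage discount to a price (same logic as module.py above)."""
    if percentage < 0 or percentage > 100:
        raise ValueError("Percentage must be 0-100")
    return price * (1 - percentage / 100)


# (category, price, percentage, expected result)
VALUE_CASES = [
    ("happy path", 100, 10, 90.0),
    ("happy path", 80, 25, 60.0),
    ("edge case", 100, 0, 100.0),
    ("edge case", 100, 100, 0.0),
    ("edge case", 0, 50, 0.0),
]

# (price, percentage) pairs that must raise ValueError
ERROR_CASES = [(100, -10), (100, 150)]


def run_cases():
    """Run every case and return a list of human-readable failures."""
    failures = []
    for category, price, pct, expected in VALUE_CASES:
        actual = calculate_discount(price, pct)
        # Compare floats with a tolerance instead of ==.
        if not math.isclose(actual, expected, abs_tol=1e-9):
            failures.append(f"{category}: ({price}, {pct}) gave {actual}, expected {expected}")
    for price, pct in ERROR_CASES:
        try:
            calculate_discount(price, pct)
        except ValueError:
            continue
        failures.append(f"error handling: ({price}, {pct}) did not raise ValueError")
    return failures


failures = run_cases()
print("all cases passed" if not failures else "\n".join(failures))
```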

4. The Code Generation Pipeline

Production code generation is not just one model call. It is a pipeline: generate, validate, test, and refine until the code works.
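
The loop described above can be sketched in a few lines. This is a minimal illustration under strong assumptions: fake_generate stands in for a real model call and simply returns canned candidates, the task and test are hard-coded, and exec is used only because the candidates are known in advance. A real pipeline would run untrusted output in a sandbox.

```python
import ast

# Hypothetical stand-in for model output: one canned candidate per attempt.
CANNED_CANDIDATES = [
    "def add(a, b)\n    return a + b\n",   # does not parse: missing colon
    "def add(a, b):\n    return a - b\n",  # parses, but fails the test
    "def add(a, b):\n    return a + b\n",  # correct
]


def fake_generate(prompt, attempt):
    """Pretend to call a model; the prompt is ignored in this sketch."""
    return CANNED_CANDIDATES[min(attempt, len(CANNED_CANDIDATES) - 1)]


def validate(code):
    """Return None if the code parses, else a short syntax error message."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None


def run_tests(code):
    """Execute the candidate and check one fixed test; return an error or None."""
    namespace = {}
    try:
        exec(code, namespace)  # acceptable here only because candidates are canned
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


def generate_until_passing(task, max_attempts=3):
    """Generate, validate, test, and feed the error back until a candidate passes."""
    prompt = task
    history = []
    for attempt in range(max_attempts):
        code = fake_generate(prompt, attempt)
        error = validate(code) or run_tests(code)
        history.append(error)
        if error is None:
            return code, history
        prompt = f"{task}\n\nPrevious attempt failed with: {error}\nFix the code."
    return None, history


code, history = generate_until_passing("Write add(a, b) that returns the sum.")
print(history)
```

The important design choice is that each failure message is appended to the next prompt, so the refine step has something specific to act on.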

Stage 1: Prompt

Natural language or code context

The prompt includes:

Task Description: "Write a function that sorts a list of users by age"
Context: Existing code, types, imports, related functions
Constraints: Language, style guide, performance requirements
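
One way to keep these three parts explicit is to assemble the prompt from separate fields. The sketch below is an assumption about how that could look, not a required format; CodePrompt, its field names, the section labels, and the example User class and constraints are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodePrompt:
    """Holds the three prompt parts separately until render time."""
    task: str
    context: list = field(default_factory=list)      # code snippets as strings
    constraints: list = field(default_factory=list)  # short requirement strings

    def render(self) -> str:
        """Join the non-empty parts into one prompt string."""
        parts = [f"Task:\n{self.task}"]
        if self.context:
            parts.append("Context:\n" + "\n\n".join(self.context))
        if self.constraints:
            parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        return "\n\n".join(parts)


prompt = CodePrompt(
    task="Write a function that sorts a list of users by age",
    context=["class User:\n    name: str\n    age: int"],
    constraints=["Language: Python 3", "Follow PEP 8", "Do not mutate the input list"],
).render()
print(prompt)
```

Keeping the parts separate makes it simple to swap the context (for example, retrieved snippets) without touching the task or constraints.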

5. Code Generation Models

From API-based giants to self-hosted specialists. Each model trades off capability, cost, and control.

When to Use What

API Models (GPT-4, Claude)

Best for: Complex reasoning, large refactors, architecture decisions. When accuracy matters more than cost.

Self-Hosted (CodeLlama, DeepSeek)

Best for: Privacy requirements, high volume, custom fine-tuning. When you need control.

Small Models (StarCoder2, Codestral)

Best for: IDE integration, real-time autocomplete. When latency matters most.

6. Implementation Examples

Working code for OpenAI, local models with Ollama, and self-hosted with HuggingFace.

openai_example.py (install first: pip install openai)
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an expert Python developer. Generate clean, well-documented code."
        },
        {
            "role": "user",
            "content": """Write a function that:
1. Takes a list of URLs
2. Fetches them concurrently with aiohttp
3. Returns a dict mapping URL to response status"""
        }
    ],
    temperature=0.2,  # Lower = more deterministic
    max_tokens=1000
)

code = response.choices[0].message.content
print(code)
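
Chat models often wrap code in Markdown fences and add surrounding prose, so the raw reply is usually not directly runnable. The following sketch shows one possible parse-and-validate step; extract_python and is_valid_python are hypothetical helpers, the sample reply is made up, and the regular expression only handles the simple single-block case.

```python
import ast
import re

FENCE = "`" * 3  # three backticks, spelled out to avoid a literal fence in this listing

# Matches an opening fence with an optional python/py tag, then captures up to the next fence.
FENCED_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(reply: str) -> str:
    """Return the first fenced code block in a reply, or the whole reply if none is found."""
    match = FENCED_BLOCK.search(reply)
    return match.group(1) if match else reply


def is_valid_python(source: str) -> bool:
    """Syntax check only: True if the source parses, without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


sample_reply = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def status_ok(code):\n    return 200 <= code < 300\n"
    + FENCE + "\nLet me know if you need changes."
)

code = extract_python(sample_reply)
print(code)
print("parses:", is_valid_python(code))
```

A syntax check like this is cheap and catches truncated output (for example, when max_tokens cuts a reply short) before any tests are run.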

The Complete Picture

Prompt -> Generate -> Parse + Validate -> Test -> Fix (if needed) -> Working Code

Code generation is not magic - it is sophisticated pattern matching trained on billions of lines of code. The key to reliable results is treating the model as a junior developer: provide clear context, verify the output, and iterate on failures.

For Autocomplete: Use fast, small models (StarCoder2, Codestral) with low latency.
For Complex Tasks: Use capable models (GPT-4, Claude) with iterative refinement.
For Production: Always include parsing, syntax checks, tests, and human review.

Use Cases

  • Autocomplete
  • Bug fixing
  • Migration/modernization
  • Security patching

Architectural Patterns

Fill-in-the-Middle

Bi-directional context for code completion.
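
In practice, fill-in-the-middle means the prompt carries the code before and after the cursor, separated by sentinel tokens, and the model generates the missing middle. The sketch below builds such a prompt. The sentinel strings follow the style used by StarCoder-family models and should be treated as an assumption: other models use different sentinels, so check the tokenizer of the model you actually use. build_fim_prompt is a hypothetical helper.

```python
# Sentinel strings in the StarCoder style (assumption; model-specific).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(source: str, cursor: int) -> str:
    """Split the source at the cursor and arrange it as prefix, suffix, then middle marker."""
    prefix, suffix = source[:cursor], source[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


source = (
    "def calculate_area(radius):\n"
    '    """Calculate the area of a circle."""\n'
    "    return \n"
    "\n"
    "print(calculate_area(2))\n"
)

# Place the cursor right after "return ", where the completion should go.
cursor = source.index("return ") + len("return ")
prompt = build_fim_prompt(source, cursor)
print(prompt)
```

Because the code after the cursor is part of the prompt, the model can see how the function is used later, which plain left-to-right completion cannot.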

Retrieval-Augmented Code

Pull repo context before generation.
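
A minimal sketch of the retrieval step, assuming the repository is already loaded as a dict of path to source text. It scores files by simple token overlap with the task; production systems typically use embeddings or code search instead. REPO, tokenize, retrieve, and build_prompt are hypothetical names for this illustration.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; underscores split, so load_users gives load, users."""
    return {t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t}


def retrieve(query, snippets, k=2):
    """Return up to k paths whose content shares the most tokens with the query."""
    query_tokens = tokenize(query)
    scored = []
    for path, code in snippets.items():
        overlap = len(query_tokens & tokenize(code))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties broken alphabetically by path for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:k]]


def build_prompt(task, snippets, paths):
    """Place the retrieved files before the task description."""
    context = "\n\n".join(f"# File: {p}\n{snippets[p]}" for p in paths)
    return f"{context}\n\n# Task: {task}\n"


REPO = {
    "users/models.py": "class User:\n    def __init__(self, name, age):\n"
                       "        self.name = name\n        self.age = age\n",
    "users/queries.py": "def load_users(db):\n    return db.fetch_all('users')\n",
    "billing/invoice.py": "def total(invoice):\n"
                          "    return sum(line.amount for line in invoice.lines)\n",
}

task = "sort a list of users by age"
hits = retrieve(task, REPO)
prompt = build_prompt(task, REPO, hits)
print(hits)
```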

Static-Analysis Guided

Use linters/vuln scanners to steer generation.
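
As a rough illustration, the sketch below uses Python's standard ast module as a stand-in for a real linter or vulnerability scanner. It flags two patterns in generated code and turns the findings into a follow-up prompt. The rule set, find_issues, and feedback_prompt are hypothetical and far smaller than what a real tool checks.

```python
import ast

RISKY_CALLS = {"eval", "exec"}


def find_issues(source):
    """Return diagnostics for generated code, or a single syntax error message."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]
    found = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            found.append((node.lineno, f"avoid {node.func.id}() on untrusted input"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append((node.lineno, "bare except hides unrelated errors"))
    # Sort numerically by line so the report reads top to bottom.
    return [f"line {line}: {msg}" for line, msg in sorted(found)]


def feedback_prompt(task, issues):
    """Append the findings to the task so the next generation can address them."""
    return task + "\n\nFix these findings:\n" + "\n".join(f"- {i}" for i in issues)


generated = (
    "def parse(expr):\n"
    "    try:\n"
    "        return eval(expr)\n"
    "    except:\n"
    "        return None\n"
)

issues = find_issues(generated)
print(feedback_prompt("Write parse(expr) for arithmetic strings.", issues))
```

The same loop works with any analyzer that emits line-level diagnostics; only find_issues would change.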

Implementations

API Services

GPT-4o (Code), OpenAI: Strong multi-language code model.

Open Source

DeepSeek-Coder-V2, MIT license: Top open-source coder with fill-in-the-middle (FIM) support.

CodeLlama 70B Instruct, Llama 2 Community license: Solid self-hosted option.

Quick Facts

Input: Text
Output: Text
Implementations: 2 open source, 1 API
Patterns: 3 approaches