Serverless GPU Infrastructure

Run or deploy code at scale

Modal provides serverless containers with attached GPUs that start on demand. Plain Python code is all it takes to scale a workflow from zero to thousands of workers in milliseconds.

$ pip install modal

import modal

app = modal.App("gpu-job")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(image=image, gpu="A10G")
def generate_text(prompt: str):
    # The container boots with an A10G GPU attached.
    # run_model is a placeholder for your own inference code.
    return run_model(prompt)

import modal

app = modal.App("nightly-sync")

@app.function(schedule=modal.Period(days=1))
def daily_cleanup():
    # Runs automatically every 24 hours once the app is deployed.
    # clean_database is a placeholder for your own cleanup logic.
    clean_database()

import modal

app = modal.App("simple-web-endpoint")

@app.function()
# Newer Modal releases rename this decorator to modal.fastapi_endpoint.
@modal.web_endpoint(method="POST")
def handler(item: dict):
    # Exposed as a public serverless API endpoint.
    return {"status": "processed", "id": item["id"]}
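
Once deployed, an endpoint like the handler above can be called by any HTTP client. The sketch below builds such a request with the Python standard library only. It rests on two assumptions: the URL is a placeholder (the real address is shown at deploy time), and build_request is a hypothetical helper for illustration, not part of Modal.

```python
import json
import urllib.request

# Placeholder URL: substitute the address shown when the app is deployed.
ENDPOINT_URL = "https://example.invalid/handler"


def build_request(url: str, item: dict) -> urllib.request.Request:
    """Build a JSON POST request matching the handler's `item: dict` argument."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(ENDPOINT_URL, {"id": 42})
    # Sending is left commented out because the URL above is not real:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
    print(request.get_method(), request.full_url)
```

The request is built separately from being sent, so the payload and headers can be checked before any network call is made.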

Cold Starts in Milliseconds

Containers boot in under 200 ms. Modal scales up as soon as requests arrive and back down to zero when they stop, which saves on infrastructure cost.
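
To make "scale to zero" concrete, here is a minimal sketch of a toy scaling policy. It is an illustration under stated assumptions, not a description of Modal's actual scheduler: desired_containers and the per_container limit are hypothetical names, and the policy simply asks for enough containers to cover pending requests.

```python
import math


def desired_containers(pending_requests: int, per_container: int = 1) -> int:
    """Return how many containers a simple autoscaler would want.

    Hypothetical policy: one container per `per_container` pending requests,
    and zero containers when nothing is pending.
    """
    if pending_requests < 0 or per_container < 1:
        raise ValueError("pending_requests must be >= 0 and per_container >= 1")
    return math.ceil(pending_requests / per_container)


if __name__ == "__main__":
    # Traffic rises, peaks, then stops; the container count follows it to zero.
    for pending in [0, 3, 250, 40, 0]:
        print(pending, "->", desired_containers(pending, per_container=10))
```

With no pending requests the policy asks for no containers at all, which is the behavior the cost saving depends on.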

Instant GPU Access

Access high-performance GPUs (H100s, A100s, A10Gs) with a single Python function decorator. No contracts; pay by the second.
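
Per-second billing is easiest to see as arithmetic. The sketch below assumes a made-up hourly rate chosen for round numbers; it is not a real Modal price, and gpu_cost is a hypothetical helper.

```python
def gpu_cost(seconds_used: float, hourly_rate_usd: float) -> float:
    """Cost of a run billed per second at a given hourly rate."""
    if seconds_used < 0 or hourly_rate_usd < 0:
        raise ValueError("seconds_used and hourly_rate_usd must be non-negative")
    return seconds_used * hourly_rate_usd / 3600


if __name__ == "__main__":
    # Made-up rate in USD per GPU-hour, picked so the arithmetic is easy to follow.
    HYPOTHETICAL_RATE = 3.60
    # A 90-second job costs 90 * 3.60 / 3600 USD under per-second billing.
    print(f"{gpu_cost(90, HYPOTHETICAL_RATE):.4f}")
```

Under this assumed rate, a 90-second job is charged for 90 seconds of GPU time rather than for a full hour.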

Code as Configuration

No Dockerfiles, YAML files, or complex build pipelines. You define the environment in Python, directly alongside your code.

Accelerate Your Engineering

Whether you have custom enterprise deployment pipelines or high-scale AI inference needs, get in touch and we will help you optimize your serverless deployment.

support@modal.com