Workload Types Overview - Replicate Stack vs Workers AI

Workload Types in Replicate Stack

The Replicate Stack handles several distinct workload types, each with its own characteristics and configuration. Understanding these distinctions is essential for comparing it with Workers AI, which has a more uniform workload model.

Overview

Replicate defines four primary workload types via the DeployableKind enum (replicate/web/models/models/deployable_config.py:114):

class DeployableKind(models.TextChoices):
    DEPLOYMENT_PREDICTION = "deployment-prediction", "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction", "function-prediction"
    VERSION_PREDICTION = "version-prediction", "version-prediction"
    VERSION_TRAINING = "version-training", "version-training"

Each workload type has its own deployable_metadata_for_* function in replicate/web/models/logic.py that generates the appropriate configuration.
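The relationship between a kind and its metadata generator can be sketched as a lookup table. This is a minimal illustration, not the code in logic.py: the enum below is a plain-Enum stand-in for the Django TextChoices class, and METADATA_FUNCTION_FOR_KIND and metadata_function_name are hypothetical names. Only the four generator names come from this document.

```python
from enum import Enum
from typing import Dict


class DeployableKind(str, Enum):
    # Plain-Enum stand-in for the Django TextChoices class shown above.
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"
    VERSION_PREDICTION = "version-prediction"
    VERSION_TRAINING = "version-training"


# Generator names as cited in this document; the mapping itself is an
# assumption made for illustration.
METADATA_FUNCTION_FOR_KIND: Dict[DeployableKind, str] = {
    DeployableKind.DEPLOYMENT_PREDICTION: "deployable_metadata_for_deployment_prediction",
    DeployableKind.FUNCTION_PREDICTION: "deployable_metadata_for_procedure_prediction",
    DeployableKind.VERSION_PREDICTION: "deployable_metadata_for_version_prediction",
    DeployableKind.VERSION_TRAINING: "deployable_metadata_for_version_training",
}


def metadata_function_name(kind_value: str) -> str:
    """Resolve a stored kind string to the name of its metadata generator."""
    kind = DeployableKind(kind_value)  # raises ValueError for unknown kinds
    return METADATA_FUNCTION_FOR_KIND[kind]


print(metadata_function_name("version-training"))
```

Note that the function-prediction kind maps to a generator named after procedures, which is the one place where the kind name and the function name diverge.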

1. Deployment Predictions

What: Predictions running on a Deployment - a stable, long-lived identifier that routes to a backing model version. The backing version can be changed over time and doesn’t need to be owned by the same account as the deployment.

Characteristics:

Persistent deployment entity (configuration/routing), but infrastructure can scale to 0 replicas
Custom configuration per deployment that can override many version-level settings
Uses dedicated deployment key for consistent routing

Configuration: deployable_metadata_for_deployment_prediction()

Code references:

Deployment model: replicate/web/models/models/deployment.py
Kind validation: logic.py:1156 - asserts kind == DeployableKind.DEPLOYMENT_PREDICTION and not deployment.used_by_model

Queue behavior: Standard shuffle-sharded queues per deployment
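For readers unfamiliar with shuffle sharding, the general idea is that each key is assigned a small, deterministic, pseudo-random subset of the available streams. The sketch below shows the generic technique only; it is not Replicate's implementation, and the function name, hash choice, stream count, and shard size are all assumptions.

```python
import hashlib
from typing import List


def shuffle_shard(key: str, total_streams: int, shard_size: int) -> List[int]:
    """Pick a deterministic pseudo-random subset of stream indices for a key."""
    if not 0 < shard_size <= total_streams:
        raise ValueError("shard_size must be between 1 and total_streams")

    def rank(stream: int) -> bytes:
        # Hash the (key, stream) pair so each key ranks the streams differently.
        return hashlib.sha256(f"{key}:{stream}".encode()).digest()

    # Keep the shard_size streams with the lowest rank for this key.
    chosen = sorted(range(total_streams), key=rank)[:shard_size]
    return sorted(chosen)


# Hypothetical deployment key, 16 streams in total, 4 streams per key.
print(shuffle_shard("deployment-a", total_streams=16, shard_size=4))
```

Because two keys rarely receive exactly the same subset, a noisy key can saturate its own streams without blocking every stream another key uses.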

2. Function Predictions (Pipelines/Procedures)

What: Predictions for multi-step workflows (Replicate Pipelines). Function predictions run on a shared, CPU-only container image that “swaps” in procedure source code at prediction time. This is similar to hotswapping, except that what gets swapped is CPU-only Python code rather than GPU model weights.

Characteristics:

Run on CPU hardware (no GPU)
Share a base container image across procedures
Procedure source code is swapped in at runtime (analogous to weight swapping in hotswaps)
Part of a larger multi-step workflow
Uses AbstractProcedure model, not Version

Configuration: deployable_metadata_for_procedure_prediction()

Code references:

Procedure model: replicate/web/models/models/procedure.py
Kind validation: logic.py:1165 - when kind == DeployableKind.FUNCTION_PREDICTION, asserts deployment.used_by_model

Director configuration: Director runs procedures with DIRECTOR_JOB_KIND=procedure (director/config.go:37)

Queue behavior: Standard queues, no special routing
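The two kind-validation checks cited above (logic.py:1156 and logic.py:1165) are mirror images of each other. A minimal sketch of the combined rule follows, assuming a hypothetical Deployment stand-in that models only the used_by_model flag and a hypothetical helper name. It returns a boolean rather than asserting, purely to keep the example easy to call.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical stand-in; only the flag read by the checks is modelled.
    used_by_model: bool


def kind_matches_deployment(kind: str, deployment: Deployment) -> bool:
    """Return True when the kind is consistent with deployment.used_by_model."""
    if kind == "deployment-prediction":
        # Deployment predictions require a deployment not used by a model.
        return not deployment.used_by_model
    if kind == "function-prediction":
        # Function predictions require a deployment that is used by a model.
        return deployment.used_by_model
    raise ValueError(f"this sketch only covers deployment-backed kinds, got {kind!r}")


print(kind_matches_deployment("function-prediction", Deployment(used_by_model=True)))
```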

3. Version Predictions

What: Predictions running directly on a model version (not through a deployment).

Characteristics:

Ephemeral infrastructure (scaled up/down based on demand)
Configuration comes from version’s current_prediction_deployable_config
Two sub-types: normal and hotswap (see below)

Configuration: deployable_metadata_for_version_prediction()

3a. Normal Version Predictions

What: Standard version predictions without hotswapping.

Characteristics:

One container image per version
Standard queue routing
prefer_same_stream = False (default)

Code: logic.py:1214-1222

if not version.is_hotswappable:
    metadata = DeployableConfigSerializer(
        version.current_prediction_deployable_config
    ).data

3b. Hotswap Version Predictions

What: Versions that share a base container but load different weights at runtime. Multiple “hotswap versions” can run on the same pod by swapping weights instead of restarting containers.

Characteristics:

Multiple versions share the same base Docker image
Each version has additional_weights (weights loaded at runtime)
Base version must be public, non-virtual, and accept Replicate weights
prefer_same_stream = True - workers preferentially consume from the same stream to optimize weight locality
Uses base version’s deployment key for infrastructure sharing

A version is hotswappable only if every check below passes (version.py:586):

def is_hotswappable(self) -> bool:
    if not self.additional_weights:
        return False
    if not self.base_version:
        return False
    if self.base_docker_image_id != self.base_version.docker_image_relation_id:
        return False
    return self.base_version.is_valid_hotswap_base
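To make the checks concrete, here is a self-contained sketch that runs the same logic on plain dataclasses instead of Django models. The is_public, is_virtual, and accepts_replicate_weights fields are hypothetical names for the three base-version requirements listed above, and exposing the checks as properties is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    docker_image_relation_id: Optional[int] = None
    base_docker_image_id: Optional[int] = None
    base_version: Optional["Version"] = None
    additional_weights: List[str] = field(default_factory=list)
    # Hypothetical flags standing in for the base-version requirements.
    is_public: bool = False
    is_virtual: bool = False
    accepts_replicate_weights: bool = False

    @property
    def is_valid_hotswap_base(self) -> bool:
        return self.is_public and not self.is_virtual and self.accepts_replicate_weights

    @property
    def is_hotswappable(self) -> bool:
        if not self.additional_weights:
            return False  # nothing to swap in at runtime
        if not self.base_version:
            return False  # no base container to share
        if self.base_docker_image_id != self.base_version.docker_image_relation_id:
            return False  # built against a different image than the base
        return self.base_version.is_valid_hotswap_base


base = Version(docker_image_relation_id=1, is_public=True, accepts_replicate_weights=True)
fine_tune = Version(base_docker_image_id=1, base_version=base, additional_weights=["weights.tar"])
print(fine_tune.is_hotswappable)  # True
print(base.is_hotswappable)       # False: the base has no additional weights
```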

Configuration: logic.py:1229-1244

deployable_config_fields = {
    ...
    "docker_image": version.base_version.docker_image_relation,
    "fuse_config": None,
    "key": version.base_version.key_for_hotswap_base_predictions,
    "prefer_same_stream": True,  # the key difference from normal version predictions
}
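The practical effect of these overrides is that every hotswap version built on the same base ends up with the same image and the same key. The sketch below illustrates that with plain dictionaries. The function name, the dictionary shapes, and the idea of starting from the version's own config are assumptions, and the fields elided by "..." in the excerpt above are deliberately left out.

```python
from typing import Any, Dict, Optional


def version_prediction_config(
    own_config: Dict[str, Any],
    hotswap_base: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the effective config, applying hotswap overrides when a base is given."""
    config = dict(own_config)  # copy so the caller's dict is not mutated
    config.setdefault("prefer_same_stream", False)
    if hotswap_base is not None:
        config.update(
            {
                "docker_image": hotswap_base["docker_image"],
                "fuse_config": None,
                "key": hotswap_base["key"],
                "prefer_same_stream": True,
            }
        )
    return config


# Hypothetical values: two fine-tunes sharing one base.
base = {"docker_image": "base-image", "key": "base-hotswap-key"}
tune_a = version_prediction_config({"docker_image": "a-image", "key": "a-key"}, base)
tune_b = version_prediction_config({"docker_image": "b-image", "key": "b-key"}, base)
print(tune_a["key"] == tune_b["key"])  # True
```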

Queue behavior:

Shuffle-sharded queues (like all workloads)
DIRECTOR_PREFER_SAME_STREAM=true set for hotswap versions
Director workers repeatedly consume from the same stream before checking others
Optimizes for weight-cache locality (reduces weight download and loading overhead)

Why prefer_same_stream matters:

Hotswap versions load different weights into the same container
If a worker has already loaded weights for version A, it’s faster to keep processing version A predictions
Without prefer_same_stream, workers round-robin across all streams, losing weight locality benefits
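A toy simulation makes the locality argument visible. It is a simplified model, not Director's actual consumption loop: it assumes each stream holds jobs for a single version, that a worker sticks to a stream until it is empty, and that every change of version costs one weight load. All names here are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def consume(streams: Dict[str, List[str]], prefer_same_stream: bool) -> List[str]:
    """Drain all streams and return the versions in the order they were processed."""
    queues: Dict[str, Deque[str]] = {name: deque(jobs) for name, jobs in streams.items()}
    names = list(queues)
    order: List[str] = []
    current = 0
    while any(queues.values()):
        # Skip forward to the next stream that still has work.
        while not queues[names[current]]:
            current = (current + 1) % len(names)
        order.append(queues[names[current]].popleft())
        if not prefer_same_stream:
            # Round-robin: move on to the next stream after every job.
            current = (current + 1) % len(names)
    return order


def count_weight_loads(order: List[str]) -> int:
    """Count how often the worker must load weights for a different version."""
    loads = 0
    loaded: Optional[str] = None
    for version in order:
        if version != loaded:
            loads += 1
            loaded = version
    return loads


streams = {"stream-1": ["A", "A", "A"], "stream-2": ["B", "B", "B"]}
print(count_weight_loads(consume(streams, prefer_same_stream=True)))   # 2
print(count_weight_loads(consume(streams, prefer_same_stream=False)))  # 6
```

In this model the sticky worker loads each set of weights once, while the round-robin worker reloads on every job.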

Code references:

Version model: replicate/web/models/models/version.py
Hotswap validation: version.py:586-595
Queue affinity: director/config.go:50
Redis implementation: director/redis/queue.go

4. Version Trainings

What: Training jobs for versions marked as trainable.

Characteristics:

Uses separate current_training_deployable_config (not prediction config)
Longer timeouts and different resource requirements
Different billing model
Director runs with DIRECTOR_JOB_KIND=training

Configuration: deployable_metadata_for_version_training()

Code references:

Training config: logic.py:1307-1308
Director job kind: director/config.go:37

Summary Table

Workload Type	Infrastructure	Config Source	`prefer_same_stream`	Director Job Kind
Deployment Prediction	Persistent entity (replicas can scale to 0)	Deployment config	`False`	`prediction`
Function Prediction	Ephemeral	Procedure config	`False`	`procedure`
Version Prediction (normal)	Ephemeral	Version prediction config	`False`	`prediction`
Version Prediction (hotswap)	Shared base	Base version config + weights	`True`	`prediction`
Version Training	Ephemeral	Version training config	`False`	`training`
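The last two columns of the table can be read as inputs to Director's environment. The sketch below encodes them as data and derives DIRECTOR_JOB_KIND and DIRECTOR_PREFER_SAME_STREAM from a row label. The WorkloadProfile type, the PROFILES mapping, and the choice to omit DIRECTOR_PREFER_SAME_STREAM when it is false are assumptions; this document only states that the variable is set to true for hotswap versions.

```python
from typing import Dict, NamedTuple


class WorkloadProfile(NamedTuple):
    job_kind: str
    prefer_same_stream: bool


# Keys are the row labels from the summary table above.
PROFILES: Dict[str, WorkloadProfile] = {
    "Deployment Prediction": WorkloadProfile("prediction", False),
    "Function Prediction": WorkloadProfile("procedure", False),
    "Version Prediction (normal)": WorkloadProfile("prediction", False),
    "Version Prediction (hotswap)": WorkloadProfile("prediction", True),
    "Version Training": WorkloadProfile("training", False),
}


def director_env(workload: str) -> Dict[str, str]:
    """Build the Director environment variables implied by a table row."""
    profile = PROFILES[workload]  # raises KeyError for unknown rows
    env = {"DIRECTOR_JOB_KIND": profile.job_kind}
    if profile.prefer_same_stream:
        # Assumption: the variable is only emitted when enabled.
        env["DIRECTOR_PREFER_SAME_STREAM"] = "true"
    return env


print(director_env("Version Prediction (hotswap)"))
```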

Key Takeaways

Replicate has explicit workload type separation with different configurations and behaviors per type
Hotswap predictions are unique - they’re the only workload using prefer_same_stream=True for weight locality optimization
Function predictions use code swapping - similar to hotswaps but for CPU-only Python code instead of GPU model weights
Each workload type has distinct characteristics - different infrastructure models, configuration sources, and queue behaviors

These workload distinctions will be referenced throughout the document when discussing queue behavior, timeouts, scaling, and other operational characteristics.
