Workload Types in Replicate Stack
The Replicate Stack handles several distinct workload types, each with different characteristics and configurations. Understanding these distinctions is essential for comparing with Workers AI, which has a more uniform workload model.
Overview
Replicate defines four primary workload types via the DeployableKind enum (replicate/web/models/models/deployable_config.py:114):
```python
from django.db import models


class DeployableKind(models.TextChoices):
    DEPLOYMENT_PREDICTION = "deployment-prediction", "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction", "function-prediction"
    VERSION_PREDICTION = "version-prediction", "version-prediction"
    VERSION_TRAINING = "version-training", "version-training"
```
Each workload type has its own deployable_metadata_for_* function in replicate/web/models/logic.py that generates the appropriate configuration.
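One way to picture the per-kind functions is as a dispatch table keyed on DeployableKind. The sketch below is illustrative only: it uses a plain Python str Enum in place of Django's TextChoices, and the builder bodies are hypothetical stubs (the real functions in logic.py take model instances and produce full deployable configs).

```python
from enum import Enum
from typing import Callable, Dict


class DeployableKind(str, Enum):
    # Plain-Enum stand-in for the Django TextChoices class; values match the source.
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"
    VERSION_PREDICTION = "version-prediction"
    VERSION_TRAINING = "version-training"


# Hypothetical stubs named after the real builders; each just tags its input.
def deployable_metadata_for_deployment_prediction(obj: dict) -> dict:
    return {"kind": DeployableKind.DEPLOYMENT_PREDICTION.value, **obj}


def deployable_metadata_for_procedure_prediction(obj: dict) -> dict:
    return {"kind": DeployableKind.FUNCTION_PREDICTION.value, **obj}


def deployable_metadata_for_version_prediction(obj: dict) -> dict:
    return {"kind": DeployableKind.VERSION_PREDICTION.value, **obj}


def deployable_metadata_for_version_training(obj: dict) -> dict:
    return {"kind": DeployableKind.VERSION_TRAINING.value, **obj}


METADATA_BUILDERS: Dict[DeployableKind, Callable[[dict], dict]] = {
    DeployableKind.DEPLOYMENT_PREDICTION: deployable_metadata_for_deployment_prediction,
    DeployableKind.FUNCTION_PREDICTION: deployable_metadata_for_procedure_prediction,
    DeployableKind.VERSION_PREDICTION: deployable_metadata_for_version_prediction,
    DeployableKind.VERSION_TRAINING: deployable_metadata_for_version_training,
}


def build_metadata(kind: str, obj: dict) -> dict:
    # DeployableKind(kind) raises ValueError for an unknown kind string.
    return METADATA_BUILDERS[DeployableKind(kind)](obj)


print(build_metadata("version-training", {"version": "v1"}))
# {'kind': 'version-training', 'version': 'v1'}
```

Keying the table on the enum rather than on raw strings means an unrecognized kind fails immediately at the DeployableKind(kind) conversion instead of silently falling through.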
1. Deployment Predictions
What: Predictions running on a Deployment, which is a stable, long-lived identifier that routes to a backing model version. The backing version can be changed over time and doesn’t need to be owned by the same account as the deployment.
Characteristics:
- Persistent deployment entity (configuration and routing), though its infrastructure can scale to 0 replicas
- Custom per-deployment configuration that can override many version-level settings
- Uses a dedicated deployment key for consistent routing
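The stable-key idea can be sketched as follows. This assumes, hypothetically, that the routing key is derived only from the deployment's own identity (owner and name); the real key format and the fields of the Deployment model are not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical minimal model; the real one lives in models/deployment.py.
    owner: str
    name: str
    version_id: str  # backing version; may belong to a different account

    @property
    def key(self) -> str:
        # Assumed format: depends only on the deployment's identity, never on
        # version_id, so routing stays stable when the backing version changes.
        return f"deployment:{self.owner}/{self.name}"


deployment = Deployment(owner="acme", name="image-gen", version_id="v1")
key_before = deployment.key
deployment.version_id = "v2"  # re-point the deployment at a new version
print(deployment.key == key_before)  # True
```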
Configuration:
deployable_metadata_for_deployment_prediction()
Code references:
- Deployment model: replicate/web/models/models/deployment.py
- Kind validation: logic.py:1156 asserts kind == DeployableKind.DEPLOYMENT_PREDICTION and not deployment.used_by_model
Queue behavior: Standard shuffle-sharded queues per deployment
2. Function Predictions (Pipelines/Procedures)
What: Predictions for multi-step workflows (Replicate Pipelines). Function predictions run on CPU-only hardware using a shared container image that “swaps” in procedure source code at prediction time. This is similar to hotswaps, but for CPU-only Python code rather than GPU model weights.
Characteristics:
- Run on CPU hardware (no GPU)
- Share a base container image across procedures
- Procedure source code is swapped in at runtime (analogous to weight swapping in hotswaps)
- Part of a larger multi-step workflow
- Uses the AbstractProcedure model, not Version
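A minimal sketch of "swapping in" procedure source inside an already-running process, assuming procedures are plain Python modules exposing a run() entry point. Both the entry-point convention and the use of exec are illustrative assumptions; this document does not describe how the real runtime loads procedure code.

```python
import types


def load_procedure(source: str, name: str = "procedure") -> types.ModuleType:
    # Compile the procedure's source into a fresh module namespace inside the
    # running process, so no new container image is needed per procedure.
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


def run_procedure(source: str, inputs: dict):
    # Hypothetical convention: every procedure defines run(**inputs).
    module = load_procedure(source)
    return module.run(**inputs)


PROC_A = "def run(x):\n    return x * 2\n"
PROC_B = "def run(x):\n    return x + 100\n"

# Same process, two different procedures swapped in back to back.
print(run_procedure(PROC_A, {"x": 3}))  # 6
print(run_procedure(PROC_B, {"x": 3}))  # 103
```

Each call gets its own module namespace, so definitions from one procedure do not leak into the next; a production runtime would also need isolation and resource limits that this sketch ignores.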
Configuration:
deployable_metadata_for_procedure_prediction()
Code references:
- Procedure model: replicate/web/models/models/procedure.py
- Kind validation: logic.py:1165 asserts deployment.used_by_model when kind == DeployableKind.FUNCTION_PREDICTION
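The two used_by_model checks (logic.py:1156 for deployment predictions, logic.py:1165 for function predictions) can be summarized in one function. This is a sketch: Deployment is a hypothetical stand-in carrying only used_by_model, and it raises ValueError instead of using assert so the check still runs under python -O.

```python
from dataclasses import dataclass
from enum import Enum


class DeployableKind(str, Enum):
    # Only the two kinds these checks involve.
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"


@dataclass
class Deployment:
    # Hypothetical stand-in exposing only the flag the checks read.
    used_by_model: bool


def check_deployment_kind(kind: DeployableKind, deployment: Deployment) -> None:
    if kind == DeployableKind.DEPLOYMENT_PREDICTION and deployment.used_by_model:
        # Deployment predictions require used_by_model to be False.
        raise ValueError("deployment-prediction requires used_by_model=False")
    if kind == DeployableKind.FUNCTION_PREDICTION and not deployment.used_by_model:
        # Function predictions require used_by_model to be True.
        raise ValueError("function-prediction requires used_by_model=True")


# Both valid combinations pass silently.
check_deployment_kind(DeployableKind.DEPLOYMENT_PREDICTION, Deployment(used_by_model=False))
check_deployment_kind(DeployableKind.FUNCTION_PREDICTION, Deployment(used_by_model=True))
```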
Director configuration: Director runs procedures with DIRECTOR_JOB_KIND=procedure (director/config.go:37)
Queue behavior: Standard queues, no special routing
3. Version Predictions
What: Predictions running directly on a model version (not through a deployment).
Characteristics:
- Ephemeral infrastructure (scaled up/down based on demand)
- Configuration comes from the version’s current_prediction_deployable_config
- Two sub-types: normal and hotswap (see below)
Configuration:
deployable_metadata_for_version_prediction()
3a. Normal Version Predictions
What: Standard version predictions without hotswapping.
Characteristics:
- One container image per version
- Standard queue routing
- prefer_same_stream = False (default)
Code (logic.py:1214-1222):
```python
if not version.is_hotswappable:
    metadata = DeployableConfigSerializer(
        version.current_prediction_deployable_config
    ).data
```
3b. Hotswap Version Predictions
What: Versions that share a base container but load different weights at runtime. Multiple “hotswap versions” can run on the same pod by swapping weights instead of restarting containers.
Characteristics:
- Multiple versions share the same base Docker image
- Each version has additional_weights (weights loaded at runtime)
- Base version must be public, non-virtual, and accept Replicate weights
- prefer_same_stream = True: workers preferentially consume from the same stream to optimize weight locality
- Uses the base version’s deployment key for infrastructure sharing
When a version is hotswappable (version.py:586):
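```python
# logic.py reads this as an attribute (version.is_hotswappable), so in the full
# source it is presumably decorated with @property.
def is_hotswappable(self) -> bool:
    # Hotswapping only applies when there are extra weights to load.
    if not self.additional_weights:
        return False
    if not self.base_version:
        return False
    # The version's recorded base image must match the base version's image.
    if self.base_docker_image_id != self.base_version.docker_image_relation_id:
        return False
    return self.base_version.is_valid_hotswap_base
```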
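To exercise the hotswap rules outside Django, the sketch below reimplements is_hotswappable on a hypothetical dataclass. is_valid_hotswap_base is modeled as a plain boolean standing in for the "public, non-virtual, accepts Replicate weights" checks, and the method is written as a property because logic.py reads it as an attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    # Hypothetical stand-in for the Django Version model; field names follow
    # the snippet above, but real types and storage differ.
    docker_image_relation_id: Optional[int] = None
    additional_weights: Optional[str] = None
    base_version: Optional["Version"] = None
    base_docker_image_id: Optional[int] = None
    # Assumed to summarize the public / non-virtual / accepts-weights checks.
    is_valid_hotswap_base: bool = False

    @property
    def is_hotswappable(self) -> bool:
        if not self.additional_weights:
            return False
        if not self.base_version:
            return False
        if self.base_docker_image_id != self.base_version.docker_image_relation_id:
            return False
        return self.base_version.is_valid_hotswap_base


base = Version(docker_image_relation_id=42, is_valid_hotswap_base=True)
lora = Version(additional_weights="weights.tar", base_version=base, base_docker_image_id=42)
stale = Version(additional_weights="weights.tar", base_version=base, base_docker_image_id=41)

print(lora.is_hotswappable)   # True
print(stale.is_hotswappable)  # False: built on a different image than the base
```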
Configuration (logic.py:1229-1244):
```python
deployable_config_fields = {
    # ... other fields elided in this excerpt ...
    "docker_image": version.base_version.docker_image_relation,
    "fuse_config": None,
    "key": version.base_version.key_for_hotswap_base_predictions,
    "prefer_same_stream": True,  # KEY DIFFERENCE
}
```
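Putting the normal and hotswap branches together gives roughly the following selection logic. It is a sketch under stated assumptions: the Version dataclass is hypothetical, prediction_config stands in for the serialized current_prediction_deployable_config, and because the source elides which remaining fields a hotswap version keeps, the sketch assumes they come from the base version's config (matching "Base version config + weights" in the summary table). Weight loading itself is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    # Hypothetical stand-in; prediction_config plays the role of the serialized
    # current_prediction_deployable_config.
    prediction_config: dict
    docker_image_relation: str = ""
    key_for_hotswap_base_predictions: str = ""
    is_hotswappable: bool = False
    base_version: Optional["Version"] = None


def prediction_metadata(version: Version) -> dict:
    if not version.is_hotswappable:
        # Normal version: its own prediction config, unchanged.
        return dict(version.prediction_config)
    base = version.base_version
    assert base is not None  # a hotswappable version always has a base
    # Hotswap version: assume remaining fields come from the base config, then
    # override the fields that make it share the base version's infrastructure.
    metadata = dict(base.prediction_config)
    metadata.update(
        docker_image=base.docker_image_relation,
        fuse_config=None,
        key=base.key_for_hotswap_base_predictions,
        prefer_same_stream=True,
    )
    return metadata


base = Version(
    prediction_config={"key": "base", "prefer_same_stream": False},
    docker_image_relation="image:base",
    key_for_hotswap_base_predictions="hotswap:base",
)
normal = Version(prediction_config={"key": "normal", "prefer_same_stream": False})
hotswap = Version(prediction_config={}, is_hotswappable=True, base_version=base)

print(prediction_metadata(normal)["prefer_same_stream"])  # False
print(prediction_metadata(hotswap)["key"])                # hotswap:base
```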
Queue behavior:
- Shuffle-sharded queues (like all workloads)
- DIRECTOR_PREFER_SAME_STREAM=true is set for hotswap versions
- Director workers repeatedly consume from the same stream before checking others
- Optimizes for weight-caching locality (reduces weight download and loading overhead)
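The consumption policy can be modeled with a small simulation. This is a hypothetical model, not the director's Redis code in director/redis/queue.go: streams are in-memory lists, and a worker without prefer_same_stream is assumed to rotate to the next non-empty stream after every message.

```python
from typing import Dict, List, Optional


def next_stream(
    streams: Dict[str, List[str]],
    order: List[str],
    last: Optional[str],
    prefer_same_stream: bool,
) -> Optional[str]:
    # `order` is this worker's (shuffle-sharded) stream list; `last` is the
    # stream it consumed from most recently.
    if prefer_same_stream and last is not None and streams.get(last):
        # Stay on the stream just served while it still has work.
        return last
    # Otherwise rotate: start after `last` and take the first non-empty stream.
    start = order.index(last) + 1 if last in order else 0
    for i in range(len(order)):
        name = order[(start + i) % len(order)]
        if streams.get(name):
            return name
    return None


def drain(streams: Dict[str, List[str]], order: List[str], prefer_same_stream: bool) -> List[str]:
    # Simulate one worker consuming every message; return the streams it read.
    pending = {name: list(msgs) for name, msgs in streams.items()}
    served: List[str] = []
    last: Optional[str] = None
    while (name := next_stream(pending, order, last, prefer_same_stream)) is not None:
        pending[name].pop(0)
        served.append(name)
        last = name
    return served


streams = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
print(drain(streams, ["A", "B"], prefer_same_stream=True))   # ['A', 'A', 'B', 'B']
print(drain(streams, ["A", "B"], prefer_same_stream=False))  # ['A', 'B', 'A', 'B']
```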
Why prefer_same_stream matters:
- Hotswap versions load different weights into the same container
- If a worker has already loaded weights for version A, it’s faster to keep processing version A predictions
- Without prefer_same_stream, workers round-robin across all streams, losing weight locality benefits
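The locality benefit can be quantified with a simple cost model. Assuming, for illustration, that a hotswap container holds exactly one version's weights at a time, every change of version between consecutive predictions forces a weight load.

```python
from typing import Iterable, Optional


def count_weight_loads(versions_served: Iterable[str]) -> int:
    # Simplifying assumption: one set of weights resident at a time, so a load
    # happens whenever the next prediction targets a different version.
    loads = 0
    loaded: Optional[str] = None
    for version in versions_served:
        if version != loaded:
            loads += 1
            loaded = version
    return loads


grouped = ["A", "A", "A", "B", "B", "B"]      # prefer_same_stream-style ordering
interleaved = ["A", "B", "A", "B", "A", "B"]  # round-robin ordering
print(count_weight_loads(grouped))      # 2
print(count_weight_loads(interleaved))  # 6
```

The same six predictions cost two loads when grouped by stream and six when interleaved; real caches may keep several weight sets resident, which narrows but does not remove the gap.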
Code references:
- Version model: replicate/web/models/models/version.py
- Hotswap validation: version.py:586-595
- Queue affinity: director/config.go:50
- Redis implementation: director/redis/queue.go
4. Version Trainings
What: Training jobs for versions marked as trainable.
Characteristics:
- Uses a separate current_training_deployable_config (not the prediction config)
- Longer timeouts and different resource requirements
- Different billing model
- Director runs with DIRECTOR_JOB_KIND=training
Configuration:
deployable_metadata_for_version_training()
Code references:
- Training config: logic.py:1307-1308
- Director job kind: director/config.go:37
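The job-kind and config-source differences between workloads can be captured as lookup tables. The director_env helper below is hypothetical; it only illustrates how the values listed in this document would map onto the DIRECTOR_JOB_KIND and DIRECTOR_PREFER_SAME_STREAM variables, and the "false" spelling for the unset case is an assumption.

```python
from enum import Enum
from typing import Dict


class DeployableKind(str, Enum):
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"
    VERSION_PREDICTION = "version-prediction"
    VERSION_TRAINING = "version-training"


# Director job kind per workload, as listed in this document.
JOB_KIND: Dict[DeployableKind, str] = {
    DeployableKind.DEPLOYMENT_PREDICTION: "prediction",
    DeployableKind.FUNCTION_PREDICTION: "procedure",
    DeployableKind.VERSION_PREDICTION: "prediction",
    DeployableKind.VERSION_TRAINING: "training",
}

# Version-level config attribute read by each version workload.
CONFIG_ATTR: Dict[DeployableKind, str] = {
    DeployableKind.VERSION_PREDICTION: "current_prediction_deployable_config",
    DeployableKind.VERSION_TRAINING: "current_training_deployable_config",
}


def director_env(kind: DeployableKind, prefer_same_stream: bool = False) -> Dict[str, str]:
    # Hypothetical: render the per-workload settings as director env vars.
    return {
        "DIRECTOR_JOB_KIND": JOB_KIND[kind],
        "DIRECTOR_PREFER_SAME_STREAM": "true" if prefer_same_stream else "false",
    }


print(director_env(DeployableKind.VERSION_TRAINING))
# {'DIRECTOR_JOB_KIND': 'training', 'DIRECTOR_PREFER_SAME_STREAM': 'false'}
```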
Summary Table
| Workload Type | Infrastructure | Config Source | prefer_same_stream | Director Job Kind |
|---|---|---|---|---|
| Deployment Prediction | Persistent (can scale to 0 replicas) | Deployment config | False | prediction |
| Function Prediction | Ephemeral | Procedure config | False | procedure |
| Version Prediction (normal) | Ephemeral | Version prediction config | False | prediction |
| Version Prediction (hotswap) | Shared base | Base version config + weights | True | prediction |
| Version Training | Ephemeral | Version training config | False | training |
Key Takeaways
- Replicate has explicit workload type separation with different configurations and behaviors per type
- Hotswap predictions are unique: they’re the only workload using prefer_same_stream=True, which optimizes for weight locality
- Function predictions use code swapping: similar to hotswaps, but for CPU-only Python code instead of GPU model weights
- Each workload type has distinct characteristics: different infrastructure models, configuration sources, and queue behaviors
These workload distinctions will be referenced throughout the document when discussing queue behavior, timeouts, scaling, and other operational characteristics.