Workload Types in Replicate
Replicate handles several distinct workload types, each with different characteristics and configurations. Understanding these distinctions is essential when comparing Replicate with Workers AI, which has a more uniform workload model.
Overview
Replicate defines four primary workload types via the DeployableKind enum (replicate/web/models/models/deployable_config.py:114):
from django.db import models


class DeployableKind(models.TextChoices):
    DEPLOYMENT_PREDICTION = "deployment-prediction", "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction", "function-prediction"
    VERSION_PREDICTION = "version-prediction", "version-prediction"
    VERSION_TRAINING = "version-training", "version-training"
Each workload type has its own deployable_metadata_for_* function in replicate/web/models/logic.py that generates the appropriate configuration.
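The sketch below mirrors the enum with Python's standard enum module (instead of Django's TextChoices, so it runs without Django) and shows how a caller might route a stored kind string to the matching deployable_metadata_for_* function. The dispatch table and the helper metadata_function_for are hypothetical; the real functions in logic.py take model objects and return serialized configuration rather than names.

```python
from enum import Enum


class DeployableKind(str, Enum):
    """Plain-Python mirror of the Django TextChoices enum (values copied from the excerpt)."""
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"
    VERSION_PREDICTION = "version-prediction"
    VERSION_TRAINING = "version-training"


# Hypothetical dispatch table: each kind maps to the name of the logic.py
# function that builds its configuration. The real code may not use a table.
METADATA_FUNCTION_FOR_KIND = {
    DeployableKind.DEPLOYMENT_PREDICTION: "deployable_metadata_for_deployment_prediction",
    DeployableKind.FUNCTION_PREDICTION: "deployable_metadata_for_procedure_prediction",
    DeployableKind.VERSION_PREDICTION: "deployable_metadata_for_version_prediction",
    DeployableKind.VERSION_TRAINING: "deployable_metadata_for_version_training",
}


def metadata_function_for(raw_kind: str) -> str:
    """Parse a stored kind string and return the name of its metadata builder."""
    kind = DeployableKind(raw_kind)  # raises ValueError for unknown kinds
    return METADATA_FUNCTION_FOR_KIND[kind]


print(metadata_function_for("function-prediction"))
# deployable_metadata_for_procedure_prediction
```

Note that function predictions are built by the procedure-named function, which matches the Configuration line in section 2.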
1. Deployment Predictions
What: Predictions running on a Deployment, which is a stable, long-lived identifier that routes to a backing model version. The backing version can be changed over time and doesn’t need to be owned by the same account as the deployment.
Characteristics:
- Persistent deployment entity (configuration/routing), but infrastructure can scale to 0 replicas
- Custom configuration per deployment that can override many version-level settings
- Uses a dedicated deployment key for consistent routing
Configuration:
deployable_metadata_for_deployment_prediction()
Code references:
- Deployment model: replicate/web/models/models/deployment.py
- Kind validation: logic.py:1156 asserts kind == DeployableKind.DEPLOYMENT_PREDICTION and not deployment.used_by_model
Queue behavior: Standard shuffle-sharded queues per deployment
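Shuffle sharding gives each key a small, deterministic subset of a larger pool of queue shards, so two busy deployments rarely share all of their shards. The sketch below is a generic illustration, not Director's implementation. It assumes a pool of NUM_SHARDS shards with SHARDS_PER_KEY shards per key (both numbers hypothetical, as are the example keys) and derives each subset from a hash of the deployment key.

```python
import hashlib
import random

NUM_SHARDS = 16      # hypothetical size of the shard pool
SHARDS_PER_KEY = 4   # hypothetical number of shards assigned to each key


def shards_for_key(key: str, num_shards: int = NUM_SHARDS, k: int = SHARDS_PER_KEY) -> list[int]:
    """Deterministically pick k distinct shards for a key (shuffle sharding)."""
    # Seed a private RNG from a stable hash so every process picks the same subset.
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_shards), k))


a = shards_for_key("deployment:acme/chat")
b = shards_for_key("deployment:other/image-gen")
print(a, b)
```

Because each key gets its own combination of shards, another deployment is fully blocked by a noisy neighbor only if every one of its shards overlaps with the neighbor's.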
2. Function Predictions (Pipelines/Procedures)
What: Predictions for multi-step workflows (Replicate Pipelines). Function predictions run on CPU-only hardware using a shared container image that “swaps” in procedure source code at prediction time. This is similar to hotswapping, but it swaps CPU-only Python code rather than GPU model weights.
Characteristics:
- Run on CPU hardware (no GPU)
- Share a base container image across procedures
- Procedure source code is swapped in at runtime (analogous to weight swapping in hotswaps)
- Part of a larger multi-step workflow
- Uses the AbstractProcedure model, not Version
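The code-swapping idea can be sketched as a single long-lived runner that compiles each procedure's source on first use and reuses it afterwards, much as a hotswap container keeps loaded weights around. This is a minimal sketch, assuming each procedure's source defines a run(...) entry point; the ProcedureRunner class, the entry-point name, and the cache keyed on (procedure id, source hash) are all hypothetical and are not Replicate's actual mechanism.

```python
from typing import Any, Callable


class ProcedureRunner:
    """Hypothetical CPU-only runner that swaps procedure code in at prediction time."""

    def __init__(self) -> None:
        # Compiled entry points, keyed by (procedure_id, hash of its source).
        self._cache: dict[tuple[str, int], Callable[..., Any]] = {}
        self.loads = 0  # how many times source has been compiled

    def _load(self, procedure_id: str, source: str) -> Callable[..., Any]:
        key = (procedure_id, hash(source))
        if key not in self._cache:
            namespace: dict[str, Any] = {}
            # Execute the procedure source in a fresh namespace; it must define run().
            exec(compile(source, f"<procedure {procedure_id}>", "exec"), namespace)
            self._cache[key] = namespace["run"]
            self.loads += 1
        return self._cache[key]

    def predict(self, procedure_id: str, source: str, **inputs: Any) -> Any:
        return self._load(procedure_id, source)(**inputs)


runner = ProcedureRunner()
src = "def run(text):\n    return text.upper()\n"
print(runner.predict("proc-1", src, text="hello"))  # HELLO
print(runner.predict("proc-1", src, text="again"))  # AGAIN (served from the cache)
print(runner.loads)  # 1
```

exec provides no isolation, so a real system running user-supplied procedures would need sandboxing that this sketch omits.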
Configuration:
deployable_metadata_for_procedure_prediction()
Code references:
- Procedure model: replicate/web/models/models/procedure.py
- Kind validation: logic.py:1165 asserts deployment.used_by_model when kind == DeployableKind.FUNCTION_PREDICTION
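The two assertions referenced for deployment and function predictions tie the kind to the deployment's used_by_model flag in opposite directions. The sketch below restates that rule on a stand-in dataclass; only used_by_model and the kind values come from this document, while the Deployment dataclass and validate_kind are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DeployableKind(str, Enum):
    DEPLOYMENT_PREDICTION = "deployment-prediction"
    FUNCTION_PREDICTION = "function-prediction"
    VERSION_PREDICTION = "version-prediction"
    VERSION_TRAINING = "version-training"


@dataclass
class Deployment:
    """Stand-in for the Django Deployment model; only used_by_model matters here."""
    name: str
    used_by_model: bool


def validate_kind(kind: DeployableKind, deployment: Deployment) -> None:
    """Restate the checks at logic.py:1156 and logic.py:1165 (sketch)."""
    if kind == DeployableKind.DEPLOYMENT_PREDICTION:
        # Deployment predictions require used_by_model to be False.
        assert not deployment.used_by_model, f"{deployment.name} is used by a model"
    elif kind == DeployableKind.FUNCTION_PREDICTION:
        # Function predictions require used_by_model to be True.
        assert deployment.used_by_model, f"{deployment.name} is not used by a model"
    # Version kinds are not constrained by these two checks.


validate_kind(DeployableKind.FUNCTION_PREDICTION, Deployment("pipeline-backing", used_by_model=True))
```

Like the original, this relies on assert, which Python skips under -O; code that must always enforce the rule would raise an exception explicitly.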
Director configuration: Director runs procedures with DIRECTOR_JOB_KIND=procedure (director/config.go:37)
Queue behavior: Standard queues, no special routing
3. Version Predictions
What: Predictions running directly on a model version (not through a deployment).
Characteristics:
- Ephemeral infrastructure (scaled up/down based on demand)
- Configuration comes from the version’s current_prediction_deployable_config
- Two sub-types: normal and hotswap (see below)
Configuration:
deployable_metadata_for_version_prediction()
3a. Normal Version Predictions
What: Standard version predictions without hotswapping.
Characteristics:
- One container image per version
- Standard queue routing
- prefer_same_stream = False (default)
Code:
logic.py:1214-1222
if not version.is_hotswappable:
    metadata = DeployableConfigSerializer(
        version.current_prediction_deployable_config
    ).data
3b. Hotswap Version Predictions
What: Versions that share a base container but load different weights at runtime. Multiple “hotswap versions” can run on the same pod by swapping weights instead of restarting containers.
Characteristics:
- Multiple versions share the same base Docker image
- Each version has additional_weights (weights loaded at runtime)
- The base version must be public, non-virtual, and accept Replicate weights
- prefer_same_stream = True: workers preferentially consume from the same stream to optimize weight locality
- Uses the base version’s deployment key for infrastructure sharing
When a version is hotswappable (version.py:586):
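@property
def is_hotswappable(self) -> bool:
    # Hotswapping requires runtime weights on top of a base version.
    if not self.additional_weights:
        return False
    if not self.base_version:
        return False
    # The version must be built on exactly the base version's Docker image.
    if self.base_docker_image_id != self.base_version.docker_image_relation_id:
        return False
    return self.base_version.is_valid_hotswap_base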
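To exercise the hotswap rule outside Django, the sketch below re-creates it on a plain dataclass. The fields read by is_hotswappable come from the excerpt. is_valid_hotswap_base is not shown in this document, so it is approximated from the three conditions listed under Characteristics (public, non-virtual, accepts Replicate weights) using the hypothetical fields is_public, is_virtual, and accepts_replicate_weights; modeling additional_weights as a list of strings is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    docker_image_relation_id: int
    additional_weights: list[str] = field(default_factory=list)  # assumed type
    base_version: Optional[Version] = None
    base_docker_image_id: Optional[int] = None
    # Hypothetical flags approximating the real is_valid_hotswap_base logic.
    is_public: bool = False
    is_virtual: bool = False
    accepts_replicate_weights: bool = False

    @property
    def is_valid_hotswap_base(self) -> bool:
        return self.is_public and not self.is_virtual and self.accepts_replicate_weights

    @property
    def is_hotswappable(self) -> bool:
        if not self.additional_weights:
            return False
        if not self.base_version:
            return False
        if self.base_docker_image_id != self.base_version.docker_image_relation_id:
            return False
        return self.base_version.is_valid_hotswap_base


base = Version(docker_image_relation_id=1, is_public=True, accepts_replicate_weights=True)
child = Version(
    docker_image_relation_id=2,
    additional_weights=["lora.safetensors"],
    base_version=base,
    base_docker_image_id=1,
)
print(child.is_hotswappable)  # True
```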
Configuration:
logic.py:1229-1244
deployable_config_fields = {
    # ... (other fields elided)
    "docker_image": version.base_version.docker_image_relation,
    "fuse_config": None,
    "key": version.base_version.key_for_hotswap_base_predictions,
    "prefer_same_stream": True,  # KEY DIFFERENCE
}
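Combining the normal and hotswap branches, the sketch below uses plain dicts in place of model objects and serializer output. It shows that two hotswap versions built on the same base resolve to the same image and key (so they share infrastructure) with prefer_same_stream=True, while a normal version keeps its own prediction config. The helper prediction_metadata, the dict shapes, and the example values are hypothetical, and the fields elided in the excerpt are deliberately left out.

```python
from typing import Any


def prediction_metadata(version: dict[str, Any]) -> dict[str, Any]:
    """Sketch of the normal vs hotswap branch for version predictions."""
    if not version["is_hotswappable"]:
        # Normal version: use its own prediction deployable config as-is.
        return dict(version["current_prediction_deployable_config"])

    base = version["base_version"]
    return {
        # Other fields are elided in the excerpt and omitted here.
        "docker_image": base["docker_image_relation"],
        "fuse_config": None,
        "key": base["key_for_hotswap_base_predictions"],
        "prefer_same_stream": True,
    }


base = {"docker_image_relation": "image-1", "key_for_hotswap_base_predictions": "base-key"}
lora_a = {"is_hotswappable": True, "base_version": base, "current_prediction_deployable_config": {}}
lora_b = {"is_hotswappable": True, "base_version": base, "current_prediction_deployable_config": {}}
plain = {
    "is_hotswappable": False,
    "current_prediction_deployable_config": {"key": "plain-key", "prefer_same_stream": False},
}

print(prediction_metadata(lora_a)["key"] == prediction_metadata(lora_b)["key"])  # True
print(prediction_metadata(plain)["prefer_same_stream"])  # False
```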
Queue behavior:
- Shuffle-sharded queues (like all workloads)
- DIRECTOR_PREFER_SAME_STREAM=true is set for hotswap versions
- Director workers repeatedly consume from the same stream before checking others
- Optimizes for weight-caching locality (reduces weight download and loading overhead)
Why prefer_same_stream matters:
- Hotswap versions load different weights into the same container
- If a worker has already loaded weights for version A, it’s faster to keep processing version A predictions
- Without prefer_same_stream, workers round-robin across all streams and lose the weight-locality benefit
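A small simulation makes the trade-off concrete. It assumes a single worker, one in-memory stream per hotswap version, and a cost of one weight load whenever the worker switches to a version other than the one currently loaded. The function count_weight_loads and its flag are hypothetical and mirror the Director setting in name only; the real behavior lives in director/redis/queue.go and may differ.

```python
from collections import deque


def count_weight_loads(streams: dict[str, list[str]], prefer_same_stream: bool) -> int:
    """Drain all streams with one worker and count how often weights are (re)loaded."""
    queues = {name: deque(items) for name, items in streams.items()}
    order = list(queues)  # fixed round-robin order of stream (version) names
    loaded = None         # version whose weights are currently loaded
    loads = 0
    cursor = 0            # round-robin position
    while any(queues.values()):
        if prefer_same_stream and loaded is not None and queues[loaded]:
            name = loaded  # keep consuming from the same stream while it has work
        else:
            # Round-robin: advance to the next non-empty stream.
            while not queues[order[cursor % len(order)]]:
                cursor += 1
            name = order[cursor % len(order)]
            cursor += 1
        queues[name].popleft()
        if name != loaded:
            loads += 1  # switching versions means loading different weights
            loaded = name
    return loads


streams = {"version-a": ["p1", "p2", "p3"], "version-b": ["p4", "p5", "p6"]}
print(count_weight_loads(streams, prefer_same_stream=False))  # 6
print(count_weight_loads(streams, prefer_same_stream=True))   # 2
```

The cost of this affinity is fairness: while the current stream has work, predictions on other streams wait longer.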
Code references:
- Version model: replicate/web/models/models/version.py
- Hotswap validation: version.py:586-595
- Queue affinity: director/config.go:50
- Redis implementation: director/redis/queue.go
4. Version Trainings
What: Training jobs for versions marked as trainable.
Characteristics:
- Uses a separate current_training_deployable_config (not the prediction config)
- Longer timeouts and different resource requirements
- Different billing model
- Director runs with DIRECTOR_JOB_KIND=training
Configuration:
deployable_metadata_for_version_training()
Code references:
- Training config: logic.py:1307-1308
- Director job kind: director/config.go:37
Summary Table
| Workload Type | Infrastructure | Config Source | prefer_same_stream | Director Job Kind |
|---|---|---|---|---|
| Deployment Prediction | Persistent | Deployment config | False | prediction |
| Function Prediction | Ephemeral | Procedure config | False | procedure |
| Version Prediction (normal) | Ephemeral | Version prediction config | False | prediction |
| Version Prediction (hotswap) | Shared base | Base version config + weights | True | prediction |
| Version Training | Ephemeral | Version training config | False | training |
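The table can also be encoded as data, which makes claims such as "hotswap is the only workload with prefer_same_stream" mechanically checkable. A minimal sketch; the WorkloadProfile name and the row labels are hypothetical, while the values are copied from the table.

```python
from typing import NamedTuple


class WorkloadProfile(NamedTuple):
    infrastructure: str
    config_source: str
    prefer_same_stream: bool
    director_job_kind: str


WORKLOADS = {
    "deployment-prediction": WorkloadProfile("Persistent", "Deployment config", False, "prediction"),
    "function-prediction": WorkloadProfile("Ephemeral", "Procedure config", False, "procedure"),
    "version-prediction (normal)": WorkloadProfile("Ephemeral", "Version prediction config", False, "prediction"),
    "version-prediction (hotswap)": WorkloadProfile("Shared base", "Base version config + weights", True, "prediction"),
    "version-training": WorkloadProfile("Ephemeral", "Version training config", False, "training"),
}

same_stream = [name for name, p in WORKLOADS.items() if p.prefer_same_stream]
print(same_stream)  # ['version-prediction (hotswap)']

job_kinds = sorted({p.director_job_kind for p in WORKLOADS.values()})
print(job_kinds)  # ['prediction', 'procedure', 'training']
```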
Key Takeaways
- Replicate has explicit workload type separation with different configurations and behaviors per type
- Hotswap predictions are unique: they’re the only workload that uses prefer_same_stream=True, which optimizes for weight locality
- Function predictions use code swapping: similar to hotswaps, but for CPU-only Python code instead of GPU model weights
- Each workload type has distinct characteristics: different infrastructure models, configuration sources, and queue behaviors
These workload distinctions will be referenced throughout the document when discussing queue behavior, timeouts, scaling, and other operational characteristics.