Backup & restore
This page covers backup and restore strategies for the stateful components of an ArmoniK deployment.
What needs to be backed up
ArmoniK has the following stateful components:
| Component | Contains | Backup priority |
|---|---|---|
| MongoDB | Task metadata, session data, authentication config | High |
| Object storage (MinIO / S3 / GCS) | Task payloads and results | High |
| Queue (ActiveMQ / RabbitMQ / SQS / PubSub) | In-flight tasks | Low (see note below) |
| Prometheus | Metrics history | Low (recreatable, useful only for historical dashboards) |
| Seq / log storage | Structured application logs | Low (useful for post-mortem analysis only) |
The queue does not need to be backed up. Tasks that are in flight when a failure occurs are lost and surface as errors on the client side. Re-submission is not automatic: the client (or operator) must explicitly re-submit the affected tasks.
MongoDB
Managed services (recommended)
When using MongoDB Atlas or a cloud-managed equivalent, enable automated backups through the provider’s interface. Atlas supports continuous backups with point-in-time restore.
Self-managed MongoDB
Use mongodump to create a snapshot of the database:
mongodump --uri="mongodb://<host>:27017" --out=/backup/$(date +%Y%m%d)
To restore:
mongorestore --uri="mongodb://<host>:27017" /backup/<date>
Schedule regular dumps with a cron job and store the output in a location outside the Kubernetes cluster (e.g. an S3 bucket or NFS share).
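The scheduling advice above might be sketched as a small script that cron sources and calls. Only the mongodump invocation comes from this page; the function names, the BACKUP_ROOT, MONGO_URI and RETENTION_DAYS variables, their default values, and the script path in the crontab comment are illustrative assumptions, not ArmoniK defaults.

```shell
#!/bin/sh
# Sketch of a scheduled MongoDB dump with simple retention.
# All three variables are hypothetical names; adjust them to your setup.
BACKUP_ROOT="${BACKUP_ROOT:-/backup}"
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"

# Print the dump directory for a date stamp (YYYYMMDD, defaults to today).
dump_dir() {
    echo "${BACKUP_ROOT}/${1:-$(date +%Y%m%d)}"
}

# Dump the database into today's directory.
run_dump() {
    target="$(dump_dir)"
    mkdir -p "$target" || return 1
    mongodump --uri="$MONGO_URI" --out="$target"
}

# Delete dump directories last modified more than RETENTION_DAYS days ago.
prune_dumps() {
    find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d \
        -mtime "+${RETENTION_DAYS}" -exec rm -rf {} +
}

# Example crontab entry (hypothetical path), running at 02:00 every day:
#   0 2 * * * . /opt/armonik/mongo-backup.sh && run_dump && prune_dumps
```

The sketch only writes to a local directory. Copying each dump to a location outside the cluster, as recommended above, would be an additional step after run_dump.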
For a running Kubernetes deployment, run mongodump via a pod:
kubectl -n armonik exec deploy/mongodb -- mongodump --out=/tmp/backup
kubectl -n armonik cp mongodb-<pod>:/tmp/backup ./backup
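As an alternative sketch, mongodump can write a single compressed archive to standard output, which avoids the intermediate copy inside the pod. This assumes the same armonik namespace and deploy/mongodb target as the commands above, and that mongodump inside the pod can connect without extra credentials. The function names and the archive file naming are illustrative.

```shell
# Name of the local archive for a date stamp (YYYYMMDD, defaults to today).
archive_name() {
    echo "armonik-mongodb-${1:-$(date +%Y%m%d)}.archive.gz"
}

# Stream a gzip-compressed archive from the pod straight to a local file.
# With --archive and no file name, mongodump writes the archive to stdout.
dump_to_archive() {
    out="${1:-$(archive_name)}"
    kubectl -n armonik exec deploy/mongodb -- mongodump --archive --gzip > "$out"
}

# Restore by streaming the archive back into mongorestore over stdin.
restore_from_archive() {
    kubectl -n armonik exec -i deploy/mongodb -- mongorestore --archive --gzip < "$1"
}
```

For example, calling dump_to_archive with no argument would write today's archive into the current directory, and restore_from_archive with that file name would load it back.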
After restoring MongoDB
If authentication is enabled, the RoleData, UserData, and AuthData collections are populated from parameters.tfvars by the authentication-in-database Job, not from the MongoDB backup itself. After restoring MongoDB, re-run this Job to repopulate those collections from the current configuration:
kubectl -n armonik get job authentication-in-database -o json \
| jq "del(.spec.selector)" \
| jq "del(.spec.template.metadata.labels)" \
| kubectl -n armonik replace --force -f -
Object storage
AWS S3
Enable S3 versioning on the bucket used by ArmoniK. Use S3 lifecycle rules to move older versions to cheaper storage tiers and expire them after a retention period.
For cross-region disaster recovery, enable S3 Cross-Region Replication.
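With the AWS CLI, versioning and a lifecycle rule for older versions could be set up roughly as follows. The bucket name, rule ID, storage class, and the 30 and 90 day thresholds are example values chosen for illustration, not ArmoniK defaults.

```shell
# Hypothetical bucket name; replace with the bucket ArmoniK actually uses.
BUCKET="${BUCKET:-armonik-bucket}"

# Write a lifecycle policy that moves noncurrent versions to STANDARD_IA
# after 30 days and expires them after 90 days (example values).
write_lifecycle() {
    cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "armonik-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
    }
  ]
}
EOF
}

# Turn on versioning, then attach the lifecycle policy to the bucket.
enable_s3_versioning() {
    write_lifecycle lifecycle.json || return 1
    aws s3api put-bucket-versioning --bucket "$BUCKET" \
        --versioning-configuration Status=Enabled || return 1
    aws s3api put-bucket-lifecycle-configuration --bucket "$BUCKET" \
        --lifecycle-configuration file://lifecycle.json
}
```

If the bucket is managed by Terraform, the same settings are probably better expressed there so that a later terraform apply does not revert them.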
GCP Cloud Storage (GCS)
Enable object versioning on the GCS bucket. Use lifecycle management rules to control retention.
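A comparable sketch for GCS, using gsutil. The bucket URL and the 90-day retention are illustrative assumptions; the rule deletes object versions 90 days after they become noncurrent.

```shell
# Hypothetical bucket URL; replace with the bucket ArmoniK actually uses.
GCS_BUCKET="${GCS_BUCKET:-gs://armonik-bucket}"

# Write a lifecycle policy that deletes noncurrent versions after 90 days.
write_gcs_lifecycle() {
    cat > "$1" <<'EOF'
{
  "rule": [
    {
      "action": { "type": "Delete" },
      "condition": { "daysSinceNoncurrentTime": 90 }
    }
  ]
}
EOF
}

# Turn on object versioning, then attach the lifecycle policy.
enable_gcs_versioning() {
    write_gcs_lifecycle gcs-lifecycle.json || return 1
    gsutil versioning set on "$GCS_BUCKET" || return 1
    gsutil lifecycle set gcs-lifecycle.json "$GCS_BUCKET"
}
```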
MinIO (on-premises / local)
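MinIO supports server-side bucket replication to a secondary MinIO instance, configured through the MinIO console or the mc replicate commands. A simpler client-side alternative is mc mirror, which watches the source bucket and copies changes to the secondary instance for as long as the command keeps running: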
mc mirror --watch minio/armonik-bucket backup-minio/armonik-bucket
For periodic snapshots, use mc cp --recursive or mc mirror to copy the bucket contents to an external location.
Certificates
Local deployments generate TLS certificates that expire after a configurable period (default: 7 days, i.e. 168 hours; recommended: 8760 hours, i.e. one year). These certificates are not backed up; if they are lost, redeploy to regenerate them.
For production deployments using custom certificates, store the CA and client certificates securely in a secrets manager (e.g. AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and reference them at deploy time.
Disaster recovery checklist
Restore MongoDB from the latest snapshot.
Re-run the authentication-in-database job if authentication is enabled.
Verify object storage is accessible and intact.
Redeploy ArmoniK infrastructure if needed (terraform apply).
Confirm all pods reach Running state with kubectl get po -A.
Re-submit any tasks that were in flight at the time of failure.
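The pod check in the list above could be scripted along these lines. This is a minimal sketch: the function names are illustrative, and treating Completed as healthy is an assumption made so that finished Job pods (such as the authentication-in-database Job) do not count as failures.

```shell
# Read `kubectl get po -A --no-headers` output on stdin and print
# namespace/name for every pod whose STATUS (4th column) is neither
# Running nor Completed.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Query the cluster and fail if any pod is not yet healthy.
check_pods() {
    bad="$(kubectl get po -A --no-headers | unhealthy_pods)"
    if [ -n "$bad" ]; then
        echo "Pods not yet healthy:" >&2
        echo "$bad" >&2
        return 1
    fi
    echo "All pods are Running or Completed."
}
```

Running check_pods after terraform apply gives a quick pass or fail signal before any tasks are re-submitted.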