Deployment Configuration

All ArmoniK deployments are controlled through a single Terraform variables file called parameters.tfvars, located in the target deployment directory:

| Deployment | File |
|---|---|
| Local (localhost) | infrastructure/quick-deploy/localhost/parameters.tfvars |
| AWS | infrastructure/quick-deploy/aws/parameters.tfvars |
| GCP | infrastructure/quick-deploy/gcp/parameters.tfvars |

The sections below describe each top-level block. Omitting an optional block disables that component; setting a block to {} uses the module defaults.


Storage back-ends

Object storage

Uncomment exactly one of the following. Only the uncommented block is deployed.

# Self-hosted Redis (default for localhost)
redis = {}

# Self-hosted MinIO (S3-compatible)
# minio = {}

# NFS share
# nfs = {
#   server = "172.30.37.125"
#   path   = "/srv/files"
# }

# AWS ElastiCache (Redis)
# elasticache = {
#   engine             = "redis"
#   engine_version     = "6.x"
#   node_type          = "cache.r4.large"
#   num_cache_clusters = 2
# }

# AWS S3 (recommended for AWS deployments)
# s3_os = {}

Queue

Uncomment exactly one of the following:

# ActiveMQ (default for localhost/Kubernetes)
activemq = {
  activemq_opts_memory = "-Xms1g -Xmx3g"
}

# RabbitMQ (≥ 4.0, alternative for on-premises)
# rabbitmq = {}

# AWS SQS (recommended for AWS deployments)
# sqs = {}

Database

mongodb = {
  cluster = {
    replicas      = 1       # Must not exceed the number of nodes in the state-database node group
    database_name = "database"
  }
  persistence = {
    shards    = { storage_size = "8Gi" }
    configsvr = { storage_size = "3Gi" }
  }
}

Set mongodb = null and use mongodb_atlas instead to connect to a managed MongoDB Atlas cluster:

mongodb_atlas = {
  project_id   = "<your_project_id>"
  cluster_name = "<your_cluster_name>"
}
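Taken together, switching to Atlas might look like the following sketch. It only combines the two settings described above; the placeholder values are kept as-is and must be replaced with your own.

```hcl
# Disable the self-hosted MongoDB deployment
mongodb = null

# Connect to an existing MongoDB Atlas cluster instead
mongodb_atlas = {
  project_id   = "<your_project_id>"
  cluster_name = "<your_cluster_name>"
}
```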

Control Plane

control_plane = {
  limits = {
    cpu    = "1000m"
    memory = "2048Mi"
  }
  requests = {
    cpu    = "50m"
    memory = "50Mi"
  }
  default_partition = "default"   # Must match a key in compute_plane
  node_selector     = { service = "control-plane" }
}

default_partition is the partition used when a task is submitted without an explicit partition ID. It must exist in compute_plane.
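As a minimal illustration of that constraint, the fragment below pairs a default_partition value with a matching compute_plane key. It is deliberately abbreviated and not a complete configuration: only the fields needed to show the relationship appear, and the remaining control plane and partition fields are assumed to be filled in as in the full examples on this page.

```hcl
control_plane = {
  default_partition = "default"   # must name a key of compute_plane
}

compute_plane = {
  # This key is what control_plane.default_partition refers to
  default = {
    replicas = 1
    # polling_agent, worker and hpa settings go here
  }
}
```

Renaming the partition key without updating default_partition would leave tasks submitted without a partition ID pointing at a partition that no longer exists.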


Compute Plane (partitions)

Each key in compute_plane defines one partition. See the partitioning guide for a full discussion of when and why to use multiple partitions.

compute_plane = {
  my-partition = {
    replicas                         = 0   # Starting replica count; KEDA scales from here
    termination_grace_period_seconds = 30
    node_selector                    = { service = "workers" }

    polling_agent = {
      limits   = { cpu = "2000m", memory = "2048Mi" }
      requests = { cpu = "50m",   memory = "50Mi"   }
    }

    worker = [
      {
        image             = "my-registry/my-worker"
        tag               = "1.0.0"
        image_pull_policy = "IfNotPresent"
        limits            = { cpu = "1000m", memory = "1024Mi" }
        requests          = { cpu = "50m",   memory = "50Mi"   }
      }
    ]

    hpa = {
      type              = "prometheus"
      polling_interval  = 15    # seconds between KEDA checks
      cooldown_period   = 300   # seconds before scaling down after queue empties
      min_replica_count = 0
      max_replica_count = 100
      behavior = {
        restore_to_original_replica_count = true
        stabilization_window_seconds      = 300
        type                              = "Percent"
        value                             = 100
        period_seconds                    = 15
      }
      triggers = [
        {
          type      = "prometheus"
          threshold = 2   # tasks-per-pod ratio that triggers scale-up
        }
      ]
    }
  }
}

Key HPA fields:

| Field | Effect |
|---|---|
| min_replica_count = 0 | Allows the partition to scale to zero when idle (saves cost on spot instances) |
| max_replica_count | Hard ceiling on worker pods for this partition |
| cooldown_period | How long to wait after the queue empties before removing pods |
| triggers[].threshold | Scale up when the queue depth per running pod exceeds this value |
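As a rough worked example, assume the trigger metric is the number of queued tasks Q for the partition and that KEDA passes the threshold to the Kubernetes Horizontal Pod Autoscaler as an average-value target. Both are assumptions about the deployed trigger, not something this page specifies. Ignoring the HPA tolerance and the behavior policies, the desired replica count R is then approximately:

```latex
R = \min\Bigl(\texttt{max\_replica\_count},\ \max\bigl(\texttt{min\_replica\_count},\ \lceil Q / \texttt{threshold} \rceil\bigr)\Bigr)
```

With threshold = 2 and max_replica_count = 100, as in the partition example, 50 queued tasks give ceil(50 / 2) = 25 pods, while 500 queued tasks give ceil(500 / 2) = 250, which is capped at 100.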


Ingress

ingress = {
  tls                  = false  # Enable TLS termination at the ingress
  mtls                 = false  # Enable mutual TLS (requires client certificates)
  generate_client_cert = false  # Auto-generate client certificates
}

# Set to null to disable ingress entirely:
# ingress = null
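For example, a deployment that should only accept clients presenting a certificate could combine the three flags as sketched below. This assumes that mtls is used together with tls and that auto-generated client certificates are acceptable in your environment; neither point is stated on this page.

```hcl
ingress = {
  tls                  = true  # terminate TLS at the ingress
  mtls                 = true  # additionally require a client certificate
  generate_client_cert = true  # let the deployment generate client certificates
}
```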

Authentication

Add the following block to require authentication on the gRPC API:

authentication = {
  require_authentication  = true
  require_authorization   = true
  authentication_datafile = "/path/to/auth.json"
  trusted_common_names    = ["armonik.mcp", "armonik.admin"]
}

See the authentication guide for the format of the authentication_datafile.


Core environment variables

The configurations block injects environment variables into ArmoniK Core containers. Variables follow the Section__Key double-underscore convention used by ASP.NET Core.

configurations = {
  core = {
    env = {
      # Queue (AMQP — ActiveMQ / RabbitMQ)
      Amqp__MaxPriority      = "10"
      Amqp__MaxRetries       = "5"
      Amqp__LinkCredit       = "2"  # Number of unacknowledged messages the client can prefetch from the queue
      Amqp__ParallelismLimit = "1"  # Number of concurrent AMQP sessions used by the polling agent

      # Database
      MongoDB__DataRetention                 = "1.00:00:00"  # How long to keep completed task records (d.hh:mm:ss)
      MongoDB__TableStorage__PollingDelayMin = "00:00:01"
      MongoDB__TableStorage__PollingDelayMax = "00:00:10"

      # Object storage (Redis / ElastiCache)
      Redis__TtlTimeSpan = "1.00:00:00"  # TTL for task payloads and results
    }
  }
  control = {
    env = {
      Submitter__MaxErrorAllowed = 50  # Max task errors before a session is marked as failed
    }
  }
  worker = {
    env = {
      # Worker-specific env vars (passed to your worker container)
      target_zip_path = "/tmp"
    }
  }
  jobs = {
    env = {
      MongoDB__DataRetention = "1.00:00:00"
    }
  }
}
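To make the naming convention concrete: the ASP.NET Core environment-variable configuration provider treats a double underscore as a section separator, so each variable name maps to a hierarchical configuration key. The fragment below annotates one variable from the block above with the key it resolves to. How ArmoniK Core consumes that key is not covered here.

```hcl
configurations = {
  core = {
    env = {
      # Resolves to the configuration key "MongoDB:TableStorage:PollingDelayMin",
      # that is: section MongoDB, subsection TableStorage, key PollingDelayMin.
      MongoDB__TableStorage__PollingDelayMin = "00:00:01"
    }
  }
}
```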

Common tuning scenarios:

  • High throughput: increase Amqp__ParallelismLimit and Amqp__LinkCredit so the polling agent fetches and processes more messages concurrently.

  • Cost optimisation: reduce MongoDB__DataRetention and Redis__TtlTimeSpan to lower storage requirements.
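The two scenarios could translate into overrides such as the following. The numbers are illustrative assumptions chosen only to show the direction of each change, not recommended values; measure on your own workload before and after changing them.

```hcl
configurations = {
  core = {
    env = {
      # High throughput: more concurrent AMQP sessions and a larger prefetch window
      Amqp__ParallelismLimit = "4"
      Amqp__LinkCredit       = "8"

      # Cost optimisation: keep records and payloads for 12 hours instead of 1 day
      MongoDB__DataRetention = "0.12:00:00"
      Redis__TtlTimeSpan     = "0.12:00:00"
    }
  }
}
```

The jobs block in the full example carries its own MongoDB__DataRetention, so it presumably needs the same change to stay consistent.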


Monitoring components

Each monitoring component accepts a node_selector and, optionally, resource overrides:

prometheus = {
  node_selector = { service = "metrics" }
}

metrics_exporter = {
  node_selector = { service = "metrics" }
}

grafana = {
  node_selector = { service = "monitoring" }
}

seq = {
  node_selector = { service = "monitoring" }
}

fluent_bit = {
  is_daemonset  = true
  node_selector = {}
}

Set a component to null (or omit it entirely) to disable it. See Monitoring & Metrics for guidance on sizing Prometheus and the Metrics Exporter.
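For instance, a cluster that keeps metrics collection but needs neither Grafana dashboards nor Seq might use a fragment like this sketch, which reuses only the settings shown above:

```hcl
prometheus = {
  node_selector = { service = "metrics" }
}

metrics_exporter = {
  node_selector = { service = "metrics" }
}

# Disabled components: setting null has the same effect as omitting the block
grafana = null
seq     = null
```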


Logging level

logging_level = "Information"  # Verbose | Debug | Information | Warning | Error | Fatal

Environment description (Admin GUI banner)

environment_description = {
  name        = "aws-dev"
  version     = "0.1.0"
  description = "AWS environment"
  color       = "#80ff80"  # Any valid CSS colour; displayed as the GUI header colour
}

This controls the coloured banner at the top of the Admin GUI. Use a distinct colour per environment (e.g. red for production, green for dev) to reduce the risk of accidentally running commands against the wrong cluster.
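A production counterpart to the block above might look like this. The name, version, description, and colour value are illustrative assumptions.

```hcl
environment_description = {
  name        = "aws-prod"
  version     = "0.1.0"
  description = "AWS production environment"
  color       = "#ff4040"  # red, so the production cluster stands out in the GUI
}
```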


Admin GUI

admin_gui = {
  limits   = { cpu = "1000m", memory = "1024Mi" }
  requests = { cpu = "100m",  memory = "128Mi"  }
  node_selector = { service = "monitoring" }
}

static = {
  gui_configuration = {}  # Paste exported GUI JSON here to ship a default configuration
}

See Personalizing the Admin GUI for the gui_configuration format.