
GCP Marketplace — Package Reference

This is a technical reference for the OpenDSO GCP Marketplace package: how the deployer works, what it creates, and the configuration, security, networking, and persistence model behind it. It complements the hands-on guides:


Executive Summary

OpenDSO is a near-real-time Distribution System Operator (DSO) platform built by Open Energy Solutions Inc. (OES) for managing grid-edge devices, DERs, and microgrid operations using the OpenFMB standard. This GCP Marketplace package deploys the full OpenDSO stack onto a GKE cluster via a single Helm release managed by a custom deployer image.

The package bundles a deployer script, a Helm umbrella chart (38 subcharts), the Marketplace schema, and a verification script, which together deploy the full "backoffice" tier of the platform. Production installs do not rely on shipped plaintext credential fallbacks, TLS is standardized around a release-scoped secret with compatibility aliases for older mounts, and several backend services run with explicit numeric non-root security contexts.


1. Overview

What OpenDSO Is in This Marketplace Package

OpenDSO is an open-source platform enabling interoperability, edge intelligence, and application management for electric distribution grids. It uses the OpenFMB standard (UCAIUG) as its interoperability layer and NATS as its internal message bus. The platform supports DER management, conservation voltage reduction (CVR), ESS management, topology modeling from CIM data, and real-time operational dashboards.

The GCP Marketplace package deploys the full "backoffice" tier of the platform — all central services, databases, and UI applications — as a single Helm release onto a GKE cluster.

Source: opendso-gcp-marketplace/README.md

Main Components and Deployment Shape

A single Helm umbrella chart (opendso, version 0.1.0, appVersion 1.0.0) with 37 internal subcharts + 1 external (Grafana 10.5.14). All components are deployed into a single Kubernetes namespace under a single Helm release name.

| Category | Components |
|---|---|
| Infrastructure | NATS (messaging, with NKey auth callout), Keycloak 24.0 (identity), Grafana 10.5.14 (monitoring) |
| Databases | MongoDB 7.0.2-ubi8, Citus 12.1.2-alpine (PostgreSQL), TimescaleDB 2.26.0-pg16, Keycloak-DB (PostgreSQL 17, optional) |
| Core Services | GMS API, Historian, OpenFMB Event Service, NATS Auth Service, Topology Genesis, Topology Nodes |
| Grid Applications | DER Dispatch (app + svc), ESS Manager (app + svc + Redis), ESS Tester (app + svc), Asset Health (svc + sim-svc), CVR (3 services), OmegaDSS, RPCDSS |
| Frontend Apps | One-Line, GIS, Historian, Inspector, Inventory, Data Viewer, Event Viewer, DER Dispatch App, ESS Manager App, ESS Tester App, Schedule Dispatch, OpenFMB Event Creator, OpenDSO Docs, Genesis Node |

Source: chart/Chart.yaml, chart/README.md, chart/values.yaml

Intended Runtime Environment

  • GKE cluster (Kubernetes 1.24+)
  • External nginx ingress controller
  • Wildcard DNS pointing to nginx LoadBalancer
  • TLS certificate ideally pre-created as a Kubernetes secret; the chart can also generate a self-signed fallback in the Marketplace path for test or recovery scenarios
  • Images served from GCP Artifact Registry (mirrored from OES registry)

2. Installation Model

How the Marketplace Deployment Works at a High Level

The deployer is a custom Docker image that extends the GCP deployer_helm base image. It executes deployer/deploy.sh, which performs these steps in sequence:

  1. NKey generation (step 1/3): Generates NATS NKey pairs (account, user, curve/xkey). On re-runs, existing keys from the <release>-nats-auth-keys secret are reused to avoid NATS reconfiguration.
  2. Secret creation (step 2/3): Creates/updates the <release>-nats-auth-keys Kubernetes secret. Generates or reuses passwords for opendso-apps-db and citus-db.
  3. Topology Genesis ConfigMap (step 3a): Pre-creates the topology-genesis site ConfigMap via kubectl apply --server-side. This is required because cim.xml is 367KB, exceeding the 262KB Kubernetes annotation limit for client-side apply.
  4. License secret (step 3a-2): Creates the <release>-opendso-license secret containing LICENSE_KEY, LICENSE_INSTALLATION_KEY, and LICENSE_ENVIRONMENT_NAME (the UID of the kube-system namespace, used as a stable cluster identifier).
  5. Keycloak client secrets (step 3b): Generates UUID secrets per Keycloak client, injects them into the realm JSON (so Keycloak imports pre-populated secrets on first boot), and creates <release>-<client>-keycloak-env Kubernetes secrets.
  6. Helm deploy (step 3/3): Runs helm upgrade --install with a 15-minute timeout, combining values-gcp.yaml, user values from /data/user/values.yaml, and an auto-generated NATS auth overlay.
  7. Keycloak secret sync (step 4/4): Pushes client secrets to Keycloak via the Admin REST API after deployment. This step is non-fatal if Keycloak is not yet ready, because the secrets are already in the realm JSON from step 3b.
  8. Status patch: Calls patch_assembly_phase.sh --status="Success".
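Step 3a works around a size limit: client-side kubectl apply copies the whole object into the kubectl.kubernetes.io/last-applied-configuration annotation, and Kubernetes caps the combined annotations on an object at 262,144 bytes (256 KiB, the "262KB" above). A minimal Python sketch of that check, assuming the file size is the dominant term (the real annotation also carries metadata and JSON escaping, which only make it larger); the function name is illustrative and not taken from deploy.sh:

```python
# Kubernetes limits the combined size of all annotations on one object.
ANNOTATION_LIMIT_BYTES = 256 * 1024  # 262,144 bytes


def needs_server_side_apply(payload_bytes: int, overhead_bytes: int = 0) -> bool:
    """True if client-side apply would overflow the annotation size limit.

    Client-side `kubectl apply` stores the full object in the
    last-applied-configuration annotation, so the payload plus any
    metadata/encoding overhead has to fit under the limit.
    """
    return payload_bytes + overhead_bytes > ANNOTATION_LIMIT_BYTES


cim_xml_bytes = 367 * 1000  # cim.xml, roughly 367 KB
print(needs_server_side_apply(cim_xml_bytes))  # True
```

Server-side apply tracks field ownership in managedFields instead of that annotation, which is why it sidesteps the limit.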

Source: deployer/deploy.sh

Role of Deployer, Helm Chart, and Configuration Values

  • Deployer image (deployer/Dockerfile, deployer/deploy.sh): Orchestrates all pre-Helm setup, secret generation, and Keycloak provisioning. Extends deployer_helm.
  • Helm chart (chart/): Umbrella chart with 38 subcharts. All services are configured via values.yaml defaults, overridden by values-gcp.yaml (GCP-specific production settings), user-supplied Marketplace UI parameters, and an auto-generated NATS auth overlay.
  • schema.yaml: Defines the GCP Marketplace UI parameters (name, namespace, domain, license key, installation key, Keycloak/MongoDB/Grafana credentials, resource profile, image registry).

Values precedence (last wins): values.yaml (chart defaults) → values-gcp.yaml → /data/user/values.yaml (Marketplace UI inputs) → auto-generated NATS overlay → --set overrides for release-name-dependent values.
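The precedence chain behaves like Helm's values coalescing: nested maps are merged key by key, and scalars or lists from a later layer replace earlier ones outright. A simplified Python model follows; it ignores Helm specifics such as a null value deleting a key, and the layer contents in the usage are illustrative rather than copied from the chart:

```python
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Deep-merge `override` onto `base`; the override wins on conflicts.

    Nested maps merge key by key. Any other value (scalar or list) in
    `override` replaces the corresponding value in `base` wholesale.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_values(*layers: dict[str, Any]) -> dict[str, Any]:
    """Apply value layers in order: chart defaults first, --set overrides last."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = merge_values(result, layer)
    return result


# Illustrative layers (not the real file contents).
chart_defaults = {"global": {"storageClass": "standard", "domain": ""},
                  "imagePullSecrets": [{"name": "example-pull-secret"}]}
gcp_overlay = {"global": {"storageClass": "standard-rwo"}}
user_values = {"global": {"domain": "opendso.example.com"}}
nats_overlay = {"imagePullSecrets": []}
set_overrides = {"global": {"tls": {"existingSecret": "demo-tls-secret"}}}

values = effective_values(chart_defaults, gcp_overlay, user_values,
                          nats_overlay, set_overrides)
print(values["global"])
```

Note how the empty imagePullSecrets list from the overlay replaces the default list rather than merging with it; that is what lets the GCP overlay switch pulls over to node or Workload Identity credentials.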

Source: deployer/deploy.sh, schema.yaml, chart/values.yaml, chart/values-gcp.yaml

Release/Namespace Behavior

  • All resources are deployed into a single namespace specified by the Marketplace UI.
  • The Helm release name equals the application instance name (APP_INSTANCE_NAME) from the Marketplace framework.
  • All service references use {{ .Release.Name }}-<service> naming patterns, enabling multi-release deployments in separate namespaces.
  • The deployer adopts any Application resource pre-created by mpdev into Helm management via annotation/label patching.
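Helm 3 (3.2 and later) adopts an existing object into a release, instead of failing with an ownership conflict, when the object carries Helm's ownership metadata. A Python sketch of a merge patch that sets it, assuming the standard Helm label and annotation keys; deploy.sh may apply the same metadata with kubectl label/annotate instead, and the release and namespace names in the usage are placeholders:

```python
import json


def helm_adoption_patch(release: str, namespace: str) -> dict:
    """Merge patch that marks an existing object as owned by a Helm release."""
    return {
        "metadata": {
            # Helm checks this label plus the two annotations before adopting.
            "labels": {"app.kubernetes.io/managed-by": "Helm"},
            "annotations": {
                "meta.helm.sh/release-name": release,
                "meta.helm.sh/release-namespace": namespace,
            },
        }
    }


patch = json.dumps(helm_adoption_patch("demo", "opendso"))
# Could be applied with:
#   kubectl patch applications.app.k8s.io demo -n opendso --type merge -p "$PATCH"
print(patch)
```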

Source: chart/README.md, deployer/deploy.sh

What Is Created During Install

  • Kubernetes Deployments for all enabled services and frontends
  • StatefulSets for MongoDB, Citus, opendso-apps-db, Keycloak-DB (if enabled), ESS Manager Redis
  • Services (ClusterIP) for all components
  • Ingress resources: <release>-ingress (UI apps), <release>-ingress-api (GMS API), <release>-ingress-nats-ws (NATS WebSocket)
  • PVCs for stateful components (MongoDB, Citus, opendso-apps-db, ESS Manager Redis, Grafana, Keycloak, topology-genesis, openfmb-event-service, and asset-health-sim-svc when enabled)
  • Secrets: NATS auth keys, Keycloak env per client, Grafana credentials, database credentials, release-scoped TLS secret, and TLS compatibility aliases (root-ca, server-cert, server-key) when TLS management is enabled
  • ConfigMaps: site configs (topology-genesis, keycloak realm, mongodb init, etc.), frontend environment config
  • Jobs: <release>-mongodb-init (runs once to initialize MongoDB collections/users)
  • Roles, RoleBindings, ServiceAccounts for specific services (gms-api, mongodb, der-dispatch-svc)
  • An Application resource (app.k8s.io/v1beta1) for GCP Marketplace tracking
  • PodDisruptionBudgets for omegadss-svc and rpcdss-svc

Source: chart/Chart.yaml, chart/templates/, individual subchart templates


3. Prerequisites

Cluster Requirements

  • GKE cluster, Kubernetes 1.24+, Helm 3.8+
  • nginx-ingress controller installed and running (external LoadBalancer IP assigned)
  • app.k8s.io/v1beta1 Application CRD installed (required by mpdev tooling)
  • GKE node service account must have Artifact Registry Reader access (for image pulls; imagePullSecrets is set to [] in the GCP overlay — Workload Identity or node SA role required)

Source: opendso-gcp-marketplace/README.md, deployer/deploy.sh (NATS_AUTH_VALUES sets imagePullSecrets: [])

DNS / Ingress / TLS Requirements

  • Wildcard DNS *.yourdomain.com pointing to the nginx LoadBalancer IP

  • The preferred production model is a TLS certificate pre-created as a Kubernetes secret named <release-name>-tls-secret in the target namespace

    Options documented:

    • cert-manager + Let's Encrypt (recommended)
    • Self-signed certificate via mkcert
  • Important nuance: When global.gcpMarketplace=true (set by the deployer), tls-secrets.yaml will auto-generate a self-signed <release>-tls-secret if it does not already exist and will also create the legacy alias secrets root-ca, server-cert, and server-key for workloads that still mount those names. This allows the deployer to proceed without a pre-existing cert, but the resulting certificate is self-signed and not trusted by browsers. The intended long-term production model is still a real pre-created or cert-manager-issued certificate.
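For the preferred path, the secret only needs the standard kubernetes.io/tls shape under the release-scoped name. A Python sketch that builds that manifest, equivalent in shape to kubectl create secret tls; it assumes the chart reads the conventional tls.crt and tls.key keys, and the PEM bytes in the usage are dummies:

```python
import base64
import json


def _b64(data: bytes) -> str:
    """Secret `data` values must be base64-encoded strings."""
    return base64.b64encode(data).decode("ascii")


def tls_secret_manifest(release: str, namespace: str,
                        cert_pem: bytes, key_pem: bytes) -> dict:
    """Build the <release>-tls-secret manifest as a kubernetes.io/tls Secret.

    Same shape as:
      kubectl create secret tls <release>-tls-secret \
        --cert=tls.crt --key=tls.key -n <namespace>
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": f"{release}-tls-secret", "namespace": namespace},
        "data": {"tls.crt": _b64(cert_pem), "tls.key": _b64(key_pem)},
    }


manifest = tls_secret_manifest("demo", "opendso",
                               b"<certificate PEM>", b"<private key PEM>")
print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Creating this secret before the deployer runs means tls-secrets.yaml finds an existing <release>-tls-secret and skips the self-signed fallback.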

Source: opendso-gcp-marketplace/README.md, chart/templates/tls-secrets.yaml

Storage Requirements

  • Default storage class: standard (values.yaml) / standard-rwo (values-gcp.yaml)
  • PVCs created at install:
| Component | Size (GCP profile) | Notes |
|---|---|---|
| MongoDB | 10Gi data + 5Gi log | deleteOnUpgrade: false |
| Citus DB | 50Gi | Production size; dev default is 10Gi |
| opendso-apps-db | 10Gi | Hosts ess_tester + assets DBs |
| Grafana | 10Gi | Dashboard persistence |
| Keycloak | 10Gi | Realm data persistence |
| topology-genesis | (chart default) | CIM topology data |
| openfmb-event-service | (chart default) | Event data |
| asset-health-sim-svc | (chart default) | Simulation data |
| ESS Manager Redis | 1Gi | Cache persistence |
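Summing the explicitly sized rows gives a lower bound on the persistent-disk capacity the install requests; the three chart-default PVCs come on top. A small worked tally in Python, using only the sizes listed in the table:

```python
# Requested sizes from the GCP-profile table, in GiB. PVCs that use the
# chart default size are left out because their sizes are not listed.
GCP_PROFILE_PVCS_GIB = {
    "mongodb-data": 10,
    "mongodb-log": 5,
    "citus-db": 50,
    "opendso-apps-db": 10,
    "grafana": 10,
    "keycloak": 10,
    "ess-manager-redis": 1,
}

total = sum(GCP_PROFILE_PVCS_GIB.values())
print(f"Known PVC requests: {total}Gi")  # Known PVC requests: 96Gi
```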

Source: chart/values-gcp.yaml, chart/charts/*/templates/pvc.yaml

Image Registry / Pull Access Assumptions

  • All images must be present in GCP Artifact Registry before deploying
  • The GCP overlay sets imagePullSecrets: [] — pull access relies on GKE Workload Identity or the node service account's Artifact Registry Reader IAM role
  • OES images (gms-api, historian, topology-genesis, etc.) plus third-party images (NATS, MongoDB, Citus, Keycloak, Redis, TimescaleDB, Envoy) must all be mirrored
  • schema.yaml images section is populated for the chart-managed default Marketplace image set

Source: schema.yaml, chart/values.yaml, chart/values-gcp.yaml, deployer/deploy.sh

Required Secrets, Licenses, and Keys

The following must be obtained from OES before deployment:

  • OpenDSO License Key (license.key): required; stored as LICENSE_KEY in the <release>-opendso-license secret and validated at runtime by topology-nodes
  • OpenDSO Installation Key (installation.key): required; stored as LICENSE_INSTALLATION_KEY in the same secret

The following credentials are user-supplied in the Marketplace UI:

  • Keycloak admin password
  • MongoDB root password and app password
  • Grafana admin password

All credentials are passed via GCP Marketplace's MASKED_FIELD mechanism and stored in Kubernetes Secrets.

Source: schema.yaml, deployer/deploy.sh, chart/values-gcp.yaml (topology-nodes env section)


4. Configuration Model

Important Marketplace Inputs (schema.yaml properties)

| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| name | string | yes | | Injected by Marketplace (release name) |
| namespace | string | yes | | Injected by Marketplace |
| license.key | MASKED_FIELD | yes | | OES-issued license key |
| installation.key | MASKED_FIELD | yes | | OES-issued installation key |
| global.domain | string | yes | | Base domain, e.g. opendso.example.com |
| global.imageRegistry | string | no | "" | Artifact Registry prefix |
| keycloak.config.adminUser | string | no | admin | |
| keycloak.config.adminPassword | MASKED_FIELD | yes | | |
| mongodb.auth.rootUsername | string | no | root | |
| mongodb.auth.rootPassword | MASKED_FIELD | yes | | |
| mongodb.auth.username | string | no | opendso | |
| mongodb.auth.password | MASKED_FIELD | yes | | |
| grafana.adminUser | string | no | admin | |
| grafana.adminPassword | MASKED_FIELD | yes | | |
| global.resourceProfile | enum | no | default | minimal, default, or production |

Source: schema.yaml

Domain, TLS, Keycloak, Database, and Storage Configuration

Domain: global.domain sets the base domain. All ingress routes and Keycloak URLs are derived from it. Frontend apps receive global.environment.apiUrl and global.environment.natsUrl via a frontend-environment-configmap rendered from this domain.
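Every browser-facing URL derives from the single global.domain value. A Python sketch of that derivation using the hostnames from the public endpoint list in section 6; it assumes each URL is just scheme plus hostname (the chart's templates may append paths), and the dataclass and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontendEnvironment:
    api_url: str       # global.environment.apiUrl
    nats_url: str      # global.environment.natsUrl
    keycloak_url: str  # global.keycloak.url


def derive_frontend_environment(domain: str) -> FrontendEnvironment:
    """Derive browser-facing URLs from global.domain."""
    domain = domain.strip().rstrip(".")
    if not domain:
        raise ValueError("global.domain must be set")
    return FrontendEnvironment(
        api_url=f"https://api.{domain}",
        nats_url=f"wss://nats.{domain}",
        keycloak_url=f"https://keycloak.{domain}",
    )


print(derive_frontend_environment("opendso.example.com"))
```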

TLS: The deployer passes --set global.tls.existingSecret=<release>-tls-secret --set ingress.tls.secretName=<release>-tls-secret --set nats.tls.secretName=<release>-tls-secret. If the secret does not exist prior to Helm running, tls-secrets.yaml generates a self-signed cert when global.gcpMarketplace=true and creates root-ca, server-cert, and server-key compatibility secrets for older workload mounts.

Keycloak: The deployer sets:

  • global.keycloak.internalUrl: http://<release>-keycloak-svc:8080 (for in-cluster service communication)
  • global.keycloak.url: https://keycloak.<domain> (for browser-facing flows)
  • Keycloak is configured with realm name oes, client ID gms (shared by UI and API)
  • keycloak.config.hostnameStrict: "false" is set directly in values-gcp.yaml; no deployer --set override is needed or present

Databases: Internal databases are enabled by default:

  • MongoDB (settings_api DB, user opendso)
  • Citus DB (ofmb_db, user citususer) — historian time-series data
  • opendso-apps-db (TimescaleDB, user essuser) — hosts ess_tester and assets databases
  • keycloak-db (PostgreSQL) — disabled by default; Keycloak uses in-chart persistence (keycloak.persistence.enabled: true)

External database overrides are supported for all three databases via externalDatabase.* values.

Storage: global.storageClass: standard-rwo in the GCP overlay. All PVC-backed components use this storage class. pd-ssd (exposed on GKE through the premium-rwo storage class) is documented as an alternative for production workloads.

Source: chart/values.yaml, chart/values-gcp.yaml, deployer/deploy.sh, chart/templates/

Values Expected from User vs Generated Automatically

User-provided (Marketplace UI):

  • Domain, license key, installation key, all passwords, image registry, resource profile

The production chart path no longer relies on shipped plaintext password defaults for MongoDB, Keycloak, Grafana, Citus, or opendso-apps-db. Those values must be supplied explicitly or created by the deploy flow.

Deployer-generated at deploy time:

  • NATS NKey pairs (account seed, user seed, xkey seed) → stored in <release>-nats-auth-keys secret
  • Keycloak client secrets (generated UUIDs) → stored in <release>-<client>-keycloak-env secrets
  • opendso-apps-db password → stored in <release>-opendso-apps-db-secret
  • citus-db password → stored in <release>-citus-db-secret
  • License environment name (cluster UID from kube-system)
  • topology-genesis ConfigMap (server-side applied)

Helm-generated:

  • TLS secrets (self-signed fallback if not pre-existing, plus root-ca, server-cert, and server-key aliases when managed by the chart)
  • Grafana credentials secret
  • MongoDB admin/app secrets
  • GMS API MongoDB secret
  • Grafana datasource credentials

Which Values Must Remain Release-Specific

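The following values depend on the release name (global.keycloak.url and global.environment.apiUrl depend on the install domain). Helm does not render template expressions such as {{ .Release.Name }} inside values files, so, as documented in deployer/deploy.sh, the deployer computes these values and passes them as --set overrides: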

  • grafana.admin.existingSecret
  • grafana.envValueFrom.CITUS_PASSWORD.secretKeyRef.name
  • grafana.envValueFrom.OPENDSO_APPS_DB_PASSWORD.secretKeyRef.name
  • global.tls.existingSecret
  • ingress.tls.secretName
  • nats.tls.secretName
  • global.keycloak.internalUrl
  • global.environment.apiUrl
  • global.keycloak.url
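A Python sketch of assembling those overrides for a given release and domain. The secret and service names follow the <release>-<name> patterns documented in this reference; the https://api.<domain> form of apiUrl and the use of the deployer-generated <release>-citus-db-secret and <release>-opendso-apps-db-secret for the Grafana env refs are assumptions, and grafana.admin.existingSecret is omitted because its secret name is not documented here:

```python
def release_specific_set_args(release: str, domain: str) -> list[str]:
    """Build helm --set flags for values that cannot live in static values files."""
    tls_secret = f"{release}-tls-secret"
    overrides = {
        "global.tls.existingSecret": tls_secret,
        "ingress.tls.secretName": tls_secret,
        "nats.tls.secretName": tls_secret,
        "global.keycloak.internalUrl": f"http://{release}-keycloak-svc:8080",
        "global.keycloak.url": f"https://keycloak.{domain}",
        # Assumed format; the chart may expect a path suffix.
        "global.environment.apiUrl": f"https://api.{domain}",
        # Assumed: Grafana reads the deployer-generated DB password secrets.
        "grafana.envValueFrom.CITUS_PASSWORD.secretKeyRef.name":
            f"{release}-citus-db-secret",
        "grafana.envValueFrom.OPENDSO_APPS_DB_PASSWORD.secretKeyRef.name":
            f"{release}-opendso-apps-db-secret",
    }
    args: list[str] = []
    for key, value in overrides.items():
        args += ["--set", f"{key}={value}"]
    return args


print(" ".join(release_specific_set_args("demo", "opendso.example.com")))
```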

Source: chart/README.md, deployer/deploy.sh


5. Security Model

TLS Expectations

  • TLS terminates at the nginx ingress layer; internal service-to-service traffic is generally plain HTTP over ClusterIP Services, with NATS (below) as the main exception
  • NATS uses TLS (nats.tls.enabled: true in values-gcp.yaml) with the same <release>-tls-secret
  • MongoDB TLS is explicitly disabled (mongodb.tls.enabled: false) — internal ClusterIP-only traffic
  • Keycloak in values-gcp.yaml has httpsEnabled: true referencing cert files, and hostnameStrict: "false" is set directly in values-gcp.yaml; no deployer override is needed or present
  • The NATS WebSocket ingress (ingress-nats-ws.yaml) uses nginx.ingress.kubernetes.io/backend-protocol: "HTTPS", indicating it connects to NATS over TLS internally

Source: chart/values-gcp.yaml, deployer/deploy.sh, chart/templates/ingress-nats-ws.yaml

Secret Handling

  • License key and installation key are stored in <release>-opendso-license Kubernetes Secret (created by deployer, not Helm, so not in Helm state)
  • NATS NKey seeds stored in <release>-nats-auth-keys Kubernetes Secret
  • Keycloak client secrets stored in per-client <release>-<client>-keycloak-env Kubernetes Secrets
  • All Marketplace credential inputs (MASKED_FIELD) stored as Kubernetes Secrets by the deployer framework
  • The production chart path does not ship plaintext fallback passwords; sensitive credentials must come from Marketplace inputs, deployer-generated secrets, or pre-created secrets
  • TLS is standardized around <release>-tls-secret; when the chart manages TLS it also creates root-ca, server-cert, and server-key compatibility secrets for workloads that still mount legacy names
  • NKey private seeds are never committed to the repository; they are generated at deploy time
  • Keycloak client secrets are generated as UUIDs at deploy time and never stored in the repository
  • Database internal passwords (opendso-apps-db, citus-db) are generated via python3 -c 'import secrets; print(secrets.token_urlsafe(24))' and stored in dedicated secrets
  • On re-runs, all generated secrets are reused (idempotent)
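The reuse-or-generate rule for the internal database passwords fits in a few lines of Python. A sketch, where `existing` stands in for the value read back from the cluster (for example from <release>-citus-db-secret) and the function name is hypothetical; the generator call is the same secrets.token_urlsafe(24) the deployer uses, which yields a 32-character URL-safe string:

```python
import secrets
from typing import Optional


def reuse_or_generate_password(existing: Optional[str]) -> str:
    """Keep a previously stored password; otherwise mint a new one."""
    if existing:
        # Re-run: the database was initialized with this value, so keep it.
        return existing
    # Same call as the deployer's one-liner: 24 random bytes, base64url, no padding.
    return secrets.token_urlsafe(24)


first = reuse_or_generate_password(None)   # first install: new password
again = reuse_or_generate_password(first)  # re-run: unchanged
print(len(first), again == first)          # 32 True
```

Reuse matters because the database volumes persist across upgrades: a freshly generated password would no longer match the one the database was initialized with.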

Source: deployer/deploy.sh, opendso-gcp-marketplace/README.md

Authentication / Keycloak Model

  • Keycloak 24.0 is deployed in-cluster, accessible externally at https://keycloak.<domain>
  • Realm name: oes
  • Client ID shared by UI and API: gms
  • Per-service NATS clients each have their own Keycloak client ID and secret, injected via <release>-<client>-keycloak-env secrets
  • The realm JSON (configs/ieee13/keycloak/realm/oes-realm.json) contains REPLACE_SECRET_<client-id> placeholders. The deployer replaces these with generated UUIDs before Helm runs.
  • Grafana anonymous access is disabled (GF_AUTH_ANONYMOUS_ENABLED: "false")
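A Python sketch of the placeholder substitution, assuming placeholders appear as plain REPLACE_SECRET_<client-id> strings and client IDs consist of letters, digits, underscores, and hyphens. The client IDs in the usage are made up, and previously stored secrets are reused to match the idempotent re-run behavior:

```python
import json
import re
import uuid

PLACEHOLDER = re.compile(r"REPLACE_SECRET_([A-Za-z0-9_-]+)")


def inject_client_secrets(realm_json: str,
                          existing: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace REPLACE_SECRET_<client-id> placeholders with client secrets.

    Secrets already in `existing` (e.g. read back from the
    <release>-<client>-keycloak-env Secrets) are reused; missing ones
    are generated as UUIDs. Returns the rendered JSON and the mapping.
    """
    secrets_by_client = dict(existing)

    def substitute(match: re.Match) -> str:
        client_id = match.group(1)
        if client_id not in secrets_by_client:
            secrets_by_client[client_id] = str(uuid.uuid4())
        return secrets_by_client[client_id]

    return PLACEHOLDER.sub(substitute, realm_json), secrets_by_client


realm = json.dumps({"realm": "oes", "clients": [
    {"clientId": "example-svc-a", "secret": "REPLACE_SECRET_example-svc-a"},
    {"clientId": "example-svc-b", "secret": "REPLACE_SECRET_example-svc-b"},
]})
rendered, client_secrets = inject_client_secrets(realm, {"example-svc-a": "previously-stored"})
print(json.loads(rendered)["clients"])
```

The returned mapping is what would then be written into the per-client Kubernetes secrets, so Keycloak and the services see the same value.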

Source: deployer/deploy.sh, chart/values.yaml

Non-Root / Container Security Posture

The chart uses a mixed hardening model rather than forcing one policy across every workload:

  • Selected backend services that are known to support numeric non-root execution use explicit non-root security contexts. Examples include omegadss-svc, rpcdss-svc, ess-manager-svc, ess-tester-svc, nats-auth-svc, and keycloak.
  • Additional backend services such as historian-svc, topology-genesis, openfmb-event-service, and der-dispatch-svc also carry explicit non-root-oriented security settings in their subchart values.
  • Frontend apps generally carry baseline hardening (seccompProfile.type: RuntimeDefault, allowPrivilegeEscalation: false, dropped capabilities), but universal runAsNonRoot is not yet documented as safe for every frontend image.
  • Some stateful or broker-style workloads remain intentionally more conservative because their startup routines still need image-default filesystem behavior. This currently applies to components such as NATS, Citus DB, opendso-apps-db, and ESS Manager Redis.
  • MongoDB and keycloak-db have their own image-specific security contexts, but those should still be validated against the exact runtime path used in Marketplace testing.

In practice, the hardening baseline that is most consistently applied across the chart is:

  • allowPrivilegeEscalation: false
  • capabilities.drop: [ALL]
  • seccompProfile.type: RuntimeDefault
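A Python sketch of a check against that three-part baseline for a single container securityContext. It looks only at container-level fields (in Kubernetes a pod-level seccompProfile would also satisfy the seccomp item) and deliberately does not test runAsNonRoot, since the chart does not apply it uniformly; the function name is hypothetical:

```python
BASELINE = {
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}


def baseline_gaps(security_context: dict) -> list[str]:
    """Return the baseline items a container securityContext does not meet."""
    gaps: list[str] = []
    if security_context.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation should be false")
    if "ALL" not in security_context.get("capabilities", {}).get("drop", []):
        gaps.append("capabilities.drop should include ALL")
    if security_context.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        gaps.append("seccompProfile.type should be RuntimeDefault")
    return gaps


print(baseline_gaps(BASELINE))  # []
print(baseline_gaps({"allowPrivilegeEscalation": False}))
```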

Source: chart/values.yaml, chart/values-gcp.yaml

Documented Limitations or Exceptions

  • MongoDB TLS is disabled; relies on ClusterIP network isolation
  • GMS API's Docker API is pointed to http://127.0.0.1:2376 on GKE because GKE uses containerd (no Docker socket). Orchestration API calls will fail gracefully.
  • keycloak-db (separate PostgreSQL for Keycloak) is disabled by default; Keycloak uses its built-in persistence with a PVC
  • grafana.rbac.namespaced: true is set in values-gcp.yaml because the GCP Marketplace deployer SA only has namespace-scoped permissions

Source: chart/values-gcp.yaml


6. Networking

Public Endpoints

After deployment at <domain>:

| Endpoint | URL Pattern | Backend Service / Port |
|---|---|---|
| GMS / Genesis Node (root) | https://gms.<domain> and https://<domain> | genesis-node-app :8081 |
| Keycloak | https://keycloak.<domain> | keycloak-svc :8080 |
| Grafana | https://grafana.<domain> | grafana :80 |
| GMS API | https://api.<domain> | gms-api :8000 |
| NATS WebSocket | wss://nats.<domain> | nats-service-ws :9222 |
| GIS App | https://gis.<domain> | gis-app :8084 |
| One-Line App | https://oneline.<domain> | one-line-app :8085 |
| Event Viewer | https://eventviewer.<domain> | event-viewer-app :8088 |
| Inventory | https://inventory.<domain> | inventory-app :8089 |
| OpenFMB Event Creator | https://openfmbeventcreator.<domain> | openfmb-event-creator-app :8090 |
| Data Viewer | https://dataviewer.<domain> | data-viewer-app :8093 |
| Inspector (OpenFMB) | https://openfmb.<domain> | inspector-app :8086 |
| DER Dispatch | https://derdispatch.<domain> | der-dispatch-app :8095 |
| ESS Manager | https://device.<domain> | ess-manager-app :8094 |
| ESS Tester | https://esstesting.<domain> | ess-tester-app :8096 |
| Historian | https://historian.<domain> | historian-app :8087 |
| Schedule Dispatch | https://scheduledispatch.<domain> | schedule-dispatch-app :8094 |
| OpenDSO Docs | https://docs.<domain> | opendso-docs-app :8092 |
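One consequence of this table: the root endpoint https://<domain> is not covered by a *.<domain> wildcard, either in DNS (a wildcard record does not match the bare domain) or in a wildcard TLS certificate, so serving the bare domain needs its own DNS record and certificate name. A Python sketch of the certificate-side rule, following RFC 6125 matching, where the wildcard stands for exactly one left-most label:

```python
def wildcard_covers(hostname: str, domain: str) -> bool:
    """True if a certificate name *.<domain> matches `hostname`."""
    host = hostname.lower().rstrip(".")
    suffix = "." + domain.lower().rstrip(".")
    if not host.endswith(suffix):
        return False  # includes the bare domain itself
    label = host[: -len(suffix)]
    return bool(label) and "." not in label  # exactly one extra label


domain = "opendso.example.com"
for host in (f"gms.{domain}", f"nats.{domain}", domain):
    print(host, wildcard_covers(host, domain))
```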

Source: chart/templates/ingress.yaml, chart/templates/ingress-api.yaml, chart/templates/ingress-nats-ws.yaml, opendso-gcp-marketplace/README.md

Internal Service Communication

  • All service-to-service communication uses ClusterIP Services with the pattern <release>-<service>
  • NATS messaging: internal services connect to <release>-nats-service:4222
  • MongoDB: <release>-mongodb:27017
  • Citus DB: <release>-citus-db:5432
  • opendso-apps-db (TimescaleDB): <release>-opendso-apps-db:5432
  • Keycloak internal: http://<release>-keycloak-svc:8080
  • GMS API: <release>-gms-api:8000
  • ESS Manager Redis: <release>-ess-manager-redis:6379 (assumed from Redis default)
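Within the release namespace the short <release>-<service> names above resolve directly; from another namespace a client needs the fully qualified Service DNS name. A Python sketch that builds either form, assuming the default cluster.local cluster domain (the GKE default); the function name is hypothetical:

```python
from typing import Optional


def service_address(release: str, service: str, port: int,
                    namespace: Optional[str] = None,
                    cluster_domain: str = "cluster.local") -> str:
    """Return host:port for an in-cluster OpenDSO Service.

    Without a namespace, the short name resolves only from pods in the same
    namespace; with one, the result is the fully qualified Service name.
    """
    host = f"{release}-{service}"
    if namespace:
        host = f"{host}.{namespace}.svc.{cluster_domain}"
    return f"{host}:{port}"


print(service_address("demo", "nats-service", 4222))
print(service_address("demo", "mongodb", 27017, namespace="opendso"))
```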

Ingress Behavior

Three separate Ingress resources are created:

  1. <release>-ingress — all UI frontend apps, Keycloak, Grafana; shared TLS secret; nginx annotations for CORS, proxy timeouts, and SSL redirect
  2. <release>-ingress-api — GMS API at api.<domain>; CORS with dynamically computed origin list
  3. <release>-ingress-nats-ws — NATS WebSocket at nats.<domain>; backend-protocol: HTTPS, WebSocket enabled, 3600s timeouts

All ingress uses ingressClassName: nginx. TLS is applied via <release>-tls-secret to cover all hosts.

Source: chart/templates/ingress.yaml, chart/templates/ingress-api.yaml, chart/templates/ingress-nats-ws.yaml

Ports and Protocols

| Protocol | Port | Usage |
|---|---|---|
| HTTPS | 443 | All external access via nginx ingress |
| HTTP | 80 | Redirected to HTTPS by nginx |
| NATS TCP | 4222 | Internal in-cluster NATS client connections |
| NATS WebSocket | 9222 | Internal; exposed externally via ingress as WSS |
| MongoDB | 27017 | Internal ClusterIP only |
| PostgreSQL (Citus) | 5432 | Internal ClusterIP only |
| PostgreSQL (apps-db) | 5432 | Internal ClusterIP only |
| Keycloak HTTP | 8080 | Internal; nginx terminates TLS externally |

7. Persistence and Data

Stateful Components

Stateful components use StatefulSets with PVCs:

| Component | StatefulSet | Data Stored |
|---|---|---|
| MongoDB | <release>-mongodb | GMS API settings, user config, application state (settings_api DB) |
| Citus DB | <release>-citus-db | OpenFMB historian time-series data (ofmb_db DB) |
| opendso-apps-db | <release>-opendso-apps-db | ESS tester data (ess_tester DB) and asset health data (assets DB, TimescaleDB features) |
| ESS Manager Redis | <release>-ess-manager-redis | ESS state caching |

Deployments with PVCs (not StatefulSets):

  • Keycloak — realm/session data (keycloak.persistence.enabled: true, 10Gi)
  • Grafana — dashboard persistence (10Gi)
  • topology-genesis — parsed CIM topology data
  • openfmb-event-service — event data
  • asset-health-sim-svc — simulation data

PVC / Storage Expectations

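  • global.storageClass: standard-rwo (GKE Persistent Disk CSI driver, balanced persistent disks). pd-ssd (the premium-rwo storage class on GKE) is noted as recommended for production databases.
  • All PVCs use ReadWriteOnce access mode, which is what the standard-rwo class is meant for (each Persistent Disk volume is attached read-write to a single node)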
  • MongoDB: deleteOnUpgrade: false — PVC data is preserved across Helm upgrades

What Data Is Persisted

  • Grid Management System (GMS) configuration and user settings — MongoDB
  • OpenFMB historian time-series data — Citus DB
  • ESS testing data and asset health telemetry — opendso-apps-db (TimescaleDB)
  • Keycloak realm state, users, sessions
  • Grafana dashboards and preferences
  • Topology CIM model (CIM XML parsed by topology-genesis)

Upgrade or Reinstall Implications

  • Helm upgrade re-uses all existing Kubernetes secrets (NATS keys, Keycloak client secrets, database passwords) — fully idempotent by design
  • The mongodb-init Job is deleted before each upgrade (immutable Job specs) and re-created; it runs idempotently
  • Keycloak realm data is NOT re-imported on upgrades if a PVC already exists; it only imports on a fresh database. The deployer's step 4 (Keycloak Admin API sync) compensates for this by syncing client secrets post-deploy.
  • PVC data is preserved across upgrades (deleteOnUpgrade: false on MongoDB)
  • Uninstall (helm uninstall) does NOT delete PVCs by default; manual cleanup required

Source: chart/values-gcp.yaml, chart/README.md, deployer/deploy.sh


8. Operations

Basic Health / Verification Expectations

scripts/verify.sh performs the following checks post-deploy (10-minute timeout):

  1. StatefulSet readiness: <release>-mongodb
  2. Deployment readiness: <release>-keycloak, <release>-nats
  3. Database StatefulSet readiness attempts: <release>-citus-db, <release>-opendso-apps-db, <release>-keycloak-db
  4. Application service readiness: <release>-gms-api, <release>-historian-svc, <release>-nats-auth-svc
  5. NATS auth keys secret existence check
  6. Keycloak OIDC discovery via port-forward: http://localhost:18080/realms/oes/.well-known/openid-configuration
  7. GMS API health via port-forward: http://localhost:18081/api/health (falls back to /api)

Important implementation detail: although the database checks are written with || true, the helper they call exits the script on timeout. In practice, these database readiness checks are currently fatal, not non-fatal.
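The underlying shell pitfall is that `check || true` only absorbs a non-zero return status; if the helper calls exit, the script ends regardless (unless the helper runs in a subshell). A Python sketch of the behavior the || true guards appear to intend, where probes report readiness as a boolean and the caller decides which failures are fatal; it illustrates the design and is not a transcription of verify.sh:

```python
from typing import Callable, NamedTuple


class Check(NamedTuple):
    name: str
    probe: Callable[[], bool]  # True when the resource is ready; never exits
    fatal: bool


def run_checks(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run checks in order; stop only on a failing fatal check.

    Returns (success, messages). Non-fatal failures become warnings, which is
    the effect `|| true` has only when the helper returns instead of exiting.
    """
    messages: list[str] = []
    for check in checks:
        if check.probe():
            continue
        if check.fatal:
            messages.append(f"FATAL: {check.name} not ready")
            return False, messages
        messages.append(f"WARN: {check.name} not ready")
    return True, messages


ok, messages = run_checks([
    Check("mongodb", lambda: True, fatal=True),
    Check("keycloak-db", lambda: False, fatal=False),  # e.g. disabled by default
    Check("gms-api", lambda: True, fatal=True),
])
print(ok, messages)  # True ['WARN: keycloak-db not ready']
```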

Source: scripts/verify.sh

Known Startup Dependencies

  • Keycloak readinessProbe.initialDelaySeconds: 120, livenessProbe.initialDelaySeconds: 180 — long startup expected
  • Grafana same: 120s readiness, 180s liveness initial delay
  • MongoDB init job waits for MongoDB readiness before running init script
  • Services that consume Keycloak client secrets via envFrom will fail if those secrets are missing — the deployer creates them before Helm runs
  • Keycloak Admin API sync in deploy.sh retries up to 60 seconds (12 attempts × 5s) for Keycloak to become available
  • Helm is invoked with --timeout 15m (no --wait flag) — the 15-minute timeout applies to the resource application operation, not pod readiness
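The Keycloak sync budget is a bounded poll. A Python sketch with an injectable sleep so it can be exercised without waiting; the function name is hypothetical. Note the arithmetic: Keycloak's readiness probe does not begin until 120 s after start, and Helm returns without --wait, so on a fresh install the roughly one-minute window can close before Keycloak is ready. That is plausibly why step 3b pre-seeds the secrets into the realm JSON.

```python
import time
from typing import Callable


def wait_for(probe: Callable[[], bool], attempts: int = 12, delay_s: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` up to `attempts` times with `delay_s` between tries.

    Defaults mirror the 12 attempts at 5 s intervals described for the
    Keycloak Admin API sync (about a minute in total). Returns False on
    timeout instead of raising, because that step is non-fatal.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay_s)
    return False


slept: list[float] = []
print(wait_for(lambda: False, sleep=slept.append), sum(slept))  # False 55.0
```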

Source: chart/values.yaml, deployer/deploy.sh

Upgrade Behavior

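# Manual upgrade from the chart directory. The deployer also passes its generated
# NATS auth overlay and the release-specific --set overrides (section 4); supply
# them here too, or Helm computes those values from the chart defaults instead.
helm upgrade <release-name> . \
  -f values-gcp.yaml \
  -f <user-values> \
  -n <namespace>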
  • All generated secrets are reused on re-runs (NATS keys, Keycloak secrets, DB passwords)
  • MongoDB init Job is deleted before upgrade and re-applied (idempotent)
  • topology-genesis ConfigMap is re-applied via --server-side
  • Keycloak client secrets are synced to the live Keycloak instance via Admin API
  • Application resource version is patched to 1.0.0 after Helm completes

Source: deployer/deploy.sh, chart/README.md

Uninstall / Cleanup Expectations

# Uninstall release
helm uninstall <release-name> -n <namespace>

# PVCs must be deleted manually
kubectl delete pvc -l app.kubernetes.io/instance=<release-name> -n <namespace>

# Secrets not managed by Helm must be deleted manually
kubectl delete secret <release-name>-nats-auth-keys \
  <release-name>-opendso-license \
  <release-name>-opendso-apps-db-secret \
  <release-name>-citus-db-secret \
  -n <namespace>

# Per-client Keycloak env secrets: no label selector is documented for these,
# so match them by the <release-name>-<client>-keycloak-env naming pattern
kubectl get secret -n <namespace> -o name \
  | grep -E '^secret/<release-name>-.+-keycloak-env$' \
  | xargs -r kubectl delete -n <namespace>

# Optional: delete namespace (removes all remaining namespaced resources)
kubectl delete namespace <namespace>

The deployer-created secrets (nats-auth-keys, opendso-license, per-client Keycloak envs, opendso-apps-db-secret, citus-db-secret) are NOT managed by Helm and will NOT be deleted by helm uninstall.

Source: chart/README.md, deployer/deploy.sh

Troubleshooting Guidance Present in Docs

From chart/README.md:

  • Grafana pod pending → check CPU/memory resources, describe pod
  • Secret not found → kubectl get secrets, recreate manually
  • Grafana redirect issues → verify GF_SERVER_ROOT_URL
  • Image pull errors → verify the GKE node service account has roles/artifactregistry.reader on the Artifact Registry; imagePullSecrets is empty ([]) in the GCP overlay — image pulls rely on Workload Identity or the node SA IAM role, not a regsecret
  • Helm dependency issues → rm -rf charts/*.tgz Chart.lock && helm dependency update

Debug commands documented:

  • kubectl get all -n <namespace>
  • kubectl get pods -n <namespace> -o wide
  • kubectl logs -l app.kubernetes.io/name=<name> -n <namespace>
  • kubectl get events -n <namespace> --sort-by='.lastTimestamp'
  • kubectl get/describe ingress -n <namespace>
  • Test service connectivity with a debug pod (busybox)

Note: A dedicated GKE troubleshooting guide exists at GKE_MARKETPLACE_TROUBLESHOOTING.md. The troubleshooting guidance in chart/README.md still serves as the lower-level Helm/Kubernetes quick reference.

Source: chart/README.md


9. Marketplace Review Considerations

GCP Marketplace Hard Requirements / Relevant Items

  • Application CRD: app.k8s.io/v1beta1 Application resource is correctly created with partner_id: oesinc, product_id: opendso, partner_name: Open Energy Solutions Inc. (see chart/templates/application.yaml)
  • Schema version: schemaVersion: v2, applicationApiVersion: v1beta1 — correct for current Marketplace deployer_helm
  • schema.yaml images section: populated with 29 repository/tag/digest mappings for the chart-managed default Marketplace image set. This covers the OpenDSO application images plus the main chart-managed infrastructure images such as NATS, Keycloak, MongoDB, Citus, Redis, and the core services. It does not describe every dependency-chart image path in the repo, most notably Grafana dependency images.
  • managedUpdates: kalmSupported: false — KALM update support is not claimed
  • All passwords use MASKED_FIELD — correctly prevents plaintext display in Marketplace UI
  • helm --timeout 15m without --wait — the deployer runs helm upgrade --install with --timeout 15m but no --wait flag. Helm exits after resource application and readiness is checked separately by scripts/verify.sh.
  • Deployer image: Must be built and pushed to Artifact Registry. Build instructions are in the README.
  • mpdev verify tooling: scripts/mpdev.sh and scripts/provision-test-env.sh exist for local testing.

Potential Reviewer Questions or Weak Spots

  1. TLS certificate fallback remains a product-policy question: tls-secrets.yaml can still auto-generate a self-signed certificate when global.gcpMarketplace=true and no <release>-tls-secret exists. This fallback exists only for mpdev verify, controlled test installs, and installer resilience, not as the intended production TLS model. A reviewer may still ask why that fallback remains enabled in Marketplace code paths.
  2. License model is intentionally out-of-band from Marketplace metering — OpenDSO uses OES-issued license.key and installation.key values and validates them through the license API path used by topology-nodes. Reviewers may ask whether Marketplace entitlement and OES licensing are expected to coexist or whether OES licensing is the sole enforcement mechanism.
  3. Artifact Registry readiness is still an operational dependency — the GCP overlay relies on node-level IAM / Workload Identity instead of imagePullSecrets, and the image mirroring process remains manual. A misconfigured registry or IAM binding fails at runtime rather than at chart render time.
  4. GMS API orchestration features are intentionally limited on GKE: gms-api.config.dockerApi is stubbed there because GKE nodes expose no Docker socket. A reviewer may ask which user-visible features are unavailable as a result.

10. Reference Notes / Clarifications

Default Marketplace-enabled component set

The baseline chart enables the following by default in the Marketplace path: nats, keycloak, grafana, mongodb, citus-db, historian-svc, gms-api, openfmb-event-service, topology-genesis, topology-nodes, der-dispatch-app, der-dispatch-svc, genesis-node-app, data-viewer-app, event-viewer-app, gis-app, historian-app, inspector-app, inventory-app, one-line-app, opendso-docs-app, openfmb-event-creator-app, schedule-dispatch-app, ess-manager-app, ess-tester-app, ess-manager-svc, ess-tester-svc, opendso-apps-db, ess-manager-redis, omegadss-svc, and rpcdss-svc. Disabled by default are nats-auth-svc in base values, keycloak-db, asset-health-svc, asset-health-sim-svc, and the three CVR services. In the Marketplace path specifically, the deployer enables nats-auth-svc via a generated overlay. This means the practical Marketplace default set is the base enabled list plus nats-auth-svc.
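A quick Python tally of the lists above. The three CVR services are counted without names because their subchart names are not listed here; the totals assume each listed component corresponds to one subchart, which is how they reconcile with the 38 subcharts counted earlier:

```python
BASE_ENABLED = {
    "nats", "keycloak", "grafana", "mongodb", "citus-db", "historian-svc",
    "gms-api", "openfmb-event-service", "topology-genesis", "topology-nodes",
    "der-dispatch-app", "der-dispatch-svc", "genesis-node-app", "data-viewer-app",
    "event-viewer-app", "gis-app", "historian-app", "inspector-app",
    "inventory-app", "one-line-app", "opendso-docs-app",
    "openfmb-event-creator-app", "schedule-dispatch-app", "ess-manager-app",
    "ess-tester-app", "ess-manager-svc", "ess-tester-svc", "opendso-apps-db",
    "ess-manager-redis", "omegadss-svc", "rpcdss-svc",
}
# Disabled in base values; the three CVR services are only counted.
DISABLED_NAMED = {"nats-auth-svc", "keycloak-db", "asset-health-svc", "asset-health-sim-svc"}
CVR_SERVICE_COUNT = 3

# The Marketplace deployer's generated overlay turns nats-auth-svc on.
MARKETPLACE_DEFAULT = BASE_ENABLED | {"nats-auth-svc"}
TOTAL_COMPONENTS = len(BASE_ENABLED) + len(DISABLED_NAMED) + CVR_SERVICE_COUNT

print(len(BASE_ENABLED), len(MARKETPLACE_DEFAULT), TOTAL_COMPONENTS)  # 31 32 38
```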

License and installation-key validation behavior

topology-nodes enforces runtime license validation when built with ENABLE_LICENSE_VALIDATION=ON by calling POST <LICENSE_API_URL>/v1/license/validate with license_key, installation_key, and environment_name at startup and then re-validating every 8 hours by default. After 3 consecutive re-validation failures, it exits. The entitlement service in ../entitlement-py verifies both the HMAC-signed license key and the installation key, binds installation keys to an environment name, and enforces installation limits. The Marketplace deployer creates <release>-opendso-license with the env var names topology-nodes expects: LICENSE_KEY, LICENSE_INSTALLATION_KEY, and LICENSE_ENVIRONMENT_NAME.
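A Python sketch of the documented re-validation policy, modeling only the default 8-hour interval and the three-consecutive-failures exit rule. It assumes fixed intervals after a successful startup validation with no jitter or backoff, and it represents the request body as a JSON object carrying the three documented fields; the wire format and the function names are assumptions, not taken from the topology-nodes source:

```python
from typing import Iterable, Optional

REVALIDATION_INTERVAL_HOURS = 8
MAX_CONSECUTIVE_FAILURES = 3


def validation_request(license_key: str, installation_key: str,
                       environment_name: str) -> dict:
    """Body for POST <LICENSE_API_URL>/v1/license/validate (shape assumed)."""
    return {
        "license_key": license_key,
        "installation_key": installation_key,
        "environment_name": environment_name,  # LICENSE_ENVIRONMENT_NAME
    }


def hours_until_exit(results: Iterable[bool]) -> Optional[int]:
    """Hours after startup at which the service would exit, or None.

    `results` are successive re-validation outcomes (True = valid); the
    n-th re-validation is assumed to happen n * 8 hours after startup.
    """
    consecutive_failures = 0
    for n, ok in enumerate(results, start=1):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            return n * REVALIDATION_INTERVAL_HOURS
    return None


print(hours_until_exit([True, False, False, False]))         # 32
print(hours_until_exit([False, False, True, False, False]))  # None
```

Under these assumptions, a license API outage shorter than roughly 16 hours (two missed checks) never stops topology-nodes; the third consecutive failure does.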

NATS and ESS state during rolling updates

The current chart does not prove durable preservation of all transient state across rolling updates. NATS is deployed as a Deployment, not a StatefulSet, and its JetStream store_dir: datastore is not backed by a PVC in the chart, so in-flight or broker-local persisted NATS state should not be documented as durable across pod replacement. For ESS Manager, Redis state is persisted via a PVC-backed StatefulSet, but the service's DAY_AHEAD_DIR is mounted from emptyDir, so working files in /var/lib/ess/dayahead are ephemeral across pod replacement. The correct stance is that some state is durable (Redis PVC), some is explicitly not (emptyDir working files), and the repo does not establish a guarantee for preserving in-flight NATS messages across rolling upgrades.