Logging at massive scale and under pressure
The agency adopted Apache Kafka® early in its modernization program, but as it moved deeper into cloud infrastructure, it acknowledged that running its own Kafka clusters was not where it delivered value. Nav turned to Aiven for Apache Kafka to manage its mission-critical Kafka clusters. It subsequently adopted Aiven for OpenSearch to run small but meaningful workloads, including the search functionality behind nav.no, internal caseworker systems and Norway's official job marketplace. Later, it also adopted Aiven for Valkey for caching.
The next challenge was to address data logging.
The agency pays out legally mandated and financially critical welfare benefits to millions of people, and a delayed or incorrect payment can cause real harm, especially for vulnerable populations. Logging enables Nav to verify that its systems are performing lawfully, accurately and in line with the agency’s own quality standards. Even though the agency is moving toward a broader OpenTelemetry ecosystem, with traces and metrics, logs remain critically important.
At Nav's scale, logging alone produces around 400 GB of data every day and requires 60 TB of data to be retained. The agency deploys approximately 4,000 production changes per week, and logs play a key role in maintaining that pace. “When logging works, it's invisible,” says Hans Kristian Flaatten, Platform Engineer at Nav. “When it doesn't work, the consequences spread fast. Developers no longer have visibility across the system. We can’t safely release changes or troubleshoot incidents.”
Nav was operating centralized, on-premises logging built on Elasticsearch. But the arrangement was becoming increasingly untenable as licensing costs rose and operational demands intensified. Performance fell short of what an organization of Nav’s size and scope needed and, crucially, maintaining the infrastructure required expertise that had little to do with Nav’s core mission.
“We had already migrated Kafka to Aiven, which is truly mission-critical. If Kafka goes down, welfare services stop,” says Flaatten. “But like our original Kafka clusters, operating and maintaining our on-premises Elasticsearch logging stack had become a burden. Aiven was already hosting some of our most critical data services, including Kafka, so expanding that partnership to OpenSearch felt like a natural evolution.”