Dec 7, 2021

Preparing Apache Kafka® for Scala 3

Aiven's OSPO is dealing with migration of Apache Kafka to Scala 3. Find out how they managed!

Josep Prat
Senior Engineering Director, Streaming Services

Apache Kafka is a distributed event streaming platform, and it's open source. Today's post isn't about the cool things you can do with it. This time we talk about Kafka's internals. Specifically, we discuss one of the latest tasks being done at Aiven's Open Source Program Office: migration of Apache Kafka to Scala 3.

What is Scala?

Scala is a strongly and statically typed programming language that combines the object-oriented and functional programming paradigms.

Scala code can be compiled to run on the JVM, on JavaScript engines, or even to native code through LLVM.

Particularities of Scala

Scala is known for cumbersome migrations between major versions (for example, from 2.12 to 2.13), because the bytecode generated by different major versions is not meant to be compatible. This means that before a Scala project can be migrated to a new major version, all of its dependencies need to be migrated first, which creates massive delays in bigger projects. However, this changed drastically with Scala 3, as its compiler can read dependency files compiled for Scala 2.13[^1].

If you're not familiar with Apache Kafka's internal code, you might wonder what Scala and Kafka have to do with each other. The answer is quite simple: Kafka is written in Java and Scala. However, the percentage of Scala code in Kafka's codebase is decreasing version by version, going from roughly 50% in Apache Kafka 0.7 to the current 23%[^2]. As of Apache Kafka 3.1.0, the largest and most important module written in Scala is the core one, which, as its name indicates, is Kafka's heart. The other module written in Scala is a Scala API module for Kafka Streams.

Kafka, however, does not use most of the de facto standard tools in the Scala ecosystem (build tool, testing libraries...). One might also argue that the Scala code in Kafka is not written in any of the widely accepted idiomatic styles.

Why update to Scala 3?

As with many libraries and many other languages, once a version is too old it stops receiving updates. This means newly discovered vulnerabilities won't be fixed in that version, and it won't receive any further improvements. Scala is no exception. That's why migrating to Scala 3 is a way to keep up with security updates, upcoming features, and performance improvements.

Additionally, by migrating to Scala 3, we would make the work easier for anybody depending on any Kafka artifact compiled with Scala.

Migrating to Scala 3

During the proof of concept to check the feasibility of the Scala 3 migration, we encountered several problems. These were in different parts of the ecosystem: the build tool, the bytecode generation, and the Java interoperability. We describe each of these points in more detail below.

Gradle enhancement

When facing the task of migrating to Scala 3, the first problem we encountered was that the build tool used by Kafka, Gradle, didn't support Scala 3 yet. There was already a feature request in Gradle's ticketing system, but nobody was working on it; it was waiting for somebody from the community to step in.

We approached the Scala Center[^3] to ask whether Gradle support was something they would have bandwidth to contribute. One of its members, Tomasz Godzik from Virtus Lab, very well known in the Scala community for their work on different build tools and IDE support, showed interest in this task and contributed Scala 3 support to Gradle. We are really thankful to Tomasz for contributing this feature!

The migration itself (Syntax)

Once support for Scala 3 was available in Gradle (as a nightly build), we could really start the migration. The vast majority of the problems we found were related to how Kafka's Scala code was written: it used some features or capabilities that are now discouraged.

Too many parentheses

Scala allows you to define methods that take no parameters without parentheses. In Scala 2, such methods could be called with or without parentheses. In Scala 3, though, under some special circumstances, the compiler does not treat the parentheses as belonging to the parenthesis-less method and instead tries to apply them to the method's result. This does not affect every call where a parenthesis-less method is invoked with parentheses, only certain ones.

Here you can see an example with only the relevant parts showcased:

Given:

class KafkaConfig private(doLog: Boolean, val props: java.util.Map[_, _], dynamicConfigOverride: Option[DynamicBrokerConfig])
  extends AbstractConfig(KafkaConfig.configDef, props, doLog) with Logging {
  ...
  override def originals: util.Map[String, AnyRef] =
    if (this eq currentConfig) super.originals else currentConfig.originals
  ...
}

Where AbstractConfig is the following Java class:

public class AbstractConfig {
  ...
    public Map<String, Object> originals() {
        Map<String, Object> copy = new RecordingMap<>();
        copy.putAll(originals);
        return copy;
    }
  ...
}

The following code compiles in Scala 2 but not in Scala 3.

class ReplicaManager(val config: KafkaConfig,
  ...
  ) {
  ...
  protected def createReplicaSelector(): Option[ReplicaSelector] = {
    config.replicaSelectorClassName.map { className =>
      val tmpReplicaSelector: ReplicaSelector = CoreUtils.createObject[ReplicaSelector](className)
      tmpReplicaSelector.configure(config.originals())
      tmpReplicaSelector
    }
  }
  ...
}

The error comes from the Scala 3 compiler assuming that the extra parentheses in the originals call apply to the returned value, which happens to be a Map. Calling () on an object in Scala is translated to apply(). Map has no such method, which results in a compiler error.
Scala 2 was either smart enough or forgiving enough to guess that the extra parentheses referred to the method originals.
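
One way to sidestep this kind of error, sketched here under assumptions, is to call the parenthesis-less method exactly as it is declared, which both Scala 2 and Scala 3 accept. DemoConfig and ParenDemo are hypothetical stand-ins for illustration, not Kafka classes, and this is not necessarily the change applied in Kafka.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Hypothetical stand-in for a config class (not Kafka's KafkaConfig).
class DemoConfig(props: JMap[String, AnyRef]) {
  // Declared without parentheses, like the originals override shown above.
  // Returns a defensive copy of the underlying properties.
  def originals: JMap[String, AnyRef] = new JHashMap[String, AnyRef](props)
}

object ParenDemo {
  // The method is called the way it is declared: without parentheses.
  // config.originals() would be rejected here, because the method takes no
  // parameter list and java.util.Map has no apply method.
  def copyOf(config: DemoConfig): JMap[String, AnyRef] = config.originals
}
```

Calling ParenDemo.copyOf(new DemoConfig(props)) returns a fresh copy of props without relying on the compiler to guess what the parentheses refer to.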

Shadowing

Shadowing refers to a circumstance where declaring a name that previously existed in a given scope renders the previous name invisible.

Scala 2 let developers shadow previously defined names in the scope, but Scala 3 became more strict in that area and considers it a double definition. One generic example can be seen here:

class Shadow (shadowedName: String) {

  def shadowedName(): String = shadowedName

}
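
A minimal sketch of one way to resolve the clash, assuming the constructor parameter is free to be renamed. The name initialName is hypothetical and chosen only for illustration.

```scala
// The constructor parameter no longer shares its name with the method,
// so there is nothing for the compiler to report as a double definition.
class Shadow(initialName: String) {

  // The public accessor keeps its original name and signature.
  def shadowedName(): String = initialName

}
```

Callers are unaffected by the rename, since only the private constructor parameter changed: new Shadow("kafka").shadowedName() still returns the value passed in.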

Too many automatic conversions

The following code works on Scala 2 but fails to compile on Scala 3.

class TooManyImplicitsAtOnce () {

  val shortNumber1: Short = 1
  val shortNumber2: Short = 4

  val range = shortNumber1 to shortNumber2

}

The reason is that Scala 3 seems less prone than Scala 2 to apply several implicit conversions or extension methods in a row. Note that Scala 3 got a bit of an overhaul in how implicits work[^4]. What happened in Scala 2 was the following: because Short has no to method, the compiler looks for implicit conversions in scope that would make the code compile. The Short is automatically widened to an Int, and Scala 2 then finds a conversion that wraps the Int in a RichInt, which offers the convenience of the to method.
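
A minimal sketch of one possible workaround: widen the Short values explicitly, so the compiler only has to find the single conversion that provides to. ExplicitWidening is a hypothetical name, and this is an illustration rather than the exact change made in Kafka.

```scala
object ExplicitWidening {

  val shortNumber1: Short = 1
  val shortNumber2: Short = 4

  // toInt performs the widening by hand; only the Int-to-RichInt
  // conversion is left for the compiler to find.
  val range = shortNumber1.toInt to shortNumber2.toInt

}
```

The resulting range is an inclusive range of Int values, which compiles on both Scala 2.13 and Scala 3.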

Problems with bytecode generation

One of Scala's historical weak points was using Scala code from Java. Many improvements have been made in this area in recent years, and with Scala 2.13 Scala code became very accessible from Java code. However, some of those improvements seem to have been lost while Scala 3 was being developed (remember that it was developed in parallel with Scala 2).

In the vast majority of cases, a workaround was possible, and only minimal changes to the Java code were needed to migrate the codebase successfully. These problems are deeply technical in nature and are probably worthy of a separate blog post.

One of the main reasons why those bugs were discovered now, and not during the development of Scala 3, is that any new Scala version is tested against a corpus of projects that are almost completely written in Scala. This means that the quality assurance tests for new Scala releases might overlook aspects such as Scala-to-Java interoperability. Most open source Scala projects are either written completely in Scala or have only a thin API layer written in Java to offer good usability for Java clients.

Apache Kafka is undoubtedly the biggest project whose code is mostly written in Java to have attempted a migration to Scala 3. This means Kafka has a lot of Java code that uses Scala code, and that code relies on deep internals of how Scala's bytecode is generated.

Thanks to this task of evaluating the migration to Scala 3, we managed to discover several usability problems that are (or will be) fixed in future versions of Scala 3.

Find out more

If you want to see in more detail what the migration involved, you can take a look at this draft PR where all the discoveries are documented: PR-11350. You can also read the thread on the mailing list discussing this migration.

The Aiven blog also has lots of Apache Kafka related content.

Footnotes

[^1]: Any artifact compiled with Scala 2.13.4 would be able to be consumed by the Scala 3 compiler.

[^2]: As of Apache Kafka 3.1.0.

[^3]: Scala Center is the non-profit organization that works on Scala tooling as well as education content for Scala.

[^4]: You can read more here

Wrapping up

Not using Aiven for Apache Kafka® and our other services yet? Sign up now for your free trial at https://console.aiven.io/signup!

In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.
