Feb 22, 2023

Learn from the experts: BigData Boutique and Aiven talk about OpenSearch®

They had me at the panel title: “Lessons learned from maintaining 10K+ OpenSearch® clusters in production”

Tibs (Tony Ibbs)
Developer Educator at Aiven

The panel and its participants

On 17th January 2023, Aiven and BigData Boutique held an OpenSearch® Fireside Chat titled “Lessons learned from maintaining 10K+ OpenSearch® clusters in production”. The conversation illustrates how far OpenSearch® has come since the project first started in January 2021, just two years earlier.

The seed for this session was planted at our Uptime conference last year, where BigData Boutique’s CTO talked about building a future-proof data serving layer.

BigData Boutique is a premium consulting firm focusing on Big Data technologies, known for their expertise in Elasticsearch and OpenSearch. Their panelists were:

Itamar Syn-Hershko, CTO and Founder, who has been working with OpenSearch and Elasticsearch since the very beginning (2010, give or take). Itamar leads a team of experts providing support for companies using OpenSearch all around the world.
Arkadii Chumachenko is one of those experts. An OpenSearch Support Engineer at BigData Boutique, Arkadii has been fiddling with computers since pre-school.
Lior Friedler is an OpenSearch expert and BigData Ops team lead at BigData Boutique. His day-to-day revolves around data modeling, scaling data infrastructure, and optimizing cost and performance of huge scale data platforms.

Aiven was represented by Andriy Redko, a member of our Open Source Program Office and an OpenSearch contributor.

The panel was led by Lorna Mitchell, who at the time of the session was a Principal Developer Advocate at Aiven. Lorna is well known (amongst other things!) as a speaker, author, and Open Source specialist.

Overview

It's a rare opportunity to get this much experience on OpenSearch in one place. The panel does assume you already have some knowledge of OpenSearch or equivalent technologies: at least a basic awareness of how search technologies work, that systems need to be tuned, and that terms like "vector" can have a meaning in the text-based/linguistic world as well as in the mathematical sense.

On the other hand, if you're interested in knowing a bit more about why OpenSearch is more than just a different version of Elasticsearch, some nifty hints on tuning, why managed services may be the best answer (at least for now), and where the associated technologies may be going over the next few years, then this is the right talk.

My particular fascinations? I love text, and have a longstanding interest in text markup and manipulation, so OpenSearch has strong appeal to me in its document handling and indexing aspects. I'm also a fan of backup (I've run the backup system for a small company, and once wrote the backup system for an object oriented database). And I have an appreciation for technologies that stand the test of time. As we'll see below, all of those are clues that I might enjoy this panel 🙂.

My take

One way to consider the value of a panel is to review the topics that were discussed. Some of Lorna's leading questions in this session were:

When do people use OpenSearch when they should use something else (and does it matter?)
What do the panelists really like about the OpenSearch tech stack, and how have they seen it change during their use of it?
How widespread is awareness of OpenSearch, and is that linked to the availability of managed services?
What are the best practices for improving OpenSearch performance?
Finally, the participants were asked what developments they hoped for in the next five years.

That doesn’t really give the gist of the panel though, so here are some of the things that struck me.

Some fun quotes

“It's opt out of indexing, rather than opt in to indexing[, like other database systems]”
– Lior, 15:00

I think this one is important. OpenSearch is at heart a search engine, and so indexing is core to how it works. In a relational database, for instance, you can choose indexing of columns, but it’s a conscious choice, and the overheads are different.
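To make the "opt out" point concrete, here is a minimal sketch of an index mapping body, built as a plain Python dictionary. The field names (`title`, `internal_notes`) are invented for illustration and were not mentioned in the panel. The `index` mapping parameter is what switches indexing off for a field; fields that don't set it are indexed by default.

```python
import json

# Hypothetical mapping body for an illustrative index; the field names
# are assumptions made up for this example.
mapping_body = {
    "mappings": {
        "properties": {
            # Indexed by default: there is nothing to switch on.
            "title": {"type": "text"},
            # Explicit opt-out: the value is kept in the document source,
            # but the field cannot be searched.
            "internal_notes": {"type": "keyword", "index": False},
        }
    }
}


def unindexed_fields(body):
    """Return the names of top-level fields that have opted out of indexing."""
    props = body["mappings"]["properties"]
    return sorted(
        name for name, spec in props.items() if spec.get("index", True) is False
    )


print(json.dumps(mapping_body, indent=2))
print(unindexed_fields(mapping_body))
```

A body like this would be sent when creating the index; the helper is only there to show that the opt-out is the exception you have to spell out.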

“For developers, it’s [an] extremely simple product to start working with”
– Andriy, 16:30

Apache Lucene® is an excellent core technology, in active development for more than 20 years (Elastic has an interesting page on its history), and OpenSearch provides the right level of abstraction to “hide” the complexity of Lucene and make it accessible, which is “a very big selling point for developers”.

“Don’t think that it’s a schemaless database”
– Lior, 23:50

This one is something people sometimes forget. There’s always a schema for documents; it’s just that OpenSearch will deduce one if you don’t specify it. And as you might expect, letting it “guess” what you mean can sometimes make for surprises.
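One way to avoid those surprises is to state the schema up front. The sketch below builds an explicit mapping as a Python dictionary. The field names are made up, and setting `dynamic` to `strict` asks OpenSearch to reject documents containing fields the mapping does not declare, rather than guessing a type for them. The small helper is a local stand-in I wrote for illustration (it only looks at top-level fields) and is not part of any OpenSearch client.

```python
# Hypothetical explicit mapping; field names are assumptions for this example.
explicit_mapping = {
    "mappings": {
        # "strict" means undeclared fields cause the document to be rejected.
        "dynamic": "strict",
        "properties": {
            "order_id": {"type": "keyword"},
            "quantity": {"type": "integer"},
            "placed_at": {"type": "date"},
        },
    }
}


def undeclared_fields(document, body):
    """List top-level document fields that the mapping does not declare."""
    declared = set(body["mappings"]["properties"])
    return sorted(set(document) - declared)


# A typo in a field name ("placed_on" instead of "placed_at") is exactly the
# kind of thing that would otherwise quietly become a new, guessed field.
doc = {"order_id": "A-1001", "quantity": 2, "placed_on": "2023-01-17"}
print(undeclared_fields(doc, explicit_mapping))  # ['placed_on']
```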

“Don’t use super-expensive analyzers, unless you really need them”
– Lior, 24:18

I was particularly struck by the value of the specific answers to the "best practices" topic (although actually the discussion overlapped with the previous consideration of managed services). There was a clear feeling of the importance of (appropriate) monitoring, and a good discussion on the necessity of not only backing up data, but also checking that it can be restored again (so easy to forget!).
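As a rough illustration of "check that it can be restored", the sketch below lays out the REST calls I would expect such a check to involve: take a snapshot, restore it under a different index name, then compare document counts. It only builds the requests and does not send them. The repository, snapshot, and index names are hypothetical, and it assumes a snapshot repository has already been registered (on a managed service, backups may well be handled for you in a different way).

```python
def restore_check_plan(repository, snapshot, index):
    """Return (method, path, body) tuples for a snapshot-and-restore check."""
    restored = f"restored_{index}"
    snapshot_path = f"/_snapshot/{repository}/{snapshot}"
    return [
        # 1. Snapshot just the index we care about.
        ("PUT", f"{snapshot_path}?wait_for_completion=true", {"indices": index}),
        # 2. Restore it under a new name so the live index is untouched.
        (
            "POST",
            f"{snapshot_path}/_restore?wait_for_completion=true",
            {
                "indices": index,
                "rename_pattern": "(.+)",
                "rename_replacement": "restored_$1",
            },
        ),
        # 3. Compare document counts between the original and the restored copy.
        ("GET", f"/{index}/_count", None),
        ("GET", f"/{restored}/_count", None),
    ]


for method, path, body in restore_check_plan("my_repo", "snap_1", "orders"):
    print(method, path, body)
```

Matching counts are only a first sanity check, of course; the point is that the restore step actually gets exercised.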

“My disadvantage is that I only see those systems when they have already been broken.”
– Lior, 29:59

Other thoughts

I was reassured to hear that, as the project matures, more of the sensible configuration choices are defaulting to "on" - this seems to me a sign of a project with a healthy approach to its users.

I did notice that there were several recurring topics, and I believe these all came up in the "what to look forward to in the future" section as well:

Providing even more of the core Apache Lucene functions, and tracking development in Lucene
Improving the cluster capabilities, and thus improving operation at scale
Looking towards serverless offerings
Support for even more (written) languages, and more integrations
Support for vector search - supporting search for a word and words related to it
Support for ML/AI

Which of those most excite me? Definitely vector search, which I expect to open up exciting possibilities, but also the provision of more and more integrations.

(For those who don't know what vector search is, I think that Text mining using vectors explained to business people by Federico Cesconi is a good place to start. It's at the wonderful intersection between text data, linguistics research, and mathematics.)
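For a feel of the idea, here is a toy sketch in plain Python. The three-dimensional vectors are numbers I made up so that related words point in similar directions; real systems use vectors with hundreds of dimensions produced by a trained model, and this is not how OpenSearch implements vector search internally. It just shows "find the words closest to this one" using cosine similarity.

```python
import math

# Made-up toy vectors: purely illustrative, not output from any real model.
toy_vectors = {
    "search": [0.9, 0.1, 0.0],
    "query": [0.8, 0.2, 0.1],
    "lookup": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to the given word's."""
    target = vectors[word]
    scored = [
        (cosine_similarity(target, vec), name)
        for name, vec in vectors.items()
        if name != word
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


print(nearest("search", toy_vectors))  # ['query', 'lookup']
```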

As the panel says, the future of OpenSearch looks very bright, and it’s going to be used in even more application spaces.

Apache Lucene is a trademark of the Apache Software Foundation in the United States and/or other countries.
Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.

Further reading