How a single space broke OpenSearch backups - and how Aiven fixed it for our customers
One space character, one broken backup. See how Aiven’s engineering team traced an OpenSearch k-NN bug to its source and implemented a lasting fix.
Backing up your OpenSearch indexes with snapshots is vital for disaster recovery: a snapshot lets you restore the indexed data, cluster configuration and state if something goes wrong. Some teams also use snapshot and restore when migrating data (for example between a development and a production cluster) or during OpenSearch version upgrades. At Aiven, we helped one of our customers discover why their snapshots were failing - and it was all down to a single space character.
Aiven for OpenSearch is a managed search service with industry-grade features: high availability and automatic failover, cross-cluster replication (CCR), metrics integration via Prometheus, a number of plugins (for vector search and beyond), disaster recovery features such as automatic in-region backups and backup to another region, granular access control, forking (for accelerated testing of new features and migrations), one-click version upgrades and maintenance updates. On top of this, you can move your cluster to a larger plan as you grow (or scale back down), and integrate with a host of downstream tools like Datadog Metrics, Amazon CloudWatch Logs and Metrics, Google Cloud Logging, Apache Kafka, Rsyslog and more.
An OpenSearch Dashboards eCommerce overview showing real-time KPIs (transactions per day, average order value, average items per order), revenue trends over time, and promotional funding breakdowns — all built on live data with no separate BI tool required.
Chasing errors with logs
Bugs like this can be hard to trace: snapshots can take time to create if there’s a lot of data in your index. OpenSearch can be configured to log errors, but it’s not always obvious what an error message actually means in practice. In this case, the error message produced was pretty verbose (note this is a recreation of the message, not our actual customer’s details!):
[Code block: recreated snapshot error message, not captured in this copy]
It looks like the underlying storage architecture is failing, because it can’t delete a file, and this is something to do with the filename itself. We can’t tell if the file has mysteriously disappeared, or changed name, or if something else is going on. Maybe it’s a network issue, or something to do with the storage system itself.
Digging in with experimentation
Our first experiment was to change the backup destination by switching to a different Amazon S3 volume. This briefly seemed to work - the first backup succeeded - but then the problem reappeared.
We then took a look at which file was actually being snapshotted. From the shard name knn-index we could conclude that it's part of the structure that allows OpenSearch to provide efficient vector search using k-nearest neighbours (k-NN). The file was failing to be deleted in some way. After a few more experiments we could tell that this was pretty consistent behaviour - i.e. it was probably not a generic issue with OpenSearch's storage backend, but something specific to the k-NN index.
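One way to check whether failures cluster around a single index is to tally them per index. The following is a minimal sketch, not the tooling we used: it assumes you already have one snapshot's info as a Python dict with a `failures` list whose entries carry an `index` key (the shape we would expect from the snapshot API), and the helper name `failures_by_index` and the sample data are made up for illustration.

```python
from collections import Counter


def failures_by_index(snapshot_info):
    """Count failed shards per index in one snapshot's info dict.

    Assumes a "failures" list of dicts that each have an "index" key;
    a missing list is treated as "no failures".
    """
    return Counter(f["index"] for f in snapshot_info.get("failures", []))


# Made-up sample data for illustration only, not real cluster output.
sample = {
    "snapshot": "example-snapshot",
    "state": "PARTIAL",
    "failures": [
        {"index": "knn-index", "shard_id": 0},
        {"index": "knn-index", "shard_id": 1},
    ],
}

# Most frequently failing indexes first.
print(failures_by_index(sample).most_common())
```

If every failure points at the same index across several snapshot attempts, that is a strong hint the problem lies with that index rather than with the storage backend as a whole.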
Naming our backups
Time for a flash of inspiration - why might it be hard to delete this file? Let’s look at the filename - there’s something unusual there, a space character between ‘my’ and ‘vector’. This is perfectly allowable on Linux, Windows and other modern operating systems (although some say “Spaces in filenames aren't any problem, except when they are”). Maybe this is the issue?
It turns out that the OpenSearch storage engine was less tolerant of spaces in filenames than it could be, while the k-NN plugin was happily allowing spaces to be used. The next question was how to fix the problem. As OpenSearch is open source, a bug was raised, and we later saw that others had encountered the same problem.
A fix and a workaround
Of course, even with a bug raised, in an open source project you cannot be sure when the issue will be fixed. The OpenSearch team are responsive, however, and the fix eventually landed in OpenSearch v2.17. In the meantime we advised our customer to avoid spaces in index names - and since the problem often appeared when upgrading to OpenSearch v2.x, we added a guardrail to block these upgrades when a filename contained spaces.
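To make the workaround concrete, here is a minimal sketch of this kind of pre-upgrade check. It is not Aiven's actual guardrail: the function names are hypothetical, and it assumes you have already collected the relevant names (index names here, following the advice above) as a list of strings.

```python
import re

# Matches any whitespace character, not only the plain space.
WHITESPACE = re.compile(r"\s")


def names_with_whitespace(names):
    """Return the names that contain at least one whitespace character."""
    return [name for name in names if WHITESPACE.search(name)]


def check_upgrade_allowed(names):
    """Hypothetical guardrail: refuse the upgrade if any name has whitespace.

    Raises ValueError listing the offending names, otherwise returns True.
    """
    offenders = names_with_whitespace(names)
    if offenders:
        raise ValueError(
            f"Upgrade blocked; names containing whitespace: {offenders}"
        )
    return True
```

Calling `check_upgrade_allowed(["my vector", "my_vector"])` raises a `ValueError` that names `my vector`, while a list of space-free names passes. Rejecting all whitespace rather than just the space character is a deliberately cautious choice in this sketch.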
Not every team has the time or knowledge to dive deeply into the open source code they're using. Our goal at Aiven is to provide a highly reliable hosted OpenSearch service, making it as easy as possible for our customers to concentrate on their own offering without worrying about the underlying search engine. Our deep experience and experimental approach (and the occasional flash of inspiration) make us a trusted partner and expert OpenSearch solution provider.
Did you know about our free trial of our fully managed Aiven for OpenSearch® service with $300 in free credits?
Sign up today or get in touch for more information - and you can also catch us in Prague at OpenSearchCon Europe on 16th & 17th April, where our expert Dmitry Kan will be talking about how we helped the Norwegian government migrate to managed OpenSearch.


