Reindex Aiven for DataHub search and graph indices (Limited availability)

Rebuild your OpenSearch indices for search and graph data if your search results or relationship graphs differ from the data in your metadata database. This is useful:

  • After OpenSearch® data loss
  • When an index is corrupt or inconsistent
  • After wiping a cluster or re-provisioning the search backend
  • After a schema or mapping change that requires a full reindex
  • For disaster recovery when the SQL database is intact but OpenSearch is not

To rebuild your indices, run the RestoreIndices upgrade task. This task rebuilds the indices from the metadata_aspect_v2 SQL table, which is the source of truth. It replays every aspect from the database back into the search and graph stores.

You can run this task at any time. Events are replayed asynchronously, and existing reads keep working. Always test reindexing in a staging environment first.

Prerequisites

Get the URL for the GMS app:

  1. In your DataHub service, in the DataHub resources section, open the Aiven App that ends in -gms.
  2. In the Connection information section, copy the Application URL.
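The Application URL copied here is later used as the base of KAFKA_SCHEMAREGISTRY_URL, which takes the form GMS_APP_URL/schema-registry/api/. As a minimal sketch of that derivation, the Python helper below checks the URL and appends the path. The function name and the example URL are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit


def schema_registry_url(gms_app_url: str) -> str:
    """Return GMS_APP_URL/schema-registry/api/ for a GMS application URL."""
    cleaned = gms_app_url.strip()
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {gms_app_url!r}")
    # Drop trailing slashes so the joined URL has no double slash.
    return cleaned.rstrip("/") + "/schema-registry/api/"


# Hypothetical application URL; use the one copied from the console.
print(schema_registry_url("https://example-gms.example.com/"))
```

Stripping the trailing slash first means the result is the same whether or not the copied URL ends in a slash.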

Run the restore indices task

  1. In your DataHub service, go to the DataHub resources section.

  2. Open the Aiven App that ends in -upgrade.

  3. In the Environment variables section, click Edit.

  4. On the Variables tab, add the following variables:

    Key                       Value                             Description
    UPGRADE_JOB               RestoreIndices                    The restore indices task.
    KAFKA_SCHEMAREGISTRY_URL  GMS_APP_URL/schema-registry/api/  Queries Kafka topic schemas for re-emitting events.

    Replace GMS_APP_URL with the Application URL of the GMS app that you copied in the prerequisites.

  5. Optional: Add UPGRADE_JOB_ARGS to include additional arguments:

    Arg                    Description
    -a clean               Wipes each index before repopulating. Use when an index has stale documents that you don't want to carry over.
    -a batchSize           Number of records per batch.
    -a urnBasedPagination  Uses URN-based pagination instead of offset-based. Set to true for large datasets.
    -a aspectNames         Comma-separated list of aspect names to reindex. Use to speed up partial recoveries. For example, aspectNames=datasetProperties,ownership.
    -a urnLike             SQL LIKE pattern to filter URNs. Use to target specific entity types. For example, urnLike=urn:li:dataset:% to reindex all datasets.

  6. Click Save.

    After you save the variables, the upgrade app restarts automatically. It stays in the Building state until the reindexing completes.

  7. When the upgrade app is in the Powered off state, remove the variables you added.
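
To make the variables from steps 4 and 5 concrete, the following Python sketch assembles them as key-value pairs before you enter them in the console. It rests on an assumption that this page does not confirm: that the UPGRADE_JOB_ARGS value is a space-separated sequence of -a arguments, written either as a bare flag (-a clean) or as name=value (-a batchSize=1000). The function name, the example URL, and the batch size of 1000 are hypothetical.

```python
from typing import Dict, Mapping, Optional


def build_upgrade_variables(
    gms_app_url: str,
    args: Optional[Mapping[str, Optional[str]]] = None,
) -> Dict[str, str]:
    """Build the environment variables for the RestoreIndices task."""
    variables = {
        "UPGRADE_JOB": "RestoreIndices",
        "KAFKA_SCHEMAREGISTRY_URL": gms_app_url.rstrip("/") + "/schema-registry/api/",
    }
    if args:
        rendered = []
        for name, value in args.items():
            # Assumption: None means a bare flag such as "-a clean";
            # anything else is rendered as "-a name=value".
            rendered.append(f"-a {name}" if value is None else f"-a {name}={value}")
        # Assumption: arguments are joined with single spaces.
        variables["UPGRADE_JOB_ARGS"] = " ".join(rendered)
    return variables


# Hypothetical values: reindex only datasets, using URN-based pagination.
variables = build_upgrade_variables(
    "https://example-gms.example.com",
    {"urnBasedPagination": "true", "batchSize": "1000", "urnLike": "urn:li:dataset:%"},
)
for key, value in variables.items():
    print(f"{key}={value}")
```

Omitting the args parameter yields only the two required variables, which matches a full reindex with default settings. Confirm the exact argument syntax in a staging environment before relying on it.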