Custom dictionary files (Enterprise)

Custom dictionary files are user-defined files that enhance query analysis and improve search relevance in OpenSearch. By adding domain-specific vocabulary and rules, they make search results more accurate.

Custom dictionary files are categorized into three types:

  • Stopwords: Exclude common words like "the" and "is" to refine search results.
  • Synonyms: Equate similar terms, such as "car" and "automobile," to improve query matching.
  • WordNet: Provide semantic relationships between words, such as synonyms and antonyms.
Note: Ensure your custom dictionary files are plain text files encoded in UTF-8.
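
A quick local check before uploading can catch encoding problems early. The following is a minimal Python sketch, not part of the Aiven tooling; the helper names `is_utf8_text` and `file_is_utf8_text` are hypothetical, and treating NUL bytes as a sign of binary content is an assumption of this sketch rather than a documented rule.

```python
def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 and contain no NUL bytes."""
    if b"\x00" in data:
        # Assumption: NUL bytes indicate binary content, not plain text.
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8_text(path: str) -> bool:
    """Read a file as raw bytes and apply the UTF-8 check above."""
    with open(path, "rb") as handle:
        return is_utf8_text(handle.read())
```

For example, `file_is_utf8_text("demo_stopwords.txt")` returns `False` for a file saved in Latin-1 that contains accented characters.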

Upload files

Upload new custom dictionary files to your OpenSearch service.

  1. Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
  2. Click Indexes on the sidebar.
  3. Click Upload file in the Custom dictionary files section.
  4. In the Upload a custom dictionary file modal:
    • Select File type (Stopwords, Synonyms, WordNet).
    • Enter a File name.
    • Choose the file from your system and click Upload.

List files

List all custom dictionary files associated with your OpenSearch service.

In the Aiven Console, all uploaded custom dictionary files are listed in the Custom dictionary files section, showing the file path, type, size, and latest upload timestamp.

Replace files

Once you upload a custom dictionary file, you can only replace it, not delete it. To update an existing custom dictionary file, replace it with a new file containing the updated words.

  1. Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
  2. Click Indexes on the sidebar.
  3. In the Custom dictionary files section, locate the desired file.
  4. Click Actions > Replace file.
  5. Choose the new file from your system and click Upload.

Download files

Download a custom dictionary file to your local system.

  1. Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
  2. Click Indexes on the sidebar.
  3. In the Custom dictionary files section, locate the desired file.
  4. Click Actions > Download.
  5. Choose a location on your system and click Save.

Limitations

  • This feature requires Aiven Enterprise.
  • Files cannot be deleted. They can only be replaced.
  • The file location is fixed and cannot be customized.
  • If you move the service to a different cloud or project, its files are copied or moved with it.
  • For OpenSearch Cross-Cluster Replication (CCR), files must be uploaded to both services manually.
  • Use alphanumeric characters and underscores only for file names.
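
The file name rule above can be checked locally before uploading. This is a minimal Python sketch under stated assumptions: the helper name `is_valid_dictionary_file_name` is hypothetical, "alphanumeric" is interpreted as ASCII letters and digits, and any length limit the service may enforce is not modeled.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_FILE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dictionary_file_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores."""
    return _FILE_NAME_PATTERN.fullmatch(name) is not None
```

Under this reading, a name such as `nofox` passes, while `demo-stopwords` and `demo_stopwords.txt` do not, because hyphens and dots fall outside the allowed set.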

Example: How to use custom dictionary files with indexes

After uploading a custom dictionary file, you can use it in your index settings by specifying custom filters or analyzers. This example demonstrates how to create an index that uses a custom stopwords file.

Create a stopwords file

Create a file named demo_stopwords.txt that lists your stopwords, one per line:

a
fox
jumps
the
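
The same file can also be generated from a script, which helps when the word list is maintained elsewhere. This is a minimal Python sketch; the helper names `render_stopwords` and `write_stopwords_file` are hypothetical, and the one-word-per-line layout mirrors the example above.

```python
from pathlib import Path

# The four stopwords used in this example.
STOPWORDS = ["a", "fox", "jumps", "the"]


def render_stopwords(words: list[str]) -> str:
    """Return the file content: one stopword per line, newline-terminated."""
    return "\n".join(words) + "\n"


def write_stopwords_file(path: str, words: list[str]) -> None:
    """Write the stopwords to a UTF-8 encoded plain text file."""
    Path(path).write_text(render_stopwords(words), encoding="utf-8")
```

Calling `write_stopwords_file("demo_stopwords.txt", STOPWORDS)` creates the file in the current directory.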

Upload the stopwords file

Upload this file using the Aiven Console or CLI.

Create an index that uses the stopwords file

Create an index that uses the stopwords file, either in OpenSearch Dashboards or through the API. The following steps use OpenSearch Dashboards.

  1. Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.

  2. Access the OpenSearch Dashboards tab in the Connection information section.

  3. Use the Service URI to access OpenSearch Dashboards in a browser.

  4. Log in with the provided User and Password.

  5. Click Index Management > Indices > Create Index.

  6. Enter the details for the index. The verification step in this example queries an index named customdictionarytest.

  7. Expand the Advanced settings section and insert the following JSON configuration to use the stopwords file:

    {
      "index.analysis.analyzer.default.filter": [
        "custom_stop_words_filter"
      ],
      "index.analysis.analyzer.default.tokenizer": "whitespace",
      "index.analysis.filter.custom_stop_words_filter.ignore_case": "true",
      "index.analysis.filter.custom_stop_words_filter.stopwords_path": "custom/stopwords/nofox",
      "index.analysis.filter.custom_stop_words_filter.type": "stop",
      "index.number_of_replicas": "1",
      "index.number_of_shards": "1"
    }
  8. Click Create.
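
For the API route, the same settings can be sent in the body of a create-index request (`PUT /<index-name>`). The following is a hedged Python sketch that uses only the standard library. It expresses the settings in nested form, which should be equivalent to the dotted keys above. The helper names, the basic-auth handling, and the host in the usage note are assumptions for illustration, not values from your service.

```python
import base64
import json
import urllib.request


def build_index_settings(stopwords_path: str) -> dict:
    """Nested equivalent of the dotted settings shown in the steps above."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                "analysis": {
                    "analyzer": {
                        "default": {
                            "tokenizer": "whitespace",
                            "filter": ["custom_stop_words_filter"],
                        }
                    },
                    "filter": {
                        "custom_stop_words_filter": {
                            "type": "stop",
                            "ignore_case": True,
                            "stopwords_path": stopwords_path,
                        }
                    },
                },
            }
        }
    }


def build_create_index_request(base_url, index_name, user, password, stopwords_path):
    """Build (but do not send) a PUT request that creates the index."""
    body = json.dumps(build_index_settings(stopwords_path)).encode("utf-8")
    # Assumption: the service accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

To send the request, pass the result to `urllib.request.urlopen`, for example with a placeholder host such as `https://HOST:PORT`, the index name `customdictionarytest`, and the path `custom/stopwords/nofox`. Replace the host, user, and password with the values from your service's connection information.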

Verify the stopwords filter

Verify the stopwords filter by using the _analyze API.

  1. Go to Dev Tools in OpenSearch Dashboards.
  2. Run the following request against the index:

    POST customdictionarytest/_analyze
    {
      "text": "a quick brown fox jumps over the lazy dog"
    }
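
To reason about what the response should contain, the analyzer's behavior can be approximated locally. The sketch below is a plain Python simulation, not a call to OpenSearch. It assumes that the whitespace tokenizer splits on whitespace only and that the stop filter, with ignore_case enabled, drops tokens that match a stopword regardless of case. The function name `simulate_stop_filter` is hypothetical.

```python
def simulate_stop_filter(text, stopwords, ignore_case=True):
    """Approximate a whitespace tokenizer followed by a stop filter."""
    tokens = text.split()  # split on runs of whitespace
    if ignore_case:
        stop_set = {word.lower() for word in stopwords}
        return [token for token in tokens if token.lower() not in stop_set]
    stop_set = set(stopwords)
    return [token for token in tokens if token not in stop_set]


tokens = simulate_stop_filter(
    "a quick brown fox jumps over the lazy dog",
    ["a", "fox", "jumps", "the"],
)
print(tokens)  # ['quick', 'brown', 'over', 'lazy', 'dog']
```

If the filter is working and the uploaded file contains the four stopwords from this example, the tokens in the _analyze response should match this list. The real response also includes offsets, types, and positions for each token.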