Custom dictionary files
Custom dictionary files are user-defined files that enhance query analysis and improve search relevance in OpenSearch. By adding domain-specific vocabulary and rules, these files refine search results to be more accurate and relevant.
Custom dictionary files are categorized into three types:
- Stopwords: Exclude common words like "the" and "is" to refine search results.
- Synonyms: Equate similar terms, such as "car" and "automobile," to improve query matching.
- WordNet: Provide semantic relationships between words, such as synonyms and antonyms.
Ensure your custom dictionary files are in plain text (UTF-8 encoded) format.
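As a minimal sketch of what such files can look like, the following shell snippet writes a small stopwords file (one word per line) and a synonyms file, then checks that both are valid UTF-8. The file names and word lists are illustrative assumptions. The synonyms layout assumes the comma-separated Solr-style format that the OpenSearch synonym token filter reads by default. The check relies on iconv being available locally.

```shell
# Illustrative file names and contents; adjust to your own vocabulary.
workdir=$(mktemp -d)

# Stopwords: one word per line.
printf '%s\n' the is a > "$workdir/sample_stopwords.txt"

# Synonyms: assumed Solr-style format, equivalent terms separated by commas.
printf '%s\n' 'car, automobile' 'tv, television' > "$workdir/sample_synonyms.txt"

# Succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

for f in "$workdir/sample_stopwords.txt" "$workdir/sample_synonyms.txt"; do
  if is_utf8 "$f"; then
    echo "OK: $(basename "$f") is valid UTF-8"
  else
    echo "Not UTF-8: $(basename "$f")"
  fi
done
```

A file that fails this check can usually be converted with iconv from its original encoding before uploading.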
Upload files
Upload new custom dictionary files to your OpenSearch service.
Using the Aiven Console:
- Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
- Click Indexes on the sidebar.
- Click Upload file in the Custom dictionary files section.
- In the Upload a custom dictionary file screen:
  - Select File type (Stopwords, Synonyms, or WordNet).
  - Enter a File name.
  - Choose the file from your system and click Upload.
Using the Aiven CLI, run:
avn service custom-file upload --project PROJECT_NAME \
--file_type <stopwords|synonyms|wordnet> \
--file_path <file_path> \
--file_name <file_name> SERVICE_NAME
Parameters:
- PROJECT_NAME: Your Aiven project name.
- <stopwords|synonyms|wordnet>: The type of dictionary file to upload.
- <file_path>: Path to the local file on your system.
- <file_name>: The name of the file to appear in Aiven for OpenSearch.
- SERVICE_NAME: Name of your OpenSearch service.
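A worked instance of the upload command might look like the following sketch. The project name, service name, local path, and file name are hypothetical placeholders. The run helper and the DRY_RUN variable are also assumptions added for illustration and are not part of the Aiven CLI: by default they print the command instead of executing it, so you can review it first.

```shell
# Hypothetical values; replace them with your own project and service.
PROJECT_NAME="my-project"
SERVICE_NAME="my-opensearch"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: local path of the stopwords file
# $2: file name to show in Aiven for OpenSearch
upload_stopwords() {
  run avn service custom-file upload --project "$PROJECT_NAME" \
    --file_type stopwords \
    --file_path "$1" \
    --file_name "$2" "$SERVICE_NAME"
}

upload_stopwords ./demo_stopwords.txt demo_stopwords
```

Setting DRY_RUN=0 before calling upload_stopwords runs the command for real, which requires an authenticated avn client.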
List files
List all custom dictionary files associated with your OpenSearch service.
In the Aiven Console, the Custom dictionary files section displays all uploaded custom dictionary files, including details such as the file path, type, size, and the most recent upload timestamp.
Using the Aiven CLI, run:
avn service custom-file list --project PROJECT_NAME SERVICE_NAME
Parameters:
- PROJECT_NAME: Your Aiven project name.
- SERVICE_NAME: Name of your OpenSearch service.
Replace files
Once you upload a custom dictionary file, you can only replace it, not delete it. To update an existing custom dictionary file, replace it with a new file containing the updated words.
Using the Aiven Console:
- Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
- Click Indexes on the sidebar.
- In the Custom dictionary files section, locate the desired file.
- Click Actions > Replace file.
- Choose the new file from your system and click Upload.
Using the Aiven CLI, run:
avn service custom-file update --project PROJECT_NAME \
--file_path <file_path> \
--file_id <file_id> SERVICE_NAME
Parameters:
- PROJECT_NAME: Your Aiven project name.
- <file_path>: Path to the local file on your system.
- <file_id>: ID of the file to replace. Obtain this ID using the List command.
- SERVICE_NAME: Name of your OpenSearch service.
Download files
Download a custom dictionary file to your local system.
Using the Aiven Console:
- Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
- Click Indexes on the sidebar.
- In the Custom dictionary files section, locate the desired file.
- Click Actions > Download.
- Choose your location and click Save.
Using the Aiven CLI, run:
avn service custom-file get --project PROJECT_NAME \
--file_id <file_id> \
--target_filepath <file_path> \
--stdout_write SERVICE_NAME
Parameters:
- PROJECT_NAME: Your Aiven project name.
- <file_id>: ID of the file to download. Obtain this ID using the List command.
- <file_path>: Path where the file should be saved locally.
- SERVICE_NAME: Name of your OpenSearch service.
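Because a file can only be replaced, one way to change a dictionary is to download it, edit the local copy, and upload the result as a replacement. The following sketch covers only the local editing step. The add_word helper is a hypothetical name, and the scratch file stands in for a downloaded stopwords file that ends with a newline.

```shell
# Append a word to a local dictionary file unless it is already listed.
# -x matches whole lines, -F treats the word literally, -q suppresses output.
add_word() {
  file=$1
  word=$2
  if ! grep -q -x -F -- "$word" "$file"; then
    printf '%s\n' "$word" >> "$file"
  fi
}

# Local demonstration on a scratch copy (stand-in for a downloaded file).
scratch=$(mktemp)
printf '%s\n' a fox jumps the > "$scratch"
add_word "$scratch" over
add_word "$scratch" fox   # already present, so nothing is appended
cat "$scratch"

# With a real service, the surrounding steps would use the documented commands:
#   avn service custom-file get ...     (download to a local path)
#   avn service custom-file update ...  (replace with the edited file)
```

Checking for an existing entry first keeps the file free of duplicate lines when the helper is run repeatedly.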
Limitations
- This feature requires Aiven Enterprise.
- Files cannot be deleted. They can only be replaced.
- The file location is fixed and cannot be customized.
- If you move to a different cloud or project, files are copied or moved accordingly.
- For OpenSearch Cross-Cluster Replication (CCR), files must be uploaded to both services manually.
- Use alphanumeric characters and underscores only for file names.
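The file name rule in the last item can be checked before uploading. This is a small sketch, assuming the rule applies to the name passed as the file name rather than to the local path. The is_valid_file_name function is a hypothetical helper, not part of the Aiven CLI.

```shell
# Succeeds only for non-empty names made of letters, digits, and underscores.
is_valid_file_name() {
  case $1 in
    ''|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

for name in demo_stopwords nofox "my-file" "stop words.txt"; do
  if is_valid_file_name "$name"; then
    echo "valid:   $name"
  else
    echo "invalid: $name"
  fi
done
```

The pattern rejects any name containing a character outside the allowed set, so hyphens, spaces, and dots all fail the check.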
Example: How to use custom dictionary files with indexes
After uploading a custom dictionary file, you can use it in your index settings by specifying custom filters or analyzers. This example demonstrates how to create an index that uses a custom stopwords file.
Create a stopwords file
Create a file named demo_stopwords.txt with your stopwords, one word per line. For example, from a shell:
cat > demo_stopwords.txt <<EOF
a
fox
jumps
the
EOF
Upload the stopwords file
Upload this file using the Aiven Console or CLI.
Create an index that uses the stopwords file
Create an index that uses the stopwords file, either in OpenSearch Dashboards or with the API.
Using OpenSearch Dashboards:
- Log in to the Aiven Console, select your project, and select your Aiven for OpenSearch service.
- Access the OpenSearch Dashboards tab in the Connection information section.
- Use the Service URI to access OpenSearch Dashboards in a browser.
- Log in with the provided User and Password.
- Click Index Management > Indices > Create Index.
- Enter the details for the index.
- Expand the Advanced settings section and insert the following JSON configuration to use the stopwords file:
  {
    "index.analysis.analyzer.default.filter": [
      "custom_stop_words_filter"
    ],
    "index.analysis.analyzer.default.tokenizer": "whitespace",
    "index.analysis.filter.custom_stop_words_filter.ignore_case": "true",
    "index.analysis.filter.custom_stop_words_filter.stopwords_path": "custom/stopwords/nofox",
    "index.analysis.filter.custom_stop_words_filter.type": "stop",
    "index.number_of_replicas": "1",
    "index.number_of_shards": "1"
  }
- Click Create.
Alternatively, use the API: replace ${SERVICE_URL} with your service URL and run the following command:
curl -X PUT -H "Content-Type: application/json" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": ["custom_stop_words_filter"]
        }
      },
      "filter": {
        "custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords_path": "custom/stopwords/nofox"
        }
      }
    }
  }
}' "${SERVICE_URL}/demo-index?pretty"
Verify the stopwords filter
Verify the stopwords filter by using the _analyze API.
Using OpenSearch Dashboards:
- Go to Dev Tools in OpenSearch Dashboards.
- Use the _analyze API to verify that the stopwords filter is working.
POST demo-index/_analyze
{
"text": "a quick brown fox jumps over the lazy dog"
}
Alternatively, use the API: replace ${SERVICE_URL} with your service URL and run the following command:
curl -H "Content-Type: application/json" -d '{
  "text": "a quick brown fox jumps over the lazy dog"
}' "${SERVICE_URL}/demo-index/_analyze?pretty"
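To know what to look for in the response, it can help to approximate the analyzer locally. The sketch below is not the OpenSearch implementation. It only mimics the two configured steps with standard tools: splitting on whitespace, then dropping tokens that match a stopword regardless of case. The analyze_locally function is a hypothetical helper, and the stopword list is the one from the example file.

```shell
# Stopwords from the example file, one per line.
stopwords=$(mktemp)
printf '%s\n' a fox jumps the > "$stopwords"

# Split on whitespace (like the whitespace tokenizer), then drop every token
# that equals a stopword, ignoring case (like ignore_case: true).
analyze_locally() {
  printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep -v -i -x -F -f "$stopwords"
}

analyze_locally "a quick brown fox jumps over the lazy dog"
```

The local run prints quick, brown, over, lazy, and dog, one per line. If the stopwords filter is applied to the index, the tokens returned by the _analyze API should be the same five words, with a, fox, jumps, and the removed.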