Write search queries with OpenSearch® and NodeJS
Learn how the OpenSearch® JavaScript client gives a clear and useful interface to communicate with an OpenSearch cluster and run search queries. To make it more delicious we'll be using a recipe dataset from Kaggle.
Prepare the playground
You can create an OpenSearch cluster either with the visual interface or with the command line. Depending on your preference, follow the instructions for getting started with the console for Aiven for OpenSearch or see how to create a service with the Aiven command line interface.
You can also clone the final demo project from the GitHub repository.
File structure and GitHub repository
To organise our development space we'll use these files:
- config.js to keep the configuration needed to connect to the cluster,
- index.js to hold methods which manipulate the index,
- helpers.js to contain utilities for logging responses,
- search.js for methods specific to search requests.
We'll be adding code into these files and running the methods from the command line.
Connect to the cluster and load data
Follow the instructions on how to connect to the cluster with a NodeJS client and add the necessary code to config.js. Once you're connected, load a sample data set and retrieve the data mapping to understand the structure of the created index.
Extra helpers
To render the response, add the following helper method to your helpers.js file.
/**
 * Parsing and logging list of titles from the result, used in callbacks.
 */
const logTitles = (error, result) => {
  if (error) {
    console.error(error);
  } else {
    const hits = result.body.hits.hits;
    console.log(`Number of returned results is ${hits.length}`);
    console.log(hits.map((hit) => hit._source.title));
  }
};

// Export the helper so that search.js can require it.
module.exports.logTitles = logTitles;
In the code snippets we'll keep error handling somewhat simple and use console.log to print information into the terminal.
Now you're ready to start querying the data.
Query the data
Now that we have data in the OpenSearch cluster, we're ready to construct and run search queries. We will use the search method, which is provided by the OpenSearch JavaScript client. The following code goes into search.js. You'll need the connection configuration and the helper methods, so include them at the top of your search.js file with:
const { client, indexName } = require("./config");
const { logTitles } = require("./helpers");
The search method expects three optional parameters: params, options and callback.
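The examples in this tutorial pass a callback, but the same call can also be awaited. The following is a minimal sketch, assuming the client returns a promise when no callback is given (check this against the client version you use). The names searchTitles and fakeClient are hypothetical and exist only for illustration; the stub stands in for a real cluster so the snippet runs on its own.

```javascript
// Hypothetical helper: runs a search and resolves to the list of titles.
// `client` is anything with a promise-returning search(params) method.
async function searchTitles(client, index, body) {
  const result = await client.search({ index, body });
  return result.body.hits.hits.map((hit) => hit._source.title);
}

// Stub standing in for a real client, so the sketch runs without a cluster.
const fakeClient = {
  search: async () => ({
    body: { hits: { hits: [{ _source: { title: "Tomato Soup" } }] } },
  }),
};

searchTitles(fakeClient, "recipes", { query: { match_all: {} } })
  .then((titles) => console.log(titles))
  .catch(console.error);
```

With a real cluster you would pass the client exported from config.js instead of the stub.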
The query details go into the params object, which contains the name of the index (index), the maximum number of results to be returned (size), the offset of the first result when the response is paginated (from, used together with size), the fields by which to sort the data (sort) and others.
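As an illustration of how size, from and sort fit together, here is a small sketch of a helper that builds params for a given page. The helper name pageParams is hypothetical, and sorting by calories assumes that the index has a numeric calories field.

```javascript
// Hypothetical helper: builds search params for a 1-based page number.
// `from` is the offset of the first hit, `size` the number of hits per page.
const pageParams = (index, page, pageSize = 10) => ({
  index,
  from: (page - 1) * pageSize,
  size: pageSize,
  // "field:direction" pairs, here ascending by an assumed numeric field
  sort: ["calories:asc"],
});

// Page 3 with 20 hits per page starts at offset 40.
console.log(pageParams("recipes", 3, 20));
```

The resulting object can be extended with a q or body property and passed to client.search.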
We'll pay closer attention to two of these parameters: q, a query defined in the Lucene query string syntax, and body, a query based on Query DSL (Domain Specific Language). These are the two main methods to construct a query.
The query string syntax is a powerful tool which can be used for a variety of requests. It is especially convenient for cURL requests, since it is a very compact string. However, as the complexity of a request grows, it becomes more difficult to read and maintain these types of queries.
// example of using the query string syntax
client.search({
  index: 'recipes',
  q: 'ingredients:broccoli AND calories:(>=100 AND <200)'
})
A query with a request body might look bulky at first glance, but its structure makes it easier to read, understand and modify the content. Unlike q, which expects a string, body is an object allowing a variety of granular parameters.
// example of using a request body
client.search({
  index: indexName,
  body: {
    query: {
      match: { property: 'value' }
    }
  }
})
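To compare the two styles, the query string example above can be expressed approximately as a request body. This is a sketch that assumes ingredients is a text field and calories a numeric field in the recipes index. It uses a bool query, one of the Query DSL groups described below, to combine the two conditions, and the variable name broccoliBody is only illustrative.

```javascript
// Rough Query DSL counterpart of:
//   q: 'ingredients:broccoli AND calories:(>=100 AND <200)'
const broccoliBody = {
  query: {
    bool: {
      // both conditions have to hold, mirroring the AND in the query string
      must: [
        { match: { ingredients: "broccoli" } },
        { range: { calories: { gte: 100, lt: 200 } } },
      ],
    },
  },
};

console.log(JSON.stringify(broccoliBody, null, 2));
```

The body is longer than the one-line string, but each condition is a separate object that can be changed without touching the other.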
Let's focus on Query DSL and its three main groups of requests: term-level, full-text and boolean. You will also see how to use the Lucene query string syntax inside Query DSL.
- Term-level queries are handy when we need to find exact matches for numbers, dates or tags and don't need to sort the results by relevance. Term-level queries use search terms as they are without additional analysis.
- Full-text queries allow a smarter search for matches in analysed text fields and return results sorted by relevance.
- Boolean queries are useful to combine multiple queries together. They support boolean clauses such as must, filter, should and must_not.
Find matching field values
One example of a term-level query is searching for all entries containing a particular value in a field. To construct the request body we use the term property, which defines an object where the name is a field and the value is the term we're searching for in this field.
/**
 * Searching for exact matches of a value in a field.
 * run-func search term sodium 0
 */
module.exports.term = (field, value) => {
  console.log(`Searching for values in the field ${field} equal to ${value}`);
  const body = {
    query: {
      term: {
        [field]: value,
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
    },
    logTitles
  );
};
run-func search term sodium 0
Try to replace "sodium" with other fields we have, such as "calories" or "fat".
Find fields with a value within a range
When dealing with numeric values, naturally we want to be able to search for certain ranges of values. To find all documents that contain terms in a specific field within a given range, use the range property. It expects an object, where the name is set to the field name and the body defines the upper and lower bounds: gt (greater than), gte (greater than or equal to), lt (less than) and lte (less than or equal to).
/**
 * Searching for a range of values in a field.
 * run-func search range sodium 0 10
 */
module.exports.range = (field, gte, lte) => {
  console.log(
    `Searching for values in the ${field} ranging from ${gte} to ${lte}`
  );
  const body = {
    query: {
      range: {
        [field]: {
          gte,
          lte,
        },
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
    },
    logTitles
  );
};
run-func search range sodium 0 10
Try your own term query. How about a search for food with a particular rating value, or finding all meals with zero calories?
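The range method above always uses the inclusive bounds gte and lte. As a sketch of how the exclusive bounds gt and lt could be used as well, the hypothetical helper below builds a range body from any combination of the four bounds. It deliberately rejects every other key, although the range query itself accepts further options.

```javascript
// The four bound names described in this section.
const ALLOWED_BOUNDS = ["gt", "gte", "lt", "lte"];

// Hypothetical helper: builds a range query body from a bounds object,
// for example { gt: 0, lt: 100 }.
function rangeBody(field, bounds) {
  for (const key of Object.keys(bounds)) {
    if (!ALLOWED_BOUNDS.includes(key)) {
      throw new Error(`Unknown range bound: ${key}`);
    }
  }
  return { query: { range: { [field]: bounds } } };
}

// More than 0 and less than 100 calories, both bounds exclusive.
console.log(JSON.stringify(rangeBody("calories", { gt: 0, lt: 100 })));
```

The returned object can be passed as body to client.search in the same way as in the range method.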
Find fields with fuzzy text matching
When searching for terms inside text fields, we can take into account typos and misspellings. We measure such "deviations" by the minimum number of single-character edits necessary to convert one word into another. Such queries are called fuzzy, and the fuzziness property specifies the maximum edit distance.
/**
 * Specifying fuzziness to account for typos and misspelling.
 * run-func search fuzzy title pinapple 2
 */
module.exports.fuzzy = (field, value, fuzziness) => {
  console.log(
    `Search for ${value} in the ${field} with fuzziness set to ${fuzziness}`
  );
  const query = {
    query: {
      fuzzy: {
        [field]: {
          value,
          fuzziness,
        },
      },
    },
  };
  client.search(
    {
      index: indexName,
      body: query,
    },
    logTitles
  );
};
See if you can find recipes with misspelled pineapple 🍍
run-func search fuzzy title pinapple 2
Even though there is a typo in the word "pineapple", you still got relevant results. Try other search terms and different values for fuzziness to understand better how fuzzy queries work. What is your favourite food ingredient typo?
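To get a feeling for the edit distance that fuzziness refers to, here is a small standalone sketch that counts single-character insertions, deletions and substitutions between two words (the plain Levenshtein distance). It is an illustration only and not the implementation OpenSearch uses; as far as we know, fuzzy queries by default also treat swapping two adjacent characters as a single edit, which this sketch does not. The function name editDistance is our own.

```javascript
// Plain Levenshtein distance between two strings, computed row by row.
function editDistance(a, b) {
  // prev[j] is the distance between the processed prefix of a
  // and the first j characters of b; it starts with the empty prefix.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a character from a
        curr[j - 1] + 1, // insert a character into a
        prev[j - 1] + cost // substitute, or keep when equal
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("pinapple", "pineapple")); // 1
console.log(editDistance("kitten", "sitting")); // 3
```

A distance of 1 between "pinapple" and "pineapple" explains why the search above succeeds with fuzziness set to 2.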
Find best match with multiple search words
A standard way to perform a full-text query is to use the match property inside a request. match expects an object, the name of which is set to a specific field, and its body contains a search query in the form of a string.
To see match in action, use the method below to search for "Tomato-garlic soup with dill".
/**
 * Finding matches sorted by relevance.
 * run-func search match title 'Tomato-garlic soup with dill'
 */
module.exports.match = (field, query) => {
  console.log(`Searching for ${query} in the field ${field}`);
  const body = {
    query: {
      match: {
        [field]: {
          query,
        },
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
    },
    logTitles
  );
};
run-func search match title 'Tomato-garlic soup with dill'
In the response you should see different soup recipes sorted by how close they are to "Tomato-garlic soup with dill" according to the OpenSearch engine.
What are your favourite recipes? Try searching for them and see if you find some new and unusual recipe combinations.
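By default a match query returns documents that contain any of the search words, which is why the results include many different soups. If you only want documents that contain every word, the match query accepts an operator option. The sketch below builds such a body; the helper name matchAllWords is hypothetical.

```javascript
// Hypothetical helper: a match query that requires all search words
// to be present in the field, instead of any of them.
const matchAllWords = (field, query) => ({
  query: {
    match: {
      [field]: {
        query,
        operator: "and",
      },
    },
  },
});

console.log(JSON.stringify(matchAllWords("title", "tomato soup dill")));
```

Expect fewer results than with the default behaviour, since a title now has to contain all of the words.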
Find matching phrases
When the order of the words is important, use match_phrase instead of match. An additional power of match_phrase is that it lets you define how far search words can be from each other to still be considered a match. This parameter is called slop and its default value is 0. The format of match_phrase is almost identical to match:
/**
 * Specifying a slop - a distance between search words.
 * run-func search slop directions "pizza pineapple" 10
 */
module.exports.slop = (field, query, slop) => {
  console.log(
    `Searching for ${query} with slop value ${slop} in the field ${field}`
  );
  const body = {
    query: {
      match_phrase: {
        [field]: {
          query,
          slop,
        },
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
    },
    logTitles
  );
};
We can use this method to find some recipes for pizza with pineapple. I learned from my Italian colleague that this is considered a combination only for tourists, not a true pizza recipe. We'll do it by searching the directions field for the words "pizza" and "pineapple" with a maximum distance of 10 words in between.
run-func search slop directions "pizza pineapple" 10
Oh look: "Pan-Fried Hawaiian Pizza" (don't tell my colleague).
So far all the requests we tried returned at most 10 results. Why 10? Because it is the default size value. It can be increased by setting the size property to a higher number when making the request. We'll include this in the next example.
Search with query string syntax
Remember the Lucene query string syntax we talked about earlier, in relation to the q parameter? We can also use it inside Query DSL by defining a query_string object. It requires its own query parameter and, optionally, we can specify the default_field or fields properties to indicate the search fields.
This example also sets size to demonstrate how we can get more than 10 results.
/**
 * Using special operators within a query string and a size parameter.
 * run-func search query ingredients "(salmon|tuna) +tomato -onion" 100
 */
module.exports.query = (field, query, size) => {
  console.log(
    `Searching for ${query} in the field ${field} and returning maximum ${size} results`
  );
  const body = {
    query: {
      query_string: {
        default_field: field,
        query,
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
      size,
    },
    logTitles
  );
};
To find recipes with tomato, salmon or tuna and no onion, run:
run-func search query ingredients "(salmon|tuna) +tomato -onion" 100
Now, experiment with your recipe search by including and excluding different ingredients.
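The method above searches a single field through default_field. To search several fields at once, the fields property mentioned earlier takes a list of field names. The following sketch builds the params for such a request; the helper name multiFieldParams is hypothetical, and the title and ingredients fields are assumed to exist in the index.

```javascript
// Hypothetical helper: query_string search across several fields.
const multiFieldParams = (index, fields, query, size = 10) => ({
  index,
  size,
  body: {
    query: {
      query_string: {
        fields, // list of fields instead of a single default_field
        query,
      },
    },
  },
});

console.log(
  JSON.stringify(
    multiFieldParams("recipes", ["title", "ingredients"], "+tomato -onion", 50)
  )
);
```

Pass the result to client.search together with logTitles to print the matching titles.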
Combine queries to improve results
The boolean clause types each affect the document relevance score differently. Both must and should positively contribute to the score, affecting the relevance of matches; must_not excludes matching documents from the results and does not contribute to the score. The filter clause is similar to must, however it has no effect on the relevance score.
In the next method we combine what we've learned so far, using both term-level and full-text queries to find recipes for a quick and easy dish, with no garlic, low sodium and high protein.
/**
 * Combining several queries together
 * run-func search boolean
 */
module.exports.boolean = () => {
  console.log(
    `Searching for quick and easy recipes without garlic with low sodium and high protein`
  );
  const body = {
    query: {
      bool: {
        must: { match: { categories: "Quick & Easy" } },
        must_not: { match: { ingredients: "garlic" } },
        filter: [
          { range: { sodium: { lte: 50 } } },
          { range: { protein: { gte: 5 } } },
        ],
      },
    },
  };
  client.search(
    {
      index: indexName,
      body,
    },
    logTitles
  );
};
run-func search boolean
Now create your own boolean query, using what we learned to find recipes with particular nutritional values and ingredients. Experiment using different clauses to see how they affect the results.
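The example above does not use the should clause, so here is a sketch of how it could be added. It looks for quick and easy, low-sodium recipes that contain salmon or tuna. The ingredient choices and the variable name fishBody are only illustrative, and minimum_should_match is set so that at least one of the should clauses has to match; without it, should clauses next to a must clause only influence the score.

```javascript
// Sketch of a bool query that also uses `should`.
const fishBody = {
  query: {
    bool: {
      must: { match: { categories: "Quick & Easy" } },
      // each matching `should` clause raises the relevance score
      should: [
        { match: { ingredients: "salmon" } },
        { match: { ingredients: "tuna" } },
      ],
      // require at least one of the `should` clauses to match
      minimum_should_match: 1,
      filter: [{ range: { sodium: { lte: 50 } } }],
    },
  },
};

console.log(JSON.stringify(fishBody, null, 2));
```

Try removing minimum_should_match and compare the results to see how should changes from a requirement into a preference.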
Related pages
- Aggregation tutorial.
- Pausing the service.
- Demo repository. All the examples we run in this tutorial can be found in: