Word Searches and the Optic API
Exploring how they can be used together
Introduction
MarkLogic’s Optic API provides a powerful, composable interface for querying structured, semantic, and document data. This document focuses on how to use the full range of full-text search features available in the cts API within Optic queries. While the examples here use cts.wordQuery for clarity, you can apply the same principles with other cts full-text search functions depending on your application's needs. By combining these full-text search capabilities with the Optic API, you can blend unstructured search with structured analytics, unlocking new possibilities for applications like the llamaverse, where you may want to find llamas by name, description, or other text fields.
What are word queries
MarkLogic includes a rich set of query constructors for querying words and phrases. This example uses cts.wordQuery(). The cts.wordQuery function creates a query that matches documents or values containing the specified words or phrases. It is highly efficient and supports stemming, wildcards, and other advanced search features. The query below has stemming enabled. Stemming reduces a word to its stem and then matches other forms of that word, for example: sing, sang, sung, singing. In this example, we look for "sang", the past-tense form of "sing". There are over 900 results.
This search finds documents in the "wild-llamas" collection that contain any form of the word "sang" (such as "sing", "sung", or "singing") because of stemming. Although there are over 900 matches, the example uses fn.subsequence() to limit the output to just three documents for display purposes. It also skips toArray(), but that's a discussion for another day.
fn.subsequence(
  cts.search(cts.andQuery([
    cts.collectionQuery("wild-llamas"),
    cts.wordQuery("sang", "stemmed")
  ])),
  1, 3
);
{
"id": "8520c251-7abe-4eb4-a0e8-6706bf5c2397",
"name": "Marilyn",
"heightCm": 121,
"weightKg": 178,
"eyeColor": "Blue",
"hairColor": "Silver",
"breed": "Hybrid",
"placeOfBirth": "New Joseph, Saint Barthelemy",
"description": "Marilyn is a blue-eyed llama with silver hair, standing 121 cm tall. Originally from New Joseph, Saint Barthelemy, Marilyn enjoys hiking, dancing, singing. Known for their playful and energetic personality, Marilyn is a beloved member of the llama community.",
"relatedTo": {
"id": "992ebdc4-bed3-4978-bc53-7178992bffa7",
"relationship": "uncle"
}
}
,
{
"id": "24a25bb9-b35f-48b3-b01c-d8f4a64d7a2d",
"name": "Sean",
"heightCm": 195,
"weightKg": 134,
"eyeColor": "Brown",
"hairColor": "Red",
"breed": "Suri",
"placeOfBirth": "West Tonyaland, Belgium",
"description": "Sean is a brown-eyed llama with red hair, standing 195 cm tall. Originally from West Tonyaland, Belgium, Sean enjoys reading, wood carving, singing. Known for their playful and friendly personality, Sean is a beloved member of the llama community."
}
,
{
"id": "58fc1645-17d2-4732-aded-1e88f637f967",
"name": "Kerry",
"heightCm": 160,
"weightKg": 199,
"eyeColor": "Brown",
"hairColor": "Brown",
"breed": "Hybrid",
"placeOfBirth": "Hendersonchester, United Arab Emirates",
"description": "Kerry is a brown-eyed llama with brown hair, standing 160 cm tall. Originally from Hendersonchester, United Arab Emirates, Kerry enjoys singing, dancing, knitting. Known for their friendly and friendly personality, Kerry is a beloved member of the llama community."
}
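To make the idea of stemming concrete, here is a minimal sketch in plain JavaScript. It is only a toy: the STEMS table and the helper functions are hypothetical and hand-written for this illustration, and MarkLogic's real stemmer is dictionary-based and far more complete. The point is simply that a stemmed match compares stems rather than exact words, which is why a search for "sang" returns descriptions that only contain "singing".

```javascript
// Toy illustration only: a hand-written stem table, NOT MarkLogic's stemmer.
const STEMS = new Map([
  ["sing", "sing"],
  ["sings", "sing"],
  ["sang", "sing"],
  ["sung", "sing"],
  ["singing", "sing"],
]);

// Map a word to its stem; words missing from the table stem to themselves.
function stem(word) {
  const lower = word.toLowerCase();
  return STEMS.get(lower) || lower;
}

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Unstemmed match: the exact word must appear in the text.
function matchesUnstemmed(text, term) {
  return tokenize(text).includes(term.toLowerCase());
}

// Stemmed match: some word in the text shares a stem with the search term.
function matchesStemmed(text, term) {
  const target = stem(term);
  return tokenize(text).some((word) => stem(word) === target);
}

const description = "Marilyn enjoys hiking, dancing, singing.";
console.log(matchesUnstemmed(description, "sang")); // false
console.log(matchesStemmed(description, "sang"));   // true
```

The unstemmed check fails because the literal word "sang" never appears, while the stemmed check succeeds because "sang" and "singing" share the stem "sing".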
Using word-related queries in Optic where clauses
You can use cts word queries in the where clause of an Optic plan to filter results. This goes back to a fundamental point: most of what MarkLogic's search features do is filter fragments. This is especially useful when working with TDE views or when you want to combine structured and unstructured search. This example uses the Llamaverse dataset of wild llamas.
This example finds all the wild llamas of the Huacaya breed that have sung. The plan includes both a structured filter on the breed column and a full-text search using cts.wordQuery to find documents containing the word "sung" (with stemming enabled). The results are limited to 3 for display purposes.
const op = require('/MarkLogic/optic');
op.fromView('llamaverse', 'wildLlamas', "wildLlamas")
  .where(op.eq(op.viewCol("wildLlamas", "breed"), "Huacaya"))
  .where(cts.wordQuery("sung", "stemmed"))
  .limit(3)
  .result();
{
"wildLlamas.id": "f36c9e79-c293-48f7-99b6-67526ce6a9f3",
"wildLlamas.name": "Seth",
"wildLlamas.heightCm": 145,
"wildLlamas.weightKg": 142,
"wildLlamas.eyeColor": "Green",
"wildLlamas.hairColor": "Silver",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "Port Tinastad, Oman",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "Seth is a green-eyed llama with silver hair, standing 145 cm tall. Originally from Port Tinastad, Oman, Seth enjoys singing, gardening, knitting. Known for their curious and curious personality, Seth is a beloved member of the llama community.",
"wildLlamas.relatedTo": "c53e7eb9-4fba-4ec0-af20-caa7e60c9417"
}
,
{
"wildLlamas.id": "8f584ab8-2926-4ab3-b26a-1595e2f01d78",
"wildLlamas.name": "Susan",
"wildLlamas.heightCm": 153,
"wildLlamas.weightKg": 199,
"wildLlamas.eyeColor": "Blue",
"wildLlamas.hairColor": "Red",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "North Jamie, Maldives",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "Susan is a blue-eyed llama with red hair, standing 153 cm tall. Originally from North Jamie, Maldives, Susan enjoys singing, hiking, dancing. Known for their curious and calm personality, Susan is a beloved member of the llama community.",
"wildLlamas.relatedTo": "526cdbfe-f099-4632-bf49-725ecc028e04"
}
,
{
"wildLlamas.id": "9e86dbee-ae3a-40fa-955d-38992d51f501",
"wildLlamas.name": "David",
"wildLlamas.heightCm": 115,
"wildLlamas.weightKg": 150,
"wildLlamas.eyeColor": "Gray",
"wildLlamas.hairColor": "Gray",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "East Joshuaton, Samoa",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "David is a gray-eyed llama with gray hair, standing 115 cm tall. Originally from East Joshuaton, Samoa, David enjoys reading, knitting, singing. Known for their calm and energetic personality, David is a beloved member of the llama community.",
"wildLlamas.relatedTo": "70803261-6b44-47bc-bd3a-78c42c053806"
}
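The same pattern should carry over to other cts query constructors. As a hedged sketch, the variant below narrows the full-text filter to a single JSON property with cts.jsonPropertyWordQuery. It assumes the documents behind the view store their text in a JSON property named "description", as the sample documents suggest. The buildDescriptionPlan helper is hypothetical and exists only so the plan can be built separately from where it is run; it has not been run against the Llamaverse data.

```javascript
// Sketch only: buildDescriptionPlan is a hypothetical helper, not part of MarkLogic.
// Assumption: the view is backed by JSON documents with a "description" property.
function buildDescriptionPlan(op, cts) {
  return op.fromView("llamaverse", "wildLlamas", "wildLlamas")
    // Structured filter on a view column.
    .where(op.eq(op.viewCol("wildLlamas", "breed"), "Huacaya"))
    // Full-text filter restricted to the "description" JSON property.
    .where(cts.jsonPropertyWordQuery("description", "sung", ["stemmed"]))
    .limit(3);
}

// In Query Console (server-side JavaScript), where cts is a global:
//   const op = require('/MarkLogic/optic');
//   buildDescriptionPlan(op, cts).result();
```

As with the cts.wordQuery version, this only filters rows; it does not expose a relevance score.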
Where are records for 'sung'?
We now clearly see the power of stemmed word queries. The records above do not contain the word "sung", but they do contain other forms such as "singing" or "sang". However, because a where clause does not expose relevance scores, we cannot tell which records are the most relevant. This is a key difference between using cts.wordQuery in the where clause of an Optic plan and using it with op.fromSearch(), which we will explore next.
Using op.fromSearch() with cts.wordQuery for scored results
If you need access to the score for each result, you should use the same cts.wordQuery as input to op.fromSearch(). The op.fromSearch() function automatically includes a special score column in the results, allowing you to sort, filter, or display results by relevance.
Note that this example uses an inner join to combine the results from op.fromSearch() with the wildLlamas view. This allows us to restrict the search results to llamas of the Huacaya breed while still leveraging the full-text search capabilities of cts.wordQuery. The results are ordered by score in descending order, ensuring that the most relevant results appear first.
const op = require('/MarkLogic/optic');
op.fromSearch(
    cts.andQuery([
      cts.wordQuery("sung", "stemmed")
    ]),                           // The search query
    ['fragmentId', 'score'],      // Columns to include
    'singingLlamas',              // Optional qualifier for column names
    { scoreMethod: 'logtfidf' }   // Options for scoring - this is the default
  )
  .joinInner(
    op.fromView('llamaverse', 'wildLlamas', "wildLlamas", [op.fragmentIdCol("fragmentId")])
      .where(op.eq(op.viewCol("wildLlamas", "breed"), "Huacaya")),
    op.on(op.viewCol("singingLlamas", "fragmentId"), op.viewCol("wildLlamas", "fragmentId"))
  )
  .orderBy(op.desc('score'))
  .limit(3)
  .result().toArray();
{
"wildLlamas.id": "0a6911e5-0d17-44e1-a114-cb747490f469",
"singingLlamas.score": 56832,
"wildLlamas.name": "Kyle",
"wildLlamas.heightCm": 175,
"wildLlamas.weightKg": 185,
"wildLlamas.eyeColor": "Gray",
"wildLlamas.hairColor": "Gray",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "Lynnmouth, Myanmar",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "Kyle is a gray-eyed llama with gray hair, standing 175 cm tall. Originally from Lynnmouth, Myanmar where he once sung for a group of prestegious penguins, knitting, hiking. Known for their playful and energetic personality, Kyle is a beloved member of the llama community.",
"wildLlamas.relatedTo": "96c897e6-dd56-40aa-af5d-27b0cd6bf7e0"
}
,
{
"wildLlamas.id": "45171014-977f-45d9-859c-87f5c8b5c6d1",
"singingLlamas.score": 26624,
"wildLlamas.name": "Anna",
"wildLlamas.heightCm": 111,
"wildLlamas.weightKg": 170,
"wildLlamas.eyeColor": "Amber",
"wildLlamas.hairColor": "Brown",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "Webbton, Austria",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "Anna is a amber-eyed llama with brown hair, standing 111 cm tall. Originally from Webbton, Austria, Anna enjoys hiking, cooking, singing. Known for their gentle and gentle personality, Anna is a beloved member of the llama community.",
"wildLlamas.relatedTo": "50e324fc-6ac4-4d91-a7ed-f228d04f46db"
}
,
{
"wildLlamas.id": "2e7647fd-1788-4a96-b3ad-b11b4ae0f8e6",
"singingLlamas.score": 26624,
"wildLlamas.name": "Jason",
"wildLlamas.heightCm": 131,
"wildLlamas.weightKg": 142,
"wildLlamas.eyeColor": "Gray",
"wildLlamas.hairColor": "Silver",
"wildLlamas.breed": "Huacaya",
"wildLlamas.placeOfBirth": "Jenniferchester, Russian Federation",
"wildLlamas.medicalCondition": "none",
"wildLlamas.description": "Jason is a gray-eyed llama with silver hair, standing 131 cm tall. Originally from Jenniferchester, Russian Federation, Jason enjoys cooking, singing, painting. Known for their calm and mischievous personality, Jason is a beloved member of the llama community.",
"wildLlamas.relatedTo": "b4753781-cb43-4794-9b1e-ff4456b42928"
}
With the above results, we can see that the llama named "Kyle" has the highest score for the word "sung", indicating that his description is most relevant to the search term. The other two llamas, "Anna" and "Jason", have lower scores but are still relevant due to the presence of related words like "singing".
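Once the rows are in an array, the score is just another column, keyed by its qualified name. The sketch below is plain JavaScript that works on rows shaped like the output above, trimmed to two columns. The rankByScore helper and its minScore threshold are hypothetical, introduced only to show one way of filtering and ordering by relevance after the plan has run.

```javascript
// Rows shaped like the output above, trimmed to the two columns used here
// and deliberately listed out of score order.
const rows = [
  { "wildLlamas.name": "Anna", "singingLlamas.score": 26624 },
  { "wildLlamas.name": "Kyle", "singingLlamas.score": 56832 },
  { "wildLlamas.name": "Jason", "singingLlamas.score": 26624 },
];

// Hypothetical helper: keep rows at or above minScore, highest score first.
function rankByScore(resultRows, minScore) {
  return resultRows
    .map((row) => ({
      name: row["wildLlamas.name"],
      score: row["singingLlamas.score"],
    }))
    .filter((entry) => entry.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

console.log(rankByScore(rows, 0).map((entry) => entry.name));     // [ 'Kyle', 'Anna', 'Jason' ]
console.log(rankByScore(rows, 30000).map((entry) => entry.name)); // [ 'Kyle' ]
```

Doing the ordering inside the plan with orderBy, as in the query above, is usually preferable because the limit is then applied to the best-scoring rows rather than to an arbitrary subset.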
Conclusion
By integrating cts.wordQuery, or any of the related full-text search functions from the cts API, with the Optic API, you can combine the power of full-text search with structured, relational-style analytics. Remember that using these queries in a where clause does not provide a score, but using them with op.fromSearch() does. This distinction is important for building effective search-driven applications in MarkLogic, and you can leverage any of the full-text search functions that best fit your use case.