Exploring op.fromLexicons()

Querying lexicons in MarkLogic Optic API

Clever Llamas
Minimum Llamaverse Version: 1.4
Minimum MarkLogic Version: 9

Introduction

The op.fromLexicons() function allows you to source data directly from lexicons, including various range indexes and the URI and collection lexicons. More details on lexicons can be found in our article Understanding Lexicons.

Summary

Use op.fromLexicons() when you need efficient access to indexed values, or when you want to join lexicon data with other sources in your plan. Accessing lexicons through Optic differs from accessors such as cts.values(), cts.valueCoOccurrences(), and cts.valueTuples(): those return unique values or value combinations, while op.fromLexicons() returns rows that stay linked to the underlying document fragments.

Parameters

| Name | Datatype | Required | Notes |
|---|---|---|---|
| indexdef | Object | Yes | An object defining the lexicon columns to retrieve, where each key is a column name and each value is a cts.reference for the index. |
| qualifier | String | No | An optional qualifier for the view, used to specify a name for the view at runtime. See Understanding Qualifiers. |
| systemCols | Column | No | An optional named fragment ID column, created with op.fragmentIdCol(). This will be explored in some of the join-related articles. See Understanding Fragment ID Columns for more details. |
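To make the two optional parameters more concrete, here is a plain JavaScript model of what each one adds to a row. It is not the Optic API and does not run against MarkLogic. It assumes, as a simplification, that a lexicon can be pictured as a list of fragment ID and value pairs, and that qualified columns are reported in a `qualifier.column` form; the `lexiconRows` function and the sample data are hypothetical.

```javascript
// Hypothetical model of one range-index lexicon: each entry pairs a
// fragment ID with the value indexed for that fragment.
const breedLexicon = [
  { fragmentId: 101, value: 'Huacaya' },
  { fragmentId: 102, value: 'Suri' },
  { fragmentId: 103, value: 'Huacaya' },
];

// Build one row per lexicon entry, loosely mimicking the parameters:
//   column    - the column name (a key in the indexdef object)
//   qualifier - optional prefix for the column name (assumed "qualifier.column" form)
//   fragIdCol - optional name of a column that exposes the fragment ID
function lexiconRows(lexicon, column, qualifier, fragIdCol) {
  const name = qualifier ? `${qualifier}.${column}` : column;
  return lexicon.map((entry) => {
    const row = { [name]: entry.value };
    if (fragIdCol) {
      row[fragIdCol] = entry.fragmentId;
    }
    return row;
  });
}

console.log(lexiconRows(breedLexicon, 'breed'));
console.log(lexiconRows(breedLexicon, 'breed', 'llama', 'fragId'));
```

In real Optic code the fragment ID column is created with op.fragmentIdCol() rather than passed as a plain string; the model uses a string only to keep it self-contained.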

Setup

For this example, we have added a few indexes to make it more thorough. These indexes ship with the Llamaverse.

Path Range Index

We have added a path range index for ID on the root (top level). This will essentially match all of our non-enveloped documents, including the wild llamas.

JSON Property Range Index

We have added a JSON property range index on the llama breed. Note that this index is actually configured as referencing an element. This is because a JSON property is just another node type, and the user interface uses the term element to refer to any node type. It should be noted this nuance is not universal. Most of Marklog

Comparison to cts.values()

Let's first compare op.fromLexicons() to the cts.values() function. cts.values() returns a unique list of values from a lexicon, regardless of how many document fragments contain each one. We could filter this list based on other queries, but without a bit more programming and another function or two, you cannot actually see the documents related to each of the values.

By contrast, op.fromLexicons() returns a plan in which the link to every document fragment is exposed and available. This also means that you get one row per document fragment rather than unique values.
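The difference can be modeled in a few lines of plain JavaScript. This is not MarkLogic code, and the sample fragments are made up: `valuesLike` stands in for cts.values() by returning each distinct value once, while `rowsLike` stands in for op.fromLexicons() by returning one row per fragment.

```javascript
// Hypothetical fragments, each with the breed value stored in the index.
const fragments = [
  { fragmentId: 1, breed: 'Huacaya' },
  { fragmentId: 2, breed: 'Suri' },
  { fragmentId: 3, breed: 'Huacaya' },
  { fragmentId: 4, breed: 'Hybrid' },
];

// Like cts.values(): distinct values in sorted order, with no link back to fragments.
function valuesLike(frags) {
  return [...new Set(frags.map((f) => f.breed))].sort();
}

// Like op.fromLexicons(): one row per fragment, so repeated values are kept.
function rowsLike(frags) {
  return frags.map((f) => ({ breed: f.breed }));
}

console.log(valuesLike(fragments)); // [ 'Huacaya', 'Hybrid', 'Suri' ]
console.log(rowsLike(fragments).length); // 4
```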

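With cts.values(), each breed appears once:

```javascript
cts.values(cts.elementReference("breed"))
```

```
Huacaya
Hybrid
Suri
```

With op.fromLexicons(), each document fragment produces its own row:

```javascript
const op = require('/MarkLogic/optic');

// One row per fragment that has a value in the breed range index.
op.fromLexicons({ breed: cts.jsonPropertyReference("breed") })
  .result();
```

```
[
  { "breed": "Huacaya" },
  ... hundreds of other rows with Huacaya
  { "breed": "Huacaya" },
  { "breed": "Hybrid" },
  ... hundreds of other rows with Hybrid
  { "breed": "Suri" },
  ... hundreds of other rows with Suri
  .... thousands of rows in total
]
```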
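Because every fragment contributes a row, the per-value counts that cts.values() does not show can be recovered by grouping. In Optic this would typically be a groupBy() with a count aggregate (for example, something like `.groupBy('breed', op.count('total', 'breed'))`). The plain JavaScript below only models that aggregation over a hypothetical row list; the `countByBreed` helper is made up for illustration.

```javascript
// Hypothetical rows shaped like the op.fromLexicons() output above.
const rows = [
  { breed: 'Huacaya' },
  { breed: 'Suri' },
  { breed: 'Huacaya' },
  { breed: 'Hybrid' },
  { breed: 'Huacaya' },
];

// Group rows by breed and count how many rows (fragments) fall in each group.
function countByBreed(rowList) {
  const counts = new Map();
  for (const row of rowList) {
    counts.set(row.breed, (counts.get(row.breed) || 0) + 1);
  }
  // Return one object per distinct breed, sorted by breed name.
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([breed, total]) => ({ breed, total }));
}

console.log(countByBreed(rows));
```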

Example

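```javascript
const op = require('/MarkLogic/optic');

// Each column needs a range index on the referenced element;
// indexes on LlamaID, Name, and Field are assumed to exist.
const plan = op.fromLexicons({
  LlamaID: cts.elementReference(xs.QName('LlamaID')),
  Name: cts.elementReference(xs.QName('Name')),
  Field: cts.elementReference(xs.QName('Field'))
})
  // Choose the columns to return (here, all three).
  .select(['LlamaID', 'Name', 'Field']);

plan.result();
```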

This plan retrieves llama IDs, names, and their fields from range indexes.
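Lexicon rows are often combined with other plans by joining on a shared column, such as a fragment ID column. In Optic this is done with a join operator such as joinInner() and an op.on() condition. The plain JavaScript below is only a model of an inner join on a hypothetical `fragId` column; the sample rows and the `innerJoin` helper are made up.

```javascript
// Hypothetical rows from two lexicon plans, each carrying a fragment ID column.
const idRows = [
  { fragId: 11, LlamaID: 'L-1', Name: 'name-a' },
  { fragId: 12, LlamaID: 'L-2', Name: 'name-b' },
  { fragId: 13, LlamaID: 'L-3', Name: 'name-c' },
];
const breedRows = [
  { fragId: 11, breed: 'Suri' },
  { fragId: 13, breed: 'Huacaya' },
];

// Inner join: keep only pairs whose join-column values match, merging their columns.
function innerJoin(left, right, key) {
  // Index the right-hand rows by join key for constant-time lookups.
  const index = new Map();
  for (const r of right) {
    if (!index.has(r[key])) index.set(r[key], []);
    index.get(r[key]).push(r);
  }
  const out = [];
  for (const l of left) {
    for (const r of index.get(l[key]) || []) {
      out.push({ ...l, ...r });
    }
  }
  return out;
}

console.log(innerJoin(idRows, breedRows, 'fragId'));
```

The fragment with ID 12 has no breed row, so an inner join drops it; a left outer join would keep it with no breed value.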

Conclusion

op.fromLexicons() is ideal for high-performance access to indexed data, supporting advanced analytics and joins.
See also: optic-intro.md