Filtered vs Unfiltered Searches
Understanding Search Behavior in MarkLogic
Extended Llama Lesson
This is not a typical article. It is an Extended Llama Lesson — a deep dive that takes you further into a topic than you would normally be able to go on your own. The insights here come from years of real-world experience working with MarkLogic at scale: the kind of knowledge that only surfaces when you have hit the edges, debugged the unexpected, and worked through the harder corners of a platform. We hope it saves you some of that journey.
If any of this raises questions about your own implementation — or if you need expert help with search performance, database configuration, or anything else MarkLogic — reach out to us. This is exactly the kind of work we specialise in.
Understanding when MarkLogic uses filtered versus unfiltered searches is essential for both accuracy and performance. The default behaviour varies by function, which surprises many developers. A filtered search uses indexes to find candidates, then opens each document to confirm the match. An unfiltered search relies solely on index resolution — faster, but able to return false positives.
cts:search() defaults to filtered; cts:uris() and other lexicon-based functions default to unfiltered. The split reflects their purpose: cts:search() returns full documents, where accuracy usually matters, while lexicon functions are optimised for speed. Path expressions in XQuery (like fn:doc(...)/parent/child) are a special case: when used as the root context in cts:search(), "filtered" and "unfiltered" produce materially different results. Unfiltered returns the first node in the fragment regardless of whether it genuinely matches; filtered validates each node and returns only genuine matches. The MarkLogic docs warn that unfiltered searches should only be performed at top-level nodes or fragment roots, because path expressions below the fragment root can produce unexpected answers and stop after the first matching node per fragment.
Unfiltered is not a performance option: it is a decision to trust index resolution. That trust is reasonable for many workloads, but it becomes a material risk with:
- proximity or near queries
- phrase queries, especially those of three or more words
- wildcard queries
- case- or diacritic-sensitive searches
- geospatial precision requirements
- reverse geospatial queries such as polygon containment, which can return the entire dataset as false positives when unfiltered
- negation
- repeated substructures within fragments
- searches below the fragment root
- index transition states during a reindex
For business-critical retrieval, audit, compliance, or migration workflows, treat filtered execution as the safer default unless the query shape, database configuration, and index coverage are known to support accurate unfiltered resolution.
One instructive edge case: cts:column-range-query() — a query constrained by a TDE column. cts:uris((), (), cts:column-range-query("schema", "view", "column", "=", "value")) returns the expected URIs. cts:search(fn:collection(), cts:column-range-query("schema", "view", "column", "=", "value")) with the same query may return zero results. Query tracing shows that cts:search() did correctly resolve the matching fragments — then discarded all of them in the filter pass. The likely cause is that the SPARQL resolution powering cts:column-range-query() interacts with filtering in a way that eliminates every candidate. The fix is to pass "unfiltered" to cts:search(). This is the kind of behaviour that costs real debugging time if you do not know it exists.
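Query tracing is how behaviour like this becomes visible. The sketch below is a diagnostic under stated assumptions: it uses cts:word-query("Bradley") purely as a placeholder, which you would replace with the query you are investigating. It enables xdmp:query-trace() and then counts the results of the same query run unfiltered and filtered. The trace messages are written to the server error log rather than returned; the query result is just the two counts, and a filtered count far below the unfiltered one means the filter pass is discarding candidates.

```xquery
xquery version "1.0-ml";

(: Diagnostic sketch. The word query is a placeholder: substitute the :)
(: query under investigation. :)
let $query := cts:word-query("Bradley")
return (
  (: Enable query tracing for the rest of this request. It returns the :)
  (: empty sequence; trace output goes to the server error log. :)
  xdmp:query-trace(fn:true()),
  (: Number of candidates selected by index resolution alone. :)
  fn:count(cts:search(fn:collection(), $query, "unfiltered")),
  (: Number of candidates that survive the filter pass. :)
  fn:count(cts:search(fn:collection(), $query, "filtered"))
)
```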
Filtered Searches
How Filtered Search Works
When a search runs in filtered mode, MarkLogic performs a two-phase operation:
- Index resolution — the query is evaluated against the indexes to produce a set of candidate document fragments.
- Filtering — each candidate is opened and examined against the query to confirm it genuinely matches.
Filtering removes false positives at the cost of reading every candidate document. In most transactional workloads this is the right trade-off, but in high-throughput read scenarios the per-document cost can add up.
Sample Data
The examples in the Filtered Searches and Unfiltered Searches sections assume the llamaverse (v2.0+) is deployed. The llamaverse contains JSON documents for each llama resident, stored in the llamaverse collection. Raw llama documents follow the URI pattern /cleverllamas/llamaverse/raw/wild-llamas/llamas/{uuid}.json. We use the cleverllamas-llama user to read the data, as recommended.
The llamaverse sample data is freely available from github.com/cleverllamas/llamaverse — see the llamaverse article for full setup instructions.
The first-match path expression example in the Path Expressions section uses the llamaverse movement history document: a single JSON document at /cleverllamas/llamaverse/content/llama-movement/llama_location_history.json that records 3,360 GPS movement entries for multiple llama residents. An excerpt is shown with that example.
The filtered-versus-unfiltered path expression example uses Bradley's health record, an XML document introduced in llamaverse 2.0. The full document is shown alongside that example.
cts:search() — Filtered by Default
cts:search() is filtered unless you explicitly say otherwise. MarkLogic resolves index candidates and then opens each document to confirm it genuinely matches before returning it.
Here we use a cts:near-query() to search for llama profiles where the words "llama" and "cooking" appear within 1 word of each other. Every llama profile contains both words, but they are always separated by many words in the description text — so the filter step removes every candidate:
xquery version "1.0-ml";
(: FILTERED search (the default). :)
(: MarkLogic uses the word index to find candidates, then opens each :)
(: document to verify the match. Only documents where "llama" and :)
(: "cooking" genuinely appear within 1 word of each other are returned. :)
let $query := cts:near-query((
cts:word-query("llama"),
cts:word-query("cooking")
), 1)
for $doc in cts:search(fn:collection("llamaverse"), $query)
return fn:string($doc/name)
(: empty sequence :)
(: The filter re-checks each candidate document and confirms that :)
(: "llama" and "cooking" are never within 1 word of each other. :)
(: No false positives reach the caller. :)
Explicitly Requesting Unfiltered
Passing "unfiltered" skips the document-opening step entirely. MarkLogic returns whatever the word index resolves to. Using the same near-query, every document that contains both "llama" and "cooking" is returned — regardless of how far apart those words actually are:
xquery version "1.0-ml";
(: UNFILTERED search. :)
(: MarkLogic returns whatever the word index resolves to, without :)
(: opening any document to verify proximity. :)
(: Every document containing both "llama" and "cooking" is a candidate — :)
(: and every one of those is a false positive. :)
let $query := cts:near-query((
cts:word-query("llama"),
cts:word-query("cooking")
), 1)
for $doc in cts:search(fn:collection("llamaverse"), $query, "unfiltered")[1 to 5]
return fn:string($doc/name)
"Aaron"
"Angela"
"Anthony"
"Bradley"
"Chloe"
(: Output is limited to five names by [1 to 5]. Without that limit, all :)
(: 1,020 documents containing both "llama" and "cooking" are returned. :)
(: The words occur in every document but are never within 1 word of each :)
(: other, so every result here is a false positive. The filtered version :)
(: returns none. :)
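The filter step can be reproduced by hand, which is useful when you want to see exactly which candidates it throws away. This is a conceptual sketch, not MarkLogic's internal implementation: it takes the unfiltered candidates from index resolution and re-checks each one against the same near-query with cts:contains(). Against the llamaverse data, the first count should correspond to the unfiltered run above and the second to the filtered run.

```xquery
xquery version "1.0-ml";

(: Sketch: emulate filtering manually. :)
let $query := cts:near-query((
  cts:word-query("llama"),
  cts:word-query("cooking")
), 1)
(: Phase 1: index resolution only; no document is opened. :)
let $candidates := cts:search(fn:collection("llamaverse"), $query, "unfiltered")
(: Phase 2: re-check each candidate document against the full query. :)
let $verified := $candidates[cts:contains(., $query)]
return (
  fn:count($candidates),  (: candidates nominated by the indexes :)
  fn:count($verified)     (: candidates that genuinely match :)
)
```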
Unfiltered Searches
How Unfiltered Search Works
Lexicon-based functions such as cts:uris(), cts:values(), and cts:collections() resolve purely from index data. No documents are opened at all. This makes them extremely fast, but the results reflect what index resolution can establish, which may be broader than what the documents actually satisfy.
cts:uris() — Unfiltered by Default
Here we retrieve the URIs of all llamaverse documents that contain the word "Bradley" — a llama resident of the sanctuary known for his love of poetry and chess:
xquery version "1.0-ml";
(: cts:uris() is UNFILTERED by default. :)
(: It resolves URIs entirely from the URI lexicon and word indexes — :)
(: no documents are opened. Very fast, but accuracy depends entirely on :)
(: the current state of the indexes. :)
cts:uris((), (), cts:and-query((
cts:collection-query("llamaverse"),
cts:word-query("Bradley")
)))
/cleverllamas/llamaverse/raw/wild-llamas/llamas/223a3aaa-3d69-4323-bdd9-a84e00f61631.json
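Forcing Filtered Behaviour for URI Lookups
cts:uris() resolves entirely from the indexes. Its fifth argument is a list of forest IDs, not an options string, so "filtered" cannot be passed there. When accuracy matters more than speed, run the same query through a filtered cts:search() and return the URI of each verified document. MarkLogic opens and checks each candidate before its URI is included in the results:
xquery version "1.0-ml";
(: FILTERED alternative to cts:uris(): each candidate document is opened :)
(: and checked, so this is slower than the index-only default. :)
for $doc in cts:search(
  fn:collection(),
  cts:and-query((
    cts:collection-query("llamaverse"),
    cts:word-query("Bradley")
  )),
  "filtered"
)
return xdmp:node-uri($doc)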
/cleverllamas/llamaverse/raw/wild-llamas/llamas/223a3aaa-3d69-4323-bdd9-a84e00f61631.json
The difference becomes significant in large databases. Unfiltered results may include URIs for documents whose index entries match the query even though their content does not fully satisfy it; this commonly occurs with stemmed word forms or queries near fragment boundaries.
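To see that difference concretely, the sketch below compares the URIs reported by cts:uris() with the URIs that a filtered cts:search() confirms, reusing the near-query from the earlier examples. Any URI in the first list but not the second is a false positive from index resolution. It assumes the llamaverse collection is loaded; the first five false positives are returned alongside the count.

```xquery
xquery version "1.0-ml";

(: Sketch: find URIs that index resolution reports but filtering rejects. :)
let $query := cts:and-query((
  cts:collection-query("llamaverse"),
  cts:near-query((cts:word-query("llama"), cts:word-query("cooking")), 1)
))
(: Index-only URI list. :)
let $index-uris := cts:uris((), (), $query)
(: URIs of documents that survive the filter pass. :)
let $verified-uris :=
  for $doc in cts:search(fn:collection(), $query, "filtered")
  return xdmp:node-uri($doc)
(: Present in the index list but not confirmed by filtering. :)
let $false-positives := $index-uris[fn:not(. = $verified-uris)]
return (
  fn:count($false-positives),
  $false-positives[1 to 5]
)
```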
Path Expressions
Filtered and Unfiltered Produce Different Results Below the Fragment Root
When a path expression such as fn:doc(...)/parent/child is used as the root context in cts:search(), you are searching below the fragment root. In this situation, "filtered" and "unfiltered" do not behave the same way — and the difference is not a minor performance trade-off. Unfiltered actively returns the wrong answer.
Consider Bradley's health record — an XML document with two <checkup> nodes. Checkup 1 has vet=Dr. Aguilar, outcome=healthy. Checkup 2 has vet=Dr. Aguilar, outcome=treatment-required. We query for the checkup where outcome="treatment-required" — that is checkup 2. With "unfiltered", MarkLogic returns checkup 1: a false positive. It returns the first node in the fragment without verifying whether it genuinely satisfies the query. With "filtered", MarkLogic opens the document, validates each node, and correctly returns checkup 2.
<healthRecord llamaId="223a3aaa-3d69-4323-bdd9-a84e00f61631">
<checkup sequence="1">
<vet>Dr. Aguilar</vet>
<outcome>healthy</outcome>
<notes>Routine annual check. Weight within normal range. Fleece in excellent condition.</notes>
</checkup>
<checkup sequence="2">
<vet>Dr. Aguilar</vet>
<outcome>treatment-required</outcome>
<notes>Follow-up visit. Mild respiratory symptoms noted. Course of antibiotics prescribed.</notes>
</checkup>
</healthRecord>
xquery version "1.0-ml";
(: UNFILTERED search below the fragment root. :)
(: Bradley's health record contains two <checkup> nodes: :)
(: checkup 1: vet=Dr. Aguilar, outcome=healthy :)
(: checkup 2: vet=Dr. Aguilar, outcome=treatment-required :)
(: :)
(: We query for the checkup where outcome="treatment-required" — that :)
(: is checkup 2. With "unfiltered", MarkLogic returns the FIRST node in :)
(: the fragment regardless of whether it genuinely satisfies the query. :)
let $uri := "/cleverllamas/llamaverse/content/health-records/223a3aaa-3d69-4323-bdd9-a84e00f61631.xml"
let $query := cts:and-query((
cts:element-value-query(xs:QName("vet"), "Dr. Aguilar"),
cts:element-value-query(xs:QName("outcome"), "treatment-required")
))
return
cts:search(
fn:doc($uri)/healthRecord/checkup,
$query,
"unfiltered"
)
<checkup sequence="1">
<vet>Dr. Aguilar</vet>
<outcome>healthy</outcome>
<notes>Routine annual check. Weight within normal range. Fleece in excellent condition.</notes>
</checkup>
(: FALSE POSITIVE — checkup 1 is returned, but outcome="healthy", :)
(: not "treatment-required". MarkLogic returned the first node in the :)
(: fragment without checking whether it genuinely satisfies the query. :)
xquery version "1.0-ml";
(: The same query, using "filtered" (the default). :)
(: MarkLogic opens the document and validates each <checkup> node against :)
(: the query. Only the node that genuinely satisfies both conditions :)
(: is returned. :)
let $uri := "/cleverllamas/llamaverse/content/health-records/223a3aaa-3d69-4323-bdd9-a84e00f61631.xml"
let $query := cts:and-query((
cts:element-value-query(xs:QName("vet"), "Dr. Aguilar"),
cts:element-value-query(xs:QName("outcome"), "treatment-required")
))
return
cts:search(
fn:doc($uri)/healthRecord/checkup,
$query,
"filtered"
)
<checkup sequence="2">
<vet>Dr. Aguilar</vet>
<outcome>treatment-required</outcome>
<notes>Follow-up visit. Mild respiratory symptoms noted. Course of antibiotics prescribed.</notes>
</checkup>
(: CORRECT — checkup 2 is the only node that genuinely satisfies :)
(: both conditions. :)
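The node-level check that filtering performs can be made explicit by evaluating the query against each <checkup> node with cts:contains(). A minimal sketch using Bradley's health record: given the document shown above, it should report false for checkup 1 and true for checkup 2, which is why the filtered search returns only checkup 2.

```xquery
xquery version "1.0-ml";

(: Sketch: test each <checkup> node against the query individually. :)
let $uri := "/cleverllamas/llamaverse/content/health-records/223a3aaa-3d69-4323-bdd9-a84e00f61631.xml"
let $query := cts:and-query((
  cts:element-value-query(xs:QName("vet"), "Dr. Aguilar"),
  cts:element-value-query(xs:QName("outcome"), "treatment-required")
))
for $checkup in fn:doc($uri)/healthRecord/checkup
(: cts:contains() evaluates the query against this node only. :)
return fn:concat("checkup ", $checkup/@sequence, ": ", cts:contains($checkup, $query))
```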
Only the First Match Per Document is Returned
There is a related behaviour specific to unfiltered path expression searches: MarkLogic returns only the first matching node per fragment, even when multiple nodes in the same document match the query.
To illustrate this, consider the llamaverse movement history document. It contains 3,360 GPS entries across all llama residents — Aaron alone has 168 entries, each identified by his llamaId. Searching for Aaron's ID using an unfiltered path expression into the array returns only the first matching entry:
{
"envelope": {
"headers": {
"type": "llama-movement"
},
"instance": {
"llama-movement": [
{
"llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
"timestamp": "2025-05-18T12:00:00Z",
"coordinates": [ -9.9219172, 53.5136376 ]
},
{
"llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
"timestamp": "2025-05-18T13:00:00Z",
"coordinates": [ -9.9229262, 53.5143298 ]
},
{
"llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
"timestamp": "2025-05-18T14:00:00Z",
"coordinates": [ -9.9221079, 53.5136388 ]
}
]
}
}
}
xquery version "1.0-ml";
(: UNFILTERED cts:search() with a path expression root. :)
(: Even though Aaron has 168 movement entries in this document, :)
(: only the FIRST matching node is returned. This is unfiltered :)
(: behaviour below the fragment root — MarkLogic stops after the first :)
(: matching node per fragment. :)
let $uri := "/cleverllamas/llamaverse/content/llama-movement/llama_location_history.json"
let $query := cts:json-property-value-query(
"llamaId", "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"
)
let $results :=
cts:search(
fn:doc($uri)/envelope/instance/array-node("llama-movement")/object-node(),
$query,
"unfiltered"
)
return (
fn:count($results), (: Returns 1, not 168 :)
$results (: Returns only the first matching movement entry :)
)
1
{"llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5", "timestamp": "2025-05-18T12:00:00Z", "coordinates": [-9.9219172, 53.5136376]}
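For comparison, the same path-expression search can be run with "filtered". A minimal sketch: because the filter evaluates each object-node() in the array rather than stopping at the first candidate, we would expect it to return all of Aaron's matching entries (168, per the count quoted above) instead of one.

```xquery
xquery version "1.0-ml";

(: The same search as above, but FILTERED: each array entry is validated, :)
(: so the result is not cut off after the first match in the fragment. :)
let $uri := "/cleverllamas/llamaverse/content/llama-movement/llama_location_history.json"
let $query := cts:json-property-value-query(
  "llamaId", "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"
)
let $results :=
  cts:search(
    fn:doc($uri)/envelope/instance/array-node("llama-movement")/object-node(),
    $query,
    "filtered"
  )
return fn:count($results)
```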
When you need node-level granularity within a document, a more reliable pattern is to read the document with fn:doc() and test each node with cts:contains(). An unfiltered path expression as the cts:search() root is acceptable for document discovery, but for collecting every matching node within a document, or for any use case where accuracy matters, it is not the right tool.
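A minimal sketch of the fn:doc() plus cts:contains() pattern against the movement history document. Each array entry is tested on its own, so every matching entry is kept and cts:search() is not involved at all. A plain XPath predicate such as [llamaId eq "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"] would also work for this simple case; cts:contains() is shown because it accepts any cts:query, so the same query object can serve both document discovery and node-level checks.

```xquery
xquery version "1.0-ml";

(: Node-level matching without cts:search(): read the document directly :)
(: and keep each array entry that satisfies the query. :)
let $uri := "/cleverllamas/llamaverse/content/llama-movement/llama_location_history.json"
let $query := cts:json-property-value-query(
  "llamaId", "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"
)
let $entries := fn:doc($uri)/envelope/instance/array-node("llama-movement")/object-node()
let $matches := $entries[cts:contains(., $query)]
return (
  fn:count($matches),  (: every matching entry, not just the first :)
  $matches[1 to 3]     (: a small sample of the matching entries :)
)
```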
Configuration Settings That Affect Accuracy
Beyond the filtered and unfiltered options on individual queries, accuracy is also shaped by how the database is configured. The sections below summarise the configuration settings and query options where filtered versus unfiltered execution can materially affect result accuracy. The central pattern is consistent: filtered search validates candidate fragments; unfiltered search relies on index resolution, and the reliability of that resolution depends on the indexes that are available and how they are configured.
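One practical way to apply this is to measure, for a given query shape, how many index-resolved candidates survive filtering on your own database. The sketch below defines a hypothetical helper, local:compare(), and runs it on a near query, a phrase taken from Bradley's health record, and a case-sensitive word query. The numbers depend entirely on the database's index configuration, which is the point: a non-zero false-positive count indicates that unfiltered execution is not accurate for that query shape on that database.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: report unfiltered vs filtered counts for a query. :)
(: Filtered results are a subset of the unfiltered candidates, so the :)
(: difference is the number of false positives. :)
declare function local:compare(
  $label as xs:string,
  $query as cts:query
) as xs:string
{
  let $unfiltered := fn:count(cts:search(fn:collection(), $query, "unfiltered"))
  let $filtered := fn:count(cts:search(fn:collection(), $query, "filtered"))
  return fn:concat($label, ": unfiltered=", $unfiltered,
                   ", filtered=", $filtered,
                   ", false positives=", $unfiltered - $filtered)
};

(
  local:compare("near",
    cts:near-query((cts:word-query("llama"), cts:word-query("cooking")), 1)),
  local:compare("phrase", cts:word-query("routine annual check")),
  local:compare("case-sensitive", cts:word-query("bradley", "case-sensitive"))
)
```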
Search Execution Options
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| filtered / unfiltered | All cts:search, cts.search, Search API, REST/client APIs where options are passed through | Validates candidate fragments against the full query semantics. | Returns index-selected candidate fragments without validation. MarkLogic states that unfiltered search may return false positives, missed matches, or incorrect matches depending on the query, data structure, and database configuration. |
| checked / unchecked | Phrase queries, proximity queries, positional matching | With checked, word positions are considered during index resolution. Filtering can still validate final results. | With unchecked, word positions are not considered during index resolution, increasing false-positive risk. MarkLogic states that unchecked searches can lead to false positives for phrases. |
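A quick way to observe the checked/unchecked difference is to run the same phrase unfiltered with each option. A sketch, assuming the phrase from Bradley's health record as the test phrase: because "unchecked" ignores word positions during index resolution, its count should never be lower than the "checked" count, and any gap is the extra false positives it admits.

```xquery
xquery version "1.0-ml";

(: Compare unfiltered phrase resolution with and without position checks. :)
let $phrase := cts:word-query("routine annual check")
return (
  (: "checked" (the default): positions are used when available. :)
  fn:count(cts:search(fn:collection(), $phrase, ("unfiltered", "checked"))),
  (: "unchecked": positions are ignored during index resolution. :)
  fn:count(cts:search(fn:collection(), $phrase, ("unfiltered", "unchecked")))
)
```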
Fragment Structure
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| Fragment roots and searchable path expressions | Searches below document or fragment root; repeated elements; node-level result extraction | Filtering can identify the correct matching node or multiple matching nodes within a fragment. | Unfiltered search can return a candidate fragment even where the requested sub-node is not the real match; it can also miss matches where multiple candidate nodes exist in the same fragment. MarkLogic states that unfiltered searches should generally be performed on top-level nodes or fragment roots, because below-fragment searches can return unexpected answers and may stop after the first matching node per fragment. |
Position Indexes
Position indexes allow MarkLogic to validate word order and proximity during index resolution. Without them, unfiltered results for near and phrase queries are more likely to contain false positives.
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| word positions | cts:near-query, multi-word phrase search, ordered/unordered proximity | Filtering can examine the actual fragment text and validate proximity and phrase semantics. | Without suitable position indexes, index resolution may be less precise; unfiltered results can contain false positives for phrases or proximity. |
| element word positions | cts:element-word-query, element phrase searches, element-scoped cts:near-query | Filtering can validate whether the element actually satisfies the phrase or proximity condition. | Element-scoped phrase and proximity queries are more exposed to false positives when unfiltered and position support is absent or not usable. |
| element value positions | cts:element-value-query used in proximity contexts | Filtering can validate the actual element value context. | Unfiltered proximity involving element values depends more heavily on index-position availability. |
| attribute value positions | cts:element-attribute-value-query; cts:element-query with attribute-query patterns | Filtering can validate the attribute context and surrounding query constraints. | Unfiltered resolution can be less precise where attribute-position information is unavailable. |
| field value positions | cts:field-value-query, field value proximity | Filtering can validate actual field-value context. | Unfiltered field-value proximity depends on whether field value positions are available and appropriate. |
| triple positions | cts:triple-range-query, cts:near-query involving triples, cts:element-query involving triples, cts:triples with item-frequency | Filtering can validate the complete document or fragment context. | Without triple positions, triple proximity and position-sensitive triple use cases may be less accurately resolved from indexes. |
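Position indexes are enabled in the database configuration. A hypothetical sketch using the Admin API, where "llamaverse-content" is a placeholder database name: it requires admin privileges, and because it changes index settings, existing content is only covered once reindexing completes (see Reindexing and Operational State below). admin:database-set-element-word-positions() is the element-scoped counterpart.

```xquery
xquery version "1.0-ml";

(: Hypothetical admin sketch: enable word positions on a content database. :)
(: "llamaverse-content" is a placeholder; requires admin privileges. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db := xdmp:database("llamaverse-content")
(: Update the in-memory configuration, then persist it. :)
let $config := admin:database-set-word-positions($config, $db, fn:true())
return admin:save-configuration($config)
```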
Phrase Indexes and Boundary Settings
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| fast phrase searches | Phrase queries, especially multi-word phrases | Filtering can remove phrase false positives. | Unfiltered phrase searches, especially those with three or more words, can produce false positives when index resolution is not sufficiently precise. |
| fast element phrase searches | Element-scoped phrase queries | Filtering can validate that the phrase occurs in the intended element context. | Unfiltered element phrase searches are more dependent on the element phrase index for precision. |
| phrase-through | Phrase queries crossing markup boundaries | Filtering can validate phrase semantics according to configured phrase rules. | Unfiltered search relies on whether the correct phrase-through information was indexed at load or reindex time. |
| element-word-query-through | cts:element-word-query over parent elements | Filtering can validate element-word-query semantics according to configured rules. | Unfiltered search depends on whether the element-word-query-through configuration is reflected in the indexes. |
Sensitivity Settings
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| fast case sensitive searches | Case-sensitive word and value queries | Filtering can remove candidates that match only in a different case. | Without the relevant index support, unfiltered case-sensitive searches can return case-insensitive false positives. MarkLogic gives an example where the index may nominate fragments containing dog even when the query requires Dog; filtering then removes the false-positive fragment. |
| fast diacritic sensitive searches | Diacritic-sensitive word and value queries | Filtering can validate the actual diacritic-sensitive match. | Unfiltered diacritic-sensitive searches can contain false positives where the index resolution is broader than the requested sensitivity. |
| Punctuation-sensitive query behaviour | Punctuation-sensitive phrase and word queries | Filtering can validate actual punctuation-sensitive matches. | Unfiltered searches can return false positives when the index candidate matches the words but not the punctuation semantics. MarkLogic's example shows one! two three returning false positives against one two three under unfiltered search. |
| Word query includes / excludes | cts:word-query, cts:words, cts:word-match, positional contexts | Filtering can validate the final match, but query semantics may already be constrained by the word-query configuration. | Word-query exclusions can prevent MarkLogic from using positions — even where positions are enabled — leading to false positives in positional contexts such as near queries. |
Wildcard Searches
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| three character searches | Wildcard queries such as abc*x, *abc, a?bcd | Filtering can validate the final wildcard match. | Unfiltered wildcard searches can return candidates that satisfy index-token conditions but not the intended wildcard expression. |
| two character searches | Short wildcard patterns | Filtering can validate final wildcard semantics. | Unfiltered search depends on character index granularity; insufficient wildcard index support can affect availability, performance, or precision. |
| one character searches | Very short wildcard patterns | Filtering can validate final wildcard semantics. | Most exposed where broad wildcard expansion nominates many candidate fragments. |
| trailing wildcard searches | Patterns such as abc* | Filtering can validate actual wildcard expansion. | Result quality depends on the trailing wildcard index and related word/token configuration. |
| fast element trailing wildcard searches | Element-scoped trailing wildcard queries | Filtering can validate element scope and wildcard semantics. | Unfiltered element-scoped wildcard searches are more exposed to candidate-level false positives without suitable element wildcard indexes. |
| three character word positions | Wildcard terms inside cts:near-query or multi-word phrase queries | Filtering can validate the actual phrase or proximity match. | Unfiltered wildcard-plus-proximity queries can be false-positive prone if wildcard position indexes are not available. |
| trailing wildcard word positions | Trailing wildcard terms inside proximity or phrase queries | Filtering can validate actual distance and order. | Unfiltered trailing-wildcard proximity can be less precise without trailing wildcard word positions. |
| word lexicon (especially codepoint collation) | Wildcard search expansion and lexicon-assisted wildcard resolution | Filtering can validate final matches. | Unfiltered wildcard accuracy and performance are more dependent on lexicon-backed expansion and the available wildcard indexes. |
| lexicon-expansion-limit, limit-check, no-limit-check | Wildcarded word and value queries | Filtering can compensate for broad or approximate wildcard expansion. | In unfiltered search, wildcard expansion trade-offs can directly affect result accuracy. |
Tokenization
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| Field tokenizer overrides | Field word queries, wildcard-enabled field queries, normalised identifiers such as phone numbers | Filtering can validate actual textual content after tokenization rules are applied. | Incorrect tokenizer configuration can cause broad candidate matches. MarkLogic provides an example where a three-character wildcard search on phone numbers returns unfiltered false positives, then shows field tokenizer overrides eliminating those false positives. |
Geospatial Searches
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| Geospatial point indexes | cts:element-geospatial-query, cts:json-property-geospatial-query, point-in-region queries | Filtering can validate candidate matches against the geospatial predicate. | Unfiltered geospatial search relies more directly on geospatial index candidates; index scope, point format, coordinate system, and precision become more significant. |
| Geospatial region path index | cts:geospatial-region-query | Filtering can validate candidate fragments, but region query execution remains tied to the configured geospatial region index. | Region search is index-driven; incorrect precision, coordinate system, or reference configuration can materially affect the candidate set. |
| Coordinate system, precision, and tolerance | Geospatial point and region comparisons | Filtering can validate using configured geospatial semantics, but does not change the configured precision model. | Unfiltered search exposes index precision and tolerance decisions more directly. |
| Geospatial index scope and point format | KML, GeoJSON, element-child geospatial queries, path geospatial queries | Filtering can remove some false candidates, but an overly broad geospatial location path can still create business-level ambiguity. | Unfiltered search can surface false positives where the indexed coordinate location is too broad or does not distinguish point data from other region data. MarkLogic's geospatial example states that limiting scope to coordinates in a Point element prevents false positives from documents containing other kinds of regions. |
| Reverse geospatial queries (e.g. polygon / region containment) | cts:element-geospatial-query, cts:json-property-geospatial-query, and related region-containment queries where the point is tested against a region boundary | Filtering validates each candidate against the actual geospatial predicate, correctly excluding documents whose coordinates fall outside the region. | Unfiltered reverse geospatial queries — such as polygon containment tests — can return the entire dataset as false positives. This is not documented in MarkLogic's public documentation. Always use filtered search for reverse geospatial / polygon containment queries. A higher-performance alternative is a hybrid approach: split the query so that everything except the geospatial predicate runs as a standard unfiltered query to narrow the candidate set, store and index the polygons as geospatial region indexes, then combine the two with cts:and-query. This avoids scanning the full dataset while still validating coordinate containment accurately. Contact us if you need assistance designing this pattern for your use case. |
Negation Queries
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| Index accuracy for subqueries inside cts:not-query | cts:not-query | Even filtered searches can miss results if the negated query is not accurate from index resolution. | Unfiltered false positives inside the negated query can become false negatives in the overall result. MarkLogic states that cts:not-query is only guaranteed accurate if the negated query is accurate from index resolution. |
| Index accuracy for the negative side of cts:and-not-query | cts:and-not-query | Filtering does not fully protect against inaccurate index resolution of the negative query. | False positives in the negative query can exclude valid results. MarkLogic states that cts:and-not-query can miss results even with filtered searches if the negative query has false positives. |
| Position indexes for cts:not-in-query | cts:not-in-query | Filtered searches always have access to positions for validation. | Without positions enabled, unfiltered searches can produce surprising results and false positives. MarkLogic states that positions are required to accurately resolve cts:not-in-query from indexes. |
Reindexing and Operational State
| Setting / Option | Features Affected | Filtered Behaviour | Unfiltered Behaviour / Risk |
|---|---|---|---|
| reindexer enable, reindex completion, lowest-common-denominator index state | All index-dependent queries after index configuration changes | Filtering can validate candidates, but cannot use index structures that are not yet consistently available. | Unfiltered search is especially exposed when old and new fragments are indexed differently, or when new index settings are not yet usable. MarkLogic states that after index or fragmentation changes, queries cannot take advantage of new settings until index settings meet the lowest-common-denominator criteria. |
Quick Reference
The tables below summarise the filtered/unfiltered default behaviour across MarkLogic's retrieval and query-constrained lexicon interfaces. Functions where filtered/unfiltered is not meaningful — such as direct URI reads via fn:doc, cts:doc, REST /v1/documents, and SPARQL endpoints — are excluded.
Core CTS Search
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:search | filtered | If neither filtered nor unfiltered is specified, the default is filtered. Source: cts:search. |
| cts.search | filtered | Same default as cts:search; the JavaScript interface exposes the same semantics. Source: cts.search. |
| XPath search in XQuery | filtered | XPath resolution uses index resolution followed by filtering, like cts:search. Source: Understanding Unfiltered Searches. |
| cts:contains | filtered | Checks whether supplied nodes or values match a query, and can act as the filtering step when paired with unfiltered index resolution. Source: cts:contains. |
| cts.contains | filtered | Same operational role as cts:contains. Source: cts.contains. |
| cts:registered-query | filtered* | Documented default is filtered, but filtered registered queries are not currently available; unfiltered is required in practice. Source: cts:registered-query. |
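Because filtered registered queries are not currently available, the "unfiltered" option is passed to cts:registered-query() explicitly. A minimal sketch of registering a query and reusing it by ID; the word query is purely illustrative. Registrations can be dropped from the cache, in which case using the stale ID raises an error and the query has to be registered again.

```xquery
xquery version "1.0-ml";

(: Register an illustrative query; cts:register() returns its numeric ID. :)
let $id := cts:register(cts:word-query("Bradley"))
(: Reuse the registered query by ID. The "unfiltered" option on the :)
(: registered query is required. :)
return cts:search(
  fn:collection("llamaverse"),
  cts:registered-query($id, "unfiltered")
)
```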
Search API
| Feature / Interface | Default | Notes |
|---|---|---|
| search:search | filtered | Underlying CTS default is filtered; however the total estimate in the response is based on index resolution and is not filtered for accuracy. Source: Search API Options. |
| search.search | filtered | Same as search:search. Source: search.search. |
| search:resolve | filtered | Absent a search-option override, the CTS default applies. Source: Query Options. |
| search.resolve | filtered | Same as search:resolve. Source: search.resolve. |
| search:estimate | unfiltered | Result reflects index resolution of the search. Source: search:estimate. |
| search.estimate | unfiltered | Same as search:estimate. Source: search.estimate. |
| search:suggest | unfiltered | Suggestions are based on lexicon-backed constraints resolved through index resolution. Source: REST Suggest. |
| search.suggest | unfiltered | Same as search:suggest. Source: search.suggest. |
| return-facets | unfiltered | Facet values and counts are based on index resolution of the cts:query. Source: Search API. |
REST and Client APIs
REST Client API
| Feature / Interface | Default | Notes |
|---|---|---|
| GET /v1/search | unfiltered | Default search behaviour for the REST Client API is unfiltered unless overridden with query options. Source: Release Notes. |
| POST /v1/search | unfiltered | Same default as GET /v1/search. Source: Release Notes. |
| GET /v1/qbe | unfiltered | The boolean filtered flag controls behaviour; unfiltered is the default. Source: Query By Example. |
| POST /v1/qbe | unfiltered | Same QBE default as GET /v1/qbe. Source: Query By Example. |
| GET /v1/values/{name} | unfiltered | Retrieves lexicon/range-index values or co-occurrences. Source: GET /v1/values/{name}. |
| POST /v1/values/{name} | unfiltered | Same values/co-occurrence semantics as GET /v1/values/{name}. Source: POST /v1/values/{name}. |
Java Client API
| Feature / Interface | Default | Notes |
|---|---|---|
| QueryManager.search | unfiltered | Predefined persistent query options are selected for performance; searches run unfiltered unless changed. Source: Java Query Options. |
| QueryManager.values | unfiltered | Values queries are lexicon/range-index operations resolved through index resolution. Source: Java Searches. |
Node.js Client API
| Feature / Interface | Default | Notes |
|---|---|---|
| DatabaseClient.documents.query | unfiltered | Node.js Client API searches default to unfiltered search. Source: Node.js Search. |
| queryBuilder.byExample | unfiltered | QBE uses unfiltered search by default for performance; $filtered: true forces filtered. Source: Node.js Search. |
| DatabaseClient.values.read | unfiltered | Lexicon and range-index queries are resolved through index resolution, as an unfiltered search would be. Source: Node.js Search. |
Server-Side JavaScript (JSearch)
| Feature / Interface | Default | Notes |
|---|---|---|
| jsearch.documents | unfiltered | JSearch document search is unfiltered by default. Calling .filter() switches to filtered. Source: DocumentsSearch.filter. |
| DocumentsSearch.filter | filtered | Explicitly inspects matched and ordered documents. Source: DocumentsSearch.filter. |
| jsearch.values | unfiltered | Values operate over lexicons/range indexes via index resolution. Source: ValuesSearch. |
| ValuesSearch.where | unfiltered | Query constraints scope values through index resolution rather than full document filtering. Source: ValuesSearch.where. |
| jsearch.facets | unfiltered | Facets are generated from lexicon or range-index values. Source: FacetsSearch. |
| FacetsSearch.where | unfiltered | Qualifies facet values using cts.query values; resolved via index resolution. Source: FacetsSearch. |
CTS Lexicons
Value Lexicons
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:values | unfiltered | Fragments are selected in the same manner as unfiltered cts:search. Source: cts:values. |
| cts.values | unfiltered | Same as cts:values. Source: cts.values. |
| cts:element-values | unfiltered | Same lexicon-constrained behaviour as cts:values. Source: cts.elementValues. |
| cts.elementValues | unfiltered | Same as cts:element-values. Source: cts.elementValues. |
| cts:element-attribute-values | unfiltered | Same lexicon-constrained behaviour as cts:values. Source: cts.elementAttributeValues. |
| cts.elementAttributeValues | unfiltered | Same as cts:element-attribute-values. Source: cts.elementAttributeValues. |
| cts:field-values | unfiltered | Field value lexicons follow the same query-constrained lexicon behaviour. Source: cts:field-values. |
| cts.fieldValues | unfiltered | Same as cts:field-values. Source: cts.fieldValues. |
Word Lexicons
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:word-match | unfiltered | Query-constrained word-match fragments are selected in the same manner as unfiltered cts:search. Source: cts.wordMatch. |
| cts.wordMatch | unfiltered | Same as cts:word-match. Source: cts.wordMatch. |
| cts:element-word-match | unfiltered | Same word-lexicon behaviour. Source: cts:element-word-match. |
| cts.elementWordMatch | unfiltered | Same as cts:element-word-match. Source: cts.elementWordMatch. |
URI and Collection Lexicons
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:uris | unfiltered | Fragments are selected as in unfiltered cts:search; no documents are opened. Source: cts:uris. |
| cts.uris | unfiltered | Same as cts:uris. Source: cts.uris. |
| cts:collections | unfiltered | Query-constrained collection lexicon results are selected in the same manner as unfiltered cts:search. Source: cts.collections. |
| cts.collections | unfiltered | Same as cts:collections. Source: cts.collections. |
Tuples, Co-occurrences and Ranges
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:value-tuples | unfiltered | Fragments are selected in the same manner as unfiltered cts:search. Source: cts:value-tuples. |
| cts.valueTuples | unfiltered | Same as cts:value-tuples. Source: cts.valueTuples. |
| cts:value-co-occurrences | unfiltered | Same query-constrained lexicon behaviour as cts:value-tuples. Source: cts:value-co-occurrences. |
| cts.valueCoOccurrences | unfiltered | Same as cts:value-co-occurrences. Source: cts.valueCoOccurrences. |
| cts:value-ranges | unfiltered | Fragments are not filtered but selected in the same manner as unfiltered cts:search. Source: cts:value-ranges. |
| cts.valueRanges | unfiltered | Same as cts:value-ranges. Source: cts.valueRanges. |
Triples
| Feature / Interface | Default | Notes |
|---|---|---|
| cts:triples | unfiltered | Fragments are selected in the same manner as unfiltered cts:search. Source: cts.triples. |
| cts.triples | unfiltered | Same as cts:triples. Source: cts.triples. |
Optic
| Feature / Interface | Default | Notes |
|---|---|---|
| op:from-search | unfiltered | Constructs a row set from a cts:query; fragments are selected in the same manner as unfiltered cts:search. Source: op:from-search. |
| op.fromSearch | unfiltered | Same as op:from-search. Source: op.fromSearch. |
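These defaults are easiest to act on when they feed into review tooling rather than living only in a table. The sketch below is a hypothetical plain Node.js helper, not a MarkLogic API: `DEFAULT_MODE`, `defaultMode` and `reliesOnUnfilteredDefault` are illustrative names. It includes only the search-style rows from the tables above (lexicon functions are left out because the tables describe them as index-resolved by design), and it assumes your own tooling can already tell whether a call site states its mode explicitly.

```javascript
'use strict';

// Hypothetical lookup built from the search-style rows of the tables above.
// Not a MarkLogic API; intended for a lint or code-review script.
const DEFAULT_MODE = Object.freeze({
  'cts.search': 'filtered',
  'search.search': 'filtered',
  'search.resolve': 'filtered',
  'GET /v1/search': 'unfiltered',
  'POST /v1/search': 'unfiltered',
  'GET /v1/qbe': 'unfiltered',
  'QueryManager.search': 'unfiltered',
  'DatabaseClient.documents.query': 'unfiltered',
  'queryBuilder.byExample': 'unfiltered',
  'jsearch.documents': 'unfiltered',
});

// Returns the documented default, or undefined for interfaces not in the lookup.
function defaultMode(iface) {
  return Object.prototype.hasOwnProperty.call(DEFAULT_MODE, iface)
    ? DEFAULT_MODE[iface]
    : undefined;
}

// True when a call site states no mode and the interface defaults to unfiltered.
// explicitMode is 'filtered' or 'unfiltered' if the caller set one
// (for example via options, $filtered, or .filter()), otherwise undefined.
function reliesOnUnfilteredDefault(iface, explicitMode) {
  return explicitMode === undefined && defaultMode(iface) === 'unfiltered';
}

console.log(reliesOnUnfilteredDefault('GET /v1/search'));                // true
console.log(reliesOnUnfilteredDefault('cts.search'));                    // false
console.log(reliesOnUnfilteredDefault('jsearch.documents', 'filtered')); // false
```

The point of the check is not to forbid unfiltered execution but to make an inherited unfiltered default visible, so that trusting index resolution becomes a stated decision at each call site.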
Need Some Help?
Looking for more information on this subject or any other topic related to MarkLogic? Contact Us (info@cleverllamas.com) to find out how we can assist you with consulting or training!