Filtered vs Unfiltered Searches

Extended Llama Lesson

This is not a typical article. It is an Extended Llama Lesson — a deep dive that takes you further into a topic than you would normally be able to go on your own. The insights here come from years of real-world experience working with MarkLogic at scale: the kind of knowledge that only surfaces when you have hit the edges, debugged the unexpected, and worked through the harder corners of a platform. We hope it saves you some of that journey.

If any of this raises questions about your own implementation — or if you need expert help with search performance, database configuration, or anything else MarkLogic — reach out to us. This is exactly the kind of work we specialise in.

Understanding when MarkLogic uses filtered versus unfiltered searches is essential for both accuracy and performance. The default behaviour varies by function, which surprises many developers. A filtered search uses indexes to find candidates, then opens each document to confirm the match. An unfiltered search relies solely on index resolution — faster, but able to return false positives.

cts:search() defaults to filtered; cts:uris() and other lexicon-based functions default to unfiltered. cts:search() returns full documents where accuracy matters; lexicon functions are optimised for speed. Path expressions in XQuery (like fn:doc(...)/parent/child) are a special case — when used as the root context in cts:search(), "filtered" and "unfiltered" produce materially different results. Unfiltered returns the first node in the fragment regardless of whether it genuinely matches; filtered validates each node and returns only genuine matches. The MarkLogic docs warn that unfiltered searches should only be performed at top-level nodes or fragment roots, and that path expressions below the fragment root can produce unexpected answers and stop after the first matching node per fragment.

Unfiltered is not just a performance option; it is a decision to trust index resolution. That trust is reasonable for many workloads, but becomes a material risk with proximity or near queries, phrase queries (especially three or more words), wildcard queries, case- or diacritic-sensitive searches, geospatial precision requirements, reverse geospatial queries such as polygon containment (which can return the entire dataset as false positives when unfiltered), negation, repeated substructures within fragments, searches below fragment root, or index transition states during a reindex. For business-critical retrieval, audit, compliance, or migration workflows, treat filtered execution as the safer default unless the query shape, database configuration, and index coverage are known to support accurate unfiltered resolution.

One instructive edge case is cts:column-range-query(), a query constrained by a TDE column. cts:uris((), (), cts:column-range-query("schema", "view", "column", "value", "=")) returns the expected URIs. cts:search(fn:collection(), cts:column-range-query("schema", "view", "column", "value", "=")) with the same query may return zero results. Query tracing shows that cts:search() did correctly resolve the matching fragments, then discarded all of them in the filter pass. The likely cause is that the SPARQL resolution powering cts:column-range-query() interacts with filtering in a way that eliminates every candidate. The fix is to pass "unfiltered" to cts:search(). This is the kind of behaviour that costs real debugging time if you do not know it exists.
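
The comparison can be reproduced with a short sketch like the one below. The schema, view, column, and value strings are the same placeholders used in the text, not real llamaverse names. The argument order (value first, then the optional operator) is an assumption taken from the documented cts:column-range-query signature, so check it against your MarkLogic version.

```xquery
xquery version "1.0-ml";

(: Placeholder TDE names, as in the text; replace them with a real view. :)
let $query := cts:column-range-query("schema", "view", "column", "value", "=")
return (
  (: Lexicon resolution: no filter pass is involved. :)
  fn:count(cts:uris((), (), $query)),
  (: Default (filtered) search: reported above to discard every candidate. :)
  fn:count(cts:search(fn:collection(), $query)),
  (: Unfiltered search: skips the filter pass, so candidates survive. :)
  fn:count(cts:search(fn:collection(), $query, "unfiltered"))
)
```

If the first and third counts agree and the second is zero, you are looking at the behaviour described here rather than a data problem.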

Filtered Searches

How Filtered Search Works

When a search runs in filtered mode, MarkLogic performs a two-phase operation:

Index resolution — the query is evaluated against the indexes to produce a set of candidate document fragments.
Filtering — each candidate is opened and examined against the query to confirm it genuinely matches.

This guarantees accuracy at the cost of document retrieval. In most transactional workloads this is the right trade-off, but in high-throughput read scenarios the per-document cost can add up.
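
The two phases can be imitated by hand, which makes the cost of each one visible. This is only an illustration, not how MarkLogic implements filtering internally: it runs an unfiltered search for the candidates, then applies cts:contains() to each one. It assumes the llamaverse collection described under Sample Data.

```xquery
xquery version "1.0-ml";

let $query := cts:near-query((
  cts:word-query("llama"),
  cts:word-query("cooking")
), 1)
(: Phase 1: index resolution only; candidates are not verified. :)
let $candidates :=
  cts:search(fn:collection("llamaverse"), $query, "unfiltered")
(: Phase 2: open each candidate and test it against the same query. :)
let $confirmed := $candidates[cts:contains(., $query)]
return (fn:count($candidates), fn:count($confirmed))
```

The gap between the two counts is the number of false positives the filter pass would have removed.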

Sample Data

The examples in the Filtered Searches and Unfiltered Searches sections assume the llamaverse (v2.0+) is deployed. The llamaverse contains JSON documents for each llama resident, stored in the llamaverse collection. Raw llama documents follow the URI pattern /cleverllamas/llamaverse/raw/wild-llamas/llamas/{uuid}.json. We use the cleverllamas-llama user to read the data, as recommended.

The llamaverse sample data is freely available from github.com/cleverllamas/llamaverse — see the llamaverse article for full setup instructions.

The Path Expressions section uses the llamaverse movement history document: a single JSON document at /cleverllamas/llamaverse/content/llama-movement/llama_location_history.json that records 3,360 GPS movement entries for multiple llama residents. An excerpt is shown inline with the example.

The filtered-versus-unfiltered path expression example uses Bradley's health record, an XML document introduced in llamaverse 2.0. The document structure is shown in full alongside that example.

cts:search() — Filtered by Default

cts:search() is filtered unless you explicitly say otherwise. MarkLogic resolves index candidates and then opens each document to confirm it genuinely matches before returning it.

Here we use a cts:near-query() to search for llama profiles where the words "llama" and "cooking" appear within 1 word of each other. Every llama profile contains both words, but they are always separated by many words in the description text — so the filter step removes every candidate:

xquery version "1.0-ml";

(: FILTERED search (the default).                                         :)
(: MarkLogic uses the word index to find candidates, then opens each      :)
(: document to verify the match. Only documents where "llama" and         :)
(: "cooking" genuinely appear within 1 word of each other are returned.   :)
let $query := cts:near-query((
  cts:word-query("llama"),
  cts:word-query("cooking")
), 1)
for $doc in cts:search(fn:collection("llamaverse"), $query)
return fn:string($doc/name)

(: empty sequence :)

(: The filter re-checks each candidate document and confirms that         :)
(: "llama" and "cooking" are never within 1 word of each other.           :)
(: No false positives reach the caller.                                   :)

Explicitly Requesting Unfiltered

Passing "unfiltered" skips the document-opening step entirely. MarkLogic returns whatever the word index resolves to. Using the same near-query, every document that contains both "llama" and "cooking" is returned — regardless of how far apart those words actually are:

xquery version "1.0-ml";

(: UNFILTERED search.                                                     :)
(: MarkLogic returns whatever the word index resolves to, without         :)
(: opening any document to verify proximity.                              :)
(: Every document containing both "llama" and "cooking" is a candidate — :)
(: and every one of those is a false positive.                            :)
let $query := cts:near-query((
  cts:word-query("llama"),
  cts:word-query("cooking")
), 1)
for $doc in cts:search(fn:collection("llamaverse"), $query, "unfiltered")[1 to 5]
return fn:string($doc/name)

"Aaron"
"Angela"
"Anthony"
"Bradley"
"Chloe"
(: ... 1,015 more llama names if the [1 to 5] predicate is removed :)

(: All 1,020 documents containing both "llama" and "cooking" match; the   :)
(: [1 to 5] predicate only trims the output shown here. The words exist   :)
(: in every document but are never adjacent, so every result is a false   :)
(: positive. The filtered version returns zero.                           :)

Unfiltered Searches

How Unfiltered Search Works

Lexicon-based functions such as cts:uris(), cts:values(), and cts:collections() resolve purely from index data. No documents are opened at all. This makes them extremely fast, but the results reflect what index resolution can determine from the configured indexes, which is not always what a check of the document content would confirm.

cts:uris() — Unfiltered by Default

Here we retrieve the URIs of all llamaverse documents that contain the word "Bradley" — a llama resident of the sanctuary known for his love of poetry and chess:

xquery version "1.0-ml";

(: cts:uris() is UNFILTERED by default.                                   :)
(: It resolves URIs entirely from the URI lexicon and word indexes —      :)
(: no documents are opened. Very fast, but accuracy depends entirely on  :)
(: the current state of the indexes.                                      :)
cts:uris((), (), cts:and-query((
  cts:collection-query("llamaverse"),
  cts:word-query("Bradley")
)))

/cleverllamas/llamaverse/raw/wild-llamas/llamas/223a3aaa-3d69-4323-bdd9-a84e00f61631.json

Getting Filtered URIs

cts:uris() itself cannot be filtered. Its fifth parameter is $forest-ids, not an options list, and "filtered" is not among its documented options (which go in the second argument). When accuracy matters more than speed, run a filtered cts:search() and read the URI from each verified result:

xquery version "1.0-ml";

(: FILTERED URIs: cts:search() opens and checks each candidate document, :)
(: then xdmp:node-uri() reads the URI of each confirmed match. Slower    :)
(: than cts:uris(), because every candidate must be opened.              :)
for $doc in cts:search(
  fn:collection("llamaverse"),
  cts:word-query("Bradley"),
  "filtered"
)
return xdmp:node-uri($doc)

/cleverllamas/llamaverse/raw/wild-llamas/llamas/223a3aaa-3d69-4323-bdd9-a84e00f61631.json

The difference becomes significant in large databases. Unfiltered results may include URIs for documents that the indexes nominate as candidates but that do not actually satisfy the query; this commonly occurs with stemmed word forms or queries near fragment boundaries.
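
One way to measure that difference for a given query shape is to diff the index-only URIs against the URIs of documents that survive filtering. The sketch below reuses the near-query from the earlier examples and assumes the llamaverse collection is loaded. It opens every candidate, so treat it as a diagnostic rather than production code.

```xquery
xquery version "1.0-ml";

let $query := cts:and-query((
  cts:collection-query("llamaverse"),
  cts:near-query((cts:word-query("llama"), cts:word-query("cooking")), 1)
))
(: URIs nominated by the indexes alone. :)
let $index-uris := cts:uris((), (), $query)
(: URIs of documents that survive the filter pass. :)
let $verified-uris :=
  for $doc in cts:search(fn:doc(), $query, "filtered")
  return xdmp:node-uri($doc)
(: Anything in the first list but not the second is a false positive. :)
return $index-uris[fn:not(. = $verified-uris)]
```

An empty result suggests that unfiltered resolution is accurate for that query against the current data and index configuration; it says nothing about other query shapes.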

Path Expressions

Filtered and Unfiltered Produce Different Results Below the Fragment Root

When a path expression such as fn:doc(...)/parent/child is used as the root context in cts:search(), you are searching below the fragment root. In this situation, "filtered" and "unfiltered" do not behave the same way — and the difference is not a minor performance trade-off. Unfiltered actively returns the wrong answer.

Consider Bradley's health record — an XML document with two <checkup> nodes. Checkup 1 has vet=Dr. Aguilar, outcome=healthy. Checkup 2 has vet=Dr. Aguilar, outcome=treatment-required. We query for the checkup where outcome="treatment-required" — that is checkup 2. With "unfiltered", MarkLogic returns checkup 1: a false positive. It returns the first node in the fragment without verifying whether it genuinely satisfies the query. With "filtered", MarkLogic opens the document, validates each node, and correctly returns checkup 2.

<healthRecord llamaId="223a3aaa-3d69-4323-bdd9-a84e00f61631">
  <checkup sequence="1">
    <vet>Dr. Aguilar</vet>
    <outcome>healthy</outcome>
    <notes>Routine annual check. Weight within normal range. Fleece in excellent condition.</notes>
  </checkup>
  <checkup sequence="2">
    <vet>Dr. Aguilar</vet>
    <outcome>treatment-required</outcome>
    <notes>Follow-up visit. Mild respiratory symptoms noted. Course of antibiotics prescribed.</notes>
  </checkup>
</healthRecord>

xquery version "1.0-ml";

(: UNFILTERED search below the fragment root.                             :)
(: Bradley's health record contains two <checkup> nodes:                  :)
(:   checkup 1: vet=Dr. Aguilar, outcome=healthy                          :)
(:   checkup 2: vet=Dr. Aguilar, outcome=treatment-required               :)
(:                                                                        :)
(: We query for the checkup where outcome="treatment-required" — that     :)
(: is checkup 2. With "unfiltered", MarkLogic returns the FIRST node in   :)
(: the fragment regardless of whether it genuinely satisfies the query.   :)
let $uri   := "/cleverllamas/llamaverse/content/health-records/223a3aaa-3d69-4323-bdd9-a84e00f61631.xml"
let $query := cts:and-query((
  cts:element-value-query(xs:QName("vet"),     "Dr. Aguilar"),
  cts:element-value-query(xs:QName("outcome"), "treatment-required")
))
return
  cts:search(
    fn:doc($uri)/healthRecord/checkup,
    $query,
    "unfiltered"
  )

<checkup sequence="1">
  <vet>Dr. Aguilar</vet>
  <outcome>healthy</outcome>
  <notes>Routine annual check. Weight within normal range. Fleece in excellent condition.</notes>
</checkup>

(: FALSE POSITIVE — checkup 1 is returned, but outcome="healthy",        :)
(: not "treatment-required". MarkLogic returned the first node in the    :)
(: fragment without checking whether it genuinely satisfies the query.   :)

xquery version "1.0-ml";

(: The same query, using "filtered" (the default).                        :)
(: MarkLogic opens the document and validates each <checkup> node against :)
(: the query. Only the node that genuinely satisfies both conditions      :)
(: is returned.                                                           :)
let $uri   := "/cleverllamas/llamaverse/content/health-records/223a3aaa-3d69-4323-bdd9-a84e00f61631.xml"
let $query := cts:and-query((
  cts:element-value-query(xs:QName("vet"),     "Dr. Aguilar"),
  cts:element-value-query(xs:QName("outcome"), "treatment-required")
))
return
  cts:search(
    fn:doc($uri)/healthRecord/checkup,
    $query,
    "filtered"
  )

<checkup sequence="2">
  <vet>Dr. Aguilar</vet>
  <outcome>treatment-required</outcome>
  <notes>Follow-up visit. Mild respiratory symptoms noted. Course of antibiotics prescribed.</notes>
</checkup>

(: CORRECT — checkup 2 is the only node that genuinely satisfies         :)
(: both conditions.                                                       :)

Only the First Match Per Document is Returned

There is a related behaviour specific to unfiltered path expression searches: MarkLogic returns only the first matching node per fragment, even when multiple nodes in the same document match the query.

To illustrate this, consider the llamaverse movement history document. It contains 3,360 GPS entries across all llama residents — Aaron alone has 168 entries, each identified by his llamaId. Searching for Aaron's ID using an unfiltered path expression into the array returns only the first matching entry:

{
  "envelope": {
    "headers": {
      "type": "llama-movement"
    },
    "instance": {
      "llama-movement": [
        {
          "llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
          "timestamp": "2025-05-18T12:00:00Z",
          "coordinates": [ -9.9219172, 53.5136376 ]
        },
        {
          "llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
          "timestamp": "2025-05-18T13:00:00Z",
          "coordinates": [ -9.9229262, 53.5143298 ]
        },
        {
          "llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5",
          "timestamp": "2025-05-18T14:00:00Z",
          "coordinates": [ -9.9221079, 53.5136388 ]
        }
      ]
    }
  }
}

xquery version "1.0-ml";

(: UNFILTERED cts:search() with a path expression root.                   :)
(: Even though Aaron has 168 movement entries in this document,           :)
(: only the FIRST matching node is returned. This is unfiltered           :)
(: behaviour below the fragment root — MarkLogic stops after the first   :)
(: matching node per fragment.                                            :)
let $uri := "/cleverllamas/llamaverse/content/llama-movement/llama_location_history.json"
let $query := cts:json-property-value-query(
  "llamaId", "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"
)
let $results :=
  cts:search(
    fn:doc($uri)/envelope/instance/array-node("llama-movement")/object-node(),
    $query,
    "unfiltered"
  )
return (
  fn:count($results),    (: Returns 1, not 168 :)
  $results               (: Returns only the first matching movement entry :)
)

1
{"llamaId": "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5", "timestamp": "2025-05-18T12:00:00Z", "coordinates": [-9.9219172, 53.5136376]}

When you need node-level granularity within a document, the recommended approach is to read the document directly with fn:doc() and test each candidate node with cts:contains(). Using a path expression as the cts:search() root is fine for document discovery when you need it to run unfiltered, but for collecting all matching nodes within a document, or for any use case where accuracy matters, it is not the right tool.
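
A minimal sketch of the fn:doc() plus cts:contains() approach, using the same movement history document and the same llamaId as the example above. It assumes the document structure shown in the excerpt.

```xquery
xquery version "1.0-ml";

let $uri := "/cleverllamas/llamaverse/content/llama-movement/llama_location_history.json"
let $query := cts:json-property-value-query(
  "llamaId", "0c8bdb0d-ac62-49b7-ac74-94dbba46efa5"
)
(: Read the document directly, then test every array entry on its own. :)
let $entries :=
  fn:doc($uri)/envelope/instance/array-node("llama-movement")/object-node()
let $matches := $entries[cts:contains(., $query)]
return fn:count($matches)
```

Because every entry is tested individually, the count should cover all of Aaron's movement entries rather than only the first one per fragment. The trade-off is that the whole document is read, which is acceptable for one known URI but not for discovery across a database.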

Configuration Settings That Affect Accuracy

Beyond the filtered and unfiltered options on individual queries, accuracy is also shaped by how the database is configured. The sections below summarise the configuration settings and query options where filtered versus unfiltered execution can materially affect result accuracy. The central pattern is consistent: filtered search validates candidate fragments; unfiltered search relies on index resolution, and the reliability of that resolution depends on the indexes that are available and how they are configured.

Search Execution Options

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`filtered` / `unfiltered`	All `cts:search`, `cts.search`, Search API, REST/client APIs where options are passed through	Validates candidate fragments against the full query semantics.	Returns index-selected candidate fragments without validation. MarkLogic states that unfiltered search may return false positives, missed matches, or incorrect matches depending on the query, data structure, and database configuration.
`checked` / `unchecked`	Phrase queries, proximity queries, positional matching	With `checked`, word positions are considered during index resolution. Filtering can still validate final results.	With `unchecked`, word positions are not considered during index resolution, increasing false-positive risk. MarkLogic states that `unchecked` searches can lead to false positives for phrases.

Fragment Structure

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
Fragment roots and searchable path expressions	Searches below document or fragment root; repeated elements; node-level result extraction	Filtering can identify the correct matching node or multiple matching nodes within a fragment.	Unfiltered search can return a candidate fragment even where the requested sub-node is not the real match; it can also miss matches where multiple candidate nodes exist in the same fragment. MarkLogic states that unfiltered searches should generally be performed on top-level nodes or fragment roots, because below-fragment searches can return unexpected answers and may stop after the first matching node per fragment.

Position Indexes

Position indexes allow MarkLogic to validate word order and proximity during index resolution. Without them, unfiltered results for near and phrase queries are more likely to contain false positives.
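
Before trusting unfiltered proximity or phrase results, it helps to confirm which of these settings are enabled on the database. The sketch below reads a few of them through the Admin API for the current database. It assumes the calling user has the privileges needed to read the configuration, which the cleverllamas-llama user may not have.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db     := xdmp:database()
(: Each getter returns a boolean for the named database setting. :)
return map:new((
  map:entry("word positions",
    admin:database-get-word-positions($config, $db)),
  map:entry("element word positions",
    admin:database-get-element-word-positions($config, $db)),
  map:entry("triple positions",
    admin:database-get-triple-positions($config, $db)),
  map:entry("fast phrase searches",
    admin:database-get-fast-phrase-searches($config, $db))
))
```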

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`word positions`	`cts:near-query`, multi-word phrase search, ordered/unordered proximity	Filtering can examine the actual fragment text and validate proximity and phrase semantics.	Without suitable position indexes, index resolution may be less precise; unfiltered results can contain false positives for phrases or proximity.
`element word positions`	`cts:element-word-query`, element phrase searches, element-scoped `cts:near-query`	Filtering can validate whether the element actually satisfies the phrase or proximity condition.	Element-scoped phrase and proximity queries are more exposed to false positives when unfiltered and position support is absent or not usable.
`element value positions`	`cts:element-value-query` used in proximity contexts	Filtering can validate the actual element value context.	Unfiltered proximity involving element values depends more heavily on index-position availability.
`attribute value positions`	`cts:element-attribute-value-query`; `cts:element-query` with attribute-query patterns	Filtering can validate the attribute context and surrounding query constraints.	Unfiltered resolution can be less precise where attribute-position information is unavailable.
`field value positions`	`cts:field-value-query`, field value proximity	Filtering can validate actual field-value context.	Unfiltered field-value proximity depends on whether field value positions are available and appropriate.
`triple positions`	`cts:triple-range-query`, `cts:near-query` involving triples, `cts:element-query` involving triples, `cts:triples` with `item-frequency`	Filtering can validate the complete document or fragment context.	Without triple positions, triple proximity and position-sensitive triple use cases may be less accurately resolved from indexes.

Phrase Indexes and Boundary Settings

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`fast phrase searches`	Phrase queries, especially multi-word phrases	Filtering can remove phrase false positives.	Unfiltered phrase searches — especially those with three or more words — can produce false positives when index resolution is not sufficiently precise.
`fast element phrase searches`	Element-scoped phrase queries	Filtering can validate that the phrase occurs in the intended element context.	Unfiltered element phrase searches are more dependent on the element phrase index for precision.
`phrase-through`	Phrase queries crossing markup boundaries	Filtering can validate phrase semantics according to configured phrase rules.	Unfiltered search relies on whether the correct phrase-through information was indexed at load or reindex time.
`element-word-query-through`	`cts:element-word-query` over parent elements	Filtering can validate element-word-query semantics according to configured rules.	Unfiltered search depends on whether the element-word-query-through configuration is reflected in the indexes.

Sensitivity Settings

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`fast case sensitive searches`	Case-sensitive word and value queries	Filtering can remove candidates that match only in a different case.	Without the relevant index support, unfiltered case-sensitive searches can return case-insensitive false positives. MarkLogic gives an example where the index may nominate fragments containing `dog` even when the query requires `Dog`; filtering then removes the false-positive fragment.
`fast diacritic sensitive searches`	Diacritic-sensitive word and value queries	Filtering can validate the actual diacritic-sensitive match.	Unfiltered diacritic-sensitive searches can contain false positives where the index resolution is broader than the requested sensitivity.
Punctuation-sensitive query behaviour	Punctuation-sensitive phrase and word queries	Filtering can validate actual punctuation-sensitive matches.	Unfiltered searches can return false positives when the index candidate matches the words but not the punctuation semantics. MarkLogic's example shows `one! two three` returning false positives against `one two three` under unfiltered search.
Word query includes / excludes	`cts:word-query`, `cts:words`, `cts:word-match`, positional contexts	Filtering can validate the final match, but query semantics may already be constrained by the word-query configuration.	Word-query exclusions can prevent MarkLogic from using positions — even where positions are enabled — leading to false positives in positional contexts such as near queries.
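
To see whether a sensitivity setting affects a particular query, compare the index-only estimate with the filtered count. The sketch below does this for a case-sensitive word query; the search term is just an example taken from the llamaverse data.

```xquery
xquery version "1.0-ml";

let $query := cts:word-query("Bradley", ("case-sensitive"))
return (
  (: Index-only count: may include candidates that differ only in case. :)
  xdmp:estimate(cts:search(fn:collection("llamaverse"), $query)),
  (: Filtered count: every candidate is checked against the exact case. :)
  fn:count(cts:search(fn:collection("llamaverse"), $query, "filtered"))
)
```

If the estimate is higher than the filtered count, the indexes are nominating candidates that an unfiltered search would return as false positives.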

Wildcard Searches

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`three character searches`	Wildcard queries such as `abc*x`, `*abc`, `a?bcd`	Filtering can validate the final wildcard match.	Unfiltered wildcard searches can return candidates that satisfy index-token conditions but not the intended wildcard expression.
`two character searches`	Short wildcard patterns	Filtering can validate final wildcard semantics.	Unfiltered search depends on character index granularity; insufficient wildcard index support can affect availability, performance, or precision.
`one character searches`	Very short wildcard patterns	Filtering can validate final wildcard semantics.	Most exposed where broad wildcard expansion nominates many candidate fragments.
`trailing wildcard searches`	Patterns such as `abc*`	Filtering can validate actual wildcard expansion.	Result quality depends on the trailing wildcard index and related word/token configuration.
`fast element trailing wildcard searches`	Element-scoped trailing wildcard queries	Filtering can validate element scope and wildcard semantics.	Unfiltered element-scoped wildcard searches are more exposed to candidate-level false positives without suitable element wildcard indexes.
`three character word positions`	Wildcard terms inside `cts:near-query` or multi-word phrase queries	Filtering can validate the actual phrase or proximity match.	Unfiltered wildcard-plus-proximity queries can be false-positive prone if wildcard position indexes are not available.
`trailing wildcard word positions`	Trailing wildcard terms inside proximity or phrase queries	Filtering can validate actual distance and order.	Unfiltered trailing-wildcard proximity can be less precise without trailing wildcard word positions.
`word lexicon` (especially codepoint collation)	Wildcard search expansion and lexicon-assisted wildcard resolution	Filtering can validate final matches.	Unfiltered wildcard accuracy and performance are more dependent on lexicon-backed expansion and the available wildcard indexes.
`lexicon-expansion-limit`, `limit-check`, `no-limit-check`	Wildcarded word and value queries	Filtering can compensate for broad or approximate wildcard expansion.	In unfiltered search, wildcard expansion trade-offs can directly affect result accuracy.
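
xdmp:plan() returns the index resolution plan for a search without running it, which is a convenient way to see how a wildcard term will be resolved with the indexes currently enabled. The sketch below is illustrative: the wildcard pattern is an example, and the caller needs the privilege required to run xdmp:plan().

```xquery
xquery version "1.0-ml";

(: Returns an XML description of how the wildcard term is resolved     :)
(: from the indexes. No documents are opened and no results returned.  :)
xdmp:plan(
  cts:search(
    fn:collection("llamaverse"),
    cts:word-query("Brad*", ("wildcarded"))
  )
)
```

Running it before and after changing a wildcard index setting shows whether the new index is actually being used for that query.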

Tokenization

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
Field tokenizer overrides	Field word queries, wildcard-enabled field queries, normalised identifiers such as phone numbers	Filtering can validate actual textual content after tokenization rules are applied.	Incorrect tokenizer configuration can cause broad candidate matches. MarkLogic provides an example where a three-character wildcard search on phone numbers returns unfiltered false positives, then shows field tokenizer overrides eliminating those false positives.

Geospatial Searches

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
Geospatial point indexes	`cts:element-geospatial-query`, `cts:json-property-geospatial-query`, point-in-region queries	Filtering can validate candidate matches against the geospatial predicate.	Unfiltered geospatial search relies more directly on geospatial index candidates; index scope, point format, coordinate system, and precision become more significant.
Geospatial region path index	`cts:geospatial-region-query`	Filtering can validate candidate fragments, but region query execution remains tied to the configured geospatial region index.	Region search is index-driven; incorrect precision, coordinate system, or reference configuration can materially affect the candidate set.
Coordinate system, precision, and tolerance	Geospatial point and region comparisons	Filtering can validate using configured geospatial semantics, but does not change the configured precision model.	Unfiltered search exposes index precision and tolerance decisions more directly.
Geospatial index scope and point format	KML, GeoJSON, element-child geospatial queries, path geospatial queries	Filtering can remove some false candidates, but an overly broad geospatial location path can still create business-level ambiguity.	Unfiltered search can surface false positives where the indexed coordinate location is too broad or does not distinguish point data from other region data. MarkLogic's geospatial example states that limiting scope to coordinates in a `Point` element prevents false positives from documents containing other kinds of regions.
Reverse geospatial queries (e.g. polygon / region containment)	`cts:element-geospatial-query`, `cts:json-property-geospatial-query`, and related region-containment queries where the point is tested against a region boundary	Filtering validates each candidate against the actual geospatial predicate, correctly excluding documents whose coordinates fall outside the region.	Unfiltered reverse geospatial queries — such as polygon containment tests — can return the entire dataset as false positives. This is not documented in MarkLogic's public documentation. Always use filtered search for reverse geospatial / polygon containment queries. A higher-performance alternative is a hybrid approach: split the query so that everything except the geospatial predicate runs as a standard unfiltered query to narrow the candidate set, store and index the polygons as geospatial region indexes, then combine the two with `cts:and-query`. This avoids scanning the full dataset while still validating coordinate containment accurately. Contact us if you need assistance designing this pattern for your use case.

Negation Queries

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
Index accuracy for subqueries inside `cts:not-query`	`cts:not-query`	Even filtered searches can miss results if the negated query is not accurate from index resolution.	Unfiltered false positives inside the negated query can become false negatives in the overall result. MarkLogic states that `cts:not-query` is only guaranteed accurate if the negated query is accurate from index resolution.
Index accuracy for the negative side of `cts:and-not-query`	`cts:and-not-query`	Filtering does not fully protect against inaccurate index resolution of the negative query.	False positives in the negative query can exclude valid results. MarkLogic states that `cts:and-not-query` can miss results even with filtered searches if the negative query has false positives.
Position indexes for `cts:not-in-query`	`cts:not-in-query`	Filtered searches always have access to positions for validation.	Without positions enabled, unfiltered searches can produce surprising results and false positives. MarkLogic states that positions are required to accurately resolve `cts:not-in-query` from indexes.

Reindexing and Operational State

Setting / Option	Features Affected	Filtered Behaviour	Unfiltered Behaviour / Risk
`reindexer enable`, reindex completion, lowest-common-denominator index state	All index-dependent queries after index configuration changes	Filtering can validate candidates, but cannot use index structures that are not yet consistently available.	Unfiltered search is especially exposed when old and new fragments are indexed differently, or when new index settings are not yet usable. MarkLogic states that after index or fragmentation changes, queries cannot take advantage of new settings until index settings meet the lowest-common-denominator criteria.
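
A quick check of whether any forest in the current database is still reindexing can be sketched as follows. The fs:forest-name and fs:reindexing element names are assumptions about the forest status output; verify them against the result of xdmp:forest-status() on your version before relying on this.

```xquery
xquery version "1.0-ml";

declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: One line per forest: its name and whether it reports reindexing.   :)
(: Element names are assumed from the forest status output.           :)
for $forest-id in xdmp:database-forests(xdmp:database())
let $status := xdmp:forest-status($forest-id)
return fn:concat(
  fn:string($status/fs:forest-name),
  ": reindexing=",
  fn:string($status/fs:reindexing)
)
```

While any forest reports that it is reindexing, treat unfiltered results that depend on the changed index settings with caution.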

Quick Reference

The tables below summarise the filtered/unfiltered default behaviour across MarkLogic's retrieval and query-constrained lexicon interfaces. Functions where filtered/unfiltered is not meaningful (such as direct URI reads via fn:doc, cts.doc, REST /v1/documents, and SPARQL endpoints) are excluded.

Core CTS Search

Feature / Interface	Default	Notes
`cts:search`	filtered	If neither `filtered` nor `unfiltered` is specified, the default is `filtered`. Source: cts.search.
`cts.search`	filtered	Same default as `cts:search`; the JavaScript interface exposes the same semantics. Source: cts.search.
XPath search in XQuery	filtered	XPath resolution uses index resolution followed by filtering, like `cts:search`. Source: Understanding Unfiltered Searches.
`cts:contains`	filtered	Checks whether supplied nodes or values match a query — used as the filtering step when paired with unfiltered index resolution. Source: cts:contains.
`cts.contains`	filtered	Same operational role as `cts:contains`. Source: cts.contains.
`cts:registered-query`	filtered*	Documented default is `filtered`, but filtered registered queries are not currently available — `unfiltered` is required in practice. Source: cts:registered-query.

Search API

Feature / Interface	Default	Notes
`search:search`	filtered	Underlying CTS default is filtered; however the `total` estimate in the response is based on index resolution and is not filtered for accuracy. Source: Search API Options.
`search.search`	filtered	Same as `search:search`. Source: search.search.
`search:resolve`	filtered	Absent a `search-option` override, the CTS default applies. Source: Query Options.
`search.resolve`	filtered	Same as `search:resolve`. Source: search.resolve.
`search:estimate`	unfiltered	Result reflects index resolution of the search. Source: search:estimate.
`search.estimate`	unfiltered	Same as `search:estimate`. Source: search.estimate.
`search:suggest`	unfiltered	Suggestions are based on lexicon-backed constraints resolved through index resolution. Source: REST Suggest.
`search.suggest`	unfiltered	Same as `search:suggest`. Source: search.suggest.
`return-facets`	unfiltered	Facet values and counts are based on index resolution of the `cts:query`. Source: Search API.
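
A minimal sketch of how the Search API behaviour above might be made explicit, assuming the `search-option` element is used to pass `filtered` or `unfiltered` through to the underlying search. The query text is a placeholder.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: State the execution mode explicitly rather than relying on the default. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>filtered</search-option>
  </options>

(: "llama" is a hypothetical query string. :)
let $response := search:search("llama", $options)

return (
  (: Estimate from index resolution, not adjusted by filtering. :)
  fn:data($response/@total),
  (: Results actually returned on this page. :)
  fn:count($response/search:result)
)
```

The two numbers are not directly comparable, because the result count is capped by the page length, but a `total` that never shrinks as you page to the end of a filtered result set is a sign that the estimate includes false positives.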

REST and Client APIs

REST Client API

Feature / Interface	Default	Notes
`GET /v1/search`	unfiltered	Default search behaviour for the REST Client API is unfiltered unless overridden with query options. Source: Release Notes.
`POST /v1/search`	unfiltered	Same default as `GET /v1/search`. Source: Release Notes.
`GET /v1/qbe`	unfiltered	The boolean `filtered` flag controls behaviour; unfiltered is the default. Source: Query By Example.
`POST /v1/qbe`	unfiltered	Same QBE default as `GET /v1/qbe`. Source: Query By Example.
`GET /v1/values/{name}`	unfiltered	Retrieves lexicon/range-index values or co-occurrences. Source: GET /v1/values/{name}.
`POST /v1/values/{name}`	unfiltered	Same values/co-occurrence semantics as `GET /v1/values/{name}`. Source: POST /v1/values/{name}.

Java Client API

Feature / Interface	Default	Notes
`QueryManager.search`	unfiltered	Predefined persistent query options are selected for performance; searches run unfiltered unless changed. Source: Java Query Options.
`QueryManager.values`	unfiltered	Values queries are lexicon/range-index operations resolved through index resolution. Source: Java Searches.

Node.js Client API

Feature / Interface	Default	Notes
`DatabaseClient.documents.query`	unfiltered	Node.js Client API searches default to unfiltered search. Source: Node.js Search.
`queryBuilder.byExample`	unfiltered	QBE uses unfiltered search by default for performance; `$filtered: true` forces filtered. Source: Node.js Search.
`DatabaseClient.values.read`	unfiltered	Lexicon and range-index queries are resolved through index resolution, without filtering. Source: Node.js Search.

Server-Side JavaScript (JSearch)

Feature / Interface	Default	Notes
`jsearch.documents`	unfiltered	JSearch document search is unfiltered by default. Calling `.filter()` switches to filtered. Source: DocumentsSearch.filter.
`DocumentsSearch.filter`	filtered	Explicitly inspects matched and ordered documents. Source: DocumentsSearch.filter.
`jsearch.values`	unfiltered	Values operate over lexicons/range indexes via index resolution. Source: ValuesSearch.
`ValuesSearch.where`	unfiltered	Query constraints scope values through index resolution rather than full document filtering. Source: ValuesSearch.where.
`jsearch.facets`	unfiltered	Facets are generated from lexicon or range-index values. Source: FacetsSearch.
`FacetsSearch.where`	unfiltered	Qualifies facet values using `cts.query` values; resolved via index resolution. Source: FacetsSearch.

CTS Lexicons

Value Lexicons

Feature / Interface	Default	Notes
`cts:values`	unfiltered	Fragments are selected in the same manner as unfiltered `cts:search`. Source: cts:values.
`cts.values`	unfiltered	Same as `cts:values`. Source: cts.values.
`cts:element-values`	unfiltered	Same lexicon-constrained behaviour as `cts:values`. Source: cts:element-values.
`cts.elementValues`	unfiltered	Same as `cts:element-values`. Source: cts.elementValues.
`cts:element-attribute-values`	unfiltered	Same lexicon-constrained behaviour as `cts:values`. Source: cts:element-attribute-values.
`cts.elementAttributeValues`	unfiltered	Same as `cts:element-attribute-values`. Source: cts.elementAttributeValues.
`cts:field-values`	unfiltered	Field value lexicons follow the same query-constrained lexicon behaviour. Source: cts:field-values.
`cts.fieldValues`	unfiltered	Same as `cts:field-values`. Source: cts.fieldValues.
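
To make the value-lexicon rows above concrete, here is a sketch that compares query-constrained lexicon values with values read from filtered search results. It assumes a string element range index on a hypothetical `title` element, with a collation matching the query's default; both the element name and the query are illustrative.

```xquery
xquery version "1.0-ml";

(: Hypothetical constraining query. :)
let $query := cts:word-query("llama")

(: Values from the range index; fragments are selected unfiltered. :)
let $lexicon-values :=
  cts:element-values(xs:QName("title"), (), (), $query)

(: Values read from documents that survive filtering. :)
let $verified-values :=
  fn:distinct-values(cts:search(fn:doc(), $query)//title ! fn:string(.))

(: Values the lexicon reported that no verified document contains. :)
return $lexicon-values[fn:not(. = $verified-values)]
```

An empty result suggests the lexicon call is safe for this query shape. The second approach opens every matching document, so it is better suited to spot checks than to production use.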

Word Lexicons

Feature / Interface	Default	Notes
`cts:word-match`	unfiltered	Query-constrained word-match fragments are selected in the same manner as unfiltered `cts:search`. Source: cts:word-match.
`cts.wordMatch`	unfiltered	Same as `cts:word-match`. Source: cts.wordMatch.
`cts:element-word-match`	unfiltered	Same word-lexicon behaviour. Source: cts:element-word-match.
`cts.elementWordMatch`	unfiltered	Same as `cts:element-word-match`. Source: cts.elementWordMatch.

URI and Collection Lexicons

Feature / Interface	Default	Notes
`cts:uris`	unfiltered	Fragments are selected as in unfiltered `cts:search`; no documents are opened. Source: cts:uris.
`cts.uris`	unfiltered	Same as `cts:uris`. Source: cts.uris.
`cts:collections`	unfiltered	Query-constrained collection lexicon results are selected in the same manner as unfiltered `cts:search`. Source: cts:collections.
`cts.collections`	unfiltered	Same as `cts:collections`. Source: cts.collections.
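
One way to check whether `cts:uris` is returning false positives for a given query is to compare it against a filtered `cts:search` over the same query. This is a sketch only: it assumes the URI lexicon is enabled, and the word query is a placeholder.

```xquery
xquery version "1.0-ml";

(: Hypothetical query used only for illustration. :)
let $query := cts:word-query("llama")

(: Unfiltered: URIs come straight from index resolution. :)
let $lexicon-uris := cts:uris((), (), $query)

(: Filtered: each candidate document is opened and confirmed. :)
let $verified-uris :=
  for $doc in cts:search(fn:doc(), $query)
  return xdmp:node-uri($doc)

(: URIs the indexes selected but filtering rejected. :)
return $lexicon-uris[fn:not(. = $verified-uris)]
```

Any URI returned here is a false positive from index resolution. On large result sets the filtered half is expensive, so run this kind of comparison against a representative sample rather than the full database.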

Tuples, Co-occurrences and Ranges

Feature / Interface	Default	Notes
`cts:value-tuples`	unfiltered	Fragments are selected in the same manner as unfiltered `cts:search`. Source: cts:value-tuples.
`cts.valueTuples`	unfiltered	Same as `cts:value-tuples`. Source: cts.valueTuples.
`cts:value-co-occurrences`	unfiltered	Same query-constrained lexicon behaviour as `cts:value-tuples`. Source: cts:value-co-occurrences.
`cts.valueCoOccurrences`	unfiltered	Same as `cts:value-co-occurrences`. Source: cts.valueCoOccurrences.
`cts:value-ranges`	unfiltered	Fragments are not filtered but selected in the same manner as unfiltered `cts:search`. Source: cts:value-ranges.
`cts.valueRanges`	unfiltered	Same as `cts:value-ranges`. Source: cts.valueRanges.

Triples

Feature / Interface	Default	Notes
`cts:triples`	unfiltered	Fragments are selected in the same manner as unfiltered `cts:search`. Source: cts:triples.
`cts.triples`	unfiltered	Same as `cts:triples`. Source: cts.triples.

Optic

Feature / Interface	Default	Notes
`op:from-search`	unfiltered	Constructs a row set from a `cts:query`; fragments are selected in the same manner as unfiltered `cts:search`. Source: op:from-search.
`op.fromSearch`	unfiltered	Same as `op:from-search`. Source: op.fromSearch.

Need Some Help?

Looking for more information on this subject or any other topic related to MarkLogic? Contact Us (info@cleverllamas.com) to find out how we can assist you with consulting or training!
