All About Fragments
A Complete MarkLogic Guide to Fragment Behaviour
Extended Llama Lesson
This is not a typical article. It is an Extended Llama Lesson — a deep dive that takes you further into a topic than you would normally be able to go on your own. The insights here come from years of real-world experience working with MarkLogic at scale: the kind of knowledge that only surfaces when you have hit the edges, debugged the unexpected, and worked through the harder corners of a platform. We hope it saves you some of that journey.
If any of this raises questions about your own implementation — or if you need expert help with fragment roots, index configuration, or anything else MarkLogic — reach out to us. This is exactly the kind of work we specialise in.
By default, each document in MarkLogic is a single fragment. The fragment is the fundamental unit of indexing: word indexes, range indexes, positional data, and near-query evaluation are all scoped to a fragment. For most databases this default is fine and you never need to think about it. But if you work with large XML documents that contain repeating semi-independent structures — chapters, records, articles, products — fragment root configuration can change both what your queries return and how efficiently your indexes are maintained.
JSON Documents: No Need to Read Further
If your data model is entirely JSON, you can stop here. JSON documents do not support fragment roots. Every JSON document is always a single fragment regardless of nesting depth.
If you need finer granularity in JSON, split large payloads into multiple documents at ingest time.
What Is a Fragment? (Deep Model)
A fragment is the indexing unit MarkLogic uses for search and evaluation. Every index entry is bound to a fragment, not directly to a URI. This matters because query correctness depends on fragment boundaries, not visual adjacency in your XML file.
Think in two identities at once:
- Storage identity: one URI identifies one stored document.
- Search identity: one query hit may represent one fragment within that document.
That split explains why teams see "duplicate" hits for one URI, why near queries can fail without errors, and why application result mapping must often resolve parent-document context explicitly.
Legacy Disclaimer and Preferred Modern Pattern
In many systems, heavy reliance on fragment roots is a legacy-era design choice. If your application constantly needs to resolve fragment hits back to parent identity, consider modelling each logical child as its own document instead.
A practical modern pattern:
- Store one logical child per URI (for example, one chapter per document).
- Keep related children together in a collection (for example handbook:llama-handbook).
- Add a shared document identifier field (for example <handbookId>llama-handbook</handbookId>).
- Keep a child identifier field (for example <chapterId>behaviour</chapterId>).
This preserves whole-document grouping without fragment-level ambiguity, and usually simplifies search APIs, pagination, and operational troubleshooting.
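The pattern above can be sketched as a one-off split script. This is a minimal sketch, not a definitive migration: the source URI, the /chapters/ target URI scheme, and the <chapterDoc> wrapper element are illustrative assumptions, and passing options to xdmp:document-insert as a map assumes MarkLogic 9 or later.

```xquery
xquery version "1.0-ml";

(: Sketch: split one aggregate handbook into one document per chapter. :)
(: Assumptions: the source URI, the /chapters/ URI scheme and the       :)
(: <chapterDoc> wrapper are hypothetical names chosen for illustration. :)
let $handbook := fn:doc("/handbooks/llama-handbook.xml")/handbook
let $handbook-id := fn:string($handbook/@id)
for $chapter in $handbook/content/chapter
let $chapter-id := fn:string($chapter/@id)
let $uri := "/chapters/" || $handbook-id || "/" || $chapter-id || ".xml"
return
  xdmp:document-insert(
    $uri,
    <chapterDoc>
      <handbookId>{$handbook-id}</handbookId>
      <chapterId>{$chapter-id}</chapterId>
      {$chapter/title, $chapter/content}
    </chapterDoc>,
    (: Group all chapters of one handbook in a shared collection. :)
    map:entry("collections", "handbook:" || $handbook-id)
  )
```

Each chapter then has its own URI, so a search hit maps to exactly one document, and the collection plus the <handbookId> field recover the whole-handbook grouping.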
One Fragment vs Multiple Fragments
The examples in this article use a self-contained handbook:
<handbook id="llama-handbook">
  <title>The Complete Llama Handbook</title>
  <content>
    <chapter id="intro">
      <title>Introduction</title>
      <content>Llamas are remarkable animals native to the Andes mountains of South
      America. They have been domesticated for thousands of years and are valued for
      their wool, carrying ability, and calm temperament.</content>
    </chapter>
    <chapter id="behaviour">
      <title>Behaviour and Social Structure</title>
      <content>Llamas are highly social animals that live in herds. They communicate
      through body posture, ear position, and vocalisation. Llamas are known to spit
      when threatened, though this behaviour is usually directed at other llamas
      rather than humans.</content>
    </chapter>
    <chapter id="care">
      <title>Care and Feeding</title>
      <content>Llamas are relatively easy to care for. They are grazers and prefer
      grass and hay. They require shelter from extreme weather and regular shearing
      to maintain their wool in good condition.</content>
    </chapter>
  </content>
</handbook>
In this sample, all <chapter> elements sit inside a <content> container. We will use that exact shape later as a concrete fragment-parent example.
Without fragment roots or fragment parents configured, the whole document is one fragment:
xquery version "1.0-ml";
(: Without fragment roots on <chapter>, a word query returns the whole :)
(: document regardless of which chapter contains the matching term. :)
for $doc in cts:search(
  fn:collection("handbooks"),
  cts:word-query("spit")
)
(: Return the URI first, then the matching document node itself. :)
return (fn:base-uri($doc), $doc)
/handbooks/llama-handbook.xml
<handbook id="llama-handbook">
<title>The Complete Llama Handbook</title>
<content>
<chapter id="intro">
<title>Introduction</title>
<content>Llamas are remarkable animals native to the Andes mountains of South
America. They have been domesticated for thousands of years and are valued for
their wool, carrying ability, and calm temperament.</content>
</chapter>
<chapter id="behaviour">
<title>Behaviour and Social Structure</title>
<content>Llamas are highly social animals that live in herds. They communicate
through body posture, ear position, and vocalisation. Llamas are known to spit
when threatened, though this behaviour is usually directed at other llamas
rather than humans.</content>
</chapter>
<chapter id="care">
<title>Care and Feeding</title>
<content>Llamas are relatively easy to care for. They are grazers and prefer
grass and hay. They require shelter from extreme weather and regular shearing
to maintain their wool in good condition.</content>
</chapter>
</content>
</handbook>
(: The entire document is returned because the document is a single fragment. :)
(: "spit" appears only in the behaviour chapter, but the whole document :)
(: matches — there is no finer-grained index unit to narrow the result. :)
With chapter elements configured to be separate fragments, the results differ:
xquery version "1.0-ml";
(: With a fragment root configured on <chapter>, each chapter is indexed as :)
(: its own fragment. Searching within chapter elements returns only the :)
(: fragments (chapters) that actually contain the matching term. :)
for $chapter in cts:search(
  fn:collection("handbooks")//chapter,
  cts:word-query("spit")
)
(: Return the matching chapter node itself, not just its title. :)
return $chapter
<chapter id="behaviour">
<title>Behaviour and Social Structure</title>
<content>Llamas are highly social animals that live in herds. They communicate
through body posture, ear position, and vocalisation. Llamas are known to spit
when threatened, though this behaviour is usually directed at other llamas
rather than humans.</content>
</chapter>
(: Only the chapter that contains "spit" is returned as a fragment. :)
(: The other two chapters are independently indexed and do not match. :)
When to Strongly Consider TDE Instead
If you are modelling data as very small documents, or adding fragment roots/parents mainly to get targeted results from repeating XML sections, strongly consider Template Driven Extraction (TDE) instead.
TDE is often a better fit when the real goal is projection rather than fragment navigation.
Repeating XML sections can be projected directly into rows. You can map each repeating section into a row-oriented view that downstream SQL/Optic workloads can query without complex fragment-aware application logic.
The same structures can be projected as triples. If your use case benefits from graph-style relationships, TDE can emit triples from those repeating sections without requiring document reshaping just for query targeting.
Query contracts become more stable. Instead of returning fragment payloads and then reconstructing parent context, consumers can query a well-defined projected shape designed for analytics, search augmentation, or integration.
Model pressure is reduced. You avoid overfitting the primary XML storage model solely to satisfy reporting/projection patterns. The source document can remain coherent while TDE provides purpose-built access paths.
Operational intent is clearer. Fragment roots/parents are excellent for fragment-aware search behaviour, but TDE is usually the cleaner abstraction when your requirement is "extract and project repeating structures efficiently".
A practical rule of thumb: if you are introducing tiny-document modelling or fragment configuration primarily to produce targeted tabular/graph projections, evaluate TDE first.
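To make the projection idea concrete, here is a minimal sketch of a template that emits one row per chapter from the sample handbook. The schema name, view name, and column names are hypothetical, and the use of the parent axis (../../@id) in a val expression is an assumption worth validating against your MarkLogic version. The sketch only previews the extraction with tde:node-data-extract; it does not install the template.

```xquery
xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde"
  at "/MarkLogic/tde.xqy";

(: Sketch: one row per <chapter>. Schema, view and column names are    :)
(: hypothetical. The parent-axis path in the first column is assumed   :)
(: to be accepted by the TDE dialect; verify before relying on it.     :)
let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <context>/handbook/content/chapter</context>
    <rows>
      <row>
        <schema-name>handbook</schema-name>
        <view-name>chapters</view-name>
        <columns>
          <column>
            <name>handbook_id</name>
            <scalar-type>string</scalar-type>
            <val>../../@id</val>
          </column>
          <column>
            <name>chapter_id</name>
            <scalar-type>string</scalar-type>
            <val>@id</val>
          </column>
          <column>
            <name>title</name>
            <scalar-type>string</scalar-type>
            <val>title</val>
          </column>
        </columns>
      </row>
    </rows>
  </template>
(: Preview the rows this template would produce for the sample document. :)
return tde:node-data-extract(
  fn:doc("/handbooks/llama-handbook.xml"),
  $template
)
```

Because the rows carry both handbook_id and chapter_id, consumers get chapter-level granularity and parent identity without any fragment configuration on the database.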
Benefits of Fragments
Fragments provide practical advantages when your XML documents are large, repetitive, or naturally partitioned into semi-independent units.
More precise result payloads: With fragment roots, queries can return only the matching logical unit (for example a chapter) instead of always returning the full document.
Better relevance locality: Scoring is calculated against the fragment that matched. This can improve ranking quality for section-centric search experiences.
Cleaner user-facing snippets and highlights: When matches are fragment-local, snippet generation can focus on the relevant section rather than mixing context from unrelated parts of a large document.
Improved application routing patterns: Fragment-level matches paired with parent mapping enable APIs that expose both section identity and stable document identity in one response.
Operational flexibility for XML-heavy legacy models: For systems that already store large aggregate XML documents, fragment roots can introduce finer search granularity without requiring an immediate full re-modelling to document-per-entity.
Stronger troubleshooting clarity: Thinking in fragments helps teams explain near-query surprises, ranking anomalies, and duplicated URI hits using explicit index boundaries instead of ambiguous query folklore.
Fragments are most valuable when fragment boundaries align with real business units. If boundaries do not align with how the application consumes data, document-per-entity models can still be simpler.
Gotchas and Anomalies
The most common first surprise is near-query behaviour: cts:near-query() never matches terms across fragment boundaries, even when the terms look adjacent in the source document.
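A small probe makes this visible on the sample handbook. In the source, "temperament" ends the intro chapter and "Social" appears in the next chapter's title, only a few words later. The sketch below assumes the sample document is in the "handbooks" collection; the distance of 10 is an arbitrary illustrative value.

```xquery
xquery version "1.0-ml";

(: Sketch: the two terms are a few words apart in the source document, :)
(: but they live in different <chapter> elements.                      :)
let $near :=
  cts:near-query(
    (cts:word-query("temperament"), cts:word-query("social")),
    10
  )
(: True when at least one fragment satisfies the near query. :)
return fn:exists(cts:search(fn:collection("handbooks"), $near))
```

With the default single-fragment document we would expect this to return true; with <chapter> configured as a fragment root we would expect false, with no error or warning to signal why.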
Use this table as the primary checklist for fragment-related anomalies. The evidence column marks whether each item is directly documented in MarkLogic function docs or derived from production behaviour seen in real implementations.
| Symptom or anomaly | Why it happens | What to do | Evidence |
|---|---|---|---|
| Near-query unexpectedly returns no results | Terms are in different fragments; cts:near-query() does not cross fragment boundaries | Keep near terms in the same fragment or redesign query/model | Documented + field validated |
| Search returns fragment nodes instead of full document | Fragment roots are configured and query matches child fragment | Use fn:root() and fn:base-uri() in result mapping | Field validated |
| One URI appears multiple times in results | Multiple matching fragments exist in the same document | Group results by parent URI when business logic is document-centric | Field validated |
| Snippets look duplicated | Snippet generation can occur per fragment, not per URI | De-duplicate at URI level or present fragment-level context intentionally | Field validated |
| Relevance ordering looks strange | Scores are computed per fragment; URI-level merging can change the apparent order | Normalise score handling when merging fragment hits | Documented + field validated |
| Facet counts are lower or higher than expected | Faceting can reflect fragment-level matches when query scope is fragment-oriented | Aggregate by parent URI when your business unit is document-level | Field validated |
| Expected document-level range values are missing from hit context | Matched fragment does not contain those values even though parent document does | Resolve parent document before extracting document-level metadata | Field validated |
| Update causes larger reindex cost than expected | Adding or removing fragment roots triggers reindex work and can increase background load | Treat fragment-root updates as migration events with capacity planning | Field validated |
| Lock contention changes after model update | Different fragment granularity can change lock scope and concurrency profile | Re-test write concurrency after fragment configuration changes | Field validated |
| Alerting or downstream processing becomes noisy | More granular matches can increase event volume in search-driven pipelines | Audit downstream assumptions after enabling fragment roots | Field validated |
| Search works in one environment but not another | Fragment root config differs across environments | Version-control config and verify in deployment checks | Field validated |
| Performance regresses on small documents | Extra fragment metadata overhead provides no benefit | Keep default single-fragment model for naturally small documents | Field validated |
| Parent-child assumptions break in API responses | API returns fragment payload without parent enrichment | Always include parent URI and business key in response contract | Field validated |
| JSON content unaffected by fragment config | JSON does not support fragment roots | Split JSON into multiple documents if finer granularity is required | Documented + field validated |
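Several rows above recommend grouping fragment hits by parent URI. A minimal sketch of that step follows, assuming <chapter> is configured as a fragment root and the sample handbook is in the "handbooks" collection. The <document> and <chapter> output elements are a hypothetical response shape, and using the best fragment score as the document score is one possible merge policy, not the only one.

```xquery
xquery version "1.0-ml";

(: Sketch: collapse fragment-level hits into one entry per document. :)
let $hits :=
  cts:search(
    fn:collection("handbooks")//chapter,
    cts:word-query("llamas")
  )
for $uri in fn:distinct-values($hits ! xdmp:node-uri(.))
(: All matching fragments that belong to this document. :)
let $group := $hits[xdmp:node-uri(.) eq $uri]
(: Assumed merge policy: a document ranks by its best fragment. :)
let $best-score := fn:max($group ! cts:score(.))
order by $best-score descending
return
  <document uri="{$uri}"
            matching-fragments="{fn:count($group)}"
            best-score="{$best-score}">
    { $group ! <chapter id="{@id}">{fn:string(title)}</chapter> }
  </document>
```

For large result sets you would page the search before grouping; this version keeps every hit in memory for clarity.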
Fragment Roots and Fragment Parents
When working with fragment-level results, you can always use xdmp:node-uri($fragment) to resolve which document URI the fragment belongs to.
Fragment Roots
Fragment roots control where MarkLogic starts a new fragment boundary inside an XML document. If you configure an element such as <chapter> as a fragment root, each <chapter> instance becomes its own searchable fragment instead of being only part of one document-wide fragment.
This changes behaviour in three important ways:
- Match scope: search matches are evaluated per fragment, not implicitly across the entire URI.
- Result shape: queries can return a fragment node (for example one chapter) rather than the whole document node.
- Scoring and ranking: relevance is computed against the matched fragment content.
Once <chapter> is configured as a root, query results can return chapter fragments rather than the whole document:
xquery version "1.0-ml";
(: With a fragment root configured on <chapter>, each chapter is indexed as :)
(: its own fragment. Searching within chapter elements returns only the :)
(: fragments (chapters) that actually contain the matching term. :)
for $chapter in cts:search(
  fn:collection("handbooks")//chapter,
  cts:word-query("spit")
)
(: Return the matching chapter node itself, not just its title. :)
return $chapter
<chapter id="behaviour">
<title>Behaviour and Social Structure</title>
<content>Llamas are highly social animals that live in herds. They communicate
through body posture, ear position, and vocalisation. Llamas are known to spit
when threatened, though this behaviour is usually directed at other llamas
rather than humans.</content>
</chapter>
(: Only the chapter that contains "spit" is returned as a fragment. :)
(: The other two chapters are independently indexed and do not match. :)
Fragment Parents
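Fragment parents are related to roots, but they are not the same setting. A fragment root says "start a fragment at every element with this name." A fragment parent says "start a fragment at every child element of this element," so the children become fragment roots whatever they are called. Either way, application code still has to map fragment-level matches back to their parent context.
In this sample document, chapters are wrapped inside a <content> container. That gives a concrete parent example: configuring <content> as a fragment parent turns each child <chapter> into its own fragment, much as configuring <chapter> as a fragment root does. Matching is by element name, so the <content> elements inside each chapter also qualify as fragment parents; they hold only text here, so they create no further fragments.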
In practice, fragment parents matter when you need to bridge fragment-level search with document-level application logic:
- You may match a child fragment, but your API contract usually still needs stable parent identity.
- Highlighting, snippets, and result presentation often depend on consistent parent context.
- Without explicit parent-aware mapping, client code can make incorrect assumptions about one-result-per-URI behaviour.
Operationally, the safest model is: search at fragment level, then always resolve and return both container parent context (such as <content>) and document-level identifiers for downstream use.
How Fragment Roots and Fragment Parents Can Be Configured
There are four common configuration routes. They all change the same database settings; they differ in workflow and automation level.
Admin UI: Configure roots and parents in the database configuration screens in the Admin interface. This is the fastest route for exploration and one-off troubleshooting.
XQuery Admin API module: Apply the same database settings programmatically from server-side XQuery using the Admin library. This is useful for controlled operational scripts and repeatable environment bootstrapping.
Management REST API: Update database properties through the Management API. This suits CI/CD pipelines and external automation systems that manage infrastructure declaratively.
Configuration-as-code tooling: Use deployment tools that wrap Management/Admin APIs (for example project deployment tooling) so fragment roots and fragment parent settings are versioned alongside other environment config.
Regardless of route, treat changes as migration events: apply in non-production first, monitor reindex progress, and regression-test query behaviour before promoting.
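As an illustration of the XQuery Admin API route, the following minimal sketch adds <chapter> (in no namespace) as a fragment root. The database name "Documents" is an assumption; substitute your own. Running it against a database that already has this root configured is likely to raise an error, so treat it as a one-time migration step rather than an idempotent script.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Sketch: register <chapter> as a fragment root on one database. :)
(: Assumption: the target database is named "Documents".          :)
let $config := admin:get-configuration()
let $db-id := xdmp:database("Documents")
(: Empty namespace, local name "chapter". :)
let $root := admin:database-fragment-root("", "chapter")
let $config := admin:database-add-fragment-root($config, $db-id, $root)
(: Saving the configuration is what triggers the reindex work. :)
return admin:save-configuration($config)
```

The fragment parent equivalent uses admin:database-fragment-parent and admin:database-add-fragment-parent in the same way, with "content" as the local name for this sample.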
Fragment Parents: Expanded Examples
When fragment roots are in use, parent lookup is a first-class requirement, not an optional extra. The standard pattern is: query at fragment level, then resolve parent URI and document metadata.
xquery version "1.0-ml";
(: Given a chapter fragment, navigate back to the containing document. :)
for $chapter in cts:search(
  fn:collection("handbooks")//chapter,
  cts:word-query("spit")
)
return (
  "Chapter: " || fn:string($chapter/title),
  "Document: " || fn:base-uri(fn:root($chapter))
)
Chapter: Behaviour and Social Structure
Document: /handbooks/llama-handbook.xml
(: fn:root() returns the document node that contains the fragment. :)
(: fn:base-uri() on that document node gives the URI. :)
For API design, return both fragment and parent identifiers in one payload:
xquery version "1.0-ml";
(: Search over chapter nodes so that $fragment is a <chapter> element. :)
(: Searching fn:collection() alone would return document nodes, and    :)
(: the @id and title lookups below would come back empty.              :)
let $fragment :=
  cts:search(
    fn:collection()//chapter,
    cts:and-query((
      cts:directory-query("/handbooks/"),
      cts:word-query("spit"),
      cts:element-query(xs:QName("chapter"), cts:true-query())
    ))
  )[1]
let $parent-document := fn:root($fragment)
return
  <fragment-context>
    <fragment-id>{fn:data($fragment/@id)}</fragment-id>
    <fragment-title>{fn:string($fragment/title)}</fragment-title>
    <fragment-parent-element>{fn:name($fragment/..)}</fragment-parent-element>
    <parent-uri>{fn:base-uri($parent-document)}</parent-uri>
    <parent-title>{fn:string($parent-document/handbook/title)}</parent-title>
  </fragment-context>
<fragment-context>
<fragment-id>behaviour</fragment-id>
<fragment-title>Behaviour and Social Structure</fragment-title>
<fragment-parent-element>content</fragment-parent-element>
<parent-uri>/handbooks/llama-handbook.xml</parent-uri>
<parent-title>The Complete Llama Handbook</parent-title>
</fragment-context>
(: Fragment-level result mapped back to document-level identity and metadata. :)
This avoids brittle client-side assumptions and keeps document-level routing stable even when one URI yields multiple fragment matches.
Final Takeaways
Fragments are not a minor implementation detail. They define the unit of search truth in MarkLogic. For XML workloads, treat fragment configuration as architecture: verify fragment topology, test gotchas early, and decide intentionally between fragment roots and document-per-entity modelling.
Need Some Help?
Looking for more information on this subject or any other topic related to MarkLogic? Contact Us (info@cleverllamas.com) to find out how we can assist you with consulting or training!