Optic Update
Released in MarkLogic 11
MarkLogic has released Optic Update as a preview release, and details of the feature are expected to change over time with user feedback. The online documentation gives a few simple examples. To see what makes it work, we went under the hood and walked through the code line by line. Optic Update is a set of dedicated libraries plus additional code extending some of the base implementation.
Basic information
SJS Only
This is the first feature we can think of that is SJS only. Some of the code is wired into the SJS libraries that implement Optic. The rest lives in two new SJS libraries. Interestingly, those two libraries could likely have been written in XQuery, which would have made them more easily reusable. It will be interesting to see whether this stays an SJS-only implementation.
Doc Descriptors
Doc descriptors are simple JSON objects that the code collects and passes to xdmp:document-insert() or temporal:document-insert().
op.fromDocDescriptors() is like op.fromLiterals() on the surface. However, it enforces that the columns created are those needed to support the insert or update of a document, each with a defined datatype. It is closely related to op.fromParam(). When you trace the logic, you find that the DocDescriptorsPlan class extends the ParamPlan class. We can explore a bit and show that what is created are regular column references by inspecting the column types:
const op = require("/MarkLogic/optic");
op.fromDocDescriptors({
  uri: "/llama.json",
  doc: { name: "Jacinta" },
  collections: ["llamaverse"],
})._colTypes;
Returns:
[
  {
    column: "collections",
    type: "none",
    nullable: true,
  },
  {
    column: "doc",
    type: "none",
    nullable: true,
  },
  {
    column: "uri",
    type: "string",
    nullable: true,
  },
];
It appears that the type none allows a column to hold content other than simple types. That will be an interesting thing to try in the future using op.fromParam().
However, if we try to add anything unexpected, then it fails (because the extra stuff is not part of a document descriptor). This helps keep the code clean and predictable but also limits the initial columns to those that a docDescriptor expects.
const op = require("/MarkLogic/optic");
op.fromDocDescriptors({
  uri: "/llama.json",
  doc: { name: "Jacinta" },
  collections: ["llamaverse"],
  llamasAreGreat: true,
})._colTypes;
In this case, it returns an error about the extra property.
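The strictness described above can be pictured with a small plain JavaScript sketch that runs outside MarkLogic. It is not the library's code: the helper names unexpectedKeys and checkDocDescriptor are hypothetical, and the list of allowed property names is an assumption written down for illustration.

```javascript
"use strict";

// Assumption: the property names a doc descriptor accepts. This list is
// written for illustration and is not read from the MarkLogic library.
const ALLOWED_KEYS = new Set([
  "uri",
  "doc",
  "collections",
  "metadata",
  "permissions",
  "quality",
  "temporalCollection",
]);

// Hypothetical helper: returns the property names that are not allowed.
function unexpectedKeys(descriptor) {
  return Object.keys(descriptor).filter((key) => !ALLOWED_KEYS.has(key));
}

// Hypothetical helper: throws when a descriptor carries unknown properties,
// otherwise returns the descriptor unchanged.
function checkDocDescriptor(descriptor) {
  const extras = unexpectedKeys(descriptor);
  if (extras.length > 0) {
    throw new Error("Unexpected doc descriptor properties: " + extras.join(", "));
  }
  return descriptor;
}

const good = {
  uri: "/llama.json",
  doc: { name: "Jacinta" },
  collections: ["llamaverse"],
};
const bad = Object.assign({ llamasAreGreat: true }, good);

console.log(unexpectedKeys(good)); // []
console.log(unexpectedKeys(bad)); // [ 'llamasAreGreat' ]
```

Rejecting unknown properties up front is what keeps the set of starting columns predictable, at the cost of not being able to carry extra values along in the descriptor itself.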
Likewise, even though it 'feels' like op.fromLiterals(), it is not. op.fromDocDescriptors() allows for columns other than simple types. Below we have the same sample, but with op.fromDocDescriptors() exchanged for op.fromLiterals(). This time we get an error for the 'doc' property. (The somewhat cryptic XDMP-BADRDFVAL can be translated to mean "Unsupported Datatype".)
"use strict";
const op = require("/MarkLogic/optic");
op.fromLiterals([
  {
    uri: "/llama.json",
    doc: { name: "Jacinta" },
    collections: ["llamaverse"],
  },
]);
op.fromDocUris
I would suggest stepping back and considering whether this convenience function is what you need. You get a URI column as well as a fragment reference (since this is a wrapper around op.fromLexicons()). If all you want is to join the document on later, you can already do that with a URI alone.
Furthermore, op.fromLexicons() has traditionally suffered from performance issues. However, in the same release as this feature, op.fromLexicons() received a performance boost that should make it run as fast as cts.valueTuples(). We will explore the details in a separate post. For now, if you use op.fromDocUris(), use a scoping query to limit the results as much as possible up front.
op.lockForUpdate()
This does exactly what one might expect: it uses xdmp.lockForUpdate(). This type of lock serializes all write access to the URI in question. As the API docs state, this takes an early lock on documents that will be updated later. If used in a plan with a prolonged execution time, other threads waiting to update those same fragments will have to wait.
Temporal
The library supports temporal actions through one of the properties of the docDescriptor object. Under the hood, it simply chooses between the xdmp.xyz() function and the temporal.xyz() function. No magic: it just does the job. Of course, if you specify a temporal collection that does not exist, an error is thrown from the temporal layer.
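A minimal sketch of that choice, in plain JavaScript that runs outside MarkLogic, might look like the following. The server object is a hypothetical stand-in that only records which function would have been called, insertDescriptor is a hypothetical name, and the temporal collection name llama-history is made up for the example.

```javascript
"use strict";

// Hypothetical stand-in for the server: each function only records the call
// it would have made instead of inserting anything.
function makeFakeServer() {
  const calls = [];
  return {
    calls,
    documentInsert: (uri, doc) =>
      calls.push({ fn: "xdmp.documentInsert", uri }),
    temporalDocumentInsert: (temporalCollection, uri, doc) =>
      calls.push({ fn: "temporal.documentInsert", temporalCollection, uri }),
  };
}

// Hypothetical dispatcher: a descriptor with a temporalCollection property
// goes to the temporal function, everything else to the regular one.
function insertDescriptor(server, descriptor) {
  if (descriptor.temporalCollection) {
    server.temporalDocumentInsert(
      descriptor.temporalCollection,
      descriptor.uri,
      descriptor.doc
    );
  } else {
    server.documentInsert(descriptor.uri, descriptor.doc);
  }
}

const server = makeFakeServer();
insertDescriptor(server, { uri: "/llama.json", doc: { name: "Jacinta" } });
insertDescriptor(server, {
  uri: "/llama-t.json",
  doc: { name: "Peter" },
  temporalCollection: "llama-history", // made-up collection name
});

console.log(server.calls.map((call) => call.fn));
// [ 'xdmp.documentInsert', 'temporal.documentInsert' ]
```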
Transforms
For this article, we won't address transforms. That can be a post on its own. Transforms are pretty straightforward when you dig deep enough. Eventually you see familiar executing code like xdmp.invoke() in the case of SJS (also known as MJS), or xdmp.xsltInvoke() in the case of XSLT.
NOTE
When using a JavaScript transform, the code is invoked with AMPs disabled ('ignoreAmps':true). Because this is not the default behavior of xdmp.invoke(), you may get surprising results if your invoking code is already AMPed.
Validation
As with transforms, we'll keep the details of validation out of this first post. That way we can concentrate on samples that do not require content (schemas) to be inserted first.
Read of document immediately
This probably makes sense to most people based on how transactions work in MarkLogic. But to be clear: A document that is being inserted is not available to be joined on immediately.
declareUpdate();
const op = require("/MarkLogic/optic");
//see a future post about pitfalls of prototyping as admin
const defaultPermissions = xdmp.defaultPermissions();
op.fromDocDescriptors([
  {
    uri: "/llama7.json",
    doc: { name: "Peter" },
    collections: ["llamaverse"],
    permissions: defaultPermissions,
  },
])
  .write()
  .joinDoc(op.col("doc-inserted"), op.col("uri"))
  .result();
This will not give a result. However, running it a second time will result in the doc-inserted column having content. This is simply because the second run happens after the document is actually in the system.
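The behavior can be pictured with a toy model in plain JavaScript. This is a deliberate simplification and not how MarkLogic implements transactions: the ToyDatabase class and the writeThenRead helper are hypothetical, and the only idea carried over is that a write becomes readable once the request that made it has committed.

```javascript
"use strict";

// Toy in-memory store: writes are staged and only become readable at commit.
class ToyDatabase {
  constructor() {
    this.committed = new Map();
    this.pending = new Map();
  }

  // Stage a write; it is not visible to read() yet.
  write(uri, doc) {
    this.pending.set(uri, doc);
  }

  // Reads only see committed documents.
  read(uri) {
    return this.committed.has(uri) ? this.committed.get(uri) : null;
  }

  // Make every staged write visible.
  commit() {
    for (const [uri, doc] of this.pending) {
      this.committed.set(uri, doc);
    }
    this.pending.clear();
  }
}

// Hypothetical "request": write a document, try to read it back in the same
// request, then commit when the request ends.
function writeThenRead(db, uri, doc) {
  db.write(uri, doc);
  const seen = db.read(uri);
  db.commit();
  return seen;
}

const db = new ToyDatabase();
const firstRun = writeThenRead(db, "/llama7.json", { name: "Peter" });
const secondRun = writeThenRead(db, "/llama7.json", { name: "Peter" });

console.log(firstRun); // null
console.log(secondRun); // { name: 'Peter' }
```

The second run only sees the document because the first run has already committed it, which mirrors what happens with the doc-inserted column above.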
Samples
Simple Example
The first sample is actually above: a simple insert. That sample uses .result() because it tries to read the document as inserted. For Optic Update, you can also just call .execute() on the plan, which returns no results.
declareUpdate();
const op = require("/MarkLogic/optic");
//see a future post about pitfalls of prototyping as admin
const defaultPermissions = xdmp.defaultPermissions();
op.fromDocDescriptors([
  {
    uri: "/llama8.json",
    doc: { name: "Niamh" },
    collections: ["llamaverse"],
    permissions: defaultPermissions,
  },
])
  .write()
  .joinDoc(op.col("doc-inserted"), op.col("uri"))
  .execute();
Complex Example
TASK
Generate a report per clan with all of the llamas listed for that clan. Store this information as JSON documents with the clan name as part of the URI.
Data
Clans
Name | Id | Main Seat location |
---|---|---|
Kirwan | k | Corrandulla |
Lynch | l | Galway |
Skerritt | s | Ballinduff |
Llamas
Name | Clan ID |
---|---|
Jane | k |
Robert | l |
Skeff | s |
Sean | s |
Orla | s |
Maeve | l |
Code
- Data is in-line so that we can keep the sample self-contained
- Join the llamas to the clans
- Use the handy arrayAggregate to pack the llamas into an array.
- Dynamically construct a JSON document. Honestly, I thought I would have to put more muscle work into this by constructing the object. However, the op.col() references in the JSON payload expand to their values properly.
- Generate a URI on-the-fly
- Generate a collection on-the-fly
- Add default permissions. (not needed as admin - but I usually don't prototype as admin - feel free to ask me why..)
declareUpdate();
const op = require("/MarkLogic/optic");
//see a future post about pitfalls of prototyping as admin
const defaultPermissions = xdmp.defaultPermissions();
const clans = op.fromLiterals(
  // likely would have come from view
  [
    { clanName: "Kirwan", clanId: "k", location: "Corrandulla" },
    { clanName: "Lynch", clanId: "l", location: "Galway" },
    { clanName: "Skerritt", clanId: "s", location: "Ballinduff" },
  ],
  "clans"
);
const llamas = op.fromLiterals(
  // likely would have come from view
  [
    { name: "Jane", clanId: "k" },
    { name: "Robert", clanId: "l" },
    { name: "Skeff", clanId: "s" },
    { name: "Sean", clanId: "s" },
    { name: "Orla", clanId: "s" },
    { name: "Maeve", clanId: "l" },
  ],
  "llamas"
);
clans
  .joinLeftOuter(llamas, op.on(clans.col("clanId"), llamas.col("clanId")))
  .groupBy(
    [op.viewCol("clans", "clanName"), clans.col("clanId")],
    op.arrayAggregate("members", llamas.col("name"))
  )
  .bind([
    op.as("doc", {
      clanName: op.col("clanName"),
      members: op.col("members"),
    }), // This works.. I expected to have to build the object
    op.as(
      "uri",
      op.fn.concat(
        "/clever-llamas/test/optic/update/",
        op.fn.lowerCase(op.xs.string(clans.col("clanName"))),
        ".json"
      )
    ),
    op.as(
      "collections",
      op.fn.concat(
        "clan/",
        op.fn.lowerCase(op.xs.string(clans.col("clanName")))
      )
    ),
    op.as("permissions", defaultPermissions),
  ])
  .write()
  .execute();
Result
As expected, we end up with three new documents in the system, dynamically generated from the join of two datasets.
A sample of one document can be seen below:
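As a rough stand-in that runs outside MarkLogic, the plain JavaScript sketch below mimics the join, the grouping and the URI and collection construction from the query above. It does not use Optic at all: buildClanReports is a hypothetical helper, and the order of the members array it produces is simply the order of the input rows, which is an assumption of this sketch and not something the Optic query is known to guarantee.

```javascript
"use strict";

const clans = [
  { clanName: "Kirwan", clanId: "k", location: "Corrandulla" },
  { clanName: "Lynch", clanId: "l", location: "Galway" },
  { clanName: "Skerritt", clanId: "s", location: "Ballinduff" },
];

const llamas = [
  { name: "Jane", clanId: "k" },
  { name: "Robert", clanId: "l" },
  { name: "Skeff", clanId: "s" },
  { name: "Sean", clanId: "s" },
  { name: "Orla", clanId: "s" },
  { name: "Maeve", clanId: "l" },
];

// Hypothetical helper: one report per clan, with the matching llama names
// packed into an array and the URI and collection derived from the clan name.
function buildClanReports(clanRows, llamaRows) {
  return clanRows.map((clan) => {
    const slug = clan.clanName.toLowerCase();
    return {
      uri: "/clever-llamas/test/optic/update/" + slug + ".json",
      collections: "clan/" + slug,
      doc: {
        clanName: clan.clanName,
        members: llamaRows
          .filter((llama) => llama.clanId === clan.clanId)
          .map((llama) => llama.name),
      },
    };
  });
}

const reports = buildClanReports(clans, llamas);

console.log(reports[2].uri);
// /clever-llamas/test/optic/update/skerritt.json
console.log(JSON.stringify(reports[2].doc));
// {"clanName":"Skerritt","members":["Skeff","Sean","Orla"]}
```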
Conclusion
Although a preview-stage feature, it could have immediate value for those who work within the Optic space. Knowing that everything from the lock to inserts to validation to transformation is built directly on core items already in the system gives an extra boost of confidence in the efficiency and stability of the feature. It is valuable already, and we are curious where future releases will take it.
Need Some Help?
Looking for more information on this subject or any other topic related to MarkLogic? Contact Us (info@cleverllamas.com) to find out how we can assist you with consulting or training!