Sunday, May 8, 2016

def:WhereClause in XQuery

Today, I worked on the PMDA rule SDTM-CT2005: "Variable should be populated with terms from its CDISC controlled terminology codelist, when its value level condition is met".

This is a rule about value-level data and their associated codelists. For example, when VSTESTCD=FRMSIZE, VSORRES must be one of "SMALL", "MEDIUM" and "LARGE".
Or: VSORRESU must be "cm" when VSTESTCD=HEIGHT and DM-COUNTRY is CAN, MEX, GER or FRA, and must be "inch" when VSTESTCD=HEIGHT and DM-COUNTRY=USA.

The latter is of course special, as it crosses the boundaries between domains, and thus between files. However, when you have all your submissions in a native XML database (as I recommended to the FDA, but with no reaction at all so far...), this rule shouldn't be too hard to implement.
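Below is a minimal sketch of the HEIGHT unit check. It assumes the DM and VS records are available in the same query. Here they are inline sample data with made-up ItemOIDs such as "IT.VS.VSORRESU"; in a database they would be read from the DM and VS Dataset-XML documents. The VS records are joined to DM on USUBJID to obtain the country. The expected unit follows the rule text above.

```xquery
xquery version "3.1";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";

(: Hypothetical sample records standing in for the DM and VS
   Dataset-XML documents of one submission; the ItemOIDs are made up :)
let $dm := (
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.DM.USUBJID" Value="S001"/>
    <odm:ItemData ItemOID="IT.DM.COUNTRY" Value="USA"/>
  </odm:ItemGroupData>,
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.DM.USUBJID" Value="S002"/>
    <odm:ItemData ItemOID="IT.DM.COUNTRY" Value="GER"/>
  </odm:ItemGroupData>
)
let $vs := (
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.VS.USUBJID" Value="S001"/>
    <odm:ItemData ItemOID="IT.VS.VSTESTCD" Value="HEIGHT"/>
    <odm:ItemData ItemOID="IT.VS.VSORRESU" Value="cm"/>
  </odm:ItemGroupData>,
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.VS.USUBJID" Value="S002"/>
    <odm:ItemData ItemOID="IT.VS.VSTESTCD" Value="HEIGHT"/>
    <odm:ItemData ItemOID="IT.VS.VSORRESU" Value="cm"/>
  </odm:ItemGroupData>
)
for $rec in $vs[odm:ItemData[@ItemOID = "IT.VS.VSTESTCD" and @Value = "HEIGHT"]]
let $usubjid := $rec/odm:ItemData[@ItemOID = "IT.VS.USUBJID"]/@Value/string()
(: join VS to DM on USUBJID to obtain the subject's country :)
let $country := $dm[odm:ItemData[@ItemOID = "IT.DM.USUBJID" and @Value = $usubjid]]
                  /odm:ItemData[@ItemOID = "IT.DM.COUNTRY"]/@Value/string()
(: expected unit per country as in the rule text; other countries are not checked :)
let $expected := if ($country = "USA") then "inch"
                 else if ($country = ("CAN", "MEX", "GER", "FRA")) then "cm"
                 else ()
let $unit := $rec/odm:ItemData[@ItemOID = "IT.VS.VSORRESU"]/@Value/string()
where exists($expected) and $unit != $expected
return <error>USUBJID {$usubjid}: VSORRESU is "{$unit}", expected "{$expected}" for COUNTRY={$country}</error>
```

With this sample data, only subject S001 is reported: the country is USA but the unit is "cm".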

We are currently implementing all validation rules of CDISC, the FDA and the PMDA in XQuery, the open, vendor-neutral W3C standard for querying XML documents and databases, and thus also Dataset-XML submissions.
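To show what such a rule looks like in XQuery, here is a minimal sketch of the FRMSIZE example. It is hard-coded: the allowed values are written into the query instead of being read from the define.xml. The records and ItemOIDs are hypothetical sample data standing in for a VS Dataset-XML file.

```xquery
xquery version "3.1";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";

(: Hypothetical VS records; in a real run they come from the VS Dataset-XML file :)
let $records := (
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.VS.VSTESTCD" Value="FRMSIZE"/>
    <odm:ItemData ItemOID="IT.VS.VSORRES" Value="MEDIUM"/>
  </odm:ItemGroupData>,
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.VS.VSTESTCD" Value="FRMSIZE"/>
    <odm:ItemData ItemOID="IT.VS.VSORRES" Value="HUGE"/>
  </odm:ItemGroupData>
)
(: allowed values hard-coded here instead of taken from a define.xml codelist :)
let $allowed := ("SMALL", "MEDIUM", "LARGE")
for $rec in $records
(: the value-level condition: only FRMSIZE records are checked :)
where $rec/odm:ItemData[@ItemOID = "IT.VS.VSTESTCD" and @Value = "FRMSIZE"]
let $orres := $rec/odm:ItemData[@ItemOID = "IT.VS.VSORRES"]/@Value/string()
where not($orres = $allowed)
return <error>VSORRES value "{$orres}" is not allowed for VSTESTCD=FRMSIZE</error>
```

Only the second record ("HUGE") is reported. The generic version of the rule avoids this hard-coding by taking both the condition and the codelist from the define.xml.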

The challenge in this rule is that one needs to translate the contents of def:WhereClauseDef elements in the define.xml, like:


with this "WhereClauseDef" referenced from a def:ValueList:


applicable to the variable SCORRES as defined by:


So, how do we translate the "def:WhereClauseDef" into an XQuery statement? Of course, the XQuery script can read the "def:WhereClauseDef" and the "RangeCheck" element in it, but what we actually need is a "where" clause in a FLWOR expression, like:

where $itemgroupdata/odm:ItemData[@ItemOID="IT.SC.SCTESTCD" and @Value="MARISTAT"]

So I wrote an XQuery function that accepts a def:ValueListDef node and returns a string that is essentially an XPath expression. Here it is:


The function is not perfect yet. It works well for the simple case where there is only one "RangeCheck" within the "def:ValueListDef", the comparator is "EQ" or "NE", and the check value is a string (the most common case). It doesn't work yet for more complicated cases, but I am working on it...
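The translation could be extended beyond the simple case roughly as follows. This is a sketch of my own, not the published function:

- Each RangeCheck becomes an exists(...) condition.
- Multiple RangeChecks are joined with "and".
- "IN" and "NOTIN" become general comparisons against the sequence of check values.

The numeric comparators (LT, LE, GT, GE) are not handled, and the function names are hypothetical. The generated string assumes that the odm prefix is declared and the record is bound to $itemgroupdata wherever the string is evaluated.

```xquery
xquery version "3.1";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";
declare namespace def = "http://www.cdisc.org/ns/def/v2.0";

(: Turn a CheckValue into an XQuery string literal; embedded double quotes are doubled :)
declare function local:quote($s as xs:string) as xs:string {
  '"' || replace($s, '"', '""') || '"'
};

(: One RangeCheck becomes one boolean condition on the record bound to $itemgroupdata :)
declare function local:rangecheck-to-xpath($rc as element(odm:RangeCheck)) as xs:string {
  let $item := '$itemgroupdata/odm:ItemData[@ItemOID=' || local:quote(string($rc/@def:ItemOID))
  let $first := local:quote(string($rc/odm:CheckValue[1]))
  let $all := string-join(for $v in $rc/odm:CheckValue return local:quote(string($v)), ',')
  let $test := switch (string($rc/@Comparator))
    case "EQ"    return '@Value=' || $first
    case "NE"    return '@Value!=' || $first
    case "IN"    return '@Value=(' || $all || ')'
    case "NOTIN" return 'not(@Value=(' || $all || '))'
    default return error((), "Unsupported comparator: " || $rc/@Comparator)
  return 'exists(' || $item || ' and ' || $test || '])'
};

(: All RangeChecks of a WhereClauseDef must hold, so the conditions are joined with "and" :)
declare function local:whereclause-to-xpath($wc as element(def:WhereClauseDef)) as xs:string {
  string-join(for $rc in $wc/odm:RangeCheck return local:rangecheck-to-xpath($rc), ' and ')
};

(: Hypothetical WhereClauseDef with two RangeChecks :)
let $wc :=
  <def:WhereClauseDef OID="WC.VS.HEIGHT.POS">
    <odm:RangeCheck SoftHard="Soft" def:ItemOID="IT.VS.VSTESTCD" Comparator="EQ">
      <odm:CheckValue>HEIGHT</odm:CheckValue>
    </odm:RangeCheck>
    <odm:RangeCheck SoftHard="Soft" def:ItemOID="IT.VS.VSPOS" Comparator="IN">
      <odm:CheckValue>STANDING</odm:CheckValue>
      <odm:CheckValue>SITTING</odm:CheckValue>
    </odm:RangeCheck>
  </def:WhereClauseDef>
return local:whereclause-to-xpath($wc)
```

For this sample, the result is the string exists($itemgroupdata/odm:ItemData[@ItemOID="IT.VS.VSTESTCD" and @Value="HEIGHT"]) and exists($itemgroupdata/odm:ItemData[@ItemOID="IT.VS.VSPOS" and @Value=("STANDING","SITTING")]). Because every condition is wrapped in exists(), evaluating the string yields a boolean rather than an ItemData element.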

The function returns a string, which is essentially XQuery source code, but that string still has to be executed. Fortunately, there is the "util:eval" function in eXist-db (xdmp:eval in MarkLogic, xquery:eval in BaseX), which takes a string containing XQuery code as an argument and evaluates it. In our validation script, this looks like:


This code snippet iterates over all "ItemRef" child elements of a def:ValueListDef element and picks up the corresponding "def:WhereClauseDef" element. That element is translated into an XQuery snippet, which is evaluated on the current "SCORRES" record. If the XQuery returns a result (in this case an "ItemData" element), the condition applies to the current record.
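Since util:eval, xdmp:eval and xquery:eval are all vendor-specific, a portable alternative is to interpret the RangeCheck elements directly instead of generating code. The sketch below does that. It also walks the full chain from the SCORRES ItemDef, via def:ValueListRef and the ItemRefs of the def:ValueListDef, to each def:WhereClauseDef. Some caveats:

- This is a swapped-in technique, not the approach of the validation script.
- The OIDs are hypothetical and values are compared as strings only.
- Several WhereClauseRefs on one ItemRef are treated as alternatives, which is my reading of define.xml 2.0.

```xquery
xquery version "3.1";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";
declare namespace def = "http://www.cdisc.org/ns/def/v2.0";

(: True if one RangeCheck holds for the record; values are compared as strings,
   and a record without the referenced variable never matches :)
declare function local:rangecheck-holds($rc as element(odm:RangeCheck),
                                       $rec as element(odm:ItemGroupData)) as xs:boolean {
  let $actual := $rec/odm:ItemData[@ItemOID = $rc/@def:ItemOID]/@Value/string()
  let $checks := $rc/odm:CheckValue/string()
  return exists($actual) and (
    switch (string($rc/@Comparator))
      case "EQ"    return $actual = $checks[1]
      case "NE"    return $actual != $checks[1]
      case "IN"    return $actual = $checks
      case "NOTIN" return not($actual = $checks)
      default return error((), "Unsupported comparator: " || $rc/@Comparator)
  )
};

(: Hypothetical, heavily trimmed define.xml content :)
let $mdv :=
  <odm:MetaDataVersion>
    <odm:ItemDef OID="IT.SC.SCORRES" Name="SCORRES" DataType="text">
      <def:ValueListRef ValueListOID="VL.SC.SCORRES"/>
    </odm:ItemDef>
    <def:ValueListDef OID="VL.SC.SCORRES">
      <odm:ItemRef ItemOID="IT.SC.SCORRES.MARISTAT" Mandatory="No">
        <def:WhereClauseRef WhereClauseOID="WC.SC.SCTESTCD.MARISTAT"/>
      </odm:ItemRef>
    </def:ValueListDef>
    <def:WhereClauseDef OID="WC.SC.SCTESTCD.MARISTAT">
      <odm:RangeCheck SoftHard="Soft" def:ItemOID="IT.SC.SCTESTCD" Comparator="EQ">
        <odm:CheckValue>MARISTAT</odm:CheckValue>
      </odm:RangeCheck>
    </def:WhereClauseDef>
  </odm:MetaDataVersion>
(: Hypothetical SC record from a Dataset-XML file :)
let $rec :=
  <odm:ItemGroupData>
    <odm:ItemData ItemOID="IT.SC.SCTESTCD" Value="MARISTAT"/>
    <odm:ItemData ItemOID="IT.SC.SCORRES" Value="MARRIED"/>
  </odm:ItemGroupData>
let $vlOID := $mdv/odm:ItemDef[@OID = "IT.SC.SCORRES"]/def:ValueListRef/@ValueListOID
for $itemref in $mdv/def:ValueListDef[@OID = $vlOID]/odm:ItemRef
let $wc := $mdv/def:WhereClauseDef[@OID = $itemref/def:WhereClauseRef/@WhereClauseOID]
(: the ItemRef applies when any of its WhereClauseDefs has all RangeChecks holding :)
where some $w in $wc satisfies
        (every $rc in $w/odm:RangeCheck satisfies local:rangecheck-holds($rc, $rec))
return string($itemref/@ItemOID)
```

For the sample record, the query returns "IT.SC.SCORRES.MARISTAT", the value-level ItemDef whose condition is met. No string is generated or evaluated, so this runs on any XQuery 3.1 processor.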

In the next step, it is checked whether there is a value-level codelist for SCORRES. For example, if SCTESTCD=MARISTAT, the value-level ItemDef referenced by the matching ItemRef applies. It has an associated CodeList "CL.MARISTAT" with the allowed values "DIVORCED", "DOMESTIC PARTNER", "LEGALLY SEPARATED", "MARRIED", "NEVER MARRIED" and "WIDOWED".
If the actual value of the current data point is not in the value-level codelist (and such a codelist exists), an error is produced:


The complete code for this rule PMDA-CT2005 can be found at:

http://www.xml4pharma.com/download/CT2005_WhereClause.xq
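For readers who want to see the codelist step in isolation, here is a minimal sketch of it (not the downloadable code). It assumes the previous step has already identified the applicable value-level ItemDef; the OIDs and the value "SINGLE" are hypothetical. Since define.xml codelists may list their terms either as CodeListItem or as EnumeratedItem elements, both are accepted.

```xquery
xquery version "3.1";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";

(: Hypothetical, trimmed define.xml content: the value-level ItemDef for
   SCORRES when SCTESTCD=MARISTAT, and its codelist :)
let $mdv :=
  <odm:MetaDataVersion>
    <odm:ItemDef OID="IT.SC.SCORRES.MARISTAT" Name="SCORRES" DataType="text">
      <odm:CodeListRef CodeListOID="CL.MARISTAT"/>
    </odm:ItemDef>
    <odm:CodeList OID="CL.MARISTAT" Name="Marital Status" DataType="text">{
      for $term in ("DIVORCED", "DOMESTIC PARTNER", "LEGALLY SEPARATED",
                    "MARRIED", "NEVER MARRIED", "WIDOWED")
      return <odm:EnumeratedItem CodedValue="{$term}"/>
    }</odm:CodeList>
  </odm:MetaDataVersion>
(: Assumed inputs: the value-level ItemOID found by the WhereClause step,
   and the actual SCORRES value of the current record :)
let $vlItemOID := "IT.SC.SCORRES.MARISTAT"
let $value := "SINGLE"
let $clOID := $mdv/odm:ItemDef[@OID = $vlItemOID]/odm:CodeListRef/@CodeListOID
let $codelist := $mdv/odm:CodeList[@OID = $clOID]
(: define.xml codelists list their terms as CodeListItem or EnumeratedItem :)
let $terms := $codelist/(odm:CodeListItem | odm:EnumeratedItem)/@CodedValue/string()
(: only report when a value-level codelist exists and the value is not in it :)
where exists($codelist) and not($value = $terms)
return <error rule="SDTM-CT2005">SCORRES value "{$value}" is not in codelist {string($codelist/@OID)}</error>
```

This returns one error element, because "SINGLE" is not one of the six MARISTAT terms.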

I will of course refine this XQuery function further, especially for multiple RangeCheck child elements and for the "IN" and "NOTIN" comparators. When it is finished, I will publish that code as well.

If you would like to contribute to the development of validation rules in the vendor-neutral XQuery language, please let me know.