Thursday, December 4, 2014

Machine executable FDA rules for SDTM

In my previous posts "FDA publishes Study Validation Rules" and "Follow up to 'FDA publishes Study Validation Rules'" I showed how these rules can be expressed in XQuery, an open W3C standard query language for XML documents and XML databases.

I have made good progress in the last few days, and have already implemented and tested about one sixth of the rules. My "rule writing pace" is even increasing as I gain experience with the XQuery language, which was fairly new to me.

So I wonder why the FDA (with considerably more resources than I have) did not publish these rules as machine-executable rules.

One of the great things about XQuery is that one can easily do cross-document querying.
Another example is given below (I hope it displays well). It is the XQuery for rule FDAC049, which requires that there are no EX records for subjects that were not assigned to an arm (ARMCD='NOTASSGN' in DM). It took me about 15 minutes to develop and test this rule. Here it is:

(: Rule FDAC049: EX record is present, when subject is not assigned to an arm: Subjects that have withdrawn from a trial before assignment to an Arm (ARMCD='NOTASSGN') should not have any Exposure records :)
xquery version "3.0";
declare namespace def = "http://www.cdisc.org/ns/def/v2.0";
declare namespace odm="http://www.cdisc.org/ns/odm/v1.3";
declare namespace data="http://www.cdisc.org/ns/Dataset-XML/v1.0";
declare namespace xlink="http://www.w3.org/1999/xlink";
let $base := '/db/fda_submissions/cdisc01/'
let $define := 'define2-0-0-example-sdtm.xml'
(: Get the DM dataset :)
let $dmdatasetname := doc(concat($base,$define))//odm:ItemGroupDef[@Name='DM']/def:leaf/@xlink:href
let $dmdataset := concat($base,$dmdatasetname)
(: Get the EX dataset :)
let $exdatasetname := doc(concat($base,$define))//odm:ItemGroupDef[@Name='EX']/def:leaf/@xlink:href
let $exdataset := concat($base,$exdatasetname)
(: get the OID for the ARMCD variable in the DM dataset :)
let $armcdoid := doc(concat($base,$define))//odm:ItemDef[@Name='ARMCD']/@OID  (: supposing there is only one :)
(: and the OID of USUBJID - which is the third variable :)
let $usubjidoid := doc(concat($base,$define))//odm:ItemGroupDef[@Name='DM']/odm:ItemRef[3]/@ItemOID
(: we also need the OID of the USUBJID in the EX dataset :)
let $exusubjidoid := doc(concat($base,$define))//odm:ItemGroupDef[@Name='EX']/odm:ItemRef[3]/@ItemOID
(: in the DM dataset, select the subjects which have ARMCD='NOTASSGN' :)
for $rec in doc($dmdataset)//odm:ItemGroupData[odm:ItemData[@ItemOID=$armcdoid and @Value='NOTASSGN']]
    let $usubjidvalue := $rec/odm:ItemData[@ItemOID=$usubjidoid]/@Value
    (: and the record number for which ARMCD='NOTASSGN' :)
    let $recnum := $rec/@data:ItemGroupDataSeq
    (: and now count the records in the EX dataset that belong to this subject :)
    let $count := count(doc($exdataset)//odm:ItemGroupData[odm:ItemData[@ItemOID=$exusubjidoid and @Value=$usubjidvalue]])
    where $count > 0  (: at least one EX record was found :)
    return <warning rule="FDAC049" recordnumber="{data($recnum)}">{data($count)} EX records were found in EX dataset {data($exdatasetname)} for USUBJID={data($usubjidvalue)} although subject has not been assigned to an arm (ARMCD='NOTASSGN') in DM dataset {data($dmdatasetname)}</warning>

Comments in XQuery are enclosed in "(:" and ":)", as in (: this is a comment line :)
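For readers who want to try the cross-document check at the heart of this rule without an XML database, here is a minimal sketch that replaces the two dataset files with small inline fragments. The OIDs (IT.USUBJID, IT.ARMCD) and the subject identifiers are invented for illustration. They do not come from the define.xml used above, and the fragments are much simpler than real Dataset-XML files.

```xquery
xquery version "3.0";
declare namespace odm = "http://www.cdisc.org/ns/odm/v1.3";

(: Hypothetical stand-in for the DM dataset: two subjects, one of them not assigned to an arm :)
let $dm :=
  <odm:ClinicalData>
    <odm:ItemGroupData>
      <odm:ItemData ItemOID="IT.USUBJID" Value="SUBJ-001"/>
      <odm:ItemData ItemOID="IT.ARMCD" Value="NOTASSGN"/>
    </odm:ItemGroupData>
    <odm:ItemGroupData>
      <odm:ItemData ItemOID="IT.USUBJID" Value="SUBJ-002"/>
      <odm:ItemData ItemOID="IT.ARMCD" Value="PLACEBO"/>
    </odm:ItemGroupData>
  </odm:ClinicalData>
(: Hypothetical stand-in for the EX dataset: two records for SUBJ-001, one for SUBJ-002 :)
let $ex :=
  <odm:ClinicalData>
    <odm:ItemGroupData><odm:ItemData ItemOID="IT.USUBJID" Value="SUBJ-001"/></odm:ItemGroupData>
    <odm:ItemGroupData><odm:ItemData ItemOID="IT.USUBJID" Value="SUBJ-001"/></odm:ItemGroupData>
    <odm:ItemGroupData><odm:ItemData ItemOID="IT.USUBJID" Value="SUBJ-002"/></odm:ItemGroupData>
  </odm:ClinicalData>
(: select the DM records with ARMCD='NOTASSGN' :)
for $rec in $dm//odm:ItemGroupData[odm:ItemData[@ItemOID='IT.ARMCD' and @Value='NOTASSGN']]
    let $usubjid := $rec/odm:ItemData[@ItemOID='IT.USUBJID']/@Value
    (: count only the EX records that carry the same USUBJID value :)
    let $count := count($ex//odm:ItemGroupData[odm:ItemData[@ItemOID='IT.USUBJID' and @Value=$usubjid]])
    where $count > 0
    return <warning rule="FDAC049">{$count} EX record(s) found for USUBJID={data($usubjid)}</warning>
```

With these made-up fragments, the query should return a single warning for SUBJ-001 reporting 2 EX records, and nothing for SUBJ-002, who was assigned to an arm.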

And here is a snapshot of the test result:


I guess that I would not have been able to develop and test this rule in Java in less than 15 minutes...

The advantage of using an open standard like XQuery is that everyone uses the same rule, and that there is no room for different interpretations, unlike in a Java program, which is essentially a "black box" implementation. As such, these rules in XQuery can function as a "reference implementation", meaning that any software application (such as a Java program) needs to give the same results as the reference implementation does.
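As a rough sketch of what "giving the same results" could mean in practice, the output of another tool could be compared with the output of the reference XQuery using the standard deep-equal() function. The two warning sequences below are invented placeholders, and I am assuming that the other tool can export its findings in the same XML form.

```xquery
xquery version "3.0";

(: Hypothetical result of running the reference XQuery rule :)
let $reference := (
    <warning rule="FDAC049" usubjid="SUBJ-001" count="2"/>
)
(: Hypothetical result exported by another validation tool :)
let $candidate := (
    <warning rule="FDAC049" usubjid="SUBJ-001" count="2"/>
)
(: deep-equal compares the two sequences item by item, ignoring attribute order :)
return
    if (deep-equal($reference, $candidate))
    then "candidate matches the reference implementation"
    else "candidate deviates from the reference implementation"
```

This comparison is sensitive to the order of the warnings, so both results would need to be sorted in the same way before comparing them.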