Saturday, December 29, 2012

SDTM Supplemental Qualifiers

From the document "Proposal for Alternate Handling of Supplemental Qualifiers" of the CDISC SDTM Team:
"The vision at the time the Supplemental Qualifiers structure was created was that standard review tools used at the FDA would automatically display the Non-Standard-Variables (NSVs) together with the parent data in tabular views. The reality, however, is the representation of NSVs in separate SUPP-- datasets has resulted in increased effort by reviewers to prepare for and perform reviews of SDTM datasets.
The end result is that a data structure that was created to provide a standard method for representing NSVs has not been the best structure for the viewing and analysis of SDTM data by FDA reviewers. This document describes a proposed method by which NSVs and standard variables could be represented together in the parent datasets".

Or in other words: FDA reviewers are uncapable of merging/combining datasets (tables) so that the non-standard variables appear in their original table!

Oh my God: even my undergraduate students would be able to do that!

But why haven't the non-standard-variables not always be kept in their original table? Because they would then not be distinguishable from the standard variables? Yes, it is true that SAS Transport 5 does not have the capability of keeping metadata so that standard-variables and non-standard-variables can be distinguished, except for by naming conventions.

Wait a minute!
Metadata belong in the define.xml isn't it?
So why not just state in the define.xml that a variable is a non-standard variable? For example, suppose we have a non-standard variable "QSLANG" ("Question Language") in the QS (Questionnaires) domain. In the current SDTM implementation it is "banned" to the SUPPQS dataset.

We just keep that variable in the QS dataset and write in the define.xml:

<ItemGroupDef OID="QS" Name="QS" ... def:Label="Questionnaires" ...>
    <ItemRef ItemOID="STUDYID" Mandatory="Yes" OrderNumber="1"/>
    <ItemRef ItemOID="DOMAIN" Mandatory="Yes" OrderNumber="2"/>
    <ItemRef ItemOID="USUBJID" OrderNumber="3"/>
    ....
    <ItemRef ItemOID="QS.QSLANG" OrderNumber=".." def:IsNonStandardVariable="Yes"/>
</ItemGroupDef>

and of course define the corresponding ItemDef with OID "QS.QSLANG":
<ItemDef OID="QS.QSLANG" Name="QSLANG" def:Label="Question Language" DataType="text" Length="2"/>

DONE!

Does this help the reviewer in distinguishing between standard variables and non-standard variables?
Of course not very much. The FDA reviewers usually print out the define.xml (using the stylesheet view) and then inspect the XPT datasets in parallel. Not very modern isn't it?
The simple reason is that the viewer cannot read define.xml files.

Suppose however that the datasets do not come as XPT files, but as CDISC ODM (XML) files.
In CDISC ODM there is always a strong relation between the variable definition (ItemDef) and the data value (ItemData) over the OID. So for example, we would have (extending the above example):

<ItemDef OID="QS.QSLANG" Name="QSLANG" def:Label="Question Language" DataType="text" Length="2"/>

and the value:
<ItemGroupData ItemGroupOID="QS">
    <ItemData ItemOID="STUDYID" Value="STUDYX"/>
    <ItemData ItemOID="DOMAIN" Value="QS"/>
    <ItemData ItemOID="USUBJID" Value="P0001"/>
    ...
    <ItemData ItemOID="QS.QSLANG" Value="en"/>
</ItemGroupData>

This allows us to create software that tabulates the data and e.g. colors it depending whether it is a standard variable or a non-standard variable. This is very easy. It even can be done using an XSLT stylesheet when one wants to display the data in a browser.

It even creates new opportunities, such as:
- the software/stylesheet can easily validate the data, and e.g. color cells in the table (with an additional tooltip) for which there is an SDTM rule violation. For example an empty "QSREASND" can be colored red when the value of "QSSTAT" is "NOT DONE", this as the rule is that QSREASND may not be empty when QSSTAT is populated.
- In the LB (Laboratory) domain, values of "LBSTRESN" can be colored when the value is outside the reference range, i.e. lower than the value from LBSTNRLO or higher than the value of LBSTNRHI.
- rows that contain baseline data (--BLFL=Y) are colored differently
- data is sorted based on more than one criteria

All this is currently NOT possible using the SAS XPT Viewer, because it cannot read the define.xml containing the metadata.

There are much more opportunities: the number of them is only limited by your creativity.

Once we know how the future(?) ODM-based XML format for SDTM (and SEND and ADaM) data will look like, we will pilot such software and demonstrate it.
So ... keep tuned!

No comments:

Post a Comment