Sunday, September 23, 2012

Define.xml draft 2.0 is out

CDISC has now published the long-awaited version 2.0 of the define.xml standard for public review. It can be downloaded from the CDISC website at http://www.cdisc.org/define-xml. If you are interested in define.xml, please download the distribution, review it and send your comments no later than October 1st. A separate file in the distribution explains how to submit your comments.

Draft version 2.0 is based on the latest version of the ODM standard (v.1.3.1) and removes many of the limitations of define.xml v.1.0.
I am currently writing up my comments (good and bad) and already have more than 10 pages. Overall, however, my impression is good: I think we are again taking a major step forward.

Some things I especially liked in draft v.2.0:

- the "WhereClause" makes it possible to define which units of measurement and which categories (--CAT variables) apply in which cases (based on ValueLists, e.g. for --ORRES).
- the "MethodOID" / "MethodDef" pair makes it much easier to add mapping information.
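
To make the "WhereClause" idea concrete, here is a minimal Python sketch of value-level metadata: which metadata applies to VSORRES depends on the value of VSTESTCD in the same record. The class names, the value list and the units are hypothetical and heavily simplified (a where clause here is just "variable has one of these values"). This is not the define.xml 2.0 schema, only an illustration of the kind of lookup it enables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WhereClause:
    """Hypothetical, simplified where clause: `variable` has one of `values`."""
    variable: str
    values: frozenset

    def matches(self, record: dict) -> bool:
        return record.get(self.variable) in self.values


@dataclass(frozen=True)
class ValueLevelDef:
    """Metadata that applies to a variable only where its where clause holds."""
    where: WhereClause
    data_type: str
    allowed_units: frozenset


# Hypothetical value list for VSORRES, selected by the test code.
VSORRES_VALUE_LIST = [
    ValueLevelDef(WhereClause("VSTESTCD", frozenset({"HEIGHT"})), "float", frozenset({"cm", "in"})),
    ValueLevelDef(WhereClause("VSTESTCD", frozenset({"WEIGHT"})), "float", frozenset({"kg", "LB"})),
    ValueLevelDef(WhereClause("VSTESTCD", frozenset({"SYSBP", "DIABP"})), "integer", frozenset({"mmHg"})),
]


def metadata_for(record: dict, value_list: list) -> Optional[ValueLevelDef]:
    """Return the first value-level definition whose where clause matches the record."""
    for vld in value_list:
        if vld.where.matches(record):
            return vld
    return None


record = {"VSTESTCD": "WEIGHT", "VSORRES": "72", "VSORRESU": "kg"}
vld = metadata_for(record, VSORRES_VALUE_LIST)
print(vld.data_type, sorted(vld.allowed_units))
```

The point of the design is that the metadata for one column (VSORRES) is only known once another column (VSTESTCD) of the same row has been inspected.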

Things I did not like:

- "MeasurementRef" / "MeasurementDef" pairs are not supported; they are regarded as extensions to the standard. This is a bit stupid, as it breaks "end-to-end" processing: instead, the "WhereClause" has to be used, which means an extra processing step.
The "WhereClause" is more general and is also applicable to other variables (such as --CAT). However, the problem is not a define.xml or ODM problem; it is an SDTM problem!
Stupidly enough, variable qualifiers such as --ORRESU have been defined in SDTM as extra variables, i.e. extra columns in the SDTM table, instead of as attributes (a third dimension, so to speak) of the parent variable (--ORRES). As a consequence, SDTM itself offers no way to check whether a --ORRES / --ORRESU pair really matches and makes sense. The same applies to the --TESTCD (test code) / --TEST (test name) pair (see also my previous post).
Had ODM been used for transporting SDTM data, everything would have been much simpler.
So the current "WhereClause" mechanism must be regarded as the best possible solution for supplying metadata to variables of a badly designed SDTM.

- "SASFormatName" is mandatory for CodeLists. Why? "SASFormatName" is not vendor-neutral! In my personal opinion, such an attribute belongs in a "vendor" extension. However, it has been part of ODM for a long time (SAS being a market leader in our industry). It appears to have been made mandatory in the define.xml 2.0 draft because FDA reviewers want to be able to create SAS tables for codelists automatically. But why can't FDA reviewers assign SAS format names to codelists themselves if they want to?
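
Returning to the --ORRES / --ORRESU point: because SDTM stores the unit and the test name in separate columns, a validator has to re-join each pair against external, value-level metadata, which is the extra processing step mentioned above. The following Python sketch shows such a check. The allowed-unit and test-name tables are hypothetical; in practice they would have to be derived from the define.xml value lists.

```python
# Hypothetical, simplified value-level metadata: which units are allowed for
# which test code, and the expected test name for each test code.
ALLOWED_UNITS = {
    "HEIGHT": {"cm", "in"},
    "WEIGHT": {"kg", "LB"},
    "SYSBP": {"mmHg"},
}
TEST_NAMES = {
    "HEIGHT": "Height",
    "WEIGHT": "Weight",
    "SYSBP": "Systolic Blood Pressure",
}


def check_rows(rows):
    """Return (row index, message) for every row whose column pairs do not fit together."""
    problems = []
    for i, row in enumerate(rows):
        testcd = row.get("VSTESTCD")
        if testcd not in TEST_NAMES:
            problems.append((i, f"unknown test code {testcd!r}"))
            continue
        # --TESTCD / --TEST pair: two separate columns that must agree.
        if row.get("VSTEST") != TEST_NAMES[testcd]:
            problems.append((i, f"VSTEST {row.get('VSTEST')!r} does not match VSTESTCD {testcd!r}"))
        # --ORRES / --ORRESU pair: the unit only makes sense for some tests.
        unit = row.get("VSORRESU")
        if unit not in ALLOWED_UNITS[testcd]:
            problems.append((i, f"unit {unit!r} not allowed for {testcd}"))
    return problems


rows = [
    {"VSTESTCD": "WEIGHT", "VSTEST": "Weight", "VSORRES": "72", "VSORRESU": "kg"},
    {"VSTESTCD": "HEIGHT", "VSTEST": "Weight", "VSORRES": "180", "VSORRESU": "mmHg"},
]
for idx, msg in check_rows(rows):
    print(idx, msg)
```

Had the unit been an attribute of the result value itself, as with ODM "MeasurementRef", this cross-column join would not be necessary.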

Things I suggest for improvement:

- explicitly allow "FormalExpression" as a child of "MethodDef". The draft define.xml allows adding a reference to an external file containing the calculation or imputation (source) code. However, it makes much more sense to allow the code to be placed within the define.xml itself. ODM 1.3.1 already has an excellent mechanism for that: FormalExpression.
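
As a sketch of what this could look like, the Python snippet below uses xml.etree.ElementTree to build a "MethodDef" that carries its derivation code in a "FormalExpression" child instead of pointing to an external file. The element and attribute names (OID, Name, Type, Description/TranslatedText, FormalExpression with Context) follow ODM 1.3.1 as I understand it; namespaces are omitted for brevity, and the BMI method, its OID and the SAS statement are hypothetical examples.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def method_def_with_code(oid, name, method_type, description, context, code):
    """Build a MethodDef element that embeds its source code (ODM namespaces omitted)."""
    method = ET.Element("MethodDef", {"OID": oid, "Name": name, "Type": method_type})
    desc = ET.SubElement(method, "Description")
    ET.SubElement(desc, "TranslatedText", {XML_LANG: "en"}).text = description
    # The code travels inside the define.xml instead of in an external file.
    ET.SubElement(method, "FormalExpression", {"Context": context}).text = code
    return method


# Hypothetical example: a BMI derivation expressed as a SAS statement.
m = method_def_with_code(
    oid="MT.BMI",
    name="Algorithm to derive BMI",
    method_type="Computation",
    description="BMI = weight (kg) / height (m) squared",
    context="SAS",
    code="BMI = WEIGHT / (HEIGHT / 100)**2;",
)
print(ET.tostring(m, encoding="unicode"))
```

The "Context" attribute tells the consumer which language the expression is written in, so the same mechanism would work for other languages than SAS.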