Saturday, December 29, 2012

Another copy-and-paste frustration

Yesterday and today, I worked on implementing the new SDTM oncology domains and a number of the new draft SDTM-IG 3.1.4 domains in my SDTM-ETL(TM) software. For each new domain, I created a template define.xml file that can be read by the software.

It was so frustrating!

The new domains come as a PDF file with tables and additional information. While creating the templates, I continuously had to switch between the PDF and the (smart) XML editor I am using. For each variable, I had to copy the name, role and datatype and paste them into the template. I then also had to add whether the variable is mandatory (required/expected) or not (permissible).

Checking that everything was implemented correctly was purely visual: I compared a view of the machine-readable template with the tables in the PDF file. Another frustrating task...

In earlier days, the SDTM team also published these tables in an Excel file. Although Excel is not a vendor-neutral format, this file at least allowed me to automate some things. The way I proceeded was as follows:
First I read the Excel file into OpenOffice Calc (a competitor of Excel). Then I exported the tables as an OpenDocument ("ods") file, which is essentially a zip file containing a set of XML files. After renaming and unzipping it, I took one of these XML files and transformed it to CDISC define.xml using a stylesheet.
Each new publication of the SDTM team had differently formatted Excel files, so I needed to adapt the stylesheet each time. Even so, it worked very well, and I could generate the templates in a few hours.
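
The steps above can be sketched in a few lines of code. This is only an illustration in Python, not the stylesheet I actually used: it opens the OpenDocument spreadsheet as a zip file, reads the table rows from content.xml, and builds a simplified define.xml-like fragment. The column order (variable name, type, role, core), the type mapping, the OID naming and the function names are all assumptions made for the example, and the output is not a complete, valid define.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the content.xml of an OpenDocument spreadsheet
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Assumed mapping from the "Type" column to a define.xml datatype
TYPE_MAP = {"Char": "text", "Num": "float"}


def read_rows(ods_file):
    """Return the cell texts of every non-empty row as lists of strings.

    Repeated or merged cells are not expanded in this sketch.
    """
    with zipfile.ZipFile(ods_file) as archive:  # no renaming to .zip needed
        root = ET.fromstring(archive.read("content.xml"))
    rows = []
    for row in root.iter(f"{{{TABLE_NS}}}table-row"):
        cells = []
        for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
            paragraphs = cell.findall(f"{{{TEXT_NS}}}p")
            cells.append(" ".join("".join(p.itertext()) for p in paragraphs).strip())
        if any(cells):
            rows.append(cells)
    return rows


def build_template(domain, rows):
    """Build a simplified ItemGroupDef plus ItemDefs from (name, type, role, core) rows."""
    item_group = ET.Element("ItemGroupDef", OID=domain, Name=domain)
    item_defs = []
    for name, datatype, role, core in rows:
        oid = f"{domain}.{name}"  # hypothetical OID convention
        # Required and expected variables are mandatory, permissible ones are not
        mandatory = "Yes" if core in ("Req", "Exp") else "No"
        ET.SubElement(item_group, "ItemRef", ItemOID=oid, Mandatory=mandatory, Role=role)
        item_defs.append(
            ET.Element("ItemDef", OID=oid, Name=name, DataType=TYPE_MAP.get(datatype, "text"))
        )
    return item_group, item_defs
```

With a header row in the first line of the sheet, a call such as `build_template("TR", read_rows("tr.ods")[1:])` would give the elements to serialize with `ET.tostring`. The file name "tr.ods" is just a placeholder.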

The CDISC SDTM team does not publish their tables in a machine-readable format anymore. I do not know why, because I cannot imagine that the team is not using any computer tools to help in generating the tables. So why not publish them in a machine-readable format?

SDTM table definitions in a machine-readable format - that would be cool!

3 comments:

  1. Check out the first deliverable from the CDISC2RDF project http://cdisc2rdf.com/2013/01/20/first-version-of-cdisc2rdf-published/

    For a format 1) of the SDTM table definitions that is not only machine-readable but also machine-processable :-)

    1) See http://opengovdata.io/2012-02/page/5-1-2/principle-data-format-matters

  2. Thanks Kerstin!
    Looks very interesting ... I will study this in the next few days.

    Essentially what I would like to see is that CDISC publishes any new SDTM version or subversion Implementation Guide in a machine-readable form together with the IG itself.
    My experience is that whenever a new version of the IG is published, my customers want to have the define.xml template for it within a few days after the publication in order to work with it in the SDTM-ETL tool.
    So what I do now is that, each time a new version of the standard or IG is published, I spend the next few nights with ... copy and paste.
    So if you can convince the SDTM team to publish the RDF version together with the IG, or even better, let them work with RDF and generate the IG (prototype) from that, that would really be great!

    Best regards,

    Jozef

  3. Good discussion. I have a longstanding dialogue with Wayne about this. I hope more colleagues across pharma companies, CROs, software vendors etc. who care about clinical trial data will join us in encouraging CDISC, NCI and other SDOs to take the next step. My thinking is that it's not helpful to say that "SHARE will solve this" (sometime in the future).
