Friday, June 24, 2011

Implementing the SEND 3.0 standard

SEND 3.0 final is out, so I started implementing it in one of my software tools (SDTM-ETL). One of the problems is that the standard is delivered as a PDF file (as it should be, of course), full of tables with information about the individual domains, such as variable names, variable labels, data types, "CDISC notes", etc.

In order to bring this "knowledge" into a computer program, which I do by means of XML files that are read in when the program starts up, I needed to copy-and-paste the content of the tables from the PDF into the XML using an XML editor.
This is of course stupid work! It has already taken me a full day, and I am still not completely finished.
So why are these tables not delivered by CDISC in a real machine-readable form? And I do not mean Excel files (although that would already be a start). Ideally, these tables would be delivered as XML, as one can then easily transform them (just write a little XSLT) into the XML I need, which can then be read in by any software program that implements or uses the SEND standard.
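
To make the idea concrete, here is a minimal Python sketch of what reading such a machine-readable table at program start-up could look like. The XML layout (the `Domain` and `Variable` elements and their attributes) is a hypothetical one made up for this illustration; it is neither an official CDISC format nor the format SDTM-ETL uses, and the three variables shown are only an illustrative excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for one domain table; element and attribute names
# are assumptions for this sketch, not an official CDISC format.
DOMAIN_XML = """
<Domain name="BW" label="Body Weight">
  <Variable name="STUDYID" label="Study Identifier" type="Char"/>
  <Variable name="DOMAIN" label="Domain Abbreviation" type="Char"/>
  <Variable name="BWSEQ" label="Sequence Number" type="Num"/>
</Domain>
"""


def load_domain(xml_text):
    """Return the domain name and a list of variable dictionaries."""
    root = ET.fromstring(xml_text)
    variables = [
        {"name": v.get("name"), "label": v.get("label"), "type": v.get("type")}
        for v in root.findall("Variable")
    ]
    return root.get("name"), variables


domain_name, variables = load_domain(DOMAIN_XML)
for var in variables:
    print(domain_name, var["name"], var["label"], var["type"])
```

With the tables published in a form like this, a few lines of code (or a small XSLT) would replace a full day of copy-and-paste.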

During this "copy-and-paste" exercise I have been thinking a lot about why this is so (yes, although I am male I can do two things at the same time).
In CDISC we have two kinds of standards: content standards such as SDTM, SEND, and controlled terminology, and format standards such as ODM and define.xml. The former are developed by domain specialists, the latter by technical (IT) people. Some of the latter have also become domain experts, but unfortunately there is not much technical (XML?) knowledge in the former group, at least not enough to also publish the content standards in a machine-readable form.
Should these people have these technical skills? Or should the technical people do more to help the domain experts make these standards available in a machine-readable form as well? I believe in the latter. But are the domain experts really interested in delivering their standards in a machine-readable form? I have some doubts.

CDISC has been developing controlled terminology (CT) for more than 5 years. But only very recently has all controlled terminology been made available in a machine-readable XML form (thanks Lex!). I had been asking for this "feature" for many years, as I had to spend so much time doing copy-and-paste or using advanced tricks (to transform from Excel to ODM) each time an update was published and I wanted to implement the new controlled terminology in software. But the CT team was not really interested. Only after NCI took over publication and started asking for a machine-readable form of the CDISC controlled terminology did things get done.

I think we (CDISC) can improve considerably here by encouraging the domain people and the technical people to work together better. Why not make a rule that a technical person should be added to each "domain" working group (SDTM, SEND, ...) and made responsible for ensuring that a machine-readable version of the new standard is published together with it?
I agree that we do not have enough technical people (I mean people with an IT background) in the CDISC volunteer teams. It is volunteer work, so it is difficult to find people who want to spend a few hours, or a day, per week on the development of standards, unpaid. Maybe CDISC should find better ways to reward these people (volunteers currently even have to pay their own travel and hotel costs to come to a face-to-face volunteers meeting). There is currently a giant amount of money available in the US for e-health, and it should be possible to obtain some of it to make it attractive for volunteers to work for CDISC.

But back to SEND.
Most of you know that I am an innovative guy who does not like SAS Transport 5. It is an ancient format stemming from the days of mainframe computers (internally, SAS Transport 5 is based on IBM mainframe binary formats). During my copy-and-paste exercise, each time I read sentences like "The value ... cannot be longer than 8 characters, nor can it start with a number" I became pretty frustrated. Why do we still use this ancient format for electronic submissions? Why don't we have an XML-based format for this yet? Are SDTM, SEND, and ADaM really vendor-neutral standards? Or are they still developed with one vendor in mind? Why isn't there any XML knowledge at the FDA?
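
For illustration, the kind of restriction quoted above can be expressed as a tiny check. This is only a sketch of the two conditions in that sentence; the function name is made up here, treating an empty value as invalid is an assumption, and the real format may impose further restrictions that this does not cover.

```python
def satisfies_quoted_rule(value: str) -> bool:
    """Check only the two conditions quoted from the standard:
    at most 8 characters, and not starting with a digit."""
    if not value:
        return False  # assumption: an empty value is treated as invalid
    return len(value) <= 8 and not value[0].isdigit()


for candidate in ["BWSEQ", "BWORRESU", "1BWSEQ", "BWSTRESCX"]:
    print(candidate, satisfies_quoted_rule(candidate))
```

An XML-based exchange format would not need limits like these, since they come from the transport format rather than from the content.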
But yes, this is a conservative industry. We have made some progress, but I think we are still 10 years behind other industries that switched completely to XML many years ago.
With the very innovative people we have (I hope I can say I am one of them), we can leap ahead, but we need to get all the stakeholders convinced, especially the FDA.
But that's another topic, to which I will devote a later blog post.
