Wednesday, December 21, 2011

SDTM Amendment 1 is out

Just released by CDISC: the SDTM Amendment 1.
The amendment was created by a number of CDISC volunteers in cooperation with CDER (FDA) - other departments seem not to have been involved.
It adds a new set of variables (MedDRA codes and decodes) to the Events classes and also some new variables to the Demographics domain. Also the AE domain is extended with the new "MedDRA" variables, so that each row in the AE now has 51 (!) fields.

Fortunately, the changes and additions are neither very extensive nor complicated, so I could implement everything in our SDTM-ETL software package in just two evenings.

However, the SDTM Amendment 1 also raises a lot of new questions, like:
- what about submissions that do not go to CDER (but e.g. to CBER): is the new set of rules also applicable in that case?
- or is the Amendment 1 a "CDER dialect" of the standard?
- other departments than CDER seem not to have been involved. Do they agree on these additions and changes? Were they asked at all?
- why isn't this a new version of the standard? Amendments to "final" (?!) standards are always dangerous.
- MedDRA codes need to be provided as numeric values. Now these codes are 8 digits long. Can SAS XPT cope with such large numbers? I have some doubts.
- how do I deal with this new "standard" in define.xml? The latter is not even mentioned in the document!
- what do I need to fill in for def:StandardVersion in define.xml? Most software packages use that attribute for finding out which version of the standard was used.

and many many more ...
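
On the XPT question in the list above: SAS Transport 5 stores numerics as IBM mainframe hexadecimal floating-point values, which at the full length of 8 bytes carry a 56-bit fraction. The sketch below is only an illustration (the function names are made up here, and it assumes the variable really uses all 8 bytes). It encodes an integer in that layout and decodes it again, so the round trip of an 8-digit code can be checked.

```python
from fractions import Fraction


def to_ibm_double(value: int) -> bytes:
    """Encode an integer (|value| < 2**56) as an 8-byte IBM hexadecimal float."""
    if abs(value) >= 1 << 56:
        raise ValueError("value needs more than 56 fraction bits")
    if value == 0:
        return bytes(8)
    sign = 0x80 if value < 0 else 0x00
    fraction = abs(value)
    # With exponent 64 + 14 the 56-bit fraction equals the integer itself,
    # because 16**14 == 2**56.
    exponent = 64 + 14
    # Normalise: shift left one hex digit at a time until the top digit is non-zero.
    while fraction < (1 << 52):
        fraction <<= 4
        exponent -= 1
    return bytes([sign | exponent]) + fraction.to_bytes(7, "big")


def from_ibm_double(raw: bytes) -> Fraction:
    """Decode an 8-byte IBM hexadecimal float into an exact fraction."""
    sign = -1 if raw[0] & 0x80 else 1
    exponent = (raw[0] & 0x7F) - 64
    fraction = int.from_bytes(raw[1:8], "big")
    return sign * Fraction(fraction, 1 << 56) * Fraction(16) ** exponent


if __name__ == "__main__":
    largest_8_digit = 99_999_999
    encoded = to_ibm_double(largest_8_digit)
    print(encoded.hex(), from_ibm_double(encoded) == largest_8_digit)
```

Under that assumption every 8-digit integer fits with room to spare, since 99,999,999 needs only 27 bits. A numeric stored with a shortened length would be a different matter and would have to be checked separately.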

Saturday, December 10, 2011

Difficult days for CDISC end-to-end

The last two to three weeks were difficult ones for the CDISC end-to-end cause.

There was a lot of discussion within the volunteer team that is developing define.xml 2.0 about whether elements like "MeasurementUnit" should be allowed in define.xml 2.0. The reason is that some of the team wanted to discourage (or even forbid) the use of "MeasurementUnit" in define.xml 1.0 and future 2.0 files, as "MeasurementUnit" is not explicitly stated in the define.xml specification. Others were of the opinion that "MeasurementUnit" has always been allowed in define.xml, as define.xml is an extension of the ODM standard. Also, several vendors of software for generating define.xml files do use "MeasurementUnit", as it is the most natural way to attach information about which units were used in which tests.
The discussion however has a deeper origin. The real question is whether define.xml is part of the chain in end-to-end (i.e. is the "last mile" to the FDA), or is just something totally different for which (unfortunately?) ODM was used (or abused?). Or as one participant stated "define.xml is another animal".
The discussions were heavy. Some of the team were of the opinion that (as define.xml is something totally different) not a single element or attribute of the core ODM should be allowed in define.xml if it is not explicitly mentioned in the define.xml spec, and they also refused to have "MeasurementUnit" in the new define.xml spec. Others were of the opinion that define.xml is an important part of CDISC end-to-end, and that people should be allowed to add information (such as MeasurementUnit, Question, RangeCheck etc.) to a define.xml, and so provide it to the FDA, supported by an appropriate stylesheet. This information would just come from the original protocol in ODM format without needing transformations.
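
To make the disputed element concrete, here is a minimal sketch of how a unit is attached to a test in plain ODM: a "MeasurementUnit" is declared once under "BasicDefinitions", and an "ItemDef" points to it through a "MeasurementUnitRef". The fragment is heavily trimmed, and all OIDs, names and the helper function are made up for illustration.

```python
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, heavily trimmed ODM-style fragment (OIDs and names are made up).
SAMPLE = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="ST.DEMO">
    <BasicDefinitions>
      <MeasurementUnit OID="MU.MMHG" Name="mmHg">
        <Symbol><TranslatedText xml:lang="en">mmHg</TranslatedText></Symbol>
      </MeasurementUnit>
    </BasicDefinitions>
    <MetaDataVersion OID="MDV.1" Name="Demo">
      <ItemDef OID="IT.SYSBP" Name="SYSBP" DataType="integer">
        <MeasurementUnitRef MeasurementUnitOID="MU.MMHG"/>
      </ItemDef>
      <ItemDef OID="IT.SEX" Name="SEX" DataType="text"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def units_by_item(xml_text: str) -> dict:
    """Map each ItemDef name to the names of the units it references."""
    root = ET.fromstring(xml_text)
    unit_names = {
        unit.get("OID"): unit.get("Name")
        for unit in root.iterfind(".//odm:BasicDefinitions/odm:MeasurementUnit", ODM_NS)
    }
    result = {}
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        refs = item.findall("odm:MeasurementUnitRef", ODM_NS)
        result[item.get("Name")] = [
            unit_names.get(ref.get("MeasurementUnitOID")) for ref in refs
        ]
    return result


if __name__ == "__main__":
    print(units_by_item(SAMPLE))
```

A stylesheet or viewer could do the same lookup to show the unit next to each variable, which is the kind of extra information the "loose" camp wants to keep available.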

Just when a "gritted teeth" compromise was in sight, allowing people to use define.xml either way ("strict" or "loose"), everything was questioned again by a member stating "a standard that has such compromises is not a standard".

So instead of coming to an end, the heavy discussions started again.

My personal opinion (which you may have guessed already) is that define.xml is an important link in the end-to-end chain, and using ODM elements and thus information coming from the original study design has a tremendous value.

I do not know how the discussion will evolve further. The best outcome, I think, is that the compromise that was ultimately reached (but that gives stomach aches to almost all of us) is implemented. If not, I am afraid that the discussions will go on, and a release of define.xml 2.0 will be out of sight for several more months.

To better explain the (visionary? although already used by several vendors) idea of end-to-end (i.e. using one transport format from protocol design to submission to the FDA) I have decided, together with a few others, to start a new blog site, which you will soon find at

It will be open to anyone with a good heart for the CDISC end-to-end cause.

Saturday, December 3, 2011

A glimpse of hope

A few days ago, Becky Kush (president of CDISC) informed me that the FDA recently appointed a new CIO who has a track record of success in science in the pharma industry.
I do not know Eric Perakslis personally, but I heard some very good things about him. Eric launched the TranSMART project at J&J, so he surely is a visionary IT guy.

The one million dollar question will be whether he will really get the empowerment that is needed to structurally change something at the FDA, and bring their IT (which is currently in a desperate state - see the previous blog entry) to a level where it can cope with the IT level at which the industry operates.

Already more than a year ago at the Baltimore Interchange, I heard Theresa Mullin of the "Office of Planning and Informatics" (i.e. the CIO's department) telling us about all the plans she had for improving things at CDER. More than a year later still nothing seems to have changed: CDER does not have any servers, databases, good viewers, XML or database knowledge ...
The problem (as she told us herself) is that she cannot force any department to develop or implement something (advisory role only).

Will Eric Perakslis also be chained into such an "advisory role" as all his predecessors were? Or will he get the power, people and financial means to really change things? Will he be able to force departments like CDER to bring their IT into the 21st century?

I really hope so ...

Friday, November 18, 2011

The FDA and the "Standard Issues Document"

A week ago, I had a teleconf with a number of people of a department of the FDA regarding the "CDER Common Data Standard Issues Document".

The FDA itself had asked readers of the document to provide/send comments, so I did.
In my comments, I protested against (amongst others) the additional requirement that variables such as EPOCH, ELEMENT, ETCD must be added for each observation domain.
My comment was that this is ridiculous, as this information is already in the study design domains, and can easily be looked up, e.g. through the visit number. A simple "table join" suffices.
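
As a minimal sketch of such a join (using SQLite from Python): the table "visit_epoch" is a hypothetical, much simplified stand-in for the study design information, and the rows are invented. In a real study the lookup may also depend on the arm, so this only shows the principle.

```python
import sqlite3


def vs_with_epoch_rows():
    """Build two tiny tables in memory and read them back through a joining view."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Hypothetical lookup derived from the study design (not a real SDTM domain).
        CREATE TABLE visit_epoch (visitnum INTEGER PRIMARY KEY, epoch TEXT);
        -- Observation table WITHOUT a redundant EPOCH column.
        CREATE TABLE vs (usubjid TEXT, visitnum INTEGER, vstestcd TEXT, vsorres TEXT);

        INSERT INTO visit_epoch VALUES (1, 'SCREENING'), (2, 'TREATMENT');
        INSERT INTO vs VALUES ('001', 1, 'SYSBP', '120'), ('001', 2, 'SYSBP', '118');

        -- The reviewer sees EPOCH, but it is stored in one place only.
        CREATE VIEW vs_with_epoch AS
            SELECT vs.usubjid, vs.visitnum, v.epoch, vs.vstestcd, vs.vsorres
            FROM vs
            JOIN visit_epoch AS v ON v.visitnum = vs.visitnum;
        """
    )
    rows = con.execute("SELECT * FROM vs_with_epoch ORDER BY visitnum").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in vs_with_epoch_rows():
        print(row)
```

If the epoch of a visit ever changes, only one row in the lookup table needs to be corrected, and every observation shown through the view follows automatically.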

In my classes on databases for undergraduate (Bachelor) students, I teach them that redundant data in tables (of databases in this case) should always be avoided, as this easily leads to inconsistency (violation of the "ACID" properties).
With their additional requirements, the FDA is massively introducing redundant data in the SDTM tables, which I consider very bad practice.
But back to the teleconference: it was set up by the FDA representatives to explain to me the "why" of these additional requirements. It looks as if I am sufficiently influential that they decided I should be talked to.

They explained to me that the reason for these additional (redundancy creating) requirements is that they do not have any means of joining two tables. So they use the SDTM tables both for storage AND for presentation, as is. They do not load the tables into a database, as they haven't got one, nor the knowledge to create one, nor a server to put them on. They have SAS, but no people that can use it. That is what they told me.

I was shocked!

Creating databases, populating them, and generating views of them (with joins) is something I teach my undergraduates, and which they pick up quickly, as it is not difficult.

But, the FDA people told me, they hope they will be able to do so in about 1-2 years.
They however appreciated my comments, and told me that they expect a lot from the new cooperation with Phuse, especially from the results of the (now yearly) "FDA/Phuse Computational Science Symposium". So they asked me to attend the next one (in Silver Spring, March 2012). I objected that I live in Europe and so would need at least compensation for travel and accommodation, which they can't provide. I can hardly imagine that the university will provide me with the necessary funds to attend that symposium. After all, as a professor I am expected to do innovative work, and not to teach the FDA how they can make a join or view on two tables.

So unless someone provides me the funding, I will not attend that symposium.

My personal opinion is that the FDA (i.e. the department I was speaking to) should first get things straight: start hiring people that can install databases, populate the latter with SDTM tables, and create views on them based on the requirements and requests of the reviewers, and make these available to them.
It is all a question of priorities: do I hire an extra reviewer or do I invest in IT (hardware, software and people)? Personally I am convinced that such investment would pay itself back in less than a year.

This teleconf has, unfortunately, confirmed my feelings about the current state of IT at the FDA (at least as far as this department is concerned). It is even worse than I expected.
I was feeling very depressed after this teleconf: more than five years after the introduction of SDTM (and after the FDA gave its commitment), they are still not able to do anything with the datasets other than just looking at them.
This is not very encouraging for the many volunteers that have put so much time and effort into the development of the standards and formats, and makes me doubt whether we should put any more effort into projects such as define.xml 2.0, or into an XML format for SDTM submissions, as long as the FDA is not able to implement these within a reasonable period of time.

Wednesday, November 9, 2011

A new challenge

I have recently obtained a professorship in "Medical Informatics and Documentation" at the eHealth department of the "University of Applied Sciences FH Joanneum" in Graz, Austria.
This will enable me to continue my work on the development of CDISC standards (and even be paid for it), but also to do exciting projects in the area of e-healthcare and its integration with clinical research.
My company XML4Pharma will also continue to exist, but will concentrate more on software development (for CDISC implementation) and less on consultancy and training.

At the university, I am teaching "databases" and "medical informatics", the latter concentrating especially on hospital information systems, their architecture, standards in healthcare (HL7, DICOM, semantic standards like SNOMED, ICD-10, LOINC...).

Austria is currently, slowly but steadily, rolling out a nationwide system for electronic health records (ELGA) and I hope to also contribute to that. For example, I am currently discussing a project with 3 Master students in which they will develop an XForms implementation of the Austrian "elektronische Arztbrief", a letter that is sent from one physician to another physician when the latter takes over the care of the patient. When submitted to the server, the form will then be converted into HL7-CDA format.

But I also will work on the further development of the CDISC standards. One of the teams I am in recently published the "Study Design Model in XML" (SDM-XML), an extension to the ODM standard. But that does not finish the work: we need to develop an implementation guide, and provide samples. We need to demonstrate the usefulness of the standard and convince the vendors to implement it in their tools.
I did already take some steps in that direction: the "ODM Designer" now implements SDM-XML, and I also demonstrated (also see previous posts) that an SDM-XML study design can easily be transformed into a "caBIG Patient Study Calendar" (PSC), and its workflow and timings part into a BPMN 2.0 workflow XML document that can easily be read into a hospital information system.
That brings us back to the topic of integration between healthcare and clinical research, which is also taken care of by a number of IHE-profiles. So also there, I will be contributing, as I already did in the past (RFD and RPE profiles).

Of course there are already thoughts about the further development of the ODM standard (version 1.4). If you have some things that you think should be added to the "user requirements", please let us know, and we will add them.

What I will probably discontinue, are my contributions to the further development of CDISC submission standards, such as define.xml and the "MetaData Submission Guide". Reason is that as a professor, I need to do innovative work. My strong impression (confirmed yesterday in a teleconference with people of the FDA - but that's another post) is that, even more than five years after SDTM and define.xml were first introduced (because the FDA wanted to standardize), the FDA is still not able to work with SDTM nor define.xml - the review environment is still completely absent.
So as long as the FDA is not modernizing their IT (in my opinion they are 20 years behind industry), it may be a waste of time to work on the further development of formats for submission standards. On the other hand, proving that e.g. an XML format for SDTM submissions (NOT based on HL7-v3) is a giant step forward (which would also enable the FDA to come to a great review environment without high costs - especially statistical software licenses), might boost the modernization of IT at the agency.

But that's another story...

Sunday, September 4, 2011

HL7-v3 thoughts

I was on vacation for a little less than 3 weeks with my 20-year-old son, doing some fine mountaineering in the neighbourhood of Chamonix at the foot of the Mont Blanc. Together, we climbed his second 4000 meter peak; for me it was probably my last (getting older ...).
But in these 3 weeks a lot seems to have happened in the HL7 world.
First there was the announcement that the UK is scrapping its National Health IT Network, which ran for 9 years and cost 18.7 billion dollars (about 360$ per resident). You can find the article here. The program has for many years been criticized, including for its choice of HL7-v3 as the basis for electronic health records (EHRs) and the false belief that this standard would solve all problems at once.
Then there was the series of blog articles by Grahame Grieve, titled "v3 has failed?". For your reference, Grahame is one of the main contributors to the HL7-v3 standard. HL7-v3 has always been criticized as overcomplicated and difficult and expensive to implement, and I (as an XML specialist) also have a lot of comments on the clumsy way it has been implemented in XML.
But now HL7 has started the "fresh look task force", so there is some hope that within a number of years (I guess 5 at minimum) there is a standard for exchange of health care data that is easy to understand, clear, and easy to implement (the latter also meaning "cheap" to implement).

Now, I will soon start working in a number of projects where CDA (which is based on HL7-v3) is the basis of everything (more about that in a future post). CDA is there and is being successfully used in EHR systems, though it is not at all perfect (the XML is still very clumsy) and not easy to implement either. But it works, somehow.
I then also hope to be able to contribute, as an XML specialist, to the "HL7 fresh look task force", and so start contributing positively rather than criticizing this standard.


Saturday, September 3, 2011

Snapshot versus Transactional for ODM metadata import

CDISC has an excellent ODM Certification system. As one of the developers of different versions of the standard, and as an independent consultant, I regularly help (EDC) vendors with getting their systems certified.
One of the questions that comes back over and over again is about the difference between "snapshot" and "transactional" for import of metadata only (columns 1 and 3 of the table of the certification webpage). I recently had a chat with Dave Iberson-Hurst (who is doing the certification testing) and he told me there isn't a difference:
"The Transactional metadata import is no different from the Snapshot metadata import. It is an implication of the table layout because it is there for data import. In theory there is nothing in the spec to say the FileType attribute should be set to snapshot or transactional for a metadata import so you could set it to either, I would use snapshot but some people dont. So if you tick one of the boxes you tick both".

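For those implementing an import, the statement quoted above could be sketched as follows: the importer looks at whether the file carries any "ClinicalData" at all, and only when it does is the "FileType" attribute of the ODM root element taken into account. The template and the function name are made up for illustration, and real ODM files carry more required attributes than shown here.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical, trimmed ODM skeleton; real files have more required attributes.
TEMPLATE = (
    '<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="{file_type}">'
    '<Study OID="ST.DEMO"/>{clinical_data}</ODM>'
)


def describe_import(xml_text: str) -> str:
    """Classify an ODM file; FileType only matters once clinical data is present."""
    root = ET.fromstring(xml_text)
    has_clinical_data = root.find(f"{{{ODM_NS}}}ClinicalData") is not None
    if not has_clinical_data:
        # Metadata only: Snapshot and Transactional are handled identically.
        return "metadata-only import"
    return f"data import ({root.get('FileType')})"


if __name__ == "__main__":
    for file_type in ("Snapshot", "Transactional"):
        doc = TEMPLATE.format(file_type=file_type, clinical_data="")
        print(file_type, "->", describe_import(doc))
```

Both metadata-only files end up in the same branch, which matches the remark that ticking one of the two boxes means ticking both.
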
The craziest forms in the world

Within clinical research, we try to make our forms (CRFs) clear, easy-to-use, and clever (the latter especially in the case of eCRFs).
In Germany, we have the "Deutsche Rentenversicherung" (DRV - German state pension fund), which is also responsible for paying out pensions to orphans. Both my children get such a (very small) pension as they lost their mother some years ago.
So every few months, we get a set of forms to fill in, so that the DRV can judge whether my children are still eligible for payments. You can find one of such forms (although not exactly the one I got today) here.
If you try it out, you will get a message that you cannot save the form to your hard disc, but only fill it out using your computer, and then PRINT it out. The reason seems to be that the DRV does not have any means to receive forms electronically. They only accept paper!
But now back to the forms I have to fill in for my children. These have some peculiarities:
- the name, address and date of birth, and insurance number are preprinted on page 1 (fine!).
- on page 2 (the backside of page 1) you MUST fill in ... name, address, date of birth of the submitter (which is of course the same as the receiver data preprinted on page 1).
- if your bank account number has changed, you must give it as an international bank account number (IBAN and BIC), even if the account is in Germany
- if you live abroad, you should give it in the same way (which makes sense) AND additionally fill another set of forms (why???)
- on page 3, the insurance number is preprinted
- on page 3, you MUST fill in ... name, address, date of birth of the submitter (why, as the preprinted insurance number is the unique identifier anyway?)
- if the orphan is in education (school, high school, university, in education at a company...) you have to fill in the day, month, and year on which the education will finish.
Do I have a crystal ball to see when this will be the case?
- on page 4 (backside of page 3), you have to provide some details about the education. In case it is university, you have to give the current semester (makes sense), and once again (why?) when the orphan will get his/her diploma (again you need the crystal ball here).
- on page 5, the insurance number is preprinted
- on page 5, you MUST fill in ... name, address, date of birth of the submitter ... Getting frustrated ???
- many of the forms contain a large number of references to law paragraphs, e.g. (translated):
"the DRV is obliged by §48 of the 10th book of the social security law book in connection with §100 part 3 of the 6th book of the social security law book (SGB VI) to regularly test ..."
Got it?

Each time (about twice a year) I have to fill in these forms. Each time, I guess this shortens my life by about a week. These forms should get a header "these forms endanger your health"!

Yes, I know, I live in the BRD (Bürokratische Republik Deutschland - Bureaucratic Republic of Germany)

Maybe one day (e.g. if they pay me for that), I will make an XForms form sample showing how this form can be provided electronically in a smart way ...

Hope we do better in clinical research ...

Tuesday, July 12, 2011

CDISC publishes Study Design Model for public review

CDISC has recently released the "Study Design Model" (SDM) for public review. You can find the draft specification, XML-schemas and a set of samples (e.g. for the workflow of the LZZT trial) at: The SDM is an extension of the ODM standard, the well-known CDISC standard for exchange of study design (metadata) and of clinical data. It adds a number of features to the ODM that many of us have long been waiting for, such as trial parameters, inclusion/exclusion criteria, study structure (arms, epochs, segments, activities) and especially, workflows and timing. Please review this new draft standard and send your comments to CDISC before July 22nd.

At the same time, I can announce a new version of the "ODM Study Designer" which already implements the new SDM standard. You can find all necessary information at

I will however tell something more about how the new SDM standard has been implemented in the ODM Study Designer in a next post.

Thursday, June 30, 2011

The CDISC ODM Study Designer 2011

I really worked hard on it during the last days and weeks, but now it is there: the 2011 (beta) release of our popular CDISC ODM Study Designer.

The major new feature is a full implementation of the new CDISC "Study Design Model" (SDM) that was published for public review by CDISC just a few days ago. As this new standard (which is an extension to the ODM Standard) is still "in public review", I also designated the ODM Designer as "beta". The final 2011 version of the software will then be released once the SDM standard is approved and formally released.

You may wonder how it is possible that our new version of the ODM Designer is already there although the SDM was released just a few days ago! The reason is simple: I was one of the co-developers of the SDM standard (together with Jan Kratky, Andrew Fowler and Peter Villiers), and at each stage of the development I tested whether the proposed structure, elements and attributes were implementable in software. In our e-world, a standard that is not implementable in software doesn't make sense, does it?

So every detail of the standard was tested on implementability, and the result is not only a good standard, but also the immediate availability of a software tool to work with it.

As such, one may say that our ODM Study Designer can be seen as a reference implementation of the new SDM standard.

By using the new standard, one can define features of a study that were not yet covered by the CDISC ODM standard (and that I had wanted to have in it for many years), such as study parameters (also required to be submitted in SDTM format to the FDA), eligibility criteria (i.e. inclusion/exclusion) - also in machine-readable format, study structure (arms, epochs, segments, activities, ...), and most important in my opinion: workflows and timings.

Within the team, I was responsible for the Workflow part, maybe the hardest part. Our workflow implementation is still pretty simple, e.g. not allowing workflows within other workflows, but I think we made something that is simple and easy enough to work with for people that design clinical studies. Of course one can already attach (very) complicated workflows in XML languages like BPEL, XPL, Windows Workflow Foundation, or BPMN 2.0, to ODM through the vendor extension, but we wanted something that is easy to learn (the named ones have a steep learning curve), can easily be implemented, and is still sufficient for describing >90% of the clinical workflows.

In the new ODM Study Designer I also implemented some special features that are very interesting for integration with the patient care world, and especially with hospital planning and information systems. For example, the new version of the ODM Designer allows export of the study workflow in BPMN 2.0 format, which is believed to become the most important standard for workflow definitions. Such an export (which is an XML file) can then be used in a hospital information planning system, in order to plan the clinical study within the care plan.
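
To give an idea of what such an export looks like, here is a minimal sketch that writes a strictly sequential list of activities as a BPMN 2.0 process. It is not the code of the ODM Study Designer: the function name, the ids, the target namespace and the activity names are invented, and the output has not been validated against the BPMN schema.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def activities_to_bpmn(activities):
    """Serialise a linear sequence of activity names as a BPMN 2.0 process."""
    ET.register_namespace("bpmn", BPMN_NS)

    def tag(name):
        return f"{{{BPMN_NS}}}{name}"

    definitions = ET.Element(
        tag("definitions"),
        {"id": "defs", "targetNamespace": "http://example.org/study"},  # placeholder
    )
    process = ET.SubElement(definitions, tag("process"), {"id": "study_workflow"})

    # Flow nodes: start event, one task per activity, end event.
    ET.SubElement(process, tag("startEvent"), {"id": "start"})
    node_ids = ["start"]
    for number, name in enumerate(activities, 1):
        node_id = f"task_{number}"
        ET.SubElement(process, tag("task"), {"id": node_id, "name": name})
        node_ids.append(node_id)
    ET.SubElement(process, tag("endEvent"), {"id": "end"})
    node_ids.append("end")

    # Sequence flows connect each node to the next one.
    for number, (source, target) in enumerate(zip(node_ids, node_ids[1:]), 1):
        ET.SubElement(
            process,
            tag("sequenceFlow"),
            {"id": f"flow_{number}", "sourceRef": source, "targetRef": target},
        )
    return ET.tostring(definitions, encoding="unicode")


if __name__ == "__main__":
    print(activities_to_bpmn(["Screening", "Randomization", "Follow-up"]))
```

Real study workflows also need branching and timing constraints, which this linear sketch leaves out on purpose.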

Another feature of the ODM Study Designer is that it allows one to generate a caBIG Patient Study Calendar (one for each arm of the study). The PSC is used by caBIG (and many other researchers) to provide a study plan for groups of subjects in a clinical study. Our export provides a PSC-XML file that can then be imported into the caBIG PSC software tool and used there.

The new Study Designer now also has a full implementation of the CDASH standards (v.1.0 and v.1.1). Users can just load CDASH forms from a repository that comes with the software, and can immediately start working with these forms.

But have a look for yourself - I already posted some information on the ODM Study Designer website, and there is also a "New features" brochure that is available on request.

Friday, June 24, 2011

Implementing the SEND 3.0 standard

SEND 3.0 final is out, so I started implementing it in one of my software tools (SDTM-ETL). Now one of the problems is that the standard is delivered as a PDF file (which it should of course) full of tables with information about the individual domains like variable names, variable labels, data type, "CDISC notes" etc.

In order to bring this "knowledge" in a computer program, which I do by means of XML files that are read in when the program starts up, I needed to copy-and-paste the content of the tables from the PDF into the XML using an XML editor.
This is of course stupid work! It has already taken me a full day, and I am still not completely ready.
So why are these tables not delivered by CDISC in a real machine-readable form? And I do not mean Excel files (although that would already be a start). Ideal would of course be that these tables are delivered as XML, as one can then easily transform them (just write a little XSLT) into the XML I need, which can then be read in by any software program that implements or uses the SEND standard.
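
As a rough sketch of how little is needed once a table is available in any machine-readable form: the snippet below turns a comma-separated variable table into XML. The column names, the "Domain" and "Variable" element names and the two sample rows are invented for illustration and are not copied from the SEND document.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative rows only; column names and content are assumptions.
SAMPLE_TABLE = (
    "name,label,type\n"
    "STUDYID,Study Identifier,Char\n"
    "USUBJID,Unique Subject Identifier,Char\n"
)


def table_to_xml(csv_text: str, domain: str) -> str:
    """Convert a CSV table of variable definitions into a small XML document."""
    root = ET.Element("Domain", {"Name": domain})
    for row in csv.DictReader(io.StringIO(csv_text)):
        variable = ET.SubElement(
            root, "Variable", {"Name": row["name"], "DataType": row["type"]}
        )
        variable.text = row["label"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(table_to_xml(SAMPLE_TABLE, "BW"))
```

With the tables published as XML in the first place, even this step would shrink to a small stylesheet.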

During this "copy-and-paste" exercise I have been thinking a lot about why this is so (yes, although I am male I can do two things at the same time).
In CDISC we have two kinds of standards: content standards such as SDTM, SEND, controlled terminology, and format standards such as ODM and define.xml. The former are developed by domain specialists, the latter by technical (IT) people. Some of the latter have also become domain experts, but unfortunately there is not much technical (XML?) knowledge in the former group, not sufficient to also publish the content standards in a machine-readable form.
Should these people have these technical skills? Or should the technical people better help the domain experts with making these standards also available in a machine-readable form? I believe in the latter. But are the domain experts really interested in delivering their standards in a machine-readable form? I have some doubts.

CDISC has been developing controlled terminology (CT) for more than 5 years. But only very recently has all controlled terminology been made available in a machine-readable XML form (thanks Lex!). I have been asking for this "feature" for many years, as I had to spend so much time doing copy-and-paste or using advanced tricks (to transform from Excel to ODM) each time an update was published and I wanted to implement the new controlled terminology in software. But the CT team was not really interested. Only after NCI took over publication, and started asking for a machine-readable form of the CDISC controlled terminology, did things get done.

I think we (CDISC) can improve here considerably by encouraging the domain people and the technical people to better work together. Why not make a rule that a technical person should be added to each "domain" working group (SDTM, SEND, ...) and made responsible to take care that a machine-readable version of the new standard is published together with it?
I agree that we do not have sufficient technical people (I mean people with an IT background) in the CDISC volunteer teams. It is volunteer work, and so it is difficult to find people that want to spend a few hours, or a day per week on the development of standards, unpaid. Maybe CDISC should find better ways to reward these people (volunteers must currently even pay their own travel and hotel costs to come to a face-to-face volunteers meeting). There is currently a giant amount of money available in the US for e-health, and it should be possible to obtain some of it to make it attractive to volunteers to work for CDISC.

But back to SEND.
Most of you know that I am an innovative guy who does not like SAS Transport 5. It is an ancient format stemming from the days of mainframe computers (internally SAS Transport 5 is based on IBM-mainframe binary formats). During my copy-and-paste exercise, each time I read sentences like "The value ... cannot be longer than 8 characters, nor can it start with a number" I became pretty frustrated. Why do we still use this ancient format for electronic submissions? Why don't we have an XML based format for this yet? Are SDTM, SEND, ADaM real vendor-neutral standards? Or are they still developed with one vendor in mind? Why isn't there any XML knowledge at the FDA?
But yes, this is a conservative industry. We made some progress, but I think we are still 10 years behind relative to other industries who have already completely switched to XML for many years.
With the very innovative people we have (I hope I can say I am one of them), we can leap up, but we need to get all the stakeholders convinced, especially the FDA.
But that's another topic I will later spend a blog on.
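
For completeness, the restriction quoted above ("cannot be longer than 8 characters, nor can it start with a number") is easy to state as a check. The sketch below is only an illustration: the function name is made up, and limiting the remaining characters to letters, digits and underscores is an extra assumption that goes beyond the quoted sentence.

```python
import re

# At most 8 characters, must not start with a digit.
# Assumption: only letters, digits and underscores are allowed at all.
_SHORT_CODE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_short_code(value: str) -> bool:
    """Return True if the value satisfies the 8-character rule."""
    return _SHORT_CODE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("SYSBP", "1SYSBP", "SYSTOLICBP"):
        print(candidate, is_valid_short_code(candidate))
```

An XML based submission format would not need such a rule, as element content and attribute values have no such length limit.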

About this blog

Why this blog?

I have become involved as a commenter in several blogs about clinical research, CDISC standards, e-Health, electronic health records, so I think it is now time to start my own.

What I will do in this blog is give my thoughts about the CDISC standards, my involvement in their development, my involvement in different IHE-profiles for integration of electronic health records (and eHealth in general) and clinical research, and from time to time some technical tips, especially for working with ODM, define.xml, and SDTM and SEND standards.

I have now been involved in the development and implementation of CDISC standards for 10 years, and I think I can (almost) claim to be a "CDISC guru". I think I am one of the very few independent consultants that have so many years of experience with the set of CDISC standards.

But I do not want to keep all this knowledge for myself. I want to share it with others. That's one reason for starting this blog. Another is that it is sometimes a good thing to publicly comment on what is currently happening with standards in healthcare, especially in the area of electronic health records and CDISC.
Where can we do better? What is holding us up? Why do acceptance and implementation take so long? What is the status of these standards at the FDA?
These are all topics I have an opinion about, and which I will try to post here.
I apologize in advance if people feel bad or even insulted by my comments. I have seen that people that have been involved in the development of a specific standard feel insulted when I criticize a standard. They shouldn't.
Standards development work is human work, and often compromises must be made in order to get them accepted by all stakeholders. Even when I say something negative about a standard, or give suggestions for improvements, this does not mean that I criticize the people that developed the standard. On the contrary, I have a huge amount of respect for these people.

So let us start now. I hope you will like this blog and send many comments.


Jozef Aerts, XML4Pharma