Thursday, November 8, 2012

SDTM databases and the FDA

Two days again, my (first year) bachelor students had their "Introduction to Databases" exam.

When looking for a good question on "CREATE VIEW" (or joins in general) I was thinking again about SDTM and whether it is suitable as a database. After all, the FDA has done several attempts to create an SDTM datawarehouse, and as everyone having some database skills knows, you cannot (or almost not) create a datawarehouse without having one or more databases.

So I came to the following exercise:

"Given the following tables:


write an SQL statement to generate the following result table:"

and then I gave them a picture of the result SDTM table for the Laboratory "LB" domain.

(P.S. a correct answer is: "
CREATE VIEW Laboratory AS
SELECT t.STUDYID, t.DOMAIN, t.USUBJID, t.LBSEQ, t.LBREFID, t.LBTESTCD, tc.LBTEST, tc.LBCAT, v.VISITNUM, v.VISIT, v.VISITDY FROM Laboratory_test t, Testcode tc, Visit v WHERE t.LBTESTCD = tc.LBTESTCD AND t.VISITNUM = v.VISITNUM; "  )

This brought me to the following thoughts: "if we submit our SDTM datasets as essentially being a "View" on an SDTM database (see a previous contribution), how can the FDA reconstruct a database from this?"

After all, they want to use this kind of data in a datawarehouse, so they need to start from databases. If they would like to reconstruct the database from the (sas transport) tables, they do need to split the "view" (e.g. the LB sas dataset) into 3 or more tables.
Or can one start from the "view" to populate a datawarehouse?

Splitting up a "view" table into 3 (or more) tables so that the first, second and third normal forms are obeyed, and doing this in an automated way, does not look to be very simple to me.
Just suppose that there is one inconsistency in the "view" (sas dataset) e.g. that for the same LBTESTCD (e.g. "BILI") there is more than one corresponding value for LBTEST (e.g. once "Bilirubin" and one "Billirubin"). What would then happen?

But we do not know whether the FDA really tries to reconstruct the original database, or that it just uses the SDTM (sas) tables "as is".

Any comments (especially from FDA people) are of course very welcome.

No comments:

Post a Comment