Data Merging: Tolterodine Protocol A6121123
This document describes how the raw data are imported and how the various data sets are matched and merged into a single dataset. It is a “literate program”: the computer code it contains is run while the document is being produced, so the merged data set is created at the same time. It is therefore a reliable, reproducible description of the data set used in further analysis.
In this section we describe how the raw data files are imported and give brief structure summaries.
The randomization dataset gives the assignment of subjects to treatments.
The randomization data were provided in a PDF file; I copied and pasted the information into a text file. We subtract a constant from each subject ID to remove the first 4 digits, so that, for example, 10011101 becomes 1101.
R> randomization = read.table("A6121123 A7.2-2.txt", header = TRUE,
R+     stringsAsFactors = TRUE)
R> randomization$Subj = randomization$Subj - 10010000
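As a small illustration of the ID arithmetic, the sketch below assumes every raw randomization ID has the form 1001xxxx, so that subtracting 10010000 leaves the four-digit subject number; the other tables write the same subject as a screen number such as "1001-1101". The helpers `strip_site` and `screen_to_subj` are hypothetical names, not part of the original program.

```r
# Hypothetical helpers (not in the original program) that map both ID
# formats used in this study onto the same four-digit subject number.

# Raw randomization IDs are assumed to be 1001xxxx; subtracting the
# offset leaves xxxx, as in randomization$Subj - 10010000.
strip_site <- function(raw_id, offset = 10010000) {
  subj <- raw_id - offset
  # Stop if an ID does not carry the expected site prefix
  stopifnot(all(subj >= 1000 & subj <= 9999))
  subj
}

# Screen numbers such as "1001-1101" carry the subject after the hyphen;
# as.character() makes this work for factors as well as strings.
screen_to_subj <- function(screen) {
  as.integer(sub("^[0-9]+-", "", as.character(screen)))
}

raw <- c(10011101, 10011102, 10011143)
strip_site(raw)                                              # [1] 1101 1102 1143
screen_to_subj(factor(c("1001-1101", "1001-1102", "1001-1143")))  # [1] 1101 1102 1143
```

The range check is a cheap guard: an ID typed with a different prefix would produce a number outside 1000 to 9999 and halt the run instead of silently failing to match later.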
: Factor w/ 6 levels "ABC","ACB","BAC",..: 2 1 5 1 2 5 1 3 1 2 ...
: Factor w/ 6 levels "ABC","ACB","BAC",..: 2 1 5 1 2 5 1 3 1 2 ...
$ FirstTrtDate: Factor w/ 6 levels "02FEB2005","14MAR2005",..: 1 1 6 5 4 4 2 5 5 6 ...
$ FirstTrtTime: Factor w/ 4 levels "6:59","7:00",..: 3 4 2 1 2 2 4 2 4 2 ...
$ AgeGrp      : Factor w/ 2 levels "46-64","65+": 1 2 2 1 2 2 1 1 1 1 ...
The driving data were sent as an Excel file, which was opened and re-saved as a comma-delimited (CSV) file.
R> driving = read.csv("Pfizer All Data Sent Final.csv", na.strings = ".",
R+     stringsAsFactors = TRUE)
R> driving$Group = factor(driving$Group)
R> driving$Week = factor(driving$Week)
R> driving$Day = factor(driving$Day)
R> driving$Time = factor(driving$Time)
1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 ...
: Factor w/ 24 levels "1001-1101","1001-1102",..: 5 6 12 13 4 8 9 14 3 10 ...
: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
: Factor w/ 2 levels "NS","SN": 2 1 2 1 2 1 2 1 2 1 ...
: Factor w/ 2 levels "4","5": 1 2 1 2 1 2 1 2 1 2 ...
: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
-0.710 -0.916 -1.102 -1.008 -0.888 ...
0.0975 0.0824 0.0770 0.0745 0.0992 ...
The neuropsychological data were provided as an SPSS SAV file. The dataset contains two missing-value codes: −3 denotes a value that is missing because the protocol does not call for data to be collected there, and −9 is the code for an actual nonresponse. When we merge the datasets, the −3s will not be an issue at all; nevertheless, it is expedient simply to set all negative values to missing.
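R> library(foreign)
R> npdata = read.spss("np-Data.sav", to.data.frame = TRUE)
R> # Set negative codes (-3, -9) to NA; only numeric columns are compared,
R> # which avoids warnings from factor columns such as handedness
R> for (i in seq_along(npdata))
R+     if (is.numeric(npdata[[i]])) npdata[[i]][which(npdata[[i]] < 0)] = NA
R> rm(i)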
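Once both codes are set to NA, the distinction between "not scheduled" (−3) and "nonresponse" (−9) is gone. If a record of it is wanted, the codes can be counted first. Below is a minimal sketch on a made-up data frame; `count_codes` and `recode_negative` are hypothetical helpers, and `recode_negative` changes only numeric columns, leaving factors such as handedness untouched.

```r
# Hypothetical helpers, illustrated on made-up data.

# Count how often each missing-value code occurs in every numeric column.
count_codes <- function(df, codes = c(-3, -9)) {
  num <- vapply(df, is.numeric, logical(1))
  counts <- sapply(df[num], function(x)
    vapply(codes, function(cd) sum(x == cd, na.rm = TRUE), integer(1)))
  rownames(counts) <- paste0("code", codes)
  counts
}

# Set negative values in numeric columns to NA; factors are left alone.
recode_negative <- function(df) {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.numeric(x)) {
      x[which(x < 0)] <- NA
      df[[nm]] <- x
    }
  }
  df
}

toy <- data.frame(
  HAND = factor(c("right", "left", "right")),
  TMTASCR = c(30, -9, -3),
  RAVLT1 = c(-3, 7, 9)
)
count_codes(toy)       # one -3 and one -9 in TMTASCR, one -3 in RAVLT1
recode_negative(toy)
```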
: Factor w/ 2 levels "right","left": 1 1 2 1 1 1 1 1 1 1 ...
530 133 133 434 291 245 342 120 84 208 ...
..- attr(*, "value.labels")= Named num 17
.. ..- attr(*, "names")= chr ">16%ile"
9.5 16 17 20.5 23.5 22 25 20 19 17.5 ...
..- attr(*, "value.labels")= Named num 19
.. ..- attr(*, "names")= chr "<20tscr"
..- attr(*, "value.labels")= Named num 0
.. ..- attr(*, "names")= chr "<1%ile"
100 100 100 100 100 100 100 100 100 100 ...
..- attr(*, "value.labels")= Named num 17
.. ..- attr(*, "names")= chr ">16%ile"
..- attr(*, "value.labels")= Named num 17
.. ..- attr(*, "names")= chr ">16&ile"
46.2 51.2 54 44.6 46.6 44.4 42.8 49 38 47.2 ...
- attr(*, "variable.labels")= Named chr
"participant study group" "participant study number" "participant visit number" "participant age" ...
"GROUP" "ID" "SESSION" "AGE" ...
Additional data, collected before and after each test session, were sent in comma-delimited format.
R> other = read.csv("Other.csv", na.strings = ".", stringsAsFactors = TRUE)
1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 ...
: Factor w/ 24 levels "1001-1101","1001-1102",..: 5 5 5 5 5 5 6 6 6 6 ...
: Factor w/ 2 levels "NS","SN": 2 2 2 2 2 2 1 1 1 1 ...
The data are matched by first sorting each table so that the subject IDs line up, and then confirming that the subject IDs match among the tables.
R> ord = order(randomization$Subj)
R> randomization = randomization[ord, ]
R> print(randomization$Subj)
[1] 1101 1102 1104 1106 1107 1108 1111 1113 1114 1115 1116 1117 1118 1119 1120
[16] 1125 1127 1128 1131 1132 1136 1139 1140 1143
R> ord = order(npdata$SESSION, npdata$ID)
R> npdata = npdata[ord, ]
R> print(npdata$ID[1:24])
[1] 1101 1102 1104 1106 1107 1108 1111 1113 1114 1115 1116 1117 1118 1119 1120
[16] 1125 1127 1128 1131 1132 1136 1139 1140 1143
R> ord = order(driving$Week, driving$Screen.Num)
R> driving = driving[ord, ]
R> print(driving$Screen.Num[1:24])
 [1] 1001-1101 1001-1102 1001-1104 1001-1106 1001-1107 1001-1108 1001-1111
 [8] 1001-1113 1001-1114 1001-1115 1001-1116 1001-1117 1001-1118 1001-1119
[15] 1001-1120 1001-1125 1001-1127 1001-1128 1001-1131 1001-1132 1001-1136
[22] 1001-1139 1001-1140 1001-1143
24 Levels: 1001-1101 1001-1102 1001-1104 1001-1106 1001-1107 ... 1001-1143
R> ord = order(other$Day, other$Week, other$ScreenNo)
R> other = other[ord, ]
R> print(other$ScreenNo[1:24])
 [1] 1001-1101 1001-1102 1001-1104 1001-1106 1001-1107 1001-1108 1001-1111
 [8] 1001-1113 1001-1114 1001-1115 1001-1116 1001-1117 1001-1118 1001-1119
[15] 1001-1120 1001-1125 1001-1127 1001-1128 1001-1131 1001-1132 1001-1136
[22] 1001-1139 1001-1140 1001-1143
24 Levels: 1001-1101 1001-1102 1001-1104 1001-1106 1001-1107 ... 1001-1143
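The printouts above let a reader compare the orders by eye. A programmatic check is also possible. This minimal sketch assumes each sorted table consists of consecutive blocks of subjects in the same order as `randomization$Subj` (for example, three weekly blocks in driving), with every subject present in every block. The helpers `to_subj` and `blocks_match` are hypothetical.

```r
# Hypothetical helpers, not part of the original program.

# Subject number from a screen number such as "1001-1101"
to_subj <- function(s) as.integer(sub("^[0-9]+-", "", as.character(s)))

# TRUE if ids consist of whole blocks, each repeating ref in the same order.
# Going through as.character() means factor IDs compare by label, not code.
blocks_match <- function(ref, ids) {
  ref <- as.integer(as.character(ref))
  ids <- as.integer(as.character(ids))
  length(ids) %% length(ref) == 0 &&
    identical(ids, rep(ref, length(ids) %/% length(ref)))
}

ref <- c(1101, 1102, 1104)
blocks_match(ref, c(1101, 1102, 1104, 1101, 1102, 1104))              # TRUE
blocks_match(ref, to_subj(c("1001-1101", "1001-1104", "1001-1102")))  # FALSE
```

Wrapped in `stopifnot()`, a call such as `blocks_match(randomization$Subj, to_subj(driving$Screen.Num))` would halt the run if the orders ever disagreed; for npdata it would apply only if every subject appears in every session.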
We now need to select the appropriate entries from each variable and bring them into a merged data set. Note that the data tables have different numbers of observations. For analysis purposes, we want the merged data set to have 3 × 24 = 72 rows, consisting of one observation for each subject in each week of the experiment. Thus, the driving data set serves as the basic structure.
Variables from the randomization table need to be brought in three times, once for each week:
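R> attach(randomization)
R> merged$Seq = rep(seq, 3)    # one copy per week
R> s = as.character(seq)
R> # Treatment letter for week 1 (all subjects), then week 2, then week 3
R> trt = c(substr(s, 1, 1), substr(s, 2, 2), substr(s, 3, 3))
R> # Letters sort as A, B, C and are labelled Tol, Oxy, Pbo
R> merged$Trt = factor(trt, labels = c("Tol", "Oxy", "Pbo"))
R> levels(merged$Seq) = c("TOP", "TPO", "OTP", "OPT", "PTO", "POT")
R> rm(s, trt)
R> detach()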
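The treatment recoding above relies on the letters A, B, C sorting alphabetically onto the labels Tol, Oxy, Pbo. Here is a minimal sketch on made-up sequences, using a hypothetical `decode_seq` helper that states the letter-to-treatment mapping explicitly through `levels =`, so the mapping could not shift if some letter happened to be absent from the data.

```r
# Hypothetical helper: each sequence string such as "ACB" lists the
# treatment letters for weeks 1-3, with A = Tol, B = Oxy, C = Pbo.
decode_seq <- function(seq_codes) {
  s <- as.character(seq_codes)
  # Week 1 letters for all subjects, then week 2, then week 3, matching
  # the week-major row order of the merged data set
  letters_by_week <- c(substr(s, 1, 1), substr(s, 2, 2), substr(s, 3, 3))
  factor(letters_by_week, levels = c("A", "B", "C"),
         labels = c("Tol", "Oxy", "Pbo"))
}

toy_seq <- factor(c("ACB", "ABC", "CAB"))
decode_seq(toy_seq)
```

In a balanced crossover every letter occurs, so the original `factor(trt, labels = ...)` call gives the same result; the explicit levels only make the assumption visible.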
To bring in variables from the npdata table, the following functions are useful. forsess returns the indices of the rows in npdata that correspond to the given session(s).
addnp adds the given variable var from npdata to merged, using the specified sessions sess. When only one session is given, the 24 values are replicated 3 times and the given prefix is prepended to the variable name. Otherwise it is assumed (without checking!) that the default sessions 1:7 are used, and three variables are added: one holding three copies of the session-1 values (prefixed with Base, for the baseline values); one for sessions 2, 4, and 6 (prefixed with Pre, because these are at the beginnings of the experimental weeks); and one for sessions 3, 5, and 7 (prefixed with Post, because these are at the ends of the experimental weeks).
R> addnp = function(var, sess = 1:7, prefix = "") {R+
nm1 = paste(prefix, var, sep = "")
cat(paste("Variable", nm1, "added\n"))
nmbase = paste("Base", var, sep = "")
nm1 = paste("Pre", var, sep = "")
nm2 = paste("Post", var, sep = "")
cat(paste("Variables ", nmbase, ", ", nm1, ", and ",
nm2, " added\n", sep = ""))
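The bodies of forsess and addnp are only partly reproduced above. The following is a hypothetical reconstruction written from the description, not the original code. It assumes npdata is sorted by SESSION and then ID, with every subject present in every session. Because the original signature does not take merged as an argument, it presumably modifies merged in the calling environment; `addnp_sketch` instead takes the data frames as arguments and returns the updated merged, which keeps it self-contained and testable.

```r
# Hypothetical reconstruction (not the original code). Assumes npdata is
# sorted by SESSION, then ID, with every subject present in every session.
forsess_sketch <- function(npdata, sess) which(npdata$SESSION %in% sess)

addnp_sketch <- function(merged, npdata, var, sess = 1:7, prefix = "") {
  if (length(sess) == 1) {
    # One session: replicate the per-subject values for the three weeks
    nm1 <- paste(prefix, var, sep = "")
    merged[[nm1]] <- rep(npdata[[var]][forsess_sketch(npdata, sess)], 3)
    cat(paste("Variable", nm1, "added\n"))
  } else {
    # Baseline (session 1), week starts (2, 4, 6), week ends (3, 5, 7)
    nmbase <- paste("Base", var, sep = "")
    nm1 <- paste("Pre", var, sep = "")
    nm2 <- paste("Post", var, sep = "")
    merged[[nmbase]] <- rep(npdata[[var]][forsess_sketch(npdata, 1)], 3)
    merged[[nm1]] <- npdata[[var]][forsess_sketch(npdata, c(2, 4, 6))]
    merged[[nm2]] <- npdata[[var]][forsess_sketch(npdata, c(3, 5, 7))]
    cat(paste("Variables ", nmbase, ", ", nm1, ", and ",
              nm2, " added\n", sep = ""))
  }
  merged
}

# Toy data: 2 subjects x 7 sessions, so the merged frame has 3 x 2 = 6 rows
np_toy <- data.frame(SESSION = rep(1:7, each = 2),
                     ID = rep(c(1101, 1102), 7), TAPPER = 1:14)
m_toy <- data.frame(ID = rep(c(1101, 1102), 3))
m_toy <- addnp_sketch(m_toy, np_toy, "TAPPER")
```

Since `which()` returns indices in row order and the rows are sorted by session, the Pre and Post columns come out week by week, matching the week-major layout of merged.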
So here we go with the needed neuropsychological variables:
Variables only measured in one session (0 or 1)
Variables BaseTMTASCR, PreTMTASCR, and PostTMTASCR added
Variables BaseTMTBSCR, PreTMTBSCR, and PostTMTBSCR added
Variables BaseDIGITSYM, PreDIGITSYM, and PostDIGITSYM added
Variables BaseLNSSCORE, PreLNSSCORE, and PostLNSSCORE added
Variables BaseRAVLT1, PreRAVLT1, and PostRAVLT1 added
Variables BaseRAVLTSUM, PreRAVLTSUM, and PostRAVLTSUM added
Variables BaseRAVLTDLY, PreRAVLTDLY, and PostRAVLTDLY added
Variables BaseBUTTONS, PreBUTTONS, and PostBUTTONS added
Variables BaseCHOOSER, PreCHOOSER, and PostCHOOSER added
Variables BaseTAPPER, PreTAPPER, and PostTAPPER added
The Pfizer protocol specifies two composite variables, which we will label Memory and Speed, each the average of three standardized variables. For convenience, we define vectors holding their names:
R> memvars = c("RAVLT1", "RAVLTSUM", "RAVLTDLY")
R> spdvars = c("TMTASCR", "TMTBSCR", "CHOOSER")
Here is a utility function to create these variables. It standardizes with respect to the values in sessions wrt.sess and returns the values for the sessions in out.sess. If gain = TRUE, the session-1 values are subtracted first, and we make sure that session 1 is excluded from wrt.sess.
R> comp = function(vars, wrt.sess = 1:7, out.sess = c(3, 5, 7),R+
R+ }
R> merged$BaseMemory = rep(comp(memvars, out.sess = 1), 3)
R> merged$PreMemory = comp(memvars, out.sess = c(2, 4, 6))
R> merged$PostMemory = comp(memvars)
R> merged$BaseSpeed = rep(comp(spdvars, out.sess = 1), 3)
R> merged$PreSpeed = comp(spdvars, out.sess = c(2, 4, 6))
R> merged$PostSpeed = comp(spdvars)
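The body of comp is not reproduced above. The following minimal sketch of the described computation is hypothetical: it assumes each variable is standardized by its mean and standard deviation pooled over the rows in wrt.sess, that the composite is the plain average of the standardized variables (missing if any component is missing), and that npdata is sorted by SESSION and then ID. `comp_sketch` takes npdata as an argument so that it can be run on toy data.

```r
# Hypothetical sketch of the composite computation, not the original comp.
# Assumes npdata is sorted by SESSION, then ID, with the same subjects in
# every session.
comp_sketch <- function(npdata, vars, wrt.sess = 1:7, out.sess = c(3, 5, 7),
                        gain = FALSE) {
  X <- as.matrix(npdata[vars])
  if (gain) {
    # Subtract each subject's session-1 value, then drop session 1 from
    # the reference sessions
    base <- X[npdata$SESSION == 1, , drop = FALSE]
    X <- X - base[rep(seq_len(nrow(base)), nrow(X) / nrow(base)), , drop = FALSE]
    wrt.sess <- setdiff(wrt.sess, 1)
  }
  # Standardize each variable by its mean and SD over the wrt.sess rows
  ref <- X[npdata$SESSION %in% wrt.sess, , drop = FALSE]
  z <- scale(X, center = colMeans(ref, na.rm = TRUE),
             scale = apply(ref, 2, sd, na.rm = TRUE))
  # Average the standardized variables (NA if any is missing) and keep
  # the rows for the requested sessions
  unname(rowMeans(z)[npdata$SESSION %in% out.sess])
}

np_toy <- data.frame(SESSION = rep(1:3, each = 2), ID = rep(c(1101, 1102), 3),
                     a = c(1, 2, 3, 4, 5, 6), b = c(10, 20, 30, 40, 50, 60))
comp_sketch(np_toy, c("a", "b"), wrt.sess = 1:3, out.sess = 3)
```

Under these assumptions, `comp_sketch(npdata, memvars, out.sess = 1)` would play the role of `comp(memvars, out = 1)`. The sketch applies no sign reversal, so any reorientation of timed scores would have to be added separately.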
The other table has 144 observations; after sorting, the first 72 are data collected on day 4 of each week, and the last 72 are data collected on day 5. The code below reflects this organization in extracting separate variables for days 4 and 5.
R> merged$PreDrowsy4 = other$PreDrowsy[1:72]
R> merged$PreDrowsy5 = other$PreDrowsy[73:144]
R> merged$PreImp4 = other$PreImp[1:72]
R> merged$PreImp5 = other$PreImp[73:144]
R> merged$PostDrowsy4 = other$PostDrowsy[1:72]
R> merged$PostDrowsy5 = other$PostDrowsy[73:144]
R> merged$PostImp4 = other$PostImp[1:72]
R> merged$PostImp5 = other$PostImp[73:144]
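The positional split above depends on the sort order of other. A more defensive alternative selects rows by the Day column and checks the resulting subject order against a reference. This minimal sketch uses `split_by_day`, a hypothetical helper that assumes other has columns Day, Week, and ScreenNo as in the listing above; it is illustrated on made-up data.

```r
# Hypothetical helper, not part of the original program: pull one variable
# for a given day and check that the rows come out in week-then-subject
# order, matching a reference vector of screen numbers.
split_by_day <- function(other, var, day, ref_screen) {
  rows <- other[as.character(other$Day) == as.character(day), ]
  rows <- rows[order(rows$Week, rows$ScreenNo), ]
  stopifnot(identical(as.character(rows$ScreenNo), as.character(ref_screen)))
  rows[[var]]
}

other_toy <- data.frame(
  Day = rep(c(4, 5), each = 4),
  Week = rep(rep(1:2, each = 2), 2),
  ScreenNo = rep(c("1001-1101", "1001-1102"), 4),
  PreDrowsy = c(3, 5, 4, 2, 6, 1, 7, 8)
)
ref <- rep(c("1001-1101", "1001-1102"), 2)
split_by_day(other_toy, "PreDrowsy", 4, ref)   # returns c(3, 5, 4, 2)
split_by_day(other_toy, "PreDrowsy", 5, ref)   # returns c(6, 1, 7, 8)
```

With this document's tables, merged$Screen.Num would be the natural reference vector, so a mismatch between other and merged would stop the run instead of silently misaligning rows.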
In these variables, “pre” and “post” refer to data collected before and after the test on that day (neuropsychological on one day and driving on the other, depending on Order).
Here is a structure listing of the merged dataset.
1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 ...
: Factor w/ 24 levels "1001-1101","1001-1102",..: 1 2 3 4 5 6 7 8 9 10 ...
: Factor w/ 6 levels "1","2","3","4",..: 4 4 3 2 1 1 6 2 2 3 ...
: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
: Factor w/ 2 levels "NS","SN": 2 2 2 2 2 1 1 1 2 1 ...
: Factor w/ 2 levels "4","5": 1 1 1 1 1 2 2 2 1 2 ...
: Factor w/ 2 levels "1","2": 1 2 1 1 1 1 2 1 2 1 ...
-0.993 -1.080 -1.353 -0.888 -0.710 ...
0.0974 0.1168 0.0881 0.0992 0.0975 ...
0.0954 0.1024 0.0836 0.0917 0.0942 ...
: Factor w/ 6 levels "TOP","TPO","OTP",..: 2 1 5 1 2 5 1 3 1 2 ...
: Factor w/ 3 levels "Tol","Oxy","Pbo": 1 1 3 1 1 3 1 2 1 1 ...
121 101 112 118 123 119 106 112 102 118 ...
20.5 25 16 15 19 10 16.5 13 7.5 17 ...
44.6 42.8 51.2 39.4 38 34 54 51.4 49 54 ...
48.6 49.6 47.2 42.2 NA NA 55 50.6 57.8 53 ...
46.2 49 48.2 40 41 39.6 60 49.6 47.4 47 ...
Here are statistical summaries of all the variables:
P1.Avg.Fix.Length P1.Avg.Sacc.Length P2.Steer.Instab
We now save the data both as an R workspace image and as a CSV file that can be imported into practically any statistical program. In the CSV file, missing values will be empty cells.
R> save.image("merged.RData")
R> write.csv(merged, file = "merged.csv", row.names = FALSE, na = "")
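To confirm that the CSV export preserves the layout, the file can be read back and compared with the original. Here is a minimal sketch, using a hypothetical `roundtrip_ok` helper on made-up data, that writes a data frame the same way as above (empty cells for missing values, no row names) and checks that the dimensions and the pattern of missing values survive. It does not compare column types, since factors are read back as character strings in recent versions of R.

```r
# Hypothetical round-trip check: write a data frame as this document does
# and confirm it reads back with the same dimensions and missing cells.
roundtrip_ok <- function(df) {
  f <- tempfile(fileext = ".csv")
  on.exit(unlink(f))
  write.csv(df, file = f, row.names = FALSE, na = "")
  # Empty cells are read back as NA
  back <- read.csv(f, na.strings = "")
  identical(dim(back), dim(df)) && identical(is.na(back), is.na(df))
}

toy <- data.frame(ID = c(1101, 1102, 1104),
                  Trt = factor(c("Tol", NA, "Pbo")),
                  PostTAPPER = c(46.2, NA, 49))
roundtrip_ok(toy)   # TRUE
```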