Taming the Data Beast from “Cleaning Medical Data with R” workshop by Shannon Pileggi, Crystal Lewis and Peter Higgins presented at R/Medicine 2023. Illustrated by Allison Horst.
Outline
Overview of the data harmonisation strategy.
Dealing with small but annoying issues during retrospective data harmonisation.
Suggested reports and diagrams to create for different clients.
Error: Row counts for the two tables did not match.
The `expect_row_count_match()` validation failed beyond the absolute threshold level (1).
* failure level (1) >= failure threshold (1)
A similar approach is used to create a summary report in Word using flextable.
Data Variable Justification Report (Explanation)
Initially, we had a single variable, smoke_history, with the values 0 (Non-smoker), 1 (Current smoker), 2 (Past smoker) and -1 (Unknown).
However, in the early stage of the study, some collaborators could only report whether a patient had a smoking history; they could not specify whether the patient was a current or a past smoker.
To deal with this case, we now use three variables, smoke_history, smoke_current and smoke_past, each holding the values 1 (Yes), 0 (No) and -1 (Unknown).
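The recoding described above can be sketched in base R. This is a minimal illustration, not the project's actual code: the helper recode_smoking() and the example values are hypothetical. The mapping also rests on two assumptions that the text does not state: a current smoker is given smoke_past = 0, and a "history only" record is stored as smoke_history = 1 with the other two flags set to -1.

```r
# Hypothetical helper: split the original single-column coding into three flags.
# Original coding (from the text): 0 = Non-smoker, 1 = Current smoker,
#                                  2 = Past smoker, -1 = Unknown.
# New coding for each flag (from the text): 1 = Yes, 0 = No, -1 = Unknown.
# Assumption: a current smoker gets smoke_past = 0.
recode_smoking <- function(old_code) {
  data.frame(
    smoke_history = ifelse(old_code %in% c(1, 2), 1,
                    ifelse(old_code %in% 0, 0, -1)),
    smoke_current = ifelse(old_code %in% 1, 1,
                    ifelse(old_code %in% c(0, 2), 0, -1)),
    smoke_past    = ifelse(old_code %in% 2, 1,
                    ifelse(old_code %in% c(0, 1), 0, -1))
  )
}

# Records from collaborators who used the original four-value coding.
legacy <- recode_smoking(c(0, 1, 2, -1))

# A record from a collaborator who only knows that the patient has a smoking
# history (assumed representation: history is Yes, the other flags are Unknown).
history_only <- data.frame(
  smoke_history = 1,
  smoke_current = -1,
  smoke_past    = -1
)

# Both kinds of record now fit in the same three-column layout.
harmonised <- rbind(legacy, history_only)
print(harmonised)
```

With three separate flags, a "history only" record no longer has to be forced into the Current or Past category. It keeps the information that is known and marks the rest as Unknown.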