Tackling Formatted Tabular Data from Excel

10th July 2024

Formatted Cell in Excel

Formatted Cell in Excel

Formatted cells are useful for clinicians to make highlight important information. They tend to be well received by people but not so for software.

Here is one called sample_excel.xlsx.

A peak view of a formatted data set.

Formatted Cell in Excel

The column date_issue has two format.

One format is in General

A column named date_issue with some Excel cells formatted as General.

The other format is in Date

A column named date_issue with some Excel cells formatted as Date.

Formatted Cell in Excel

The column colour_weight has two colour format.

  • Cells in black are weight in pounds
  • Cells in green are weight in kilogram

A column named colour_weight with some Excel cells in black font for weight in pounds and in green font for weight in kilogram.

The columns text_integer_issue and text_numeric_issue are numeric columns but some cells were formatted as text. These cells are indicated by the green triangle.

Two columns named text_integer_issue and text_numeric_issue have some cells with a green triangle to indicate that the cells were formatted as text instead of numeric.

Formatted Cell in Excel

The columns numeric_integer_issue and one_or_zero_issue are numeric columns.

  • numeric_integer_issue has only positive integer values
  • one_or_zero_issue has only values 0 and 1 but has missing values on the first few hundred rows

Two columns named numeric_integer_issue which has only positive integer values and one_or_zero_issue which has only values 0 and 1 but has missing values on the first few hundred rows.

Read Data Attempts

Read Data Attempt 1

Tried to read the data using readxl::read_excel. No warning was provided but…

sample_excel_attempt_1 <- readxl::read_excel(
  path = here::here("sample_excel.xlsx"),
  sheet = "Sheet1"
)
A peak view of the data set used.

Read Data Attempt 1

Here is the output

  • date_issue: those formatted as Date have been turned to numbers
  • colour_weight: different colour inputs not differentiated

A column named date_issue with some Excel cells formatted as Date.

Read Data Attempt 1

str(sample_excel_attempt_1)
tibble [1,053 × 7] (S3: tbl_df/tbl/data.frame)
 $ id                                                     : chr [1:1053] "ID0001" "ID0002" "ID0003" "ID0004" ...
 $ date_issue                                             : chr [1:1053] "11/11/2017" "18/10/2017" NA "32/1/2017" ...
 $ colour_weight 
Black in pounds
Green in kilograms: num [1:1053] 92 194 61 165 148 86 84 182 80 78 ...
 $ text_integer_issue                                     : chr [1:1053] "74" "54" "53" "64" ...
 $ text_numeric_issue                                     : chr [1:1053] "1.6" "0.14000000000000001" "0.96" "0.02" ...
 $ numeric_integer_issue                                  : num [1:1053] 1 55 9 2 3 7 1 3 75 23 ...
 $ one_or_zero_issue                                      : logi [1:1053] NA NA NA NA NA NA ...

Good news

  • numeric_integer_issue: column is read correctly as numeric

Bad news

  • text_integer_issue: column turned to text
  • text_numeric_issue: column turned to text
  • one_or_zero_issue: column turned to logical

Read Data Attempt 2

When I read the formatted data indicating the column types, it gives intimidating warnings.

  • id as “text”
  • date_issue as “date”
  • colour_weight as “numeric”
  • text_integer_issue and text_numeric_issue as “numeric”
  • numeric_integer_issue and one_or_zero_issue as “numeric”
sample_excel_attempt_2 <- readxl::read_excel(
  path = here::here("sample_excel.xlsx"),
  sheet = "Sheet1",
  col_types = c("text" , "date", 
                "numeric", "numeric", 
                "numeric", "numeric", 
                "numeric")
)
Warning: Expecting date in B2 / R2C2: got '11/11/2017'
Warning: Expecting date in B3 / R3C2: got '18/10/2017'
Warning: Coercing text to numeric in D3 / R3C4: '54'
Warning: Coercing text to numeric in E4 / R4C5: '0.96'
Warning: Expecting date in B5 / R5C2: got '32/1/2017'
Warning: Coercing text to numeric in E5 / R5C5: '0.02'
Warning: Coercing text to numeric in E6 / R6C5: '0.23'
Warning: Expecting date in B7 / R7C2: got '29/11/1985'
Warning: Coercing text to numeric in D7 / R7C4: '33'
Warning: Coercing text to numeric in D8 / R8C4: '35'
Warning: Coercing text to numeric in D9 / R9C4: '53'
Warning: Coercing text to numeric in D10 / R10C4: '187'
Warning: Coercing text to numeric in E10 / R10C5: '0.06'
Warning: Expecting date in B11 / R11C2: got '21/11/1962'
Warning: Expecting date in B12 / R12C2: got '30/11/2021'
Warning: Expecting date in B14 / R14C2: got '14/11/2016'
Warning: Expecting date in B15 / R15C2: got '26/09/2016'
Warning: Expecting date in B16 / R16C2: got '15/12/1962'
Warning: Expecting date in B17 / R17C2: got '10/08/2016'
Warning: Expecting date in B18 / R18C2: got '18/07/2016'
Warning: Expecting date in B19 / R19C2: got '19/01/2019'
Warning: Expecting date in B20 / R20C2: got '05/03/2015'
Warning: Expecting date in B21 / R21C2: got '27/10/2018'
Warning: Expecting date in B22 / R22C2: got '19/09/2018'
Warning: Expecting date in B23 / R23C2: got '04/06/2018'
Warning: Expecting date in B24 / R24C2: got '23/04/2018'
Warning: Expecting date in B26 / R26C2: got '29/11/2017'
Warning: Expecting date in B27 / R27C2: got '21/1/2010'
Warning: Expecting date in B29 / R29C2: got '02/08/2017'
Warning: Expecting date in B31 / R31C2: got '22/11/2017'
Warning: Expecting date in B32 / R32C2: got '17/03/2018'
Warning: Expecting date in B33 / R33C2: got '18/03/2018'
Warning: Expecting date in B38 / R38C2: got '16/7/2019'
Warning: Expecting date in B39 / R39C2: got '16/7/2019'
Warning: Expecting date in B40 / R40C2: got '15/8/2020'
Warning: Expecting date in B41 / R41C2: got '16/4/2021'
Warning: Expecting date in B42 / R42C2: got '16/12/2022'
Warning: Expecting date in B43 / R43C2: got '21/8/2023'
Warning: Expecting date in B44 / R44C2: got '1/10/2024'
Warning: Expecting date in B45 / R45C2: got '9/11/2025'
Warning: Expecting date in B46 / R46C2: got '30/3/2026'
Warning: Expecting date in B47 / R47C2: got '18/5/1962'
Warning: Expecting date in B48 / R48C2: got '29/10/2018'
Warning: Expecting date in B49 / R49C2: got '29/10/2019'
Warning: Expecting date in B50 / R50C2: got '29/10/2020'
Warning: Expecting date in B51 / R51C2: got '29/10/2021'
Warning: Expecting date in B52 / R52C2: got '11/9/2022'
Warning: Expecting date in B53 / R53C2: got '30/12/2023'
Warning: Expecting date in B54 / R54C2: got '31/3/2024'
Warning: Expecting date in B55 / R55C2: got '29/2/2020'
Warning: Expecting date in B56 / R56C2: got '7/2/1970'
Warning: Expecting date in B57 / R57C2: got '15/5/1999'
Warning: Expecting date in B58 / R58C2: got '13/8/2019'
Warning: Expecting date in B59 / R59C2: got '9/6/1989'
Warning: Expecting date in B60 / R60C2: got '17/1/1970'
Warning: Expecting date in B61 / R61C2: got '1/8/1997'
Warning: Expecting date in B62 / R62C2: got '3/9/2020'
Warning: Expecting date in B63 / R63C2: got '10/01/2007'
Warning: Expecting date in B64 / R64C2: got '24/01/1972'
Warning: Expecting date in B65 / R65C2: got '10/02/1993'
Warning: Expecting date in B66 / R66C2: got '21/09/1981'
Warning: Expecting date in B67 / R67C2: got '03/04/2014'
Warning: Expecting date in B68 / R68C2: got '08/04/2011'
Warning: Expecting date in B69 / R69C2: got '09/06/1990'
Warning: Expecting date in B70 / R70C2: got '08/09/1976'
Warning: Expecting date in B71 / R71C2: got '18/12/2001'
Warning: Expecting date in B72 / R72C2: got '01/06/2004'
Warning: Expecting date in B73 / R73C2: got '12/02/1973'
Warning: Expecting date in B74 / R74C2: got '09/09/2010'
Warning: Expecting date in B75 / R75C2: got '27/03/1981'
Warning: Expecting date in B76 / R76C2: got '16/01/1978'
Warning: Expecting date in B77 / R77C2: got '29/08/1979'
Warning: Expecting date in B78 / R78C2: got '30/08/1997'
Warning: Expecting date in B79 / R79C2: got '20/05/1990'
Warning: Expecting date in B80 / R80C2: got '04/01/1987'
Warning: Expecting date in B81 / R81C2: got '03/08/2014'
Warning: Expecting date in B82 / R82C2: got '17/10/1968'
Warning: Expecting date in B83 / R83C2: got '26/11/1980'
Warning: Expecting date in B84 / R84C2: got '11/12/2019'
Warning: Expecting date in B85 / R85C2: got '15/06/2000'
Warning: Expecting date in B86 / R86C2: got '08/10/1996'
Warning: Expecting date in B87 / R87C2: got '09/01/1964'
Warning: Expecting date in B88 / R88C2: got '13/09/1986'
Warning: Expecting date in B89 / R89C2: got '31/05/1962'
Warning: Expecting date in B90 / R90C2: got '26/10/1960'
Warning: Expecting date in B91 / R91C2: got '05/01/1972'
Warning: Expecting date in B92 / R92C2: got '09/12/1988'
Warning: Expecting date in B93 / R93C2: got '14/08/1976'
Warning: Expecting date in B94 / R94C2: got '14/06/1982'
Warning: Expecting date in B95 / R95C2: got '17/04/1960'
Warning: Expecting date in B96 / R96C2: got '01/05/1992'
Warning: Expecting date in B97 / R97C2: got '01/09/1989'
Warning: Expecting date in B98 / R98C2: got '05/08/1981'
Warning: Expecting date in B99 / R99C2: got '24/04/1986'
Warning: Expecting date in B100 / R100C2: got '07/06/2005'
Warning: Expecting date in B101 / R101C2: got '20/07/1966'
Warning: Expecting date in B102 / R102C2: got '31/07/2007'
Warning: Expecting date in B103 / R103C2: got '19/10/1995'
Warning: Expecting date in B104 / R104C2: got '02/05/2001'
Warning: Expecting date in B105 / R105C2: got '02/01/1961'
Warning: Expecting date in B106 / R106C2: got '06/11/1977'
Warning: Expecting date in B107 / R107C2: got '25/11/1997'
Warning: Expecting date in B108 / R108C2: got '26/08/1962'
Warning: Expecting date in B109 / R109C2: got '21/12/1981'
Warning: Expecting date in B110 / R110C2: got '28/09/1986'
Warning: Expecting date in B111 / R111C2: got '29/03/1986'
Warning: Expecting date in B112 / R112C2: got '28/02/2004'
Warning: Expecting date in B113 / R113C2: got '16/08/1967'
Warning: Expecting date in B114 / R114C2: got '10/03/1962'
Warning: Expecting date in B115 / R115C2: got '23/02/2002'
Warning: Expecting date in B116 / R116C2: got '17/03/2010'
Warning: Expecting date in B117 / R117C2: got '16/09/1975'
Warning: Expecting date in B118 / R118C2: got '04/09/1982'
Warning: Expecting date in B119 / R119C2: got '25/06/1973'
Warning: Expecting date in B120 / R120C2: got '15/07/1991'
Warning: Expecting date in B121 / R121C2: got '11/06/1997'
Warning: Expecting date in B122 / R122C2: got '21/09/2002'
Warning: Expecting date in B123 / R123C2: got '07/11/1974'
Warning: Expecting date in B124 / R124C2: got '04/10/2001'
Warning: Expecting date in B125 / R125C2: got '16/11/2005'
Warning: Expecting date in B126 / R126C2: got '05/07/1980'
Warning: Expecting date in B127 / R127C2: got '01/12/1994'
Warning: Expecting date in B128 / R128C2: got '22/08/1973'
Warning: Expecting date in B129 / R129C2: got '10/07/1990'
Warning: Expecting date in B130 / R130C2: got '09/09/1969'
Warning: Expecting date in B131 / R131C2: got '03/09/2003'
Warning: Expecting date in B132 / R132C2: got '28/01/1962'
Warning: Expecting date in B133 / R133C2: got '18/09/1982'
Warning: Expecting date in B134 / R134C2: got '24/07/1983'
Warning: Expecting date in B135 / R135C2: got '28/05/1999'
Warning: Expecting date in B136 / R136C2: got '14/11/2004'
Warning: Expecting date in B137 / R137C2: got '16/07/1981'
Warning: Expecting date in B138 / R138C2: got '26/01/2016'
Warning: Expecting date in B139 / R139C2: got '16/05/1984'
Warning: Expecting date in B140 / R140C2: got '15/08/2014'
Warning: Expecting date in B141 / R141C2: got '12/05/1987'
Warning: Expecting date in B142 / R142C2: got '16/08/2013'
Warning: Expecting date in B143 / R143C2: got '16/03/2006'
Warning: Expecting date in B144 / R144C2: got '13/01/1964'
Warning: Expecting date in B145 / R145C2: got '31/10/1981'
Warning: Expecting date in B146 / R146C2: got '20/09/2015'
Warning: Expecting date in B147 / R147C2: got '03/04/1965'
Warning: Expecting date in B148 / R148C2: got '20/04/2007'
Warning: Expecting date in B149 / R149C2: got '03/05/1995'
Warning: Expecting date in B150 / R150C2: got '16/09/1988'
Warning: Expecting date in B151 / R151C2: got '08/10/1981'
Warning: Expecting date in B152 / R152C2: got '22/12/1974'
Warning: Expecting date in B153 / R153C2: got '19/02/1964'
Warning: Expecting date in B154 / R154C2: got '12/04/2010'
Warning: Expecting date in B155 / R155C2: got '24/08/1984'
Warning: Expecting date in B156 / R156C2: got '11/07/1963'
Warning: Expecting date in B157 / R157C2: got '17/04/1992'
Warning: Expecting date in B158 / R158C2: got '25/08/1971'
Warning: Expecting date in B159 / R159C2: got '30/08/1983'
Warning: Expecting date in B160 / R160C2: got '19/07/2003'
Warning: Expecting date in B161 / R161C2: got '15/05/1987'
Warning: Expecting date in B162 / R162C2: got '03/11/1976'
Warning: Expecting date in B263 / R263C2: got '25/04/2019'
Warning: Expecting date in B264 / R264C2: got '15/01/1990'
Warning: Expecting date in B265 / R265C2: got '24/12/1965'
Warning: Expecting date in B266 / R266C2: got '24/05/2011'
Warning: Expecting date in B267 / R267C2: got '21/08/1990'
Warning: Expecting date in B268 / R268C2: got '26/04/1980'
Warning: Expecting date in B269 / R269C2: got '22/02/1976'
Warning: Expecting date in B270 / R270C2: got '27/02/1960'
Warning: Expecting date in B271 / R271C2: got '31/12/1964'
Warning: Expecting date in B272 / R272C2: got '06/02/1961'
Warning: Expecting date in B273 / R273C2: got '26/01/1999'
Warning: Expecting date in B274 / R274C2: got '04/12/2010'
Warning: Expecting date in B275 / R275C2: got '20/10/1992'
Warning: Expecting date in B276 / R276C2: got '05/03/1971'
Warning: Expecting date in B277 / R277C2: got '02/12/2019'
Warning: Expecting date in B278 / R278C2: got '09/08/2001'
Warning: Expecting date in B279 / R279C2: got '19/10/1985'
Warning: Expecting date in B280 / R280C2: got '01/04/1964'
Warning: Expecting date in B281 / R281C2: got '23/02/1963'
Warning: Expecting date in B282 / R282C2: got '15/09/1967'
Warning: Expecting date in B283 / R283C2: got '23/09/1969'
Warning: Expecting date in B284 / R284C2: got '21/11/1974'
Warning: Expecting date in B285 / R285C2: got '15/10/1987'
Warning: Expecting date in B286 / R286C2: got '29/05/1986'
Warning: Expecting date in B287 / R287C2: got '28/02/2006'
Warning: Expecting date in B288 / R288C2: got '13/09/1965'
Warning: Expecting date in B289 / R289C2: got '14/12/2018'
Warning: Expecting date in B290 / R290C2: got '23/07/1987'
Warning: Expecting date in B291 / R291C2: got '15/06/1989'
Warning: Expecting date in B292 / R292C2: got '27/07/1983'
Warning: Expecting date in B293 / R293C2: got '01/04/1985'
Warning: Expecting date in B294 / R294C2: got '29/07/1964'
Warning: Expecting date in B295 / R295C2: got '23/02/1996'
Warning: Expecting date in B296 / R296C2: got '19/06/2006'
Warning: Expecting date in B297 / R297C2: got '28/12/1971'
Warning: Expecting date in B298 / R298C2: got '03/04/2010'
Warning: Expecting date in B299 / R299C2: got '08/05/2008'
Warning: Expecting date in B300 / R300C2: got '26/10/2018'
Warning: Expecting date in B301 / R301C2: got '19/12/1996'
Warning: Expecting date in B302 / R302C2: got '27/01/1964'
Warning: Expecting date in B303 / R303C2: got '06/07/1966'
Warning: Expecting date in B304 / R304C2: got '27/08/1999'
Warning: Expecting date in B305 / R305C2: got '13/05/2006'
Warning: Expecting date in B306 / R306C2: got '21/01/1977'
Warning: Expecting date in B307 / R307C2: got '20/10/1996'
Warning: Expecting date in B308 / R308C2: got '08/10/1986'
Warning: Expecting date in B309 / R309C2: got '12/06/1975'
Warning: Expecting date in B310 / R310C2: got '14/08/2015'
Warning: Expecting date in B311 / R311C2: got '13/10/2001'
Warning: Expecting date in B312 / R312C2: got '12/05/2017'
Warning: Expecting date in B313 / R313C2: got '19/11/2016'
Warning: Expecting date in B314 / R314C2: got '09/12/1984'
Warning: Expecting date in B315 / R315C2: got '10/05/2019'
Warning: Expecting date in B316 / R316C2: got '19/07/1973'
Warning: Expecting date in B317 / R317C2: got '25/05/1960'
Warning: Expecting date in B318 / R318C2: got '03/01/1975'
Warning: Expecting date in B319 / R319C2: got '20/11/2002'
Warning: Expecting date in B320 / R320C2: got '27/12/1994'
Warning: Expecting date in B321 / R321C2: got '04/01/1992'
Warning: Expecting date in B322 / R322C2: got '12/09/2017'
Warning: Expecting date in B323 / R323C2: got '15/09/1981'
Warning: Expecting date in B324 / R324C2: got '10/07/1983'
Warning: Expecting date in B325 / R325C2: got '04/04/1998'
Warning: Expecting date in B326 / R326C2: got '03/02/1982'
Warning: Expecting date in B327 / R327C2: got '24/04/1964'
Warning: Expecting date in B328 / R328C2: got '28/11/2012'
Warning: Expecting date in B329 / R329C2: got '22/07/1974'
Warning: Expecting date in B330 / R330C2: got '31/10/2003'
Warning: Expecting date in B331 / R331C2: got '22/10/1998'
Warning: Expecting date in B332 / R332C2: got '27/04/1986'
Warning: Expecting date in B333 / R333C2: got '11/03/1962'
Warning: Expecting date in B334 / R334C2: got '08/06/2010'
Warning: Expecting date in B335 / R335C2: got '21/07/1995'
Warning: Expecting date in B336 / R336C2: got '13/06/1998'
Warning: Expecting date in B337 / R337C2: got '07/12/1983'
Warning: Expecting date in B338 / R338C2: got '11/06/1999'
Warning: Expecting date in B415 / R415C2: got '25/04/1980'
Warning: Expecting date in B416 / R416C2: got '19/04/1976'
Warning: Expecting date in B417 / R417C2: got '01/01/1999'
Warning: Expecting date in B418 / R418C2: got '29/08/1991'
Warning: Expecting date in B419 / R419C2: got '17/07/2005'
Warning: Expecting date in B420 / R420C2: got '11/02/1997'
Warning: Expecting date in B421 / R421C2: got '15/11/1964'
Warning: Expecting date in B422 / R422C2: got '25/03/1965'
Warning: Expecting date in B423 / R423C2: got '15/12/1996'
Warning: Expecting date in B424 / R424C2: got '27/05/1984'
Warning: Expecting date in B425 / R425C2: got '27/11/2004'
Warning: Expecting date in B426 / R426C2: got '07/04/1979'
Warning: Expecting date in B427 / R427C2: got '03/07/2011'
Warning: Expecting date in B428 / R428C2: got '06/04/2003'
Warning: Expecting date in B429 / R429C2: got '22/02/2007'
Warning: Expecting date in B430 / R430C2: got '07/12/2002'
Warning: Expecting date in B431 / R431C2: got '20/11/1967'
Warning: Expecting date in B432 / R432C2: got '27/08/2018'
Warning: Expecting date in B433 / R433C2: got '16/03/2019'
Warning: Expecting date in B434 / R434C2: got '03/02/1975'
Warning: Expecting date in B435 / R435C2: got '19/07/2000'
Warning: Expecting date in B436 / R436C2: got '19/05/1980'
Warning: Expecting date in B437 / R437C2: got '17/02/1974'
Warning: Expecting date in B438 / R438C2: got '17/01/2015'
Warning: Expecting date in B439 / R439C2: got '18/10/1990'
Warning: Expecting date in B440 / R440C2: got '26/07/1991'
Warning: Expecting date in B441 / R441C2: got '04/10/1963'
Warning: Expecting date in B442 / R442C2: got '18/02/2008'
Warning: Expecting date in B443 / R443C2: got '12/11/2009'
Warning: Expecting date in B444 / R444C2: got '26/12/2014'
Warning: Expecting date in B445 / R445C2: got '25/05/2004'
Warning: Expecting date in B446 / R446C2: got '13/07/2015'
Warning: Expecting date in B447 / R447C2: got '02/03/2013'
Warning: Expecting date in B448 / R448C2: got '29/08/1976'
Warning: Expecting date in B449 / R449C2: got '13/06/1990'
Warning: Expecting date in B450 / R450C2: got '17/01/1985'
Warning: Expecting date in B451 / R451C2: got '07/05/1981'
Warning: Expecting date in B452 / R452C2: got '22/11/1964'
Warning: Expecting date in B453 / R453C2: got '29/09/2009'
Warning: Expecting date in B454 / R454C2: got '09/05/1976'
Warning: Expecting date in B455 / R455C2: got '01/07/1966'
Warning: Expecting date in B456 / R456C2: got '05/07/1986'
Warning: Expecting date in B457 / R457C2: got '06/10/1979'
Warning: Expecting date in B458 / R458C2: got '24/10/1967'
Warning: Expecting date in B459 / R459C2: got '01/02/1998'
Warning: Expecting date in B460 / R460C2: got '04/05/1988'
Warning: Expecting date in B461 / R461C2: got '17/12/1991'
Warning: Expecting date in B462 / R462C2: got '28/02/2000'
Warning: Expecting date in B463 / R463C2: got '21/12/1998'
Warning: Expecting date in B464 / R464C2: got '09/03/1979'
Warning: Expecting date in B465 / R465C2: got '26/07/2012'
Warning: Expecting date in B466 / R466C2: got '20/10/2012'
Warning: Expecting date in B467 / R467C2: got '26/03/1995'
Warning: Expecting date in B468 / R468C2: got '18/01/1989'
Warning: Expecting date in B469 / R469C2: got '31/01/1997'
Warning: Expecting date in B470 / R470C2: got '21/01/1981'
Warning: Expecting date in B471 / R471C2: got '07/01/1962'
Warning: Expecting date in B472 / R472C2: got '14/01/1977'
Warning: Expecting date in B473 / R473C2: got '17/04/1980'
Warning: Expecting date in B474 / R474C2: got '04/09/1989'
Warning: Expecting date in B475 / R475C2: got '19/03/2017'
Warning: Expecting date in B476 / R476C2: got '03/01/1993'
Warning: Expecting date in B477 / R477C2: got '27/07/1986'
Warning: Expecting date in B478 / R478C2: got '12/03/1981'
Warning: Expecting date in B479 / R479C2: got '31/05/1976'
Warning: Expecting date in B480 / R480C2: got '06/12/1967'
Warning: Expecting date in B481 / R481C2: got '22/07/1997'
Warning: Expecting date in B482 / R482C2: got '28/02/1988'
Warning: Expecting date in B483 / R483C2: got '03/07/1984'
Warning: Expecting date in B484 / R484C2: got '12/05/1961'
Warning: Expecting date in B485 / R485C2: got '26/04/1984'
Warning: Expecting date in B486 / R486C2: got '20/12/2015'
Warning: Expecting date in B487 / R487C2: got '02/12/1990'
Warning: Expecting date in B488 / R488C2: got '09/02/1961'
Warning: Expecting date in B489 / R489C2: got '03/11/1992'
Warning: Expecting date in B490 / R490C2: got '25/12/1988'
Warning: Expecting date in B491 / R491C2: got '17/04/1967'
Warning: Expecting date in B492 / R492C2: got '25/12/1972'
Warning: Expecting date in B493 / R493C2: got '10/10/1988'
Warning: Expecting date in B494 / R494C2: got '29/01/2007'
Warning: Expecting date in B495 / R495C2: got '22/08/1992'
Warning: Expecting date in B496 / R496C2: got '27/02/1983'
Warning: Expecting date in B497 / R497C2: got '10/11/1962'
Warning: Expecting date in B498 / R498C2: got '30/09/1968'
Warning: Expecting date in B499 / R499C2: got '06/04/1993'
Warning: Expecting date in B500 / R500C2: got '06/06/1987'
Warning: Expecting date in B501 / R501C2: got '17/01/2006'
Warning: Expecting date in B502 / R502C2: got '01/02/2010'
Warning: Expecting date in B595 / R595C2: got '23/06/2010'
Warning: Expecting date in B596 / R596C2: got '05/01/1998'
Warning: Expecting date in B597 / R597C2: got '16/11/1999'
Warning: Expecting date in B598 / R598C2: got '13/03/1985'
Warning: Expecting date in B599 / R599C2: got '16/11/1988'
Warning: Expecting date in B600 / R600C2: got '06/08/2008'
Warning: Expecting date in B601 / R601C2: got '16/01/2005'
Warning: Expecting date in B602 / R602C2: got '12/05/2005'
Warning: Expecting date in B603 / R603C2: got '01/11/1981'
Warning: Expecting date in B604 / R604C2: got '13/08/1986'
Warning: Expecting date in B605 / R605C2: got '31/01/1993'
Warning: Expecting date in B606 / R606C2: got '29/10/1968'
Warning: Expecting date in B607 / R607C2: got '28/07/2010'
Warning: Expecting date in B608 / R608C2: got '30/06/1990'
Warning: Expecting date in B609 / R609C2: got '22/07/1980'
Warning: Expecting date in B610 / R610C2: got '09/06/1966'
Warning: Expecting date in B611 / R611C2: got '21/11/1977'
Warning: Expecting date in B612 / R612C2: got '09/02/1981'
Warning: Expecting date in B613 / R613C2: got '31/07/1984'
Warning: Expecting date in B614 / R614C2: got '26/04/1987'
Warning: Expecting date in B615 / R615C2: got '22/10/1983'
Warning: Expecting date in B616 / R616C2: got '15/09/1989'
Warning: Expecting date in B617 / R617C2: got '07/04/2000'
Warning: Expecting date in B618 / R618C2: got '10/08/2019'
Warning: Expecting date in B619 / R619C2: got '29/04/1977'
Warning: Expecting date in B620 / R620C2: got '29/01/2019'
Warning: Expecting date in B621 / R621C2: got '09/05/1988'
Warning: Expecting date in B622 / R622C2: got '31/07/1999'
Warning: Expecting date in B623 / R623C2: got '05/01/1990'
Warning: Expecting date in B624 / R624C2: got '10/01/2005'
Warning: Expecting date in B625 / R625C2: got '16/12/2014'
Warning: Expecting date in B626 / R626C2: got '25/02/1975'
Warning: Expecting date in B627 / R627C2: got '26/11/1991'
Warning: Expecting date in B628 / R628C2: got '27/04/2013'
Warning: Expecting date in B629 / R629C2: got '29/09/1993'
Warning: Expecting date in B630 / R630C2: got '04/03/1983'
Warning: Expecting date in B631 / R631C2: got '01/05/2004'
Warning: Expecting date in B632 / R632C2: got '15/03/1966'
Warning: Expecting date in B633 / R633C2: got '08/10/1995'
Warning: Expecting date in B634 / R634C2: got '25/12/1990'
Warning: Expecting date in B635 / R635C2: got '30/12/2001'
Warning: Expecting date in B636 / R636C2: got '14/10/1975'
Warning: Expecting date in B637 / R637C2: got '03/05/1988'
Warning: Expecting date in B638 / R638C2: got '01/01/1973'
Warning: Expecting date in B639 / R639C2: got '06/08/1997'
Warning: Expecting date in B640 / R640C2: got '01/02/1968'
Warning: Expecting date in B641 / R641C2: got '12/11/2005'
Warning: Expecting date in B642 / R642C2: got '16/09/1994'
Warning: Expecting date in B643 / R643C2: got '28/03/1991'
Warning: Expecting date in B644 / R644C2: got '22/12/1988'
Warning: Expecting date in B645 / R645C2: got '22/07/1976'
Warning: Expecting date in B646 / R646C2: got '11/11/1999'
Warning: Expecting date in B647 / R647C2: got '30/05/1978'
Warning: Expecting date in B648 / R648C2: got '01/03/1999'
Warning: Expecting date in B649 / R649C2: got '03/05/2012'
Warning: Expecting date in B650 / R650C2: got '09/04/1993'
Warning: Expecting date in B651 / R651C2: got '25/09/1982'
Warning: Expecting date in B652 / R652C2: got '18/06/1985'
Warning: Expecting date in B653 / R653C2: got '17/04/1997'
Warning: Expecting date in B654 / R654C2: got '17/08/1975'
Warning: Expecting date in B655 / R655C2: got '16/08/1961'
Warning: Expecting date in B656 / R656C2: got '09/10/1995'
Warning: Expecting date in B657 / R657C2: got '25/05/1977'
Warning: Expecting date in B658 / R658C2: got '08/06/1996'
Warning: Expecting date in B659 / R659C2: got '09/07/2003'
Warning: Expecting date in B660 / R660C2: got '23/05/2011'
Warning: Expecting date in B661 / R661C2: got '09/01/2010'
Warning: Expecting date in B662 / R662C2: got '13/12/2005'
Warning: Expecting date in B663 / R663C2: got '03/07/1969'
Warning: Expecting date in B664 / R664C2: got '03/03/1996'
Warning: Expecting date in B665 / R665C2: got '29/07/1970'
Warning: Expecting date in B666 / R666C2: got '27/06/1994'
Warning: Expecting date in B667 / R667C2: got '27/11/1986'
Warning: Expecting date in B668 / R668C2: got '15/11/1988'
Warning: Expecting date in B669 / R669C2: got '18/06/2013'
Warning: Expecting date in B670 / R670C2: got '15/09/2004'
Warning: Expecting date in B671 / R671C2: got '11/12/1966'
Warning: Expecting date in B672 / R672C2: got '21/08/1981'
Warning: Expecting date in B673 / R673C2: got '21/02/1989'
Warning: Expecting date in B674 / R674C2: got '04/10/2010'
Warning: Expecting date in B675 / R675C2: got '07/06/2009'
Warning: Expecting date in B676 / R676C2: got '01/07/2017'
Warning: Expecting date in B677 / R677C2: got '03/10/1981'
Warning: Expecting date in B678 / R678C2: got '26/05/1962'
Warning: Expecting date in B679 / R679C2: got '21/11/1990'
Warning: Expecting date in B680 / R680C2: got '10/06/1967'
Warning: Expecting date in B681 / R681C2: got '28/01/2018'
Warning: Expecting date in B682 / R682C2: got '30/05/2007'
Warning: Expecting date in B683 / R683C2: got '27/06/2001'
Warning: Expecting date in B684 / R684C2: got '07/10/1987'
Warning: Expecting date in B685 / R685C2: got '07/05/1971'
Warning: Expecting date in B686 / R686C2: got '28/07/1968'
Warning: Expecting date in B687 / R687C2: got '15/12/2001'
Warning: Expecting date in B688 / R688C2: got '09/08/2013'
Warning: Expecting date in B689 / R689C2: got '18/05/1995'
Warning: Expecting date in B690 / R690C2: got '19/12/2011'
Warning: Expecting date in B691 / R691C2: got '08/05/1980'
Warning: Expecting date in B692 / R692C2: got '22/08/1970'
Warning: Expecting date in B693 / R693C2: got '13/08/2008'
Warning: Expecting date in B694 / R694C2: got '10/07/1964'
Warning: Expecting date in B695 / R695C2: got '22/04/1991'
Warning: Expecting date in B696 / R696C2: got '26/05/2006'
Warning: Expecting date in B697 / R697C2: got '13/12/1973'
Warning: Expecting date in B698 / R698C2: got '08/12/1969'
Warning: Expecting date in B699 / R699C2: got '18/04/1963'
Warning: Expecting date in B700 / R700C2: got '19/10/2001'
Warning: Expecting date in B701 / R701C2: got '29/12/1983'
Warning: Expecting date in B702 / R702C2: got '21/09/1997'
Warning: Expecting date in B703 / R703C2: got '22/01/1987'
Warning: Expecting date in B704 / R704C2: got '19/06/2017'
Warning: Expecting date in B705 / R705C2: got '23/05/2008'
Warning: Expecting date in B706 / R706C2: got '27/01/2018'
Warning: Expecting date in B707 / R707C2: got '08/02/1973'
Warning: Expecting date in B708 / R708C2: got '12/08/1989'
Warning: Expecting date in B709 / R709C2: got '30/01/1968'
Warning: Expecting date in B710 / R710C2: got '02/07/1986'
Warning: Expecting date in B815 / R815C2: got '30/09/1967'
Warning: Expecting date in B816 / R816C2: got '20/05/1960'
Warning: Expecting date in B817 / R817C2: got '21/05/2002'
Warning: Expecting date in B818 / R818C2: got '21/07/2003'
Warning: Expecting date in B819 / R819C2: got '09/07/1990'
Warning: Expecting date in B820 / R820C2: got '26/01/1970'
Warning: Expecting date in B821 / R821C2: got '06/05/1966'
Warning: Expecting date in B822 / R822C2: got '02/03/1980'
Warning: Expecting date in B823 / R823C2: got '08/06/1973'
Warning: Expecting date in B824 / R824C2: got '17/11/1971'
Warning: Expecting date in B825 / R825C2: got '29/10/1967'
Warning: Expecting date in B826 / R826C2: got '08/11/1996'
Warning: Expecting date in B827 / R827C2: got '09/12/1964'
Warning: Expecting date in B828 / R828C2: got '21/09/1977'
Warning: Expecting date in B829 / R829C2: got '19/06/1966'
Warning: Expecting date in B830 / R830C2: got '11/06/1983'
Warning: Expecting date in B831 / R831C2: got '11/02/1985'
Warning: Expecting date in B832 / R832C2: got '24/01/2009'
Warning: Expecting date in B833 / R833C2: got '24/08/1967'
Warning: Expecting date in B834 / R834C2: got '19/09/1970'
Warning: Expecting date in B835 / R835C2: got '04/02/2012'
Warning: Expecting date in B836 / R836C2: got '14/11/1971'
Warning: Expecting date in B837 / R837C2: got '31/07/2010'
Warning: Expecting date in B838 / R838C2: got '09/01/1983'
Warning: Expecting date in B839 / R839C2: got '13/11/1987'
Warning: Expecting date in B840 / R840C2: got '23/08/2001'
Warning: Expecting date in B841 / R841C2: got '08/07/2019'
Warning: Expecting date in B842 / R842C2: got '04/03/1995'
Warning: Expecting date in B843 / R843C2: got '25/03/1984'
Warning: Expecting date in B844 / R844C2: got '27/09/1970'
Warning: Expecting date in B845 / R845C2: got '05/05/1976'
Warning: Expecting date in B846 / R846C2: got '24/07/1990'
Warning: Expecting date in B847 / R847C2: got '22/11/1990'
Warning: Expecting date in B848 / R848C2: got '16/12/2017'
Warning: Expecting date in B849 / R849C2: got '04/12/1979'
Warning: Expecting date in B850 / R850C2: got '04/12/2008'
Warning: Expecting date in B851 / R851C2: got '29/03/1965'
Warning: Expecting date in B852 / R852C2: got '05/07/1966'
Warning: Expecting date in B853 / R853C2: got '28/03/2003'
Warning: Expecting date in B854 / R854C2: got '18/09/2002'
Warning: Expecting date in B855 / R855C2: got '05/07/1998'
Warning: Expecting date in B856 / R856C2: got '23/11/1978'
Warning: Expecting date in B857 / R857C2: got '19/07/2005'
Warning: Expecting date in B858 / R858C2: got '28/01/1960'
Warning: Expecting date in B859 / R859C2: got '22/02/1992'
Warning: Expecting date in B860 / R860C2: got '16/04/1995'
Warning: Expecting date in B861 / R861C2: got '17/02/2008'
Warning: Expecting date in B862 / R862C2: got '03/11/1984'
Warning: Expecting date in B863 / R863C2: got '03/02/2018'
Warning: Expecting date in B864 / R864C2: got '20/01/1967'
Warning: Expecting date in B865 / R865C2: got '02/05/1990'
Warning: Expecting date in B866 / R866C2: got '02/06/1972'
Warning: Expecting date in B867 / R867C2: got '10/11/1960'
Warning: Expecting date in B868 / R868C2: got '11/05/2001'
Warning: Expecting date in B869 / R869C2: got '20/07/2016'
Warning: Expecting date in B870 / R870C2: got '30/05/2008'
Warning: Expecting date in B871 / R871C2: got '03/05/1980'
Warning: Expecting date in B872 / R872C2: got '09/05/1996'
Warning: Expecting date in B873 / R873C2: got '08/11/2001'
Warning: Expecting date in B874 / R874C2: got '24/03/1999'
Warning: Expecting date in B875 / R875C2: got '21/12/2003'
Warning: Expecting date in B876 / R876C2: got '31/12/2007'
Warning: Expecting date in B877 / R877C2: got '22/01/2010'
Warning: Expecting date in B878 / R878C2: got '06/08/2000'
Warning: Expecting date in B879 / R879C2: got '26/05/2013'
Warning: Expecting date in B880 / R880C2: got '15/08/1987'
Warning: Expecting date in B881 / R881C2: got '11/08/2004'
Warning: Expecting date in B882 / R882C2: got '01/10/2016'
Warning: Expecting date in B883 / R883C2: got '09/01/1991'
Warning: Expecting date in B884 / R884C2: got '21/11/1971'
Warning: Expecting date in B885 / R885C2: got '02/09/1991'
Warning: Expecting date in B886 / R886C2: got '28/06/1998'
Warning: Expecting date in B887 / R887C2: got '27/11/2005'
Warning: Expecting date in B888 / R888C2: got '14/02/1974'
Warning: Expecting date in B889 / R889C2: got '19/09/1977'
Warning: Expecting date in B890 / R890C2: got '17/07/2004'
Warning: Expecting date in B891 / R891C2: got '07/07/1997'
Warning: Expecting date in B892 / R892C2: got '21/08/2016'
Warning: Expecting date in B893 / R893C2: got '31/03/1978'
Warning: Expecting date in B894 / R894C2: got '07/03/2005'
Warning: Expecting date in B895 / R895C2: got '26/05/2000'
Warning: Expecting date in B896 / R896C2: got '13/06/1981'
Warning: Expecting date in B897 / R897C2: got '21/03/1987'
Warning: Expecting date in B898 / R898C2: got '20/03/1980'
Warning: Expecting date in B899 / R899C2: got '29/08/2015'
Warning: Expecting date in B900 / R900C2: got '01/01/1960'
Warning: Expecting date in B901 / R901C2: got '15/10/2009'
Warning: Expecting date in B902 / R902C2: got '10/07/1984'
Warning: Expecting date in B903 / R903C2: got '24/05/1968'
Warning: Expecting date in B904 / R904C2: got '14/03/2019'
Warning: Expecting date in B905 / R905C2: got '23/05/2009'
Warning: Expecting date in B906 / R906C2: got '11/08/1981'
Warning: Expecting date in B907 / R907C2: got '28/06/1976'
Warning: Expecting date in B908 / R908C2: got '23/10/1993'
Warning: Expecting date in B909 / R909C2: got '08/11/1960'
Warning: Expecting date in B910 / R910C2: got '01/02/1967'
Warning: Expecting date in B911 / R911C2: got '28/12/2004'
Warning: Expecting date in B912 / R912C2: got '17/03/1982'
Warning: Expecting date in B913 / R913C2: got '07/03/1969'
Warning: Expecting date in B914 / R914C2: got '03/02/1980'
Warning: Expecting date in B999 / R999C2: got '27/05/2007'
Warning: Expecting date in B1000 / R1000C2: got '31/12/1997'
Warning: Expecting date in B1001 / R1001C2: got '11/01/2007'
Warning: Expecting date in B1002 / R1002C2: got '26/07/1993'
Warning: Expecting date in B1003 / R1003C2: got '28/02/1981'
Warning: Expecting date in B1004 / R1004C2: got '04/02/2005'
Warning: Expecting date in B1005 / R1005C2: got '20/07/1997'
Warning: Expecting date in B1006 / R1006C2: got '26/02/1999'
Warning: Expecting date in B1007 / R1007C2: got '21/05/1987'
Warning: Expecting date in B1008 / R1008C2: got '22/01/1991'
Warning: Expecting date in B1009 / R1009C2: got '03/04/2015'
Warning: Expecting date in B1010 / R1010C2: got '15/12/2014'
Warning: Expecting date in B1011 / R1011C2: got '02/01/1967'
Warning: Expecting date in B1012 / R1012C2: got '22/07/2000'
Warning: Expecting date in B1013 / R1013C2: got '10/11/2009'
Warning: Expecting date in B1014 / R1014C2: got '06/02/2018'
Warning: Expecting date in B1015 / R1015C2: got '07/12/1963'
Warning: Expecting date in B1016 / R1016C2: got '12/01/1979'
Warning: Expecting date in B1017 / R1017C2: got '07/04/2018'
Warning: Expecting date in B1018 / R1018C2: got '20/03/2002'
Warning: Expecting date in B1019 / R1019C2: got '13/05/1986'
Warning: Expecting date in B1020 / R1020C2: got '08/03/1990'
Warning: Expecting date in B1021 / R1021C2: got '31/07/1993'
Warning: Expecting date in B1022 / R1022C2: got '11/05/1968'
Warning: Expecting date in B1023 / R1023C2: got '04/03/1969'
Warning: Expecting date in B1024 / R1024C2: got '23/01/1995'
Warning: Expecting date in B1025 / R1025C2: got '26/06/1986'
Warning: Expecting date in B1026 / R1026C2: got '05/04/1996'
Warning: Expecting date in B1027 / R1027C2: got '13/11/1986'
Warning: Expecting date in B1028 / R1028C2: got '24/02/2006'
Warning: Expecting date in B1029 / R1029C2: got '24/07/1972'
Warning: Expecting date in B1030 / R1030C2: got '30/07/2014'
Warning: Expecting date in B1031 / R1031C2: got '12/03/2007'
Warning: Expecting date in B1032 / R1032C2: got '27/08/1962'
Warning: Expecting date in B1033 / R1033C2: got '05/02/1985'
Warning: Expecting date in B1034 / R1034C2: got '30/03/2019'
Warning: Expecting date in B1035 / R1035C2: got '06/04/2019'
Warning: Expecting date in B1036 / R1036C2: got '02/07/1965'
Warning: Expecting date in B1037 / R1037C2: got '03/03/2017'
Warning: Expecting date in B1038 / R1038C2: got '04/05/2014'
Warning: Expecting date in B1039 / R1039C2: got '08/09/2010'
Warning: Expecting date in B1040 / R1040C2: got '10/02/2010'
Warning: Expecting date in B1041 / R1041C2: got '03/02/1969'
Warning: Expecting date in B1042 / R1042C2: got '06/06/1979'
Warning: Expecting date in B1043 / R1043C2: got '29/04/1991'
Warning: Expecting date in B1044 / R1044C2: got '03/12/1975'
Warning: Expecting date in B1045 / R1045C2: got '02/11/1973'
Warning: Expecting date in B1046 / R1046C2: got '18/06/2016'
Warning: Expecting date in B1047 / R1047C2: got '20/04/1991'
Warning: Expecting date in B1048 / R1048C2: got '27/10/2007'
Warning: Expecting date in B1049 / R1049C2: got '28/07/1994'
Warning: Expecting date in B1050 / R1050C2: got '10/08/2004'
Warning: Expecting date in B1051 / R1051C2: got '16/01/1995'
Warning: Expecting date in B1052 / R1052C2: got '16/11/1969'
Warning: Expecting date in B1053 / R1053C2: got '06/08/1995'
Warning: Expecting date in B1054 / R1054C2: got '03/03/2009'

Read Data Attempt 2

Here is the output

Good news

  • numeric_integer_issue: column is read correctly as numeric
  • one_or_zero_issue: column is read correctly as numeric

Bad news

  • date_issue: many rows turn to blank
  • colour_weight: different colour inputs not differentiated

Reflection

From these previous failed attempt, I start to ask these questions and lose confidence in R.

  • Does the id column only have unique values ?
  • Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?
  • Does column numeric_integer_issue only have integer values ?
  • Does column one_or_zero_issue only have values 0 or 1 ?
  • Can we fix date_issue ?
  • Can we fix colour_weight ?

Image of a businessman sitting at office and feeling dull because lack of idea.

Reflection

Resort to manual checking/fixing of formatted cells in excel sheets. However, I realise that this approach is not sustainable.

Thankfully, there are some R packages (pointblank, collateral, tidyxl) that can help

Image of a businessman cleaning up.

Image of R pacakges pointblank, collateral and tidyxl.

Read Data Attempt 3

Read the excel sheet again with the following col_type and deal with each question one at a time.

  • id as “text”
  • date_issue as “list”
  • colour_weight \r\nBlack in pounds\r\nGreen in kilograms as “text”
  • text_integer_issue and text_numeric_issue as “text”
  • numeric_integer_issue and one_or_zero_issue as “numeric”
sample_excel_attempt_3 <- readxl::read_excel(
  path = here::here("sample_excel.xlsx"),
  sheet = "Sheet1",
  col_types = c("text" , "list", 
                "text", "text", "text", 
                "numeric", "numeric")
)

Does the id column only have unique values ?

Does the id column only have unique values ?

Use pointblank::rows_distinct to validate columns that needs to have unique values.

data.frame(id = c("ID_01", "ID_02")) |> 
  pointblank::rows_distinct(columns = "id" )
     id
1 ID_01
2 ID_02
data.frame(id = c("ID_01", "ID_02", "ID_01")) |> 
  pointblank::rows_distinct(columns = "id" )
Error: Exceedance of failed test units where there weren't distinct rows across all columns.
The `rows_distinct()` validation failed beyond the absolute threshold level (1).
* failure level (2) >= failure threshold (1)
id_check <- sample_excel_attempt_3 |> 
  dplyr::select("id") |> 
  pointblank::rows_distinct(columns = "id" )

Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?

Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?

In sample_excel_attempt_2, when I read text_integer_issue as a numeric column, I received some warning message..

  • Warning: Coercing text to numeric in D3 / R3C4: ‘54’
  • Warning: Coercing text to numeric in D7 / R7C4: ‘33’

The warnings inform the user that it sees “54” and “33” in cells D3 and D7 respectively as “text” and it is forced to be converted to numeric

Image showing 54 and 33 in cells D3 and D7 respectively as text.

Important

It may be safer to check if the column truly contain only positive integers even though they are in text, rather than relying on long warning messages.

Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?

Use pointblank::col_vals_regex and pointblank::col_vals_gt to ensure that text in the column text_integer_issue are positive integers.

data.frame(integer_data = c("1", "2.0000", "59", NA)) |> 
  pointblank::col_vals_regex(
    columns = c("integer_data"),
    regex = "^[1-9]([0-9]+)?(.[0]+)?$",
    na_pass = TRUE,
  ) 
  integer_data
1            1
2       2.0000
3           59
4         <NA>
data.frame(integer_data = c(1, 2, 3, NA)) |> 
  pointblank::col_vals_gt(
    columns = c("integer_data"),
    value = 0,
    na_pass = TRUE,    
  )
  integer_data
1            1
2            2
3            3
4           NA
data.frame(integer_data = c("1", "2.0000", "2.1")) |> 
  pointblank::col_vals_regex(
    columns = c("integer_data"),
    regex = "^[1-9]([0-9]+)?(.[0]+)?$",
    na_pass = TRUE,
  ) 
Error: Exceedance of failed test units where values in `integer_data` should have matched the regular expression: `^[1-9]([0-9]+)?(.[0]+)?$`.
The `col_vals_regex()` validation failed beyond the absolute threshold level (1).
* failure level (1) >= failure threshold (1)
data.frame(integer_data = c(-1, 0, 1, 2)) |> 
  pointblank::col_vals_gt(
    columns = c("integer_data"),
    value = 0,
    na_pass = TRUE,    
  )
Error: Exceedance of failed test units where values in `integer_data` should have been > `0`.
The `col_vals_gt()` validation failed beyond the absolute threshold level (1).
* failure level (2) >= failure threshold (1)

Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?

Similarly, I can use the same functions pointblank::col_vals_regex and pointblank::col_vals_gt to ensure that the text in the column text_numeric_issue are positive numbers.

data.frame(numeric_data = c("0.140", "7.07E-2", "2", NA)) |> 
  pointblank::col_vals_regex(
    columns = c("numeric_data"),
    regex = "^[0-9]+((.[0-9]+)?(E(-)?[0-9]+)?)?$",
    na_pass = TRUE,
  ) 
  numeric_data
1        0.140
2      7.07E-2
3            2
4         <NA>
data.frame(numeric_data = c("not numeric", FALSE, "", 2)) |> 
  pointblank::col_vals_regex(
    columns = c("numeric_data"),
    regex = "^[0-9]+((.[0-9]+)?(E(-)?[0-9]+)?)?$",
    na_pass = TRUE,
  ) 
Error: Exceedance of failed test units where values in `numeric_data` should have matched the regular expression: `^[0-9]+((.[0-9]+)?(E(-)?[0-9]+)?)?$`.
The `col_vals_regex()` validation failed beyond the absolute threshold level (1).
* failure level (3) >= failure threshold (1)

Are numbers in characters from columns text_integer_issue and text_numeric_issue read correctly ?

Continue with sample_excel_attempt_3 which reads text_integer_issue and text_numeric_issue column as text.

integer_check_from_text <- sample_excel_attempt_3 |> 
  dplyr::select(c("id","text_integer_issue")) |> 
  pointblank::col_vals_regex(
    columns = c("text_integer_issue"),
    regex = "^[1-9]([0-9]+)?(.[0]+)?$",
    na_pass = TRUE,
  ) |> 
  dplyr::mutate(
    text_integer_issue = as.integer(.data[["text_integer_issue"]])
  ) |> 
  pointblank::col_vals_gt(
    columns = c("text_integer_issue"),
    value = 0,
    na_pass = TRUE,    
  ) |> 
  dplyr::rename(
    text_integer_verified = "text_integer_issue"
  )
numeric_check <- sample_excel_attempt_3 |> 
  dplyr::select(c("id","text_numeric_issue")) |> 
  pointblank::col_vals_regex(
    columns = c("text_numeric_issue"),
    regex = "^[0-9]+((.[0-9]+)?(E(-)?[0-9]+)?)?$",
    na_pass = TRUE,
  ) |> 
  dplyr::mutate(
    text_numeric_issue = as.numeric(.data[["text_numeric_issue"]])
  ) |> 
  pointblank::col_vals_gt(
    columns = c("text_numeric_issue"),
    value = 0,
    na_pass = TRUE,    
  ) |> 
  dplyr::rename(
    text_numeric_verified = "text_numeric_issue"
  )

Does the column numeric_integer_issue only have integer values ?

Does the column numeric_integer_issue only have integer values ?

It may be necessary to check if a numeric column only has integers.

However, I cannot use pointblank::col_vals_regex because the column is not read in text.

Create the function is_integer_vector that returns FALSE when at least one of its element is not an integer.

is_integer_value <- function(input_value,
                             allow_na = FALSE) {

  boolean_result <- FALSE

  # When input value is NA
  if (is.na(input_value)) {
    if (isTRUE(allow_na)) {
      boolean_result <- TRUE
      return(boolean_result)
    } else {
      return(boolean_result)
    }
  }

  # When input value is not numeric
  if (isTRUE(!is.numeric(input_value))) {
    return(boolean_result)
  }

  # When input value is numeric
  boolean_result <- isTRUE(input_value %% 1 == 0)

  return(boolean_result)
}


is_integer_vector <- function(input_vector,
                              allow_na = FALSE) {

  boolean_results <- input_vector |>
    purrr::map_lgl(
      .f = is_integer_value,
      allow_na = allow_na
    )
  return(boolean_results)
}

Does the column numeric_integer_issue only have integer values ?

Similarly, I can use pointblank::col_vals_expr to ensure that the numeric column has only integer using the self-made is_integer_vector function.

integer_data <- data.frame(
  integer_col = c(-1, 0, NA, 2.0000, 3)
)

integer_data |> 
  pointblank::col_vals_expr(
    expr = ~is_integer_vector(
      input_vector = integer_data[["integer_col"]],
      allow_na = TRUE)
  )
  integer_col
1          -1
2           0
3          NA
4           2
5           3
non_integer_data <- data.frame(
  non_integer_col = c(-1, 0, NA, 2.0000, 
                      3.010, pi, exp(1)
                     )
) 

non_integer_data |> 
  pointblank::col_vals_expr(
    expr = ~ is_integer_vector(
      input_vector = non_integer_data[["non_integer_col"]],
      allow_na = TRUE)
  )
Error: The `col_vals_expr()` validation failed beyond the absolute threshold level (1).
* failure level (3) >= failure threshold (1)

Does the column numeric_integer_issue only have integer values ?

Going back to sample_excel_attempt_3 which reads numeric_integer_issue column as numeric, I apply the is_integer_vector function on the numeric_integer_issue column before converting the column to an integer column.

integer_check_from_numeric <- sample_excel_attempt_3 |> 
  dplyr::select(c("id","numeric_integer_issue")) |> 
  pointblank::col_vals_expr(
    expr = ~ is_integer_vector(
      input_vector = sample_excel_attempt_3[["numeric_integer_issue"]],
      allow_na = TRUE)
  ) |> 
  dplyr::mutate(
    numeric_integer_issue = as.integer(.data[["numeric_integer_issue"]]),
  ) |> 
  dplyr::rename(
    numeric_integer_verified = "numeric_integer_issue"
  )

Does the column one_or_zero_issue only have values 0 or 1 ?

Does the column one_or_zero_issue only have values 0 or 1 ?

Use pointblank::col_vals_in_set to ensure that the column only contains values from a user-defined set.

data.frame(one_or_zero_data = c(0, NA, 1)) |> 
  pointblank::col_vals_in_set(
    columns = c("one_or_zero_data"),
    set = c(NA, 0, 1)
  ) 
  one_or_zero_data
1                0
2               NA
3                1
data.frame(one_or_zero_data = c(0, NA, 1, 2)) |> 
  pointblank::col_vals_in_set(
    columns = c("one_or_zero_data"),
    set = c(NA, 0, 1)
  ) 
Error: Exceedance of failed test units where values in `one_or_zero_data` should have been in the set of `NA`, `0`, `1`.
The `col_vals_in_set()` validation failed beyond the absolute threshold level (1).
* failure level (1) >= failure threshold (1)
one_or_zero_check <- sample_excel_attempt_3 |> 
  dplyr::select(c("id","one_or_zero_issue")) |> 
  pointblank::col_vals_in_set(
    columns = c("one_or_zero_issue"),
    set = c(NA, 0, 1)
  ) |> 
  dplyr::rename(
    one_or_zero_verified = "one_or_zero_issue"
  )

Can we fix date_issue ?

Can we fix date_issue ?

First, convert the date columns into a list of character, Date and logical vectors so that data in both Excel General and Date format are preserved.

sample_excel_attempt_3 <- readxl::read_excel(
  path = here::here("sample_excel.xlsx"),
  sheet = "Sheet1",
  col_types = c("text" , "list", 
                "text", "text", "text", 
                "numeric", "numeric")
) |> 
  pointblank::rows_distinct(columns = "id" )


str(head(sample_excel_attempt_3$date_issue))
List of 6
 $ : chr "11/11/2017"
 $ : chr "18/10/2017"
 $ : logi NA
 $ : chr "32/1/2017"
 $ : POSIXct[1:1], format: "1971-01-08"
 $ : chr "29/11/1985"

Can we fix date_issue ?

Next, create a function that convert dates in character vectors into Date objects, convert logical vector to NA and convert Date vectors into the date format that I want.

convert_dmy_text_to_date <- function(input) {
  if (length(class(input)) == 1) {
    if (class(input) == "character") {
      return(as.Date.character(lubridate::dmy(input)))
    } else if (class(input) == "logical") {
      return(NA)
    }
  }
  return(lubridate::as_date(lubridate::ymd(input)))
}

Tip

However, creating function can lead to unexpected warnings and errors. To view these issues, I use some functions from the collateral R package.

Can we fix date_issue ?

Use collateral::map_peacefully to capture function side effects using both purrr::safely() and purrr::quietly().

fixed_date <- sample_excel_attempt_3 |>
  dplyr::select(c("id","date_issue")) |> 
  dplyr::mutate(
    converted_date_log = collateral::map_peacefully(
      .x = .data[["date_issue"]],
      .f = convert_dmy_text_to_date
    ),
    converted_date = purrr::map_vec(
      .x = .data[["converted_date_log"]], 
      .f = "result"
    )
  )

print(head(fixed_date))
# A tibble: 6 × 4
  id     date_issue converted_date_log converted_date
  <chr>  <list>     <collat>           <date>        
1 ID0001 <chr [1]>  R _ _ _ _          2017-11-11    
2 ID0002 <chr [1]>  R _ _ _ _          2017-10-18    
3 ID0003 <lgl [1]>  R _ _ _ _          NA            
4 ID0004 <chr [1]>  R _ _ W _          NA            
5 ID0005 <dttm [1]> R _ _ _ _          1971-01-08    
6 ID0006 <chr [1]>  R _ _ _ _          1985-11-29    

Can we fix date_issue ?

Use collateral::has_warnings and collateral::has_errors to create logical columns which gives TRUE when there are warning or error messages. Use pointblank::test_col_vals_in_set to obtain a single logical value.

fixed_date <- fixed_date |>
  dplyr::mutate(
    warning_check = collateral::has_warnings(.data[["converted_date_log"]]),
    error_check = collateral::has_errors(.data[["converted_date_log"]])
  )

print(head(fixed_date))
# A tibble: 6 × 6
  id     date_issue converted_date_log converted_date warning_check error_check
  <chr>  <list>     <collat>           <date>         <lgl>         <lgl>      
1 ID0001 <chr [1]>  R _ _ _ _          2017-11-11     FALSE         FALSE      
2 ID0002 <chr [1]>  R _ _ _ _          2017-10-18     FALSE         FALSE      
3 ID0003 <lgl [1]>  R _ _ _ _          NA             FALSE         FALSE      
4 ID0004 <chr [1]>  R _ _ W _          NA             TRUE          FALSE      
5 ID0005 <dttm [1]> R _ _ _ _          1971-01-08     FALSE         FALSE      
6 ID0006 <chr [1]>  R _ _ _ _          1985-11-29     FALSE         FALSE      
no_issue <- fixed_date |>
  pointblank::test_col_vals_in_set(
    columns = c("warning_check", "error_check"),
    set = c(FALSE)    
  )

print(no_issue)
[1] FALSE

Can we fix date_issue ?

Isolate rows with issues and output the warning and error messages.

if (!isTRUE(no_issue)) {
  fixed_date |>
    dplyr::filter(
      warning_check == TRUE | error_check == TRUE
    ) |> 
    dplyr::mutate(
      warning_log = purrr::map(
        .x = .data[["converted_date_log"]],
        .f = "warnings",
        .null = NA),
      error_log = purrr::map(
        .x = .data[["converted_date_log"]],
        .f = "errors",
        .null = NA)   
    ) |> 
    reactable::reactable(
      style = list(fontSize = "1rem")
    )
}

Can we fix date_issue ?

Correct the invalid dates accordingly and rerun everything. We just assume that 32/1/2017 was supposed to be 31/1/2017.

fixed_date <- sample_excel_attempt_3 |> 
  dplyr::select(c("id","date_issue")) |> 
  dplyr::mutate(
    date_issue = dplyr::case_when(
      (.data[["id"]] == "ID0004" & 
       .data[["date_issue"]] == "32/1/2017"
      ) ~ list(c("31/1/2017")),
      .default = .data[["date_issue"]]
    ) 
  ) |> 
  dplyr::mutate(
    converted_date_log = collateral::map_peacefully(
      .x = .data[["date_issue"]],
      .f = convert_dmy_text_to_date
    ),
    converted_date = purrr::map_vec(
      .x = .data[["converted_date_log"]], 
      .f = "result"
    ),
    warning_check = collateral::has_warnings(.data[["converted_date_log"]]),
    error_check = collateral::has_errors(.data[["converted_date_log"]]),

  ) |>
  pointblank::col_vals_in_set(
    columns = c("warning_check", "error_check"),
    set = c(FALSE)    
  ) |> 
  dplyr::select(
    c("id", "converted_date")
  ) |> 
  dplyr::rename(
    date_fixed_yyyy_mm_dd = "converted_date"
  )

Can we fix colour_weight ?

Can we fix colour_weight ?

Use tidyxl::xlsx_cells to read the excel file in cells.

cells <- tidyxl::xlsx_cells(
  path = here::here("sample_excel.xlsx"),
  sheet = "Sheet1",
  include_blank_cells = TRUE)

Observe that there is no indication of which row is green or black.

I am only provided with the local_format_id labelled 1 to 15 at the last column.

Can we fix colour_weight ?

Use tidyxl::xlsx_formats to obtain the format information of the excel file in a list.

formats <- tidyxl::xlsx_formats(
  path = here::here("sample_excel.xlsx")
)

Here is a way to view all colours in Hex8 used for all 15 local_format_id

print(formats$local$font$color$rgb)
 [1] "FF000000" "FF000000" "FF00B050" "FF000000" "FF000000" "FF000000"
 [7] "FF000000" "FF000000" "FF000000" "FF000000" NA         NA        
[13] NA         "FF00B050" "FF000000"
unique(formats$local$font$color$rgb)
[1] "FF000000" "FF00B050" NA        

Need to identify which one is black and green.

$local
$local$numFmt
 [1] "General"  "mm-dd-yy" "General"  "General"  "General"  "mm-dd-yy"
 [7] "@"        "General"  "General"  "General"  "mm-dd-yy" "General" 
[13] "General"  "General"  "General" 

$local$font
$local$font$bold
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$font$italic
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$font$underline
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$font$strike
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$font$vertAlign
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$font$size
 [1] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

$local$font$color
$local$font$color$rgb
 [1] "FF000000" "FF000000" "FF00B050" "FF000000" "FF000000" "FF000000"
 [7] "FF000000" "FF000000" "FF000000" "FF000000" NA         NA        
[13] NA         "FF00B050" "FF000000"

$local$font$color$theme
 [1] "text1" "text1" NA      "text1" "text1" "text1" "text1" "text1" "text1"
[10] "text1" NA      NA      NA      NA      "text1"

$local$font$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$font$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


$local$font$name
 [1] "Calibri" "Calibri" "Calibri" "Calibri" "Calibri" "Calibri" "Calibri"
 [8] "Calibri" "Calibri" "Calibri" "Calibri" "Calibri" "Calibri" "Calibri"
[15] "Calibri"

$local$font$family
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

$local$font$scheme
 [1] "minor" "minor" "minor" "minor" "minor" "minor" "minor" "minor" "minor"
[10] "minor" "minor" "minor" "minor" "minor" "minor"


$local$fill
$local$fill$patternFill
$local$fill$patternFill$fgColor
$local$fill$patternFill$fgColor$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$fgColor$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$fgColor$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$fgColor$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


$local$fill$patternFill$bgColor
$local$fill$patternFill$bgColor$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$bgColor$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$bgColor$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$patternFill$bgColor$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


$local$fill$patternFill$patternType
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


$local$fill$gradientFill
$local$fill$gradientFill$type
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$degree
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$left
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$right
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$top
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$bottom
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop1
$local$fill$gradientFill$stop1$position
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop1$color
$local$fill$gradientFill$stop1$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop1$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop1$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop1$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$fill$gradientFill$stop2
$local$fill$gradientFill$stop2$position
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop2$color
$local$fill$gradientFill$stop2$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop2$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop2$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$fill$gradientFill$stop2$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA





$local$border
$local$border$diagonalDown
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$border$diagonalUp
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$border$outline
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$border$left
$local$border$left$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$left$color
$local$border$left$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$left$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$left$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$left$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$right
$local$border$right$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$right$color
$local$border$right$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$right$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$right$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$right$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$start
$local$border$start$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$start$color
$local$border$start$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$start$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$start$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$start$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$end
$local$border$end$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$end$color
$local$border$end$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$end$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$end$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$end$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$top
$local$border$top$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$top$color
$local$border$top$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$top$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$top$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$top$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$bottom
$local$border$bottom$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$bottom$color
$local$border$bottom$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$bottom$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$bottom$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$bottom$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$diagonal
$local$border$diagonal$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$diagonal$color
$local$border$diagonal$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$diagonal$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$diagonal$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$diagonal$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$vertical
$local$border$vertical$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$vertical$color
$local$border$vertical$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$vertical$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$vertical$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$vertical$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA



$local$border$horizontal
$local$border$horizontal$style
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$horizontal$color
$local$border$horizontal$color$rgb
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$horizontal$color$theme
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$horizontal$color$indexed
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

$local$border$horizontal$color$tint
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA




$local$alignment
$local$alignment$horizontal
 [1] "general" "general" "right"   "right"   "left"    "left"    "left"   
 [8] "general" "center"  "right"   "left"    "left"    "left"    "general"
[15] "center" 

$local$alignment$vertical
 [1] "bottom" "center" "center" "center" "bottom" "bottom" "bottom" "bottom"
 [9] "center" "bottom" "bottom" "bottom" "bottom" "bottom" "bottom"

$local$alignment$wrapText
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$alignment$readingOrder
 [1] "context"       "context"       "context"       "context"      
 [5] "context"       "context"       "context"       "context"      
 [9] "context"       "context"       "context"       "context"      
[13] "left-to-right" "context"       "context"      

$local$alignment$indent
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

$local$alignment$justifyLastLine
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$alignment$shrinkToFit
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE

$local$alignment$textRotation
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


$local$protection
$local$protection$locked
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

$local$protection$hidden
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE



$style
$style$numFmt
   Normal 
"General" 

$style$font
$style$font$bold
Normal 
 FALSE 

$style$font$italic
Normal 
 FALSE 

$style$font$underline
Normal 
    NA 

$style$font$strike
Normal 
 FALSE 

$style$font$vertAlign
Normal 
    NA 

$style$font$size
Normal 
    11 

$style$font$color
$style$font$color$rgb
    Normal 
"FF000000" 

$style$font$color$theme
 Normal 
"text1" 

$style$font$color$indexed
Normal 
    NA 

$style$font$color$tint
Normal 
    NA 


$style$font$name
   Normal 
"Calibri" 

$style$font$family
Normal 
     2 

$style$font$scheme
 Normal 
"minor" 


$style$fill
$style$fill$patternFill
$style$fill$patternFill$fgColor
$style$fill$patternFill$fgColor$rgb
Normal 
    NA 

$style$fill$patternFill$fgColor$theme
Normal 
    NA 

$style$fill$patternFill$fgColor$indexed
Normal 
    NA 

$style$fill$patternFill$fgColor$tint
Normal 
    NA 


$style$fill$patternFill$bgColor
$style$fill$patternFill$bgColor$rgb
Normal 
    NA 

$style$fill$patternFill$bgColor$theme
Normal 
    NA 

$style$fill$patternFill$bgColor$indexed
Normal 
    NA 

$style$fill$patternFill$bgColor$tint
Normal 
    NA 


$style$fill$patternFill$patternType
Normal 
    NA 


$style$fill$gradientFill
$style$fill$gradientFill$type
Normal 
    NA 

$style$fill$gradientFill$degree
Normal 
    NA 

$style$fill$gradientFill$left
Normal 
    NA 

$style$fill$gradientFill$right
Normal 
    NA 

$style$fill$gradientFill$top
Normal 
    NA 

$style$fill$gradientFill$bottom
Normal 
    NA 

$style$fill$gradientFill$stop1
$style$fill$gradientFill$stop1$position
Normal 
    NA 

$style$fill$gradientFill$stop1$color
$style$fill$gradientFill$stop1$color$rgb
Normal 
    NA 

$style$fill$gradientFill$stop1$color$theme
Normal 
    NA 

$style$fill$gradientFill$stop1$color$indexed
Normal 
    NA 

$style$fill$gradientFill$stop1$color$tint
Normal 
    NA 



$style$fill$gradientFill$stop2
$style$fill$gradientFill$stop2$position
Normal 
    NA 

$style$fill$gradientFill$stop2$color
$style$fill$gradientFill$stop2$color$rgb
Normal 
    NA 

$style$fill$gradientFill$stop2$color$theme
Normal 
    NA 

$style$fill$gradientFill$stop2$color$indexed
Normal 
    NA 

$style$fill$gradientFill$stop2$color$tint
Normal 
    NA 





$style$border
$style$border$diagonalDown
Normal 
 FALSE 

$style$border$diagonalUp
Normal 
 FALSE 

$style$border$outline
Normal 
 FALSE 

$style$border$left
$style$border$left$style
Normal 
    NA 

$style$border$left$color
$style$border$left$color$rgb
Normal 
    NA 

$style$border$left$color$theme
Normal 
    NA 

$style$border$left$color$indexed
Normal 
    NA 

$style$border$left$color$tint
Normal 
    NA 



$style$border$right
$style$border$right$style
Normal 
    NA 

$style$border$right$color
$style$border$right$color$rgb
Normal 
    NA 

$style$border$right$color$theme
Normal 
    NA 

$style$border$right$color$indexed
Normal 
    NA 

$style$border$right$color$tint
Normal 
    NA 



$style$border$start
$style$border$start$style
Normal 
    NA 

$style$border$start$color
$style$border$start$color$rgb
Normal 
    NA 

$style$border$start$color$theme
Normal 
    NA 

$style$border$start$color$indexed
Normal 
    NA 

$style$border$start$color$tint
Normal 
    NA 



$style$border$end
$style$border$end$style
Normal 
    NA 

$style$border$end$color
$style$border$end$color$rgb
Normal 
    NA 

$style$border$end$color$theme
Normal 
    NA 

$style$border$end$color$indexed
Normal 
    NA 

$style$border$end$color$tint
Normal 
    NA 



$style$border$top
$style$border$top$style
Normal 
    NA 

$style$border$top$color
$style$border$top$color$rgb
Normal 
    NA 

$style$border$top$color$theme
Normal 
    NA 

$style$border$top$color$indexed
Normal 
    NA 

$style$border$top$color$tint
Normal 
    NA 



$style$border$bottom
$style$border$bottom$style
Normal 
    NA 

$style$border$bottom$color
$style$border$bottom$color$rgb
Normal 
    NA 

$style$border$bottom$color$theme
Normal 
    NA 

$style$border$bottom$color$indexed
Normal 
    NA 

$style$border$bottom$color$tint
Normal 
    NA 



$style$border$diagonal
$style$border$diagonal$style
Normal 
    NA 

$style$border$diagonal$color
$style$border$diagonal$color$rgb
Normal 
    NA 

$style$border$diagonal$color$theme
Normal 
    NA 

$style$border$diagonal$color$indexed
Normal 
    NA 

$style$border$diagonal$color$tint
Normal 
    NA 



$style$border$vertical
$style$border$vertical$style
Normal 
    NA 

$style$border$vertical$color
$style$border$vertical$color$rgb
Normal 
    NA 

$style$border$vertical$color$theme
Normal 
    NA 

$style$border$vertical$color$indexed
Normal 
    NA 

$style$border$vertical$color$tint
Normal 
    NA 



$style$border$horizontal
$style$border$horizontal$style
Normal 
    NA 

$style$border$horizontal$color
$style$border$horizontal$color$rgb
Normal 
    NA 

$style$border$horizontal$color$theme
Normal 
    NA 

$style$border$horizontal$color$indexed
Normal 
    NA 

$style$border$horizontal$color$tint
Normal 
    NA 




$style$alignment
$style$alignment$horizontal
   Normal 
"general" 

$style$alignment$vertical
  Normal 
"bottom" 

$style$alignment$wrapText
Normal 
 FALSE 

$style$alignment$readingOrder
   Normal 
"context" 

$style$alignment$indent
Normal 
     0 

$style$alignment$justifyLastLine
Normal 
 FALSE 

$style$alignment$shrinkToFit
Normal 
 FALSE 

$style$alignment$textRotation
Normal 
     0 


$style$protection
$style$protection$locked
Normal 
  TRUE 

$style$protection$hidden
Normal 
 FALSE 

Can we fix colour_weight ?

Using green as the running example, first in excel, click on a cell with green font. Next, click on the drop down button beside the font colour button.

Figure showing where the drop down button beside the font colour button is.

This will give the following output. Next click on More Colors…

Figure showing where the More Colors... button is.

Can we fix colour_weight ?

Go to the Custom tab and extract the hex code saying #00B050 for green.

Figure showing how to get the Hex code.

Next, use https://www.schemecolor.com/?getcolor={hex code} (https://www.schemecolor.com/sample?getcolor=00B050 in our running example) to find out what the Hex8 code is for the green font.

Figure showing how to get the Hex8 code from www.schemecolor.com.

Can we fix colour_weight ?

Identify the local_format_id accordingly with the black and green Hex8 code as #FF000000 and #FF00B050 respectively.

green_font_local_format_id <- which(formats$local$font$color$rgb == "FF00B050")
green_font_local_format_id
[1]  3 14
black_font_local_format_id <- which(formats$local$font$color$rgb == "FF000000")
black_font_local_format_id
 [1]  1  2  4  5  6  7  8  9 10 15

Identify the column index of colour_weight_black_in_pounds_green_in_kilograms. pointblank::row_count_match is used to ensure we have only one row left after filtering.

weight_column_index <- cells |> 
  dplyr::filter(
    .data[["character"]] == "colour_weight \r\nBlack in pounds\r\nGreen in kilograms"
  ) |> 
  pointblank::row_count_match(count = 1) |> 
  dplyr::pull(.data[["col"]])

weight_column_index
[1] 3

Can we fix colour_weight ?

With the column index and local_format_id identified, we can filter the cells data to isolate cells which contain the weight in pounds.

weight_in_pounds <- cells |>
  dplyr::filter(.data[["row"]] != 1) |> 
  dplyr::filter(.data[["col"]] == weight_column_index) |> 
  dplyr::filter(.data[["local_format_id"]] %in% black_font_local_format_id) |> 
  pointblank::col_vals_in_set(columns = c("data_type"), set = c("numeric")) |>   
  dplyr::filter(.data[["data_type"]] == "numeric") |>  
  dplyr::select(c("row", "numeric")) |> 
  dplyr::rename(weight_pounds = "numeric") |> 
  dplyr::mutate(
    weight_kg_converted = janitor::round_half_up(.data[["weight_pounds"]] / 2.2046, digits = 0)
  )

Can we fix colour_weight ?

With the column index and local_format_id identified, we can filter the cells data to isolate cells which contain the weight in kilogram.

weight_in_kg <- cells |>
  dplyr::filter(.data[["row"]] != 1) |> 
  dplyr::filter(.data[["col"]] == weight_column_index) |> 
  dplyr::filter(.data[["local_format_id"]] %in% green_font_local_format_id) |> 
  pointblank::col_vals_in_set(
    columns = c("data_type"),
    set = c("numeric")    
  ) |>  
  dplyr::filter(.data[["data_type"]] == "numeric") |>  
  dplyr::select(c("row", "numeric")) |> 
  dplyr::rename(weight_kg = "numeric")

Can we fix colour_weight ?

Need to extract the id column from cells

id_column_index <- which(
  colnames(sample_excel_attempt_3) == "id"
)

id_cells <- cells |>
  dplyr::filter(.data[["row"]] != 1) |> 
  dplyr::filter(.data[["col"]] == id_column_index) |>
  pointblank::col_vals_in_set(
    columns = c("data_type"),
    set = c("character")
  ) |> 
  dplyr::select(c("row", "character")) |> 
  dplyr::rename(id = "character") 

Can we fix colour_weight ?

Combine the weight data together

fixed_weight <- id_cells |> 
  dplyr::left_join(weight_in_pounds, 
                   by = dplyr::join_by("row"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(weight_in_kg, 
                   by = dplyr::join_by("row"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  tidyr::unite(
    col = "weight_fixed_kg",
    c("weight_kg_converted",
      "weight_kg"),
    remove = TRUE,
    na.rm = TRUE) |> 
  dplyr::select(c("id", "weight_fixed_kg"))

Can we fix colour_weight (alternative) ?

Another approach is to use unheadr::annotate_mf_all but the last column one_or_zero_issue must be removed. Here is the file required: (sample_excel_remove_last.xlsx).

sample_excel_attempt_4 <- unheadr::annotate_mf_all(
  xlfilepath = here::here(
    "sample_excel.xlsx"
  )
)
Error in unheadr::annotate_mf_all(xlfilepath = here::here("sample_excel.xlsx")): Check spreadsheet for blank cells in seemingly empty rows

Figure showing the contents in sample_excel.xlsx with the last column crossed out, indicating it needs to be removed.

sample_excel_attempt_4 <- unheadr::annotate_mf_all(
  xlfilepath = here::here("sample_excel_remove_last.xlsx")
)

Wrapping up

Wrapping up

Combine all fixed and verified columns together

A peak view of the first three columns of a formatted data set. A peak view of the last three columns of formatted data set.

Code
cleaned_data <- sample_excel_attempt_3 |> 
  dplyr::select("id") |> 
  dplyr::left_join(fixed_date, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(fixed_weight, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(integer_check_from_text, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(numeric_check, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(integer_check_from_numeric, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one") |> 
  dplyr::left_join(one_or_zero_check, 
                   by = dplyr::join_by("id"),
                   unmatched = "error",
                   relationship = "one-to-one")

Wrapping up

R packages (pointblank, collateral, tidyxl) can help to validate and tackle some problematic formatted columns in Excel, without resorting to too much manual work.

However, we can see that tidying up formatted Excel files remains challenging, even with R. Hope that this presentation can encourage others to persevere and strive to find/share alternative ways.

Image of R pacakges pointblank, collateral and tidyxl.

Image of a businessman sitting at stack of books, read a book and dreaming about success.