This document helps to explain how the number displayed in the BioPAN software are calculated using a small dataset as an example. The source code of this document can be found in this GitHub respository
BioPAN (1) is a tool found in LIPID MAPS designed to automate biosynthetic pathway analysis of lipids.
Below is a YouTube video on how to use BioPAN.
The documentation of BioPAN provides a technical explanation of how the pathway scores are calculated.
More information can be found in this paper (2)
library(styler)
library(dplyr)
library(magrittr)
library(report)
summary(report::report(sessionInfo()))
The analysis was done using the R Statistical language (v4.1.3; R Core Team, 2022) on Windows 10 x64, using the packages dplyr (v1.0.6), rmarkdown (v2.13), styler (v1.7.0), report (v0.3.0) and magrittr (v2.0.1).
The data used in this tutorial is taken from this paper (3)
To download the data, scroll down to the “Supplementary data” Section and click on “giz061_Supplemental_Files”
Unzip the folder and go to Supplemental Data 6.csv
The data should look like this when opened
For this tutorial, a small subset is used
Description for each column can be found here
Let us consider the pathway of PS(42:8)
to PE(42:8)
to PC(42:8)
.
The dataset are as follows:
<- c(
samples "24h CON", "24h CON", "24h CON",
"24h AA", "24h AA", "24h AA"
)
<- c(
group "CON", "CON", "CON",
"AA", "AA", "AA"
)
<- c(
PS42_8 0.039096671, 0.048629063, 0.033723608,
0.047177176, 0.041427748, 0.043042335
)
<- c(
PE42_8 0.005358486, 0.00509418, 0.00489321,
0.160215808, 0.151076058, 0.166159881
)
<- c(
PC42_8 0.01543998, 0.015312104, 0.015252706,
1.286319709, 1.083053662, 1.261169034
)
<- tibble::tibble(
active_data Sample = samples,
Group = group,
`PS(42:8)` = PS42_8,
`PE(42:8)` = PE42_8,
`PC(42:8)` = PC42_8
)
active_data
Sample | Group | PS(42:8) | PE(42:8) | PC(42:8) |
---|---|---|---|---|
24h CON | CON | 0.0390967 | 0.0053585 | 0.0154400 |
24h CON | CON | 0.0486291 | 0.0050942 | 0.0153121 |
24h CON | CON | 0.0337236 | 0.0048932 | 0.0152527 |
24h AA | AA | 0.0471772 | 0.1602158 | 1.2863197 |
24h AA | AA | 0.0414277 | 0.1510761 | 1.0830537 |
24h AA | AA | 0.0430423 | 0.1661599 | 1.2611690 |
PS(42:8)
to PE(42:8)
Compute the weight for product PE(42:8)
over reactant PS(42:8)
for each sample.
<- (active_data$`PE(42:8)` / active_data$`PS(42:8)`)
weights <- active_data %>%
active_data ::mutate(`PE(42:8) over PS(42:8)` = weights)
dplyr
%>%
active_data ::select(-c(.data$`PC(42:8)`)) dplyr
Sample | Group | PS(42:8) | PE(42:8) | PE(42:8) over PS(42:8) |
---|---|---|---|---|
24h CON | CON | 0.0390967 | 0.0053585 | 0.1370573 |
24h CON | CON | 0.0486291 | 0.0050942 | 0.1047559 |
24h CON | CON | 0.0337236 | 0.0048932 | 0.1450975 |
24h AA | AA | 0.0471772 | 0.1602158 | 3.3960449 |
24h AA | AA | 0.0414277 | 0.1510761 | 3.6467360 |
24h AA | AA | 0.0430423 | 0.1661599 | 3.8603826 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- active_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PE(42:8) over PS(42:8)`)
dplyr
<- active_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PE(42:8) over PS(42:8)`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "greater")
t1
::report(t1) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 3.63, mean of y = 0.13) suggests that the effect is positive, statistically significant, and large (difference = 3.51, 95% CI [3.12, Inf], t(2.03) = 26.01, p < .001; Cohen’s d = 21.24, 95% CI [3.37, 39.33])
cat(paste("$p$ value is", format(t1$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 6.752038e-04
Convert the \(p\) value into a \(Z\) Score.
This is also the pathway score for PS(42:8)
to PE(42:8)
.
<- qnorm(1 - t1$p.value)
z_score1 cat(paste("$Z$ score for `PS(42:8)` to `PE(42:8)` is", format(z_score1, nsmall = 3)))
\(Z\) score for PS(42:8)
to PE(42:8)
is 3.205046
PE(42:8)
to PC(42:8)
Compute the weight for product PC(42:8)
over reactant PE(42:8)
for each sample.
<- (active_data$`PC(42:8)` / active_data$`PE(42:8)`)
weights <- active_data %>%
active_data ::mutate(`PC(42:8) over PE(42:8)` = weights)
dplyr
%>%
active_data ::select(-c(.data$`PS(42:8)`, .data$`PE(42:8) over PS(42:8)`)) dplyr
Sample | Group | PE(42:8) | PC(42:8) | PC(42:8) over PE(42:8) |
---|---|---|---|---|
24h CON | CON | 0.0053585 | 0.0154400 | 2.881407 |
24h CON | CON | 0.0050942 | 0.0153121 | 3.005803 |
24h CON | CON | 0.0048932 | 0.0152527 | 3.117117 |
24h AA | AA | 0.1602158 | 1.2863197 | 8.028669 |
24h AA | AA | 0.1510761 | 1.0830537 | 7.168930 |
24h AA | AA | 0.1661599 | 1.2611690 | 7.590094 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- active_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PC(42:8) over PE(42:8)`)
dplyr
<- active_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PC(42:8) over PE(42:8)`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "greater")
t2 ::report(t2) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 7.60, mean of y = 3.00) suggests that the effect is positive, statistically significant, and large (difference = 4.59, 95% CI [3.91, Inf], t(2.30) = 17.85, p < .001; Cohen’s d = 14.58, 95% CI [2.70, 27.26])
cat(paste("$p$ value is", format(t2$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 8.112954e-04
Convert the \(p\) value into a \(Z\) score.
This is also the pathway score for PE(42:8)
to PC(42:8)
.
<- qnorm(1 - t2$p.value)
z_score2 cat(paste("$Z$ score for `PE(42:8)` to `PC(42:8)` is", format(z_score2, nsmall = 3)))
\(Z\) score for PE(42:8)
to PC(42:8)
is 3.151815
Compute \(Z_{A}\) for pathway PS(42:8)
to PE(42:8)
to PC(42:8)
.
Recall the formula is defined as:
where \(k\) is 2 and \(Z_{i}\) are the pathway scores PS(42:8)
to PE(42:8)
and PE(42:8)
to PC(42:8)
computed earlier.
<- (1 / sqrt(2)) * (z_score1 + z_score2)
z_a cat(paste("$Z_{A}$ is", format(z_a, nsmall = 3)))
\(Z_{A}\) is 4.49498
With this settings,
Since \(Z_{A} > 1.645\), the pathway is classified as active.
Let us consider the pathway of PS(32:0)
to PE(32:0)
to PC(32:0)
.
The dataset are as follows:
<- c(
samples "24h CON", "24h CON", "24h CON",
"24h AA", "24h AA", "24h AA"
)
<- c(
group "CON", "CON", "CON",
"AA", "AA", "AA"
)
<- c(
PS32_0 0.11392858, 0.080762026, 0.128541348,
0.656895224, 0.800790573, 0.592724899
)
<- c(
PE32_0 0.046063214, 0.043759251, 0.047335343,
0.175927791, 0.183855506, 0.194325215
)
<- c(
PC32_0_1 1.074150848, 0.726798053, 0.412228743,
1.94494173, 1.520645133, 1.337827826
)
<- c(
PC32_0_2 0.928888882, 0.964353269, 0.789633482,
1.666428947, 1.375398801, 1.616097007
)
<- tibble::tibble(
suppressed_data Sample = samples,
Group = group,
`PS(32:0)` = PS32_0,
`PE(32:0)` = PE32_0,
`PC(32:0) 1` = PC32_0_1,
`PC(32:0) 2` = PC32_0_2
)
suppressed_data
Sample | Group | PS(32:0) | PE(32:0) | PC(32:0) 1 | PC(32:0) 2 |
---|---|---|---|---|---|
24h CON | CON | 0.1139286 | 0.0460632 | 1.0741508 | 0.9288889 |
24h CON | CON | 0.0807620 | 0.0437593 | 0.7267981 | 0.9643533 |
24h CON | CON | 0.1285413 | 0.0473353 | 0.4122287 | 0.7896335 |
24h AA | AA | 0.6568952 | 0.1759278 | 1.9449417 | 1.6664289 |
24h AA | AA | 0.8007906 | 0.1838555 | 1.5206451 | 1.3753988 |
24h AA | AA | 0.5927249 | 0.1943252 | 1.3378278 | 1.6160970 |
BioPAN will give a warning on the duplicated transition PC(32:0)
and sum them up.
<- tibble::tibble(
suppressed_data Sample = samples,
Group = group,
`PS(32:0)` = PS32_0,
`PE(32:0)` = PE32_0,
`PC(32:0)` = PC32_0_1 + PC32_0_2
)
suppressed_data
Sample | Group | PS(32:0) | PE(32:0) | PC(32:0) |
---|---|---|---|---|
24h CON | CON | 0.1139286 | 0.0460632 | 2.003040 |
24h CON | CON | 0.0807620 | 0.0437593 | 1.691151 |
24h CON | CON | 0.1285413 | 0.0473353 | 1.201862 |
24h AA | AA | 0.6568952 | 0.1759278 | 3.611371 |
24h AA | AA | 0.8007906 | 0.1838555 | 2.896044 |
24h AA | AA | 0.5927249 | 0.1943252 | 2.953925 |
PS(32:0)
to PE(32:0)
Compute the weight for product PE(32:0)
over reactant PS(32:0)
for each sample.
<- (suppressed_data$`PE(32:0)` / suppressed_data$`PS(32:0)`)
weights <- suppressed_data %>%
suppressed_data ::mutate(`PE(32:0) over PS(32:0)` = weights)
dplyr
%>%
suppressed_data ::select(-c(.data$`PC(32:0)`)) dplyr
Sample | Group | PS(32:0) | PE(32:0) | PE(32:0) over PS(32:0) |
---|---|---|---|---|
24h CON | CON | 0.1139286 | 0.0460632 | 0.4043166 |
24h CON | CON | 0.0807620 | 0.0437593 | 0.5418295 |
24h CON | CON | 0.1285413 | 0.0473353 | 0.3682499 |
24h AA | AA | 0.6568952 | 0.1759278 | 0.2678171 |
24h AA | AA | 0.8007906 | 0.1838555 | 0.2295925 |
24h AA | AA | 0.5927249 | 0.1943252 | 0.3278506 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- suppressed_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PE(32:0) over PS(32:0)`)
dplyr
<- suppressed_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PE(32:0) over PS(32:0)`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "less")
t1 ::report(t1) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 0.28, mean of y = 0.44) suggests that the effect is negative, statistically significant, and large (difference = -0.16, 95% CI [-Inf, -0.02], t(3.08) = -2.71, p < .05; Cohen’s d = -2.21, 95% CI [-4.46, 0.16])
cat(paste("$p$ value is", format(t1$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 3.551733e-02
Convert the \(p\) value into a \(Z\) score.
This is also the pathway score for PS(32:0)
to PE(32:0)
.
<- qnorm(1 - t1$p.value)
z_score1 cat(paste("$Z$ score for`PS(32:0)` to `PE(32:0)` is", format(z_score1, nsmall = 3)))
\(Z\) score forPS(32:0)
to PE(32:0)
is 1.805256
PE(32:0)
to PC(32:0)
Compute the weight for product PC(32:0)
over reactant PE(32:0)
for each sample.
<- (suppressed_data$`PC(32:0)` / suppressed_data$`PE(32:0)`)
weights <- suppressed_data %>%
suppressed_data ::mutate(`PC(32:0) over PE(32:0)` = weights)
dplyr
%>%
suppressed_data ::select(-c(.data$`PS(32:0)`, .data$`PE(32:0) over PS(32:0)`)) dplyr
Sample | Group | PE(32:0) | PC(32:0) | PC(32:0) over PE(32:0) |
---|---|---|---|---|
24h CON | CON | 0.0460632 | 2.003040 | 43.48458 |
24h CON | CON | 0.0437593 | 1.691151 | 38.64672 |
24h CON | CON | 0.0473353 | 1.201862 | 25.39038 |
24h AA | AA | 0.1759278 | 3.611371 | 20.52757 |
24h AA | AA | 0.1838555 | 2.896044 | 15.75174 |
24h AA | AA | 0.1943252 | 2.953925 | 15.20093 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- suppressed_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PC(32:0) over PE(32:0)`)
dplyr
<- suppressed_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PC(32:0) over PE(32:0)`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "less")
t2 ::report(t2) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 17.16, mean of y = 35.84) suggests that the effect is negative, statistically significant, and large (difference = -18.68, 95% CI [-Inf, -3.83], t(2.39) = -3.30, p < .05; Cohen’s d = -2.69, 95% CI [-5.41, 0.12])
cat(paste("$p$ value is", format(t2$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 3.175951e-02
Convert the \(p\) value into a \(Z\) score.
This is also the pathway score for PE(32:0)
to PC(32:0)
.
<- qnorm(1 - t2$p.value)
z_score2 cat(paste("$Z$ score for `PE(32:0)` to `PC(32:0)` is", format(z_score2, nsmall = 3)))
\(Z\) score for PE(32:0)
to PC(32:0)
is 1.855541
Compute \(Z_{A}\) for pathway PS(32:0)
to PE(32:0)
to PC(32:0)
.
Recall the formula is defined as:
where \(k\) is 2 and \(Z_{i}\) are the pathway scores PS(32:0)
to PE(32:0)
and PE(32:0)
to PC(32:0)
computed earlier.
<- (1 / sqrt(2)) * (z_score1 + z_score2)
z_a cat(paste("$Z_{A}$ is", format(z_a, nsmall = 3)))
\(Z_{A}\) is 2.588574
With this settings,
Since \(Z_{A} > 1.645\), the pathway is classified as suppressed.
Let us consider the pathway of PC
to PS
to PE
.
The dataset are as follows:
<- c(
samples "24h CON", "24h CON", "24h CON",
"24h AA", "24h AA", "24h AA"
)
<- c(
group "CON", "CON", "CON",
"AA", "AA", "AA"
)
<- c(
PS42_8 0.039096671, 0.048629063, 0.033723608,
0.047177176, 0.041427748, 0.043042335
)
<- c(
PE42_8 0.005358486, 0.00509418, 0.00489321,
0.160215808, 0.151076058, 0.166159881
)
<- c(
PC42_8 0.01543998, 0.015312104, 0.015252706,
1.286319709, 1.083053662, 1.261169034
)
<- c(
PS32_0 0.11392858, 0.080762026, 0.128541348,
0.656895224, 0.800790573, 0.592724899
)
<- c(
PE32_0 0.046063214, 0.043759251, 0.047335343,
0.175927791, 0.183855506, 0.194325215
)
<- c(
PC32_0_1 1.074150848, 0.726798053, 0.412228743,
1.94494173, 1.520645133, 1.337827826
)
<- c(
PC32_0_2 0.928888882, 0.964353269, 0.789633482,
1.666428947, 1.375398801, 1.616097007
)
<- tibble::tibble(
active_data Sample = samples,
Group = group,
PS = PS42_8 + PS32_0,
PE = PE42_8 + PE32_0,
PC = PC42_8 + PC32_0_1 + PC32_0_2
)
active_data
Sample | Group | PS | PE | PC |
---|---|---|---|---|
24h CON | CON | 0.1530253 | 0.0514217 | 2.018480 |
24h CON | CON | 0.1293911 | 0.0488534 | 1.706463 |
24h CON | CON | 0.1622650 | 0.0522286 | 1.217115 |
24h AA | AA | 0.7040724 | 0.3361436 | 4.897690 |
24h AA | AA | 0.8422183 | 0.3349316 | 3.979098 |
24h AA | AA | 0.6357672 | 0.3604851 | 4.215094 |
PC
to PS
Compute the weight for product PS
over reactant PC
for each sample.
<- (active_data$`PS` / active_data$`PC`)
weights <- active_data %>%
active_data ::mutate(`PS over PC` = weights)
dplyr
%>%
active_data ::select(-c(.data$`PE`)) dplyr
Sample | Group | PS | PC | PS over PC |
---|---|---|---|---|
24h CON | CON | 0.1530253 | 2.018480 | 0.0758121 |
24h CON | CON | 0.1293911 | 1.706463 | 0.0758241 |
24h CON | CON | 0.1622650 | 1.217115 | 0.1333193 |
24h AA | AA | 0.7040724 | 4.897690 | 0.1437560 |
24h AA | AA | 0.8422183 | 3.979098 | 0.2116606 |
24h AA | AA | 0.6357672 | 4.215094 | 0.1508311 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- active_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PS over PC`)
dplyr
<- active_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PS over PC`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "greater")
t1 ::report(t1) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 0.17, mean of y = 0.09) suggests that the effect is positive, statistically significant, and large (difference = 0.07, 95% CI [0.01, Inf], t(3.95) = 2.56, p < .05; Cohen’s d = 2.09, 95% CI [-0.11, 4.15])
cat(paste("$p$ value is", format(t1$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 3.181857e-02
Convert the \(p\) value into a \(Z\) score.
This is also the pathway score for PC
to PS
.
<- qnorm(1 - t1$p.value)
z_score1 cat(paste("$Z$ score for `PC` to `PS` is", format(z_score1, nsmall = 3)))
\(Z\) score for PC
to PS
is 1.854714
PS
to PE
Compute the weight for product PE
over reactant PS
for each sample.
<- (active_data$`PE` / active_data$`PS`)
weights <- active_data %>%
active_data ::mutate(`PE over PS` = weights)
dplyr
%>%
active_data ::select(-c(.data$`PC`, .data$`PS over PC`)) dplyr
Sample | Group | PS | PE | PE over PS |
---|---|---|---|---|
24h CON | CON | 0.1530253 | 0.0514217 | 0.3360341 |
24h CON | CON | 0.1293911 | 0.0488534 | 0.3775641 |
24h CON | CON | 0.1622650 | 0.0522286 | 0.3218720 |
24h AA | AA | 0.7040724 | 0.3361436 | 0.4774276 |
24h AA | AA | 0.8422183 | 0.3349316 | 0.3976778 |
24h AA | AA | 0.6357672 | 0.3604851 | 0.5670080 |
Compute a one-sided Welch \(t\)-test between the samples of interest (Group AA) and the control samples (Group CON).
<- active_data %>%
aa_samples ::filter(.data$Group == "AA") %>%
dplyr::pull(.data$`PE over PS`)
dplyr
<- active_data %>%
control_samples ::filter(.data$Group == "CON") %>%
dplyr::pull(.data$`PE over PS`)
dplyr
<- t.test(aa_samples, control_samples, alternative = "greater")
t2 ::report(t2) report
Effect sizes were labelled following Cohen’s (1988) recommendations.
The Welch Two Sample t-test testing the difference between aa_samples and control_samples (mean of x = 0.48, mean of y = 0.35) suggests that the effect is positive, statistically significant, and large (difference = 0.14, 95% CI [2.24e-03, Inf], t(2.46) = 2.62, p < .05; Cohen’s d = 2.14, 95% CI [-0.31, 4.46])
cat(paste("$p$ value is", format(t2$p.value, scientific = TRUE, nsmall = 3)))
\(p\) value is 4.841056e-02
Convert the \(p\) value into a \(Z\) score.
This is also the pathway score for PS
to PE
.
<- qnorm(1 - t2$p.value)
z_score2 cat(paste("$Z$ score for `PS` to `PE` is", format(z_score2, nsmall = 3)))
\(Z\) score for PS
to PE
is 1.660464
Compute \(Z_{A}\) for pathway PC
to PS
to PE
.
Recall the formula is defined as:
where \(k\) is 2 and \(Z_{i}\) are the pathway scores PC
to PS
and PS
to PE
computed earlier.
<- (1 / sqrt(2)) * (z_score1 + z_score2)
z_a cat(paste("$Z_{A}$ is", format(z_a, nsmall = 4)))
\(Z_{A}\) is 2.485606
With this settings,
Since \(Z_{A} > 1.645\), the pathway is classified as active.
This Rmarkdown template is created by the Reality Bending Lab. The template can be download from the lab’s github repository. For more information about the motivation behind creating this template, check out Dr. Dominique Makowski’s blog post
::cite_packages(sessionInfo()) report