This is a function to derive the case, control, and total MAFs
from GWAS summary statistics when the user has access to the sample sizes,
and the OR (or beta), and SE for the log(OR) for each variant.
If user has total AF instead of SE use CaseControl_AF()
This code uses the GroupFreq function adapted from C from
https://github.com/Paschou-Lab/ReAct/blob/main/GrpPRS_src/CountConstruct.c
CaseControl_SE(
data,
N_case = 0,
N_control = 0,
OR_colname = "OR",
SE_colname = "SE",
chromosome_colname = "chr",
sex_chromosomes = FALSE,
position_colname = "pos",
N_XX_case = NA,
N_XX_control = NA,
N_XY_case = NA,
N_XY_control = NA,
do_correction = FALSE,
correction_data = NA,
remove_sex_chromosomes = TRUE,
verbose = FALSE
)
dataframe where each row is a variant and columns contain the OR, SE, chromosome and positions
an integer of the number of Case individuals
an integer of the number of Control individuals
a string containing the exact column name in 'data' with the OR
a string containing the exact column name in 'data' with the SE
a string containing the exact column name in 'data' with the chromosomes, default "chr"
boolean, TRUE if variants from sex chromosomes are included in the dataset. Sex chromosomes can be numeric (23, 24) or character (X, Y). If numeric, assumes X=23 and Y=24.
a string containing the exact column name in 'data' with the position, default "pos"
the number of XX chromosome case individuals (REQUIRED if sex_chromosomes == TRUE)
the number of XX chromosome control individuals (REQUIRED if sex_chromosomes == TRUE)
the number of XY chromosome case individuals (REQUIRED if sex_chromosomes == TRUE)
the number of XY chromosome control individuals (REQUIRED if sex_chromosomes == TRUE)
boolean, TRUE if data is provided to perform correction
a dataframe with the following exact columns: CHR, POS, proxy_MAF with data that is harmonized between the proxy true datasets and the observed dataset
boolean, TRUE if should keep autosomes only. This is needed when the number of biological sex males/females per case and control group is not known.
boolean, determine whether warnings should be displayed (default FALSE)
returns data as a dataframe with three additional columns: MAF_case, MAF_control, MAF_total for the estimated MAFs for each variant. If do_correction = TRUE, then will output 3 additional columns (MAF_case_adj, MAF_control_adj, MAF_total_adj) with the adjusted estimates.
https://github.com/wolffha/CCAFE
https://github.com/wolffha/CCAFE for further documentation
library(CCAFE)
data("sampleDat")
sampleDat <- as.data.frame(sampleDat)
nCase_sample = 16550
nControl_sample = 403923
# get the estimated case and control MAFs
se_method_results <- CaseControl_SE(data = sampleDat,
N_case = nCase_sample,
N_control = nControl_sample,
OR_colname = "OR",
SE_colname = "SE",
chromosome_colname = "CHR",
position_colname = "POS")
head(se_method_results)
#> CHR POS REF ALT true_maf_case true_maf_control beta SE
#> 1 chr1 226824710 C T 2.048e-01 2.041e-01 0.0003441 0.01471
#> 2 chr1 117812346 G A 4.480e-01 4.471e-01 -0.0013920 0.01196
#> 3 chr1 230838863 C T 1.647e-01 1.642e-01 0.0042250 0.01620
#> 4 chr1 93121792 ATT A 0.000e+00 0.000e+00 -0.1058000 3.53000
#> 5 chr1 240236388 G A 1.114e-05 3.775e-05 -1.2540000 1.25300
#> 6 chr1 12385196 G A 1.390e-02 1.402e-02 -0.0152700 0.05046
#> gnomad_maf OR true_maf_pop MAF_case MAF_control MAF_total
#> 1 2.04050e-01 1.0003442 2.041276e-01 1.764905e-01 1.764405e-01 1.764425e-01
#> 2 4.45508e-01 0.9986090 4.471354e-01 3.263835e-01 3.266896e-01 3.266776e-01
#> 3 1.82612e-01 1.0042339 1.642197e-01 1.392353e-01 1.387297e-01 1.387496e-01
#> 4 1.07925e-04 0.8996046 0.000000e+00 2.513875e-06 2.794421e-06 2.783379e-06
#> 5 8.82145e-05 0.2853611 3.670262e-05 1.946826e-05 6.821993e-05 6.630105e-05
#> 6 1.15995e-02 0.9848460 1.401528e-02 1.250050e-02 1.269041e-02 1.268294e-02