This is a function to derive the case, control, and total MAFs from GWAS summary statistics when the user has access to the sample sizes, and the OR (or beta), and SE for the log(OR) for each variant. If user has total AF instead of SE use CaseControl_AF() This code uses the GroupFreq function adapted from C from https://github.com/Paschou-Lab/ReAct/blob/main/GrpPRS_src/CountConstruct.c

CaseControl_SE(
  data,
  N_case = 0,
  N_control = 0,
  OR_colname = "OR",
  SE_colname = "SE",
  chromosome_colname = "chr",
  sex_chromosomes = FALSE,
  position_colname = "pos",
  N_XX_case = NA,
  N_XX_control = NA,
  N_XY_case = NA,
  N_XY_control = NA,
  do_correction = FALSE,
  correction_data = NA,
  remove_sex_chromosomes = TRUE,
  verbose = FALSE
)

Arguments

data

dataframe where each row is a variant and columns contain the OR, SE, chromosome and positions

N_case

an integer of the number of Case individuals

N_control

an integer of the number of Control individuals

OR_colname

a string containing the exact column name in 'data' with the OR

SE_colname

a string containing the exact column name in 'data' with the SE

chromosome_colname

a string containing the exact column name in 'data' with the chromosomes, default "chr"

sex_chromosomes

boolean, TRUE if variants from sex chromosomes are included in the dataset. Sex chromosomes can be numeric (23, 24) or character (X, Y). If numeric, assumes X=23 and Y=24.

position_colname

a string containing the exact column name in 'data' with the position, default "pos"

N_XX_case

the number of XX chromosome case individuals (REQUIRED if sex_chromosomes == TRUE)

N_XX_control

the number of XX chromosome control individuals (REQUIRED if sex_chromosomes == TRUE)

N_XY_case

the number of XY chromosome case individuals (REQUIRED if sex_chromosomes == TRUE)

N_XY_control

the number of XY chromosome control individuals (REQUIRED if sex_chromosomes == TRUE)

do_correction

boolean, TRUE if data is provided to perform correction

correction_data

a dataframe with the following exact columns: CHR, POS, proxy_MAF with data that is harmonized between the proxy true datasets and the observed dataset

remove_sex_chromosomes

boolean, TRUE if should keep autosomes only. This is needed when the number of biological sex males/females per case and control group is not known.

verbose

boolean, determine whether warnings should be displayed (default FALSE)

Value

returns data as a dataframe with three additional columns: MAF_case, MAF_control, MAF_total for the estimated MAFs for each variant. If do_correction = TRUE, then will output 3 additional columns (MAF_case_adj, MAF_control_adj, MAF_total_adj) with the adjusted estimates.

References

https://github.com/wolffha/CCAFE

See also

https://github.com/wolffha/CCAFE for further documentation

Author

Hayley Wolff (Stoneman), hayley.wolff@cuanschutz.edu

Examples

library(CCAFE)

data("sampleDat")
sampleDat <- as.data.frame(sampleDat)

nCase_sample = 16550
nControl_sample = 403923

# get the estimated case and control MAFs
se_method_results <- CaseControl_SE(data = sampleDat,
                                    N_case = nCase_sample,
                                    N_control = nControl_sample,
                                    OR_colname = "OR",
                                    SE_colname = "SE",
                                    chromosome_colname = "CHR",
                                    position_colname = "POS")

head(se_method_results)
#>    CHR       POS REF ALT true_maf_case true_maf_control       beta      SE
#> 1 chr1 226824710   C   T     2.048e-01        2.041e-01  0.0003441 0.01471
#> 2 chr1 117812346   G   A     4.480e-01        4.471e-01 -0.0013920 0.01196
#> 3 chr1 230838863   C   T     1.647e-01        1.642e-01  0.0042250 0.01620
#> 4 chr1  93121792 ATT   A     0.000e+00        0.000e+00 -0.1058000 3.53000
#> 5 chr1 240236388   G   A     1.114e-05        3.775e-05 -1.2540000 1.25300
#> 6 chr1  12385196   G   A     1.390e-02        1.402e-02 -0.0152700 0.05046
#>    gnomad_maf        OR true_maf_pop     MAF_case  MAF_control    MAF_total
#> 1 2.04050e-01 1.0003442 2.041276e-01 1.764905e-01 1.764405e-01 1.764425e-01
#> 2 4.45508e-01 0.9986090 4.471354e-01 3.263835e-01 3.266896e-01 3.266776e-01
#> 3 1.82612e-01 1.0042339 1.642197e-01 1.392353e-01 1.387297e-01 1.387496e-01
#> 4 1.07925e-04 0.8996046 0.000000e+00 2.513875e-06 2.794421e-06 2.783379e-06
#> 5 8.82145e-05 0.2853611 3.670262e-05 1.946826e-05 6.821993e-05 6.630105e-05
#> 6 1.15995e-02 0.9848460 1.401528e-02 1.250050e-02 1.269041e-02 1.268294e-02