Departments of Community and Preventive Medicine [J. C., G. B., J. G.] and Microbiology [J. G. W.], Mount Sinai School of Medicine, New York, New York 10029, and Roche Molecular Systems, Inc., Alameda, California 94501 [S. G., R. H.]
| Abstract |
|---|
| Introduction |
|---|
90%) form of human sequence variation, will be constructed by the year 2003. This map will offer a powerful tool for identifying genes that make small but significant contributions to disease risk, for understanding relationships between genetic variation and diseases, and in turn for changing the future of disease prevention and treatment (2). Studying genome-wide sequence variations associated with human disease calls for the rapid development of efficient technologies that can identify subtle genetic risk factors that go undetected in existing study designs that use fewer markers and limited sample sizes. The ideal technology should provide for the rapid and efficient scoring of known SNPs in a large number of samples. Although there are a number of high-throughput SNP analysis strategies in development (3), two competing molecular strategies dominate the field (4). One approach is to identify and/or type multiple polymorphisms one person at a time using, e.g., high-density oligonucleotide hybridization arrays (5). Array hybridization, which relies on the difference between hybridization of matched and mismatched products to allele-specific oligonucleotides on the array, is powerful in SNP identification and has the advantage of maintaining individual information. However, the rate-limiting step for detecting SNPs is the PCR amplification, which has limited capacity for multiplexing. In addition, heterozygosity detection may not be completely foolproof for all SNPs (6), and the required amount of DNA for testing is substantial. An alternative high-throughput strategy is to pool equal amounts of DNA from multiple individuals and then type one marker at a time. Pooled DNA samples have been used successfully with both microsatellite markers (7) and SNPs (8, 9), using fluorescent probes (10, 11) and capillary-based single-strand conformation polymorphism analysis (12).
Germer et al. (13) have developed a novel kinetic PCR method for pooled DNA that is capable of assessing SNP frequencies with high precision and efficiency. The method is accurate, time-saving, and inexpensive, requiring no labeled probes. It requires only a fraction of the genomic DNA from each individual needed by conventional genotyping methods without the need for SNP-specific optimization and post-PCR processing. It promises to be a highly efficient alternative that allows detection of the relatively weak but common genetic associations expected for complex diseases in genetic epidemiological studies. We demonstrate here the successful implementation of this technology in a population study in which we blind-tested the technology using a pool of DNA from 269 individuals that we had genotyped for the PON1 Q191R polymorphism by the PCR-RFLP method. We pooled these individuals by their race/ethnicity to test the flexibility of various pooling strategies important in studying gene-environment interactions. We also discuss the importance and feasibility of applying this method to identifying disease genes as well as studying gene-environment and gene-gene interactions in genetic epidemiological studies.
| Materials and Methods |
|---|
Study Population and DNA Samples.
The study population was derived from prenatal patients in New York City who were participants in an ongoing cohort study on the effect of maternal exposure to pesticides and other toxicants on childhood neurodevelopment as part of the Mount Sinai Children's Environmental Health Center. All subjects gave informed consent for measurement of PON1 genotypes as part of the study. The research protocol was approved by the Institutional Review Board of the Mount Sinai School of Medicine. Paraoxonase 1, the product of the PON1 gene, is a critical enzyme for inactivating neurotoxic intermediates in the metabolism of organophosphate pesticides, and its activity on various substrates is affected by polymorphisms in the PON1 gene. The study population consisted of 56 Caucasians, 86 African-Americans, and 127 Hispanics of Caribbean origin (mostly Puerto Ricans and Dominicans) from whom whole blood was collected. Genomic DNA was extracted from the buffy coat and purified with QIAamp blood kits (Qiagen) as described by the manufacturer.
Individual Genotypes.
Individual genotypes of the PON1 Q191R polymorphism were determined by a PCR-RFLP-based assay (21, 22). In brief, genomic DNA was amplified using PCR primers 5'-GTATGTTTTAATTGCAGTTTGAA-3' and 5'-TGAAATGTTGATTCCATTAGCAA-3', where sequences with terminal AA sequences were chosen to suppress primer-dimer formation. Standard cycling conditions (1 min at 94°C, 1 min at 55°C, 3 min at 72°C) were used for Taq DNA polymerase in the buffer supplied by the manufacturer. The 207-bp PCR products were cleaved with AlwI and analyzed by fluorography after size fractionation on 1.2% agarose gels.
DNA Pooling.
Individual DNA concentrations were determined from absorbance spectra measured with a Hewlett-Packard diode array spectrophotometer. Pooled DNA was generated by mixing 100 ng of DNA from individual samples. DNA pools were created for each of the three racial/ethnic groups (Caucasian, African-American and Hispanic), and each pooling was replicated independently by three investigators at the Mount Sinai School of Medicine. Thus, a total of nine pools were generated (3 races/ethnicities x 3 replicates); all subsequent measurements were carried out using these pools.
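The equal-mass pooling step amounts to a simple volume calculation. The following is a minimal sketch, assuming concentrations are expressed in ng/µl; the function name, sample identifiers, and concentration values are hypothetical and are not data from the study.

```python
def pooling_volumes(concentrations_ng_per_ul, mass_ng=100.0):
    """Return the volume (in µl) of each sample that contains `mass_ng` of DNA."""
    volumes = {}
    for sample_id, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample_id}")
        volumes[sample_id] = mass_ng / conc
    return volumes


# Hypothetical concentrations (ng/µl) for three samples in one pool.
example = {"S001": 50.0, "S002": 80.0, "S003": 125.0}
vols = pooling_volumes(example)
total_volume = sum(vols.values())
# Concentration of the mixed pool: total mass divided by total volume.
pool_conc = 100.0 * len(example) / total_volume
```

Because every individual contributes the same mass, each genome is represented equally in the pool regardless of the original sample concentration.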
Kinetic PCR with Pooled DNA.
In one set of experiments, kinetic PCR was carried out on aliquots of the nine DNA pools at Roche Molecular Systems in a GeneAmp 5700 Sequence Detection System (PE Applied Biosystems), using a "Gold" version of Stoffel Fragment DNA polymerase (23) as described previously (13). All other experiments were carried out at Mount Sinai School of Medicine on a LightCycler (Roche Molecular Biochemicals). Two different DNA polymerases were tested with the latter platform: FastStart Taq (Roche Molecular Biochemicals) and AmpliTaq Gold (PE Applied Biosystems). All three DNA polymerases required heat activation, one of the available methods for achieving hot starts and minimizing primer-dimer formation.
The primers for the PON1 Q191R polymorphism were as follows:
5'-TATTTTCTTGACCCCTACTTACA-3' (allele-specific primer for 191Q)
5'-TTTCTTGACCCCTACTTACG-3' (allele-specific primer for 191R)
5'-CCACGCTAAACCCAAATACATCTC-3' (common reverse primer)
Reactions were assembled using micropipettors (Jencons, Ltd.). A basic master mix for the analyses on the LightCycler contained 1x AmpliTaq Gold buffer supplemented to final concentrations of 4 mM MgCl2, 2% glycerol, 1x BSA (New England Biolabs), 5 units/20 µl of DNA polymerase, 0.5 µM reverse primer, 200 µM each deoxynucleotide triphosphate (with dUTP replacing dTTP), and 0.25x SYBR Green I (Molecular Probes). Two allele-specific master mixes were produced by addition of one or the other of the allele-specific primers to 0.5 µM in the basic master mix. Finally, individual 20-µl PCR solutions were prepared by the addition of 20 ng of pooled DNA template to an aliquot of one or the other of the allele-specific master mixes. The cycling conditions included a heating step at 95°C (4 min for FastStart Taq and 9 min for AmpliTaq Gold) followed by 45 cycles of 25 s at 58°C, 25 s at 72°C, and 15 s at 95°C.
LightCycler Data Analysis.
To determine the allele frequency in a pooled DNA sample, four kinetic PCR reactions were carried out in each of the two allele-specific master mixes. Ten replicate measurements of a heteroduplex sample were used to control for specificity of allele-specific PCR. The raw data were exported as an Excel spreadsheet, which gave the fluorescence as a function of cycle number (C) in each sample. The C value, reported as the average of four replicate runs, was determined as (M - I)/S, where M is the logarithmic mean of fluorescence signals, and I and S are the intercept and slope in the linear range of the PCR curve, respectively (Fig. 1). A spreadsheet patch to overlay onto LightCycler export data is available by e-mail upon request.4
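The calculation C = (M - I)/S can be sketched in a few lines. This is an illustration only, not the spreadsheet patch mentioned above: it assumes that M is the mean of the log10 fluorescence over the whole curve, and that the analyst chooses the window of cycles over which log fluorescence is linear. The function names and the synthetic curve are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def cycle_number(cycles, fluorescence, linear_range):
    """Estimate C = (M - I) / S for one reaction.

    linear_range is an inclusive (first_cycle, last_cycle) window over which
    log10(fluorescence) is approximately linear in cycle number.
    """
    logs = [math.log10(f) for f in fluorescence]
    m = sum(logs) / len(logs)  # M: mean log signal (assumed: whole curve)
    first, last = linear_range
    window = [(c, lf) for c, lf in zip(cycles, logs) if first <= c <= last]
    intercept, slope = fit_line([c for c, _ in window], [lf for _, lf in window])
    return (m - intercept) / slope


# Synthetic curve (not study data): flat baseline, ten doubling cycles, plateau.
cycles = list(range(1, 46))
fluorescence = [1.0] * 20 + [2.0 ** k for k in range(1, 11)] + [1024.0] * 15
c_value = cycle_number(cycles, fluorescence, linear_range=(21, 30))
```

In practice the C values from the four replicate reactions would then be averaged, as described in the text.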
Let $\sigma_1$, $\sigma_2$, $\sigma_{1'}$, and $\sigma_{2'}$ be the corresponding SDs for the cycle numbers $C_1$, $C_2$, $C_1'$, and $C_2'$. The cycle difference then is $\Delta C_t = (C_1 - C_2) - (C_1' - C_2')$, and the SD in $\Delta C_t$, $\sigma_{\Delta C_t}$, is calculated from the weighted root mean square of the SDs of the cycle numbers.
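The cycle difference and its SD can be computed directly from the replicate cycle numbers. The sketch below rests on two assumptions that the surviving text does not settle: that $C_1'$ and $C_2'$ come from a reference sample run with the same two primers, and that the SDs combine in quadrature with unit weights (the exact weighting is not preserved in the source). The function name and argument names are hypothetical.

```python
import math
from statistics import mean, stdev


def delta_ct(c1, c2, c1_ref, c2_ref, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return (delta_Ct, sd) from four lists of replicate cycle numbers.

    c1, c2: pooled sample amplified with primers 1 and 2.
    c1_ref, c2_ref: reference sample (C1', C2') with the same primers
    (an assumption; the defining sentence is missing from the source).
    weights: multipliers applied to each variance before summing
    (assumed to be 1; other weightings can be passed in).
    """
    groups = (c1, c2, c1_ref, c2_ref)
    m1, m2, m1_ref, m2_ref = (mean(g) for g in groups)
    d = (m1 - m2) - (m1_ref - m2_ref)
    sd = math.sqrt(sum(w * stdev(g) ** 2 for w, g in zip(weights, groups)))
    return d, sd
```

Passing weights of 1/n for each group would give the SE of the difference of means instead of an SD, which may be closer to what "weighted" means in the text.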
Allele Frequencies.
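Let F be the allele frequency matching primer 1 (191R allele):
[equation not recovered from the source]
$\sigma_F$:
[equation not recovered from the source]
$\sigma_m$:
[equation not recovered from the source]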
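The equations for this step did not survive extraction, so the following is a sketch under stated assumptions and not the authors' exact expressions. It assumes the relation $F = 1/(2^{\Delta C_t} + 1)$, which holds if each allele-specific reaction doubles its product every cycle and which is the form usually associated with the allele-specific kinetic PCR method of Germer et al. (13). The SD of F is obtained by first-order error propagation. The function name is hypothetical.

```python
import math


def allele_frequency(delta_ct, sd_delta_ct=0.0):
    """Return (F, sd_F) for the allele matching primer 1.

    Assumes perfect doubling per cycle, so F = 1 / (2**delta_ct + 1).
    The SD is propagated to first order:
    dF/d(delta_ct) = -ln(2) * F * (1 - F).
    """
    f = 1.0 / (2.0 ** delta_ct + 1.0)
    sd_f = math.log(2.0) * f * (1.0 - f) * sd_delta_ct
    return f, sd_f
```

With this sign convention, a negative cycle difference (primer 1 reaching threshold earlier than primer 2) gives F above 0.5, and a cycle difference of zero gives F = 0.5.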
Sampling Error and SE of the Means.
The sampling error in the allele frequency is defined here for a sample of size n (two alleles each) as (24):
[equation not recovered from the source]
[text and equation not recovered from the source]
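The sampling-error equation is missing from the source. A minimal sketch, assuming the standard binomial form for 2n sampled alleles, $\sqrt{F(1-F)/2n}$, is given below; whether this matches the authors' expression exactly cannot be confirmed from the surviving text. The allele frequency of 0.3 is a hypothetical value chosen for illustration; the pool sizes are those of the study population.

```python
import math


def sampling_error(f, n_individuals):
    """Binomial SD of an allele frequency estimated from 2 * n alleles."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return math.sqrt(f * (1.0 - f) / (2 * n_individuals))


# Pool sizes from the study population; F = 0.3 is a hypothetical frequency.
pool_sizes = {"Caucasian": 56, "African-American": 86, "Hispanic": 127}
errors = {group: sampling_error(0.3, n) for group, n in pool_sizes.items()}

# Sampling error for a much larger hypothetical pool of 1000 individuals.
error_1000 = sampling_error(0.3, 1000)
```

The sampling error shrinks with the square root of the pool size, which is why measurement error eventually becomes the limiting term for large pools.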
Correction for DNA Polymerase Allele Specificity.
The maximum values for $\Delta C_t$, $\Delta C_t(1)$ and $\Delta C_t(2)$, were determined using primers 1 and 2, respectively, on 191R and 191Q homozygous controls. The signs of $\Delta C_t(1)$ and $\Delta C_t(2)$ were taken to be positive. Then:
[equation not recovered from the source]
| Results |
|---|
Ct values of 5.7 and 6.0 for the Q allele and 7.5 and 7.5 for the R allele, respectively. Allele frequencies measured with these two polymerases on a LightCycler, uncorrected and corrected for DNA polymerase allele specificity, are presented in Table 5.
m) for Caucasians, African-Americans, and Hispanics, respectively.
1500 individuals.
| Discussion |
|---|
30,000 common coding SNPs in the genome (25). By comparing allele frequencies of SNPs between a diseased and a healthy population, one can assess whether the gene is related to the disease and the magnitude of the risk associated with it. There are four main reasons for the increasing popularity of SNPs as markers in genetic analysis: (a) compared with microsatellite markers, SNPs are far more prevalent in the genome; (b) some of the SNPs are located in functional domains of genes that directly affect protein structure or expression levels and may, therefore, represent candidate alterations for genetic mechanisms in disease; (c) SNPs are inherited stably compared with microsatellite markers; (d) SNPs are easily adaptable for high-throughput genotyping, offering sufficient power for genetic analyses. One approach that can greatly reduce laboratory effort and increase efficiency is DNA pooling, in which DNA from multiple individuals is pooled before genotyping. Such a strategy has been shown to be effective in identifying disease-related genes in several settings, including Mendelian founder mutations (7, 26) as well as complex diseases (8). Rather than assessing each individual's genotype, the allele frequency of the tested marker is measured in a pool of DNA. For example, affected individuals can be grouped, as can unaffected individuals. Allele frequencies of the tested SNPs can be ascertained in each group and compared. Association with disease is implied if a difference in allele frequency between the pools is detected. Estimation of allele frequency in only two pools of DNA rather than in a large number of subjects individually can achieve large savings in both labor and materials, especially individual DNA.
To successfully implement DNA pooling in association studies, it is crucial to develop a method that is capable of accurately measuring the allele frequency in a pooled sample. There are several kinetic PCR-based approaches that permit SNP frequency determination in a single PCR reaction, including TaqMan (10, 27) and molecular beacon probes (28, 29). The major disadvantage of these methods is that both require fluorescent labeling of probes, which significantly increases the expense of the assay. An alternative is allele-specific kinetic PCR, first developed by Germer et al. (13), in which the allele frequency of a SNP is reflected by the difference in PCR cycles needed to generate detectable PCR product with wild-type- and variant-specific primers. This method does not require expensive fluorescent probes and, in turn, reduces costs. We have demonstrated the feasibility and flexibility of implementing this technology in a population study and have shown that it offers a precise tool for conducting genetic epidemiological research. The advantages are as follows.
High Throughput.
In this experiment, four PCR reactions were performed on three pools of individuals of different race/ethnicity from a total of 269 individuals to determine the allele frequency of the PON1 Q191R polymorphism, increasing the throughput by >20-fold. A much higher than 20-fold increase in throughput can be easily achieved. As sample size increases to 1000, the sampling error becomes similar in magnitude to the measurement error. As a result, we can easily achieve a 250-fold increase in throughput by creating a pool of 1000 or more individual DNAs.
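The throughput figures quoted above can be reproduced with simple arithmetic. This sketch assumes that the baseline is one genotyping reaction per individual and that each pool is counted as four reactions, which is the counting that yields the quoted values; the function name is hypothetical.

```python
def throughput_gain(n_individuals, n_pools, reactions_per_pool=4):
    """Ratio of individual genotyping reactions to pooled reactions."""
    return n_individuals / (n_pools * reactions_per_pool)


# This study: 269 individuals typed as 3 pools.
gain_study = throughput_gain(269, n_pools=3)
# A single pool of 1000 individuals.
gain_large_pool = throughput_gain(1000, n_pools=1)
```

Under these assumptions the study design gives a gain of about 22-fold, and a single pool of 1000 individuals gives 250-fold.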
Accuracy.
We have demonstrated that allele-specific kinetic PCR is highly accurate for pooled DNA samples. Three DNA pools comprising samples from individuals of different ethnicity were prepared separately by three laboratory technicians, and the allele frequency of each pool was determined by four replicate kinetic PCRs (Table 4). The results for the three separate pools prepared by individual investigators were nearly identical and were highly comparable to those determined by PCR-RFLP analysis.
Conservation of Genomic DNA.
With the conventional PCR-RFLP method, 20 ng of DNA is needed from each individual for each SNP tested. A modest scan of 1000 SNPs requires 20,000 ng of DNA from each individual. On the other hand, with kinetic PCR of pooled DNA, the quantity of each DNA sample, which depends on the number of tested markers and the size of the pool, can be drastically reduced. It can be calculated as 160 ng of DNA (i.e., 20 ng per reaction × 2 pools × 4 replicate reactions) multiplied by the number of SNPs, divided by the sample size of the pool. In the case of 1000 SNPs and 1000 samples, only 160 ng of DNA is needed from each individual. The amount of DNA can be decreased further by increasing the size of the pool. Conservation of valuable DNA resources is crucial for epidemiological studies, in which collecting blood is always difficult and the amount of genomic DNA is limited and precious.
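The DNA requirement described above can be written out as a short calculation. The function names are hypothetical; the default values (20 ng per reaction, 2 pools, 4 replicate reactions) are taken from the text.

```python
def dna_conventional_ng(n_snps, ng_per_reaction=20.0):
    """DNA needed per individual when every SNP is typed individually."""
    return n_snps * ng_per_reaction


def dna_pooled_ng(n_snps, pool_size, ng_per_reaction=20.0, n_pools=2, replicates=4):
    """DNA needed per individual when SNPs are typed on pooled DNA."""
    per_snp = ng_per_reaction * n_pools * replicates  # 160 ng with the defaults
    return per_snp * n_snps / pool_size


conventional = dna_conventional_ng(1000)        # 1000 SNPs, individual typing
pooled = dna_pooled_ng(1000, pool_size=1000)    # 1000 SNPs, pool of 1000
```

For 1000 SNPs, the individual requirement falls from 20,000 ng to 160 ng with a pool of 1000, and doubling the pool size halves it again.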
Robustness and Flexibility.
Kinetic PCR of pooled DNA is a homogeneous assay, meaning that it requires no post-PCR processing. The method is amenable to introduction of new markers. We have demonstrated that the method is compatible with at least two kinetic PCR platforms (Table 3) and three different thermostable DNA polymerases (Table 2). To facilitate the automation of SNP screening, a computerized primer design program can be implemented (30).
Implementation to Genetic Epidemiological Studies.
An ideal methodology for dissecting a complex disease thus requires the capability and flexibility to study both gene-environment and gene-gene interactions. In this report, we have discussed strategies of applying the methodology of kinetic PCR of pooled DNA to genetic epidemiological studies as well as its feasibility and limitations.
Before any laboratory experiments are carried out, a set of a priori hypotheses with sound biological rationale should be proposed. A panel of candidate markers (i.e., SNPs), either mechanism specific or genome wide, should be established. An initial screening of these SNPs would be performed, and their frequencies would be compared according to the disease status, i.e., the "affected" versus "unaffected" pool. Once disease-associated markers have been detected with different allele frequencies in two pools, one would then study their interactions with the environmental or other genetic factors in relation to pathogenesis of human diseases. Meanwhile, we have to bear in mind that "false negatives" may arise because certain phenotypes (e.g., disease) are only associated with genotypes (e.g., homozygous variant) not directly represented by allele frequency.
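The comparison of allele frequencies between an affected and an unaffected pool can be illustrated with a two-sample z statistic. This is only a sketch and is not a test proposed in the paper: it assumes independent pools, a normal approximation, and a variance for each pool equal to the binomial sampling variance over 2n alleles plus a known measurement variance. All names and example values are hypothetical.

```python
import math


def pool_variance(f, n_individuals, measurement_sd):
    """Sampling variance over 2n alleles plus measurement variance."""
    return f * (1.0 - f) / (2 * n_individuals) + measurement_sd ** 2


def compare_pools(f_affected, n_affected, f_unaffected, n_unaffected,
                  measurement_sd=0.0):
    """Return (z, two_sided_p) for the difference in allele frequency."""
    se = math.sqrt(
        pool_variance(f_affected, n_affected, measurement_sd)
        + pool_variance(f_unaffected, n_unaffected, measurement_sd)
    )
    z = (f_affected - f_unaffected) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical example: 500 individuals per pool, measurement SD of 0.01.
z, p = compare_pools(0.40, 500, 0.30, 500, measurement_sd=0.01)
```

Including the measurement term matters most for large pools, where the sampling variance alone would overstate the significance of a small frequency difference.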
Gene-Environment Interactions.
The major downside of frequency determination by kinetic PCR on pooled DNA is the loss of individual information. Nevertheless, by stratifying samples according to potential risk factors, this technology is readily applicable to investigations of gene-environment interactions. On the basis of a priori hypotheses, pooling strategies would be designed according to exposure scenarios for the proposed environmental factors, e.g., smoking status, use of hormone replacement therapy, and levels of alcohol intake. For example, within the affected and unaffected pools, DNA could be pooled based on an individual's environmental exposure (e.g., "exposed" versus "unexposed"), which usually is available through questionnaires or, sometimes, biological markers. Thus, four pools would be created: "unaffected, unexposed"; "unaffected, exposed"; "affected, unexposed"; and "affected, exposed." Each pool should be matched for potential confounding factors, such as age and race/ethnicity. The importance of such matching is illustrated by the race/ethnicity distribution of PON1 Q191R alleles uncovered in this investigation. Allele frequencies in each pool could then be determined by kinetic PCR and compared among the groups. The presence of gene-environment interactions is implied if a tested marker is enriched in a "multiplicative fashion" in the affected, exposed pool compared with the unaffected, unexposed pool. A formal statistical test for interactions needs to be developed for such an investigation.
Unlike conventional genetic epidemiological studies, in which investigation of gene-environment interactions is a postlaboratory process that can easily be explored through data stratification on a computer, studying interactions using pooled DNA involves reanalyzing DNA samples in the laboratory. In this case, it becomes crucial to establish a set of a priori hypotheses with strong biological rationale and then to carefully design the pooling strategy accordingly. This approach leaves little room for fishing expeditions in the data set and prevents overexploitation. In this study, we have demonstrated that repooling of DNA (in our case, by ethnicity) can easily be achieved; it offers sufficient flexibility to explore various gene-environment interactions.
Gene-Gene Interactions.
Loss of individual information through DNA pooling imposes difficulty in the investigation of gene-gene interactions. We would propose to first perform initial screening as discussed previously and select genes with different allele distributions between affected and unaffected pools. Because of this selection, a smaller panel of SNPs needs to be genotyped individually in the population. Depending on the number of disease-associated markers, scoring all of them individually still could be a formidable task. Most likely, however, a great majority of the markers tested in the initial screening will turn out to be nonfunctional. The feasibility of the proposed strategy of combining kinetic PCR on pooled DNA and high-throughput genotyping needs to be demonstrated in further studies.
| Acknowledgments |
|---|
| Footnotes |
|---|
1 This research was sponsored in part by grants from the NIH (ES09584), the Environmental Protection Agency (R827039), and Roche Molecular Systems, Inc. Dr. Chen was supported in part by a Career Development Award from the National Cancer Institute (CA81750).
2 To whom requests for reprints should be addressed, at Department of Microbiology, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, NY 10029-6574. Phone: (212) 241-7685. Fax: (212) 534-1684; E-mail: james.wetmur{at}mssm.edu
3 The abbreviations used are: SNP, single nucleotide polymorphism; RFLP, restriction fragment length polymorphism.
4 Send requests by e-mail to: james.wetmur{at}mssm.edu
Received 3/23/01; revised 10/18/01; accepted 10/26/01.
| References |
|---|