September 2009 | Vol 6 | No. 9 | CNIC-12

The regulatory basis of common disease: how can genome-wide association studies help us understand cardiovascular disorders?

M. Eva Alonso and Miguel Manzanares
Correspondence:
Miguel Manzanares.
Department of Cardiovascular Developmental Biology
Melchor Fernandez Almagro 3, 28049 Madrid, Spain
Email mealonso@cnic.es, mmanzanares@cnic.es

Abstract
Understanding the genetic component of complex human diseases, such as type 2 diabetes or cardiovascular disorders, was a daunting task until the beginning of the 21st century. Although classical genetic analysis has proved successful in identifying genes and mutations that underlie monogenic diseases, it was not until the development of whole-genome association scans of single-nucleotide polymorphisms that we began to obtain a glimpse of the genes and networks involved in the most prevalent human diseases. To date, more than 200 genetic loci have been robustly and reproducibly associated with various disorders. However, in the vast majority of these cases, we still lack any understanding of how the identified variants relate to the disease aetiology. In addition, many of the loci uncovered by genome-wide association studies fall in intergenic regions, which complicates identification of the culprit gene. We now know that the non-coding fraction of the genome harbours multiple cis-regulatory elements that control the fine-tuned temporal and spatial transcription of genes. These elements are prime candidates for being altered in the disease-associated variants that do not map to coding regions. Uncovering these elements and how they relate to regulatory networks and pathways underlying common human pathologies is still a major challenge.

Background
Human genetics has taken a giant leap in the past few years with the advent of genome-scale analyses of variation that associate specific genes with a given disease in large cohorts. This progress has been a direct result of the efforts of the Human Genome Project, followed shortly by the International HapMap Project, which provided information on millions of polymorphic sites consisting of single-base changes (single-nucleotide polymorphisms, or SNPs)1. New technologies are being developed to rapidly and comprehensively analyse the genetic variation that defines each patient's individual genetic profile. Most importantly, and contrary to classical genetic analysis, efforts to identify the genetic contributions to disease susceptibility are developing even in the absence of an identified familial inheritance pattern. This approach to the screening of genetic variation in individuals with and without a given disease has been termed genome-wide association studies (GWAS)2. In these studies, common variants are analysed and can serve, via the linkage disequilibrium structure of the genome, as guides to linked loci that are directly related to a disease. Therefore, by analysing 500,000 SNPs, it is possible to identify common variants that associate with a small but statistically significant increase in disease risk3.
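As a rough illustration of why a scan of 500,000 SNPs requires large cohorts to detect small increases in risk, the following Python sketch applies a simple allelic chi-square test to a single SNP and compares the result with a Bonferroni-corrected threshold. The allele counts, the function names and the choice of a Bonferroni correction are assumptions made for illustration only; they are not taken from any of the studies cited here, and real GWAS analyses use more elaborate statistical models.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of
    allele counts: risk/reference allele in cases versus controls."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_cases * row_ctrls * col_alt * col_ref
    )
    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio for the risk allele."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Assumption: one test per SNP, corrected with the conservative
# Bonferroni rule for the 500,000 SNPs mentioned in the text.
N_SNPS = 500_000
GENOME_WIDE_ALPHA = 0.05 / N_SNPS

# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each),
# with the risk allele at 33% in cases and 28% in controls.
counts = (1320, 2680, 1120, 2880)
chi2, p = allelic_chi_square(*counts)
print(f"odds ratio = {odds_ratio(*counts):.2f}")
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
print(f"genome-wide threshold = {GENOME_WIDE_ALPHA:.0e}")
print("significant genome-wide:", p < GENOME_WIDE_ALPHA)
```

With these hypothetical counts the SNP would look convincing in a single-marker study yet does not pass the corrected threshold, which is one reason GWAS depend on very large cohorts and on replication.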

Whole-genome association studies and emerging sequencing technologies are being used to describe the genetic architecture of common diseases, revealing new susceptibility genes and offering new clues about the mechanisms behind a wide range of health disorders. The strategic focus of common disease genetics is progressing from the basic identification of susceptibility genes to understanding the extent to which rare and common variants explain inherited susceptibility to common diseases, discovering new disease mechanisms, and pioneering and evaluating the transfer of these advances to potential clinical application4,5.

Identification of variations for common diseases
GWAS have reproducibly identified hundreds of associations of common genetic variants with more than 80 diseases and traits. The rapid increase in the number of GWAS provides an unprecedented opportunity to examine the potential impact of common genetic variants on complex disease by systematically cataloguing and summarising key characteristics of the observed associations and the trait/disease-associated SNPs underlying them. Although some of these aspects have been examined on a smaller scale for individual diseases and syndromes, such as type 2 diabetes6, cancer7-10, inflammatory bowel disease11, restless legs syndrome12 and atrial fibrillation13, further insight will be revealed as certain genomic regions are repeatedly associated with different diseases. Some of these associations are not surprising, such as the recurrent association of the HLA/MHC region on chromosome 6p21 with several autoimmune diseases, or that of the gene desert in the vicinity of MYC on 8q24 with various cancers14,15. Others have been more unexpected; for example, the region of 9p21 mapping near the tumour suppressor INK4/ARF locus is associated with a number of seemingly unrelated conditions, such as type 2 diabetes, coronary heart disease and glioma. A possible explanation for this clustering of disease associations is that the variants affect birth weight, which has been described as a risk factor for all three diseases16. As more loci associated with different diseases are identified, we will surely see more examples in which a common aetiology can be identified for various pathologies.

From association to function
Despite the success of GWAS in identifying novel disease-related loci, determining the exact sequence variants that cause an increased risk, and how they do so, is far from being achieved. As mentioned above, GWAS search for common SNPs that statistically associate with a disease or trait but are not necessarily the causative variants. The causative variants could be rare SNPs, with very low occurrence in the population, or even small structural variants that are in linkage disequilibrium with the tested SNPs. Identification of the true causal SNPs can be difficult because it requires re-sequencing of the region in multiple samples. Nevertheless, this task will be aided by the near completion of the 1000 Genomes Project, which will dramatically increase the coverage of common genetic variants across the genome15. An even more complex task, necessarily linked to the above, will be to understand the role of these variants in disease aetiology. In this task, the use of functional assays, aided if possible by animal models, will be needed despite being demanding and time consuming.
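The relationship between a genotyped SNP and an untyped causal variant can be made concrete with the standard r² measure of linkage disequilibrium. The Python sketch below computes r² from phased haplotypes; the haplotype counts and the function name are hypothetical and serve only to show why a rare causal variant may be poorly captured by a common tag SNP, even when it always occurs on the same haplotype background.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_A, allele_at_site_B)
    pairs, with alleles coded 0 or 1, one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected if the two sites were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical sample of 100 chromosomes. Site A is a common tag SNP
# (30% frequency); site B is a rare causal variant (2% frequency) that
# is only ever found together with the tag allele.
sample = [(1, 1)] * 2 + [(1, 0)] * 28 + [(0, 0)] * 70
print(f"r^2 between tag SNP and rare variant: {ld_r_squared(sample):.3f}")
```

In this hypothetical sample r² is low although the rare variant never appears without the tag allele, so most of the signal carried by the causal variant would be diluted at the genotyped SNP.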

How to address regulatory variation
When a sequence variant is located in the coding region of a gene and, furthermore, causes a change in the coded amino acid residue, it is easy to envision how this variation can affect gene function. In fact, disease-associated SNPs located in coding regions are enriched in non-synonymous sites14. However, the vast majority (nearly 90%) of reported trait/disease-associated SNPs fall into non-coding regions, either intronic or intergenic14, which suggests that the disease-associated variants have a regulatory effect.

Non-coding genomic DNA harbours multiple regulatory elements that are crucial for proper gene function by controlling the levels and temporal-spatial patterns of transcription. These can include enhancer and silencer elements, insulators, boundaries or nuclear matrix attachment regions17. However, when analysing regulatory variation, we face additional problems. While the causal variants would be in strong linkage disequilibrium with the disease-associated SNP, this does not mean that the gene whose expression is affected by the causal variant will also be linked, or even nearby15. Hence, the regulatory function of disease-associated genomic regions has been elucidated in only a few studies. One example is the finding that the likely colorectal-risk causal variant, identified by re-sequencing the region to be located near the SMAD7 gene, alters the activity of an evolutionarily conserved transcriptional enhancer that drives expression in the rectum in a Xenopus transgenic assay18. Another example is the 8q24 cancer-risk variant, which also possesses enhancer activity and physically interacts with the MYC gene, located more than 300 kilobases away. Furthermore, in this case, there is evidence that the causal SNP changes the ability of the TCF7L2 transcription factor to bind to a consensus site located on this enhancer19,20.

An example of how we can start to address the regulatory capacities of non-coding disease–associated variants with a combination of evolutionary genomics and functional transgenic assays in model organisms is illustrated in Fig. 1. In a GWAS on type 2 diabetes6, the disease was found to be associated with the IDE-KIF11-HHEX region on chromosome 10, and in particular with two SNPs located in a 50 kb interval downstream of HHEX (Fig. 1A). This gene encodes a homeodomain-containing transcription factor that is involved in pancreas development in mice21, making it an obvious potential target of these non-coding variants. As part of a collaborative effort with groups from Bergen, Padua, Edinburgh and Seville, we investigated whether this region contained enhancer elements that could be related to HHEX function22. We used a multi-species genomic comparison of the region to identify non-coding sequence elements that are evolutionarily conserved. This approach, termed phylogenetic footprinting23, aims at identifying stretches of conserved sequence across different genomes. If these stretches are located in non-coding regions, the fact that they have been conserved during evolution argues strongly for them having a function that has been maintained by selection against the random mutational drift of the genome24. One such function is acting as an enhancer of gene transcription. When we analysed the HHEX downstream region, we found that the type 2 diabetes–associated SNPs are located in DNA regions that are conserved between human, mouse, chicken and frog (Fig. 1B). In order to test the regulatory capacities of these sequences, we ligated them to a minimal promoter, which has no activity on its own, and the lacZ reporter gene. These constructs were used to generate transgenic mouse embryos in which we could observe the activity of beta-galactosidase (the product of the lacZ gene) in the developing pancreas (Fig. 1C, upper panels). We confirmed that these beta-galactosidase positive cells were located in the endocrine pancreas by double labelling with an antibody directed against glucagon (Fig. 1C, lower panels). In conclusion, we have described how the risk-associated region from HHEX is capable of acting as a tissue-specific transcriptional enhancer in the developing pancreas. Further work will be needed to identify the causal variants in this element and how these variants affect HHEX expression. Only then will we be able to understand how subtle changes in the expression of a developmentally regulated transcription factor can result in adult late-onset diseases such as type 2 diabetes.
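The idea behind phylogenetic footprinting can be sketched in a few lines of Python. The function below slides a window along a pairwise alignment and reports stretches in which sequence identity reaches the criteria given in the legend of Fig. 1 (at least 100 base pairs and 70% similarity). It is a simplified illustration and not the algorithm used by the VISTA tools: the function name, the treatment of gaps as mismatches and the merging of overlapping windows are all assumptions, and the input is assumed to be an already aligned pair of sequences of equal length.

```python
def conserved_regions(aligned_a, aligned_b, window=100, min_identity=0.70):
    """Return (start, end) column ranges of a pairwise alignment in which
    every sliding window of `window` columns has identity of at least
    `min_identity`. Ranges are half-open and overlapping windows are merged."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base; gaps ("-") never match.
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    regions = []
    if len(matches) < window:
        return regions
    current = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Update the running match count as the window moves one column.
            current += matches[start + window - 1] - matches[start - 1]
        if current / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions


# Toy alignment: two conserved blocks separated by a diverged block.
human = "A" * 300
mouse = "A" * 100 + "C" * 100 + "A" * 100
print(conserved_regions(human, mouse))
```

Because a window passes as soon as 70% of its columns match, the reported ranges can extend a little into flanking diverged sequence. Candidate elements found in this way would still need to be tested functionally, for example in the transgenic reporter assays described above.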

Figure 1. Conserved non-coding elements in the genomic region linked to HHEX and associated with increased risk of type 2 diabetes have regulatory activity in the developing pancreas.
A, Linkage disequilibrium diagram for the region from human chromosome 10q23 that contains HHEX (reproduced with permission from ref. 6). B, Phylogenetic footprinting of the 100 kilobase region downstream of human HHEX as compared to the equivalent region from the mouse, the chicken and the amphibian Xenopus genomes. The position of the gene is indicated above the diagram. Peaks show the degree of sequence conservation. Those in blue correspond to the coding region, and those in pink indicate non-coding regions of more than 100 base pairs and 70% similarity in the indicated pairwise comparison. The diagram was generated using VISTA tools (http://genome.lbl.gov/vista). The red arrows in A and B point to the two SNPs in the region with the highest association to type 2 diabetes. C, Sections through the pancreas of 14.5-day mouse embryos that were transgenic for a lacZ reporter construct driven by the human conserved non-coding element containing the SNP rs1111875, which is located closest downstream to HHEX. The top two panels show lacZ activity in the pancreatic anlagen at low (left) and high (right) magnification. Bottom panels show double detection of lacZ activity (blue) and immunochemistry with anti-glucagon antibody (brown), showing that the genomic fragment from HHEX drives gene expression in the endocrine pancreas.

Genome-wide association studies and cardiovascular diseases
GWAS have only recently been used to analyse the genetic component of cardiovascular disease, as indicated by the fact that thus far only 22 such studies have been published on cardiovascular disorders, half of them this year (data from the National Human Genome Research Institute (NHGRI) catalogue of published genome-wide association studies, www.genome.gov/gwastudies). These studies include some findings that were not unexpected, such as the strong association of a regulator of the nitric oxide synthase pathway (NOS1AP) with myocardial function25. However, there have been some surprises, such as the association of a region near the tumour suppressors INK4/ARF with myocardial infarction26,27; this region has recently been shown to contain an enhancer of the poorly characterised non-coding RNA gene ANRIL28. It is expected that many more GWAS on cardiovascular disease will be carried out in the near future. Crucial to this is the availability of large cohorts that have been followed over years, such as in the Framingham Heart Study; the data from these studies can be mined with the genome-wide association tools that are now available29.

We have surveyed the literature for cases in which variants associated with cardiovascular disease are located in non-coding regions of the genome and examined their evolutionary conservation (Table 1). It is interesting that in a number of these cases, the nearest genes correspond to transcription factors and other genes involved in early development. Studying whether subtle defects that occur in the embryo can result in late-onset cardiac disease will be of great interest in the future.

Atrial fibrillation (AF), the most common sustained cardiac arrhythmia in humans, is characterised by chaotic electrical activity of the atria. Studies of familial AF have shown that it is caused by mutations in potassium-channel genes, but these mutations account for only a small fraction of all cases30. A genome-wide association study followed by replication studies in three populations of European descent and a Chinese population from Hong Kong demonstrated a strong association between two sequence variants on chromosome 4q25 and AF13. Both variants are adjacent to the PITX2 gene, a developmental gene coding for a homeodomain transcription factor that has a critical function in left-right asymmetry and heart development31. The regions in which the variants are located are conserved in all amniote vertebrates examined (Table 1), making them excellent candidates for regulatory elements. Two recent studies identified an intronic region of the ZFHX3 gene as a second AF-associated variant32,33. It is interesting that in one of these studies this gene was also associated with ischaemic stroke, and in other reports with various cancers as well as with Kawasaki disease. ZFHX3 codes for a homeodomain and zinc finger transcription factor but has been poorly studied, so it remains to be ascertained whether it has a common underlying role in the aetiology of these diseases.

Interestingly, other studies have also found associations between developmental genes and cardiovascular disorders. As an example, an association was found between hypertension and a region in a gene desert located more than 500 kilobases away from the T-box transcription factors TBX3 and TBX5 34, both of which have roles during heart development35. Gene deserts (large genomic regions devoid of known protein-coding genes 36) are often associated with complexly regulated transcription factors and harbour multiple regulatory elements. The gene-desert region associated with hypertension is conserved up to avians (Table 1). Another example is the association of a region in the vicinity of the flamingo-related cadherin CELSR2 with myocardial infarction37. This adhesion molecule has important roles in planar cell polarity during development and is strongly expressed in the mouse embryo38. Although no role has been established for CELSR2 in cardiovascular development, it is worth exploring this possibility.

Finally, two large studies of genomic association to variation in the QT interval39,40 found variants in non-coding regions near a number of ion channel genes that had previously been found to be related to familial cases of AF or cardiac rhythm syndromes such as Brugada syndrome30,41. A number of these regions are conserved between human and mouse, and it is noteworthy that many of the genes encoding subunits of these channels show specific expression patterns in the developing heart42. The study of how these genes are regulated during development and disease will be an exciting area of future research that will help us understand how embryonic patterning and tissue physiology are integrated.

Table 1

Conclusions and future prospects
Recent GWAS have robustly demonstrated that common genetic variation contributes to the risk of developing diseases, and an increasing number of genomic regions have been shown to be associated with an increased risk of cardiovasculopathies. However, because the majority of GWAS are based on an analysis of tagging SNPs that enable common genetic variation to be interrogated, the genotyped SNPs associated with various diseases are not necessarily the functional variants14.

Although formal proof will be required for any SNP associated with a disease to be considered functional, we emphasise that, for the case of non-coding variants, the expression patterns driven in transgenic animals by these genomic regions can yield useful clues, in addition to revealing their regulatory function through potential changes in transcription factor sites. In some cases, the affected genes have a complex spatiotemporal expression pattern and multiple roles in development and differentiation; thus, different SNPs within large regions can affect different roles of the same gene. Other functional assays, such as examination of the behaviour of non-coding regions in cell differentiation assays, will also need to be developed. Elucidation of the functional relationship between conserved non-coding elements and their target genes may hold a key to understanding the disease mechanisms uncovered by the analysis of common genomic variation.

The characterisation of new disease-associated loci may serve as the basis for future approaches to early detection of high-risk individuals. However, the power of genetic associations revealed by whole-genome scans is related as much to the development of new diagnostic tools as it is to the discovery of unforeseen genetic determinants. This knowledge will permit the exploration of novel targets for intervention, because such studies will uncover previously unknown pathways and networks underlying disease.

References
1. International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299-1320 (2005).
2. Wang, W.Y. et al. Genome-wide association studies: theoretical and practical concerns. Nat. Rev. Genet. 6, 109-118 (2005).
3. Hardy, J. & Singleton, A. Genomewide association studies and human disease. N. Engl. J. Med. 360, 1759-1768 (2009).
4. Margulies, K.B., Bednarik, D.P. & Dries, D.L. Genomics, transcriptional profiling, and heart failure. J. Am. Coll. Cardiol. 53, 1752-1759 (2009).
5. Manolio, T.A., Brooks, L.D. & Collins, F.S. A HapMap harvest of insights into the genetics of common disease. J. Clin. Invest. 118, 1590-1605 (2008).
6. Sladek, R. et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445, 881-885 (2007).
7. Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315-1317 (2007).
8. Eeles, R.A. et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat. Genet. 40, 316-321 (2008).
9. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat. Genet. 39, 631-637 (2007).
10. Tomlinson, I.P. et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 40, 623-630 (2008).
11. Cho, J.H. The genetics and immunopathogenesis of inflammatory bowel disease. Nat. Rev. Immunol. 8, 458-466 (2008).
12. Winkelmann, J. et al. Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions. Nat. Genet. 39, 1000-1006 (2007).
13. Gudbjartsson, D.F. et al. Variants conferring risk of atrial fibrillation on chromosome 4q25. Nature 448, 353-357 (2007).
14. Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362-9367 (2009).
15. Ioannidis, J.P., Thomas, G. & Daly, M.J. Validating, augmenting and refining genome-wide association signals. Nat. Rev. Genet. 10, 318-329 (2009).
16. Shete, S. et al. Genome-wide association study identifies five susceptibility loci for glioma. Nat. Genet. 41, 899-904 (2009).
17. Alonso, M.E. et al. Understanding the regulatory genome. Int. J. Dev. Biol. doi: 10.1387/ijdb.072428ma (2008).
18. Pittman, A.M. et al. The colorectal cancer risk at 18q21 is caused by a novel variant altering SMAD7 expression. Genome Res. 19, 987-993 (2009).
19. Tuupanen, S. et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat. Genet. 41, 885-890 (2009).
20. Pomerantz, M.M. et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882-884 (2009).
21. Bort, R. et al. Hex homeobox gene-dependent tissue positioning is required for organogenesis of the ventral pancreas. Development 131, 797-806 (2004).
22. Ragvin, A. et al. Long-range regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4 and IRX3. PNAS submitted (2009).
23. Zhang, Z. & Gerstein, M. Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J. Biol. 2, 11 (2003).
24. Boffelli, D., Nobrega, M.A. & Rubin, E.M. Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 5, 456-465 (2004).
25. Arking, D.E. et al. A common genetic variant in the NOS1 regulator NOS1AP modulates cardiac repolarization. Nat. Genet. 38, 644-651 (2006).
26. McPherson, R. et al. A common allele on chromosome 9 associated with coronary heart disease. Science 316, 1488-1491 (2007).
27. Helgadottir, A. et al. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316, 1491-1493 (2007).
28. Jarinova, O. et al. Functional analysis of the chromosome 9p21.3 coronary artery disease risk locus. Arterioscler. Thromb. Vasc. Biol. (2009).
29. Cupples, L.A. et al. The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med. Genet. 8 Suppl 1, S1 (2007).
30. Roberts, R. Mechanisms of disease: genetic mechanisms of atrial fibrillation. Nat. Clin. Pract. Cardiovasc. Med. 3, 276-282 (2006).
31. Franco, D. & Campione, M. The role of Pitx2 during cardiac development. Linking left-right signaling and congenital heart diseases. Trends Cardiovasc. Med. 13, 157-163 (2003).
32. Benjamin, E.J. et al. Variants in ZFHX3 are associated with atrial fibrillation in individuals of European ancestry. Nat. Genet. 41, 879-881 (2009).
33. Gudbjartsson, D.F. et al. A sequence variant in ZFHX3 on 16q22 associates with atrial fibrillation and ischemic stroke. Nat. Genet. 41, 876-878 (2009).
34. Levy, D. et al. Genome-wide association study of blood pressure and hypertension. Nat. Genet. (2009).
35. Stennard, F.A. & Harvey, R.P. T-box transcription factors and their roles in regulatory hierarchies in the developing heart. Development 132, 4897-4910 (2005).
36. Nobrega, M.A. et al. Scanning human gene deserts for long-range enhancers. Science 302, 413 (2003).
37. Kathiresan, S. et al. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat. Genet. 41, 334-341 (2009).
38. Crompton, L.A., Du Roure, C. & Rodriguez, T.A. Early embryonic expression patterns of the mouse Flamingo and Prickle orthologues. Dev. Dyn. 236, 3137-3143 (2007).
39. Newton-Cheh, C. et al. Common variants at ten loci influence QT interval duration in the QTGEN Study. Nat. Genet. 41, 399-406 (2009).
40. Pfeufer, A. et al. Common variants at ten loci modulate the QT interval duration in the QTSCD Study. Nat. Genet. 41, 407-414 (2009).
41. Ruan, Y., Liu, N. & Priori, S.G. Sodium channel mutations and arrhythmias. Nat. Rev. Cardiol. 6, 337-348 (2009).
42. Franco, D. et al. Divergent expression of delayed rectifier K(+) channel subunits during mouse heart development. Cardiovasc. Res. 52, 65-75 (2001).

Competing interests: The authors declared no competing interests.

Acknowledgements: We wish to thank Jose Luis Gomez Skarmeta (CABD-CSIC, Seville) for an extremely fruitful collaboration and constant encouragement and members of our lab and department at the CNIC for support and discussion. Work in our group is funded by the Spanish Government (BFU2008-00838 and CSD2007-00008), the Regional Government of Madrid (CAM S-SAL-0190-2006) and the ProCNIC Foundation.

 