Comparing Genetic Ancestry and Self-Described Race in African Americans Born in the United States and in Africa

Cancer Epidemiology, Biomarkers & Prevention
Volume 17, Issue 6 (June 2008)
pages 1329-1338
DOI: 10.1158/1055-9965.EPI-07-2505

Rona Yaeger
Herbert Irving Comprehensive Cancer Center

Alexa Avila-Bront
Department of Medicine
College of Physicians and Surgeons of Columbia University

Kazeem Abdul
Herbert Irving Comprehensive Cancer Center

Patricia C. Nolan
Department of Medicine
College of Physicians and Surgeons of Columbia University

Victor R. Grann
Department of Medicine
College of Physicians and Surgeons of Columbia University

Mark G. Birchette
Department of Biology
Long Island University, Brooklyn, New York

Shweta Choudhry
Department of Biopharmaceutical Sciences and Medicine
University of California-San Francisco, San Francisco, California

Esteban G. Burchard
Department of Biopharmaceutical Sciences and Medicine
University of California-San Francisco, San Francisco, California

Kenneth B. Beckman
Children’s Hospital Oakland Research Institute, Oakland, California

Prakash Gorroochurn
Department of Biostatistics
Columbia University Medical Center, New York, New York

Elad Ziv
Division of General Internal Medicine
University of California-San Francisco, San Francisco, California

Nathan S. Consedine
Department of Psychology
Long Island University, Brooklyn, New York

Andrew K. Joe
Herbert Irving Comprehensive Cancer Center

Genetic association studies can be used to identify factors that may contribute to disparities in disease evident across different racial and ethnic populations. However, such studies may not account for potential confounding if study populations are genetically heterogeneous. Racial and ethnic classifications have been used as proxies for genetic relatedness. We investigated genetic admixture and developed a questionnaire to explore variables used in constructing racial identity in two cohorts: 50 African Americans and 40 Nigerians. Genetic ancestry was determined by genotyping 107 ancestry informative markers. Ancestry estimates calculated with maximum likelihood estimation were compared with population stratification detected with principal components analysis. Ancestry was approximately 95% west African, 4% European, and 1% Native American in the Nigerian cohort and 83% west African, 15% European, and 2% Native American in the African American cohort. Therefore, self-identification as African American agreed well with inferred west African ancestry. However, the cohorts differed significantly in mean percentage west African and European ancestries (P < 0.0001) and in the variance for individual ancestry (P ≤ 0.01). Among African Americans, no set of questionnaire items effectively estimated degree of west African ancestry, and self-report of a high degree of African ancestry in a three-generation family tree did not accurately predict degree of African ancestry. Our findings suggest that self-reported race and ancestry can predict ancestral clusters but do not reveal the extent of admixture. Genetic classifications of ancestry may provide a more objective and accurate method of defining homogenous populations for the investigation of specific population-disease associations.


Genome-wide case-control association studies provide a powerful tool for investigating possible genetic factors that may contribute to the health disparities observed among different racial and ethnic populations. Populations with different ancestral backgrounds may carry different genetic variants, and these may contribute to the variations in disease incidence and outcomes seen in specific racial and ethnic groups (1). Association studies can most easily identify disease-associated alleles when study groups are genetically similar, sharing a similar ancestral background (2). However, individual ancestry is not an easily assayed, simple category; consequently, race continues to be used as a proxy for genetic relatedness in clinical and other biological studies (3-6). There is currently no consensus on how best to examine or characterize different racial or ethnic groups when designing and conducting such studies.

Two main approaches have been used to approximate individual ancestry in biological studies: (a) using self-identified race and ethnicity, which may capture common environmental influences as well as ancestral background, and (b) genotyping a panel of markers that show large frequency differentials between major geographic ancestral groupings (7, 8). Both approaches have limitations. Self-identified racial categories may not always consistently predict ancestral population clusters, and evidence suggests that it may take large sample sizes and numerous markers to describe genetic clusters that correspond to self-identified race and ethnicity groupings (9-11). Racial categories are also imprecise and inconsistent, because they may potentially vary within the same individual over time (12, 13). Furthermore, their use risks reinforcing racial divisions in society. On the other hand, more objective analyses that genotype markers that are highly informative for ancestry may not be economically practical and are limited by the requirement of serum or fresh tissue for DNA extraction. Genetically determined ancestry may not capture unmeasured social factors that may affect differences in health outcomes. There are also unique ethical challenges when linking biological phenotypes with genetic markers for specific racial groups, and caution must always be used when attributing biological differences (e.g., disease risk and treatment response) to different populations.

Understanding the ancestral background of study subjects is most important in genetic studies of admixed populations, such as African Americans, who represent an admixture of Africans, Europeans, and Native Americans (14). Genetic studies have shown that African Americans form a diverse group with percent European admixture estimated to range between 7% and 23% (14-16). Genotyping of self-identified African Americans participating in the Cardiovascular Health Study revealed that among self-reported African Americans there are differences in genetic ancestry that are correlated with some clinically important endpoints (15).


The African American cohort in our study had a mean of 15% European admixture, which is consistent with previous reports of a range of 7% to 23% European admixture among U.S. African Americans (14-16). Of note, the estimates of 4% European and 1% Native American ancestry in the Nigerian population are likely due to bias in MLE due to the limited number of markers. We found that among participants there was a significantly higher proportion of admixture and higher variability in admixture proportions in the U.S.-born African American cohort compared with a population that emigrated from Africa (that is, Nigerians; Table 3). The significant variation in individual ancestry estimates among the African American cohort suggests that this group, like the Cardiovascular Health Study African American cohort (15), represents a diverse population consisting of several subpopulations. For participation in the African American cohort, subjects identified both parents as African Americans who were born in the United States. Although data regarding grandparental race were not used to screen study participation, these data were collected through a three-generation family tree during administration of the questionnaire. In this study population, all African American subjects described that the race of at least three of their four grandparents was consistent with African ancestry. Individuals and society have historically classified children of mixed-race ancestry as African American, even when one parent is Caucasian, Asian, or Native American. For African Americans, this is a remnant of the “Jim Crow” laws and the “One Drop” rule or “Rule of Hypodescent.” Thus, identification as African American would still occur in cases where the parents and grandparents were of mixed-race ancestry. This could also contribute to the greater European admixture and greater admixture variability seen in the African American cohort…
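The maximum likelihood ancestry estimates discussed above can be illustrated with a minimal sketch. It assumes a simple admixture model in which each allele copy is drawn from one of K ancestral populations with known allele frequencies, and it fits one individual's ancestry proportions with an expectation-maximization (EM) update. The excerpt does not say which estimator or software the authors used, so the EM scheme, the function names (`estimate_ancestry`, `simulate_genotypes`), and the simulated marker frequencies are all illustrative assumptions, not the study's method or data.

```python
import random


def estimate_ancestry(genotypes, freqs, n_iter=200):
    """EM estimate of ancestry proportions for one individual (assumed model).

    genotypes: list of 0/1/2 counts of the reference allele, one per marker.
    freqs:     freqs[k][j] is the reference-allele frequency of marker j in
               ancestral population k (assumed known and strictly in (0, 1)).
    """
    n_pops = len(freqs)
    n_markers = len(genotypes)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        for j, g in enumerate(genotypes):
            # Expected reference-allele frequency for this individual.
            f = sum(q[k] * freqs[k][j] for k in range(n_pops))
            f = min(max(f, 1e-9), 1.0 - 1e-9)  # guard against division by zero
            for k in range(n_pops):
                p = freqs[k][j]
                # Expected number of this marker's two allele copies that
                # came from population k, given the observed genotype.
                new_q[k] += g * q[k] * p / f + (2 - g) * q[k] * (1 - p) / (1 - f)
        q = [v / (2 * n_markers) for v in new_q]
    return q


def simulate_genotypes(q, freqs, rng):
    """Draw genotypes for one individual with true ancestry proportions q."""
    pops = range(len(q))
    genotypes = []
    for j in range(len(freqs[0])):
        count = 0
        for _ in range(2):  # two allele copies per marker
            k = rng.choices(pops, weights=q)[0]
            if rng.random() < freqs[k][j]:
                count += 1
        genotypes.append(count)
    return genotypes


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical panel: 107 markers, 3 ancestral populations, random frequencies.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(107)] for _ in range(3)]
    genotypes = simulate_genotypes([1.0, 0.0, 0.0], freqs, rng)
    print(estimate_ancestry(genotypes, freqs))
```

Because the estimated proportions cannot fall below zero, sampling noise in a small marker panel can only move a truly absent component upward. That is one plausible way small nonzero European and Native American estimates could arise in a cohort such as the Nigerian one.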

Categorization of humans in biomedical research: genes, race and disease

Genome Biology 2002
Volume 3, Number 7
Print ISSN 1465-6906; Online ISSN 1465-6914
DOI: 10.1186/gb-2002-3-7-comment2007

Neil Risch
Department of Genetics
Stanford University School of Medicine, Stanford, California

Esteban Burchard
Department of Medicine
University of California, San Francisco, California

Elad Ziv
Department of Medicine
University of California, San Francisco, California

Hua Tang
Department of Statistics
Stanford University, Stanford, California


A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. Some claim ‘no biological basis for race’ while others advocate a ‘race-neutral’ approach, using genetic clustering rather than self-identified ethnicity for human genetic categorization. We provide an epidemiologic perspective on the issue of human categorization in biomedical and genetic research that strongly supports the continued use of self-identified race and ethnicity.

A major discussion has arisen recently regarding optimal strategies for categorizing humans, especially in the United States, for the purpose of biomedical research, both etiologic and pharmaceutical. Clearly it is important to know whether particular individuals within the population are more susceptible to particular diseases or most likely to benefit from certain therapeutic interventions. The focus of the dialogue has been the relative merit of the concept of ‘race’ or ‘ethnicity’, especially from the genetic perspective. For example, a recent editorial in the New England Journal of Medicine [1] claimed that “race is biologically meaningless” and warned that “instruction in medical genetics should emphasize the fallacy of race as a scientific concept and the dangers inherent in practicing race-based medicine.” In support of this perspective, a recent article in Nature Genetics [2] purported to find that “commonly used ethnic labels are both insufficient and inaccurate representations of inferred genetic clusters.” Furthermore, a supporting editorial in the same issue [3] concluded that “population clusters identified by genotype analysis seem to be more informative than those identified by skin color or self-declaration of ‘race’.” These conclusions seem consistent with the claim that “there is no biological basis for ‘race’” [3] and that “the myth of major genetic differences across ‘races’ is nonetheless worth dismissing with genetic evidence” [4]. Of course, the use of the term “major” leaves the door open for possible differences but a priori limits any potential significance of such differences.

In our view, much of this discussion does not derive from an objective scientific perspective. This is understandable, given both historic and current inequities based on perceived racial or ethnic identities, both in the US and around the world, and the resulting sensitivities in such debates. Nonetheless, we demonstrate here that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view…

…Admixture and genetic categorization in the United States…

What are the implications of these census results and the admixture that has occurred in the US population for genetic categorization in biomedical research studies in the US? Gene flow from non-Caucasians into the US Caucasian population has been modest. On the other hand, gene flow from Caucasians into African Americans has been greater; several studies have estimated the proportion of Caucasian admixture in African Americans to be approximately 17%, ranging regionally from about 12% to 23% [22]. Thus, despite the admixture, African Americans remain a largely African group, reflecting primarily their African origins from a genetic perspective. Asians and Pacific Islanders have been less influenced by admixture and again closely represent their indigenous origins. The same is true for Native Americans, although some degree of Caucasian admixture has occurred in this group as well [23]…

Genetic Ancestry in Lung-Function Predictions

New England Journal of Medicine
DOI: 10.1056/NEJMoa0907897

Rajesh Kumar, M.D.
Max A. Seibold, Ph.D.
Melinda C. Aldrich, Ph.D., M.P.H.
L. Keoki Williams, M.D., M.P.H.
Alex P. Reiner, M.D.
Laura Colangelo, M.S.
Joshua Galanter, M.D.
Christopher Gignoux, M.S.
Donglei Hu, Ph.D.
Saunak Sen, Ph.D.
Shweta Choudhry, Ph.D.
Edward L. Peterson, Ph.D.
Jose Rodriguez-Santana, M.D.
William Rodriguez-Cintron, M.D.
Michael A. Nalls, Ph.D.
Tennille S. Leak, Ph.D.
Ellen O’Meara, Ph.D.
Bernd Meibohm, Ph.D.
Stephen B. Kritchevsky, Ph.D.
Rongling Li, M.D., Ph.D., M.P.H.
Tamara B. Harris, M.D.
Deborah A. Nickerson, Ph.D.
Myriam Fornage, Ph.D.
Paul Enright, M.D.
Elad Ziv, M.D.
Lewis J. Smith, M.D.
Kiang Liu, Ph.D.
Esteban González Burchard, M.D., M.P.H.


Background Self-identified race or ethnic group is used to determine normal reference standards in the prediction of pulmonary function. We conducted a study to determine whether the genetically determined percentage of African ancestry is associated with lung function and whether its use could improve predictions of lung function among persons who identified themselves as African American.

Methods We assessed the ancestry of 777 participants self-identified as African American in the Coronary Artery Risk Development in Young Adults (CARDIA) study and evaluated the relation between pulmonary function and ancestry by means of linear regression. We performed similar analyses of data for two independent cohorts of subjects identifying themselves as African American: 813 participants in the Health, Aging, and Body Composition (HABC) study and 579 participants in the Cardiovascular Health Study (CHS). We compared the fit of two types of models to lung-function measurements: models based on the covariates used in standard prediction equations and models incorporating ancestry. We also evaluated the effect of the ancestry-based models on the classification of disease severity in two asthma-study populations.

Results African ancestry was inversely related to forced expiratory volume in 1 second (FEV1) and forced vital capacity in the CARDIA cohort. These relations were also seen in the HABC and CHS cohorts. In predicting lung function, the ancestry-based model fit the data better than standard models. Ancestry-based models resulted in the reclassification of asthma severity (based on the percentage of the predicted FEV1) in 4 to 5% of participants.

Conclusions Current predictive equations, which rely on self-identified race alone, may misestimate lung function among subjects who identify themselves as African American. Incorporating ancestry into normative equations may improve lung-function estimates and more accurately categorize disease severity. (Funded by the National Institutes of Health and others.)
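The model comparison described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it fits two ordinary least-squares models to simulated FEV1 values, one with standard covariates only (here just age and height) and one that adds a genetically estimated ancestry fraction, and compares their fit. The covariate set, every coefficient in `simulate_cohort`, the use of R-squared as the fit measure, and all function names are hypothetical.

```python
import random


def ols_fit(X, y):
    """Ordinary least squares via the normal equations.

    Returns (coefficients, residual sum of squares).
    """
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[c][p] / A[c][c] for c in range(p)]
    rss = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
              for i in range(n))
    return beta, rss


def r_squared(rss, y):
    """Proportion of variance in y explained by a fitted model."""
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - rss / tss


def simulate_cohort(n, rng):
    """Hypothetical data: every coefficient below is invented for illustration."""
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 30)          # years
        height = rng.gauss(1.70, 0.08)     # metres
        ancestry = rng.uniform(0.5, 1.0)   # assumed fraction of African ancestry
        fev1 = (0.5 + 2.0 * height - 0.02 * age - 0.5 * ancestry
                + rng.gauss(0, 0.3))       # litres, with random noise
        rows.append((age, height, ancestry, fev1))
    return rows


def compare_models(rows):
    """Fit the standard-covariate model and the ancestry-based model."""
    y = [r[3] for r in rows]
    base = [[1.0, r[0], r[1]] for r in rows]        # intercept, age, height
    full = [[1.0, r[0], r[1], r[2]] for r in rows]  # ... plus ancestry
    _, rss_base = ols_fit(base, y)
    _, rss_full = ols_fit(full, y)
    return rss_base, rss_full, y


if __name__ == "__main__":
    rss_base, rss_full, y = compare_models(simulate_cohort(200, random.Random(0)))
    print("R^2 standard covariates:", r_squared(rss_base, y))
    print("R^2 with ancestry:      ", r_squared(rss_full, y))
```

Adding a covariate can never worsen in-sample R-squared, so a real comparison would need a criterion that penalizes the extra parameter or an evaluation in independent cohorts, as the replication in HABC and CHS suggests.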

…There are some important limitations of our study. First, our analysis does not address population groups other than self-identified African Americans, such as Latinos, who have more complex patterns of ancestral admixture. Second, the association between lung function and ancestry found in our study may be the result of factors other than genetic variation, such as premature birth, prenatal nutrition, socioeconomic status, and other environmental factors. Third, we did not study a replication population with the same age range as that of the CARDIA cohort. Thus, we may have overestimated the association between ancestry and lung function in the CARDIA participants, who were young adults. Finally, some research groups used different statistical approaches to estimate ancestry in their respective study populations. We have found previously, however, that different approaches (e.g., Markov models and maximum-likelihood estimation) produce highly correlated results from the same set of markers. The consistency of our findings across three cohorts, despite the different methods for estimating ancestry, underscores the robustness of the association with ancestry…

The Importance of Race and Ethnic Background in Biomedical Research and Clinical Practice

New England Journal of Medicine
Volume 348, Number 12
pages 1170-1175

Esteban González Burchard, M.D.
Elad Ziv, M.D.
Natasha Coyle, Ph.D.
Scarlett Lin Gomez, Ph.D.
Hua Tang, Ph.D.
Andrew J. Karter, Ph.D.
Joanna L. Mountain, Ph.D.
Eliseo J. Pérez-Stable, M.D.
Dean Sheppard, M.D.
Neil Risch, Ph.D.

A debate has recently arisen over the use of racial classification in medicine and biomedical research. In particular, with the completion of a rough draft of the human genome, some have suggested that racial classification may not be useful for biomedical studies, since it reflects “a fairly small number of genes that describe appearance” and “there is no basis in the genetic code for race.” In part on the basis of these conclusions, some have argued for the exclusion of racial and ethnic classification from biomedical research. In the United States, race and ethnic background have been used as cause for discrimination, prejudice, marginalization, and even subjugation. Excessive focus on racial or ethnic differences runs the risk of undervaluing the great diversity that exists among persons within groups. However, this risk needs to be weighed against the fact that in epidemiologic and clinical research, racial and ethnic categories are useful for generating and exploring hypotheses about environmental and genetic risk factors, as well as interactions between risk factors, for important medical outcomes. Erecting barriers to the collection of information such as race and ethnic background may provide protection against the aforementioned risks; however, it will simultaneously retard progress in biomedical research and limit the effectiveness of clinical decision making.

Race and Ethnic Background as Geographic and Sociocultural Constructs with Biologic Ramifications

Definitions of race and ethnic background have often been applied inconsistently. The classification scheme used in the 2000 U.S. Census, which is often used in biomedical research, includes five major groups: black or African American, white, Asian, native Hawaiian or other Pacific Islander, and American Indian or Alaska native. In general, this classification scheme emphasizes the geographic region of origin of a person’s ancestry. Ethnic background is a broader construct that takes into consideration cultural tradition, common history, religion, and often a shared genetic heritage…

Sociocultural Correlates of Race and Ethnic Background

The racial or ethnic groups described above do not differ from each other solely in terms of genetic makeup, especially in a multiracial and multicultural society such as the United States. Socioeconomic status is strongly correlated with race and ethnic background and is a robust predictor of access to and quality of health care and education, which, in turn, may be associated with differences in the incidence of diseases and the outcomes of those diseases. For example, black Americans with end-stage renal disease are referred for renal transplantation at lower rates than white Americans. Black Americans are also referred for cardiac catheterization less frequently than white Americans. In some cases, these differences may be due to bias on the part of physicians and discriminatory practices in medicine. Nonetheless, racial or ethnic differences in the outcomes of disease sometimes persist even when discrepancies in the use of interventions known to be beneficial are considered. For example, the rate of complications from type 2 diabetes mellitus varies according to racial or ethnic category among members of the same health maintenance organization, despite uniform utilization of outpatient services and after adjustment for levels of education and income, health behavior, and clinical characteristics. The evaluation of whether genetic (as well as nongenetic) differences underlie racial disparities is appropriate in cases in which important racial and ethnic differences persist after socioeconomic status and access to care are properly taken into account…

…Racially Admixed Populations

Although studies of population genetics have clustered persons into a small number of groups corresponding roughly to five major racial categories, such classification is not completely discontinuous, because there has been intermixing among groups both over the course of history and in recent times. In particular, genetic admixture, or the presence in a population of persons with multiple races or ethnic backgrounds, is well documented in the border regions of continents and may represent genetic gradations (clines) — for example, among East Africans (e.g., Ethiopians) and some central Asian groups. In the United States, mixture among different racial groups has occurred recently, although in the 2000 U.S. Census, the majority of respondents still identified themselves as members of a single racial group. Genetic studies of black Americans have documented a range of 7 to 20 percent white admixture, depending on the geographic location of the population studied. Despite the admixture, black Americans, as a group, are still genetically similar to Africans. Hispanics, the largest and fastest growing minority population in the United States, are an admixed group that includes white and Native American ancestry, as well as African ancestry. The proportions of admixture in this group also vary according to geographic region.

Although the categorization of admixed groups poses special challenges, groups containing persons with varying levels of admixture can also be particularly useful for genetic-epidemiologic studies. For example, Williams et al. studied the association between the degree of white admixture and the incidence of type 2 diabetes mellitus among Pima Indians. They found that the self-reported degree of white admixture (reported as a percentage) was strongly correlated with protection from diabetes in this population. Furthermore, as noted above, information on race or ethnic background can provide important clues to effects of culture, access to care, and bias on the part of caregivers, even in genetically admixed populations. It is also important to recognize that many groups (e.g., most Asian groups) are highly underrepresented both in the population of the United States and in typical surveys of population genetics, relative to their global numbers. Thus, primary categories that are relevant for the current U.S. population might not be optimal for a globally derived sample…

