The Contribution of Neanderthals to Phenotypic Variation in Modern Humans
Assessing
the genetic contribution of Neanderthals to non-disease phenotypes in
modern humans has been difficult because of the absence of large cohorts
for which common phenotype information is available. Using baseline
phenotypes collected for 112,000 individuals by the UK Biobank, we can
now elaborate on previous findings that identified associations between
signatures of positive selection on Neanderthal DNA and various modern
human traits but not any specific phenotypic consequences. Here, we show
that Neanderthal DNA affects skin tone and hair color, height, sleeping
patterns, mood, and smoking status in present-day Europeans.
Interestingly, multiple Neanderthal alleles at different loci contribute
to skin and hair color in present-day Europeans, and these Neanderthal
alleles contribute to both lighter and darker skin tones and hair color,
suggesting that Neanderthals themselves were most likely variable in
these traits.
Introduction
Interbreeding
between Neanderthals and early modern humans has been shown to have
contributed about 2% Neanderthal DNA to the genomes of present-day
non-Africans. This Neanderthal DNA has apparently had both positive and
negative effects. Together with the rapid decrease in Neanderthal
ancestry after introgression, the depletion of Neanderthal DNA around
functional genomic elements in present-day human genomes suggests that a
large fraction of Neanderthal alleles are deleterious in modern humans.
However, recent studies have also identified a number of introgressed
Neanderthal alleles that have increased in frequency in modern humans
and that might contribute to genetic adaptation to new environments.
Adaptive variants in genes related to immunity, skin and hair
pigmentation, and metabolism have been identified.
The
majority of Neanderthal alleles in the genomes of people today are,
however, not strongly adaptive and are therefore present at low
frequencies (<2%).
However, evaluating the broader contribution of Neanderthals to common
phenotypic variation in modern humans, or inferring Neanderthal
phenotypes, has not been possible largely because of the limited number
of studies that collect genotype data together with common phenotype
information.
In addition to
collecting genotype data via a custom genotyping array, the UK Biobank
has collected baseline phenotypes, including traits related to physical
appearance, diet, sun exposure, and behavior, as well as disease, for
more than 500,000 people.
The pilot dataset including genotypes and phenotypes for more than
150,000 of the individuals was recently made available for study. Using
these data, we studied the contribution of Neanderthals to common human
phenotypic variation in 112,338 individuals from the UK Biobank to
determine the set of traits to which Neanderthals have contributed and
to evaluate the relative contribution of archaic and non-archaic alleles
to common phenotypic variation in modern humans.
Material and Methods
Datasets from the UK Biobank
We obtained genotype and phenotype data from the pilot phase of the UK Biobank project.
Genotyping was performed with two arrays (UK BiLEVE and UK Biobank
Axiom) that share 95% of markers, resulting in a merged dataset with
genotype information for 152,729 individuals across 822,111 genomic
sites.
Filtering Genotype Data
UK
Biobank quality control (QC) included tests for batch, array, plate,
and sex effects, as well as departures from Hardy-Weinberg equilibrium
and discordance across control replicates. We used information provided
by the UK Biobank to remove a total of 40,391 individuals; of these, 480
were related according to a kinship inference analysis, 17,308 had
significantly decreased heterozygosity levels, and 32,443 had
substantial non-European ancestry according to self-reported information
and a principal-component analysis of the SNP data. Extensive
documentation of the QC for these data is available on the UK Biobank’s
website.
Annotating Non-archaic and Archaic-like SNPs
A
total of 825,927 polymorphic sites were genotyped. We took a two-step
approach to annotate SNPs on the basis of whether they carried an allele
of putative archaic origin. First, we identified potentially
introgressed alleles by selecting SNPs that had one fixed allele in
Yoruba individuals, an African population with little to no inferred
Neanderthal DNA (1000 Genomes Project [14]
phase 3), and a different allele in a heterozygous or homozygous state in the genome of the Altai Neanderthal [15]
and that segregated in any of the UK Biobank individuals (we refer to
these variants as archaic-like SNPs [aSNPs]). We then refined this set by
requiring that the identified aSNPs overlap confidently inferred tracts
of Neanderthal introgression in modern humans [4]
that have a Neanderthal posterior probability greater than 0.9 and a
length of at least 0.02 cM. In the construction of this introgression
map, a number of criteria were used to ensure that the identified
haplotypes were highly likely to be of introgressed origin: (1) alleles
were required to be shared between non-Africans and Neanderthals but not
be present in sub-Saharan Africans, (2) haplotype lengths had to be
consistent with admixture ∼50,000 years ago, and (3) haplotypes had to
have a lower divergence to a Neanderthal reference genome than to
African genomes.
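As an illustration only, the two-step annotation described above can be sketched in Python. The `Site` and `Tract` records and all field names are hypothetical stand-ins for the real inputs (Yoruba genotypes from the 1000 Genomes Project, the Altai Neanderthal genome, the UK Biobank array, and the published introgression map); only the thresholds (posterior probability > 0.9, length of at least 0.02 cM) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical record for one genotyped site.
    chrom: str
    pos: int
    yoruba_alleles: frozenset   # alleles observed in Yoruba individuals
    altai_genotype: tuple       # the two alleles of the Altai Neanderthal
    biobank_alleles: frozenset  # alleles observed in the UK Biobank cohort


@dataclass
class Tract:
    # Hypothetical record for one inferred introgression tract.
    chrom: str
    start: int
    end: int
    posterior: float   # Neanderthal posterior probability
    length_cm: float   # genetic length in centimorgans


def candidate_archaic_allele(site):
    """Step 1: Yoruba are fixed for one allele, the Altai Neanderthal carries
    a different allele, and the site segregates in the cohort."""
    if len(site.yoruba_alleles) != 1 or len(site.biobank_alleles) < 2:
        return None
    (fixed,) = site.yoruba_alleles
    for allele in site.altai_genotype:
        if allele != fixed and allele in site.biobank_alleles:
            return allele
    return None


def in_confident_tract(site, tracts, min_posterior=0.9, min_cm=0.02):
    """Step 2: the site overlaps a confidently inferred introgressed tract."""
    return any(
        t.chrom == site.chrom
        and t.start <= site.pos <= t.end
        and t.posterior > min_posterior
        and t.length_cm >= min_cm
        for t in tracts
    )


def is_asnp(site, tracts):
    """A site counts as an aSNP only if it passes both steps."""
    return candidate_archaic_allele(site) is not None and in_confident_tract(site, tracts)
```

The two predicates are kept separate so that the effect of each filter on the number of retained sites can be inspected independently.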
We then collapsed
sets of SNPs that were in high linkage disequilibrium (LD) into one
representative tag SNP. To do so, we used PLINK
(parameters: --ld-window-r2 0.8 --ld-window 99999) to compute LD
between all SNPs among the 152,729 individuals and combined sets of SNPs
with r² > 0.8 into clusters. For clusters with at least
one aSNP, we selected a random aSNP as the tag SNP. In clusters without
aSNPs, we chose a random tag SNP. Non-archaic SNPs and aSNPs with no
other SNPs in high LD were defined to be their own tag SNP. We
identified a total of 534,341 tag SNPs, of which 6,671 were of putative
archaic origin and 527,670 were of non-archaic origin.
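The clustering and tag-SNP selection can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that pairwise r² values have already been exported (for example, from PLINK) as `(snp_a, snp_b, r2)` tuples and that clusters are formed by single linkage, which the text does not specify. The function name and arguments are hypothetical.

```python
import random


def tag_snps(snp_ids, ld_pairs, archaic, r2_threshold=0.8, seed=0):
    """Return {tag_snp: is_archaic} with one tag SNP per LD cluster.

    snp_ids  : iterable of SNP identifiers
    ld_pairs : iterable of (snp_a, snp_b, r2) tuples
    archaic  : set of identifiers annotated as aSNPs
    """
    rng = random.Random(seed)
    snp_ids = list(snp_ids)
    parent = {s: s for s in snp_ids}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Merge every pair of SNPs in high LD (assumed single-linkage clustering).
    for a, b, r2 in ld_pairs:
        if r2 > r2_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in snp_ids:
        clusters.setdefault(find(s), []).append(s)

    tags = {}
    for members in clusters.values():
        asnps = [s for s in members if s in archaic]
        # Prefer a random aSNP when the cluster has one; otherwise any random SNP.
        tags[rng.choice(asnps if asnps else members)] = bool(asnps)
    return tags
```

SNPs with no partner above the threshold form singleton clusters and therefore become their own tag SNP, as described in the text.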
To
ensure a robust correlation between genotypes and phenotypes, we
required each tag SNP to have a reasonable representation of both
alleles. We therefore kept all tag SNPs where at least 100 individuals
were heterozygous and at least 20 were homozygous for the minor allele,
resulting in 6,210 archaic-like tag SNPs and 439,749 non-archaic tag
SNPs.
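The genotype-count filter amounts to a simple check per tag SNP. The sketch below assumes genotypes are coded as 0, 1, or 2 copies of one allele, with `None` for missing calls; this coding and the function name are illustrative assumptions.

```python
def passes_count_filter(genotypes, min_het=100, min_hom_minor=20):
    """Keep a SNP only if enough heterozygotes and minor-allele homozygotes exist.

    genotypes: iterable of 0, 1, 2 (allele copies) or None (missing call).
    """
    counts = [0, 0, 0]
    for g in genotypes:
        if g is not None:
            counts[g] += 1
    # The minor allele is the one with fewer homozygous carriers.
    hom_minor = min(counts[0], counts[2])
    return counts[1] >= min_het and hom_minor >= min_hom_minor
```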
Phenotype Data
Baseline phenotype data were available for different subsets of individuals (Table S1).
Of these phenotypes, we used the 136 (including diet, cognitive
functions, physical measurements, and self-reported medical conditions)
for which data were available for at least 80,000 individuals (Table S1).
We excluded phenotypes with complex measurements (e.g.,
electrocardiography). Phenotypes were represented either in categorical
form (72 phenotypes) or as continuous variables (64 phenotypes) (Table S1).
Correlation of Genotype and Phenotype Data
Linear
or logistic regression is typically used in association testing to
account for potentially confounding covariates such as sex, age, and
ancestry; however, applying this standard approach to the UK Biobank is
challenging because some of the phenotypes are represented in
categorical form for two or more categories, whereas other phenotypes
are continuous. Linear regression or generalized linear models are
widely used for continuous variables and require knowledge of the
distribution of data to be modeled. This distribution is likely to
differ between phenotypes, and its assessment is not always trivial.
Logistic regression is typically applied to binary phenotypes, such as
disease phenotypes. However, many of the categorical phenotypes in the
UK Biobank have more than two categories and therefore cannot be
transformed into binary data. Another option is to use a multinomial
logistic regression, which would require testing each of the categories
independently and would vastly increase the complexity of the analysis.
We therefore used the chi-square test (for categorical data) and
Spearman’s correlation (for continuous data) because these statistics
make fewer assumptions and are directly applicable to the two classes of
phenotypes (categorical and continuous) in the UK Biobank.
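For a categorical phenotype, the chi-square test operates on a contingency table of genotype (rows) by phenotype category (columns). The following standard-library sketch is illustrative and is not the code used in the study; it assumes three genotype classes, so the degrees of freedom are even and the p value has a closed form. Function names are hypothetical.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability of a chi-square variable with even dof:
    exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!"""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi_square_test(table):
    """Pearson chi-square test of independence.

    table: three rows (one per genotype class), each a list of counts per
    phenotype category. Every row and column total must be non-zero.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf_even_dof(stat, dof)
```

Because the test uses only counts, phenotypes with more than two categories are handled without recoding, which is the property motivating its use here.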
We excluded
categorical data categories for which fewer than 1,000 individuals were
available. However, neither test accounts for covariates such as
ancestry, age, and sex. There is a strong correlation between ancestry
and the presence of Neanderthal alleles. We therefore carefully selected
individuals with very little variation in ancestry. There is no a priori
reason to assume any correlation between Neanderthal ancestry and
factors such as age and sex (and no previous study has shown such a
correlation).
We explicitly tested the impact of these factors on our
results by (1) comparing results of linear models with and without
covariates and (2) showing that these results were consistent with the
results we obtained with a chi-square test (Table S2).
To do so, we selected all 21 binary phenotypes and computed an
association with all aSNPs by using (1) a chi-square test, (2) a
logistic regression without any other covariates, (3) a logistic
regression with age and sex as covariates, and (4) a logistic regression
with age and sex as covariates and all interactions between age, sex,
and genotype.
We found that the
correlation between association p values with archaic alleles was
between rho = 0.99999 and rho = 1 (Spearman’s correlation) for the
comparisons of (2) and (3) and of (2) and (4), suggesting that including
age and sex has only a marginal impact on the estimation of the
association p value.
To estimate the
similarity between the results of a logistic regression without
covariates (2) and those of a chi-square test (1), we also correlated
association p values between the binary phenotypes and archaic alleles
for (1) and (2) and found that they ranged between rho = 0.65 and rho =
0.67 (Spearman’s correlation; all p ≪ 1.0 × 10⁻¹⁶), suggesting that both tests show highly similar results.
Additionally,
we correlated genotypes for all aSNPs used in this study with age and
sex and found that there was no significant correlation between these
two factors and the aSNP genotypes at a false-discovery rate (FDR) <
0.05 (minimum FDR for sex = 0.33; minimum FDR for age = 0.28).
These
results suggest that age and sex have very little impact on our
calculation of the phenotype association for binary phenotypes, and we
infer that non-binary phenotypes are also not likely to be affected by
these factors. Applying more sophisticated methods to the analysis of
specific phenotypes could increase power to detect additional
associations.
For both tests, we considered associations that reached p < 1.0 × 10⁻⁸
as significant. This addresses the multiple-testing problem encountered
when the associations between 136 phenotypes and approximately 6,000
aSNPs are evaluated (family-wise error rate = 1.0 × 10⁻⁸ × 6,000 × 136 ≈ 0.01).
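Written out, the bound behind this threshold is a Bonferroni-style product of the per-test significance level and the approximate number of tests (the symbols α and m are introduced here only for illustration):

```latex
\[
\mathrm{FWER} \;\le\; \alpha \times m
  \;=\; \left(1.0 \times 10^{-8}\right) \times \left(6{,}000 \times 136\right)
  \;=\; 8.16 \times 10^{-3} \;\approx\; 0.01
\]
```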
Phenotypic Impact of Archaic and Non-archaic Alleles
For
all tag aSNPs, we computed an association p value between genotype and
phenotype for each phenotype. We then clustered tag aSNPs into archaic
allele-frequency bins of size 1% and selected frequency-matched
non-archaic tag SNPs by matching the number of non-archaic alleles from
each frequency bin to the number of archaic alleles. For each phenotype,
we created 1,000 random frequency-matched non-archaic sets and computed
for each tag SNP an association p value for the phenotype.
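The construction of frequency-matched control sets can be sketched as follows. This is an illustrative reading of the procedure, assuming allele frequencies are supplied as dictionaries keyed by SNP identifier and that non-archaic SNPs are drawn without replacement within each set; neither detail is stated in the text, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def frequency_bin(freq, bin_size=0.01):
    """Index of the allele-frequency bin (1% bins by default)."""
    # The small offset guards against floating-point error at bin edges.
    return int(freq / bin_size + 1e-9)


def matched_sets(archaic_freqs, nonarchaic_freqs, n_sets=1000, bin_size=0.01, seed=0):
    """Draw n_sets random sets of non-archaic SNPs whose per-bin counts
    equal those of the archaic SNPs."""
    rng = random.Random(seed)
    needed = Counter(frequency_bin(f, bin_size) for f in archaic_freqs.values())
    pool = defaultdict(list)
    for snp, f in nonarchaic_freqs.items():
        pool[frequency_bin(f, bin_size)].append(snp)

    sets = []
    for _ in range(n_sets):
        chosen = []
        for b, k in needed.items():
            # random.sample raises ValueError if a bin holds fewer than k SNPs.
            chosen.extend(rng.sample(pool[b], k))
        sets.append(chosen)
    return sets
```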
To
determine whether the archaic p value distributions were shifted to
lower or higher significant p values than the non-archaic distributions,
we determined the distances between the sets of archaic and non-archaic
distributions. More specifically, for each phenotype, we computed
empirical p values for the component aSNPs with associations p <
1.0 × 10⁻⁴ and compared their cumulative density distribution with the 1,000 non-archaic cumulative density p value distributions (Table S3).
We selected the aSNP at which the distance between the archaic
distribution and the non-archaic distribution was largest. We corrected
all p values for each phenotype for multiple testing by using the
Benjamini-Hochberg approach.
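For reference, the Benjamini-Hochberg adjustment can be written in a few lines. This is a generic standard-library sketch of the procedure rather than the code used in the study; the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A phenotype is then called significant when its adjusted value falls below the chosen FDR level (0.05 in this study).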
Candidate-Gene Analysis and Molecular Mechanism
Given
that archaic alleles are typically present on longer haplotypes that we
cannot determine directly from the UK Biobank array data, we used the
1000 Genomes
(phase 3) individuals to identify aSNPs that were not directly
genotyped in the UK Biobank. We computed LD between these by using PLINK
(see Annotating Non-archaic and Archaic-like SNPs) and combined sets of aSNPs with r²
> 0.8 between all pairs into a haplotype. We defined the borders of
the inferred archaic-like haplotype to be the most distant two aSNPs (Table 1).
Table 1. Archaic Alleles with Genome-wide-Significant Phenotype Associations
Phenotype | Meta-phenotype | Tag aSNP | Association p Value | Neanderthal Allele Frequency | Data Type | Archaic Haplotype (hg19) | Overlapping Gene(s) | Missense Mutations | Associated eQTLs | FDR ILS Test |
---|---|---|---|---|---|---|---|---|---|---|
Hair color (natural before graying) | sun exposure | chr16: 89,947,203 (rs62052168) | 3.7 × 10⁻²⁰² | 0.097 | categorical | chr16: 89,813,988–90,008,296 | SPIRE2, TCF25, MC1R, TUBB3, FANCA | – | FANCA: muscle (skeletal), lung, pancreas, esophagus (muscularis), adipose (subcutaneous), nerve (tibial), artery (tibial), whole blood; SPIRE2: muscle (skeletal), heart (atrial appendage), adipose (visceral; omentum), skin (not sun exposed; suprapubic), minor salivary gland, esophagus (muscularis), esophagus (mucosa), esophagus (gastroesophageal junction), testis, skin (sun exposed; lower leg), adipose (subcutaneous), nerve (tibial), artery (tibial), heart (left ventricle), cells (transformed fibroblasts), artery (aorta), pituitary; TCF25: uterus, brain (putamen; basal ganglia); TUBB3: vagina, esophagus (mucosa); MC1R: breast (mammary tissue); DBNDD1: breast (mammary tissue), skin (not sun exposed; suprapubic), skin (sun exposed; lower leg), whole blood; GAS8-AS1 (MIM: 605179): testis; DEF8: skin (sun exposed; lower leg); GAS8 (MIM: 605178): brain (spinal cord; cervical c-1) | 1.84 × 10⁻⁹ |
Skin color | sun exposure | chr6: 45,553,288 (rs115127056) | 4.21 × 10⁻³⁰ | 0.075 | categorical | chr6: 45,533,261–45,680,205 | RUNX2 | – | RUNX2: brain (cerebellum), brain (hippocampus), brain (cerebellar hemisphere) | <2.2 × 10⁻²² |
This table shows archaic alleles with genome-wide-significant associations (column 4, p < 1.0 × 10⁻⁸)
and their corresponding phenotype (column 1) and meta-phenotype (column
2). Only archaic alleles on confidently inferred archaic introgressed
haplotypes are included. The archaic allele frequency in the UK Biobank
cohort is given in column 5. Gene identifiers for overlapping or nearest
genes (marked with an asterisk) are in column 8. Abbreviations are as
follows: eQTL, expression quantitative trait locus; FDR, false-discovery
rate; and ILS, incomplete lineage sorting.
We then assigned all 13 candidate tag aSNPs with an association p value < 1.0 × 10⁻⁸ (Table 1) to archaic haplotypes inferred from 1000 Genomes.
To
determine the targets of these significantly associated aSNPs, we
identified overlapping protein-coding genes (Ensembl version GRCh37) or
assigned the haplotype to the nearest gene if there was no direct
overlap. For each archaic-like haplotype, we identified protein sequence
and regulatory variants among the aSNPs in each haplotype and computed
the predicted effect of the amino acid changes by using the Variant Effect Predictor (VEP).
Two of the haplotypes with significantly associated aSNPs carried an archaic missense allele (Table 1).
To determine whether significantly associated aSNPs might modify gene
regulation, we used a previously published set of associations between
archaic haplotypes and differential expression in 48 human tissues from
the Genotype-Tissue Expression (GTEx) dataset.
Of the haplotypes with significantly associated aSNPs, eight were also
associated with the expression change of a nearby gene (within 50 kb) in
at least one tissue (Table 1).
Testing whether Inferred Archaic Haplotypes Exceed the Length Expected by Incomplete Lineage Sorting
We
tested whether the lengths of archaic haplotypes exceeded the length of
segments expected from incomplete lineage sorting (ILS) by applying the
approach presented by Huerta-Sánchez et al., using a conservative age of
the Altai Neanderthal (based on a mutation rate of 1.0 × 10⁻⁹ per base
pair per year) and the average recombination rate at each inferred
haplotype. We corrected the p values obtained from that
approach for multiple testing by using the Benjamini-Hochberg method
and added them to Table 1.
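The logic of such a length test can be sketched under explicit assumptions. The sketch assumes the model commonly attributed to that approach: the expected length of a segment shared through ILS is 1/(r × t), where r is the recombination rate per base pair per generation and t is the total branch length in generations, and segment lengths follow a gamma distribution with shape 2. The function name, its parameters, and any values passed to it are illustrative and are not taken from this study.

```python
import math


def ils_probability(length_bp, recomb_rate, branch_years, generation_years):
    """Probability that a segment shared through ILS is at least length_bp long.

    recomb_rate      : recombination rate per base pair per generation
    branch_years     : total branch length since divergence, in years
    generation_years : assumed generation time, in years
    """
    generations = branch_years / generation_years
    rate = recomb_rate * generations      # 1 / expected ILS segment length
    x = rate * length_bp
    # Survival function of a gamma distribution with shape 2 and this rate.
    return (1.0 + x) * math.exp(-x)
```

Under this model, a haplotype many times longer than the expected ILS segment length receives a very small probability, which supports introgression as the explanation for allele sharing.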
Haplotype Trees for Candidate Loci
For
each of the 13 inferred archaic haplotypes with significant phenotype
associations, we extracted the genomic sequences of all 1000 Genomes
phase 3 individuals, as well as the genome sequences of the Altai
Neanderthal, Denisovan, and chimpanzee (pantro4) (Table 1).
We removed non-variable sites and sites where either of the archaic
individuals was polymorphic. We then clustered the haplotypes of the
combined set of modern and ancient humans together with the chimpanzee
into core haplotypes by combining haplotypes that differed by fewer than
∼1/1,000 bases. Rooted neighbor-joining trees based on the consensus
sequences of the resulting core haplotypes and with chimpanzee as an
outgroup were computed and are displayed in Figure S1.
Results
We
analyzed 136 baseline phenotypes in 112,338 individuals of British
ancestry from the UK Biobank pilot study. A total of 822,111 SNPs
directly genotyped in this cohort were classified as either “archaic” or
“non-archaic” on the basis of their inclusion in a previously published
map of Neanderthal ancestry
and their similarity to the Altai Neanderthal genome
(Material and Methods).
We note that LD between Neanderthal introgressed alleles tends to be
higher than LD between non-introgressed alleles because of the timing of
Neanderthal introgression. To ensure that the phenotype associations
with archaic and non-archaic haplotypes were unbiased, we selected
a random tag SNP for each set of SNPs in high LD (r² > 0.8)
and labeled these as “archaic” if the LD set contained at least one
archaic-like SNP and as “non-archaic” otherwise. To ensure sufficient power
to detect the phenotypic contribution of each allele, we filtered all
tag SNPs for a minimum minor allele frequency (Material and Methods),
resulting in a final set of 6,210 archaic tag SNPs and 439,749
non-archaic tag SNPs. We then retained only variants on archaic
haplotypes that exceeded the length expected by ILS (Material and Methods).
Phenotypes in the UK Biobank are represented either as categorical (72 phenotypes) or continuous (64 phenotypes) data (Table S1).
Linear or logistic regression is typically used in association testing
to account for potentially confounding covariates such as sex, age, and
ancestry. To avoid testing each of the categories independently, which
vastly increases the complexity of the analysis, we applied two
different tests: for continuous data, we applied Spearman’s correlation
to test for an association between each tag SNP and the phenotypic
measurement, whereas for categorical data, we used a chi-square test to
test for associations between tag SNPs and phenotypes (Material and Methods) and considered only associations that reached p < 1.0 × 10⁻⁸
as significant. By comparing our results to those of linear models for
subsets of the data, we found that covariates such as age and sex had
very little impact on our calculations of phenotype association (Material and Methods and Table S2).
For 11 phenotypes, a total of 15 associations reached genome-wide significance (p < 1.0 × 10⁻⁸; Tables 1 and S4).
Among these 15 associations were Neanderthal alleles that increase both
sitting height and height attained at age 10 years, alleles that reduce
measures of leg impedance (suggesting reduced body fat composition),
and alleles that increase resting pulse rate (Table 1).
Strikingly, more than half of the significantly associated alleles that
we identified are related to skin and hair traits, consistent with
previous evidence that genes associated with skin and hair biology are
over-represented in introgressed archaic regions.
It was previously only possible to speculate about the precise effect
of the introgressed alleles on skin and hair phenotypes on the basis of
the genes that were in or near the introgressed haplotypes. We can now
directly determine the effect of Neanderthal alleles on these traits in
modern humans by correlating Neanderthal ancestry with phenotypes of
individuals in the UK Biobank cohort.
The
strongest association we found in this study was an archaic allele
under-represented among red-haired individuals. This archaic allele is
on an introgressed haplotype composed of 71 aSNPs and encompassing five
genes: FANCA (MIM: 607139), SPIRE2 (MIM: 609217), TCF25 (MIM: 612326), MC1R (MIM: 155555), and TUBB3 (MIM: 602661) (rs62052168, p = 3.7 × 10⁻²⁰²; Figure 1 and Table 1). MC1R
is a key genetic determinant of pigmentation and hair color and is
therefore a good candidate for this association. More than 20 variants
in MC1R have been shown to alter hair color in humans.
None of the variants resulting in red hair in modern humans are present
in either of the two high-coverage Neanderthal genomes that have been
sequenced (Table S5).
Therefore, Neanderthals appear not to carry any of the variants
associated with red hair in modern humans. Further, a
Neanderthal-specific variant (p.Arg307Gly) postulated to reduce the
activity of MC1R and result in red hair was identified by PCR
amplification of MC1R in two Neanderthals.
However, this putative Neanderthal-specific variant is also not present
in the Neanderthal genomes that have been sequenced to date,
suggesting that if this variant was present in Neanderthals, it was
rare. Using the high-coverage Neanderthal genomes, we identified only
one additional Neanderthal-specific MC1R amino acid change for which the
effect on hair color is unknown. However, it is polymorphic among
Neanderthals, indicating that any phenotype that it confers was variable
in Neanderthals (Table S5).
Finally, because the introgressed haplotype we identified in this
cohort is under-represented among red-haired individuals, we conclude
that if variants contributing to red hair were present in Neanderthals,
they were probably not at high frequency.
We also identified strongly associated archaic alleles on two unlinked introgressed haplotypes near BNC2 (MIM: 608669), a gene that has been previously associated with skin pigmentation in Europeans.
The first archaic haplotype (chr9: 16,720,122–16,804,167) is tagged by
an archaic allele (rs10962612) that has a frequency of more than 66% in
European populations (Table S6 and Figure 1) and is associated with increased incidence of childhood sunburn (p = 1.5 × 10⁻⁹) and poor tanning (p = 1.6 × 10⁻²²) in the UK Biobank cohort (Table 1). A Neanderthal haplotype in this region was previously identified by Vernot and Akey,
and the association with sun sensitivity is consistent with the
previous finding that Neanderthal alleles on this haplotype result in an
increased risk of keratosis.
All of the Neanderthal-like SNPs overlapping BNC2 on this haplotype have significant scores in a test for recent positive selection in Europeans
(singleton density score > 3), perhaps indicating their importance in recent local adaptation.
Interestingly, a second, less-frequent (19%) archaic haplotype near BNC2 (chr9: 16,891,561–16,915,874; rs62543578; Table S6)
shows strong associations with darker skin pigmentation in individuals
with British ancestry in the UK Biobank cohort (p = 1.6 × 10⁻¹⁴; Figure 1 and Table 1). These results suggest that multiple alleles in and near BNC2,
some of which are contributed by Neanderthals, have different effects
on pigmentation in modern humans. Our analysis identified six additional
associations (p < 1.0 × 10⁻⁸) contributing to variation in skin and hair biology at other introgressed loci (Table 1). Individuals with blonde hair show a higher frequency of the Neanderthal haplotype at chr6: 503,851–544,833 (overlapping EXOC2 [MIM: 615329]), whereas individuals with darker hair color show higher Neanderthal ancestry at chr14: 92,767,097–92,801,297 (overlapping SLC24A4 [MIM: 609840]). Two further archaic haplotypes on chromosomes 6 (chr6: 45,533,261–45,680,205, overlapping RUNX2 [MIM: 600211]) and 11 (chr11: 89,996,325–90,041,511; nearest gene: CHORDC1 [MIM: 604353]) are both significantly associated with lighter skin color (Table 1).
The apparent variation in the phenotypic effects of Neanderthal alleles
in this cohort demonstrates that it is difficult to confidently predict
Neanderthal skin and hair color.
Additionally,
it is not clear that phenotypic inference from single variants for
which a function is known on the modern human genetic background
provides sufficient evidence for extrapolating effects in Neanderthals,
especially given the challenges with predicting complex phenotypes in
present-day humans on the basis of genomic data.
In
addition to the introgressed haplotypes contributing to skin and hair
traits, we also found two archaic haplotypes that contribute
significantly to differences in sleep patterns (Table 1). One of the introgressed SNPs modifies the coding sequence of ASB1 (MIM: 605758; rs3191996, p.Ser37Lys; Material and Methods). Archaic alleles near ASB1 (tag aSNP: rs75804782; Figure 2 and Table 1) and EXOC6 (MIM: 609672; tag aSNP rs71550011; Table 1)
are associated with a preference for being an “evening person” and an
increased tendency for daytime napping and narcolepsy, respectively.
Humans show wide variation in diurnal preferences and can be divided
into “chronotypes,” which have been shown to have a genetic component.
Two previous studies of chronotypes identified strongly associated SNPs in the ASB1 region.
Of the 540 SNPs with significant genome-wide associations in Hu et al.
(p < 1.0 × 10⁻⁸), ten overlapped the region identified near ASB1, and four of these were labeled as introgressed archaic variants. Lane et al. identified two ASB1-adjacent SNPs that showed significant associations with chronotype.
Neither of these is of archaic origin, but they are in high LD with aSNPs on the associated haplotype (maximum r² =
0.73, based on Europeans in 1000 Genomes phase 3), suggesting that
these are not independent signals. Given the association scores
calculated by Hu et al.,
the association is stronger for the set of aSNPs (p values ranging from 3.4 × 10⁻⁶ to 2.6 × 10⁻⁹; rs75804782 has the second-most-significant association at p = 4.4 × 10⁻⁹) than for the non-archaic SNPs reported by Lane et al.
(rs3769118, p = 1.9 × 10⁻⁶; rs11895698, p = 3.2 × 10⁻⁶),
suggesting that the association is likely to be driven by the
introgressed archaic haplotype. Because the natural length of day-night
cycles differs according to latitude and influences circadian rhythms,
we tested for a correlation between the Neanderthal allele frequency at ASB1 and latitude in worldwide non-African populations.
We found a significant correlation between the frequency of the Neanderthal allele near ASB1
(rs75804782) and latitude (Spearman’s rho = 0.21, p = 0.03). The fact
that populations further from the equator have higher frequencies of the
Neanderthal allele at ASB1 than populations nearer the equator (Figure 2B) is consistent with the influence of daylight exposure on circadian rhythm,
although the functional link between these genes and chronotype traits is unclear.
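A rank correlation of this kind can be computed as sketched below. The code is a generic standard-library illustration of Spearman’s rho with average ranks for ties; it returns only the coefficient, not a p value. The function names are illustrative, and the inputs would be one latitude and one allele frequency per population.

```python
import math


def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```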
Given
the large number of associations with skin and hair traits, it is
tempting to speculate that Neanderthals might have had an outsized
contribution to these phenotypes. However, the number of significant
associations that can be identified for a trait is dependent on how
polygenic the traits are and how they are measured. Power to measure the
contribution of an allele depends also on the minor allele frequency.
In the case of archaic alleles, which are generally less frequent
(∼1%–5%), this is of particular relevance. We therefore tested whether
the impact of archaic alleles on particular traits is more or less than
that of non-archaic alleles by comparing the contributions of archaic
alleles with the contributions of 1,000 similarly sized sets of
frequency-matched non-archaic tag SNPs. Phenotypes with an enrichment of
low association p values for archaic alleles could indicate a
larger-than-expected contribution of introgressed archaic DNA to these
phenotypes, whereas an enrichment of low p values for non-archaic
alleles suggests a lower contribution from archaic alleles to the
phenotype. We note that our frequency matching of archaic and
non-archaic alleles does not account for multiple other factors that
might differ between these two sets of variants. For example, the longer
haplotypes associated with archaic introgression mean that archaic
variants might be more likely to occur together. However, it is unclear
whether the higher number of archaic alleles on archaic haplotypes would
increase or decrease the chance of being significantly associated with
phenotypes in modern humans. We believe that further matching of, for
example, haplotype length or number of SNPs of a haplotype introduces
new potential biases and does not solve this problem. For each
phenotype, we selected the lower tail of the p value distributions (p
< 1.0 × 10⁻⁴) for archaic and non-archaic SNPs and then
tested whether the archaic p value distribution was significantly
different from 1,000 non-archaic distributions (Material and Methods).
For the majority of phenotypes (130/136), we found no difference
between the relative contribution of archaic alleles and that of
non-archaic alleles, indicating that for most phenotypes measured here,
Neanderthal alleles contribute phenotypic variation proportionally to
non-archaic SNPs at similar frequencies (Table S3).
We detected six phenotypes where there was a significant difference
between the p value distributions for archaic alleles and those for
non-archaic alleles (FDR < 0.05). Neanderthal alleles contributed
more variation in four behavioral phenotypes influencing sleep, mood,
and smoking behaviors, suggesting that Neanderthal alleles contribute
more to these traits than expected from their frequency in modern
humans. Conversely, for two associations (ease of skin tanning and pork
intake), non-archaic alleles showed lower association p values (Table S3), indicating that introgressed Neanderthal alleles contribute less than frequency-matched non-archaic alleles to these traits.
Discussion
Largely
on the basis of disease cohorts and signatures of positive selection, a
number of immune, skin, metabolic, and behavioral phenotypes have been
suggested to be influenced by archaic ancestry. Using the UK Biobank
cohort, we have now been able to test the contribution of introgressed
Neanderthal alleles to 136 common, largely non-disease phenotypes in
present-day Europeans. We found that skin and hair traits are
over-represented among the most significant associations with archaic
alleles. However, when we compared the contribution of alleles of
Neanderthal origin with the contributions of alleles of modern human
origin, we found that both archaic and non-archaic variants contribute
equally to skin and hair phenotypes, consistent with a neutral
contribution from Neanderthals and with the idea that Neanderthals
themselves were likely to be variable with respect to these traits. In
fact, for most associations, Neanderthal variants do not seem to
contribute more than non-archaic variants. However, there are four
phenotypes, all behavioral, to which Neanderthal alleles contribute more
phenotypic variation than non-archaic alleles: chronotype, loneliness
or isolation, frequency of unenthusiasm or disinterest in the last
2 weeks, and smoking status. Of these, the significant association
between a Neanderthal variant in ASB1 and preference for
evening activity also shows a correlation between the Neanderthal allele
frequency and latitude, suggesting a link to differences in sunlight
exposure for this phenotype. Additionally, the phenotype of increased
frequencies of unenthusiasm or disinterest in the last 2 weeks was
significantly associated with an archaic haplotype (chr5:
29,936,068–29,974,930; nearest gene: CDH6 [MIM: 603007]),
and Neanderthal alleles also contributed more often to this phenotype
than non-archaic alleles. A number of the associations we detected, such
as dermatological traits, smoking, and mood disorders, overlap
associations found in previous studies.
Some of the psychiatric and metabolic phenotypes, such as obesity, identified in Simonti et al.
were not replicated in our study. We speculate that this might
partially reflect differences in the criteria for cohort selection;
individuals in the eMERGE cohort are already undergoing medical
treatment, whereas volunteers for the UK Biobank cohort are not.
Multiple phenotypes significantly influenced by Neanderthal introgression have
some link to sunlight exposure. Given that Neanderthals had inhabited
Eurasia for more than 200,000 years, they were most likely adapted to
lower UVB levels and wider variation in sunlight duration than the early
modern humans who arrived in Eurasia from Africa around 100,000 years
ago.
Skin and hair color, circadian rhythms, and mood are all influenced by
light exposure. We speculate that their identification in our analysis
suggests that sun exposure might have shaped Neanderthal phenotypes and
that gene flow into modern humans continues to contribute to variation
in these traits today.
Acknowledgments
This research was conducted with the UK Biobank Resource. We thank Aida
Andres, Hernan Burbano, Roger Mundry, Svante Pääbo, Martin Petr, Kay
Prüfer, David Reich, Sriram Sankararaman, Joshua Schmidt, and Benjamin
Vernot for useful discussions and the multimedia department of the Max
Planck Institute for Evolutionary Anthropology for help with figure
preparation. Financial support for this study was provided by the Max
Planck Society.
Supplemental Data
- Document S1. Figure S1 and Tables S1, S3, S5, and S6
- Table S2. Testing the Effect of Covariates on Association p Values
- Table S4. Archaic SNP Associations for All Tested Phenotypes
Web Resources
- 1000 Genomes, http://browser.1000genomes.org/index.html
- dbSNP, https://www.ncbi.nlm.nih.gov/projects/SNP/
- Ensembl Genome Browser, http://www.ensembl.org/index.html
- GTEx Portal, https://www.gtexportal.org/home/
- OMIM, http://www.omim.org/
- UK Biobank, http://www.ukbiobank.ac.uk
- UK Biobank genotyping and quality controls, https://biobank.ctsu.ox.ac.uk/crystal/docs/genotyping_qc.pdf
References
The genetic history of Ice Age Europe.
Nature. 2016; 534: 200-205
The genetic cost of Neanderthal introgression.
Genetics. 2016; 203: 881-891
The strength of selection against Neanderthal introgression.
PLoS Genet. 2016; 12: e1006340
The genomic landscape of Neanderthal ancestry in present-day humans.
Nature. 2014; 507: 354-357
Introgression of Neandertal- and Denisovan-like haplotypes contributes to adaptive variation in human Toll-like receptors.
Am. J. Hum. Genet. 2016; 98: 22-33
Archaic hominin admixture facilitated adaptation to out-of-Africa environments.
Curr. Biol. 2016; 26: 3375-3382
Neandertal origin of genetic variation at the cluster of OAS immunity genes.
Mol. Biol. Evol. 2013; 30: 798-801
Genetic adaptation and Neandertal admixture shaped the immune system of human populations.
Cell. 2016; 167: 643-656.e17
Evidence for archaic adaptive introgression in humans.
Nat. Rev. Genet. 2015; 16: 359-371
Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans.
Genome Biol. 2016; 17: 246
Resurrecting surviving Neandertal lineages from modern human genomes.
Science. 2014; 343: 1017-1021
The phenotypic legacy of admixture between modern humans and Neandertals.
Science. 2016; 351: 737-741
UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.
PLoS Med. 2015; 12: e1001779
A global reference for human genetic variation.
Nature. 2015; 526: 68-74
The complete genome sequence of a Neanderthal from the Altai Mountains.
Nature. 2014; 505: 43-49
PLINK: a tool set for whole-genome association and population-based linkage analyses.
Am. J. Hum. Genet. 2007; 81: 559-575
The Ensembl Variant Effect Predictor.
Genome Biol. 2016; 17: 122
Functional implications of Neandertal introgression in modern humans.
Genome Biol. 2017; 18: 61
Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA.
Nature. 2014; 512: 194-197
A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci.
Nat. Genet. 2013; 45: 907-911
Melanocortin-1 receptor gene variants determine the risk of nonmelanoma skin cancer independently of fair skin and red hair.
Am. J. Hum. Genet. 2001; 68: 884-894
Characterization of melanocyte stimulating hormone receptor variant alleles in twins with red hair.
Hum. Mol. Genet. 1997; 6: 1891-1897
Pleiotropic effects of the melanocortin 1 receptor (MC1R) gene on human pigmentation.
Hum. Mol. Genet. 2000; 9: 2531-2537
Evidence for variable selective pressures at MC1R.
Am. J. Hum. Genet. 2000; 66: 1351-1361
Human pigmentation genetics: the difference is only skin deep.
BioEssays. 1998; 20: 712-721
Human pigmentation genes: identification, structure and consequences of polymorphic variation.
Gene. 2001; 277: 49-62
Variants of the melanocyte-stimulating hormone receptor gene are associated with red hair and fair skin in humans.
Nat. Genet. 1995; 11: 328-330
The Asp84Glu variant of the melanocortin 1 receptor (MC1R) is associated with melanoma.
Hum. Mol. Genet. 1996; 5: 1663-1666
A melanocortin 1 receptor allele suggests varying pigmentation among Neanderthals.
Science. 2007; 318: 1453-1455
Comprehensive candidate gene study highlights UGT1A and BNC2 as new genes determining continuous skin color variation in Europeans.
Hum. Genet. 2013; 132: 147-158
Detection of human adaptation during the past 2000 years.
Science. 2016; 354: 760-764
Pitfalls of predicting complex traits from SNPs.
Nat. Rev. Genet. 2013; 14: 507-515
Life between clocks: daily temporal patterns of human chronotypes.
J. Biol. Rhythms. 2003; 18: 80-90
GWAS of 89,283 individuals identifies genetic variants associated with self-reporting of being a morning person.
Nat. Commun. 2016; 7: 10448
Genome-wide association analysis identifies novel loci for chronotype in 100,420 individuals from the UK Biobank.
Nat. Commun. 2016; 7: 10889
The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.
Nature. 2016; 538: 201-206
Circadian typology: a comprehensive review.
Chronobiol. Int. 2012; 29: 1153-1175
Colloquium paper: human skin pigmentation as an adaptation to UV radiation.
Proc. Natl. Acad. Sci. USA. 2010; 107: 8962-8968
Article Info
Publication History
Published: October 5, 2017
Accepted: September 5, 2017
Received: June 8, 2017
Identification
DOI: 10.1016/j.ajhg.2017.09.010
Copyright © 2017 The Authors.
User License
Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)
Figures
- Figure 1. Archaic Haplotypes Associated with Skin and Hair Phenotypes
- Figure 2. Archaic Haplotype Associated with Chronotype