Soon after the recent set of ENCODE papers came out, several scientists raised concerns about the authors' estimate of the fraction of the genome that appears to be functional: according to ENCODE, ~80% of the human genome is functional.
This, of course, differs greatly from what most of us think, considering, among other things, that the fraction of the genome that is evolutionarily conserved through purifying selection appears to be under 10% (what about the rest? We think it is divided between junk DNA and some “unknowns”).
The problem arose mainly from the definition of “functional” that ENCODE used, one so loose that it may not be useful at all.
In fact, “according to ENCODE, for a DNA segment to be ascribed functionality it needs to (1) be transcribed or (2) associated with a modified histone or (3) located in an open-chromatin area or (4) to bind a transcription factor or (5) to contain a methylated CpG dinucleotide” (Graur et al., 2013). You would agree that these criteria are very lenient; hence the 80% estimate.
A recent paper discusses the ENCODE paper ruthlessly and takes great issue with the “80%” estimate. The authors “detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome”. The manuscript reviewers could have suggested that the authors tone it down a little, but from what I have found on the web, evolutionary biologists tend to express their opinions very strongly in print when discussing work they disagree with.
I encourage you to read the article, which is freely available. In the meantime, here are a few quotes:
The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
ENCODE adopted a strong version of the causal role definition of function, according to which a functional element is a discrete genome segment that produces a protein or an RNA or displays a reproducible biochemical signature (for example, protein binding). Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently
We identified three main statistical infractions. ENCODE used methodologies encouraging biased errors in favor of inflating estimates of functionality, it consistently and excessively favored sensitivity over specificity, and it paid unwarranted attention to statistical significance, rather than to the magnitude of the effect.
At this point, we must ask ourselves, what is the aim of ENCODE: Is it to identify every possible functional element at the expense of increasing the number of elements that are falsely identified as functional? Or is it to create a list of functional elements that is as free of false positives as possible?
Comparative studies have repeatedly shown that pseudogenes, which have been so defined because they lack coding potential due to the presence of disruptive mutations, evolve very rapidly and are mostly subject to no functional constraint (Pei et al. 2012). Hence, regardless of their transcriptional or translational status, pseudogenes are nonfunctional!
For example, according to ENCODE, the putative function of the H4K20me1 modification is “preference for 5’ end of genes.” This is akin to asserting that the function of the White House is to occupy the lot of land at the 1600 block of Pennsylvania Avenue in Washington, D.C.
So, what have we learned from the efforts of 442 researchers consuming 288 million dollars? According to Eric Lander, a Human Genome Project luminary, ENCODE is the “Google Maps of the human genome” (Durbin et al. 2010). We beg to differ, ENCODE is considerably worse than even Apple Maps.
Evolutionary conservation may be frustratingly silent on the nature of the functions it highlights, but progress in understanding the functional significance of DNA sequences can only be achieved by not ignoring evolutionary principles.
High-throughput genomics and the centralization of science funding have enabled Big Science to generate “high-impact false positives” by the truckload (The PLoS Medicine Editors 2005; Platt et al. 2010; Anonymous 2012; MacArthur 2012; Moyer 2012). Those involved in Big Science will do well to remember the depressingly true popular maxim: “If it is too good to be true, it is too good to be true.”
We conclude that the ENCODE Consortium has, so far, failed to provide a compelling reason to abandon the prevailing understanding among evolutionary biologists according to which most of the human genome is devoid of function
(…) according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 – 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means (…)
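To make the sensitivity-versus-specificity complaint in the quotes above a bit more concrete, here is a minimal Python sketch of the arithmetic. It is my own illustration, not anything taken from the paper or from ENCODE. The function names are hypothetical, the 10% "truly functional" figure is simply the conservation-based upper bound mentioned earlier, and the sensitivity and specificity values are invented so that the numbers come out near 80%.

```python
def apparent_functional_fraction(true_fraction, sensitivity, specificity):
    """Fraction of the genome that a noisy assay would label "functional".

    true_fraction: share of the genome that is really functional (assumed).
    sensitivity:   probability that a functional base is flagged.
    specificity:   probability that a nonfunctional base is NOT flagged.
    """
    true_positives = sensitivity * true_fraction
    false_positives = (1.0 - specificity) * (1.0 - true_fraction)
    return true_positives + false_positives


def false_discovery_rate(true_fraction, sensitivity, specificity):
    """Share of the flagged bases that are actually nonfunctional."""
    flagged = apparent_functional_fraction(true_fraction, sensitivity, specificity)
    if flagged == 0.0:
        return 0.0
    false_positives = (1.0 - specificity) * (1.0 - true_fraction)
    return false_positives / flagged


if __name__ == "__main__":
    # Illustrative assumptions only: 10% truly functional, perfect
    # sensitivity, and a made-up, very low specificity of 0.22.
    flagged = apparent_functional_fraction(0.10, 1.0, 0.22)
    fdr = false_discovery_rate(0.10, 1.0, 0.22)
    print(f"flagged as functional: {flagged:.1%}")
    print(f"of which false positives: {fdr:.1%}")
```

Under these made-up numbers, an assay that never misses a functional base but also flags most nonfunctional ones would label about 80% of the genome "functional" even though only 10% is, and 70 of those 80 percentage points would be false positives, the same 80 – 10 = 70% gap the authors point to.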