Wednesday 27 October 2021

EPIGENETICS

 In an earlier post “From DNA to chromosomes: connecting the dots”, I explained the terminology used in genetics including gene expression and DNA mutations. I stated that genetic mutations are irreversible which is why they are transmitted from the parents to the offspring from one generation to the next.


Genes play a very important role in our health. Genetic mutations inherited from our parents can predispose us to certain conditions such as heart disease, diabetes or breast cancer.


It is not only inherited genetic mutations that play a role in our health. Our behaviours and environment, such as what we eat and how active we are, can also influence our health by affecting our genes. EPIGENETICS is the study of how our behaviours and environment affect the way our genes work. The prefix epi- in epigenetics comes from Greek and implies features that are "on top of" or "in addition to" changes in the genetic sequence.


Unlike genetic mutations, which are irreversible, epigenetic alterations are reversible because the DNA sequence itself does not change. Genetic mutations alter the instructions that determine which protein is produced and when, whereas epigenetic changes act as a switch that turns genes “on” or “off”.


Even though epigenetic changes are reversible, they can be inherited. An example of epigenetic inheritance, discovered about 10 years ago in mammals, is parental imprinting. In parental imprinting, certain autosomal genes have seemingly unusual inheritance patterns. For example, the mouse Igf2 gene is expressed in a mouse only if it was inherited from the mouse's father (https://www.ncbi.nlm.nih.gov/books/NBK21276/). It has been suggested that certain foods can turn a gene “on” or “off”, implying that they can greatly influence our health and well-being.


Examples of epigenetics are:

  • DNA methylation. A methyl (-CH3) group attaches to the DNA like a cap and prevents gene expression. In other words, the gene is switched off and cannot produce the protein it is programmed to make. If demethylation occurs, i.e. removal of the methyl group, the gene is turned on again.
  • Histone modification. Histones are important because very long DNA molecules wrap around histone proteins, giving chromosomes a compact structure so that they can fit in the cell’s nucleus; histones also regulate gene expression. In this case of epigenetics, DNA wraps around histones so tightly that the genes cannot be accessed by the proteins that read and copy them. The genes are effectively turned off. Chemical groups can be added to or removed from histones, which can turn the genes on again.
  • Non-coding RNA. We have seen in my earlier post that DNA produces coding RNA that then leaves the nucleus and is used to make proteins. DNA also produces non-coding RNA that attaches itself to coding RNA along with certain proteins and helps to control gene expression. It can, however, also break down the coding RNA and prevent it from making proteins. It can also instruct proteins to modify histones to switch genes “on” or “off”.

According to the National Center for Biotechnology Information (NCBI), lifestyle factors such as diet, obesity, physical activity, tobacco smoking, alcohol consumption, environmental pollutants, psychological stress and working on night shifts have been found to modify epigenetic patterns. Other scientific articles link epigenetic mechanisms to various diseases including cancers of almost all types, cognitive dysfunction, and respiratory, cardiovascular, reproductive, autoimmune, and neurobehavioral illnesses. Known or suspected drivers behind epigenetic processes include heavy metals, pesticides, diesel exhaust, tobacco smoke, polycyclic aromatic hydrocarbons, hormones, radioactivity, viruses, bacteria, and basic nutrients (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1392256/).


Studies of identical twins who have been reared apart show the effects that epigenetics can have on the siblings (https://www.theatlantic.com/science/archive/2018/05/twin-epigenetics/560189/).


Even though genetics has been the major focus of scientists up to now, epigenetics will increasingly dominate scientific interest in the future.


Monday 4 October 2021

Pre-conversion Hindu names of Aldona's gaunkars

Any interested reader can find the pre-conversion Hindu names of Aldona's gaunkars in ANNEX VIII: Gaunkar name-changes of my book The Last Prabhu, second revised edition, that can be ordered from amazon.com in paperback and eBook (Kindle) formats. The paperback is also available from Book Depository. Readers in India can order the paperback version from Book Depository as well as pothi.com

HOW HUMANS LOST THEIR TAILS: A SINGLE MUTATION DID THE TRICK

Have you ever wondered why hominoids lost their tails? Charles Darwin attributed it to evolutionary changes when hominoids no longer needed them for their survival.

Scientists have now discovered that "Tail-loss evolution in hominoids may have been initiated by the AluY insertion, additional genetic changes may have then acted to stabilize the no-tail phenotype in early hominoids" so that the tails did not return.

The not yet peer-reviewed publication by Bo Xia et al. is copied in extenso below.

bioRxiv preprint doi: https://doi.org/10.1101/2021.09.14.460388; this version posted September 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

The genetic basis of tail-loss evolution in humans and apes

Bo Xia1,2*, Weimin Zhang2, Aleksandra Wudzinska2, Emily Huang2, Ran Brosh2, Maayan Pour1, Alexander Miller4, Jeremy S. Dasen4, Matthew T. Maurano2, Sang Y. Kim5, Jef D. Boeke2,3,6* and Itai Yanai1,3

1 Institute for Computational Medicine, NYU Langone Health, New York, NY 10016, USA
2 Institute for Systems Genetics, NYU Langone Health, New York, NY 10016, USA
3 Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
4 Department of Neuroscience and Physiology, NYU Langone Health, New York, NY 10016, USA
5 Department of Pathology, NYU Langone Health, New York, NY 10016, USA
6 Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY, 11201, USA

* Correspondence: Bo.Xia@nyulangone.edu; Jef.Boeke@nyulangone.org; Itai.Yanai@nyulangone.org


The loss of the tail is one of the main anatomical evolutionary changes to have occurred along the lineage leading to humans and to the “anthropomorphous apes”1,2. This morphological reprogramming in the ancestral hominoids has been long considered to have accommodated a characteristic style of locomotion and contributed to the evolution of bipedalism in humans3–5. Yet, the precise genetic mechanism that facilitated tail-loss evolution in hominoids remains unknown. Primate genome sequencing projects have made possible the identification of causal links between genotypic and phenotypic changes6–8, and enable the search for hominoid-specific genetic elements controlling tail development9. Here, we present evidence that tail-loss evolution was mediated by the insertion of an individual Alu element into the genome of the hominoid ancestor. We demonstrate that this Alu element – inserted into an intron of the TBXT gene (also called T or Brachyury10–12) – pairs with a neighboring ancestral Alu element encoded in the reverse genomic orientation and leads to a hominoid-specific alternative splicing event. To study the effect of this splicing event, we generated a mouse model that mimics the expression of human TBXT products by expressing both full-length and exon-skipped isoforms of the mouse TBXT ortholog. We found that mice with this genotype exhibit the complete absence of a tail or a shortened tail, supporting the notion that the exon-skipped transcript is sufficient to induce a tail-loss phenotype, albeit with incomplete penetrance. We further noted that mice homozygous for the exon-skipped isoforms exhibited embryonic spinal cord malformations, resembling a neural tube defect condition, which affects ~1/1000 human neonates13. We propose that selection for the loss of the tail along the hominoid lineage was associated with an adaptive cost of potential neural tube defects and that this ancient evolutionary trade-off may thus continue to affect human health today.


The tail appendage varies widely in its morphology and function across vertebrate species4,5. For primates in particular, the tail is adapted to a range of environments, with implications for the animal’s style of locomotion14,15. The New World howler monkeys, for example, evolved a prehensile tail that helps the animal to grasp or hold objects while occupying arboreal habitats16. Hominoids – which include humans and the apes – however, are distinct among the primates in their loss of an external tail (Fig. 1a). The loss of the tail is inferred to have occurred ~25 million years ago when the hominoid lineage diverged from the ancient Old World monkeys (Fig. 1a), leaving only 3-4 caudal vertebrae to form the coccyx, or tailbone, in modern humans17. It has long been speculated that tail loss in hominoids has contributed to bipedal locomotion, whose evolutionary occurrence coincided with the loss of tail18–20. Recent progress in developmental biology has led to the elucidation of the gene regulatory networks that underlie tail development9,21. Specifically, the absence of the tail phenotype in the Mouse Genome Informatics has so far recorded 31 genes from the study of mutants and naturally occurring variants21,22 (Supp. Table 1). Expression of these genes is enriched in the development of the primitive streak and posterior body formation, including the core gene regulation network for inducing the mesoderm and definitive endoderm such as Tbxt, Wnt3a, and Msgn1. While these genes and their relationships have been studied, the exact genetic changes that drove the evolution of tail-loss in hominoids remain unknown, preventing an understanding of how tail loss affected other human evolutionary events, such as bipedalism.

Results

A hominoid-specific intronic AluY element in TBXT
We screened through the 31 human genes – and their primate orthologs – involved in tail development, with the goal of identifying a genetic variation associated with the loss of the tail in hominoids (Supp. Table 1). We first examined protein sequence conservation between the hominoid genomes and its closest sister lineage, the Old World monkeys (Cercopithecidae). However, we failed to detect candidate variants in hominoid coding sequences that might provide a genetic mechanism for tail-loss evolution (Supp. File 1). We next queried for hominoid-specific genomic rearrangements in the non-coding regions of genes related to tail development. Surprisingly, we found a hominoid-specific Alu element inserted in intron 6 of TBXT (Fig. 1b and Supp. File 1)10,11. TBXT codes a highly-conserved transcription factor critical for mesoderm and definitive endoderm formation during embryonic development12,23–25. Heterozygous mutations in the coding regions of the TBXT orthologs in tailed animals, such as mouse10, Manx cat26, dog27 and zebrafish28, lead to the absence or reduced form of the tail, and homozygous mutants are typically not viable. Moreover, this particular Alu insertion is from the AluY subfamily, a relatively ‘young’ but not human-specific subfamily shared between the genomes of hominoids and Old World monkeys, the activity of which coincides with the evolutionary time when early hominoids lost their tails29.

The AluY element in TBXT is not inserted in the vicinity of a splice site; rather, it is >500 bp from exon 6 of TBXT, the nearest coding exon. As such, it would not be expected to lead to an alternative splicing event, as found for other intronic Alu elements affecting splicing30–32. However, we noted the presence of another Alu element (AluSx1) in the reverse orientation in intron 5 of TBXT, that is conserved in all simians. Together, the AluY and AluSx1 elements form an inverted repeat pair (Fig. 1b). We thus posited that upon transcription, the simian-specific AluSx1 element pairs with the hominoid-specific AluY element, forming a stem-loop structure in the TBXT pre-mRNA and trapping exon 6 in the loop (Fig. 1c). An inferred RNA secondary structure model supported an interaction between these two Alu elements33 (Fig. S1). The secondary structure of the transcript may thus conjoin the splice donor and receptor of exons 5 and 7, respectively, and promote the skipping of exon 6, leading to a hominoid-specific and in-frame alternative splicing isoform, TBXT-Δexon6 (Fig. 1c). Indeed, we validated the existence of TBXT-Δexon6 transcripts in human and its corresponding absence in mouse, which lacks the Alu elements, using an embryonic stem cells (ESCs) in vitro differentiation system that induces TBXT expression similar to that present in the primitive streak of the embryo (Fig. S2)34,35. Considering the high conservation of TBXT exon 6 and its potential transcriptional regulation function (Fig. S3), we thus hypothesized that in humans and apes, the TBXT-Δexon6 isoform protein disrupts tail elongation during embryonic development, leading to the reduction or loss of an external tail (Fig. 1c).
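As a rough illustration of the pairing idea described above, the following Python sketch checks whether two intronic segments are reverse complements of each other, which is the property that lets an inverted repeat pair form a stem and loop out whatever lies between them. The sequences are made-up toy strings, not real AluSx1 or AluY sequences, and the exact-match test is a deliberate simplification of my own; the preprint relied on an RNA secondary structure model (Fig. S1), not on this check.

```python
# Toy illustration of an inverted repeat pair flanking an exon.
# All sequences are hypothetical placeholders, not real TBXT or Alu sequences.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def can_pair(upstream: str, downstream: str) -> bool:
    """True if the downstream segment is the exact reverse complement of the
    upstream one, i.e. the two could in principle form a perfect stem."""
    return downstream == reverse_complement(upstream)


# Hypothetical layout: intron-5 repeat, exon 6, intron-6 repeat.
alu_sx1_like = "GGCUCACGCC"
exon6_toy = "AUGGCCAAG"
alu_y_like = reverse_complement(alu_sx1_like)  # inverted copy of the first repeat

pre_mrna = alu_sx1_like + exon6_toy + alu_y_like

if can_pair(alu_sx1_like, alu_y_like):
    # The two repeats can base-pair, leaving the exon in the loop between them.
    looped_out = pre_mrna[len(alu_sx1_like):-len(alu_y_like)]
    print("segment trapped in the loop:", looped_out)
```

Real Alu elements are about 300 bp long and pair imperfectly, so a realistic analysis would score partial complementarity rather than demand an exact match.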

Fig. 1 | Evolution of tail loss in hominoids. a, Tail phenotypes across the primate phylogenetic tree. b, UCSC Genome browser view of the conservation score through multi-species alignment at the TBXT locus across primate genomes36. The hominoid-specific AluY element is labelled in red. c, Schematic of the hypothesized mechanism of tail-loss evolution in hominoids.


AluY insertion in TBXT induces alternative splicing, and requires interaction with AluSx1
To test whether both AluY and AluSx1 are required to induce the hominoid-specific alternatively spliced isoform of TBXT, we used CRISPR/Cas9 in human ESCs to individually delete the hominoid-specific AluY element and – in a separate line – its potentially interacting counterpart AluSx1 (Fig. 2a, S4a). Again, we adapted the hESC in vitro differentiation system to mimic the TBXT expression in the embryo (Fig. S2)34. We found that deleting the AluY almost completely eliminated the generation of the TBXT-Δexon6 isoform transcript (Fig. 2b, middle). Similarly, deleting the interacting partner, AluSx1, sufficed to repress this alternatively spliced isoform (Fig. 2b, right). These results support the notion that the hominoid-specific AluY insertion induces a novel TBXT-Δexon6 AS isoform, through an interaction with the neighboring AluSx1 element (Fig. 2c, top).

Interestingly, we found that wild-type differentiated hESCs also express a minor, previously un-annotated transcript that excludes both exon 6 and exon 7, leading to a frameshift and early truncation at the protein level (Fig. 2b, left, and S4b). Whereas deleting AluY slightly enhanced the abundance of this TBXT-Δexon6&7 transcript, deleting AluSx1 in intron 5 completely eliminated this transcript (Fig. 2b). This may be best explained by a secondary interaction of the AluSx1 element with a distal AluSq2 element in intron 7. In this scenario, the secondary interaction would occur at a lower probability than the AluY-AluSx1 interaction pair (Fig. 2c, bottom). These results further support an interaction among intronic transposable elements affecting splicing of the conserved TBXT regulator (Fig. 2c, S4b).
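The difference between an in-frame exon skip and a frameshift, as described above for the Δexon6 and Δexon6&7 transcripts, comes down to whether the number of removed nucleotides is a multiple of three. The small sketch below shows that arithmetic. The exon lengths are hypothetical placeholders chosen only to reproduce the two outcomes; they are not the real TBXT exon lengths, which the text does not give.

```python
def skipping_keeps_frame(exon_lengths, skipped):
    """Return True if removing the listed exons leaves the downstream
    reading frame unchanged (removed length is a multiple of 3)."""
    removed = sum(exon_lengths[exon] for exon in skipped)
    return removed % 3 == 0


# Hypothetical exon lengths in nucleotides (placeholders, NOT real TBXT values).
toy_exon_lengths = {6: 150, 7: 100}

# Skipping exon 6 alone removes 150 nt: the frame is preserved.
exon6_only = skipping_keeps_frame(toy_exon_lengths, [6])

# Skipping exons 6 and 7 removes 250 nt: the frame shifts.
exon6_and_7 = skipping_keeps_frame(toy_exon_lengths, [6, 7])

print("skip exon 6 in frame:", exon6_only)
print("skip exons 6 and 7 in frame:", exon6_and_7)
```

A frameshift usually brings a premature stop codon into frame, which is consistent with the early truncation the authors report for the minor transcript.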


Fig. 2 | Both AluY and AluSx1 are required for TBXT alternative splicing. a, CRISPR-generated homozygous knock-outs of the AluY element in intron 6 and (in a separate line) AluSx1 element in intron 5 of TBXT. b, RT-PCR results of TBXT transcripts isolated from differentiated hESCs of wild-type, ΔAluY, and ΔAluSx1 genotypes. Each genotype (ΔAluY and ΔAluSx1) was analyzed by two independent replicate clones. c, A schematic of Alu interactions and the corresponding TBXT transcripts in human, indicating that an AluY-AluSx1 interaction leads to the TBXT-Δexon6 transcript. The TBXT-Δexon6&7 transcript may stem from an AluSx1-AluSq2 interaction.


Tbxt-Δexon6 is sufficient to induce tail loss in mice
To test whether the TBXT-Δexon6 isoform is sufficient to induce tail loss, we generated a heterozygous mouse TbxtΔexon6/+ model (Fig. 3a). TBXT is highly conserved in vertebrates and human and mouse protein sequences share 91% identity with a similar exon/intron architecture11. We could thus simulate a TBXT-Δexon6 isoform by deleting exon 6 in mice, forcing splicing of exon 5 with exon 7. The TbxtΔexon6/+ heterozygous mouse thus mimics the TBXT gene in human, which expresses both full-length and Δexon6 splice isoforms (Fig. 2b and 3b-c).

Studying the phenotypes of the TbxtΔexon6/+ mice, we found that simultaneous expression of both isoforms led to strong but heterogeneous tail morphologies, including no-tail and short-tail phenotypes (Fig. 3d-e, S5). Specifically, 21 of the 63 heterozygous mice showed tail phenotypes, while none of their 35 wild-type littermates showed phenotypes (Table 1). The incomplete penetrance of phenotypes among the heterozygotes was stable across generations and founder lines: no-/short-tailed (TbxtΔexon6/+) parents can give birth to long-tailed TbxtΔexon6/+ mice, whereas long-tailed (TbxtΔexon6/+) parents can give birth to pups with varied tail phenotypes (Table 1, Fig. S5), providing further evidence that the presence of TBXT-Δexon6 suffices to induce tail loss.

To control for the possibility that zygotic CRISPR targeting induced off-targeting DNA changes at the Tbxt locus, we performed Capture-seq covering the Tbxt locus and ~200kb of both upstream and downstream flanking regions37 (Fig. S6). Capture-seq did not detect any off-targeting at the Tbxt locus across three independent founder mice, supporting our conclusion that the observed tail phenotype from the TbxtΔexon6/+ mice derived from the Tbxt-Δexon6 isoform.


Table 1. Genotype and phenotype analyses of the TbxtΔexon6/+ intercrossed F2 pups.

Genotype             Total F2 pups   Pups with tail phenotypes   No-tail   Short-tail   Kinked-tail   Intercross (type 1)*   Intercross (type 2)
Tbxt∆exon6/∆exon6    0               0                           0         0            0             0                      0
Tbxt∆exon6/+         63              21                          4         9            8             17(7)**                46(14)
Tbxt+/+              35              0                           0         0            0             7(0)                   28(0)

Note: *: Type 1 intercrossing: at least one of the parent mice is no-/short-tailed. Type 2 intercrossing: both parent mice are long-tailed.

**: Numbers in parentheses indicate the number of pups with tail phenotypes.
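The counts in Table 1 lend themselves to a quick arithmetic check. The sketch below, which is my own illustration and not an analysis from the preprint, computes the penetrance among heterozygotes and compares the observed heterozygote to wild-type ratio with the 2:1 ratio that standard Mendelian reasoning would predict for a heterozygous intercross in which no homozygous mutants survive. The 2:1 expectation and the chi-square comparison are assumptions I am adding for illustration.

```python
# Counts copied from Table 1.
het_total, het_affected = 63, 21
wt_total, wt_affected = 35, 0
het_by_phenotype = {"no-tail": 4, "short-tail": 9, "kinked-tail": 8}
# (total heterozygous pups, pups with tail phenotypes) per intercross type
het_by_cross = {"type 1": (17, 7), "type 2": (46, 14)}

# Internal consistency of the table.
phenotype_sum = sum(het_by_phenotype.values())
cross_total_sum = sum(total for total, _ in het_by_cross.values())
cross_affected_sum = sum(affected for _, affected in het_by_cross.values())

# Penetrance: fraction of heterozygotes that show any tail phenotype.
penetrance = het_affected / het_total
penetrance_by_cross = {
    name: affected / total for name, (total, affected) in het_by_cross.items()
}

# Assumed expectation: het x het cross with lethal homozygotes gives
# heterozygotes : wild-type = 2 : 1 among viable pups.
viable = het_total + wt_total
expected_het = viable * 2 / 3
expected_wt = viable / 3
chi2 = ((het_total - expected_het) ** 2 / expected_het
        + (wt_total - expected_wt) ** 2 / expected_wt)

print(f"penetrance in heterozygotes: {penetrance:.3f}")
print(f"penetrance by intercross type: {penetrance_by_cross}")
print(f"chi-square against 2:1 (1 degree of freedom): {chi2:.2f}")
```

With one degree of freedom the 5% critical value is 3.841, so a statistic well below that would mean the observed 63:35 split gives no reason to doubt a 2:1 ratio.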

Homozygous removal of Tbxt-Δexon6 is lethal
The human TBXT gene expresses a mixture of TBXT-Δexon6 and TBXT-full length transcripts – induced as we inferred by the AluY insertion and interaction with AluSx1 – while mouse Tbxt only expresses the full length Tbxt. Thus, we next inquired into the mode by which homozygous TBXT-Δexon6 mutation (TbxtΔexon6/Δexon6) affects development. Intercrossing the TbxtΔexon6/+ mice across multiple litters and replicated in different founders, we failed to produce viable homozygotes (Table 1). Dissecting intercrossed embryos at E11.5 showed that homozygotes either arrested development at ~E9 or developed with spinal cord malformations that consequently led to death at birth (Fig. S7). We noted that the TbxtΔexon6/Δexon6 embryos showed malformations of the spinal cord similar to spina bifida in humans. Together, while the incomplete penetrance of the tail phenotypes in the TbxtΔexon6/+ mice requires further investigation, these results indicate that the TBXT-Δexon6 isoform, which in human is induced by the intronic AluY-AluSx1 interaction, may indeed be the key driver of tail-loss evolution in hominoids.


Fig. 3 | TBXT-Δexon6 isoform is sufficient to induce tail-loss phenotype. a, CRISPR design for generating a TbxtΔexon6/+ heterozygous mouse model. b, TbxtΔexon6/+ mouse mimics TBXT gene expression products in human. c, Sanger-sequencing of the Tbxt RT-PCR product shows that deleting exon 6 in Tbxt leads to correct splicing by fusing exon 5 and 7. d, A representative TbxtΔexon6/+ founder mouse (day 1) showing an absence of the tail. Two additional founder mice are shown in Figure S5. e, TbxtΔexon6/+ heterozygous mice display heterogeneous tail phenotypes varied from absolute no-tail to long-tails. sv, sacral vertebrae; cv, caudal vertebrae.


Discussion

We have presented evidence that tail-loss evolution in hominoids was driven by the intronic insertion of an AluY element. As opposed to disrupting a splice site, we inferred that this element interacts with a neighboring (simian-shared) AluSx1 element in the neighboring intron, leading to an alternatively spliced isoform which skips an intervening exon (Fig. 1c). Experimental deletion of AluY or its interacting counterpart eliminates such TBXT alternative splicing in differentiated hESCs in the primitive streak state (Fig. 2).

Alternative splicing mediated by Alu-pairing in the TBXT gene demonstrates how an interaction between intronic transposable elements can dramatically modulate gene function to affect a complex trait. The human genome contains ~1.8 million copies of SINE transposons – including ~1 million Alus – of which more than 60% are intronic38. Systematically searching for such interactions may lead to the identification of additional functional roles by which these elements impact human development and disease. Interestingly, inverted Alu pairs have been found to contribute to the biogenesis of exonic circular RNAs (circRNA)39 through ‘backsplicing’. Thus, it is an interesting possibility that the interactions between paired transposable elements may create functional splice variants and circRNA isoforms from the same genetic locus.

We found that expressing the Tbxt-Δexon6 transcript – along with the full-length transcript – in mice was sufficient to induce no-tail phenotypes, though with incomplete penetrance (Fig. 3 and Table 1). It is possible that a heterogeneity of tail phenotypes also existed in the ancestral hominoids upon the initial AluY insertion. Thus, while tail-loss evolution in hominoids may have been initiated by the AluY insertion, additional genetic changes may have then acted to stabilize the no-tail phenotype in early hominoids (Fig. S8). Such a set of genetic events would explain how a change to the AluY in modern hominoids would not result in the re-appearance of the tail.


The specific evolutionary advantage for the loss of the tail is not clear, though it likely involved enhanced locomotion in a non-arboreal lifestyle. We can assume however that the selective advantage must have been very strong since the loss of the tail may have included an evolutionary trade-off of neural tube defects, as demonstrated by the presence of spinal cord malformations in the TbxtΔexon6/Δexon6 mutant at E11.5 (Fig. S7). Interestingly, mutations leading to neural tube defect and/or sacral agenesis have been detected in the coding and noncoding regions of the TBXT gene40–43. We thus speculate that the evolutionary tradeoff involving the loss of the tail – made ~25 million years ago – continues to influence health today. This evolutionary insight into a complex human disease may in the future lead to the design of therapeutic strategies.

Acknowledgments
We thank Naoya Yamaguchi, Eric Wang, John Shin, Susan Liao, Huiyuan Zhang, and the members of the Yanai and Boeke labs for constructive comments and suggestions. We thank Megan Hogan and Raven Luther for sequencing assistance, and Michael Ceriello and Ahmad Naimi for assistance with the mice work. This work was supported in part by the NHGRI RM1 HG009491 to J.B., and by the NYU Grossman School of Medicine with funding to I.Y. MTM is partially funded by NIH grant R35GM119703. B.X. was partially supported by the NYSTEM pre-doctoral fellowship (C322560GG).

Author contributions
B.X. conceived the project. B.X., J.D.B. and I.Y designed the experiments with contribution from W.Z. and S.Y.K. B.X. led and conducted most of the experimental and analysis components, with contribution from W.Z., A.W., R.B. and M.P. E.H., R.B. and M.T.M. contributed to the capture-seq validation. A.M. and J.S.D. helped with embryo analysis work. J.D.B. and I.Y. supervised the study. B.X. drafted the manuscript. B.X., J.D.B. and I.Y. edited the manuscript with contribution from all authors.

Competing interests
J.D.B is a Founder and Director of CDI Labs, Inc., a Founder of and consultant to Neochromosome, Inc, a Founder, SAB member of and consultant to ReOpen Diagnostics, LLC and serves or served on the Scientific Advisory Board of the following: Sangamo Inc., Modern Meadow Inc., Sample6 Inc., Tessera Therapeutics Inc. and the Wyss Institute. The other authors declare no competing interests.


Methods

Gene/protein sequence analysis
Protein sequences were downloaded from NCBI, and analyzed by the MUSCLE algorithm using MEGA X software with default settings44. Multiple species gene sequence alignment and analysis were done through the Ensembl Comparative Genomics (release 104) module45, using available species of hominoids (human, chimpanzee, bonobo, gorilla, orangutan and gibbon) and Old World monkeys (macaque, crab-eating macaque, pig-tailed macaque, olive baboon, drill, black snub-nosed monkey, golden snub-nosed monkey, Ma's night monkey). The identified candidate gene (TBXT) was then visualized through the UCSC genome browser36 (Figure 1), highlighting the multiple sequence alignment mapping to the human genome (hg38) sequences.

RNA secondary structure prediction
RNA secondary structure prediction of the TBXT intron5-exon6-intron6 sequence was performed using the ViennaRNA RNAfold Web Services (http://rna.tbi.univie.ac.at/). The algorithm calculates the folding probability using minimum free energy (MFE) matrix with default parameters. In addition, the calculation included the partition function and base pairing probability matrix.
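To give a feel for how dynamic-programming structure prediction works, here is a minimal sketch of the classic Nussinov base-pair maximization algorithm. This is a swapped-in teaching example and not what RNAfold does: RNAfold minimizes free energy with an experimentally derived energy model, whereas this sketch simply counts the largest possible number of nested base pairs. The test sequence and the minimum loop size of 3 are illustrative assumptions.

```python
def nussinov_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence
    (Nussinov algorithm). A pair (i, k) needs at least `min_loop`
    unpaired bases between its two partners."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            # case 2: base i pairs with some base k further downstream
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in allowed:
                    inside = dp[i + 1][k - 1]
                    outside = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + outside)
            dp[i][j] = best
    return dp[0][n - 1]


# Toy hairpin: three G-C pairs closing a four-base loop (illustrative only).
toy_hairpin = "GGGAAAUCCC"
print("maximum nested pairs:", nussinov_pairs(toy_hairpin))
```

The same recursion, with pair counts replaced by loop and stacking energies, underlies minimum free energy folding, and a related recursion over all structures gives the partition function mentioned above.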

20

22 Human ESCs (WA01, also called H1, from WiCell Research Institute) were cultured with StemFlex Medium (Gibco, Cat. No. A3349401) in a feeder cell-free condition. Cells were grown

24 on tissue culture-grade plates coated with hESC-qualified Geltrex (Gibco, Cat. No: A1413302). Geltrex was 1:100 diluted in DMEM/F-12 (Gibco, Cat. No. 11320033) supplemented with 1X

Gene/protein sequence analysis

Human ESCs culture and differentiation

14

bioRxiv preprint doi: https://doi.org/10.1101/2021.09.14.460388; this version posted September 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GlutaMax (100X, Gibco, Cat. No. 35050061) and 1% Penicillin-Streptomycin (Gibco, Cat. No. 15070063). Before seeding hESCs, the plate was treated with Geltrex working solution in a

tissue culture incubator (37°C and 5% CO2) for at least 1h.
Human ESC maintenance and culturing were performed according to the manufacturer’s protocol for StemFlex Medium. Briefly, StemFlex complete medium was made by combining StemFlex basal medium (450mL) with 50mL of StemFlex supplement (10X), plus 1% Penicillin-Streptomycin. Each Geltrex-coated well on a 6-well plate was seeded with 200K cells for ~80% confluence in 3-4 days. Human ESCs were cryopreserved in PSC Cryomedium (Gibco, Cat. No. A2644601). The culturing medium was supplemented with 1X RevitaCell (100X, Gibco, Cat. No. A2644501, which is also included in the PSC Cryomedium kit) when cells had gone through stressed conditions, such as freezing-and-thawing or nucleofection. RevitaCell-supplemented medium was replaced with regular StemFlex complete medium on the second day. hESCs grown under RevitaCell condition might become stretched, but would recover after switching to the normal StemFlex complete medium.

The human ESC differentiation assay to induce a gene expression pattern of primitive streak was adapted from Xi et al [34]. On day -1, freshly cultured hESC colonies were dissociated into clumps (2-5 cells) with Versene buffer (with EDTA, Gibco, Cat. No. 15040066). The dissociated cells were seeded on Geltrex-coated 6-well tissue culture plates at 25,000 cells/cm² (0.25M per well in the 6-well plates) in StemFlex complete medium. Differentiation to the primitive streak state was initiated on the next day (day 0) by switching StemFlex complete medium to basal differentiation medium. Basal differentiation medium (50mL) was made with 48.5mL DMEM/F-12, 1% GlutaMax (500μL), 1% ITS-G (500μL, Gibco, Cat. No. 41400045), and 1% penicillin-streptomycin (500μL), and supplemented with 3μM GSK3 inhibitor CHIR99021 (10μL of 15mM stock solution in DMSO; Tocris, Cat. No. 4423). The cells were collected at differentiation day 1 to 3 for downstream experiments, which confirmed the expression fluctuations of mesoderm genes (TBXT and MIXL1) in a 3-day differentiation period (Fig. S3) [34].
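The numbers in the basal differentiation medium recipe can be checked with the usual C1·V1 = C2·V2 dilution rule. The sketch below is only an arithmetic illustration; the function name `stock_volume_ul` and the dictionary layout are made up for this example and are not part of the original protocol.

```python
def stock_volume_ul(final_conc_um, final_volume_ml, stock_conc_mm):
    """Volume of stock solution (in microlitres) from C1 * V1 = C2 * V2."""
    final_conc_m = final_conc_um * 1e-6      # micromolar -> molar
    stock_conc_m = stock_conc_mm * 1e-3      # millimolar -> molar
    final_volume_l = final_volume_ml * 1e-3  # millilitres -> litres
    stock_volume_l = final_conc_m * final_volume_l / stock_conc_m
    return stock_volume_l * 1e6              # litres -> microlitres


# CHIR99021: 3 uM final concentration in 50 mL, from a 15 mM stock.
chir_ul = stock_volume_ul(final_conc_um=3, final_volume_ml=50, stock_conc_mm=15)
print(f"CHIR99021 stock to add: {chir_ul:.1f} uL")

# Component volumes (mL) of the basal differentiation medium as listed in the text.
basal_ml = {
    "DMEM/F-12": 48.5,
    "GlutaMax": 0.5,
    "ITS-G": 0.5,
    "penicillin-streptomycin": 0.5,
}
print(f"Total basal medium volume: {sum(basal_ml.values()):.1f} mL")
```

The result agrees with the 10μL of stock quoted in the recipe, and the listed component volumes add up to the stated 50mL batch.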


Mouse ESC culture and differentiation

Mouse ESCs derived from C57BL/6J strain background were cultured in a feeder cell-free condition. Cells were grown on tissue culture-grade plates coated with mESC-qualified gelatin. Before plating cells, the plastic tissue culture-treated plates were coated with 0.1% gelatin (EMD Millipore ES-006-B) at room temperature for at least 30min, followed by switching to mESC medium and warming up the medium in a 37°C and 5% CO2 incubator for at least 30min.

The feeder cell-free mESC culturing medium, also called ‘80/20’ medium, comprises 80% 2i medium and 20% mESC medium by volume. 2i medium was made from a 1:1 mix of Advanced DMEM/F-12 (Gibco, Cat. No. 12634010) and Neurobasal-A (Gibco, Cat. No. 10888022), 1X N-2 supplement (Gibco, Cat. No. 17502048), 1X B-27 supplement (Gibco, Cat. No. 17504044), 1X Glutamax (Gibco, Cat. No. 35050061), 0.1 mM Beta-Mercaptoethanol (Gibco, Cat. No. 31350010), 1000 units/mL LIF (Millipore, Cat. No. ESG1107), 1 μM MEK1/2 inhibitor (Stemgent, Cat. No. PD0325901), and 3 μM GSK3 inhibitor CHIR99021 (Tocris, Cat. No. 4423). mESC medium was made from Knockout DMEM (Gibco, Cat. No. 10829018), containing 15% Fetal Bovine Serum (GeminiBio, Cat. No. 100-106), 0.1 mM Beta-Mercaptoethanol, 1X MEM Non Essential Amino Acids (Gibco, Cat. No. 11140050), 1X Glutamax, 1X Nucleosides (Millipore, Cat. No. ES-008-D) and 1000 units/mL LIF.

mESC differentiation for inducing Tbxt gene expression was adapted from Pour et al in a feeder cell-free condition [46]. Cells were first plated in 80/20 medium for 24 hours on a gelatin-coated 6-well plate, followed by switching to N2/B27 medium without LIF or 2i for another 2-day culturing. The N2/B27 medium (50mL) is made with 18mL Advanced DMEM/F-12, 18mL Neurobasal-A, 9mL Knockout DMEM, 2.5mL Knockout Serum Replacement (Gibco, Cat. No. 10828028), 0.5mL N-2 supplement, 1mL B-27 supplement, 0.5mL Glutamax (100X), 0.5mL Nucleosides (100X), and 0.1 mM Beta-Mercaptoethanol. Then the N2/B27 medium was supplemented with 3μM GSK3 inhibitor CHIR99021 for induced differentiation (day 0). The cells were collected at differentiation day 1 to 3 for downstream experiments, which showed consistent results of Tbxt gene expression fluctuations in a 3-day differentiation period.
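Because the ‘80/20’ medium is a volume mix of two media, the effective concentration of a component that is present in only one of them is diluted accordingly. The following sketch works this out for a few components named in the recipe; the helper `mix_concentration` and the dictionary are illustrative only, and the resulting values are derived arithmetic rather than figures reported in the text.

```python
def mix_concentration(conc_a, conc_b, frac_a):
    """Concentration after mixing medium A and medium B; frac_a is the volume fraction of A."""
    return conc_a * frac_a + conc_b * (1.0 - frac_a)


FRACTION_2I = 0.80  # 80% 2i medium, 20% mESC medium by volume

# (concentration in 2i medium, concentration in mESC medium), taken from the recipe
components = {
    "LIF (units/mL)": (1000.0, 1000.0),
    "CHIR99021 (uM)": (3.0, 0.0),
    "MEK1/2 inhibitor (uM)": (1.0, 0.0),
    "Fetal Bovine Serum (%)": (0.0, 15.0),
}

final = {
    name: mix_concentration(in_2i, in_mesc, FRACTION_2I)
    for name, (in_2i, in_mesc) in components.items()
}
for name, value in final.items():
    print(f"{name}: {value:.2f}")
```

Components present at the same concentration in both media, such as LIF, are unchanged by the mix, whereas the two inhibitors and the serum end up at 80% and 20% of their starting concentrations respectively.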

CRISPR targeting

All guide RNAs of the CRISPR experiments were designed using CRISPOR algorithm through its predicted target sites integrated in the UCSC genome browser [47]. Guide RNAs were cloned into the pX459V2.0-HypaCas9 plasmid (AddGene plasmid #62988), or its custom derivative by replacing the puromycin resistance gene to blasticidin resistance gene. Guide RNAs in this study were designed in pairs to delete the intervening sequences. The sequence and targeting sites of the guide RNAs were listed below:

Target               Guide RNA sequence      PAM
Delete AluY          GATAGACCATAAAGATCCCC    TGG
                     AGACTGTGCCCACTCTCGGG    TGG
Delete AluSx1        CACAGTAGTTGTCCCGCTAG    GGG
                     GAATGGGGGGAGCTTAAACC    AGG
Delete Tbxt-exon6    ATTTCGGTTCTGCAGACCGG    TGG
                     CAAGATGCTGGTTGAACCAG    GGG

Locus (hg38/mm10):
Delete AluY: chr6:166161988-166162010 and chr6:166161557-166161579
Delete AluSx1: chr6:166163730-166163752 and chr6:166163290-166163312
Delete Tbxt-exon6: chr17:8438362-8438384 and chr17:8438920-8438942
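The paired-guide design can be sanity-checked with a few lines of Python. The sketch below uses the mouse Tbxt exon 6 guide pair and its two target loci as given above. The helper names are hypothetical, and the "outer span" is only an upper bound on the deleted region, since Cas9 cuts inside each target site rather than at its outer edge.

```python
import re


def to_rna(dna):
    """Write a DNA guide sequence in RNA letters (T -> U)."""
    return dna.upper().replace("T", "U")


def parse_locus(locus):
    """Split 'chr17:8438362-8438384' into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def outer_span(loci):
    """Inclusive distance (bp) from the first to the last base covered by the target sites."""
    parsed = [parse_locus(locus) for locus in loci]
    if len({chrom for chrom, _, _ in parsed}) != 1:
        raise ValueError("target sites must lie on the same chromosome")
    return max(end for _, _, end in parsed) - min(start for _, start, _ in parsed) + 1


exon6_guides = ["ATTTCGGTTCTGCAGACCGG", "CAAGATGCTGGTTGAACCAG"]
exon6_loci = ["chr17:8438362-8438384", "chr17:8438920-8438942"]

for guide in exon6_guides:
    print(guide, "->", to_rna(guide), f"({len(guide)} nt)")

site_lengths = [end - start + 1 for _, start, end in map(parse_locus, exon6_loci)]
print("target site lengths (bp):", site_lengths)
print("outer span of the guide pair (bp):", outer_span(exon6_loci))
```

Each locus spans 23 bp, which matches a 20-nt guide plus its 3-nt PAM, and the RNA spellings are the same ones quoted for the synthetic guide RNAs in the mouse work.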

All oligos (plus Golden Gate assembly overhangs) were synthesized by Integrated DNA Technologies (IDT) and ligated into empty pX459V2.0 vector following standard Golden Gate Assembly protocol using BbsI restriction enzyme (NEB, Cat. No. R3539). The constructed plasmids were purified from 3mL E. coli cultures using ZR Plasmid MiniPrep Purification Kit (Zymo Research, Cat. No. D4015) for sequence verification. Plasmids delivered into ESCs were purified from 250mL E. coli cultures using PureLink HiPure Plasmid Midiprep Kit (Invitrogen, Cat. No. K210005). To facilitate DNA delivery to ESCs through nucleofection, the purified plasmids were dissolved in Tris-EDTA buffer (pH 7.5) to a concentration of at least 1 μg/μL in a sterile hood.


DNA delivery

DNA delivery into human/mouse ESCs for CRISPR/Cas9 targeting was performed using a Nucleofector 2b Device (Lonza, Cat. No. BioAAB-1001). Human Stem Cell Nucleofector Kit 1 (Cat. No. VPH-5012) and mESC Nucleofector Kit (Lonza, Cat. No. VVPH-1001) were used for delivering DNA into human and mouse ESCs, respectively. ESCs were double-fed the day before the nucleofection experiment to keep them in optimal condition.

Before performing nucleofection on human ESCs, 6cm tissue culture plates were treated with 0.5μg/cm² rLaminin-521 in a 37°C and 5% CO2 incubator for at least 2h. rLaminin-521-treated plates give better viability when seeding hESCs as single cells. Cultured human ESCs were then washed with PBS, and dissociated into single cells using TrypLE Select Enzyme (no phenol red; Gibco, Cat. No. 12563011). One million hES single cells were nucleofected using program A-023 according to the manufacturer’s instructions for the Nucleofector 2b device. Transfected cells were transferred onto the rLaminin-521-treated 6cm plates with pre-warmed StemFlex complete medium supplemented with 1X RevitaCell but not Penicillin-Streptomycin. Antibiotic selection was performed 24h after nucleofection with puromycin (Gibco, Cat. No. A1113802).

Mouse ESCs were dissociated into single cells using StemPro Accutase (Gibco, Cat. No. A1110501) and five million cells were transfected using program A-023 according to the manufacturer’s instructions. Cells were plated onto gelatin-treated 10cm plates, followed by antibiotic selection 24h after nucleofection with blasticidin (Gibco, Cat. No. A1113903).

Together with the pX459V2.0-HypaCas9-gRNA plasmids for nucleofection, a single-stranded DNA oligo was co-delivered for microhomology-induced deletion of the targeted sites [48]. These ssDNA sequences were synthesized by IDT through its Ultramer DNA Oligo service, including phosphorothioate bond modification on the three bases of each end. Detailed sequence information is listed below (“|” indicates a junction site):


Delete TBXT-AluY:
A*C*T*ACACCAGTGATTCTCCAAATACGGTGCCCAGGCCAGAGCCTCAGCACCACCC|CCCTGGCCTAATTCTCTCTATAACACTGTATATGTCTAGACTTAATTTTGTC*T*G*G

Delete TBXT-AluSx1:
C*A*G*TGCTGCCTGGAGAATTGTTAGTAGTTTGGAAATTGAAGCCACAGTAGTTGTCCCGC|ACCAGGAGAGTGAGCAGTAAAAGGGTCTACCCCCAGCTAGGAAGGCACCTCCCGTC*T*C*T

Delete Tbxt-exon6:
T*T*T*ATTCTAGAGCCCATTAACATATCACTCCTGCTCACTTGGTAGAAAGCCACCG|CAGGGGTCCCCAAGGAGGCTTTCATTTCAATATCCATGTGCCTCAGAACATG*C*C*C

Genotyping of CRISPR/Cas9-targeted sites was performed through PCR following standard protocol. The genotyping primers are listed below:

Target site                           Orientation   Sequence
Delete TBXT-AluY: read through        F             CAGCCAGGCTCAAGAATTCC
                                      R             GACTTCCTAACCCAATAAGGTCC
Delete TBXT-AluY: left junction       F             CAGCCAGGCTCAAGAATTCC
                                      R             GTGTTCCTAATATTGGAGCATGC
Delete TBXT-AluSx1: read through      F             TCCTAGGCTGATTGAACAACCAG
                                      R             CAAGGCAGGTGAGCTTTCC
Delete TBXT-AluSx1: left junction     F             TCCTAGGCTGATTGAACAACCAG
                                      R             TTAAGCTCCCCCCATTC
Delete Tbxt-exon6: read through       F             GCAGTCTGAGTCCTACCTGTG
                                      R             TGTCAGTCTGGTTCTACACCTGGAGAGTCTTTGATC
Delete Tbxt-exon6: left junction      F             GCAGTCTGAGTCCTACCTGTG
                                      R             GTACAGGACCTACTTGGAGAGC
Delete Tbxt-exon6: right junction     F             GACAGGACTGAGTCTCAAGC
                                      R             TGTCAGTCTGGTTCTACACCTGGAGAGTCTTTGATC
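The donor oligo notation used above ("*" for a phosphorothioate bond, "|" for the deletion junction) is easy to parse programmatically, for example to check the length of each homology arm before ordering an oligo. This is a minimal sketch with a hypothetical helper, `parse_donor`; it is not a tool used in the study.

```python
def parse_donor(oligo):
    """Split an annotated donor oligo into its two homology arms.

    '*' marks a phosphorothioate bond and '|' marks the junction site.
    """
    cleaned = "".join(oligo.split())  # drop any stray whitespace
    if cleaned.count("|") != 1:
        raise ValueError("expected exactly one junction marker '|'")
    left, right = (arm.replace("*", "") for arm in cleaned.split("|"))
    unexpected = set(left + right) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return {
        "left_arm": left,
        "right_arm": right,
        "ps_bonds": cleaned.count("*"),
        "length": len(left) + len(right),
    }


# Donor for the mouse Tbxt exon 6 deletion, copied from the list above.
exon6_donor = (
    "T*T*T*ATTCTAGAGCCCATTAACATATCACTCCTGCTCACTTGGTAGAAAGCCACCG|"
    "CAGGGGTCCCCAAGGAGGCTTTCATTTCAATATCCATGTGCCTCAGAACATG*C*C*C"
)
info = parse_donor(exon6_donor)
print("left arm:", len(info["left_arm"]), "nt")
print("right arm:", len(info["right_arm"]), "nt")
print("phosphorothioate bonds:", info["ps_bonds"])
```

The count of six phosphorothioate bonds corresponds to the three modified bases at each end mentioned in the text.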

Splicing isoforms detection

Total RNAs were collected from the undifferentiated or differentiated cells of both human and mouse ESCs, using a standard column-based purification kit (QIAGEN RNeasy Kit, Cat. No. 74004). DNase treatment was applied during the purification to remove any potential DNA contamination. Following extraction, RNA quality was checked through electrophoresis based on the ribosomal RNA integrity. Reverse transcription was performed with 1μg of high-quality total RNA for each sample, using High-Capacity RNA-to-cDNA™ Kit (Applied Biosystems, Cat. No. 4387406). DNA oligos used for PCR/RT-PCR/RT-qPCR are listed below:

Target site                           Orientation   Sequence
Human TBXT: exon3-8 (Fig. 2B)         F             GGTGACTGCTTATCAGAACGAGGAG
                                      R             TACTGAGGCTGCATTTCCTTCTTAACC
Human TBXT (Fig. S2)                  F             CCTCATAGCCTCATGGACACCTG
                                      R             TCTTAACCTGAGACTGCCACTGG
Human MIXL1 (Fig. S2)                 F             GGCGTCAGAGTGGGAAATCC
                                      R             GGCAGGCAGTTCACATCTACC
Human ACTB1 (Fig. S2)                 F             CACCATTGGCAATGAGCGGTTC
                                      R             AGGTCTTTGCGGATGTCCACGT
Human TBXT: exon4-7 (Fig. S2)         F             CAGAACGAGGAGATCACAGCTC
                                      R             GGTACTGACTGGAGCTGGTAGG
Mouse Tbxt: exon4-7 (Fig. S2)         F             CCAGAATGAGGAGATTACAGCCCT
                                      R             GGATACTGGCTAGAGCCAGTAGG
Mouse Actb1 (Fig. S2)                 F             CATTGCTGACAGGATGCAGAAGG
                                      R             TGCTGGAAGGTGGACAGTGAGG

Mouse work

All mouse work was done following NYULH’s animal protocol guidelines. The TbxtΔexon6/+ heterozygous mouse model was generated through zygotic microinjection, using an experimental protocol adapted from Yang et al [49]. Briefly, Cas9 mRNA (MilliporeSigma, Cat. No. CAS9MRNA), synthetic guide RNAs, and single-stranded DNA oligo were co-injected into the 1-cell stage zygotes following the described procedures [49]. Synthetic guide RNAs were ordered from Synthego as their custom CRISPRevolution sgRNA EZ Kit, with the same targeting sites as used in the CRISPR deletion experiment of mouse ESCs (AUUUCGGUUCUGCAGACCGG and CAAGAUGCUGGUUGAACCAG). The co-injected single-stranded DNA oligo was also the same as the one described above. Processed embryos were then cultured in vitro to the blastomeric stage, followed by embryo transfer to pseudopregnant foster mothers. Following zygotic microinjection and transfer, founder pups were screened based on their abnormal tail phenotypes. DNA samples were collected through ear punches at day ~21 for genotyping.

Upon confirming the heterozygous genotype (TbxtΔexon6/+), founder mice were backcrossed with wild-type C57BL/6J mice to generate heterozygous F1 pups. Due to the varied tail phenotypes, intercrossing between F1 heterozygotes was performed in two categories: type 1 intercrossing included at least one no-/short-tailed parent, whereas type 2 intercrossing mated two long-tailed F1 heterozygotes. Both types of intercrossing produced heterogeneous tail phenotypes in F2 TbxtΔexon6/+ pups, confirming the incomplete penetrance of tail phenotypes, and the absence of homozygotes (TbxtΔexon6/Δexon6), as summarized in Table 1. To confirm the embryonic phenotypes in homozygotes, embryos were dissected at the E11.5 gestation stage from timed pregnant mice through the standard protocol. Adult mice (>12 weeks) were anesthetized for X-ray imaging of vertebrae using a Bruker In-Vivo Xtreme IVIS imaging system.

Capture-seq genotyping

Capture-seq, or targeted sequencing of the loci of interest, was performed as previously described [37]. Conceptually, capture-seq uses custom biotinylated probes to pull down the genomic loci of interest from the standard whole-genome sequencing libraries, thus enabling sequencing of the specific genomic loci at a much higher depth while reducing the cost.

Genomic DNA was purified from mESCs or ear punches of founder mice using Zymo Quick-DNA Miniprep Plus Kit (Cat. No. D4068) according to the manufacturer’s instructions. DNA sequencing libraries compatible with Illumina sequencers were prepared following standard protocol. Briefly, 1μg of gDNA was sheared to 500-900 base pairs in a 96-well microplate using the Covaris LE220 (450 W, 10% Duty Factor, 200 cycles per burst, and 90-s treatment time), followed by purification with a DNA Clean and Concentrate-5 Kit (Zymo Research, Cat. No. D4013). Sheared and purified DNA was then treated with end repair enzyme mix (T4 DNA polymerase, Klenow DNA polymerase, and T4 polynucleotide kinase, NEB, Cat. No. M0203, M0210 and M0201, respectively), and A-tailed using Klenow 3'-5' exo- enzyme (NEB, Cat. No. M0212). Illumina sequencing library adapters were subsequently ligated to DNA ends, followed by PCR amplification with KAPA 2X Hi-Fi Hotstart Readymix (Roche, Cat. No. KR0370).

Custom biotinylated probes were prepared as bait through nick translation, using BAC DNA and/or plasmids as the template. The probes were prepared to comprehensively cover the whole locus. We used BAC lines RP24-88H3 and RP23-159G7, purchased from BACPAC Genomics, to generate bait probes covering the mouse Tbxt locus and ~200kb flanking sequences in both upstream and downstream regions. The pooled whole-genome sequencing libraries were hybridized with the biotinylated baits in solution, and purified through streptavidin-coated magnetic beads. Following pull-down, DNA sequencing libraries were quantified with Qubit 3.0 Fluorometer (Invitrogen, Cat. No. Q33216) using a dsDNA HS Assay Kit (Invitrogen, Cat. No. Q32851). The sequencing libraries were subsequently sequenced on an Illumina NextSeq 500 sequencer in paired-end mode.

Sequencing results were demultiplexed with Illumina bcl2fastq v2.20 requiring a perfect match to indexing BC sequences. Low quality reads/bases and Illumina adapters were trimmed with Trimmomatic v0.39. Reads were then mapped to the mouse genome (mm10) using bwa 0.7.17. The coverage and mutations in and around the Tbxt locus were checked through visualization in a mirror version of the UCSC genome browser.
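The three read-processing steps named above (demultiplexing, trimming, mapping) could be chained roughly as follows. This is a sketch under stated assumptions, not the authors' pipeline: the text gives tool names and versions only, so the file names, the Trimmomatic clipping and quality thresholds, and the choice of the `bwa mem` subcommand are all placeholders chosen for illustration. The code only builds and prints the command lines; it does not run them.

```python
import shlex


def demultiplex_cmd(run_dir, out_dir):
    """bcl2fastq call that allows no mismatches in the index (barcode) reads."""
    return ["bcl2fastq", "--runfolder-dir", run_dir, "--output-dir", out_dir,
            "--barcode-mismatches", "0"]


def trim_cmd(jar, r1, r2, prefix, adapters_fa):
    """Trimmomatic paired-end call; the thresholds are illustrative, not from the source."""
    outputs = [f"{prefix}_{read}_{kind}.fastq.gz"
               for read in ("R1", "R2") for kind in ("paired", "unpaired")]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs,
            f"ILLUMINACLIP:{adapters_fa}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"]


def map_cmd(reference_fa, r1, r2, threads=4):
    """bwa mem paired-end alignment (subcommand assumed); SAM is written to stdout."""
    return ["bwa", "mem", "-t", str(threads), reference_fa, r1, r2]


# Hypothetical file names, for illustration only.
steps = [
    demultiplex_cmd("run_folder", "fastq"),
    trim_cmd("trimmomatic-0.39.jar", "fastq/s1_R1.fastq.gz", "fastq/s1_R2.fastq.gz",
             "trimmed/s1", "adapters.fa"),
    map_cmd("mm10.fa", "trimmed/s1_R1_paired.fastq.gz", "trimmed/s1_R2_paired.fastq.gz"),
]
for command in steps:
    print(shlex.join(command))
```

Setting `--barcode-mismatches 0` is what corresponds to "requiring a perfect match" to the index sequences in the text; everything else would need to be taken from the actual analysis scripts.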

Supplementary Figures:

Fig. S1 | RNA structure prediction using RNAfold algorithm of the ViennaRNA package [33]. a, Predicted RNA secondary structure of the TBXT intron5-exon6-intron6 sequence. The paired AluY-AluSx1 region is highlighted. b, Mountain plot of the RNA secondary structure prediction, showing the ‘height’ in predicted secondary structure across the nucleotide positions. Height is computed as the number of base pairs enclosing the base at a given position. Overall, the AluSx1 and AluY regions are predicted to form helices with high probability (low entropy).
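The ‘height’ used in a mountain plot can be computed directly from a structure written in dot-bracket notation, where "(" and ")" mark the two partners of a base pair and "." an unpaired base. The sketch below is a minimal illustration on a made-up structure string, not the TBXT prediction itself. It counts, for each position, the base pairs that strictly enclose it, which is one common reading of the definition in the caption.

```python
def mountain_heights(dot_bracket):
    """Number of base pairs enclosing each position of a dot-bracket structure."""
    heights = []
    depth = 0  # number of currently open base pairs
    for symbol in dot_bracket:
        if symbol == "(":
            heights.append(depth)  # pairs opened before this base enclose it
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: ')' without matching '('")
            heights.append(depth)  # the pair being closed does not enclose itself
        elif symbol == ".":
            heights.append(depth)
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return heights


toy_structure = "((..)).(.)"  # hypothetical example structure
print(mountain_heights(toy_structure))
```

Plotting these values against position gives the mountain shape: plateaus correspond to loops and slopes to the two strands of a helix.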

Fig. S2 | Studying TBXT expression isoforms using primitive streak in vitro differentiation. a, Human and mouse ESCs in vitro differentiation for inducing TBXT expression. Human and mouse ESCs differentiation assay was adapted from Xi et al [34] and Pour et al [35], respectively. b, Quantitative RT-PCR of TBXT and MIXL1 expression during hESC differentiation, indicating correct induction of mesodermal gene expression program [34]. c, Quantitative RT-PCR of Tbxt expression during mESC differentiation. d, RT-PCR of TBXT/Tbxt transcripts in human and mouse, highlighting a unique Δexon6 splicing isoform in human.

Fig. S3 | TBXT exon 6 conservation and functional domain analysis. a, Protein sequence alignment of the TBXT exon 6 region in representative mammals. With the exception of humans and chimpanzees, all are tailed. b, The exon 6-derived peptide of TBXT overlaps with large fractions of transcription regulation domains. TA, transcription activation; TR, transcriptional repression. Functional domain annotation of Brachyury was adapted from Kispert et al [12].

Fig. S4 | Validation of hESC CRISPR-deletion clones and the TBXT expression isoforms. a, PCR validation of the hESC clones with deletions of AluY or AluSx1 in TBXT. PCR validation for each clone or control sample was performed in pairs, each amplifying the AluSx1 locus (Sx1) or the AluY locus (Y), respectively, with primers that bind the two flanking sequences of the deleted region. Each genotype included two independent clones of AluY deletion or AluSx1 deletion, corresponding to the two replicates in Figure 2B. b, Sanger sequencing of the TBXT-Δexon6 and TBXT-Δexon6&7 transcripts detected in Figure 2B. The sequencing results were aligned to the full length TBXT mRNA sequence.

Fig. S5 | TbxtΔexon6/+ founder mice generated through CRISPR/Cas9 targeting in the zygotes. a, Schematic of zygotic injection of CRISPR/Cas9 reactions. b, Two TbxtΔexon6/+ founder mice (in addition to the one shown in Fig. 3) indicating an absence or reduced form of the tail (Founders 2 & 3). c, Sanger sequencing of the exon 6-deleted allele isolated from the genomic DNA of TbxtΔexon6/+ founder mice. Founder 1 had an unexpected insertion of 23 base pairs at the CRISPR cutting site in the original intron 5 of Tbxt. Both founder 2 and 3 had the exact fusion between the two CRISPR cutting sites in introns 5 and 6.

Fig. S6 | Capture-seq at the Tbxt locus of founder mice did not detect off target mutations. a, Capture-seq of the founder mice using baits generated from bacterial artificial chromosomes (RP24-88H3 and RP23-159G7). The shallow-covered regions are typically repeat sequences in the mouse genome and are consistent across samples. Control DNAs were obtained from wild-type or exon6-deleted mESCs through CRISPR targeting. b, A zoom-in view of the Capture-seq results at the Tbxt locus, highlighting the CRISPR-deleted exon 6 region.

Fig. S7 | Analysis of TbxtΔexon6/Δexon6 embryos at the E11.5 stage. TbxtΔexon6/Δexon6 embryos either develop spinal cord defects and die at birth (middle), or arrest at approximately stage E9 of development (right). Red and black dashed lines mark the embryonic tail regions and limb buds, respectively. Green arrowheads in the middle panel indicate malformed spinal cord regions.

Fig. S8 | A model for tail-loss evolution in the early hominoids. The AluY insertion in TBXT marked an early genetic event that initiated tail-loss evolution in the hominoid common ancestor. Additional genetic changes may have then acted to stabilize the no-tail phenotype in the ancient hominoids.

4 6 8

Darwin, C. The descent of man, and selection in relation to sex. (degruyter.com, 2008). doi:10.4324/9780203789537

  1. Hunt, K. D. The evolution of human bipedality: ecology and functional morphology. J. Hum. Evol. 26, 183–202 (1994).

  2. Williams, S. A. & Russo, G. A. Evolution of the hominoid vertebral column: The long and the short of it. Evol Anthropol 24, 15–32 (2015).

  3. Hickman, G. C. The mammalian tail: a review of functions. Mamm. Rev. 9, 143–157 (1979).

bioRxiv preprint doi: https://doi.org/10.1101/2021.09.14.460388; this version posted September 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

References

1.
2. 
Human evolution: an introduction to man’s adaptations. (Routledge, 2017).

10 6.

12 7.

14 8.

16 9.

18 10.

20 11. 22

Rogers, J. & Gibbs, R. A. Comparative primate genomics: emerging patterns of genome content and dynamics. Nat. Rev. Genet. 15, 347–359 (2014).

Rhesus Macaque Genome Sequencing and Analysis Consortium et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234 (2007).

Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).

Kimelman, D. Tales of tails (and trunks): forming the posterior body in vertebrate embryos. Curr Top Dev Biol 116, 517–536 (2016).

Herrmann, B. G., Labeit, S., Poustka, A., King, T. R. & Lehrach, H. Cloning of the T gene required in mesoderm formation in the mouse. Nature 343, 617–622 (1990).

Edwards, Y. H. et al. The human homolog T of the mouse T(Brachyury) gene; gene structure, cDNA sequence, and assignment to chromosome 6q27. Genome Res. 6, 226– 233 (1996).

24
26
28
30 
15. 32 16. 34 17.

12. Kispert,A.,Koschorz,B.&Herrmann,B.G.TheTproteinencodedbyBrachyuryisa tissue-specific transcription factor. EMBO J. 14, 4763–4772 (1995).

13. Wilde,J.J.,Petersen,J.R.&Niswander,L.Genetic,epigenetic,andenvironmental contributions to neural tube closure. Annu. Rev. Genet. 48, 583–611 (2014).

14. Sehner,S.,Fichtel,C.&Kappeler,P.M.Primatetails:Ancestralstatereconstructionand determinants of interspecific variation in primate tail length. Am. J. Phys. Anthropol. 167, 750–759 (2018).

Russo, G. A. Postsacral vertebral morphology in relation to tail length among primates and other mammals. Anat Rec (Hoboken) 298, 354–375 (2015).

Lemelin, P. Comparative and functional myology of the prehensile tail in New World monkeys. J Morphol 224, 351–368 (1995).

Narita, Y. & Kuratani, S. Evolution of the vertebral formulae in mammals: a perspective on developmental constraints. J Exp Zool B Mol Dev Evol 304, 91–106 (2005).

31

bioRxiv preprint doi: https://doi.org/10.1101/2021.09.14.460388; this version posted September 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

18. Young, N. M., Wagner, G. P. & Hallgrímsson, B. Development and the evolvability of human limbs. Proc. Natl. Acad. Sci. USA 107, 3400–3405 (2010).

19. Pontzer, H., Raichlen, D. A. & Rodman, P. S. Bipedal and quadrupedal locomotion in chimpanzees. J. Hum. Evol. 66, 64–82 (2014).

20. Bauer, H. R. Chimpanzee bipedal locomotion in the Gombe National Park, East Africa. Primates (1977).

21. Mallo, M. The vertebrate tail: a gene playground for evolution. Cell Mol. Life Sci. 77, 1021–1030 (2020).

22. Smith, C. L. & Eppig, J. T. The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip Rev Syst Biol Med 1, 390–399 (2009).

23. Wilkinson, D. G., Bhatt, S. & Herrmann, B. G. Expression pattern of the mouse T gene and its role in mesoderm formation. Nature 343, 657–659 (1990).

24. Yamaguchi, T. P., Takada, S., Yoshikawa, Y., Wu, N. & McMahon, A. P. T (Brachyury) is a direct target of Wnt3a during paraxial mesoderm specification. Genes Dev. 13, 3185–3190 (1999).

25. Tosic, J. et al. Eomes and Brachyury control pluripotency exit and germ-layer segregation by changing the chromatin state. Nat. Cell Biol. 21, 1518–1531 (2019).

26. Buckingham, K. J. et al. Multiple mutant T alleles cause haploinsufficiency of Brachyury and short tails in Manx cats. Mamm. Genome 24, 400–408 (2013).

27. Haworth, K. et al. Canine homolog of the T-box transcription factor T; failure of the protein to bind to its DNA target leads to a short-tail phenotype. Mamm. Genome 12, 212–218 (2001).

28. Schulte-Merker, S., van Eeden, F. J., Halpern, M. E., Kimmel, C. B. & Nüsslein-Volhard, C. no tail (ntl) is the zebrafish homologue of the mouse T (Brachyury) gene. Development 120, 1009–1015 (1994).

29. Batzer, M. A. & Deininger, P. L. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370–379 (2002).

30. Wallace, M. R. et al. A de novo Alu insertion results in neurofibromatosis type 1. Nature 353, 864–866 (1991).

31. Lev-Maor, G. et al. Intronic Alus influence alternative splicing. PLoS Genet. 4, e1000204 (2008).

32. Payer, L. M. et al. Alu insertion variants alter mRNA splicing. Nucleic Acids Res. 47, 421–431 (2019).

33. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26 (2011).

34. Xi, H. et al. In Vivo Human Somitogenesis Guides Somite Development from hPSCs. Cell Rep. 18, 1573–1585 (2017).

35. Pour, M. et al. Emergence and patterning dynamics of mouse definitive endoderm. SSRN Journal (2021). doi:10.2139/ssrn.3848112

36. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

37. Brosh, R. et al. A versatile platform for locus-scale genome rewriting and verification. Proc. Natl. Acad. Sci. USA 118, (2021).

38. International Human Genome Sequencing Consortium et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

39. Jeck, W. R. et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19, 141–157 (2013).

40. Shaheen, R. et al. T (brachyury) is linked to a Mendelian form of neural tube defects in humans. Hum. Genet. 134, 1139–1141 (2015).

41. Shields, D. C. et al. Association between historically high frequencies of neural tube defects and the human T homologue of mouse T (Brachyury). Am. J. Med. Genet. 92, 206–211 (2000).

42. Morrison, K. et al. Genetic mapping of the human homologue (T) of mouse T(Brachyury) and a search for allele association between human T and spina bifida. Hum. Mol. Genet. 5, 669–674 (1996).

43. Postma, A. V. et al. Mutations in the T (brachyury) gene cause a novel syndrome consisting of sacral agenesis, abnormal ossification of the vertebral bodies and a persistent notochordal canal. J. Med. Genet. 51, 90–97 (2014).

44. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).

45. Herrero, J. et al. Ensembl comparative genomics resources. Database (Oxford) 2016, (2016).

46. Pour, M. et al. Emergence and patterning dynamics of mouse definitive endoderm. SSRN Journal (2021). doi:10.2139/ssrn.3848112

47. Concordet, J.-P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).

48. Chen, F. et al. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat. Methods 8, 753–755 (2011).

49. Yang, H., Wang, H. & Jaenisch, R. Generating genetically modified mice using CRISPR/Cas-mediated genome engineering. Nat. Protoc. 9, 1956–1968 (2014).