Abstract
Restriction-like endonuclease (RLE)-bearing non-LTR retrotransposons are site-specific elements that integrate into the genome through target-primed reverse transcription (TPRT). RLE-bearing elements have been used as a model system for investigating non-LTR retrotransposon integration. R2 elements target a specific site in the 28S rDNA gene. We previously demonstrated that the two major sub-classes of R2 (R2-A and R2-D) target the R2 insertion site in an opposing manner with regard to the pairing of known DNA binding domains and bound sequences, indicating that the A- and D-clades represent independently derived modes of targeting that site. Elements have been discovered that group phylogenetically with R2 but do not target the canonical R2 site. Here we extend our earlier studies to show that a separate R2-A clade element, which targets a site other than the canonical R2 site, does so by using the amino-terminal zinc fingers and Myb motifs. We further extend our targeting studies beyond R2 clade elements by investigating the ability of the amino-terminal zinc fingers from the nematode NeSL-1 element to target its integration site. Our data are consistent with the use of an amino-terminal DNA binding domain as one of the major targeting determinants used by RLE-bearing non-LTR retrotransposons to secure a protein subunit near the insertion site. This amino-terminal DNA binding domain can undergo modifications, allowing the element to target novel sites. The binding orientation of the amino-terminal domain relative to the insertion site is quite variable.
Introduction
Non-long-terminal-repeat (non-LTR) retrotransposons are a major class of eukaryotic transposable elements. These elements are vertically inherited and can impact the evolution of their host's genome in many ways.Citation1–Citation5 Non-LTR retrotransposons replicate through an ordered series of DNA cleavage and polymerization events using encoded nucleic acid binding, endonuclease, and polymerase functions.Citation6 The element-encoded protein(s), once translated, form a ribonucleoprotein (RNP) particle with the transcript from which they were translated, a process called cis-preference. The RNP binds to the target DNA, cuts one of the DNA strands, and uses the target site's exposed 3′-OH to prime reverse transcription of the element RNA into complementary DNA (cDNA), a process called target-primed reverse transcription (TPRT).Citation7 The opposing target DNA strand is then cleaved.Citation6 The cDNA is converted into double-stranded DNA, completing the integration event in a process that is not yet well understood.Citation6
Non-LTR retrotransposons can be grouped into those elements that harbor a restriction-like endonuclease (RLE) to initiate TPRT and those that harbor an apurinic-apyrimidinic endonuclease (APE) to initiate TPRT (reviewed in refs. Citation8–Citation10).Citation7,Citation11–Citation14 The RLE-bearing elements tend to be site specific (i.e., inserting into a given sequence within a genome) while the APE-bearing elements tend to be nonspecific—although examples of both site-specific and nonspecific targeting can be found within each group.Citation1,Citation11,Citation15,Citation16 Phylogenetic analysis of the reverse transcriptase domain from RLE and APE bearing elements indicates that the RLE-bearing elements are likely the earlier branching group (reviewed in ref. Citation10).
The site specificity of RLE-bearing non-LTR retrotransposons makes them attractive systems with which to study the TPRT integration reaction. In addition, once it is understood how RLE-bearing elements target DNA, it may be possible to engineer these elements for use as site specific gene targeting vehicles. We are conducting a systematic study of the DNA targeting functions of RLE-bearing non-LTR retrotransposons.Citation6,Citation17–Citation19 DNA recognition is the first step in the integration reaction and is therefore the step that must undergo modification when an element targets a novel site. Target site recognition is believed to be achieved primarily through distinct DNA binding motifs located within the coding region of the element, as opposed to any inherent specificity of the RLE. The endonuclease is believed to be largely nonspecific while the DNA binding domains target insertion events to specific sites in the genome.Citation6,Citation12,Citation16,Citation18,Citation20
RLE-bearing non-LTR retrotransposons encode a single multifunctional protein with RNA binding, DNA binding, DNA endonuclease, and reverse transcriptase activities. These retrotransposons have been phylogenetically classified into at least five clades based upon sequence comparisons of the reverse transcriptase domain: R2, R4, Genie, CRE, and NeSL ().Citation12,Citation21–Citation24 All of these clades share a similar basic ORF structure with a central reverse transcriptase domain (RT), a carboxyl-terminal cysteine-histidine motif (cchc), and a carboxyl-terminal restriction-like endonuclease. The major differences between the clades reside within the N-terminal domain. Elements belonging to the Genie, CRE, and NeSL clades typically contain two N-terminal zinc fingers (ZFs), while R2 clade elements contain a Myb motif and a variable number of ZFs. The R4 clade appears to lack both ZFs and Myb motifs. Within each clade, N-terminal structural variants exist (e.g., a variable number of ZFs). In addition, the namesake of the NeSL clade, NeSL-1, contains a cysteine protease domain (PRO) of unknown function.Citation21
R2 is the most well studied of the five clades. The R2 designation is actually a double criteria designation—a phylogenetic grouping in conjunction with an insertion site designation. R2 elements insert into a particular sequence within the 28S rDNA gene ( and ).Citation1,Citation12,Citation25 R2 elements are further subdivided into the R2-A, R2-B, R2-C, and R2-D groups based on reverse transcriptase phylogeny.Citation25 Each R2 group has a different configuration of ZFs and Myb motifs in the N-terminal domain of the encoded R2 protein. The two major subdivisions are the R2-A and the R2-D groups (). The R2-D group, exemplified by the well studied Bombyx mori R2 element (R2Bm), has a single N-terminal ZF and a Myb motif. R2Bm uses two protein subunits to integrate.Citation6 These two subunits take on different conformations and roles in the integration reaction ().Citation6,Citation26 One protein subunit is bound upstream of the insertion site, and the other one is bound downstream of the insertion site. The upstream subunit binds to the DNA using an undetermined DNA binding motif, cleaves the target DNA, and performs TPRT.Citation6 The downstream subunit uses the N-terminal ZF and Myb motifs to bind to the DNA.Citation18 The downstream subunit cleaves the second DNA strand and is hypothesized to perform second strand synthesis. The 3′ untranslated region (UTR) of the R2 RNA is bound to the upstream subunit, and the 5′ UTR is bound to the downstream subunit.Citation26 The R2-A group binds the target site differently than the R2-D group. The R2-A group elements have three N-terminal zinc fingers as well as a Myb motif and are thought to be the more ancestral R2 group. 
The R2-A group element from Limulus polyphemus (Lp), R2Lp, uses the N-terminal ZFs and a Myb to bind to the upstream DNA sequences as opposed to downstream sequences.Citation17 The different DNA binding modes indicate R2-A and R2-D are independent targeting (or retargeting) events to the canonical R2 site. The distinct DNA binding modes also suggest that the R2-A and R2-D elements use a different linkage configuration between the various nucleic acid binding domains and catalytic activities involved in the integration reaction.Citation17
Recently, elements that phylogenetically group with the R2-A group have been discovered that do not target the canonical R2 site. R9 from Adineta vaga (R9Av) targets a site within the 28S about 1436 bp upstream of the R2 site, and R8 from Hydra magnipapillata (R8Hm) targets a sequence in the 18S rDNA ().Citation27,Citation28 The R2 site is thought to be the more ancestral site as most species that have retained R2 clade elements have done so at the R2 site. Given this interpretation, the R8 and R9 sites are instances of R2 clade elements acquiring novel site specificity.Citation27 In this paper we investigate the role that the N-terminal ZF and Myb motifs have in targeting an R2 clade element to a novel genomic site. We show that the N-terminal domain of R9Av has been modified so as to target the R9 site and not the R2 site. Interestingly, the orientation of the binding of the ZFs and Myb motif differs from the R2-A and the R2-D elements that target the R2 site. We also extended our DNA targeting studies to elements beyond the R2 clade. Of the five clades—R2, R4, Genie, CRE, and NeSL—only R2 has a Myb motif in the N-terminal region (). That Myb motif is a major contributor to the specificity observed in the R2 clade elements, with the ZFs providing fewer DNA contacts as a whole than the Myb motif.Citation17,Citation18 Genie, CRE, and NeSL clades only have ZFs—typically two ZFs—and no Myb motifs. In order to ascertain how Genie, CRE, and NeSL clade elements target DNA, we examined the DNA binding potential of a NeSL clade element. We show that the NeSL-1 element uses its two N-terminal ZFs to target DNA. Along with our previously reported study, our results indicated that the N-terminal ZFs (and Myb motif if present) may represent a universal targeting module for all site specific RLE-bearing non-LTR retrotransposons that contain these motifs. The Myb and ZFs can undergo modification, allowing novel sites to be targeted. 
During modification, individual ZF and Myb motifs can be acquired or lost. In addition, the physical/temporal linkage configurations between the various nucleic acid binding activities (5′ UTR RNA binding, 3′ UTR RNA binding, upstream DNA binding, and downstream DNA binding) and catalytic activities (first strand cleavage, TPRT, second strand cleavage, and second strand synthesis) may get reconfigured as elements transition to target new sites in the genome.
Results and Discussion
DNA binding activity of an R2 element directed to a noncanonical target site: Mapping the DNA binding activity of R9Av.
R2-D group elements that target the R2 site (e.g., R2Bm) use the N-terminal ZF and Myb motif to secure an R2 protein subunit downstream of the insertion site.Citation6,Citation18 R2-A group elements that target the R2 site (e.g., R2Lp) use their N-terminal ZFs and Myb motif to secure an R2 protein subunit upstream of the insertion site.Citation17 Although both R2-D and R2-A group elements target the same site, they do so differently. To better understand R2-clade mechanistic plasticity and site specificity, R9Av was examined. R9Av is an R2-A group element that does not target the canonical insertion site (). R9Av is also interesting as it generates a 126 bp target site duplication (TSD) upon insertion as opposed to the blunt or small 1–10 bp deletion observed for most R2 insertions.Citation29
Full-length non-LTR retrotransposon proteins are very difficult to express and purify in a soluble and active form. For this reason, N-terminal derived polypeptides containing the ZFs and Myb motifs expected to be involved in targeting were expressed and purified in order to ascertain whether the R9Av ZF and Myb motifs have been modified so as to direct the R2 clade element to the R9 site instead of the R2 site. Initially, four target DNAs were used in electrophoretic mobility shift assays (EMSA) to identify if and where the R9Av N-terminal ZF and Myb bind (). The R9Av polypeptide spanned from ZF3 through the Myb motif and was given the name R9Av ZF3-Myb (). A diagram of the four target site DNAs used in the EMSA reactions is shown in . Target 1 consisted of the segment of the 28S that becomes duplicated upon R9 insertion (126 bp TSD region) along with 112 bp of upstream flanking sequence (net 238 bp). Target 2 was the 126 bp TSD region along with 101 bp of downstream flanking sequence (net 227 bp). Target 3 was the 112 bp of upstream flanking sequence. Target 4 was the 101 bp of downstream flanking sequence. Each target DNA was end-labeled with 32P and put into binding reactions with varying amounts of R9Av ZF3-Myb polypeptide. The polypeptide concentrations ranged from 240 nM to 9 nM in approximately 3-fold steps. The binding reactions contained 9 ng of DNA, which translates to around 9.6 nM for Targets 1 and 2, and around 21 nM for Targets 3 and 4.
The R9Av ZF3-Myb polypeptide bound best to Target 1, with observable binding occurring in the presence of 9 nM of polypeptide, roughly a 1:1 molar ratio of protein to DNA (, Lane 5). Greater than 50% of Target 1 was bound in the presence of 27 nM polypeptide (, Lane 4), and 100% of the target was bound at 80 nM, an 8.3:1 protein to DNA molar ratio (, Lane 3). At the higher protein to DNA ratios, additional protein-DNA complexes occur (, Lanes 2–4). Presumably the fastest migrating protein-DNA complex represents a single polypeptide bound to DNA (hereafter called a monomer) as it appears in the lower protein concentration samples. The slower migrating protein-DNA complexes, then, may represent higher order protein-DNA complexes (e.g., dimer, trimer, etc.). Target 3 was also bound efficiently by the R9Av ZF3-Myb polypeptide in that protein-DNA complexes were observed at low protein concentrations (e.g., , Lane 5, a 1:2.3 molar ratio of protein to DNA). Target 2 was bound much less efficiently (at least 9-fold less efficiently) than Target 1. Observable R9Av ZF3-Myb polypeptide binding to Target 2 did not occur until the polypeptide concentration reached 80 nM, an 8.3:1 protein to DNA molar ratio (, Lane 3). The R9Av ZF3-Myb polypeptide did not bind appreciably to Target 4 ().
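The molar ratios quoted above follow directly from the stated polypeptide and target DNA concentrations. The short Python sketch below reproduces that arithmetic; it takes the approximate DNA concentrations (9.6 nM and 21 nM) at face value from the text rather than deriving them, and the function names are illustrative only.

```python
# Polypeptide concentrations used in the EMSA titration (nM), as stated in the text.
protein_nM = [240, 80, 27, 9]

# Approximate target DNA concentrations (nM), taken directly from the text.
dna_nM = {"Target 1": 9.6, "Target 2": 9.6, "Target 3": 21.0, "Target 4": 21.0}


def molar_ratio(protein, dna):
    """Return the protein:DNA molar ratio as a single number."""
    return protein / dna


def format_ratio(ratio):
    """Write a ratio as 'x:1' when protein is in excess, otherwise as '1:x'."""
    if ratio >= 1:
        return f"{ratio:.1f}:1"
    return f"1:{1 / ratio:.1f}"


for target, dna in dna_nM.items():
    ratios = [format_ratio(molar_ratio(p, dna)) for p in protein_nM]
    print(target, dict(zip(protein_nM, ratios)))
```

With these inputs, 80 nM polypeptide against Targets 1 and 2 gives 8.3:1, and 9 nM polypeptide against Targets 3 and 4 gives 1:2.3, matching the ratios reported for the gel lanes.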
The EMSA reactions involving the R9Av ZF3-Myb polypeptide indicate that this region of the R9 protein is involved in DNA targeting. Because Target 1 and Target 3, which have the upstream DNA in common, were bound most efficiently by the R9Av ZF3-Myb polypeptide (respectively), it appears that the protein motifs contained in the polypeptide are responsible for securing an R9 protein subunit to DNA sequences upstream of the TSD region. However, as Target 2, which contains the TSD region, bound the R9Av ZF3-Myb polypeptide better than Target 4, it is possible that some base-specific interactions may occur within the TSD region in addition to the upstream sequence. Any presumed association of the polypeptide with the TSD region could not be due to the action of either ZF3 or ZF2 as the binding profile of the R9Av ZF1-Myb polypeptide mirrored the binding of the R9Av ZF3-Myb polypeptide on the four targets used in (data not shown). Binding to Target 1 was the most robust, followed by Target 3, then Target 2, and lastly Target 4. It is possible that the greater association of protein with Target 2 relative to Target 4 is related to local DNA structure (e.g., bent DNA for positioning a nucleosome) causing increased association of the polypeptide.
To better map the site of interaction between the R9Av ZF3-Myb polypeptide and the target DNA, a DNase I protection based DNA footprint analysis was performed. Assuming the binding of the R9Av ZF3-Myb polypeptide to target DNA is specific to the upstream sequence and perhaps the TSD region, as indicated by the EMSA results, it should be possible to precisely map the site (or sites) of interaction by DNase footprinting. A footprint signal is indicative of specific binding. So as not to unduly skew the mapping results, the footprint analysis was done using Target 1.
The R9Av ZF3-Myb polypeptide was bound to target DNA that had been end-labeled on either the top strand or the bottom strand, respectively. The R9Av ZF3-Myb polypeptide:DNA ratios were adjusted so as to form primarily monomers or monomers and dimers similar to that seen in , Lanes 4 and 5. The binding reactions were subsequently subjected to DNase I treatment under conditions that yield one cleavage event per DNA target. The DNase I treated protein-DNA complexes were fractionated into bound and free/reference fractions by EMSA (data not shown) prior to analysis on a denaturing polyacrylamide gel (). Segments of DNA that show protection from DNase I cleavage when compared with reference DNA indicate areas of DNA bound by the R9Av ZF3-Myb polypeptide. Of the bound fractions, , Lane 3 are from the dimer and , Lane 4 are from the monomer protein complex. The areas of DNase I protection denoting R9Av ZF3-Myb binding have been marked with long thick black lines next to the denaturing gel. Short thin black lines denote DNase I hypersensitive sites that were induced by the binding of the R9Av ZF3-Myb polypeptide.
The footprinted region for a single polypeptide monomer is restricted to sequences upstream of the TSD region, specifically from base pair position −47 to −27 on the top strand and from base pair position −46 to −27 on the bottom strand (, Lane 4). The base pair numbering is relative to the presumptive site of bottom strand cleavage and TPRT. The TPRT site is inferred from the orientation of inserted R9Av elements as well as the DNA cleavage sites required to generate the observed TSD. Negative numbers represent base pair positions upstream of the site of TPRT, and positive numbers represent base pair positions downstream of the TPRT site. No obvious additional footprint signal was observed in any higher order complexes (e.g., the dimer complex, , Lane 3) beyond that defined for the monomer (, Lane 4). There is a DNase I hypersensitive site on the bottom strand at base pair position −46 induced by binding of a single polypeptide unit. Additional DNase I hypersensitive sites are observed when additional polypeptide units are bound (see top strand , Lane 3, positions +16 and +74).
In order to ascertain the binding orientation of the R9 Myb relative to the ZFs and to gain a higher resolution footprint, a missing nucleoside footprint analysis was performed on the R9Av ZF3-Myb polypeptide.Citation30 A second missing nucleoside footprint was performed on the truncated polypeptide, R9Av ZF1-Myb, which is missing ZF3 and ZF2. DNA with random abasic sites (i.e., missing nucleoside DNA) was exposed to protein in a binding reaction followed by EMSA fractionation. The resulting DNA fractions were analyzed on a denaturing polyacrylamide gel. Bands corresponding to a DNA base that, when missing, interfered with protein binding yielded a footprint signal. The missing nucleoside data for the R9Av ZF3-Myb polypeptide bound to DNA are presented in . Only the fastest migrating protein-DNA complex (i.e., the monomer) was analyzed. The DNA bases that were found to interact with the R9Av ZF3-Myb polypeptide are located from −43 to −31 on the top strand and from −46 to −31 on the bottom strand, in good agreement with DNase I footprint data. The missing nucleoside data for the R9Av ZF1-Myb are presented in .
The missing nucleoside data have been summarized for both polypeptides, along with the DNase I data, in . Both DNase I and missing nucleoside footprint data confirm that the R9Av N-terminal DNA binding region, ZF3-Myb, binds to DNA sequences upstream of the 126 bp TSD region. If a specific interaction exists between the ZF3-Myb polypeptide and the TSD region, the interaction is not stable enough to footprint in our reactions. In addition, no additional footprint signal was detected in any of the higher order complexes, indicating a single specific binding site for the polypeptide on the target DNA is likely. The shorter R9 polypeptide, R9 ZF1-Myb, made contacts on both strands in the region of −40 to −31, but just on the top strand in the region spanning −43 to −41. The lack of footprinting on the bottom strand in the region of −46 to −41 with the shorter polypeptide indicated that either, or both, ZF3 and ZF2 associate with the bottom strand in this region. The region from −40 to −31, which footprinted on both strands with both polypeptides, is likely where the Myb motif binds. ZF1 binding may then account for the top strand signal from −43 to −41. This interpretation is consistent with other R2 elements where the Myb motif contacted both DNA strands over roughly a 10 bp region and ZF1 contacted at least one strand over a 3–5 bp region.Citation17,Citation18 Additional studies would have to be done to confirm the ZF assignments in R9Av; however, it is clear that the Myb motif binds closest to the insertion site and the ZFs farther away.
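The reasoning used to assign motifs to footprinted positions amounts to a set comparison between the contacts made by the longer and the shorter polypeptide. The Python sketch below illustrates that comparison using only the missing nucleoside intervals reported in the text; treating each interval as an inclusive set of integer positions is a simplifying assumption, and the variable and function names are illustrative.

```python
def span(start, end):
    """Inclusive set of base pair positions, numbered relative to the TPRT site."""
    return set(range(start, end + 1))


# Missing nucleoside contacts for R9Av ZF3-Myb, as reported in the text.
zf3_myb = {"top": span(-43, -31), "bottom": span(-46, -31)}

# R9Av ZF1-Myb: both strands at -40 to -31, top strand only at -43 to -41.
zf1_myb = {"top": span(-43, -31), "bottom": span(-40, -31)}


def lost_contacts(longer, shorter):
    """Positions contacted by the longer polypeptide but not by the shorter one."""
    return {strand: sorted(longer[strand] - shorter[strand]) for strand in longer}


# Contacts that disappear when ZF3 and ZF2 are removed (candidate ZF3/ZF2 contacts).
lost = lost_contacts(zf3_myb, zf1_myb)

# Positions footprinted on both strands by the shorter polypeptide (candidate Myb site).
both_strands = sorted(zf1_myb["top"] & zf1_myb["bottom"])

# Positions footprinted on the top strand only by the shorter polypeptide (candidate ZF1 site).
top_only = sorted(zf1_myb["top"] - zf1_myb["bottom"])

print("lost on truncation:", lost)
print("both strands:", both_strands[0], "to", both_strands[-1])
print("top strand only:", top_only[0], "to", top_only[-1])
```

Under these assumptions, the only contacts lost on truncation are bottom-strand positions upstream of −40, the two-strand signal spans −40 to −31, and the top-strand-only signal spans −43 to −41, which is the pattern the motif assignments above rest on.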
R9Av and R2Lp are both R2-A group elements that use their N-terminal ZFs and Myb to target different sites in the genome. R9Av targets the R9 site, and R2Lp targets the R2 site. Both elements use their respective N-terminal DNA binding module to bind a protein subunit upstream of the insertion site.Citation17 Interestingly, R9Av and R2Lp bind in opposite orientation to each other with respect to the binding order of the Myb motif and ZFs relative to the insertion site. In R2Lp, the ZFs are closest to the insertion site, while in R9Av the Myb motif binds closest to the insertion site. In the R2-D clade element R2Bm, the Myb motif and ZF bind downstream of the insertion site, with the ZF being closest to the insertion site.Citation18
DNA targeting by a non-R2 RLE-bearing non-LTR retrotransposon.
For both R2-A and R2-D group elements, the Myb domain appears to account for the largest continuous swath of base specific contacts for the subunit using the N-terminal motifs to bind to DNA.Citation17,Citation18 Indeed, we have been unsuccessful in getting the ZFs of R2-A and R2-D group transposons to bind tightly enough to be footprinted in the absence of the Myb motif (R9Av data not shown and refs. Citation17 and Citation18). Of the known RLE-bearing non-LTR retrotransposons, only R2 clade elements contain a Myb motif. In this aspect at least, R2 is not representative of other RLE-bearing non-LTR retrotransposons. N-terminal ZFs (typically two ZFs) have been identified in elements belonging to the NeSL, CRE, and Genie clades (). In order to extend our studies of target site recognition beyond R2, the NeSL clade element NeSL-1Ce was examined. NeSL-1Ce contains two N-terminal ZFs and targets the spliced leader-1 gene of Caenorhabditis elegans (). In order to test if the NeSL-1Ce ZFs function in target recognition similar to R2's Myb plus ZF(s) pairing, a polypeptide containing the NeSL-1Ce N-terminal ZFs (NeSL-1 ZFs) was cloned, expressed, and purified. The purified polypeptide was assayed for DNA binding activity against a 125 bp target DNA containing the NeSL-1 insertion site. EMSA analysis showed a slower migrating complex consistent with the NeSL-1 ZFs polypeptide binding to target DNA (). DNase I footprint analysis () was used to determine if the binding was specific. Regions of DNA protected from DNase I degradation in the presence of bound polypeptide were localized to two closely spaced regions: (1) top strand base pair positions −21 to −19, bottom strand −20 to −17; (2) top strand −9 to −7, bottom strand −7 to −5. Base pair positions are relative to the NeSL insertion site (i.e., TPRT site), with negative integers representing base pair positions upstream of the TPRT site and positive integers representing base pair positions downstream of the TPRT site. 
The DNase I footprint has been overlaid on the insertion site sequence in . The NeSL-1 ZF polypeptide was found to bind to DNA sequences upstream of the insertion site (i.e., within the spliced leader exon). The two zones of DNase I protection observed likely correspond to the binding of the two ZFs, respectively. The binding orientation of the two ZFs is unknown.
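As a rough illustration of the footprint geometry, the Python sketch below merges the top and bottom strand protection intervals for each zone and counts the unprotected base pairs between the two zones. It uses only the coordinates stated in the text; combining the strands into a single extent per zone is an assumption made for illustration, not an analysis performed in this study.

```python
# DNase I protected intervals for the NeSL-1 ZFs polypeptide (inclusive positions,
# numbered relative to the TPRT site), as stated in the text.
zones = {
    "zone 1": {"top": (-21, -19), "bottom": (-20, -17)},
    "zone 2": {"top": (-9, -7), "bottom": (-7, -5)},
}


def combined_extent(zone):
    """Smallest interval covering the protection on both strands."""
    starts = [start for start, _ in zone.values()]
    ends = [end for _, end in zone.values()]
    return min(starts), max(ends)


extents = {name: combined_extent(zone) for name, zone in zones.items()}

# Number of base pairs between the two zones that are protected on neither strand.
gap_bp = extents["zone 2"][0] - extents["zone 1"][1] - 1

# Both zones lie upstream of the TPRT site if every position is negative.
all_upstream = all(end < 0 for _, end in extents.values())

print(extents, "gap:", gap_bp, "bp", "all upstream:", all_upstream)
```

On this reading, the two zones span −21 to −17 and −9 to −5, are separated by 7 unprotected base pairs, and both lie upstream of the insertion site.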
Summary of RLE-bearing non-LTR retrotransposon DNA binding modes and implications for the integration model.
The insertion model posited for R2 elements, and by analogy all RLE-bearing non-LTR retrotransposons, requires two subunits of protein to effect element insertion, one subunit bound to each side of the insertion site. The catalytic domains of the two subunits must be in opposite orientation to each other in order to carry out the two half reactions required for insertion (see for more information derived from R2Bm). Amino acid alignments of R2 elements would appear to argue that there is tight integration between the RT and the carboxyl-terminal domain. There is not much room for flexible linkers between the highly conserved domains in the carboxyl-terminal domain, unlike the N-terminal domain where there is variability in the number and makeup of the conserved motifs as well as variable spacing between some of the conserved regions.Citation12 Examinations of 5′ junctions of native R2 insertions in various Drosophila species indicate that the processes of second strand cleavage and second strand synthesis are rapidly evolving.Citation29 In R2Bm, where most of the biochemistry has been done, the subunit that performs second strand cleavage (and presumably second strand synthesis) interacts with the 5′ RNA and binds to the target DNA downstream of the insertion site using the N-terminal ZF and Myb ( and ). The upstream subunit binds through an (as yet) unidentified protein domain (indicated by the single question mark in ). In R2Bm, the subunit that uses the unidentified DNA binding domain binds the 3′ RNA and performs first strand cleavage and TPRT. The unidentified DNA binding domain has been hypothesized to be located in the carboxyl-terminal domain.Citation6,Citation12
Collectively, our N-terminal DNA binding data on R2 and NeSL elements indicate that most site-specific RLE-bearing non-LTR retrotransposons likely use their N-terminally located DNA binding motifs to load a transposon subunit onto the target site. There appears to be variability in how the ZFs and Myb motifs bind target DNA (). In some cases the N-terminal motifs are used to secure the upstream subunit and in other cases the downstream subunit. In some cases the ZFs are closest to the insertion site, and in other cases the Myb is closest to the insertion site. This variability may indicate plasticity in how the binding and catalytic domain functions are wired into the overall insertion mechanism. The orientation changes may relate to the absolute orientation of the RLE and RT catalytic domains relative to the insertion site, although flexible linkers could either decouple binding orientation from catalytic orientation or even re-wire the linkage to be opposite of what is known for R2Bm.
In R2Lp the upstream and downstream subunits appear to be swapped relative to R2Bm.Citation17 However, the orientation of the ZFs and Myb motif relative to the insertion site is identical to R2Bm (i.e., the ZFs are nearest the insertion site).Citation17 The R2Lp downstream subunit is hypothetical and is based upon the R2Bm model of insertion. The downstream R2Lp subunit is marked with two question marks in , one question mark to signify the presence of the subunit being hypothetical and one question mark to signify that, if present, the subunit would be expected to bind to DNA using the same unidentified (carboxyl-terminal?) protein domain that secures the R2Bm upstream subunit to target DNA. In the case of R9Av, the N-terminal ZFs and Myb bind upstream of the insertion site as in R2Lp, but the binding orientation of the ZFs and Myb appears flipped compared with R2Lp (). The two R9Av subunits would be expected to be near each other in space, assuming the 126 bp TSD region was wrapped around a nucleosome and the downstream subunit bound just downstream of the TSD.Citation28 As in R2Lp, the R9Av downstream subunit would be targeted via the unidentified DNA binding domain. In the case of NeSL-1, the upstream subunit is again bound using the N-terminal ZFs (orientation unknown), and the hypothetical downstream subunit would be bound to target DNA using the unidentified DNA binding domain. In each case, it is tempting to speculate that the subunit that binds the 3′ UTR RNA binds to DNA using the hypothetical carboxyl-terminal DNA binding domain (or the RLE) and performs TPRT as the RT and conserved carboxyl-terminal motifs appear to be more tightly linked. The variable N-terminal domain is attached to the RT through a variable length spacer.Citation12 The subunit that performs the rapidly evolving second strand cleavage and second strand synthesis reactions would be bound to target DNA using the N-terminal ZFs (and Myb).
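The binding modes discussed for the four elements can be collected into a small lookup structure, which makes the two axes of variation (which side of the insertion site the N-terminal motifs bind, and which motif sits nearest the site) easy to compare. The Python sketch below is only an organizational aid built from statements in the text; the class and field names are illustrative, and `None` marks the NeSL-1 orientation, which is reported as unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingMode:
    element: str
    group: str                          # R2 group or clade, as named in the text
    n_terminal_side: str                # side of the insertion site bound by N-terminal motifs
    motif_nearest_site: Optional[str]   # None where the orientation is unknown


MODES = [
    BindingMode("R2Bm", "R2-D", "downstream", "ZF"),
    BindingMode("R2Lp", "R2-A", "upstream", "ZF"),
    BindingMode("R9Av", "R2-A", "upstream", "Myb"),
    BindingMode("NeSL-1", "NeSL", "upstream", None),
]


def elements_where(side=None, nearest=None):
    """Names of elements matching the given side and/or nearest motif."""
    return [
        m.element
        for m in MODES
        if (side is None or m.n_terminal_side == side)
        and (nearest is None or m.motif_nearest_site == nearest)
    ]


print("upstream binders:", elements_where(side="upstream"))
print("ZF nearest the site:", elements_where(nearest="ZF"))
print("Myb nearest the site:", elements_where(nearest="Myb"))
```

Laid out this way, R2Lp shares its binding side with R9Av and its motif order with R2Bm, which is the pattern of partial overlap described above.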
If this model continues to hold, it may be possible to engineer an RLE-bearing element (e.g., R2Bm) to target elsewhere in the genome by swapping out the DNA binding motifs. For RLE-bearing retrotransposons to be used as site-specific gene targeting vehicles, however, the unidentified DNA binding domain will need to be identified. In addition, a greater knowledge of the 3D and globular domain structure is of great interest.
Materials and Methods
Generating expression constructs.
Constructs containing R9 and NeSL-1 derived N-terminal putative DNA binding motifs were generated and named as follows: R9Av ZF3-Myb corresponds to codons 54–295 of the extended ORF (stop codon to stop codon) from R9 Adineta vaga transposon (R9Av) GenBank GQ398057.1, R9Av ZF1-Myb corresponds to codons 154–295 of R9Av, and NeSL-1 ZFs corresponds to codons 110–261 of the Caenorhabditis elegans NeSL-1 transposon extended ORF (stop codon to stop codon). The polymerase chain reaction (PCR) primers used to amplify the above regions of interest from A. vaga and C. elegans genomic DNA are listed in . A. vaga genomic DNA was a gift from Irina Arkhipova (Josephine Bay Paul Center for Comparative Molecular Biology and Evolution). C. elegans genomic DNA was a gift from Andre Pires da Silva (University of Texas Arlington). The R9 fragments were cloned into the Gateway® donor vector pENTR/TEV/D-TOPO (Invitrogen, K2535-20) using the manufacturer's protocol and then recombined into the Gateway®-compatible bacterial-expression destination vector pDESTTAP.Citation17 The NeSL-1 ZFs were cloned into the NdeI and BamHI sites of the bacterial expression vector pET28a (Novagen, 69864-3). Initial ligation and recombination reactions were transformed into electroporation-competent XL-1 Blue Escherichia coli (Agilent, 200259) or chemically competent Oneshot Top10 E. coli (Invitrogen, C4040-10) for screening purposes. Resulting colonies were screened by PCR and sequenced (Big Dye, Applied Biosystems, 4337455). The expression constructs were maintained in Arctic Express RIL DE3 E. coli cells (Stratagene, 230193) for expression.
Protein expression and purification.
Cells were grown in 200 mL of Luria Bertani medium to an A600 of 0.6–0.7 at 37°C in an incubator shaker. The culture was then cooled to 12°C. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to the cooled culture at a final concentration of 1 mM for the R9Av clones and 0.1 mM for the NeSL-1 clone. After the addition of IPTG, the cultures were further incubated for 24 h at 12°C in an incubator shaker. The cultures were centrifuged at 4,000× g for 20 min at 4°C. The pellets were washed with cold 10 mM TRIS-HCl pH 7.5 and were either used directly or stored at −80°C.
The R9Av pellets were resuspended in 2.5 mL of Solution A [50% glycerol, 100 mM HEPES, 5 mM β-mercaptoethanol, and 2 mg/mL of lysozyme (Amresco, 0663)] and incubated on ice for 15 min and at room temperature for 15 min. The resuspended cells were lysed by adding 13.2 mL of Solution B (100 mM HEPES, 1 M NaCl, 5 mM β-mercaptoethanol, and 0.1% Triton X-100) and incubating on ice for 30 min. The resuspended pellet was then centrifuged for 20 h at 69,888× g at 2°C. The supernatant was mixed with the Talon resin (Clontech, 635501) that had been prewashed with 10 mL of Talon column buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 0.2% Triton X-100, 5 mM imidazole pH 7.5). The resin-bound protein was washed with 20 mL of column buffer containing 10 mM imidazole and eluted with 300 µL of column buffer containing 300 mM NaCl and 150 mM imidazole. An equal volume of 100% glycerol was added to the eluate for storage at −20°C.
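For orientation, the approximate composition of the R9Av lysate after Solution B is added to Solution A can be estimated by volume-weighted averaging. The Python sketch below does this using only the stated recipes and volumes; it assumes the volumes are additive and ignores the volume and contents of the cell pellet, so the values are estimates rather than measured concentrations. The dictionary keys are illustrative labels.

```python
# Stated recipes; components absent from a solution are simply omitted (treated as 0).
solution_a = {  # 2.5 mL
    "glycerol_percent": 50.0,
    "HEPES_mM": 100.0,
    "beta_mercaptoethanol_mM": 5.0,
    "lysozyme_mg_per_mL": 2.0,
}
solution_b = {  # 13.2 mL
    "HEPES_mM": 100.0,
    "NaCl_mM": 1000.0,
    "beta_mercaptoethanol_mM": 5.0,
    "triton_x100_percent": 0.1,
}
VOLUME_A_ML = 2.5
VOLUME_B_ML = 13.2


def mix(sol_a, vol_a, sol_b, vol_b):
    """Volume-weighted concentrations after combining two solutions.

    Assumes additive volumes and no contribution from the cell pellet.
    """
    total = vol_a + vol_b
    components = sorted(set(sol_a) | set(sol_b))
    return {
        name: (sol_a.get(name, 0.0) * vol_a + sol_b.get(name, 0.0) * vol_b) / total
        for name in components
    }


final = mix(solution_a, VOLUME_A_ML, solution_b, VOLUME_B_ML)
for name, value in final.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the combined 15.7 mL lysate is roughly 0.84 M NaCl and 8% glycerol, with HEPES and β-mercaptoethanol unchanged at 100 mM and 5 mM.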
The NeSL-1 pellets were resuspended in 8 mL of lysis buffer (100 mM HEPES pH 7.5, 1 M NaCl, 1 mM β-mercaptoethanol, 0.2% Triton X-100, and 10 units of DNase I) and passed through a French press two times. The cell lysates were centrifuged in an Eppendorf centrifuge at 12,000 RPM for 10 min at 4°C. The supernatant was adjusted to 10% glycerol and passed over the Talon resin columns by gravity flow, as previously described.Citation17
Protein concentrations were determined by running samples on a 6% SDS-PAGE gel alongside a bovine serum albumin standard. The gels were stained with Sypro Orange (Biorad, 170-3120) or Coomassie Blue R-250 (Amresco, 0472-10G), and the band intensities were measured using ImageJ 1.38X software.Citation31 The apparent purities were 80% or greater.
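One common way to convert band intensities into protein amounts is to fit a straight line through the bovine serum albumin standards and interpolate the sample band. The Python sketch below illustrates that approach; the text does not state which fitting procedure was used, so the linear least-squares fit is an assumption, and all numeric values are made-up placeholders rather than measured data.

```python
# Illustrative standard-curve quantification (assumed linear response;
# all numbers below are placeholders, not data from this study).
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mass(signal, slope, intercept):
    """Interpolate the protein mass that gives the observed band signal."""
    return (signal - intercept) / slope


# Hypothetical BSA standards: mass loaded (ng) and integrated band signal.
bsa_ng = [250.0, 500.0, 1000.0, 2000.0]
bsa_signal = [1250.0, 2500.0, 5000.0, 10000.0]

slope, intercept = fit_line(bsa_ng, bsa_signal)

# Hypothetical sample band and loading volume.
sample_signal = 3750.0
loaded_uL = 5.0
sample_ng = estimate_mass(sample_signal, slope, intercept)
print(f"estimated {sample_ng:.0f} ng loaded, {sample_ng / loaded_uL:.0f} ng/uL")
```

The estimate is only meaningful when the sample band falls within the range covered by the standards and the stain response is approximately linear over that range.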
Electrophoretic mobility shift assays and DNA footprints.
The 5′ 32P end-labeled DNA substrates were generated () and purified as previously described.Citation6 The binding reactions in were performed in 13 µL reactions: 10 µL of a solution containing 7.5 mM TRIS-HCl (pH 7.5), 50 mM NaCl, 2.75 mM MgCl2, 0.5 mM CaCl2, 0.8 mM dithiothreitol, 9 ng of target DNA, 50 ng of poly dI-dC (Sigma Aldrich, P4929-5UN); and 3 µL of protein diluted to an appropriate concentration in protein storage buffer (see above). The binding reactions were incubated at 25°C for 20 min and then loaded on a 1X TBE (89 mM Tris base, 89 mM boric acid, 2 mM EDTA) native 5% polyacrylamide gel. The gels were run at 230 V for 30 min. Gels were dried and visualized on a phosphorimager screen.
The binding reactions for DNase I footprints were similar but lacked poly dI-dC and were performed under conditions that gave approximately 40–60% bound species (see also refs. Citation6 and Citation32). DNase I was used at 0.012 units. Binding reactions treated with DNase I were fractionated on a native polyacrylamide gel to isolate the bound, free, and reference fractions. The bound and reference fractions were analyzed on a denaturing 6% polyacrylamide gel. The reference DNA fraction was from reactions that did not contain transposon protein. Missing nucleoside footprints were performed as previously described, except for the use of the binding conditions noted above for DNase I footprints.Citation18,Citation30
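The 40–60% bound criterion for the footprinting reactions can be checked from the quantified intensities of the bound and free bands on a native gel. The Python sketch below is illustrative: the function names and the example intensities are hypothetical, and it assumes background-subtracted intensities with all labeled DNA found in either the bound or the free band.

```python
# Illustrative check of the fraction of DNA bound (placeholder values;
# assumes background-subtracted band intensities).
def fraction_bound(bound_signal, free_signal):
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return bound_signal / total


def in_footprint_window(bound_signal, free_signal, low=0.40, high=0.60):
    """True when the bound fraction lies in the 40-60% window."""
    return low <= fraction_bound(bound_signal, free_signal) <= high


# Hypothetical protein titration: (dilution label, bound, free).
titration = [
    ("1:27", 900.0, 8100.0),
    ("1:9", 4500.0, 4500.0),
    ("1:3", 8000.0, 1000.0),
]

for label, bound, free in titration:
    frac = fraction_bound(bound, free)
    status = "use" if in_footprint_window(bound, free) else "skip"
    print(f"{label}: {frac:.0%} bound -> {status}")
```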
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Abbreviations
LTR | = | long terminal repeat |
UTR | = | untranslated region |
TPRT | = | target primed reverse transcription |
TE | = | transposable element |
RLE | = | restriction-like endonuclease |
PCR | = | polymerase chain reaction |
EMSA | = | electrophoretic mobility shift assay |
ZF | = | zinc finger |
Bm | = | Bombyx mori |
Av | = | Adineta vaga |
Lp | = | Limulus polyphemus |
Hm | = | Hydra magnipapillata |
bp | = | base pair |
nt | = | nucleotide |
APE | = | apurinic-apyrimidinic endonuclease |
Acknowledgments
The authors wish to thank Dr. Irina Arkhipova and Dr. Andre Pires da Silva for generously providing A. vaga and C. elegans genomic DNA, respectively. The authors thank Dr. Kimberly Bowles, Dr. Brad Reveal, Athena Jagdish, and Matthew Nelson for critical reading of the manuscript and for useful discussions. Finally the authors wish to thank and Aiswarya Pat for help with cloning and Micki Christensen for copy-editing. Funding was generously provided by NSF.
References
- Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol 1999; 16:793 - 805; PMID: 10368957
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001; 409:860 - 921; PMID: 11237011; http://dx.doi.org/10.1038/35057062
- Han JS, Boeke JD. LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression? Bioessays 2005; 27:775 - 784; PMID: 16015595; http://dx.doi.org/10.1002/bies.20257
- Ye J, Eickbush TH. Chromatin structure and transcription of the R1- and R2-inserted rRNA genes of Drosophila melanogaster. Mol Cell Biol 2006; 26:8781 - 8790; PMID: 17000772; http://dx.doi.org/10.1128/MCB.01409-06
- Beck CR, Collier P, Macfarlane C, Malig M, Kidd JM, Eichler EE, et al. LINE-1 retrotransposition activity in human genomes. Cell 2010; 141:1159 - 1170; PMID: 20602998; http://dx.doi.org/10.1016/j.cell.2010.05.021
- Christensen SM, Eickbush TH. R2 target-primed reverse transcription: ordered cleavage and polymerization steps by protein subunits asymmetrically bound to the target DNA. Mol Cell Biol 2005; 25:6617 - 6628; PMID: 16024797; http://dx.doi.org/10.1128/MCB.25.15.6617-6628.2005
- Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 1993; 72:595 - 605; PMID: 7679954; http://dx.doi.org/10.1016/0092-8674(93)90078-5
- Eickbush TH. R2 and Related Site-Specific Non-Long Terminal Repeat Retrotransposons. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, eds. Mobile DNA II. Washington, DC: ASM Press 2002; 813 - 835
- Moran JV, Gilbert N. Mammalian LINE-1 Retrotransposons and Related Elements. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, eds. Mobile DNA II. Washington, DC: ASM Press 2002; 836 - 869
- Eickbush TH, Jamburuthugoda VK. The diversity of retrotransposons and the properties of their reverse transcriptases. Virus Res 2008; 134:221 - 234; PMID: 18261821; http://dx.doi.org/10.1016/j.virusres.2007.12.010
- Yang J, Malik HS, Eickbush TH. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc Natl Acad Sci USA 1999; 96:7847 - 7852; PMID: 10393910; http://dx.doi.org/10.1073/pnas.96.14.7847
- Burke WD, Malik HS, Jones JP, Eickbush TH. The domain structure and retrotransposition mechanism of R2 elements are conserved throughout arthropods. Mol Biol Evol 1999; 16:502 - 511; PMID: 10331276
- Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HHJ. High frequency retrotransposition in cultured mammalian cells. Cell 1996; 87:917 - 927; PMID: 8945518; http://dx.doi.org/10.1016/S0092-8674(00)81998-4
- Feng Q, Moran JV, Kazazian HHJ, Boeke JD. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 1996; 87:905 - 916; PMID: 8945517; http://dx.doi.org/10.1016/S0092-8674(00)81997-2
- Christensen S, Pont-Kingdon G, Carroll D. Comparative studies of the endonucleases from two related Xenopus laevis retrotransposons, Tx1L and Tx2L: target site specificity and evolutionary implications. Genetica 2000; 110:245 - 256; PMID: 11766845; http://dx.doi.org/10.1023/A:1012704812424
- Mandal PK, Bagchi A, Bhattacharya A, Bhattacharya S. An Entamoeba histolytica LINE/SINE pair inserts at common target sites cleaved by the restriction enzyme-like LINE-encoded endonuclease. Eukaryot Cell 2004; 3:170 - 179; PMID: 14871947; http://dx.doi.org/10.1128/EC.3.1.170-179.2004
- Thompson BK, Christensen SM. Independently derived targeting of 28S rDNA by A- and D-clade R2 retrotransposons: plasticity of integration mechanism. Mobile Genet Elements 2011; 1:29 - 37
- Christensen SM, Bibillo A, Eickbush TH. Role of the Bombyx mori R2 element N-terminal domain in the target-primed reverse transcription (TPRT) reaction. Nucleic Acids Res 2005; 33:6461 - 6468; PMID: 16284201; http://dx.doi.org/10.1093/nar/gki957
- Christensen S, Eickbush TH. Footprint of the retrotransposon R2Bm protein on its target site before and after cleavage. J Mol Biol 2004; 336:1035 - 1045; PMID: 15037067; http://dx.doi.org/10.1016/j.jmb.2003.12.077
- Volff JN, Korting C, Froschauer A, Sweeney K, Schartl M. Non-LTR retrotransposons encoding a restriction enzyme-like endonuclease in vertebrates. J Mol Evol 2001; 52:351 - 360; PMID: 11343131
- Malik HS, Eickbush TH. NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans. Genetics 2000; 154:193 - 203; PMID: 10628980
- Burke WD, Malik HS, Rich SM, Eickbush TH. Ancient lineages of non-LTR retrotransposons in the primitive eukaryote, Giardia lamblia. Mol Biol Evol 2002; 19:619 - 630; PMID: 11961096
- Eickbush TH, Malik HS. Origins and Evolution of Retrotransposons. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, eds. Mobile DNA II. Washington, DC: ASM Press 2002; 1111 - 1146
- Kojima KK, Fujiwara H. Cross-genome screening of novel sequence-specific non-LTR retrotransposons: various multicopy RNA genes and microsatellites are selected as targets. Mol Biol Evol 2004; 21:207 - 217; PMID: 12949131; http://dx.doi.org/10.1093/molbev/msg235
- Kojima KK, Fujiwara H. Long-term inheritance of the 28S rDNA-specific retrotransposon R2. Mol Biol Evol 2005; 22:2157 - 2165; PMID: 16014872; http://dx.doi.org/10.1093/molbev/msi210
- Christensen SM, Ye J, Eickbush TH. RNA from the 5′ end of the R2 retrotransposon controls R2 protein binding to and cleavage of its DNA target site. Proc Natl Acad Sci USA 2006; 103:17602 - 17607; PMID: 17105809; http://dx.doi.org/10.1073/pnas.0605476103
- Kojima KK, Kuma K, Toh H, Fujiwara H. Identification of rDNA-specific non-LTR retrotransposons in Cnidaria. Mol Biol Evol 2006; 23:1984 - 1993; PMID: 16870681; http://dx.doi.org/10.1093/molbev/msl067
- Gladyshev EA, Arkhipova IR. Rotifer rDNA-specific R9 retrotransposable elements generate an exceptionally long target site duplication upon insertion. Gene 2009; 448:145 - 150; PMID: 19744548; http://dx.doi.org/10.1016/j.gene.2009.08.016
- Stage DE, Eickbush TH. Origin of nascent lineages and the mechanisms used to prime second-strand DNA synthesis in the R1 and R2 retrotransposons of Drosophila. Genome Biol 2009; 10:R49; PMID: 19416522; http://dx.doi.org/10.1186/gb-2009-10-5-r49
- Hayes JJ, Tullius TD. The missing nucleoside experiment: a new technique to study recognition of DNA by protein. Biochemistry 1989; 28:9521 - 9527; PMID: 2611245; http://dx.doi.org/10.1021/bi00450a041
- Abramoff MD, Magalhaes PJ. Image Processing with ImageJ. Biophotonics International 2004; 11:36 - 42
- Brenowitz B, Senear DF, Kingston RE. DNase I Footprint Analysis of Protein-DNA Binding. In: Ausubel FM, ed. Current Protocols in Molecular Biology. Hoboken, NJ: John Wiley and Sons, Inc 2003; 12.4 - 12.5
- Burke WD, Muller F, Eickbush TH. R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes. Nucleic Acids Res 1995; 23:4628 - 4634; PMID: 8524653; http://dx.doi.org/10.1093/nar/23.22.4628
- Teng SC, Wang SX, Gabriel A. A new non-LTR retrotransposon provides evidence for multiple distinct site-specific elements in Crithidia fasciculata miniexon arrays. Nucleic Acids Res 1995; 23:2929 - 2936; PMID: 7659515; http://dx.doi.org/10.1093/nar/23.15.2929