PASCANI, MOLDOVA
Maize and Sorghum Research Institute

Probable Bg encoded proteins and origin of Bg-rbg and PIF families of transposable elements

— Koterniak, VV

Although the nucleotide sequence of Bg, the autonomous element of the Bg-rbg transposon system, was published in 1991 (Hartings H et al, Molecular and General Genetics 227: 91–96, 1991; Maydica 36: 355–359, 1991), no work on the Bg-encoded transposase has been published since then (at least to the author’s knowledge). Some conclusions about the properties of the Bg-encoded protein(s) can nevertheless be drawn from analysis of this sequence. The broad possibilities of sequence database analysis justify such an approach to a certain degree, especially given the absence of experimental molecular data on the Bg transposase. For example, analysis of DNA sequences of Arabidopsis thaliana, Oryza sativa and Caenorhabditis elegans allowed Kapitonov and Jurka to identify helitrons, a new class of eukaryotic transposons (Kapitonov VV and Jurka J, PNAS 98: 8714–8719, 2001). Putative insertions of this class of transposons were later found in the maize genome (Lal SK et al, The Plant Cell 15: 381–391, 2003).

The probable structure and properties of the Bg-encoded proteins were deduced from the nucleotide sequence of this autonomous element. All transposon sequences analyzed in this work were obtained from the National Center for Biotechnology Information (NCBI) database. BLAST analysis was performed on the NCBI server, and CLUSTALW analysis was carried out on the European Bioinformatics Institute server. Default parameters were used in both cases.

Probable transcription, translation and intron-junction motifs can be identified in the nucleotide sequence of the Bg element (GenBank accession no. X56877.1). These motifs suggest that Bg may encode several proteins, most likely three. For convenience they are referred to hereinafter as PPBg1, PPBg2 and PPBg3 (Fig. 1).

The first probable protein, PPBg1, is translated from the start codon at position 813 through the second-largest ORF of the element. This ORF terminates with a TAA stop codon ending at position 1550. Hartings et al. gave arguments in favor of this translation start site (Hartings H et al, Molecular and General Genetics 227: 91–96, 1991; Maydica 36: 355–359, 1991). The authors note that there is no clear TATA box upstream of this start position. Instead, the region has a high G+C content. In this respect it resembles both the Ac element (Kunze R et al, EMBO J. 6: 1555–1563, 1987) and some mammalian housekeeping genes (Hartings H et al, Molecular and General Genetics 227: 91–96, 1991; Maydica 36: 355–359, 1991).

The ATG codon at position 2862 may be the translation start for the second protein, PPBg2. Two other consecutive ATG codons lie six nucleotides upstream. However, only the codon at position 2862 lies within the sequence CAGCCATGG, which agrees well with the consensus for eukaryotic initiation sites (Kozak M, Nucleic Acids Research 12: 857–872, 1984). Two regions located 28–34 and 75–83 bp upstream of this codon (positions 2828–2834 and 2779–2787, respectively) resemble TATA and CAAT boxes. Two introns interrupt the sequence following this start codon (positions 2974–3056 and 3125–3143). After them, the longest ORF of the Bg element terminates with a TAA codon ending at position 3962.
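The start-codon context and the spacings quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes 1-based inclusive coordinates, as in GenBank, and measures distances from position 2862 back to each end of a box. The helper names are illustrative. The sketch tests only the two Kozak positions most often cited as important, a purine at -3 and G at +4. It does not compute a full consensus score.

```python
# Minimal sketch: Kozak context of the putative PPBg2 start codon and the
# spacings quoted in the text. Coordinates are 1-based and inclusive.
# kozak_features(), span_length() and upstream_distances() are illustrative helpers.

ATG_POS = 2862  # first base of the putative PPBg2 start codon


def kozak_features(context: str, atg_index: int) -> dict:
    """Check the two positions usually weighted most heavily in Kozak's rule:
    a purine (A/G) at -3 and a G at +4, counting the A of ATG as +1."""
    assert context[atg_index:atg_index + 3] == "ATG", "no ATG at atg_index"
    minus3 = context[atg_index - 3]
    plus4 = context[atg_index + 3]
    return {"purine_at_-3": minus3 in "AG", "G_at_+4": plus4 == "G"}


def span_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def upstream_distances(start: int, end: int, atg: int = ATG_POS) -> tuple:
    """Distances (bp) from the start codon back to the near and far ends of a box."""
    return atg - end, atg - start


print(kozak_features("CAGCCATGG", 5))
print("TATA-like box:", upstream_distances(2828, 2834))
print("CAAT-like box:", upstream_distances(2779, 2787))
print("intron 1:", span_length(2974, 3056), "bp; intron 2:", span_length(3125, 3143), "bp")
```

Under these conventions, the sketch reproduces the 28–34 and 75–83 bp spacings. It also gives intron lengths of 83 and 19 bp.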

 

a) PPBg1

   1-60 MAFEVEEDDA HPRRRSNATV TDEQDDCHRK GKGVIGASSS DVAGTSTMPE NSVEVVTQNG
 61-120 RYMLVYFYVM VNKKSLFMVL YSCIFKILWS YTAGRRVKLK GGIESPWSHG EPYGNGFSCN
121-180 YCTSRIKGGG ATRLREHLGG LPGNVAACIN VPLNVKAIMT DQVAVRRIRR RRNNDLRHYV
181-240 EREVRESNKG LGTSSKARIP LDEEGQIQMA LRESLREYDE ERGIGCSSGS RSASCSANQQ
241-245 TRLDR

 

b) PPBg2

   1-60 MAQIRGNLSK EKDLLDRIIR VIKKRLKYML DDTLIVAAGA LDPKTLYTTK LARKPSTRHA
 61-120 KLASSSKIAS AAIEQYAFFC EKRGLFAGEE AERSATNGRM SAGFYLTNYY ASLVVALQFL
121-180 VLLLNGSCSF FAAEWWSAYG GEYKELQMLA RRIVSQCLSS SGCERNWSTF ALVHTKLRNR
181-240 LGYEKLHKLV YVHYNLKLRI QHFENDMQSL QEMQVFKDTE LDPYSVMIDC AMYDEGNPIM
241-300 DWLCNSRSES TPILDEYDDN DIESPIPSRV LMDEFGMDFN TRDGKKKRKA RLVDIEEEME
301-332 DDVESDSSEG SPINVELCDS SSDDGTGILC EE

 

c) PPBg3

   1-60 MAFEVEEDDA HPRRRSNATV TDEQDDCHRK GKGVIGASSS DVAGTSTMPE NSVEVVTQNG
 61-120 RYMLVYFYVM VNKKSLFMVL YSCIFKILWS YTAGRRVKLK GGIESPWSHG EPYGNGFSCN
121-180 YCTSRIKGGG ATRLREHLGG LPGNVAACIN VPLNVKAIMT DQVAVRRIRR RRNNDLRHYV
181-240 EREVRESNKG LGTSSKARIP LDEEGQIQMA LRESLREYDE ERGIGCSSGS RSASCSANQQ
241-300 TRLDRVSGID KISLCNGWCL MIGKTTHGKM KQIMHLLMIM MTAMAQIRGN LSKEKDLLDR
301-360 IIRVIKKRLK YMLDDTLIVA AGALDPKTLY TTKLARKPST RHAKLASSSK IASAAIEQYA
361-420 FFCEKRGLFA GEEAERSATN GRMSAGFYLT NYYASLVVAL QFLVLLLNGS CSFFAAEWWS
421-480 AYGGEYKELQ MLARRIVSQC LSSSGCERNW STFALVHTKL RNRLGYEKLH KLVYVHYNLK
481-540 LRIQHFENDM QSLQEMQVFK DTELDPYSVM IDCAMYDEGN PIMDWLCNSR SESTPILDEY
541-600 DDNDIESPIP SRVLMDEFGM DFNTRDGKKK RKARLVDIEE EMEDDVESDS SEGSPINVEL
601-615 CDSSSDDGTG ILCEE

Fig. 1. Amino acid sequences of the probable Bg-encoded proteins PPBg1, PPBg2 and PPBg3 (a–c, respectively). The 38-amino-acid sequence of PPBg3 that is absent from PPBg1 and PPBg2 (residues 246–283) is highlighted in orange. Exon coordinates (bases) are as follows:
- PPBg1: 813–1550.
- PPBg2: 2862–2973, 3057–3124 and 3144–3962.
- PPBg3: 813–1546, 2622–2724, 2850–2973, 3057–3124 and 3144–3962.

DNA-binding domains similar to the analogous basic domain of the Ac transposase (accession no. TQZMCA, residues 136–145 (HLRTSHSLVK)) are highlighted in yellow. In PPBg3, the residues similar to the DDE motif of the PIF element transposase (see text) are shown in bold.
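The exon coordinates can be cross-checked against the printed protein lengths. A spliced coding sequence (CDS) that includes its stop codon must be a whole number of codons, and the protein is one residue shorter than the codon count. The sketch below assumes 1-based inclusive coordinates and assumes that each last exon ends with the stop codon. For PPBg3 it takes the first exon to end at base 1546, two bases before the PPBg1 stop codon (1548–1550). This endpoint is an inference from the reading frame: an exon running to 1550 would give an 1852-nt CDS, which is not a whole number of codons.

```python
# Minimal sketch: do the Fig. 1 exon coordinates give the printed protein lengths?
# Coordinates are 1-based and inclusive; the last exon of each list is assumed
# to end with the TAA stop codon.
# Assumption (inferred, not stated in the source): the first PPBg3 exon ends at
# base 1546, just before the PPBg1 stop codon at 1548-1550.

EXONS = {
    "PPBg1": [(813, 1550)],
    "PPBg2": [(2862, 2973), (3057, 3124), (3144, 3962)],
    "PPBg3": [(813, 1546), (2622, 2724), (2850, 2973), (3057, 3124), (3144, 3962)],
}
PRINTED_LENGTH = {"PPBg1": 245, "PPBg2": 332, "PPBg3": 615}  # residues in Fig. 1


def cds_length(exons) -> int:
    """Total length (nt) of the spliced coding sequence."""
    return sum(end - start + 1 for start, end in exons)


def protein_length(exons) -> int:
    """Residues encoded by a CDS that ends with a stop codon."""
    n = cds_length(exons)
    if n % 3:
        raise ValueError(f"{n} nt is not a whole number of codons")
    return n // 3 - 1  # the stop codon encodes no residue


for name, exons in EXONS.items():
    aa = protein_length(exons)
    status = "matches" if aa == PRINTED_LENGTH[name] else "differs from"
    print(f"{name}: {cds_length(exons)} nt -> {aa} aa ({status} Fig. 1)")
```

If the first PPBg3 exon is given as 813–1550 instead, `cds_length` returns 1852 and `protein_length` raises `ValueError`.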

 

Possible alternative splicing of Bg transcripts. The termination codon of PPBg1 lies within an 8-bp sequence (bases 1545–1552) that differs from the consensus exon-intron junction by only one nucleotide. Hartings et al. had earlier noted the similarity of this sequence to a splice-junction site (Hartings H et al, Maydica 36: 355–359, 1991). Motifs characteristic of intron-junction sites are also found downstream of this site. They occur as well upstream of the TATA- and CAAT-like boxes of the PPBg2 translation start site mentioned above. This points to possible alternative splicing of Bg transcripts. One splice variant could yield a third protein, PPBg3. In addition to the sequences of PPBg1 and PPBg2, PPBg3 contains an insertion of 38 amino acids absent from both (Fig. 1c).

Interestingly, the TnpA and TnpD proteins that participate in transposition of the En/Spm element are produced by alternative splicing (Masson P et al, Cell 58: 755–765, 1989). These proteins share a high degree of homology but are thought to play different roles in transposition. TnpA binds selectively to subterminal repeats, causing DNA bending, whereas TnpD acts as an endonuclease (e.g., Kunze R and Weil CF, in “Mobile DNA II”, ASM Press, Washington, pp. 565–610, 2002). PPBg3 likewise shares a high degree of homology with PPBg1 and PPBg2, since it contains regions identical to both proteins. Some regions of these proteins also resemble DNA-binding, dimerization and catalytic domains (see below). A similar division of functions among PPBg1–PPBg3 therefore cannot be excluded in Bg (or rbg) transposition.

The fact that Bg encodes several proteins supports the earlier assumption that a product of the nonautonomous rbg element participates in rbg excision (Maydica 48: 275–281, 2003). The rbg sequence is not available in GenBank. However, rbg differs from Bg by small deletions and insertions, and sequence data show that the two elements share more than 75% homology (Hartings H et al, Molecular and General Genetics 227: 91–96, 1991). As a rule, nonautonomous elements are defective derivatives of autonomous elements (e.g., Fedoroff NV, Cell 56: 181–191, 1989).

Suppose an autonomous element encodes several proteins and a mutation in one of them gives rise to a nonautonomous element. The nonautonomous element may then still encode other, fully functional proteins. It could therefore join proteins encoded by the active autonomous element in forming transposition complexes. Such participation could underlie the previously described specificity of interaction between different Bg elements and rbg-containing o2-m(r) alleles. This specificity shows up as highly variable reversion frequencies of these alleles and as varied dosage effects of Bg elements. Thus, the Bg-hf element determines a high reversion frequency of both the o2-hf and o2-lf alleles. In contrast, the Bg-lf element induces a high reversion frequency of the o2-hf allele but not of the o2-lf allele (Maydica 48: 275–281, 2003). In addition, Bg-lf shows strong positive dosage effects on the reversion frequency of the o2-hf allele (Maydica 48: 275–281, 2003). These effects are not significant in crosses involving only the o2-lf allele (Genetika (Moscow) 39: 769–774, 2003; Maydica 48: 275–281, 2003).

Some properties of probable Bg proteins. BLAST analysis indicated that both PPBg1 and PPBg2 show significant similarity to transposable elements and transposases of different species. PPBg1 shows significant homology to transposase-like proteins of A. thaliana (BAB03069.1; T52187; BAB02511.1; e values from 2·10⁻⁴ to 4.3). PPBg2 shows significant similarity (e values from 3·10⁻⁷¹ to 0.25) to such sequences from O. sativa, A. thaliana, Musa acuminata, Phytophthora infestans and Fusarium oxysporum (NP_920655.1; T52187; AAR96007.1; AAT40862.1; AAC16005.1, respectively).

PPBg1 and PPBg2 differ markedly in their content of acidic and basic residues. The relative content of acidic residues in PPBg2 is about 1.3 times that in PPBg1. In PPBg2 (and in PPBg3), these amino acids may form multiple DDE motifs of varied pattern. DDE motifs are characteristic of various eukaryotic and bacterial transposases and are associated with their catalytic activity (e.g., Mahillon J and Chandler M, Microbiology and Molecular Biology Reviews 62: 725–774, 1998). PPBg3 contains a notable DDE motif formed by D residues at positions 250 and 325 and an E residue at position 375. This gives a D-(74)-D-(49)-E pattern similar to the analogous motif of the PIF element transposase (Zhang X et al, Genetics 166: 971–986, 2004; see also the discussion below). Seven residues downstream of E375 lies an R residue (R382), one of the additional residues characteristic of DDE motifs. Importantly, the first D residue of this motif lies in the amino acid region that is absent from both PPBg1 and PPBg2 (Fig. 1c, bold letters).
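The positions of this PIF-like motif can be checked directly against Fig. 1. The sketch below rebuilds PPBg3 from its three parts as printed: PPBg1, the 38-residue insert at positions 246–283, and PPBg2. It checks one printed row of Fig. 1c, then reports the residues at 250, 325 and 375 together with the spacer lengths. The sketch assumes that a spacer is the number of residues between two motif positions, as in the D-(74)-D-(49)-E notation. The helper names are illustrative.

```python
# Minimal sketch: locate the PIF-like D-(74)-D-(49)-E motif in PPBg3.
# Sequences are copied from Fig. 1; positions are 1-based.


def seq(blocks: str) -> str:
    """Join a sequence printed in 10-residue blocks, ignoring whitespace."""
    return "".join(blocks.split())


PPBG1 = seq("""
MAFEVEEDDA HPRRRSNATV TDEQDDCHRK GKGVIGASSS DVAGTSTMPE NSVEVVTQNG
RYMLVYFYVM VNKKSLFMVL YSCIFKILWS YTAGRRVKLK GGIESPWSHG EPYGNGFSCN
YCTSRIKGGG ATRLREHLGG LPGNVAACIN VPLNVKAIMT DQVAVRRIRR RRNNDLRHYV
EREVRESNKG LGTSSKARIP LDEEGQIQMA LRESLREYDE ERGIGCSSGS RSASCSANQQ
TRLDR
""")
PPBG2 = seq("""
MAQIRGNLSK EKDLLDRIIR VIKKRLKYML DDTLIVAAGA LDPKTLYTTK LARKPSTRHA
KLASSSKIAS AAIEQYAFFC EKRGLFAGEE AERSATNGRM SAGFYLTNYY ASLVVALQFL
VLLLNGSCSF FAAEWWSAYG GEYKELQMLA RRIVSQCLSS SGCERNWSTF ALVHTKLRNR
LGYEKLHKLV YVHYNLKLRI QHFENDMQSL QEMQVFKDTE LDPYSVMIDC AMYDEGNPIM
DWLCNSRSES TPILDEYDDN DIESPIPSRV LMDEFGMDFN TRDGKKKRKA RLVDIEEEME
DDVESDSSEG SPINVELCDS SSDDGTGILC EE
""")
INSERT = seq("VSGID KISLCNGWCL MIGKTTHGKM KQIMHLLMIM MTA")  # PPBg3 residues 246-283

PPBG3 = PPBG1 + INSERT + PPBG2
# Sanity check against the row "241-300" printed in Fig. 1c
assert PPBG3[240:300] == seq(
    "TRLDRVSGID KISLCNGWCL MIGKTTHGKM KQIMHLLMIM MTAMAQIRGN LSKEKDLLDR")


def residue(protein: str, pos: int) -> str:
    """Residue letter plus its 1-based position, e.g. 'D250'."""
    return f"{protein[pos - 1]}{pos}"


def spacers(*positions: int) -> list:
    """Number of residues between consecutive motif positions."""
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


motif = (250, 325, 375)
print(len(PPBG3), [residue(PPBG3, p) for p in motif], spacers(*motif))
print("seven residues after E375:", residue(PPBG3, 375 + 7))
```

The same `residue` helper can be applied to the Hermes-like positions (175, 244, 579) discussed below.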

A recently published study of another member of the hAT superfamily, the Hermes element, indicates that the active site of its transposase comprises a DDE motif (Zhou L et al, Nature 432: 995–1001, 2004). A notable feature of this motif (D180, D248, E572) is the wide separation between the second D and the E residue. RAG1 recombinase has a similar arrangement (D708–E962) (Zhou L et al, Nature 432: 995–1001, 2004). A motif resembling this Hermes DDE pattern may also be found in PPBg3 (e.g., D175, D244, E579). Experimental studies are, of course, needed to determine which of these DDE motifs, the PIF-like or the Hermes-like one, is involved in Bg transposition. However, PPBg2 and PPBg3 in particular contain multiple DDE motifs. For example, D206 of PPBg2 can give 12 combinations of the D-(51–54)-D-(34–41)-E type. This abundance may indicate that the enzymatic activity of Bg proteins is based on DDE chemistry.
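The count of 12 motif combinations for D206 of PPBg2 can be reproduced by a simple scan. This minimal sketch uses the same spacer convention as above (the number of residues between positions) and treats both spacer ranges as inclusive. It enumerates sequence patterns only. It says nothing about which residues, if any, are catalytic. The function name `dde_combinations` is illustrative.

```python
# Minimal sketch: enumerate D-(x)-D-(y)-E patterns that start at a given Asp.
# PPBg2 is copied from Fig. 1b; positions are 1-based.

PPBG2 = "".join("""
MAQIRGNLSK EKDLLDRIIR VIKKRLKYML DDTLIVAAGA LDPKTLYTTK LARKPSTRHA
KLASSSKIAS AAIEQYAFFC EKRGLFAGEE AERSATNGRM SAGFYLTNYY ASLVVALQFL
VLLLNGSCSF FAAEWWSAYG GEYKELQMLA RRIVSQCLSS SGCERNWSTF ALVHTKLRNR
LGYEKLHKLV YVHYNLKLRI QHFENDMQSL QEMQVFKDTE LDPYSVMIDC AMYDEGNPIM
DWLCNSRSES TPILDEYDDN DIESPIPSRV LMDEFGMDFN TRDGKKKRKA RLVDIEEEME
DDVESDSSEG SPINVELCDS SSDDGTGILC EE
""".split())


def dde_combinations(protein, first_d, dd_spacer, de_spacer):
    """Return (D, D, E) position triples (1-based) that start at first_d and whose
    spacers fall in the inclusive (min, max) ranges dd_spacer and de_spacer."""
    assert protein[first_d - 1] == "D", "first_d must point at an Asp"
    hits = []
    for s1 in range(dd_spacer[0], dd_spacer[1] + 1):
        d2 = first_d + s1 + 1
        if d2 > len(protein) or protein[d2 - 1] != "D":
            continue
        for s2 in range(de_spacer[0], de_spacer[1] + 1):
            e = d2 + s2 + 1
            if e <= len(protein) and protein[e - 1] == "E":
                hits.append((first_d, d2, e))
    return hits


hits = dde_combinations(PPBG2, 206, (51, 54), (34, 41))
print(len(hits))
for triple in hits:
    print("D%d-D%d-E%d" % triple)
```

The scan finds three candidate second D residues (258, 259 and 261). Each pairs with four E residues in the allowed window, which gives the 12 combinations.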

On the other hand, PPBg1 contains more basic residues than PPBg2, about 1.2 times their relative content. BLAST analysis indicates that these residues may form DNA-binding domains. These domains resemble those of the tomato E4/E8BP-1 protein (accession no. T07868; e = 0.005; Fig. 2a) and of the 3AF1 protein of Nicotiana tabacum (accession no. CAA44608.1; e = 0.023).
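The acidic/basic contrast can be recomputed from the Fig. 1 sequences. In this sketch, "acidic" means D+E and "basic" means K+R. The text does not state whether His was counted, so excluding it is an assumption. Passing "KRH" to the illustrative `fraction` helper includes it.

```python
# Minimal sketch: relative content of acidic and basic residues in PPBg1 vs PPBg2.
# Assumption: acidic = D, E; basic = K, R (His excluded; pass "KRH" to include it).
# Sequences are copied from Fig. 1.


def seq(blocks: str) -> str:
    """Join a sequence printed in 10-residue blocks, ignoring whitespace."""
    return "".join(blocks.split())


PPBG1 = seq("""
MAFEVEEDDA HPRRRSNATV TDEQDDCHRK GKGVIGASSS DVAGTSTMPE NSVEVVTQNG
RYMLVYFYVM VNKKSLFMVL YSCIFKILWS YTAGRRVKLK GGIESPWSHG EPYGNGFSCN
YCTSRIKGGG ATRLREHLGG LPGNVAACIN VPLNVKAIMT DQVAVRRIRR RRNNDLRHYV
EREVRESNKG LGTSSKARIP LDEEGQIQMA LRESLREYDE ERGIGCSSGS RSASCSANQQ
TRLDR
""")
PPBG2 = seq("""
MAQIRGNLSK EKDLLDRIIR VIKKRLKYML DDTLIVAAGA LDPKTLYTTK LARKPSTRHA
KLASSSKIAS AAIEQYAFFC EKRGLFAGEE AERSATNGRM SAGFYLTNYY ASLVVALQFL
VLLLNGSCSF FAAEWWSAYG GEYKELQMLA RRIVSQCLSS SGCERNWSTF ALVHTKLRNR
LGYEKLHKLV YVHYNLKLRI QHFENDMQSL QEMQVFKDTE LDPYSVMIDC AMYDEGNPIM
DWLCNSRSES TPILDEYDDN DIESPIPSRV LMDEFGMDFN TRDGKKKRKA RLVDIEEEME
DDVESDSSEG SPINVELCDS SSDDGTGILC EE
""")


def fraction(protein: str, residues: str) -> float:
    """Fraction of positions in protein occupied by any of the given residues."""
    return sum(protein.count(r) for r in residues) / len(protein)


acid_ratio = fraction(PPBG2, "DE") / fraction(PPBG1, "DE")
basic_ratio = fraction(PPBG1, "KR") / fraction(PPBG2, "KR")
print(f"acidic (D+E), PPBg2 / PPBg1: {acid_ratio:.2f}")
print(f"basic (K+R),  PPBg1 / PPBg2: {basic_ratio:.2f}")
```

With these residue sets, the two ratios come out close to the values of about 1.3 and 1.2 given in the text.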

 

a)

92  TAGRRVKLKGGIESPWSHGEPYGNGFS---CNYCTSRIKGGGATRLREHLGG--LPGNVAACINVPLNVKAIMTD 161
188 TASKSARKGRPLDDAWQHATPVDGKKQRTVCNYCGFISSSGGITYLKTHLGGGDPTGSLKGCPNVPPEVKRVMKE 262

 

b)

133 AEWWSAYGGEYKELQMLARRIVSQCLSSSGCERNWSTFALVHTKLRNRLGYEKLHKLVYVHYNLKLRIQHFEND 206
215 AEWWSAYGSSTPNLQNFAIKVLSLTCSATGCERNWGVFQLLHTKRRNRLTQCRLNDMIFVKYNRALQRRYKRND 288

Fig. 2. Similarity of some domains of PPBg1 and PPBg2 (upper line) revealed by BLASTP analysis. (a) A part of the PPBg1 sequence similar to the tomato DNA-binding protein E4/E8BP-1 (accession no. T07868). (b) A part of the PPBg2 sequence similar to the hAT dimerization domain of A. thaliana (accession no. NP_680299.1). Identical residues are shown on a black background and similar ones on a grey background.
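The two BLASTP alignments in Fig. 2 can be checked for internal consistency. Each end coordinate should equal the start coordinate plus the number of ungapped residues minus one. The sketch below, with illustrative helper names, performs that check and counts identical columns. It does not reproduce BLAST's similarity ("positive") scores, which depend on a substitution matrix.

```python
# Minimal sketch: consistency check and identity count for the Fig. 2 alignments.
# Each row is (start, aligned sequence with '-' gaps, end), copied from Fig. 2.

FIG2 = {
    "a": ((92, "TAGRRVKLKGGIESPWSHGEPYGNGFS---CNYCTSRIKGGGATRLREHLGG--LPGNVAACINVPLNVKAIMTD", 161),
          (188, "TASKSARKGRPLDDAWQHATPVDGKKQRTVCNYCGFISSSGGITYLKTHLGGGDPTGSLKGCPNVPPEVKRVMKE", 262)),
    "b": ((133, "AEWWSAYGGEYKELQMLARRIVSQCLSSSGCERNWSTFALVHTKLRNRLGYEKLHKLVYVHYNLKLRIQHFEND", 206),
          (215, "AEWWSAYGSSTPNLQNFAIKVLSLTCSATGCERNWGVFQLLHTKRRNRLTQCRLNDMIFVKYNRALQRRYKRND", 288)),
}


def end_coordinate(start: int, aligned: str) -> int:
    """Last residue number covered by an aligned row (gaps consume no residues)."""
    return start + len(aligned.replace("-", "")) - 1


def identities(query: str, subject: str) -> int:
    """Number of columns where both rows carry the same residue."""
    assert len(query) == len(subject), "aligned rows must have equal length"
    return sum(a == b and a != "-" for a, b in zip(query, subject))


for panel, ((qs, q, qe), (ss, s, se)) in FIG2.items():
    coords_ok = end_coordinate(qs, q) == qe and end_coordinate(ss, s) == se
    n = identities(q, s)
    print(f"panel {panel}: coordinates consistent={coords_ok}, "
          f"identities {n}/{len(q)} ({100 * n / len(q):.0f}%)")
```

For these rows the coordinates are consistent. Panel (a) has 25 identical columns out of 75, and panel (b) has 35 out of 74.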

 

The DNA-binding domain of PPBg1 (Fig. 2a) is also part of a region similar to the hAT dimerization domain of A. thaliana (accession no. NP_188371.1; data not shown). This similarity is weaker (e = 0.002) than that of the analogous region of PPBg2 to the hAT dimerization domain of A. thaliana (accession no. NP_680299.1; e = 9·10⁻¹⁴; Fig. 2b). The stronger similarity of PPBg2 to the hAT dimerization domain may indicate that this protein (and/or the C-terminal part of PPBg3) plays an important role in the oligomerization of Bg-encoded proteins. This supports a previous conclusion drawn from the dosage effects of different Bg elements on mutable o2-m(r) alleles. According to that conclusion, the Bg-encoded transposase acts as an oligomer that can form inactive aggregates at high concentration (Genetika (Moscow) 39: 769–774, 2003; Maydica 48: 275–281, 2003).

All probable Bg-encoded proteins also show similarity to the short basic DNA-binding domain in the N-terminal region of the probable Ac transposase (Feldmar S and Kunze R, EMBO J. 10: 4003–4010, 1991; accession no. TQZMCA) (Fig. 1, highlighted). The Ac transposase has another DNA-binding domain (residues 159–206; see the review by Kunze R and Weil CF in “Mobile DNA II”, ASM Press, Washington, pp. 565–610, 2002). BLAST analysis did not reveal significant similarity between that domain and the PPBg1–PPBg3 sequences.

Unexpected similarity between Bg and PIF elements. Molecular analysis of the PIF element showed that it is a member of a new family of transposons (Walker EL et al, Genetics 146: 681–693, 1997). However, the 14-bp PIF terminal inverted repeats (TIRs) and the 3-bp target site duplications (TSDs) generated by its insertion are similar to those of members of the CACTA superfamily (Walker EL et al, Genetics 146: 681–693, 1997). This superfamily includes the maize En/Spm elements. Walker et al. suggested that this similarity in TSD and TIR length may indicate that the PIF and CACTA families share a common ancestral group (Walker EL et al, Genetics 146: 681–693, 1997). Zhang et al. assign the PIF element to a new superfamily of eukaryotic transposons that is distantly related to the bacterial IS5 group (Zhang X et al, PNAS 98: 12572–12577, 2001).

Bg and PIF belong to different families (and even superfamilies) of transposons. Even so, their activity appears to be expressed in similar ways during plant ontogenesis. Thus, PIF excisions occur during, immediately after, or prior to meiosis (Walker EL et al, Genetics 146: 681–693, 1997). Bg activity can be assessed by the reversion frequency of rbg-containing mutable o2-m(r) alleles. These frequencies indicate that excision of the nonautonomous rbg element from these alleles occurs mostly during gametophyte and endosperm development (Salamini F, Cold Spring Harbor Symp. Quant. Biol. 45: 467–476, 1981; Montanelli C et al, Molecular and General Genetics 197: 209–218, 1984). Excision occurs only rarely at late premeiotic stages of plant ontogeny (MNL 76: 54; Genetika (Moscow) 39: 709–712, 2003; and unpublished data).

Certain similarities between Bg and PIF can also be found in their internal organization. The PIF element contains two transcription units, ORF1 and ORF2 (TPase), each with its own promoter (Zhang X et al, Genetics 166: 971–986, 2004). Bg also contains two large ORFs (positions 813–1547 and 3144–3959). The first encodes PPBg1, and the second is the largest exon of PPBg2 (and PPBg3). In Bg, however, promoter sequences can be found only for PPBg2 (see above). A further notable similarity links the first PPBg2 intron (positions 2974–3056) with the two similar introns of the PIF transposase gene (Zhang X et al, Genetics 166: 971–986, 2004). This Bg intron has the same length as the mean length of the PIF introns (83 bp). It also has a similarly high A/T content (73% versus 71% for the PIF transposase introns). The D-(74)-D-(49)-E motif of PPBg3 described above is also very similar to the DDE motif of the PIF transposase. The PIF motif has 74 residues between its two D residues (see fig. 1 in Zhang X et al, Genetics 166: 971–986, 2004) and DD47E or DD48E spacing (Zhang X et al, Genetics 166: 971–986, 2004). In addition, CLUSTALW analysis of PPBg1, PPBg2 and the Ac and PIF transposases showed that PPBg1 is closer to the PIF transposase than to the Ac transposase. This holds even though Bg and Ac belong to the same hAT superfamily (Fig. 3a).

 

Fig. 3. Phylogenetic trees built on the basis of CLUSTALW analysis of: (a) amino acid sequences of PPBg1, PPBg2 and transposases of Ac and PIF elements (accession nos. P08770 and AAL11884.1, respectively); (b) nucleotide sequences of Ac, Bg, En/Spm, PIF elements (accession nos. X05424.1, X56877.1, M25427.1, AF412282.1, respectively).

 

Possible origin of Bg-rbg and PIF families of transposons. The unexpected similarity between Bg and PIF could be explained by their origin from ancestral transposons of the hAT and CACTA families (hereinafter anchAT and ancCACTA). In this scenario, a common transposon inserted into both ancestral elements (Fig. 4). It could also have been a group of closely related transposons, hereinafter ancIS(s), belonging to neither the hAT nor the CACTA family. The shared (or closely related) sequence of the inserted transposon(s) would then account for the observed similarity between Bg and PIF.

Comparing Ac with Bg, and En with PIF, suggests that Bg retains more features of its anchAT progenitor than PIF retains of ancCACTA. Ac and Bg are more similar in nucleotide sequence (Fig. 3b). They also produce TSDs of the same length (8 bp) and have similar TIRs and subterminal repeats (Hartings H et al, Molecular and General Genetics 227: 91–96, 1991). En and PIF, as noted earlier, show similarities in their TSDs and TIRs (Walker EL et al, Genetics 146: 681–693, 1997). In addition, the translated PIF ORF1 (see fig. 1 in Zhang X et al, Genetics 166: 971–986, 2004) shows some homology to an En/Spm protein (accession no. AAA66266.1; data not shown).

 

Fig. 4. Possible origin of the Bg and PIF elements from ancestral transposons of the hAT and CACTA families (designated anchAT and ancCACTA, respectively). In this scheme, an ancestral IS-like transposon (or closely related transposons, designated ancIS(s)) inserted into their sequences.

 

The ancIS elements that may have inserted into the anchAT and ancCACTA transposons could have been members of the bacterial IS5 group. The transposases of this group are similar to the PIF element transposase, which suggests that these bacterial transposons could have taken part in such horizontal transfers (Zhang X et al, PNAS 98: 12572–12577, 2001).

The chimeric origin of transposons proposed above does not seem to be a rare event among mobile elements. For example, the IRMA element of the En/Spm family contains sequences that are not homologous to the En/Spm element. These non-En/Spm sequences include a small region that shares sequence similarity with the PIF-12 element (Walker EL et al, Genetics 146: 681–693, 1997). In this context, Malik and Eickbush proposed an interesting hypothesis: that LTR retrotransposons arose by fusion of non-LTR retrotransposons with DNA-mediated transposons (Malik HS and Eickbush TH, Genome Research 11: 1187–1197, 2001).

The similarity between Bg and PIF could also be explained by recombination between the anchAT and ancCACTA elements on the one hand and ancIS transposon(s) on the other. Sequence similarity between individual domains of these elements could have promoted such fusion events. That similarity could in turn reflect the strong conservation of the biochemical reactions underlying transposition in transposons from distantly related organisms.



Please Note: As is the policy with the printed version, notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors.

Maize Genetics Cooperation Newsletter (MNL), Volume 79