Identification of EMS-Induced Causal Mutations in a Non-Reference Arabidopsis thaliana Accession by Whole Genome Sequencing

Naoyuki Uchida1,3, Tomoaki Sakamoto2,3, Tetsuya Kurata2,* and Masao Tasaka1,*

1 Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, 630-0192 Japan
2 Plant Global Education Project, Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, 630-0192 Japan
3 These authors contributed equally to this work
*Corresponding authors: Masao Tasaka, E-mail, [email protected]; Fax, +81-743-72-5489; Tetsuya Kurata, E-mail, [email protected]; Fax, +81-743-72-6251

(Received January 7, 2011; Accepted March 6, 2011)

Techniques

The most frequently used method to identify mutations induced by a commonly used mutagen, EMS (ethyl methane sulfonate), in Arabidopsis thaliana has been map-based cloning. The first step of this method is crossing a mutant with a plant of another accession as it requires polymorphisms between accessions for linkage analysis. Therefore, to perform the method routinely, it is greatly preferred to use accession combinations between which enough polymorphisms are already known. Further, it requires laborious examination of a large number of F2 recombinants using many markers to detect each polymorphism. After linkage analysis narrows down the chromosomal region containing the causal mutation, sequencing candidate genes one by one within the region is necessary until the mutation is finally identified. Overall, this method is generally time-consuming and labor intensive, and it becomes harder when multiple loci are involved in phenotypes. A few recent reports showed that causal mutations induced by EMS could be identified by deep-sequencing technologies with less labor compared with the conventional method when mutants were generated in the Arabidopsis reference Columbia background whose genome organization is well known. Here we report that we succeeded in rapid identification of EMS-induced causal mutations in a non-reference accession background, whose whole genome sequence is not publicly available, using one round of whole genome sequencing. Moreover, in our case, we could monitor the causal locus and the transgenic reporter locus simultaneously, implying that this methodology could theoretically be applicable to analyzing even complex traits. We describe the pipeline of this methodology and discuss its characteristics.

Keywords: Arabidopsis thaliana • Mutation identification • SGT1b • SNP • UNI • Whole-genome sequencing.

Abbreviations: chr, chromosome; CDS, coding sequence; EMS, ethyl methane sulfonate; NB-LRR, nucleotide-binding site-leucine-rich repeat; SNP, single nucleotide polymorphism. The output data generated by next-generation sequencing in this paper have been submitted to the DDBJ Sequence Read Archive (DRA) under the accession number DRA000344.

Introduction

The most frequently used method to identify mutations induced by a commonly used mutagen, EMS (ethyl methane sulfonate), in Arabidopsis thaliana has been map-based cloning. This method starts by generating an F1 hybrid via a cross between a mutant and another wild-type accession that has a sufficient number of sequence polymorphisms to carry out linkage analysis. To carry out this method routinely, it is greatly preferable to use accession combinations between which a large number of polymorphisms are already known. For recessive mutants, the next step is to examine each F2 individual that exhibits the phenotype of interest, using many markers to detect polymorphisms. After this linkage analysis narrows down the chromosomal region containing the mutation to a relatively short interval, candidate genes within the interval are sequenced one by one using conventional Sanger sequencing until the mutation is finally identified. Overall, these processes are time-consuming and labor intensive. When multiple loci are involved in the phenotype, identifying the mutation becomes much harder.

Next-generation sequencing technologies that produce enormous amounts of sequence data have been opening up possibilities of novel methodologies for various aspects of plant

Plant Cell Physiol. 52(4): 716–722 (2011) doi:10.1093/pcp/pcr029, available online at www.pcp.oxfordjournals.org © The Author 2011. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: [email protected]


Downloaded from https://academic.oup.com/pcp/article-abstract/52/4/716/1854249 by guest on 18 June 2018

SNPing causal SNPs by whole genome sequencing

research (Diamandis 2009, Lister et al. 2009, Varshney et al. 2009, Rounsley and Last 2010). One such aspect is to find causal genes responsible for phenotypes of interest; these include variation found in wild strains and mutations artificially induced by mutagenesis, although there have been a limited number of such reports. Ossowski et al. (2010) reported the de novo identification of spontaneous mutations in the Arabidopsis reference Columbia (Col) genome whose genome sequence has been reported (Arabidopsis Genome Initiative 2000) and whose genome organization is well known. Schneeberger et al. (2009) and Cuperus et al. (2010) also succeeded in identification of EMS-induced mutations responsible for phenotypes of interest in the Col background. The principle of the methodology that Schneeberger et al. and Cuperus et al. used had been originally proposed as an approach termed bulked segregant analysis (Giovannoni et al. 1991, Michelmore et al. 1991), and recently Lister et al. (2009) highlighted a modified version of this approach as an application using next-generation sequencing technologies. A mutant plant with the Col background harboring a recessive mutation responsible for a phenotype was crossed to a plant of another accession to generate F1 hybrid plants. Then, a large number of F2 plants exhibiting the mutant phenotype were pooled (Schneeberger et al. and Cuperus et al. used 500 and 93 F2 individuals, respectively) and the bulked genomes were deep-sequenced. Col-type single nucleotide polymorphisms (SNPs) were significantly enriched within a specific region of the genome. The sequence data obtained by deep-sequencing in this region were then used to identify the mutation. These results show that it is now possible to use whole genome sequencing for identification of causal mutations in the reference Col accession.
Recently it was also reported that a spontaneous mutation in a wild Arabidopsis accession was successfully identified using deep-sequencing (Laitinen et al. 2010). In that report, the conventional map-based cloning method with almost 1,900 F2 plants narrowed down the responsible chromosomal region to a 530 kb interval. Then, direct comparison between the mutant genome and that from the parental line within the 530 kb interval using deep-sequencing resulted in the identification of the responsible mutation, indicating that a combination of the conventional method and genome sequencing enables identification of causal mutations even in non-reference strains whose genomes have not been reported. Here we report that we succeeded in rapid identification of EMS-induced mutations responsible for phenotypes of interest in a non-reference Arabidopsis accession background, whose whole genome sequence was not publicly available, by one round of genome sequencing that achieved 6- to 9-fold genome coverage. Furthermore, in our case, we could monitor the causal locus and the transgenic reporter locus simultaneously, implying that this methodology could theoretically be applicable to mutation identification even in complex cases where phenotypes are affected by multiple loci. We will

describe the pipeline of this methodology and discuss its characteristics.

Results and Discussion

The uni-1D mutant was isolated in the Arabidopsis Wassilewskija (Ws) accession and had a gain-of-function and semi-dominant mutation in the UNI gene (Igari et al. 2008). Heterozygous uni-1D/+ plants showed abnormal morphologies but were still fertile. On the other hand, homozygous uni-1D plants exhibited severe growth defects soon after germination and were lethal at the vegetative stage. To isolate suppressor mutations that recover uni-1D/+ morphological abnormalities, we mutagenized seeds from hemizygous transgenic plants harboring a uni-1D genomic fragment (hereafter, uni-1DT/+) in the Ws background, which conferred the same phenotypes as the original heterozygous uni-1D/+ plants, by EMS treatment. This transgene contains a kanamycin-resistance gene and therefore only uni-1DT/+ plants grew on plates containing kanamycin (uni-1DT/T homozygous plants are also kanamycin resistant but lethal like the original homozygous uni-1D plants). We isolated kanamycin-resistant M2 plants with wild-type morphologies as suppressor mutants. Among them, in this manuscript we focused on suppressor #1 (sup#1), and also later sup#2. sup#1 and sup#2 mutations were both recessive. We crossed sup#1 uni-1DT/+ plants with Col wild-type plants, and F2 seeds were obtained from sup#1/+ uni-1DT/+ Ws/Col F1 hybrids. We next collected 88 kanamycin-resistant F2 individuals with wild-type morphologies, all of which should be homozygous at the sup#1 locus, and isolated genomes from a pool of them. Then, we deep-sequenced (i) the Col wild-type genome; (ii) the Ws wild-type genome; and (iii) bulked genomes from the pool of 88 kanamycin-resistant F2 individuals with wild-type morphologies using the Illumina GAIIx platform. The crude sequences obtained by 75 bp sequencing were aligned to the public data of the reference Col-0 genome (TAIR9). The conditions achieved are shown in Table 1.
In the Col sample (we refer to our Col line as Col-T in Table 1 and the reason for this will be discussed below), the obtained reads covered 91.7–93.1% of the total bases of each chromosome in the reference data of Col-0 (Table 1). The sequences not covered might include regions to which 75 bp short reads could not be aligned. On the other hand, in the Ws sample, 83.4–85.5% of the reference Col-0 sequences were covered (Table 1). The percentage covered in the Ws sample was smaller than that in the Col sample although the mean coverage of the former (9.1- to 9.2-fold) was larger than that of the latter (8.4- to 8.8-fold) (Table 1). This probably reflects the greater sequence divergence between the reference Col-0 accession and the non-reference Ws accession. Next, SNPs detected in each sample were output by CASAVA pipeline software version 1.7 (Illumina). As shown in Table 2, 3,873 SNPs were detected between the reference Col-0 and the Col line used by us.


N. Uchida et al.

Table 1 Summary of the achieved conditions

Sample     Chromosome   Total no. of bases used(a)   Mean coverage(b)   Percentage covered(c)
Col-T      1            254,280,361                  8.4                91.7%
Col-T      2            169,593,466                  8.6                92.5%
Col-T      3            207,211,070                  8.8                93.1%
Col-T      4            157,218,618                  8.5                92.6%
Col-T      5            228,617,112                  8.5                92.5%
Ws         1            274,120,946                  9.1                85.4%
Ws         2            181,211,032                  9.2                85.1%
Ws         3            216,575,482                  9.2                83.4%
Ws         4            169,708,478                  9.1                85.5%
Ws         5            244,073,418                  9.1                85.0%
sup#1 F2   1            194,722,560                  6.4                85.0%
sup#1 F2   2            130,641,216                  6.6                86.4%
sup#1 F2   3            157,946,109                  6.7                85.8%
sup#1 F2   4            116,171,556                  6.3                81.3%
sup#1 F2   5            176,423,308                  6.5                86.3%
sup#2 F2   1            252,287,020                  8.3                88.9%
sup#2 F2   2            169,744,221                  8.6                90.4%
sup#2 F2   3            206,081,879                  8.8                90.3%
sup#2 F2   4            151,334,279                  8.1                85.1%
sup#2 F2   5            227,919,213                  8.4                90.1%

(a) Total bases that were used for SNP calling among the obtained reads.
(b) Value of the total bases used(a) per the length of each chromosome in the reference database of Col-0 (TAIR9).
(c) Percentage of the bases that were covered with the obtained reads among the total bases of each chromosome in the reference TAIR9 database.

Table 2 Summary of detected SNPs

Sample     Chromosome   Total SNPs   Homozygous SNPs     Heterozygous SNPs
Col-T      1            800          124 (15.5%)         676 (84.5%)
Col-T      2            613          69 (11.3%)          544 (88.7%)
Col-T      3            758          92 (12.1%)          666 (87.9%)
Col-T      4            696          165 (23.7%)         531 (76.3%)
Col-T      5            1,006        88 (8.7%)           918 (91.3%)
Col-T      All          3,873        538 (13.9%)         3,335 (86.1%)
Ws         1            99,549       92,488 (92.9%)      7,061 (7.1%)
Ws         2            70,091       62,530 (89.2%)      7,561 (10.8%)
Ws         3            86,625       78,276 (90.4%)      8,349 (9.6%)
Ws         4            54,656       48,410 (88.6%)      6,246 (11.4%)
Ws         5            96,909       89,597 (92.5%)      7,312 (7.5%)
Ws         All          407,830      371,301 (91.0%)     36,529 (9.0%)
sup#1 F2   1            66,035       14,470 (21.9%)      51,565 (78.1%)
sup#1 F2   2            38,349       3,645 (9.5%)        34,704 (90.5%)
sup#1 F2   3            53,414       7,323 (13.7%)       46,091 (86.3%)
sup#1 F2   4            44,311       32,142 (72.5%)      12,169 (27.5%)
sup#1 F2   5            56,946       6,435 (11.3%)       50,511 (88.7%)
sup#1 F2   All          259,055      64,015 (24.7%)      195,040 (75.3%)
sup#2 F2   1            78,534       14,195 (18.1%)      64,339 (81.9%)
sup#2 F2   2            48,227       3,401 (7.1%)        44,826 (92.9%)
sup#2 F2   3            63,128       5,410 (8.6%)        57,718 (91.4%)
sup#2 F2   4            48,045       34,757 (72.3%)      13,288 (27.7%)
sup#2 F2   5            70,851       6,216 (8.8%)        64,635 (91.2%)
sup#2 F2   All          308,785      63,979 (20.7%)      244,806 (79.3%)
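The mean coverage in Table 1 is defined as the total bases used divided by the chromosome length in TAIR9. A minimal sketch of that arithmetic for the Col-T sample follows. The chromosome lengths are not given in the paper; the values below are an assumption of this sketch (the commonly cited lengths of the five nuclear chromosomes in the TAIR9 assembly), and the names `TAIR9_LENGTHS`, `COL_T_BASES` and `mean_coverage` are hypothetical.

```python
# Chromosome lengths in bp. ASSUMPTION: not stated in the paper; these are
# the commonly cited TAIR9 nuclear chromosome lengths.
TAIR9_LENGTHS = {
    1: 30_427_671,
    2: 19_698_289,
    3: 23_459_830,
    4: 18_585_056,
    5: 26_975_502,
}

# Total bases used for SNP calling in the Col-T sample (Table 1).
COL_T_BASES = {
    1: 254_280_361,
    2: 169_593_466,
    3: 207_211_070,
    4: 157_218_618,
    5: 228_617_112,
}


def mean_coverage(bases_used: int, chrom_length: int) -> float:
    """Mean fold coverage = bases used for SNP calling / chromosome length."""
    return bases_used / chrom_length


# Per-chromosome coverage, rounded to one decimal as in Table 1.
col_t_coverage = {
    chrom: round(mean_coverage(COL_T_BASES[chrom], TAIR9_LENGTHS[chrom]), 1)
    for chrom in COL_T_BASES
}
print(col_t_coverage)
```

Under these assumed lengths the rounded values agree with the Col-T rows of Table 1; other samples may differ slightly in the last digit depending on the exact lengths and rounding used by the authors.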

These SNPs might indicate spontaneous mutations that have occurred during generations since we started using the line about 20 years ago. Therefore, we referred to our Col line as Col-Tasaka (Col-T). Most of the SNPs in all chromosomes were heterozygous (86.1%), suggesting that these SNPs possibly occurred relatively recently. On the other hand, 407,830 SNPs were detected between the reference Col-0 and Ws, and most of them in all chromosomes were homozygous (91%) (Table 2), showing that there were a large number of fixed polymorphisms between the reference Col-0 and Ws. The result of the SNP analysis using bulked genomes of sup#1 F2 recombinants showed that SNPs detected on chromosomes (chrs) 1, 2, 3 and 5 were mostly heterozygous (78.1, 90.5, 86.3 and 88.7%, respectively), but the majority of SNPs detected on chr 4 were homozygous (72.5%) (Table 2). These data indicated enrichment of Ws-type homozygous SNPs on chr 4 with a strong linkage disequilibrium, suggesting that the sup#1 mutation was linked to chr 4. We noticed that chr 1 displayed slight enrichment of homozygous SNPs (21.9%) compared with chrs 2, 3 and 5 (9.5, 13.7 and 11.3%, respectively) (Table 2). This was probably because the kanamycin-resistant uni-1DT transgene was located in the southern part of chr 1, which we had previously analyzed (also see Fig. 1B later), and because the kanamycin-resistant F2 population should harbor the uni-1DT transgene (either hemizygously or possibly homozygously). Because the sup#1 mutation was expected to be located on chr 4, we next examined which region of chr 4 displayed the most significant enrichment of Ws-type homozygous SNPs.
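The per-chromosome comparison above can be sketched in a few lines. This is only an illustration using the sup#1 F2 counts from Table 2, not the authors' analysis (which was carried out in Microsoft Excel); the names `SUP1_F2` and `homozygous_fraction` are hypothetical.

```python
# (homozygous SNPs, heterozygous SNPs) per chromosome for the sup#1 F2 pool,
# taken from Table 2.
SUP1_F2 = {
    1: (14_470, 51_565),
    2: (3_645, 34_704),
    3: (7_323, 46_091),
    4: (32_142, 12_169),
    5: (6_435, 50_511),
}


def homozygous_fraction(hom: int, het: int) -> float:
    """Fraction of SNP calls on a chromosome that are homozygous."""
    return hom / (hom + het)


fractions = {chrom: homozygous_fraction(*counts) for chrom, counts in SUP1_F2.items()}
percentages = {chrom: round(100 * f, 1) for chrom, f in fractions.items()}

# The chromosome with the highest homozygous fraction is the best candidate
# for linkage to the recessive suppressor mutation.
linked_chromosome = max(fractions, key=fractions.get)

# The runner-up reflects the selected transgene locus rather than the suppressor.
ranked = sorted(fractions, key=fractions.get, reverse=True)

print(percentages, linked_chromosome, ranked[:2])
```

Ranking chromosomes by this fraction puts chr 4 first and chr 1 second, matching the interpretation in the text that chr 4 carries the suppressor and chr 1 the kanamycin-selected transgene.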


To this end, chr 4 was divided into 500 kb intervals and the ratios of homozygous SNPs to heterozygous SNPs in each 500 kb interval were plotted (Fig. 1A). A peak was observed at the 8,000 k–8,500 k interval, suggesting that the sup#1 mutation was located around there. As chr 1 also displayed slight enrichment of homozygous SNPs (Table 2), we drew the same type of graph for chr 1. As expected, a peak was observed in the southern part (Fig. 1B), in agreement with the uni-1DT transgene being located there. Next, to estimate the region containing the sup#1 mutation more precisely, the 6,000 k–10,000 k region of chr 4 was divided into 50 kb intervals and enrichment of homozygous SNPs in each 50 kb interval was plotted (Fig. 1C). Relatively high peaks were detected between 6,850 k and 9,100 k, and thus we decided to focus on the 6,800 k–9,150 k region that contains all high peaks. We next attempted to identify the sup#1 mutation among 32,142 Ws-type homozygous SNPs that were detected on chr 4 in the sample of bulked genomes of sup#1 F2 recombinants (Table 2). First of all, we removed the SNPs that were detected in Col-T and Ws genomes from the 32,142 SNPs. After the procedure, 2,278 SNPs remained as candidates for the causal mutation (Fig. 2A). Next, among 2,278 SNPs, we extracted the SNPs that were located in CDSs (coding sequences; regions of nucleotides that correspond to sequences of amino acids in predicted proteins) or intron donor/acceptor sites, those that were located in the 6,800 k–9,150 k interval and those that showed the canonical EMS-induced G-to-A or C-to-T nucleotide changes (Fig. 2B). Forty-two SNPs passed these three filters


(Fig. 2B), but this number was still too large to identify the causal mutation among them. To reduce the number further, we also analyzed another mutant, sup#2, which also suppressed morphological abnormalities of uni-1D/+ plants. We applied the same procedures as we had carried out for analysis of sup#1 candidate SNPs to sup#2 using bulked genomes of 88 F2 individuals. The sup#2 mutation was strongly linked to chr 4 (Table 2) and, as shown in Fig. 1D, the graph displaying enrichment of homozygous SNPs was very similar to that in the case of sup#1 (Fig. 1A), implying that the causal mutations in sup#1 and sup#2 might exist in the same locus. When the SNPs in Col-T and Ws data were removed from 34,757 homozygous SNPs on chr 4 in the sample of bulked genomes of sup#2 F2 recombinants (Table 2), 2,443 SNPs were left as sup#2 candidates, as shown in Fig. 2A. Among them, 52 SNPs passed the three filters (Fig. 2C). During the above procedures, we noticed that there were a non-negligible number of identical SNPs (828 SNPs) between data from sup#1 F2 and sup#2 F2 samples even after parental


SNPs, which were detected in Col-T and Ws wild-type samples, were removed (Fig. 2D). This was probably because deep-sequencing missed detecting some portion of SNPs that existed in the genomes of Ws and/or Col-T plants, resulting in incomplete removal of ‘background’ SNPs from SNPs of sup#1 and sup#2 candidates. Therefore, we further removed such remaining ‘background’ SNPs that were commonly detected in sup#1 and sup#2 samples, and, as a result, 24 and 34 SNPs were left as sup#1 and sup#2 candidates, respectively (Fig. 2E). Because it was highly expected that the causal mutations in sup#1 and sup#2 would exist in the same locus (Fig. 1A, D), we next extracted the SNPs that were located within common genes. The number of SNPs that passed through all of these filters was just one in each case (Fig. 2E). Both remaining SNPs were found in At4g11260, the SGT1b gene.
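The filtering steps described above can be sketched as set operations on SNP records. This is a toy illustration under stated assumptions, not the authors' implementation (they used a Perl script and Microsoft Excel). The `Snp` record, the function names, all positions and the gene labels other than At4g11260 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snp:
    chrom: int
    pos: int
    ref: str
    alt: str
    # Gene ID if the SNP lies in a CDS or intron donor/acceptor site, else None.
    gene: Optional[str] = None


# Canonical EMS-induced changes named in the text.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def key(snp):
    """Identity of a SNP call, used to compare calls between samples."""
    return (snp.chrom, snp.pos, snp.alt)


def candidate_snps(mutant, parental_keys, interval):
    """Drop parental SNPs, then apply the three filters: coding or splice
    site, inside the mapped interval, EMS-type substitution."""
    start, end = interval
    return [
        s for s in mutant
        if key(s) not in parental_keys
        and s.gene is not None
        and start <= s.pos <= end
        and (s.ref, s.alt) in EMS_CHANGES
    ]


def remove_shared(a, b):
    """Remove calls present in both mutants (residual background SNPs)."""
    shared = {key(s) for s in a} & {key(s) for s in b}
    return ([s for s in a if key(s) not in shared],
            [s for s in b if key(s) not in shared])


def genes_hit_in_both(a, b):
    """Genes carrying independent candidate SNPs in both mutants."""
    return {s.gene for s in a} & {s.gene for s in b}


# Toy data: positions and gene labels (except At4g11260) are made up.
INTERVAL = (6_800_000, 9_150_000)
parental = {(4, 7_000_000, "A")}
sup1 = [
    Snp(4, 7_000_000, "G", "A", "geneA"),      # parental SNP
    Snp(4, 6_850_100, "G", "A", "At4g11260"),  # candidate
    Snp(4, 8_000_000, "C", "T", "geneB"),      # background missed in parents
    Snp(4, 8_500_000, "A", "G", "geneC"),      # not an EMS-type change
    Snp(4, 9_000_000, "G", "A", None),         # outside CDS/splice sites
    Snp(4, 12_000_000, "G", "A", "geneD"),     # outside the interval
]
sup2 = [
    Snp(4, 6_851_200, "C", "T", "At4g11260"),
    Snp(4, 8_000_000, "C", "T", "geneB"),
    Snp(4, 7_500_000, "G", "A", "geneE"),
]

c1, c2 = remove_shared(candidate_snps(sup1, parental, INTERVAL),
                       candidate_snps(sup2, parental, INTERVAL))
common_genes = genes_hit_in_both(c1, c2)
print(common_genes)
```

In this sketch the parental subtraction and the three filters are applied in one pass; as the text notes later, the choice and order of filters may depend on the individual case.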

[Figs 1 and 2 appear here in the original. Fig. 1 panel titles: chr 4 (sup#1) 500k interval; chr 1 (sup#1) 500k interval; chr 4 (sup#1) 50k interval; chr 4 (sup#2) 500k interval. Fig. 2 filter labels: 1, within CDSs and intron donor/acceptor sites; 2, within the 6800k–9150k interval; 3, EMS-type nucleotide substitutions. Fig. 2 filtering steps: removal of Col-T and Ws SNPs; removal of common SNPs; within common genes.]

Fig. 1 Enrichment of homozygous Ws-type SNPs. The ratios of homozygous SNPs to heterozygous SNPs were plotted. Chromosomes were divided into 500 kb intervals (A, B and D) or 50 kb intervals (C). The cases of sup#1 and sup#2 are shown in A–C and D, respectively. Full-length chromosomes are shown in A, B and D, and the 6,000 k–10,000 k region of chr 4 is shown in C. Green bars beneath the graphs indicate the chromosomes. Filled arrowheads indicate centromeres and the unfilled arrowhead points to the insertion site of the uni-1DT transgene.
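The interval scan plotted in Fig. 1 can be sketched as a simple binning of SNP positions. The following is a minimal illustration with made-up positions, assuming each SNP call is available as a (position, zygosity) pair; the function name `interval_ratios` and the input format are hypothetical and not taken from the paper.

```python
from collections import Counter


def interval_ratios(snps, interval_size=500_000):
    """Ratio of homozygous to heterozygous SNPs per fixed-size interval.

    snps: iterable of (position, zygosity), zygosity being "hom" or "het".
    Returns {interval_start: ratio}. Intervals without heterozygous SNPs are
    skipped because the ratio is undefined there.
    """
    hom, het = Counter(), Counter()
    for position, zygosity in snps:
        bin_start = (position // interval_size) * interval_size
        if zygosity == "hom":
            hom[bin_start] += 1
        else:
            het[bin_start] += 1
    return {
        b: hom[b] / het[b]
        for b in sorted(set(hom) | set(het))
        if het[b] > 0
    }


# Toy data: positions are invented for illustration only.
toy_snps = [
    (8_100_000, "hom"), (8_200_000, "hom"), (8_300_000, "hom"),
    (8_400_000, "het"),
    (1_000_000, "hom"), (1_200_000, "het"), (1_300_000, "het"),
]

ratios = interval_ratios(toy_snps)             # 500 kb bins, as in Fig. 1A
peak_bin = max(ratios, key=ratios.get)         # interval with strongest enrichment
fine = interval_ratios(toy_snps, interval_size=50_000)  # 50 kb bins, as in Fig. 1C
print(ratios, peak_bin)
```

Running the same function with a smaller `interval_size` on a restricted region corresponds to the finer 50 kb scan; smaller bins contain fewer SNPs, so their ratios are noisier.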


Fig. 2 SNP filtering to identify causal SNPs. (A) and (E) The numbers of SNPs before and after each filtering are shown. (B) and (C) Venn diagrams showing the numbers of SNPs that were extracted by each filter in the case of sup#1 (B) and sup#2 (C). (D) A Venn diagram showing the numbers of SNPs that overlapped between sup#1 and sup#2 candidates.


The SNP found in sup#1 changed the cysteine residue at the 221st position to tyrosine and that in sup#2 was located at the splicing acceptor site of the fifth intron. The C221-to-Y amino acid change was previously reported to cause loss of function of SGT1b proteins (Boter et al. 2007) and the nucleotide substitution at the splicing acceptor site was identical to the sgt1b-2 mutation, which was also shown to be a loss-of-function mutation (Austin et al. 2002). SGT1b is a core member of the chaperone complex that plays a significant role in enabling NB-LRR (nucleotide-binding site-leucine-rich repeat)-type proteins to exert their proper functions (Austin et al. 2002, Azevedo et al. 2002, Hubert et al. 2003, Zhang et al. 2010) and loss-of-function mutations in the SGT1b gene render NB-LRR proteins non-functional (Austin et al. 2002, Azevedo et al. 2002, Tor et al. 2002, Hubert et al. 2003, Azevedo et al. 2006, Boter et al. 2007, Stuttmann et al. 2008, Zhou et al. 2008, Yang et al. 2010). It was previously shown that UNI had an NB-LRR structure (Igari et al. 2008) and that a loss-of-function mutation in the RAR1 gene, whose protein product is also a necessary component of the chaperone complex containing SGT1b (Austin et al. 2002, Azevedo et al. 2002, Hubert et al. 2003, Zhang et al. 2010), suppressed abnormal morphologies of uni-1D/+ plants (Igari et al. 2008). Taken together, it is suggested that the chaperone complex containing RAR1 and SGT1b is essential for proper functions of UNI proteins. Below we describe the characteristics of the methodology we used in this report. First, one round of deep-sequencing with the Illumina GAIIx platform is sufficient for mutation identification in the case of the Arabidopsis mutant, indicating that all procedures finish as early as 8–9 d after genomic DNA libraries for deep-sequencing are prepared. In particular, no work is required for the first 7 d during which a sequencer is operating.
No time-consuming and labor-intensive work, such as producing a large number of PCR-based markers to detect each polymorphism or one-by-one sequencing of candidate genes, is necessary. Secondly, it is implied that this methodology could be theoretically applicable to mutation identification even in complex cases where phenotypes are affected by multiple loci. We succeeded in monitoring at least two peaks corresponding to the sup#1 locus and the uni-1DT transgene locus at the same time by linkage analysis (Fig. 1A, B). Therefore, for example, simultaneous identification of two causal SNPs of a mutant in which the functions of two distinct genes are disrupted by separate, as yet unidentified SNPs would be possible in principle. Thirdly, backcrosses to remove ‘background’ mutations, which are induced by EMS treatment but unrelated to phenotypes of interest, might not be required. Because chromosomal regions near the causal mutation would not normally be expected to have crossovers during only a few backcrosses, EMS-induced but unrelated SNPs near the causal SNP would not be removed by normal backcrosses. Thus, backcrossing might not make practical sense for mutation identification by our methodology. Fourthly, using multiple allelic mutants significantly helps mutation identification. We observed a non-negligible number of identical SNPs between data from sup#1 F2 and sup#2


F2 samples even after parental SNPs, which were detected in Col-T and Ws samples, were removed (Fig. 2D). This was probably because deep-sequencing missed the detection of some portion of SNPs that existed in Col-T and Ws genomes, resulting in incomplete removal of parental ‘background’ SNPs from the data of F2 samples. These unremoved SNPs would be troublesome for identification of causal mutations especially when non-reference accessions are used as the parental lines for mutagenesis. This presumption is supported by the fact that we detected only 3,873 SNPs in the Col-T genome but as many as 407,830 SNPs in the Ws genome against the reference Col-0 genome (Table 2). If it were assumed that deep-sequencing would miss detecting the same percentage of SNPs in each sample, the number of such ‘noisy’ SNPs would be about 100-fold (407,830 vs. 3,873) larger in the Ws sample than in the Col-T sample. In our case, 828 SNPs were detected as ‘noisy’ SNPs on chr 4 in sup#1 F2 and sup#2 F2 samples (Fig. 2D). This ‘noise’ would be reduced to as few as about 8 (828/100) if the reference Col accession, not the non-reference Ws accession, were a parent for mutagenesis. Increasing the amount of coverage of genome sequencing might contribute to solving the ‘noise’ problem to some degree. However, even by doing so, complete removal of the ‘noise’ would not be expected. Therefore, a practical way to solve the problem would be to use multiple allelic mutants. If independent ‘meaningful’ mutations were found within the same gene in multiple allelic mutants, the mutations would most probably be the causal SNPs. Fig. 3 shows the pipeline we propose for identification of EMS-induced phenotypic mutations in a non-reference Arabidopsis accession by whole genome sequencing. In the SNP filtering step, decisions to use or not to use each filter, addition of another filter and the order of filters might depend on each individual case.
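The rough ‘noise’ estimate in the preceding paragraph can be written out as plain arithmetic. The sketch below uses only the counts quoted in the text and keeps the authors' assumption that the same percentage of SNPs is missed in every sample; the variable names are hypothetical.

```python
# Counts quoted in the text (Table 2 and Fig. 2D).
WS_SNPS = 407_830        # SNPs detected in Ws against the Col-0 reference
COL_T_SNPS = 3_873       # SNPs detected in Col-T against the Col-0 reference
SHARED_NOISY_SNPS = 828  # SNPs shared by sup#1 F2 and sup#2 F2 on chr 4

# ASSUMPTION (from the text): the miss rate is the same in each sample, so
# residual background scales with the number of parental SNPs.
fold_difference = WS_SNPS / COL_T_SNPS

# Expected residual background if the mutagenized parent had been Col.
expected_noise_in_col = SHARED_NOISY_SNPS / fold_difference

print(round(fold_difference, 1), round(expected_noise_in_col, 1))
```

The exact ratio is slightly above the "about 100-fold" used in the text, and the resulting estimate still rounds to about 8 SNPs.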
We believe that this paper will provide significant knowledge not only for Arabidopsis researchers who would like to try ‘snipping (SNPing)’ causal SNPs but also for a number of plant researchers who hope to use whole genome sequencing for analysis of various types of phenotypic traits even in non-model plants especially when combined with further development of deep-sequencing technologies in the future.

Materials and Methods

Plant materials
Transgenic Arabidopsis plants harboring the kanamycin-resistant uni-1DT transgene in the Ws background were described previously (Igari et al. 2008). uni-1DT transgenics were selected by 15 mg ml−1 kanamycin. Seeds from uni-1DT transgenics were mutagenized in 0.4% EMS (Sigma-Aldrich) at room temperature for 10 h. M1 plants exhibiting the uni-1D/+ phenotype were grown and their seeds were harvested individually. Among M2 plants grown in the presence of kanamycin, plants displaying wild-type morphologies were selected as suppressor mutants. M3 plants were backcrossed to wild-type Ws plants to check their genetic characteristics such as the penetrance rate of phenotypes and mode of inheritance. sup#1 and sup#2 described in this manuscript were recessive mutants with complete penetrance.

Library preparation for genome sequencing
Genomic DNAs were isolated using a Plant DNeasy mini kit (Qiagen) from the nuclear fraction that was prepared using the ‘Semi-pure Preparation of Nuclei Procedures’ of the CelLytic PN Isolation/Extraction Kit (Sigma-Aldrich). Isolated genomic DNAs were sheared using Covaris S2 (Covaris) at 100 bp setting. Library preparation for deep-sequencing from sheared DNAs was performed using NEBNext DNA Sample Prep Master Mix Set 1 (New England BioLabs) and a Genomic DNA Sample Prep Oligo Only Kit (Illumina).

Deep-sequencing using Illumina technologies and output of SNPs
Prepared libraries were deep-sequenced using an Illumina Genome Analyzer IIx. Then 75 bp sequencing was carried out. Heterozygous SNPs and homozygous SNPs against the TAIR9 reference genome sequence of Col-0 were output by CASAVA pipeline software version 1.7 (Illumina). Calling of SNPs by CASAVA consists of two steps. First the allele call scores were calculated based on base calls, alignment and read quality scores, and then SNPs were called based on the allele call score and read depth; the allele call score should be 10, and the coverage should be 3. These steps were performed with the default parameters of CASAVA.

Analysis of SNP data
To make lists of SNPs located within CDSs and intron donor/acceptor sites, we wrote a Perl script. This script extracts SNPs based on information on CDSs and intron donor/acceptor sites made from TAIR9 annotation data and outputs lists of SNPs within these regions. Filtering using the output SNP lists, calculations and drawing of graphs were carried out using Microsoft Excel.

Fig. 3 Pipeline of the methodology described in this manuscript. See text for the explanation.

Funding
This work was supported by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) [Grant-in-Aid for Scientific Research on Priority Areas (19060007), Grant-in-Aid for Scientific Research (B) (22370019) and Grant-in-Aid for Exploratory Research (22657015) to M.T., and Grant-in-Aid for Young Scientists (B) (22770038) to N.U.]; [Sumitomo Foundation to N.U.].

Acknowledgments
We would like to thank Drs. Taku Ohshima, Kensuke Nakamura and Naotake Ogasawara (NAIST) for valuable comments and discussions about sequencing and data analysis. We also thank Eiko Nakamoto for sequencing operation and Eriko Tanaka for technical assistance.

References

Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
Austin, M.J., Muskett, P., Kahn, K., Feys, B.J., Jones, J.D. and Parker, J.E. (2002) Regulatory role of SGT1 in early R gene-mediated plant defenses. Science 295: 2077–2080.
Azevedo, C., Betsuyaku, S., Peart, J., Takahashi, A., Noel, L., Sadanandom, A. et al. (2006) Role of SGT1 in resistance protein accumulation in plant immunity. EMBO J. 25: 2007–2016.
Azevedo, C., Sadanandom, A., Kitagawa, K., Freialdenhoven, A., Shirasu, K. and Schulze-Lefert, P. (2002) The RAR1 interactor SGT1, an essential component of R gene-triggered disease resistance. Science 295: 2073–2076.
Boter, M., Amigues, B., Peart, J., Breuer, C., Kadota, Y., Casais, C. et al. (2007) Structural and functional analysis of SGT1 reveals that its interaction with HSP90 is required for the accumulation of Rx, an R protein involved in plant immunity. Plant Cell 19: 3791–3804.
Cuperus, J.T., Montgomery, T.A., Fahlgren, N., Burke, R.T., Townsend, T., Sullivan, C.M. et al. (2010) Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing. Proc. Natl Acad. Sci. USA 107: 466–471.
Diamandis, E.P. (2009) Next-generation sequencing: a new revolution in molecular diagnostics? Clin. Chem. 55: 2088–2092.
Giovannoni, J.J., Wing, R.A., Ganal, M.W. and Tanksley, S.D. (1991) Isolation of molecular markers from specific chromosomal intervals using DNA pools from existing mapping populations. Nucleic Acids Res. 19: 6553–6558.
Hubert, D.A., Tornero, P., Belkhadir, Y., Krishna, P., Takahashi, A., Shirasu, K. et al. (2003) Cytosolic HSP90 associates with and modulates the Arabidopsis RPM1 disease resistance protein. EMBO J. 22: 5679–5689.
Igari, K., Endo, S., Hibara, K., Aida, M., Sakakibara, H., Kawasaki, T. et al. (2008) Constitutive activation of a CC-NB-LRR protein alters morphogenesis through the cytokinin pathway in Arabidopsis. Plant J. 55: 14–27.
Laitinen, R.A., Schneeberger, K., Jelly, N.S., Ossowski, S. and Weigel, D. (2010) Identification of a spontaneous frame shift mutation in a nonreference Arabidopsis accession using whole genome sequencing. Plant Physiol. 153: 652–654.
Lister, R., Gregory, B.D. and Ecker, J.R. (2009) Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr. Opin. Plant Biol. 12: 107–118.
Michelmore, R.W., Paran, I. and Kesseli, R.V. (1991) Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc. Natl Acad. Sci. USA 88: 9828–9832.
Ossowski, S., Schneeberger, K., Lucas-Lledo, J.I., Warthmann, N., Clark, R.M., Shaw, R.G. et al. (2010) The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.
Rounsley, S.D. and Last, R.L. (2010) Shotguns and SNPs: how fast and cheap sequencing is revolutionizing plant biology. Plant J. 61: 922–927.
Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A.H., Nielsen, K.L. et al. (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6: 550–551.
Stuttmann, J., Parker, J.E. and Noel, L.D. (2008) Staying in the fold: the SGT1/chaperone machinery in maintenance and evolution of leucine-rich repeat proteins. Plant Signal. Behav. 3: 283–285.
Tor, M., Gordon, P., Cuzick, A., Eulgem, T., Sinapidou, E., Mert-Turk, F. et al. (2002) Arabidopsis SGT1b is required for defense signaling conferred by several downy mildew resistance genes. Plant Cell 14: 993–1003.
Varshney, R.K., Nayak, S.N., May, G.D. and Jackson, S.A. (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol. 27: 522–530.
Yang, H., Shi, Y., Liu, J., Guo, L., Zhang, X. and Yang, S. (2010) A mutant CHS3 protein with TIR-NB-LRR-LIM domains modulates growth, cell death and freezing tolerance in a temperature-dependent manner in Arabidopsis. Plant J. 63: 283–296.
Zhang, M., Kadota, Y., Prodromou, C., Shirasu, K. and Pearl, L.H. (2010) Structural basis for assembly of Hsp90–Sgt1–CHORD protein complexes: implications for chaperoning of NLR innate immunity receptors. Mol. Cell 39: 269–281.
Zhou, F., Mosher, S., Tian, M., Sassi, G., Parker, J. and Klessig, D.F. (2008) The Arabidopsis gain-of-function mutant ssi4 requires RAR1 and SGT1b differentially for defense activation and morphological alterations. Mol. Plant-Microbe Interact. 21: 40–49.