Genomic selection in synthetic populations

Müller, Dominik

Genomische Selektion in synthetischen Populationen

(translated title, German)

When citing this document, please always refer to the following:
URN: urn:nbn:de:bsz:100-opus-15654
URL: http://opus.uni-hohenheim.de/volltexte/2019/1565/

SWD subject headings:

Genome, selection, population, prediction

Free keywords (German):

Synthetik (synthetic)

Free keywords (English):

genome, selection, synthetic, population, prediction

Institute:

Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik (Institute of Plant Breeding, Seed Science and Population Genetics)

Faculty:

Faculty of Agricultural Sciences

DDC subject group:

Agriculture, veterinary medicine

Document type:

Dissertation

Principal referee:

Melchinger, Albrecht, Prof. Dr.

Language:

English

Date of oral examination:

26.03.2018

Year of creation:

2017

Publication date:

28.01.2019

License:

This content is licensed under a Creative Commons license.

Abstract in English:

The foundation of genomic selection was laid at the beginning of this century. Since then, it has developed into a very active field of research. Although it was originally developed in dairy cattle breeding, it rapidly attracted the attention of the plant breeding community and has, by now (2017), become an integral component of the breeding armamentarium of international companies. Despite its practical success, numerous open questions remain that are highly important to plant breeders. The recent development of large-scale and cost-efficient genotyping platforms was the prerequisite for the rise of genomic selection. Its functional principle is based on information shared between individuals. Genetic similarities between individuals are assessed using genomic fingerprints. These similarities provide information beyond mere family relationships and allow information from phenotypic data to be pooled. In practice, a training set of phenotyped individuals is first established and then used to calibrate a statistical model. The model is then used to predict the genomic values of individuals lacking phenotypic information. Using these predictions can save time by accelerating the breeding program and can save cost by reducing the resources spent on phenotyping. A large body of literature has been devoted to investigating the accuracy of genomic selection for unphenotyped individuals. However, in plant breeding the training individuals are often selection candidates themselves, and there is no conceptual obstacle to applying genomic selection to them, making use of information obtained via marker-based similarities. It is therefore also highly important to assess prediction accuracy in the training set and the possibilities for improving it. Our results demonstrated that accuracy in the training set can be increased by shrinkage estimation of marker-based relationships, which reduces the associated noise. The success of this approach depends on the marker density and the population structure. The potential is largest for broad-based populations and under a low marker density.
Synthetic populations are produced by intermating a small number of parental components, and they have played an important role in the history of plant breeding, both for improving germplasm pools through recurrent selection and for developing actual varieties and conducting research on quantitative genetics. The properties of genomic selection have so far not been assessed in synthetics. Moreover, synthetics are an ideal population type to assess the relative importance of three factors by which markers provide information about the state of alleles at quantitative trait loci (QTL), namely (i) pedigree relationships, (ii) co-segregation and (iii) linkage disequilibrium (LD) in the source germplasm. Our results show that the number of parents is a crucial factor for prediction accuracy. For a very small number of parents, prediction accuracy in a single cycle is highest and mainly determined by co-segregation between markers and QTL, whereas prediction accuracy is reduced for a larger number of parents, where the main source of information is LD within the source germplasm of the parents. Across multiple selection cycles, information from pedigree relationships rapidly vanishes, while co-segregation and ancestral LD are stable sources of information. Long-term genetic gain from genomic selection in synthetics is relatively unaffected by the number of parents, because information from co-segregation and information from ancestral LD compensate for each other. Altogether, our results contribute to a better understanding of the factors underlying genomic selection, of the cases in which it works, and of the information that contributes to prediction accuracy.
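
The workflow described in this abstract (calibrate a model on a phenotyped training set, predict genomic values from marker-based similarities, and shrink the relationship estimates to reduce noise) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the VanRaden-style relationship matrix, the linear shrinkage toward a scaled identity matrix, the fixed shrinkage intensity `delta`, the assumed heritability `h2`, the function names, and the simulated toy data.

```python
import numpy as np

def genomic_relationship(M):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker codes."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    Z = M - 2.0 * p                           # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def shrink(G, delta):
    """Linearly shrink G toward a scaled identity target, delta in [0, 1]."""
    target = np.mean(np.diag(G)) * np.eye(G.shape[0])
    return (1.0 - delta) * G + delta * target

def gblup(G, y, train, h2):
    """Predict genomic values of all individuals from training phenotypes."""
    lam = (1.0 - h2) / h2                     # residual / genetic variance ratio
    mu = y[train].mean()                      # simple estimate of the mean
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[:, train] @ alpha           # covers training and new individuals

# Toy data (hypothetical): 120 individuals, 60 markers, additive trait.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.5, size=(120, 60)).astype(float)
g = M @ rng.normal(size=60)                   # simulated genomic values
y = g + rng.normal(scale=g.std(), size=120)   # phenotypes with h2 near 0.5
train = np.arange(100)                        # first 100 individuals are phenotyped

G = genomic_relationship(M)
pred = gblup(shrink(G, delta=0.2), y, train, h2=0.5)
```

Note that `pred` contains values for the training individuals as well as for the 20 unphenotyped ones, which mirrors the point made above that training individuals can themselves be selection candidates. In practice the shrinkage intensity would be estimated from the data rather than fixed.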

Abstract in German (English translation):

The recent development of large-scale, cost-efficient genotyping platforms is a basic prerequisite for the success of genomic selection. Its functional principle rests on exploiting information shared between individuals. Existing genetic similarities are captured by the genomic fingerprint. These similarities provide information that goes beyond pure pedigree relationships and allow phenotypic data to be exploited across individuals. In practice, a calibration data set of phenotyped individuals must first be established and is used to estimate a statistical model. This model is subsequently used to predict the genomic value of individuals without phenotypic data. Using these predictions can save time by accelerating the breeding program and can also lower costs by reducing the resources spent on phenotyping. The prediction accuracy of genomic selection among unphenotyped individuals has already been the subject of numerous studies. In plant breeding, however, the training individuals used to calibrate the model are often potential selection candidates themselves, and there is no obstacle in principle to applying genomic selection to them as well and exploiting the information from marker-based similarities. It is therefore important to examine prediction accuracy in the training set and the possibilities for improving it. Our results show that it is in principle possible to reduce the noise in marker-based relationships through shrinkage estimation and thereby increase accuracy in the training set. Success depends on marker density and population structure. The potential is greatest for broad-based populations at low marker density.
Synthetic populations are produced by crossing a small number of parental components and have played an important role in the history of plant breeding. This applies to the improvement of breeding material through recurrent selection as well as to the development of varieties and to quantitative-genetic breeding research. The properties of genomic selection have not previously been investigated in synthetics. Moreover, synthetics are an ideal population type for investigating the importance of the three factors through which markers provide information about the state at QTL, namely (i) pedigree relationships, (ii) co-segregation and (iii) linkage disequilibrium (LD) in the breeding material. Our results show that the number of parents is a decisive factor for prediction accuracy. With a very small number of parents, prediction accuracy within a cycle is highest and is mainly determined by co-segregation between markers and QTL. If the number of parents is large, by contrast, LD in the source material of the parents emerges as the predominant source of information. When genomic selection is practiced across several cycles, information from pedigree relationships disappears very quickly, whereas co-segregation and LD prove to be stable sources of information. The long-term selection gain of genomic selection in a synthetic depends only slightly on the number of parents, because information from co-segregation and information from LD offset each other. Overall, our results make an important contribution to a better understanding of the foundations of genomic selection, of the cases in which it promises success, and of the information that influences prediction accuracy.
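
How a synthetic arises from a small number of parental components can be sketched in a few lines. This is only an illustration under stated assumptions, not the simulation design of the thesis: the parents are taken to be fully homozygous inbred lines, only the first intermating generation is produced (so recombination plays no role yet), and the function name `intermate_parents`, the marker count, and the random parental haplotypes are hypothetical.

```python
import numpy as np

def intermate_parents(parent_haplotypes, n_offspring, rng):
    """Cross randomly chosen pairs of distinct inbred parents.

    parent_haplotypes: (n_parents, n_markers) array of 0/1 alleles. Each
    parent is assumed fully homozygous, so every gamete equals its haplotype.
    Returns genotypes coded 0/1/2 and the parent index pair of each offspring.
    """
    n_parents = parent_haplotypes.shape[0]
    pairs = np.array([rng.choice(n_parents, size=2, replace=False)
                      for _ in range(n_offspring)])
    genotypes = parent_haplotypes[pairs[:, 0]] + parent_haplotypes[pairs[:, 1]]
    return genotypes, pairs

rng = np.random.default_rng(7)
parents = rng.integers(0, 2, size=(4, 10))    # 4 hypothetical inbred parents, 10 markers
geno, pairs = intermate_parents(parents, n_offspring=50, rng=rng)
```

With few parents, every offspring genotype is a combination of only a handful of parental haplotypes. Later generations, in which recombination between markers and QTL generates co-segregation information, are deliberately not covered by this sketch.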