Tutorial

Function introduction and parameter information

This section introduces the functions in GOVS and their parameters in detail.

GOVS

GOVS
Description
One-stop function that runs the complete genome optimization workflow
Usage
GOVS(hmp,ID = NULL,pheno,trait,bins,binsInfo,output,module = "DES",designInfo = NULL)
Arguments
- hmp The genetic data in hapmap format.
- ID A character vector of sample IDs for hmp; if NULL, hmp must include a header.
- pheno Phenotypic data frame; the first column contains sample IDs.
- trait The name of the trait of interest (the trait must be a column of the pheno data frame).
- bins Results of the IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
- binsInfo Data frame containing the index, start, end and length of each bin locus.
- output The prefix of the output files.
- module Character string specifying the module combination to run, default "DES": "D" for the genome optimization module, "E" for the extraction and assembly module and "S" for the statistic module. The valid combinations are "D", "E", "S", "DE", "DES" and "ES". Different combinations need different inputs; for details see genomeOptimization, extractGenome and statDesign.
- designInfo Data frame holding the results of the genome optimization module; it must be supplied when that module is not run, i.e. for "E", "ES" and "S".
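The module letters combine the inputs of the three underlying functions. As a rough illustration (in Python, although GOVS itself is an R package), the sketch below derives the inputs a module string would need from the usage lines of genomeOptimization, extractGenome and statDesign, assuming that designInfo is produced internally whenever "D" runs. `required_inputs` is a hypothetical helper, not part of GOVS, and the optional ID argument is left out.

```python
# Inputs of each module, read off the usage lines of genomeOptimization,
# extractGenome and statDesign (GOVS argument names are used throughout).
MODULE_INPUTS = {
    "D": {"pheno", "bins", "trait", "output"},
    "E": {"hmp", "binsInfo", "designInfo", "bins", "output"},
    "S": {"designInfo", "binsInfo", "pheno", "trait", "output"},
}
VALID_MODULES = {"D", "E", "S", "DE", "DES", "ES"}


def required_inputs(module="DES"):
    """Return the set of GOVS arguments a module combination needs (hypothetical helper)."""
    if module not in VALID_MODULES:
        raise ValueError(f"unsupported module combination: {module!r}")
    needed = set()
    for letter in module:
        needed |= MODULE_INPUTS[letter]
    # Assumption: when "D" runs, designInfo is produced internally, not supplied.
    if "D" in module:
        needed.discard("designInfo")
    return needed


print(sorted(required_inputs("ES")))
```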

Genome optimization module

genomeOptimization
Description
Virtual genome optimization based on IBD (bins) data
Usage
genomeOptimization(pheno, bins, trait, output)
Arguments
- pheno Phenotypic data frame; the first column contains sample names.
- bins Results of the IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
- trait The name of the trait of interest (the trait must be a column of the pheno data frame).
- output The prefix of the output files describing the virtual genome scheme.

Extraction and assembly module

extractGenome
Description
Extracts genome fragments from the candidates based on the results of genome optimization, then assembles all fragments into the optimized genome (virtual genome).
Usage
extractGenome(hmp,binInfo,ID = NULL,designInfo,output,write = F,bins,extractContent ="Genotype")
Arguments
- hmp The genetic data in hapmap format.
- binInfo Data frame containing the index, start, end and length of each bin locus.
- ID A character vector of sample IDs for hmp; if NULL, hmp must include a header.
- designInfo Output of genomeOptimization: a matrix of sample IDs giving the source candidate of the fragment at each bin locus.
- output The prefix of the output files.
- write Logical; whether to write the assembled genome to a file, default FALSE.
- bins Results of the IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
- extractContent Character string specifying the content of the virtual genome: "Bin" for the bin source or "Genotype" for genetic data, default "Genotype".
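The extraction and assembly step can be pictured as copying, for every bin, the markers that fall inside it from the candidate named in designInfo. The Python sketch below assumes a single chromosome, one source ID per bin, bins given as inclusive (start, end) physical intervals, and genotypes stored per sample as lists aligned with the marker positions. `assemble_virtual_genome` is a hypothetical name, and extractGenome may organize these inputs differently.

```python
def assemble_virtual_genome(design, bin_info, positions, genotypes, content="Genotype"):
    """Assemble a virtual genome from per-bin source candidates (hypothetical helper).

    design     -- source sample ID for each bin (one entry per bin)
    bin_info   -- (start, end) physical interval of each bin, inclusive
    positions  -- physical position of each marker on one chromosome
    genotypes  -- dict: sample ID -> genotype calls aligned with positions
    content    -- "Genotype" to copy calls, "Bin" to record the source ID
    """
    if content not in ("Genotype", "Bin"):
        raise ValueError('content must be "Genotype" or "Bin"')
    virtual = [None] * len(positions)  # markers outside every bin stay None
    for source, (start, end) in zip(design, bin_info):
        for i, pos in enumerate(positions):
            if start <= pos <= end:
                virtual[i] = genotypes[source][i] if content == "Genotype" else source
    return virtual


positions = [100, 200, 300, 400]
genotypes = {"L1": ["AA", "AG", "GG", "CC"], "L2": ["TT", "CC", "AA", "GG"]}
print(assemble_virtual_genome(["L1", "L2"], [(1, 250), (251, 500)], positions, genotypes))
# ['AA', 'AG', 'AA', 'GG']
```

The content switch mirrors extractContent: "Bin" returns the source candidate per marker instead of its genotype call.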

Statistic module

statDesign
Description
Statistical summary of the contribution of each candidate to the optimal genome. The results directly guide line selection and the route of population improvement.
Usage
statDesign(designInfo,binInfo,pheno,trait,output)
Arguments
- designInfo Output of genomeOptimization: a matrix of sample IDs giving the source candidate of the fragment at each bin locus.
- binInfo Data frame containing the index, start, end and length of each bin locus.
- pheno Phenotypic data frame; the first column contains sample names.
- trait The name of the trait of interest (the trait must be a column of the pheno data frame).
- output The prefix of the output files holding the summary statistics of the genome optimization process.
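One simple way to summarize each candidate's contribution is its share of the total bin length in the optimized genome. The Python sketch below computes that share, assuming designInfo reduces to one source ID per bin and that bin lengths come from the length column of binInfo. `candidate_contribution` is hypothetical, and statDesign may report other or additional statistics.

```python
from collections import defaultdict


def candidate_contribution(design, bin_lengths):
    """Share of the optimized genome's total bin length supplied by each candidate."""
    if len(design) != len(bin_lengths):
        raise ValueError("design and bin_lengths need one entry per bin")
    total = sum(bin_lengths)
    supplied = defaultdict(float)
    for source, length in zip(design, bin_lengths):
        supplied[source] += length  # accumulate the length of every bin this candidate supplies
    return {source: length / total for source, length in supplied.items()}


print(candidate_contribution(["L1", "L2", "L1"], [100, 50, 50]))  # {'L1': 0.75, 'L2': 0.25}
```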

Genomic selection prediction module

rrBLUP model

SNPrrBLUP
Description
Genotype-to-phenotype prediction via the ridge regression best linear unbiased prediction (rrBLUP) model. The input is a genotype matrix.
Usage
SNPrrBLUP(x,y,fix = NULL,idx1,idx2)
Arguments
- x Genotypic matrix in numeric format (see transHapmap2numeric); rows represent samples and columns represent features (SNPs).
- y A numeric vector of phenotypic values.
- fix A matrix of other variables fitted as fixed effects in the mixed model.
- idx1 A vector of indices for the training set.
- idx2 A vector of indices for the testing (predicted) set.

GBLUP model

GBLUP
Description
Genotype-to-phenotype prediction via the genomic best linear unbiased prediction (GBLUP) model. The input is an additive relationship matrix computed from genotypes.
Usage
GBLUP(amat,y,idx1,idx2,fix = NULL)
Arguments
- amat Additive relationship matrix computed from the genetic matrix.
- y A numeric vector of phenotypic values.
- fix A matrix of other variables fitted as fixed effects in the mixed model.
- idx1 A vector of indices for the training set.
- idx2 A vector of indices for the testing (predicted) set.
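A common way to build amat is VanRaden's (2008) first method, G = ZZ' / (2 Σ p_j(1 - p_j)), where Z is the 0/1/2 genotype matrix with each SNP column centered by twice its allele frequency p_j. The Python sketch below implements that formula for a matrix coded as in transHapmap2numeric. Whether GBLUP expects exactly this scaling is an assumption, and `vanraden_g` is a hypothetical name.

```python
def vanraden_g(m):
    """VanRaden (method 1) relationship matrix from a samples x SNPs 0/1/2 matrix."""
    n, p = len(m), len(m[0])
    # Frequency of the counted (minor) allele at each SNP.
    freqs = [sum(row[j] for row in m) / (2 * n) for j in range(p)]
    denom = 2 * sum(f * (1 - f) for f in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Center each genotype by its expectation 2 * p_j.
    z = [[row[j] - 2 * freqs[j] for j in range(p)] for row in m]
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]


print(vanraden_g([[0, 1], [2, 1]]))  # [[1.0, -1.0], [-1.0, 1.0]]
```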

Bins map construction and visualization

Construct IBD map

IBDConstruct
Description
Constructs an IBD map of the contributions of the parents to the progeny lines using a hidden Markov model (HMM).
Usage
IBDConstruct(snpParents,snpProgeny,markerInfo,q,rou,G,threshold = NULL,omit = T)
Arguments
- snpParents A matrix of parental genotypes, with lines in columns and markers in rows.
- snpProgeny A vector of progeny genotypes; its number of markers must equal that of snpParents.
- markerInfo A matrix or data frame with four columns (marker ID, allele, chromosome and physical position) describing the markers.
- q Sequencing quality, ranging from 0 to 1, used to define marker quality.
- rou Correlation between each pair of flanking markers, estimated from the offspring LD level after correction for the parental LD level; it can also be obtained from genetic positions.
- G The number of generations by which the offspring descended from the parents.
- threshold The threshold on the posterior probability.
- omit Logical; whether to omit untraceable segments, default TRUE.
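To make the HMM idea concrete, the Python sketch below treats the parents as hidden states and the progeny calls as observations, then uses forward-backward posterior decoding to assign each marker to a parent. How q and rou enter the model here is an assumption for illustration only: a progeny call matches its source parent with probability q, and consecutive markers stay on the same parent with probability rho; G and markerInfo are not used. Markers whose best posterior falls below threshold are returned as None, loosely mirroring untraceable segments. `ibd_posterior` is hypothetical and is not IBDConstruct's implementation.

```python
def ibd_posterior(parents, progeny, q=0.95, rho=0.9, threshold=None):
    """Posterior parent-of-origin call per marker (hypothetical sketch).

    parents  -- one row per marker, one genotype per parent (markers x lines)
    progeny  -- one genotype per marker
    """
    n, k = len(progeny), len(parents[0])
    if len(parents) != n or k < 2:
        raise ValueError("need one parent row per marker and at least two parents")

    def emit(t, s):
        # Assumption: the progeny call matches its source parent with probability q.
        return q if parents[t][s] == progeny[t] else 1.0 - q

    def trans(a, b):
        # Assumption: stay on the same parent between flanking markers with probability rho.
        return rho if a == b else (1.0 - rho) / (k - 1)

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd = [normalise([emit(0, s) / k for s in range(k)])]
    for t in range(1, n):
        fwd.append(normalise([emit(t, s) * sum(fwd[-1][a] * trans(a, s) for a in range(k))
                              for s in range(k)]))
    # Backward pass, also rescaled; per-marker constants cancel in the posterior.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = normalise([sum(trans(s, b) * emit(t + 1, b) * bwd[t + 1][b] for b in range(k))
                            for s in range(k)])
    calls = []
    for t in range(n):
        post = normalise([f * b for f, b in zip(fwd[t], bwd[t])])
        best = max(range(k), key=post.__getitem__)
        calls.append(best if threshold is None or post[best] >= threshold else None)
    return calls


parents = [["A", "G"]] * 3 + [["C", "T"]] * 3
progeny = ["A", "A", "A", "T", "T", "T"]
print(ibd_posterior(parents, progeny))  # [0, 0, 0, 1, 1, 1]
```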

Visualization of IBD map results

binsPlot
Description
Visualization of IBD map results.
Usage
binsPlot(IBDRes,color,parentInfo,parentNum)
Arguments
- IBDRes The output of IBDConstruct.
- color A named vector defining the color of each parent.
- parentInfo A named vector defining the label of each parent.
- parentNum The number of parents.

Visualization of bins data

mosaicPlot
Description
Visualization of overall bins data
Usage
mosaicPlot(bins,binsInfo,chr,resolution = 500,list,parentNum = 24,color,clust = T,methods = "ward.D2",dist_method = "euclidean")
Arguments
- bins Results of the IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
- binsInfo Data frame containing the index, start, end and length of each bin locus.
- chr The chromosome to draw in the mosaic plot.
- resolution The resolution of the mosaic plot, default 500.
- list The names of the lines to include in the mosaic plot.
- parentNum The number of parents, default 24; if color is not defined, this parameter is used to generate the color palette automatically.
- color A vector defining the color palette.
- clust Logical; whether to order the lines by hierarchical clustering (hclust), default TRUE.
- methods The clustering method, default "ward.D2".
- dist_method The distance measure used for clustering, default "euclidean".

Other useful functions

Transform character to number

transHapmap2numeric
Description
Transforms a genetic matrix from character format to numeric format, coding genotypes as AA = 0, Aa = 1 and aa = 2, where A is the major allele and a is the minor allele.
Usage
transHapmap2numeric(G)
Arguments
- G Character genetic matrix; rows represent samples and columns represent SNPs.
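The AA/Aa/aa coding amounts to counting minor alleles per SNP. The Python sketch below assumes two-letter genotype calls (e.g. "AG"), treats any call containing N as missing, takes the most frequent allele in each column as the major allele (ties broken alphabetically) and counts every other allele as minor. transHapmap2numeric may handle IUPAC codes, missing data or ties differently, and `hapmap_to_numeric` is a hypothetical name.

```python
from collections import Counter


def hapmap_to_numeric(g, missing="N"):
    """Code two-letter calls as 0/1/2 minor-allele counts; rows = samples, columns = SNPs."""
    n_snp = len(g[0])
    out = [[None] * n_snp for _ in g]  # missing calls stay None
    for j in range(n_snp):
        # Allele counts over all non-missing calls of this SNP.
        counts = Counter(a for row in g if missing not in row[j] for a in row[j])
        if not counts:
            continue
        major = min(counts, key=lambda a: (-counts[a], a))
        for i, row in enumerate(g):
            if missing not in row[j]:
                out[i][j] = sum(1 for a in row[j] if a != major)
    return out


print(hapmap_to_numeric([["AA", "CT"], ["AG", "TT"], ["AA", "TT"]]))  # [[0, 1], [1, 0], [0, 0]]
```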

Computation of GCA

gcaCompute
Description
Calculates parental general combining ability (GCA) from F1 phenotypic values.
Usage
gcaCompute(phe_df,which,trait)
Arguments
- phe_df Phenotypic data frame; each row represents an F1 combination, and the parental information is stored in columns.
- which The column index of the male or female parent, used to compute paternal or maternal GCA, respectively.
- trait A character vector naming the trait(s) for which GCA is computed; GCA for two or more traits can be computed at the same time.
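In a simple mean-deviation estimate, a parent's GCA is the mean of all F1s sharing that parent minus the grand mean of all F1s. The Python sketch below assumes a balanced factorial design and rows stored as dictionaries with hypothetical keys ("male", "female", "yield"); gcaCompute may use a different estimator, such as a linear model.

```python
from collections import defaultdict


def gca(rows, parent_key, trait):
    """GCA of each parent: mean of its F1s minus the grand mean (hypothetical helper)."""
    grand_mean = sum(r[trait] for r in rows) / len(rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[parent_key]].append(r[trait])
    return {parent: sum(v) / len(v) - grand_mean for parent, v in groups.items()}


rows = [
    {"male": "M1", "female": "F1", "yield": 10},
    {"male": "M1", "female": "F2", "yield": 12},
    {"male": "M2", "female": "F1", "yield": 6},
    {"male": "M2", "female": "F2", "yield": 8},
]
print(gca(rows, "male", "yield"))    # {'M1': 2.0, 'M2': -2.0}
print(gca(rows, "female", "yield"))  # {'F1': -1.0, 'F2': 1.0}
```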

Computation of SCA

scaCompute
Description
Calculates hybrid specific combining ability (SCA) from F1 phenotypic values.
Usage
scaCompute(phe_df,which_male,which_female,trait,seqname)
Arguments
- phe_df Phenotypic data frame; each row represents an F1 combination, and the parental information is stored in columns.
- which_male The column index of the paternal IDs.
- which_female The column index of the maternal IDs.
- trait A character vector naming the trait(s) for which SCA is computed; SCA for two or more traits can be computed at the same time.
- seqname A character vector of hybrid IDs.
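Under the same balanced-design assumption, the SCA of a cross is what remains after removing the grand mean and both parental GCAs: SCA_ij = x_ij - mu - GCA_i - GCA_j. The Python sketch below repeats a GCA helper so it runs on its own; the dictionary keys and function names are hypothetical, and scaCompute may estimate SCA differently.

```python
from collections import defaultdict


def gca(rows, parent_key, trait):
    """GCA of each parent: mean of its F1s minus the grand mean."""
    grand_mean = sum(r[trait] for r in rows) / len(rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[parent_key]].append(r[trait])
    return {parent: sum(v) / len(v) - grand_mean for parent, v in groups.items()}


def sca(rows, male_key, female_key, trait, id_key):
    """SCA of each hybrid: phenotype minus grand mean and both parental GCAs."""
    mu = sum(r[trait] for r in rows) / len(rows)
    g_male = gca(rows, male_key, trait)
    g_female = gca(rows, female_key, trait)
    return {r[id_key]: r[trait] - mu - g_male[r[male_key]] - g_female[r[female_key]]
            for r in rows}


rows = [
    {"hybrid": "M1xF1", "male": "M1", "female": "F1", "yield": 12},
    {"hybrid": "M1xF2", "male": "M1", "female": "F2", "yield": 12},
    {"hybrid": "M2xF1", "male": "M2", "female": "F1", "yield": 6},
    {"hybrid": "M2xF2", "male": "M2", "female": "F2", "yield": 8},
]
print(sca(rows, "male", "female", "yield", "hybrid"))
# {'M1xF1': 0.5, 'M1xF2': -0.5, 'M2xF1': -0.5, 'M2xF2': 0.5}
```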

Distribution correction

reviseFunc
Description
Scales two sets of data to the same distribution, using ori as the reference for correcting aim.
Usage
reviseFunc(ori,aim,cut = 10,sample_names)
Arguments
- ori A numeric vector used as the reference for correction.
- aim A numeric vector to be corrected.
- cut The number of intervals to cut the data into, default 10.
- sample_names A character vector of sample names for aim.
