Tutorial
In this section, the functions in GOVS and their parameters are introduced in detail.
GOVS
GOVS
- Description One-stop function for the complete genome optimization process.
- Usage
GOVS(hmp,ID = NULL,pheno,trait,bins,binsInfo,output,module = "DES",designInfo = NULL)
- Arguments
hmp
The genetic data in hapmap format.
ID
A character array of sample IDs for hmp; if NULL, the hmp data must include a header.
pheno
Phenotypic data frame; the first column contains sample IDs.
trait
The name of the trait of interest (the trait must be included in the pheno data frame).
bins
Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
binsInfo
Data frame including the index, start, end and length of each bin locus.
output
The prefix of output files.
module
Character string specifying the module combination for analysis, default "DES": "D" for the genome optimization module, "E" for the extraction & assembly module and "S" for the statistic module. "D", "E", "S", "DE", "DES" and "ES" are the available combinations. Note that different combinations need different essential inputs; for details see genomeOptimization, extractGenome and statDesign.
designInfo
Data frame, the results of the genome optimization module; it is required for the "ES" and "S" modules.
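The module rules above can be sketched as a small check. This is only an illustration in base R: `checkModule` and `validModules` are hypothetical names, not part of GOVS, and the rule that `designInfo` is required is taken directly from the argument description.

```r
# Hypothetical helper (not part of GOVS): check a module string against the
# combinations listed above and report whether designInfo must be supplied.
validModules <- c("D", "E", "S", "DE", "DES", "ES")

checkModule <- function(module = "DES", designInfo = NULL) {
  if (!module %in% validModules) {
    stop("module must be one of: ", paste(validModules, collapse = ", "))
  }
  # Per the argument description, "ES" and "S" start from existing results.
  needsDesign <- module %in% c("ES", "S")
  if (needsDesign && is.null(designInfo)) {
    stop("designInfo is required for module ", module)
  }
  needsDesign
}

checkModule("DES")                                         # FALSE
checkModule("S", designInfo = data.frame(bin1 = "line1"))  # TRUE
```

Running such a check before a long analysis makes a missing `designInfo` fail early rather than midway through the pipeline.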
Genome optimization module
genomeOptimization
- Description
Virtual genome optimization based on IBD (bins) data.
- Usage
genomeOptimization(pheno, bins, trait, output)
- Arguments
pheno
Phenotypic data frame; the first column contains sample names.
bins
Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
trait
The name of the trait of interest (the trait must be included in the pheno data frame).
output
The prefix of output files describing the scheme of the virtual genome.
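To make the expected input shapes concrete, here is a minimal sketch with made-up data. The trait name `EW`, the line names, the parent codes in `bins` and the helper `checkInputs` are all assumptions for illustration; only the layout (samples in the first column of `pheno`, bins in rows and samples in columns of `bins`) comes from the argument descriptions.

```r
# Toy inputs, made up for illustration only.
pheno <- data.frame(ID = c("line1", "line2", "line3"),
                    EW = c(150.2, 162.8, 141.5))   # EW: hypothetical trait

# 4 bins (rows) x 3 samples (columns); entries are assumed to be parent codes.
bins <- matrix(c(1, 2, 1,
                 2, 2, 1,
                 1, 1, 2,
                 2, 1, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("bin", 1:4), pheno$ID))

# Hypothetical helper (not part of GOVS): basic consistency checks.
checkInputs <- function(pheno, bins, trait) {
  stopifnot(is.data.frame(pheno), is.matrix(bins))
  if (!all(trait %in% colnames(pheno))) stop("trait not found in pheno")
  noPheno <- setdiff(colnames(bins), as.character(pheno[[1]]))
  if (length(noPheno) > 0) {
    stop("samples without phenotype: ", paste(noPheno, collapse = ", "))
  }
  invisible(TRUE)
}

checkInputs(pheno, bins, "EW")
# With GOVS installed, the call documented above would then be:
# genomeOptimization(pheno, bins, trait = "EW", output = "demo")
```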
Extraction and assembly module
extractGenome
- Description
Extracts genome fragments from candidates based on the results of genome optimization, then assembles all fragments to produce the optimized (virtual) genome.
- Usage
extractGenome(hmp,binInfo,ID = NULL,designInfo,output,write = F,bins,extractContent ="Genotype")
- Arguments
hmp
The genetic data in hapmap format.
binInfo
Data frame including the index, start, end and length of each bin locus.
ID
A character array of sample IDs for hmp; if NULL, the hmp data must include a header.
designInfo
Output of genomeOptimization, a matrix of sample IDs giving the fragment source among candidates at each bin locus.
write
Boolean, whether to write the assembled genome to file, default FALSE.
bins
Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
extractContent
Character, the content of the virtual genome: "Bin" for bin source and "Genotype" for genetic data, default "Genotype".
Statistic module
statDesign
- Description Statistical summary of the contribution of all candidates to the optimal genome. The results directly guide line selection and the population improvement route.
- Usage
statDesign(designInfo,binInfo,pheno,trait,output)
- Arguments
designInfo
Output of genomeOptimization, a matrix of sample IDs giving the fragment source among candidates at each bin locus.
binInfo
Data frame including the index, start, end and length of each bin locus.
pheno
Phenotypic data frame; the first column contains sample names.
trait
The name of the trait of interest (the trait must be included in the pheno data frame).
output
The prefix of output files containing the summary and statistical information from the genome optimization process.
Genome selection prediction module
rrBLUP model
SNPrrBLUP
- Description Genotype-to-phenotype prediction via the ridge regression best linear unbiased prediction (rrBLUP) model. The inputs are genotypes.
- Usage
SNPrrBLUP(x,y,fix = NULL,idx1,idx2)
- Arguments
x
Genotypic matrix in numeric format (see transHapmap2numeric); rows represent samples and columns represent features (SNPs).
y
A numeric array of phenotypes.
fix
A matrix containing other variables as fixed effects in the mixed model.
idx1
An array of indices for the training set.
idx2
An array of indices for the testing (predicted) set.
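The roles of `idx1` and `idx2` can be illustrated with a plain ridge regression in base R. This is a sketch of the general idea, not the SNPrrBLUP implementation: the data are simulated, and the shrinkage parameter `lambda` is fixed by hand here, whereas a mixed-model fit would estimate it from the data.

```r
set.seed(1)
n <- 20; p <- 50
x <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)  # samples x SNPs
y <- as.numeric(x[, 1:5] %*% rep(1, 5) + rnorm(n))         # simulated trait

idx1 <- 1:15    # training set
idx2 <- 16:20   # testing (predicted) set

lambda <- 10    # assumed fixed value, for illustration only

# Fit marker effects on the training samples only.
xTrain <- x[idx1, , drop = FALSE]
mu     <- mean(y[idx1])
xc     <- scale(xTrain, center = TRUE, scale = FALSE)
beta   <- solve(crossprod(xc) + lambda * diag(p),
                crossprod(xc, y[idx1] - mu))

# Predict the testing samples, centering with the training column means.
xTest <- sweep(x[idx2, , drop = FALSE], 2, colMeans(xTrain))
pred  <- as.numeric(mu + xTest %*% beta)
length(pred)    # one prediction per index in idx2
```

Keeping the testing samples out of both the centering and the fit is what makes the comparison between `pred` and `y[idx2]` a fair estimate of prediction accuracy.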
GBLUP model
GBLUP
- Description Genotype-to-phenotype prediction via the genomic best linear unbiased prediction (GBLUP) model. The inputs are genotypes.
- Usage
GBLUP(amat,y,idx1,idx2,fix = NULL)
- Arguments
amat
Additive relationship matrix, computed from the genetic matrix.
y
A numeric array of phenotypes.
fix
A matrix containing other variables as fixed effects in the mixed model.
idx1
An array of indices for the training set.
idx2
An array of indices for the testing (predicted) set.
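One common way to obtain an additive relationship matrix from a 0/1/2 genotype matrix is the VanRaden formulation, sketched below in base R. Whether GOVS expects exactly this variant of `amat` is an assumption; the data are simulated and the variable names are illustrative.

```r
set.seed(2)
# 6 samples x 40 SNPs, coded as the count of the minor allele (0, 1, 2).
m <- matrix(sample(0:2, 6 * 40, replace = TRUE), nrow = 6,
            dimnames = list(paste0("line", 1:6), NULL))

p    <- colMeans(m) / 2                   # allele frequency per SNP
z    <- sweep(m, 2, 2 * p)                # center each SNP by 2p
amat <- tcrossprod(z) / (2 * sum(p * (1 - p)))

dim(amat)   # samples x samples
```

The resulting matrix is symmetric with one row and column per sample, so `idx1` and `idx2` index its rows in the same way they index `y`.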
Bins map construction and visualization
Construct IBD map
IBDConstruct
- Description Constructs an IBD map of the contributions from the parents to the progeny lines using a hidden Markov model (HMM).
- Usage
IBDConstruct(snpParents,snpProgeny,markerInfo,q,rou,G,threshold = NULL,omit = T)
- Arguments
snpParents
A matrix of the parents' genotypes, with lines in columns and markers in rows.
snpProgeny
An array of the progeny's genotype; the number of markers must equal that of snpParents.
markerInfo
A matrix or data frame with four columns (marker ID, allele, chromosome and physical position) describing genotypic information.
q
The quality of sequencing, ranging from 0 to 1, used to define the quality of markers.
rou
Correlations between any pair of flanking markers, estimated from the offspring LD level after correction by the parent LD level; it can be obtained from genetic location.
G
The number of generations by which the offspring descended from the parents.
threshold
The threshold for the posterior probability.
omit
Whether to omit untraceable segments, default TRUE.
Visualization of IBD map results
binsPlot
- Description Visualization of IBD map results.
- Usage
binsPlot(IBDRes,color,parentInfo,parentNum)
- Arguments
IBDRes
The results of IBDConstruct (see IBDConstruct).
color
A named vector defining the color of each parent.
parentInfo
A named vector defining the label of each parent.
parentNum
The number of parents.
Visualization of bins data
mosaicPlot
- Description Visualization of overall bins data
- Usage
mosaicPlot(bins,binsInfo,chr,resolution = 500,list,parentNum = 24,color,clust = T,methods = "ward.D2",dist_method = "euclidean")
- Arguments
bins
Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
binsInfo
Data frame including the index, start, end and length of each bin locus.
chr
The chromosome to be used for the mosaic plot.
resolution
The resolution of the mosaic plot, default 500.
list
The names of the lines to visualize in the mosaic plot.
parentNum
The number of parents; if color is not defined, this parameter is used to generate a color palette automatically.
color
An array defining the color palette.
clust
Boolean determining whether lines should be clustered.
methods
The clustering method used.
dist_method
The distance measure to be used for clustering.
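The defaults `methods = "ward.D2"` and `dist_method = "euclidean"` match options of the base R functions `hclust` and `dist`. Assuming mosaicPlot orders lines in that way (the document does not state this), the clustering step can be sketched on a made-up bins matrix as follows.

```r
set.seed(3)
# Toy bins matrix: 30 bins (rows) x 6 lines (columns), entries are
# made-up parent codes; treating them as numbers is an assumption.
bins <- matrix(sample(1:4, 30 * 6, replace = TRUE), nrow = 30,
               dimnames = list(NULL, paste0("line", 1:6)))

# Lines are in columns, so transpose before computing pairwise distances.
d  <- dist(t(bins), method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Order in which the lines would appear along the clustered axis.
lineOrder <- hc$labels[hc$order]
lineOrder
```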
Other useful functions
Transform character to number
transHapmap2numeric
- Description This function helps users transform a genetic matrix from character format to numeric format: AA is coded as 0, Aa as 1 and aa as 2, where A is the major allele and a is the minor allele.
- Usage
transHapmap2numeric(G)
- Arguments
G
Genetic matrix of character, row represents sample and column represents SNP.
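The coding rule above can be sketched in base R as follows. `toNumeric` is a hypothetical stand-in written for illustration, not the transHapmap2numeric source; it assumes two-character genotype calls, does not handle missing calls, and breaks ties in allele frequency arbitrarily.

```r
# Hypothetical helper (not part of GOVS): code each genotype as the number
# of non-major alleles it carries, per SNP column.
toNumeric <- function(G) {
  apply(G, 2, function(snp) {
    calls  <- strsplit(snp, "")
    counts <- sort(table(unlist(calls)), decreasing = TRUE)
    major  <- names(counts)[1]                  # most frequent allele
    vapply(calls, function(a) sum(a != major), numeric(1))
  })
}

# Rows are samples, columns are SNPs (toy data).
G <- matrix(c("AA", "AG", "GG", "AA",
              "CC", "CC", "CT", "TT"),
            nrow = 4,
            dimnames = list(paste0("s", 1:4), c("snp1", "snp2")))

toNumeric(G)
# snp1: 0 1 2 0 (major allele A); snp2: 0 0 1 2 (major allele C)
```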
Computation of GCA
gcaCompute
- Description Calculate parental general combining ability (GCA) based on F1 phenotypic values.
- Usage
gcaCompute(phe_df,which,trait)
- Arguments
phe_df
Phenotypic data frame; each row represents an F1 combination and the columns include the parental information.
which
The column index of the male or female parent, used to compute paternal or maternal GCA.
trait
A character string defining the trait for which GCA will be computed; this function supports computing GCA for two or more traits at the same time.
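As a worked illustration, the textbook definition of GCA is the mean of a parent's hybrids minus the overall mean. The sketch below applies it to made-up data; `gcaSketch` is a hypothetical helper and may differ from what gcaCompute does internally.

```r
# Toy F1 data, made up for illustration.
phe_df <- data.frame(
  female = c("F1", "F1", "F2", "F2"),
  male   = c("M1", "M2", "M1", "M2"),
  yield  = c(10, 12, 14, 20)
)

# Hypothetical helper (not part of GOVS): textbook GCA for one parent column.
gcaSketch <- function(phe_df, which, trait) {
  parent <- as.character(phe_df[[which]])
  tapply(phe_df[[trait]], parent, mean) - mean(phe_df[[trait]])
}

gcaSketch(phe_df, which = 2, trait = "yield")  # males:   M1 = -2, M2 = 2
gcaSketch(phe_df, which = 1, trait = "yield")  # females: F1 = -3, F2 = 3
```

Here the overall mean is 14, so for example M1, whose hybrids average 12, gets a GCA of -2.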
Computation of SCA
scaCompute
- Description Calculate hybrid specific combining ability (SCA) based on F1 phenotypic values.
- Usage
scaCompute(phe_df,which_male,which_female,trait,seqname)
- Arguments
phe_df
Phenotypic data frame; each row represents an F1 combination and the columns include the parental information.
which_male
The column index of paternal IDs.
which_female
The column index of maternal IDs.
trait
A character string defining the trait for which SCA will be computed; this function supports computing SCA for two or more traits at the same time.
seqname
A character array of hybrid IDs.
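For a balanced design, the textbook SCA of a hybrid is its value minus the overall mean and minus the GCA of both parents. The sketch below uses made-up data and a hypothetical helper `scaSketch`; it illustrates the definition and is not the scaCompute source.

```r
# Toy F1 data, made up for illustration.
phe_df <- data.frame(
  female = c("F1", "F1", "F2", "F2"),
  male   = c("M1", "M2", "M1", "M2"),
  yield  = c(10, 12, 14, 20)
)

# Hypothetical helper (not part of GOVS): textbook SCA for a balanced design.
scaSketch <- function(phe_df, which_male, which_female, trait) {
  y      <- phe_df[[trait]]
  grand  <- mean(y)
  male   <- as.character(phe_df[[which_male]])
  female <- as.character(phe_df[[which_female]])
  gcaMale   <- tapply(y, male, mean) - grand
  gcaFemale <- tapply(y, female, mean) - grand
  # Hybrid value minus overall mean minus both parental GCAs.
  as.numeric(y - grand - gcaMale[male] - gcaFemale[female])
}

sca <- scaSketch(phe_df, which_male = 2, which_female = 1, trait = "yield")
names(sca) <- paste(phe_df$female, phe_df$male, sep = "x")
sca  # F1xM1 = 1, F1xM2 = -1, F2xM1 = -1, F2xM2 = 1
```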
Distribution correction
reviseFunc
- Description Scale two sets of data to a uniform distribution.
- Usage
reviseFunc(ori,aim,cut = 10,sample_names)
- Arguments
ori
A numeric array, used as the reference for correction.
aim
A numeric array, the object to be corrected.
cut
Number of intervals to cut, default 10.
sample_names
A character array consisting of the names of aim.