Tutorial
This section introduces the functions in GOVS and their parameters in detail.
GOVS
GOVS
- Description
  One-stop function for the complete genome optimization process.
- Usage
  GOVS(hmp, ID = NULL, pheno, trait, bins, binsInfo, output, module = "DES", designInfo = NULL)
- Arguments
  hmp: The genetic data in hapmap format.
  ID: A character array of sample IDs for hmp. If NULL, the hmp data must include a header.
  pheno: Phenotypic data frame; the first column holds sample IDs.
  trait: The name of the trait of interest (must be a column of the pheno data frame).
  bins: Results of IBD analysis (bins matrix); each row is a bin fragment and each column is a sample.
  binsInfo: Data frame with the index, start, end and length of each bin locus.
  output: The prefix of output files.
  module: Character giving the module combination to run, default "DES". "D" is the genome optimization module, "E" the extraction and assembly module, and "S" the statistic module. Valid combinations are "D", "E", "S", "DE", "DES" and "ES". Different combinations need different inputs; see genomeOptimization, extractGenome and statDesign for details.
  designInfo: Data frame holding the results of the genome optimization module; required for the "ES" and "S" modules.
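The module rules described for GOVS can be checked before a long run is started. The sketch below is a minimal illustration in base R; `checkModule` is a hypothetical helper, not a function exported by GOVS, and it only enforces the two rules stated in the documentation (the list of valid combinations, and `designInfo` being required for "ES" and "S").

```r
# Hypothetical helper, not part of GOVS: validate a module string up front.
checkModule <- function(module = "DES", designInfo = NULL) {
  valid <- c("D", "E", "S", "DE", "DES", "ES")
  if (!module %in% valid) {
    stop("module must be one of: ", paste(valid, collapse = ", "))
  }
  # "ES" and "S" skip the optimization step, so they need its result as input.
  # Only the two cases named in the documentation are checked here.
  if (module %in% c("ES", "S") && is.null(designInfo)) {
    stop("designInfo is required when module is \"", module, "\"")
  }
  # Split the string into its stages, e.g. "DES" -> "D" "E" "S".
  strsplit(module, "")[[1]]
}

stages <- checkModule("DES")
```

Calling `checkModule("S")` without a `designInfo` stops with an error, which mirrors the requirement stated for the `designInfo` argument.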
Genome optimization module
genomeOptimization
- Description
  Virtual genome optimization based on IBD (bins) data.
- Usage
  genomeOptimization(pheno, bins, trait, output)
- Arguments
  pheno: Phenotypic data frame; the first column holds sample names.
  bins: Results of IBD analysis (bins matrix); each row is a bin fragment and each column is a sample.
  trait: The name of the trait of interest (must be a column of the pheno data frame).
  output: The prefix of the output files describing the scheme of the virtual genome.
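To make the expected input layout concrete, here is a small sketch that builds toy `pheno` and `bins` objects and checks that they agree with each other. All values are made up, and the assumption that the column names of `bins` carry the sample IDs is ours; the documentation only says that each column is a sample.

```r
# Toy inputs following the layout described above (all values are made up).
pheno <- data.frame(
  ID    = c("L1", "L2", "L3"),
  yield = c(5.2, 6.1, 4.8),
  stringsAsFactors = FALSE
)
# 4 bins (rows) x 3 samples (columns); entries are arbitrary parental codes.
# Assumption: sample IDs are stored as column names.
bins <- matrix(
  c(1, 2, 1,
    1, 1, 2,
    2, 2, 1,
    2, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("bin", 1:4), c("L1", "L2", "L3"))
)
trait <- "yield"

# Sanity checks worth running before the objects are passed on.
traitOk  <- trait %in% colnames(pheno)
sampleOk <- setequal(pheno[[1]], colnames(bins))
```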
Extraction and assembly module
extractGenome
- Description
  Extracts genome fragments from the candidates based on the results of genome optimization, then assembles all fragments to produce the optimized (virtual) genome.
- Usage
  extractGenome(hmp, binInfo, ID = NULL, designInfo, output, write = F, bins, extractContent = "Genotype")
- Arguments
  hmp: The genetic data in hapmap format.
  binInfo: Data frame with the index, start, end and length of each bin locus.
  ID: A character array of sample IDs for hmp. If NULL, the hmp data must include a header.
  designInfo: Output of genomeOptimization, a matrix of sample IDs giving the fragment source among the candidates at each bin locus.
  output: The prefix of output files.
  write: Boolean, whether to write the assembled genome to file, default FALSE.
  bins: Results of IBD analysis (bins matrix); each row is a bin fragment and each column is a sample.
  extractContent: Character, the content of the virtual genome: "Bin" for bin source or "Genotype" for genetic data, default "Genotype".
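The extract-and-assemble idea can be illustrated with a few lines of base R. This is a simplified sketch, not the extractGenome implementation: it assumes one donor line per bin (a plain character vector rather than the designInfo matrix) and a bin table whose `start` and `end` are marker row indices rather than physical positions.

```r
# Toy genotype matrix: 6 markers (rows) x 3 candidate lines (columns).
geno <- matrix(
  c("AA", "GG", "AA",
    "CC", "CC", "TT",
    "GG", "AA", "GG",
    "TT", "TT", "CC",
    "AA", "GG", "GG",
    "CC", "TT", "CC"),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("m", 1:6), c("L1", "L2", "L3"))
)
# Simplified bin table: marker row range covered by each bin (assumption).
binInfo <- data.frame(bin = 1:3, start = c(1, 3, 5), end = c(2, 4, 6))
# Simplified design: one donor line per bin (assumption).
donor <- c("L1", "L2", "L3")

# Copy each bin's marker block from its donor and concatenate the blocks.
virtual <- unlist(lapply(seq_len(nrow(binInfo)), function(i) {
  rows <- binInfo$start[i]:binInfo$end[i]
  geno[rows, donor[i]]
}))
```

The result is one genotype call per marker, taken from L1 for the first bin, L2 for the second and L3 for the third.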
Statistic module
statDesign
- Description
  Statistical summary of the contribution of each candidate to the optimal genome. The results directly guide line selection and the population improvement route.
- Usage
  statDesign(designInfo, binInfo, pheno, trait, output)
- Arguments
  designInfo: Output of genomeOptimization, a matrix of sample IDs giving the fragment source among the candidates at each bin locus.
  binInfo: Data frame with the index, start, end and length of each bin locus.
  pheno: Phenotypic data frame; the first column holds sample names.
  trait: The name of the trait of interest (must be a column of the pheno data frame).
  output: The prefix of the output files holding the summary and statistical information from the genome optimization process.
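As a rough picture of what a contribution summary looks like, the sketch below counts how many bins each candidate donates and what share of the assembled genome that represents. It is an illustration only: the one-donor-per-bin vector and the bin lengths are made up, and the statistics that statDesign actually reports may differ.

```r
# Simplified design: the donor line chosen at each of 6 bins (assumption).
donor <- c("L1", "L2", "L1", "L3", "L1", "L2")
binLength <- c(10, 20, 30, 10, 20, 10)  # made-up bin lengths

# Number of bins contributed by each candidate.
binCount <- table(donor)
# Share of the assembled genome (by length) contributed by each candidate.
lengthShare <- vapply(split(binLength, donor), sum, numeric(1)) / sum(binLength)
# Candidates ranked from largest to smallest contribution.
ranking <- names(sort(lengthShare, decreasing = TRUE))
```

Candidates at the top of `ranking` are the ones a breeder would look at first when choosing lines.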
Genomic selection prediction module
rrBLUP model
SNPrrBLUP
- Description
  Genotype-to-phenotype prediction via the ridge regression best linear unbiased prediction (rrBLUP) model. The input is genotypes.
- Usage
  SNPrrBLUP(x, y, fix = NULL, idx1, idx2)
- Arguments
  x: Genotypic matrix in numeric format (see transHapmap2numeric); rows are samples and columns are features (SNPs).
  y: A numeric array of phenotypes.
  fix: A matrix containing other variables to include as fixed effects in the mixed model.
  idx1: An array of indices for the training set.
  idx2: An array of indices for the testing (predicted) set.
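The roles of `x`, `y`, `idx1` and `idx2` are easiest to see in a small ridge regression example. The sketch below uses simulated data and a fixed shrinkage value `lambda`; that value is an assumption made for illustration, since an rrBLUP fit normally estimates the shrinkage from the data, and the code is not the SNPrrBLUP implementation. Fixed effects (`fix`) are left out.

```r
set.seed(1)
n <- 40; p <- 100
# Simulated genotypes coded 0/1/2 (samples in rows, SNPs in columns).
x <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)
beta <- rnorm(p, sd = 0.2)
y <- as.vector(x %*% beta + rnorm(n))

idx1 <- 1:30   # training set
idx2 <- 31:40  # testing set
lambda <- 10   # fixed shrinkage (assumption; rrBLUP estimates it from the data)

# Center with training-set means only, so the test set stays unseen.
mu <- mean(y[idx1])
xm <- colMeans(x[idx1, ])
xTrain <- sweep(x[idx1, ], 2, xm)
xTest  <- sweep(x[idx2, ], 2, xm)

# Ridge solution for the marker effects: (X'X + lambda * I)^-1 X'y
effects <- solve(crossprod(xTrain) + lambda * diag(p),
                 crossprod(xTrain, y[idx1] - mu))
pred <- as.vector(mu + xTest %*% effects)
accuracy <- cor(pred, y[idx2])
```

`accuracy` is the correlation between predicted and observed phenotypes in the testing set, a common way to report prediction performance.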
GBLUP model
GBLUP
- Description
  Genotype-to-phenotype prediction via the genomic best linear unbiased prediction (GBLUP) model. The input is genotypes.
- Usage
  GBLUP(amat, y, idx1, idx2, fix = NULL)
- Arguments
  amat: Additive relationship matrix, computed from the genotype matrix.
  y: A numeric array of phenotypes.
  fix: A matrix containing other variables to include as fixed effects in the mixed model.
  idx1: An array of indices for the training set.
  idx2: An array of indices for the testing (predicted) set.
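The documentation does not say how `amat` should be computed. One widely used choice is the VanRaden genomic relationship matrix, sketched below on simulated 0/1/2 genotypes, followed by a simple prediction step. Both the VanRaden formula as the intended construction and the fixed variance ratio `lambda` are assumptions for illustration; the GBLUP function may estimate variance components differently.

```r
set.seed(2)
n <- 30; p <- 200
# Simulated genotypes coded as 0/1/2 copies of one allele (samples x SNPs).
x <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
y <- rnorm(n)  # placeholder phenotype, used only to show the shapes

# VanRaden-style additive relationship matrix (assumed construction of amat).
pFreq <- colMeans(x) / 2
z <- sweep(x, 2, 2 * pFreq)
amat <- tcrossprod(z) / (2 * sum(pFreq * (1 - pFreq)))

idx1 <- 1:20; idx2 <- 21:30
lambda <- 1  # assumed ratio of residual to genetic variance

# BLUP of test-set genetic values from the training phenotypes.
mu <- mean(y[idx1])
k <- amat[idx1, idx1] + lambda * diag(length(idx1))
pred <- as.vector(mu + amat[idx2, idx1] %*% solve(k, y[idx1] - mu))
```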
Bins map construction and visualization
Construct IBD map
IBDConstruct
- Description
  Constructs an IBD map of the parental contributions to the progeny lines using a hidden Markov model (HMM).
- Usage
  IBDConstruct(snpParents, snpProgeny, markerInfo, q, rou, G, threshold = NULL, omit = T)
- Arguments
  snpParents: A matrix of the parents' genotypes, with lines in columns and markers in rows.
  snpProgeny: An array of the progeny's genotype; the number of markers must equal that of snpParents.
  markerInfo: A matrix or data frame with four columns (marker ID, allele, chromosome and physical position) describing the markers.
  q: The quality of sequencing, ranging from 0 to 1, used to define marker quality.
  rou: Correlation between each pair of flanking markers, estimated from the offspring LD level after correction by the parent LD level; it can be obtained from the genetic location.
  G: The number of generations by which the offspring descend from the parents.
  threshold: The threshold on the posterior probability.
  omit: Whether to omit untraceable segments, default TRUE.
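For readers unfamiliar with HMM-based parent tracing, the toy example below runs a forward-backward pass over six markers and two parents. It is only a sketch of the general technique: treating `q` as the chance that an observed allele is called correctly, and replacing whatever IBDConstruct derives from `rou` and `G` with a single fixed `stay` probability, are both assumptions made here, not the package's model.

```r
# Toy data: two parents and one progeny at 6 markers, alleles coded 0/1.
snpParents <- cbind(P1 = c(0, 0, 0, 0, 0, 0), P2 = c(1, 1, 1, 1, 1, 1))
snpProgeny <- c(0, 0, 0, 1, 1, 1)
q <- 0.95     # assumed here: chance an observed allele is called correctly
stay <- 0.9   # assumed chance of keeping the same parent between markers
nState <- ncol(snpParents); nMarker <- nrow(snpParents)

# Emission: q when the progeny allele matches the parent's, 1 - q otherwise.
emit <- ifelse(snpParents == snpProgeny, q, 1 - q)
# Transition matrix between parental states.
trans <- matrix((1 - stay) / (nState - 1), nState, nState)
diag(trans) <- stay

# Forward and backward passes, rescaled at each marker for stability.
fwd <- bwd <- matrix(0, nMarker, nState)
fwd[1, ] <- emit[1, ] / sum(emit[1, ])
for (m in 2:nMarker) {
  f <- (fwd[m - 1, ] %*% trans) * emit[m, ]
  fwd[m, ] <- f / sum(f)
}
bwd[nMarker, ] <- 1
for (m in (nMarker - 1):1) {
  b <- trans %*% (emit[m + 1, ] * bwd[m + 1, ])
  bwd[m, ] <- b / sum(b)
}
# Posterior probability of each parent at each marker.
posterior <- fwd * bwd
posterior <- posterior / rowSums(posterior)
origin <- colnames(snpParents)[max.col(posterior)]
```

In this toy case the first three markers are traced to P1 and the last three to P2. A posterior threshold, as in the `threshold` argument, would typically be applied to `posterior` to leave low-confidence markers unassigned.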
Visualization of IBD map results
binsPlot
- Description
  Visualization of IBD map results.
- Usage
  binsPlot(IBDRes, color, parentInfo, parentNum)
- Arguments
  IBDRes: The results of IBDConstruct; see IBDConstruct.
  color: A named vector defining the color of each parent.
  parentInfo: A named vector defining the label of each parent.
  parentNum: The number of parents.
Visualization of bins data
mosaicPlot
- Description
  Visualization of the overall bins data.
- Usage
  mosaicPlot(bins, binsInfo, chr, resolution = 500, list, parentNum = 24, color, clust = T, methods = "ward.D2", dist_method = "euclidean")
- Arguments
  bins: Results of IBD analysis (bins matrix); each row is a bin fragment and each column is a sample.
  binsInfo: Data frame with the index, start, end and length of each bin locus.
  chr: The chromosome to plot as a mosaic.
  resolution: The resolution of the mosaic plot, default 500.
  list: The names of the lines to show in the mosaic plot.
  parentNum: The number of parents; if color is not defined, this is used to generate a color palette automatically.
  color: An array defining the color palette.
  clust: Boolean, whether the lines should be ordered by hierarchical clustering (hclust), default TRUE.
  methods: The clustering method to use, default "ward.D2".
  dist_method: The distance measure used for clustering, default "euclidean".
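The `clust`, `methods` and `dist_method` arguments correspond to options of the base R functions `dist` and `hclust`. The sketch below shows how lines could be ordered with those defaults on a toy bins matrix; it assumes that mosaicPlot clusters the lines (columns) in this way, which the documentation suggests but does not spell out.

```r
set.seed(3)
# Toy bins matrix: 50 bins (rows) x 8 lines (columns), entries are parent codes 1..4.
bins <- matrix(sample(1:4, 50 * 8, replace = TRUE), nrow = 50,
               dimnames = list(NULL, paste0("line", 1:8)))

# Lines are columns, so transpose before computing pairwise distances.
d <- dist(t(bins), method = "euclidean")
hc <- hclust(d, method = "ward.D2")
# Column order that places similar lines next to each other.
lineOrder <- colnames(bins)[hc$order]
```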
Other useful functions
Transform character to number
transHapmap2numeric
- Description
  Transforms a genetic matrix from character format to numeric format: AA is coded as 0, Aa as 1 and aa as 2, where A is the major allele and a is the minor allele.
- Usage
  transHapmap2numeric(G)
- Arguments
  G: Genetic matrix of characters; rows are samples and columns are SNPs.
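The 0/1/2 coding can be reproduced by hand on a toy matrix, which helps when checking the output of a conversion. The helper `encodeSnp` below is hypothetical and is not the transHapmap2numeric implementation; it assumes two-letter genotype strings with no missing calls and, when two alleles are equally frequent, treats the alphabetically first one as minor.

```r
# Toy character genotypes: 4 samples (rows) x 3 SNPs (columns).
G <- matrix(
  c("AA", "CC", "GG",
    "AG", "CC", "GT",
    "AA", "CT", "TT",
    "AA", "CC", "GG"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("snp", 1:3))
)

# Hypothetical helper: recode one SNP column as copies of the minor allele.
encodeSnp <- function(g) {
  alleles <- unlist(strsplit(g, ""))
  counts <- table(alleles)
  # With a single allele present, every genotype is the major homozygote.
  if (length(counts) < 2) return(rep(0, length(g)))
  minor <- names(counts)[which.min(counts)]
  vapply(strsplit(g, ""), function(a) sum(a == minor), numeric(1))
}

numericG <- apply(G, 2, encodeSnp)
rownames(numericG) <- rownames(G)
```

For `snp3`, G is the major allele and T the minor one, so GG, GT, TT and GG become 0, 1, 2 and 0.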
Computation of GCA
gcaCompute
- Description
  Calculates parental general combining ability (GCA) based on F1 phenotypic values.
- Usage
  gcaCompute(phe_df, which, trait)
- Arguments
  phe_df: Phenotypic data frame; each row is an F1 combination, and the parental information is stored in columns.
  which: The column index of the male or female parent, used to compute paternal or maternal GCA.
  trait: A character string naming the trait for which GCA is computed; GCA for two or more traits can be computed at the same time.
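A common textbook definition takes the GCA of a parent as the mean of its hybrids minus the overall mean. The sketch below applies that definition to a toy F1 table; `gcaSketch` is a hypothetical function written for illustration, and gcaCompute may use a different estimator.

```r
# Toy F1 table: each row is one hybrid with its parents and a trait value.
phe_df <- data.frame(
  male   = c("M1", "M1", "M2", "M2"),
  female = c("F1", "F2", "F1", "F2"),
  yield  = c(10, 12, 14, 16),
  stringsAsFactors = FALSE
)

# Hypothetical helper: GCA of a parent = mean of its hybrids - overall mean.
gcaSketch <- function(phe_df, which, trait) {
  grand <- mean(phe_df[[trait]])
  tapply(phe_df[[trait]], phe_df[[which]], mean) - grand
}

gcaMale   <- gcaSketch(phe_df, which = 1, trait = "yield")
gcaFemale <- gcaSketch(phe_df, which = 2, trait = "yield")
```

Here the overall mean is 13, so M1 (hybrid mean 11) gets a GCA of -2 and M2 (hybrid mean 15) gets +2.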
Computation of SCA
scaCompute
- Description
  Calculates hybrid specific combining ability (SCA) based on F1 phenotypic values.
- Usage
  scaCompute(phe_df, which_male, which_female, trait, seqname)
- Arguments
  phe_df: Phenotypic data frame; each row is an F1 combination, and the parental information is stored in columns.
  which_male: The column index of the paternal IDs.
  which_female: The column index of the maternal IDs.
  trait: A character string naming the trait for which SCA is computed; SCA for two or more traits can be computed at the same time.
  seqname: A character array of hybrid IDs.
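Under the usual additive decomposition, the SCA of a hybrid is what remains of its phenotype after removing the overall mean and the GCA of both parents. The toy example below works through that arithmetic; it is a sketch of the textbook definition, not necessarily the estimator used by scaCompute.

```r
# Toy F1 table: each row is one hybrid with its parents and a trait value.
phe_df <- data.frame(
  male   = c("M1", "M1", "M2", "M2"),
  female = c("F1", "F2", "F1", "F2"),
  yield  = c(10, 13, 15, 14),
  stringsAsFactors = FALSE
)
trait <- "yield"
grand <- mean(phe_df[[trait]])

# Parental GCA: mean of each parent's hybrids minus the overall mean.
gcaMale   <- tapply(phe_df[[trait]], phe_df$male, mean) - grand
gcaFemale <- tapply(phe_df[[trait]], phe_df$female, mean) - grand

# SCA: hybrid value minus overall mean minus both parental GCAs.
sca <- phe_df[[trait]] - grand -
  as.vector(gcaMale[phe_df$male]) - as.vector(gcaFemale[phe_df$female])
names(sca) <- paste(phe_df$male, phe_df$female, sep = "x")
```

With an overall mean of 13, the four hybrids get SCA values of -1, 1, 1 and -1, which sum to zero as expected in a balanced design.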
Distribution correction
reviseFunc
- Description
  Scale two sets of data to a uniform distribution.
- Usage
  reviseFunc(ori, aim, cut = 10, sample_names)
- Arguments
  ori: A numeric array used as the reference for correction.
  aim: A numeric array to be corrected.
  cut: The number of intervals to cut the data into, default 10.
  sample_names: A character array consisting of the names of aim.