Tutorial

Function introduction and parameter information


In this section, the functions in GOVS and their parameters are introduced in detail.

GOVS

  • GOVS
  • Description
    One-stop function for the complete process of genome optimization.
  • Usage
    GOVS(hmp,ID = NULL,pheno,trait,bins,binsInfo,output,module = "DES",designInfo = NULL)
  • Arguments
    • hmp The genetic data in hapmap format.
    • ID A character array of sample IDs for hmp. If NULL, the hmp data must include a header.
    • pheno Phenotypic data frame; the first column contains sample IDs.
    • trait The name of the trait of interest (the trait must be included in the pheno data frame).
    • bins Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
    • binsInfo Data frame including the bin index, start, end, and length of each bin locus.
    • output The prefix of output files.
    • module Character string giving the module combination to run, default "DES": "D" for the genome optimization module, "E" for the extraction and assembly module, and "S" for the statistic module. Valid combinations are "D", "E", "S", "DE", "DES" and "ES". Note that different combinations require different essential inputs; for details see genomeOptimization, extractGenome and statDesign.
    • designInfo Data frame with the results of the genome optimization module; required for the "ES" and "S" modules.
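
The module rules above can be sketched as a small check. This is only an illustration: `checkModule` is a hypothetical helper, not a GOVS function, and it encodes nothing beyond what the argument list states (the valid combinations, and that designInfo is needed for "ES" and "S").

```r
# Hypothetical helper (not part of GOVS): validates a module string and reports
# whether designInfo has to be supplied, following the argument notes above.
checkModule <- function(module = "DES") {
  valid <- c("D", "E", "S", "DE", "DES", "ES")
  if (!module %in% valid) {
    stop("module must be one of: ", paste(valid, collapse = ", "))
  }
  # "ES" and "S" do not run the optimization module, so its results
  # (designInfo) have to be passed in by the user.
  list(module = module, needsDesignInfo = module %in% c("ES", "S"))
}

checkModule("DES")$needsDesignInfo
checkModule("ES")$needsDesignInfo
```

Running a check like this before calling GOVS would catch a missing designInfo early instead of partway through an analysis.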

Genome optimization module

  • genomeOptimization
  • Description
    Virtual genome optimization based on IBD (bins) data.
  • Usage
    genomeOptimization(pheno, bins, trait, output)
  • Arguments
    • pheno Phenotypic data frame; the first column contains sample names.
    • bins Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
    • trait The name of the trait of interest (the trait must be included in the pheno data frame).
    • output The prefix of the output files describing the virtual genome scheme.

Extraction and assembly module

  • extractGenome
  • Description
    Extracts genome fragments from the candidates based on the results of genome optimization, then assembles all fragments to produce the optimized (virtual) genome.
  • Usage
    extractGenome(hmp,binInfo,ID = NULL,designInfo,output,write = F,bins,extractContent ="Genotype")
  • Arguments
    • hmp The genetic data in hapmap format.
    • binInfo Data frame including the bin index, start, end, and length of each bin locus.
    • ID A character array of sample IDs for hmp. If NULL, the hmp data must include a header.
    • designInfo Output of genomeOptimization: a matrix of sample IDs giving the fragment source among the candidates at each bin locus.
    • write Boolean, whether to write the assembled genome to a file, default FALSE.
    • bins Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
    • extractContent Character, the content of the virtual genome: "Bin" for the bin source or "Genotype" for genetic data, default "Genotype".

Statistic module

  • statDesign
  • Description
    Statistical summary of the contribution of all candidates to the optimal genome. The results directly guide line selection and the population improvement route.
  • Usage
    statDesign(designInfo,binInfo,pheno,trait,output)
  • Arguments
    • designInfo Output of genomeOptimization: a matrix of sample IDs giving the fragment source among the candidates at each bin locus.
    • binInfo Data frame including the bin index, start, end, and length of each bin locus.
    • pheno Phenotypic data frame; the first column contains sample names.
    • trait The name of the trait of interest (the trait must be included in the pheno data frame).
    • output The prefix of the output files containing the summary and statistical information from the genome optimization process.

Genomic selection prediction module

rrBLUP model
  • SNPrrBLUP
  • Description
    Genotype-to-phenotype prediction via the ridge regression best linear unbiased prediction (rrBLUP) model. The input is genotypes.
  • Usage
    SNPrrBLUP(x,y,fix = NULL,idx1,idx2)
  • Arguments
    • x Genotypic matrix in numeric format (see transHapmap2numeric); rows represent samples and columns represent features (SNPs).
    • y A numeric array of phenotypes.
    • fix A matrix containing other variables to include as fixed effects in the mixed model.
    • idx1 An array of indices for the training set.
    • idx2 An array of indices for the testing (predicted) set.
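
To make the roles of x, y, idx1 and idx2 concrete, the following base R sketch splits simulated data into training and testing indices and predicts the testing set with plain ridge regression. It is not the SNPrrBLUP implementation: the data are simulated, and the penalty `lambda` is fixed by hand as an assumption, whereas rrBLUP-type models normally estimate it from variance components.

```r
set.seed(1)
n <- 60   # samples
p <- 200  # SNPs
x <- matrix(sample(0:2, n * p, replace = TRUE), n, p)  # numeric genotypes
y <- as.numeric(x %*% rnorm(p, sd = 0.1) + rnorm(n))   # simulated phenotype

idx1 <- sample(n, 45)               # training set indices
idx2 <- setdiff(seq_len(n), idx1)   # testing (predicted) set indices

lambda <- 10  # assumption: fixed penalty chosen for illustration only

# Center markers and phenotype using the training set only.
xc <- scale(x[idx1, ], center = TRUE, scale = FALSE)
mu <- mean(y[idx1])

# Ridge marker effects in dual form: u = X' (XX' + lambda I)^-1 (y - mu)
alpha <- solve(tcrossprod(xc) + lambda * diag(length(idx1)), y[idx1] - mu)
u <- crossprod(xc, alpha)

# Predict the testing set with the training-set centering.
xTest <- sweep(x[idx2, ], 2, attr(xc, "scaled:center"))
pred <- mu + as.numeric(xTest %*% u)

cor(pred, y[idx2])  # predictive correlation on the held-out lines
```

The dual form is used because the number of SNPs usually exceeds the number of training samples, so the matrix to invert is the smaller one.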
GBLUP model
  • GBLUP
  • Description
    Genotype-to-phenotype prediction via the genomic best linear unbiased prediction (GBLUP) model. The input is an additive relationship matrix computed from genotypes.
  • Usage
    GBLUP(amat,y,idx1,idx2,fix = NULL)
  • Arguments
    • amat Additive relationship matrix, computed from the genetic matrix.
    • y A numeric array of phenotypes.
    • fix A matrix containing other variables to include as fixed effects in the mixed model.
    • idx1 An array of indices for the training set.
    • idx2 An array of indices for the testing (predicted) set.
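
The text does not say how amat should be computed. One common choice, shown here purely as an assumption, is the VanRaden estimator on a 0/1/2 genotype matrix; the matrix `M` below is simulated for illustration.

```r
set.seed(2)
# Simulated numeric genotypes: 5 samples (rows) by 50 SNPs (columns), coded 0/1/2.
M <- matrix(sample(0:2, 5 * 50, replace = TRUE), 5, 50)

p <- colMeans(M) / 2        # frequency of the counted allele at each SNP
Z <- sweep(M, 2, 2 * p)     # center each SNP by twice its allele frequency

# VanRaden-style additive relationship matrix (an assumption, not
# necessarily what the package expects).
amat <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

dim(amat)
```

The result is a samples-by-samples symmetric matrix, which is the shape the amat argument describes.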

Bins map construction and visualization

Construct IBD map
  • IBDConstruct
  • Description
    Constructs an IBD map of the contributions from the parents to the progeny lines using a hidden Markov model (HMM).
  • Usage
    IBDConstruct(snpParents,snpProgeny,markerInfo,q,rou,G,threshold = NULL,omit = T)
  • Arguments
    • snpParents A matrix of the parents' genotypes, with lines in columns and markers in rows.
    • snpProgeny An array of the progeny's genotype; the number of markers must equal that of snpParents.
    • markerInfo A matrix or data frame with four columns (marker ID, allele, chromosome and physical position) describing the genotypic information.
    • q The sequencing quality, ranging from 0 to 1, used to define marker quality.
    • rou Correlation between each pair of flanking markers, estimated from the offspring LD level after correction by the parent LD level; it can be obtained from genetic location.
    • G Number of generations by which the offspring descended from the parents.
    • threshold The threshold on the posterior probability.
    • omit Whether to omit untraceable segments, default TRUE.

Visualization of IBD map results
  • binsPlot
  • Description
    Visualization of IBD map results.
  • Usage
    binsPlot(IBDRes,color,parentInfo,parentNum)
  • Arguments
    • IBDRes The results of IBDConstruct, see IBDConstruct.
    • color A named vector for defining color of parents.
    • parentInfo A named vector for defining label of parents.
    • parentNum The number of parents.
Visualization of bins data
  • mosaicPlot
  • Description
    Visualization of overall bins data
  • Usage
    mosaicPlot(bins,binsInfo,chr,resolution = 500,list,parentNum = 24,color,clust = T,methods = "ward.D2",dist_method = "euclidean")
  • Arguments
    • bins Results of IBD analysis (bins matrix); each row represents a bin fragment and each column represents a sample.
    • binsInfo Data frame including the bin index, start, end, and length of each bin locus.
    • chr The chromosome to plot in the mosaic.
    • resolution The resolution of the mosaic plot, default 500.
    • list The names of the lines to show in the mosaic plot.
    • parentNum The number of parents. If color is not defined, this parameter is used to generate a color palette automatically.
    • color An array defining the color palette.
    • clust Boolean determining whether the lines should be clustered.
    • methods Clustering method used.
    • dist_method The distance measure to be used for clustering.
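
The default values of methods and dist_method match options of the base R functions `hclust` and `dist`. The sketch below shows what clustering lines with those settings looks like in base R; the toy bins matrix (entries taken to be parent indices) is an assumption, and this is not necessarily how mosaicPlot clusters internally.

```r
set.seed(3)
# Toy bins matrix: 20 bins (rows) by 6 lines (columns); entries are assumed
# to be parent indices for illustration.
bins <- matrix(sample(1:4, 20 * 6, replace = TRUE), nrow = 20,
               dimnames = list(NULL, paste0("line", 1:6)))

# Lines are in columns, so transpose before computing distances between lines.
d <- dist(t(bins), method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Reorder the columns so that similar lines sit next to each other.
binsOrdered <- bins[, hc$order]
colnames(binsOrdered)
```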

Other useful functions

Transform character to number
  • transHapmap2numeric
  • Description
    This function helps users transform a genetic matrix from character format to numeric format. Genotypes are coded AA = 0, Aa = 1, aa = 2, where A is the major allele and a is the minor allele.
  • Usage
    transHapmap2numeric(G)
  • Arguments
    • G Character genetic matrix; rows represent samples and columns represent SNPs.
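
The coding rule can be illustrated with a few lines of base R. `toNumeric` is a hypothetical helper written for this example, not the package's implementation; it assumes two-character genotype calls such as "AG" with no missing values, and it counts copies of the less frequent allele at each SNP.

```r
# Hypothetical helper (not the GOVS implementation).
# geno: character matrix, rows = samples, columns = SNPs, entries like "AA", "AG".
toNumeric <- function(geno) {
  out <- matrix(NA_integer_, nrow(geno), ncol(geno), dimnames = dimnames(geno))
  for (j in seq_len(ncol(geno))) {
    calls <- strsplit(geno[, j], "")
    counts <- table(unlist(calls))
    if (length(counts) < 2) {
      out[, j] <- 0L  # monomorphic SNP: every sample is homozygous major
      next
    }
    minor <- names(counts)[which.min(counts)]
    # Number of minor-allele copies: AA = 0, Aa = 1, aa = 2
    out[, j] <- vapply(calls, function(a) sum(a == minor), integer(1))
  }
  out
}

geno <- matrix(c("AA", "AG", "GG", "AA",
                 "CC", "CC", "CT", "TT"),
               nrow = 4,
               dimnames = list(paste0("s", 1:4), c("snp1", "snp2")))
toNumeric(geno)
```

In this toy input G is the minor allele at snp1 and T at snp2, so heterozygotes become 1 and minor homozygotes become 2.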
Computation of GCA
  • gcaCompute
  • Description
    Calculates parental general combining ability (GCA) based on F1 phenotypic values.
  • Usage
    gcaCompute(phe_df,which,trait)
  • Arguments
    • phe_df Phenotypic data frame; each row represents an F1 combination, and the parental information is included in columns.
    • which The column index of the male or female parent, used to compute paternal or maternal GCA.
    • trait A character string defining the trait for which GCA will be computed; the function supports computing GCA for two or more traits at the same time.
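
As a worked example of the quantity involved, the sketch below uses the textbook definition of GCA: the mean of a parent's hybrids minus the grand mean. The helper `gcaSketch`, the toy data and its column names are assumptions for illustration; gcaCompute may differ in details, and the sketch handles a single trait only.

```r
# Toy F1 table; column names are assumptions for illustration.
phe <- data.frame(
  male   = c("M1", "M1", "M2", "M2"),
  female = c("F1", "F2", "F1", "F2"),
  yield  = c(10, 12, 14, 16)
)

# Hypothetical helper: GCA of a parent = mean of its hybrids - grand mean.
gcaSketch <- function(phe_df, which, trait) {
  tapply(phe_df[[trait]], phe_df[[which]], mean) - mean(phe_df[[trait]])
}

gcaSketch(phe, which = 1, trait = "yield")  # paternal GCA: M1 = -2, M2 = 2
gcaSketch(phe, which = 2, trait = "yield")  # maternal GCA: F1 = -1, F2 = 1
```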
Computation of SCA
  • scaCompute
  • Description
    Calculates hybrid specific combining ability (SCA) based on F1 phenotypic values.
  • Usage
    scaCompute(phe_df,which_male,which_female,trait,seqname)
  • Arguments
    • phe_df Phenotypic data frame; each row represents an F1 combination, and the parental information is included in columns.
    • which_male The column index of paternal IDs.
    • which_female The column index of maternal IDs.
    • trait A character string defining the trait for which SCA will be computed; the function supports computing SCA for two or more traits at the same time.
    • seqname A character array of hybrid IDs.
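
For a balanced crossing design, SCA is commonly defined as the hybrid value minus the grand mean and both parental GCAs, which simplifies to the expression in the sketch below. `scaSketch`, the toy data and its column names are assumptions for illustration, not the scaCompute implementation.

```r
# Toy F1 table; column names are assumptions for illustration.
pheSca <- data.frame(
  male   = c("M1", "M1", "M2", "M2"),
  female = c("F1", "F2", "F1", "F2"),
  yield  = c(10, 12, 14, 20)
)

# Hypothetical helper: SCA = y - male mean - female mean + grand mean,
# i.e. y - grand mean - GCA(male) - GCA(female).
scaSketch <- function(phe_df, which_male, which_female, trait) {
  y <- phe_df[[trait]]
  maleMean   <- ave(y, phe_df[[which_male]])    # mean of each male's hybrids
  femaleMean <- ave(y, phe_df[[which_female]])  # mean of each female's hybrids
  y - maleMean - femaleMean + mean(y)
}

sca <- scaSketch(pheSca, which_male = 1, which_female = 2, trait = "yield")
names(sca) <- paste(pheSca$male, pheSca$female, sep = "x")
sca  # M1xF1 = 1, M1xF2 = -1, M2xF1 = -1, M2xF2 = 1
```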
Distribution correction
  • reviseFunc
  • Description
    Scales two sets of data to a common distribution.
  • Usage
    reviseFunc(ori,aim,cut = 10,sample_names)
  • Arguments
    • ori A numeric array used as the reference for correction.
    • aim A numeric array to be corrected.
    • cut Number of intervals to cut, default 10.
    • sample_names A character array of the names for aim.