Title: | Fit Hundreds of Theoretical Distributions to Empirical Data |
---|---|
Description: | Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized as a data.frame, a csv file, or a 'shiny' app (the latter with additional features such as a visual representation of the fits). All output formats assess goodness-of-fit with the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test. |
Authors: | Markus Boenn |
Maintainer: | Markus Boenn <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.0 |
Built: | 2024-11-20 02:51:30 UTC |
Source: | https://github.com/cran/fitteR |
Calculates the empirical cumulative distribution function (ECDF) of a set of numeric values.
ecdf2(x, y = NULL)
x |
A numeric vector for which the ECDF should be calculated |
y |
A numeric vector. See details for explanation |
This function extends the standard implementation of the ECDF. Sometimes it is desirable to get the ECDF from pre-tabulated values. In that case, x holds the distinct values and y their frequencies, and each element of y has to be linked to its element in x.
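The idea of an ECDF built from pre-tabulated values can be sketched in base R. This is only an illustration of the principle, not the implementation of ecdf2: the helper name `ecdf_from_table` is hypothetical, and the element names `x` and `cs` are borrowed from the example further down rather than from a documented return structure.

```r
# Hypothetical helper: ECDF from distinct values and their counts.
# Not part of fitteR; it only illustrates the pre-tabulated case.
ecdf_from_table <- function(values, counts) {
  ord <- order(values)                  # sort the support
  values <- values[ord]
  counts <- as.numeric(counts[ord])     # keep counts aligned with values
  list(x = values, cs = cumsum(counts) / sum(counts))
}

set.seed(1)
raw <- sample(1:10, 200, replace = TRUE)
tab <- table(raw)                       # pre-tabulated form of 'raw'
support <- as.numeric(names(tab))

res <- ecdf_from_table(support, tab)
ref <- stats::ecdf(raw)(support)        # standard ECDF at the same points
all.equal(res$cs, ref)
```

Both routes give the same cumulative proportions; the tabulated route just avoids expanding the counts back into individual observations.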
A list
ecdf for the standard implementation of ECDF
x <- rnorm(1000)
e <- ecdf2(x)
str(e)
plot(e)
plot(e$x, e$cs)
x <- sample(1:100, 1000, replace=TRUE)
plot(ecdf2(x))
tab <- table(x)
x <- unique(x)
lines(ecdf2(x, y=tab), col="green")
Fits theoretical univariate distributions from the R universe to a given set of empirical observations
fitter( X, dom = "discrete", freq = NULL, R = 100, timeout = 5, posList = NULL, fast = TRUE )
X |
A numeric vector |
dom |
A string specifying the domain of ‘X’ |
freq |
The frequency of values in ‘X’. See details. |
R |
An integer specifying the number of bootstraps. See details. |
timeout |
A numeric value specifying the maximum time spent on a fit |
posList |
A list. See details. |
fast |
A logical. See details. |
This routine is the workhorse of the package. It takes empirical data and systematically tries to fit numerous distributions implemented in R packages to this data.
Sometimes the empirical data is passed as a histogram. In this case ‘X’ takes the support and ‘freq’ takes the number of occurrences of each value in ‘X’. Although not limited to discrete data, this makes most sense for it.
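A minimal sketch of preparing histogram-style input from raw discrete observations follows. The variable names `support` and `counts` are illustrative only, and the call to fitter is guarded so that the snippet also runs when the package is not installed.

```r
set.seed(42)
raw <- rnbinom(100, 0.5, 0.2)          # raw discrete observations

tab <- table(raw)
support <- as.numeric(names(tab))      # distinct values, passed as 'X'
counts <- as.vector(tab)               # occurrences per value, passed as 'freq'

# The counts must add up to the number of raw observations.
stopifnot(sum(counts) == length(raw))

if (requireNamespace("fitteR", quietly = TRUE)) {
  # Same fit as on the raw data, but fed as support plus frequencies.
  r <- fitteR::fitter(support, dom = "discrete", freq = counts,
                      posList = list(stats = NA))
}
```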
If there is prior knowledge (or a guess) about candidate theoretical distributions, these can be specified by ‘posList’. This parameter takes a named list: each name is a package name, and each item is a character vector containing the names of the distributions (with prefix 'd'). If all distributions of a package should be applied, this vector is set to NA.
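The expected shape of ‘posList’ can be checked with a few lines of base R. The helper `check_posList` below is hypothetical and not part of the package; it only encodes the rules described above (named list, entries either NA or density names starting with 'd').

```r
# Hypothetical helper: check each entry of a posList-style list.
check_posList <- function(posList) {
  stopifnot(is.list(posList),
            !is.null(names(posList)),
            all(nzchar(names(posList))))
  vapply(posList, function(d) {
    # Either a single NA (all distributions of the package) ...
    (length(d) == 1 && is.na(d)) ||
      # ... or density function names carrying the 'd' prefix.
      (is.character(d) && all(startsWith(d, "d")))
  }, logical(1))
}

good <- list(stats = c("dexp", "dt"), ExtDist = NA)
bad  <- list(stats = c("exp"))          # prefix 'd' is missing

check_posList(good)
check_posList(bad)
```

The result is a named logical vector with one entry per package, so a malformed entry can be traced back to the package it was listed under.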
Fitting some distributions can be very slow. These are skipped if ‘fast’ is set to TRUE.
A list serving as an unformatted report summarizing the fitting.
To reduce the computational effort, use of the parameter ‘posList’ is recommended. If it is not specified, the function tries to fit distributions from _ALL_ packages listed in supported.packages.
Markus Boenn
printReport for post-processing of all fits
# continuous empirical data
x <- rnorm(1000, 50, 3)
if(requireNamespace("ExtDist")){
  r <- fitter(x, dom="c", posList=list(stats=c("dexp"), ExtDist=c("dCauchy")))
}else{
  r <- fitter(x, dom="c", posList=list(stats=c("dexp", "dt")))
}
# discrete empirical data
x <- rnbinom(100, 0.5, 0.2)
r <- fitter(x, dom="dis", posList=list(stats=NA))
Prepares a summary of the fitting as csv or shiny
printReport(x, file = NULL, type = "csv")
x |
The output of |
file |
A character string giving the filename (including path) where the report should be printed |
type |
A character vector giving the desired type(s) of output |
The routine generates a simple csv file, which is the most useful output in terms of reusability. The shiny output, however, is more powerful: it provides an overview of the statistics and a figure for visual/manual exploration of the fits. Irrespective of the output type being “csv” or “shiny”, the fit-table has the following format:
package name
name of the distribution
number of parameters
names of parameters, comma-separated list
estimated values of parameters, comma-separated list
start values of parameters, comma-separated list
were constraints used, logical
the runtime in milliseconds
test statistic $D$ of a two-sided, two-sample Kolmogorov-Smirnov test
$P$-value of a two-sided, two-sample Kolmogorov-Smirnov test
test statistic of a Shapiro-Wilk test
$P$-value of a Shapiro-Wilk test
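To make the Kolmogorov-Smirnov columns more concrete, the following base-R sketch computes a two-sided, two-sample statistic $D$ and its $P$-value for one candidate distribution. How fitteR obtains the second sample is not stated here; comparing the data with a sample simulated from the fitted distribution is an assumption made for illustration only.

```r
set.seed(7)
x <- rnorm(500, 50, 3)                  # empirical data

# Maximum likelihood estimates for a normal distribution.
mu <- mean(x)
sigma <- sqrt(mean((x - mu)^2))

# Assumption: the second sample is drawn from the fitted distribution.
sim <- rnorm(length(x), mu, sigma)

ks <- ks.test(x, sim)                   # two-sided by default
ks$statistic                            # test statistic D
ks$p.value                              # P-value
```

A small $D$ together with a large $P$-value indicates that the two samples are compatible with a common distribution.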
A list with items
table |
A |
shiny |
if |
Markus Boenn
# discrete empirical data
x <- rnbinom(100, 0.5, 0.2)
r <- fitter(x, dom="dis", posList=list(stats=NA))
# create only 'shiny' app
out <- printReport(r, type="shiny")
names(out)
## Not run: out$shiny
out <- printReport(r, type=c("csv")) # warning as 'file' is NULL,
str(out) # but table (data.frame) returned
Get stars indicating the magnitude of significance of a P-value.
pvalue2stars(x, ns = "")
pvalues2stars(x, ns = "")
x |
Numeric value or numeric vector, typically a P-value from a statistical test. |
ns |
A character string specifying how insignificant results should be marked. Empty string by default. |
While the function pvalue2stars accepts only a single value, the function pvalues2stars is a wrapper calling pvalue2stars for a vector.
The range of x is not checked. The functions only check that x is numeric at all.
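The mapping from P-values to stars can be sketched in base R as follows. The function `stars_sketch` is hypothetical, and its cutpoints (0.001, 0.01, 0.05, 0.1) are the ones R conventionally prints in model summaries; the thresholds actually used by pvalue2stars are not stated in this manual, so treat them as an assumption.

```r
# Hypothetical re-implementation for illustration; cutpoints are assumed.
stars_sketch <- function(p, ns = "") {
  stopifnot(is.numeric(p))              # numeric check only, no range check
  labels <- c("***", "**", "*", ".", ns)
  # Count how many cutpoints lie strictly below p, then pick the label.
  labels[findInterval(p, c(0.001, 0.01, 0.05, 0.1), left.open = TRUE) + 1]
}

stars_sketch(c(0.0023, 0.5, 0.04), ns = "not signif")
# "**" "not signif" "*"
```

Because `findInterval` is vectorised, a single function covers both the scalar and the vector case in this sketch.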
String(s) of stars or points.
Markus Boenn
x <- runif(1, 0,1)
pvalue2stars(x)
x <- 0.5
pvalue2stars(x, ns="not signif")
x <- c(0.0023, 0.5, 0.04)
pvalues2stars(x, ns="not signif")
Get a list of currently supported packages
supported.packages()
Numerous R packages are supported, each providing several theoretical statistical distributions for discrete or continuous data. Besides ordinary distributions such as normal, t, exponential, ..., some packages implement more exotic distributions such as the truncated alpha.
A character vector
Some of the distributions are redundant, i.e. they are implemented in more than one package.
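Since fits can only draw on packages that are actually available, it can help to see which of the supported packages are installed locally. The sketch below is an illustration, not package functionality; when fitteR itself is missing it falls back to the two package names that appear in the examples of this manual.

```r
# Use the real list if fitteR is available, otherwise a small stand-in.
pkgs <- if (requireNamespace("fitteR", quietly = TRUE)) {
  fitteR::supported.packages()
} else {
  c("stats", "ExtDist")                 # stand-in taken from the examples
}

# system.file() returns "" for packages that are not installed.
installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                    logical(1))

names(installed)[installed]             # candidates usable in 'posList'
names(installed)[!installed]            # would need install.packages()
```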
Markus Boenn
sp <- supported.packages()
head(sp)