ChooseK {AoE}    R Documentation

Automated Threshold Selection for Univariate Tail Estimation

Description

This function implements an experimental method, devised by the package author, for automated threshold selection (choice of k) in univariate tail estimation.

Usage

ChooseK(data = x, k = 10:(length(data) - 1), test = "s", alpha = 0.5,
        approx = "GPD", method = "ML", plot = TRUE)

Arguments

data A numeric vector containing the data.
k A vector of candidate values of k, taken from 1, ..., n-1, where n is the sample size.
test A character string specifying the goodness-of-fit test used to assess whether the residuals follow an exponential distribution. See ‘Details’.
alpha The nominal level α of the test.
approx A character string specifying the model which is fitted to the tail: the "Weissman" approximation or the "GPD". See ‘Details’.
method If approx = "GPD", a character string specifying the estimator of the parameters of the generalized Pareto distribution fitted to high-threshold excesses: "Hill", "ML", or "Moment". See Hill.
plot If TRUE (the default), the results will be plotted. See ‘Details’.

Details

Let X_{1:n} <= ... <= X_{n:n} be the ascending order statistics of the sample. The residuals Z_{1:k} <= ... <= Z_{k:k} are defined as follows:

A goodness-of-fit test for the exponential distribution is applied to this sample of k residuals. The selected value of k is the largest k for which the null hypothesis is not rejected at level α.
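The selection rule described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: the helper name select_k and the p-values are made up, "not rejected" is read as p > alpha, and returning NA when every test rejects is a choice made for the example.

```r
# Hypothetical helper illustrating the selection rule; not part of AoE.
select_k <- function(k, p, alpha = 0.5) {
  stopifnot(length(k) == length(p))
  ok <- which(p > alpha)           # candidates where H0 is not rejected
  if (length(ok) == 0L) {
    # Assumption for this sketch: signal "no admissible k" with NA.
    return(list(k0 = NA_integer_, i0 = NA_integer_))
  }
  i0 <- ok[which.max(k[ok])]       # index of the largest non-rejected k
  list(k0 = k[i0], i0 = i0)
}

k <- c(10L, 20L, 30L, 40L, 50L)
p <- c(0.90, 0.70, 0.55, 0.20, 0.01)   # made-up p-values, one per candidate k
sel <- select_k(k, p, alpha = 0.5)
sel_small_alpha <- select_k(k, p, alpha = 0.05)
```

With these made-up p-values, alpha = 0.5 gives k0 = 30, whereas alpha = 0.05 gives k0 = 40: a smaller level rejects less often and therefore tends to select a larger k.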

The argument test specifies which test will be used: "Cox-Oakes", "Gini", "Anderson-Darling", "Cramer-von Mises", "correlation", or "score". See Henze and Meintanis (2005) and Stephens (1974) for details on all of these tests except "score". The test corresponding to "score" is the score test for c = 0 in the mixture model

F(x) = 1 - (1-c)exp(-α x) - c exp(-2 α x)

Essentially, this is a test for the presence of a bias term of the form predicted by the theory of second-order regular variation.
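To make the mixture model concrete, the distribution function above can be written out in R. This is a sketch for illustration only: the function name pmix and its argument names are hypothetical, and the α in the formula is read here as the rate of the exponential components rather than the test level alpha, which is an assumption.

```r
# Distribution function of the mixture model F(x) given above.
# 'rate' stands for the α in the formula, 'mix' for the weight c.
# Both names are hypothetical and not part of AoE.
pmix <- function(x, rate = 1, mix = 0) {
  1 - (1 - mix) * exp(-rate * x) - mix * exp(-2 * rate * x)
}

x <- c(0.1, 0.5, 1, 2)

# Under the null hypothesis c = 0 the model is the exponential distribution.
null_matches_exp <- isTRUE(all.equal(pmix(x, rate = 2, mix = 0),
                                     pexp(x, rate = 2)))

# For c > 0 a second component with twice the rate is mixed in.
alt <- pmix(x, rate = 2, mix = 0.3)
```

Setting mix = 0 recovers pexp(), which is the null hypothesis of the score test; a positive mix puts more mass near zero than the pure exponential with the same rate.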

If plot = TRUE, then two graphs are shown:

If in the functions Hill, ML or Moment the argument choose.k is set to TRUE, then a value of k is selected by a call to ChooseK. This is the main use of this function.

Simulation experience suggests that the "score" test works best and that α should be chosen much larger than the usual values for the type-I error, lest the selected value of k be too large. This is why the default value is alpha = 0.5, which seems to give good results overall. But see ‘Note’.

Value

A list with the following components:

p A numeric vector of the same length as k with the p-values of the test at the corresponding thresholds.
k0, i0 The selected value of k, specified in two ways: k0 = k[i0].
g0, s0 The estimated parameters of the generalized Pareto distribution at the selected threshold (only if approx = "GPD").
z0 The residuals at the selected value of k.
test The name of the goodness-of-fit test.
alpha The nominal level of the test.

Note

This method is still experimental. No supporting theory exists yet. For questions or suggestions, please feel free to write to johan.segers@uclouvain.be.

Author(s)

Johan Segers

References

Henze, N. and Meintanis, S.G. (2005). Recent and classical tests for exponentiality: a partial review with comparisons. Metrika 61, 29-45.

Stephens, M.A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 730-737.

Weissman, I. (1978). Estimation of parameters and large quantiles based on the k largest observations. Journal of the American Statistical Association 73, 812-815.

See Also

Hill, ML, Moment

Examples

x <- rburr(n = 1000, gamma = 0.5, rho = -0.5)
Hill(x, k = 10:500, log = "x", choose.k = TRUE)

[Package AoE version 1.0.1 Index]