ChooseK {AoE} | R Documentation |
This function implements an experimental method, devised by the package author, for automated threshold selection (choice of k) in univariate tail estimation.
ChooseK(data = x, k = 10:(length(data) - 1), test = "s", alpha = 0.5, approx = "GPD", method = "ML", plot = TRUE)
data |
A numeric vector containing the data. |
k |
Vector of values of k = 1, ..., n-1, with n the sample size, among which to choose. |
test |
A character string specifying the test with which the goodness-of-fit of the exponential distribution to the residuals will be tested. See ‘Details’. |
alpha |
The nominal level α of the test. |
approx |
A character string specifying the model which is fitted to the tail: the "Weissman" approximation or the "GPD". See ‘Details’. |
method |
In case approx = "GPD", a character string specifying the estimators for the parameters of the generalized Pareto distribution fitted to high-threshold excesses: "Hill", "ML", or "Moment". See Hill. |
plot |
If TRUE (the default), the results will be plotted. See ‘Details’. |
Let X_{1:n} <= ... <= X_{n:n} be the ascending order statistics of the sample. The residuals Z_{1:k} <= ... <= Z_{k:k} are defined as follows:
If approx = "Weissman", then Z_{i:k} = log X_{n-k+i:n} - log X_{n-k:n}. This approach is suitable only for heavy-tailed distributions, that is, those with extreme-value index gamma > 0.
If approx = "GPD", then Z_{i:k} = log {1 + gamma (X_{n-k+i:n} - X_{n-k:n}) / σ} / gamma, with gamma and σ the estimates of the parameters of the generalized Pareto distribution.
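The two residual definitions can be sketched in base R as follows. This is an illustration only, not the package's code: tail_residuals is a hypothetical helper, and for the "GPD" case the estimates gamma and sigma are assumed to be supplied by the caller. The GPD transform divides the logarithm by gamma, which is what turns a generalized Pareto excess into a standard exponential variable.

```r
# Sketch only; `tail_residuals` is a hypothetical helper, not part of the package.
tail_residuals <- function(x, k, approx = c("Weissman", "GPD"),
                           gamma = NULL, sigma = NULL) {
  approx <- match.arg(approx)
  n <- length(x)
  stopifnot(k >= 1, k <= n - 1)
  xs  <- sort(x)                 # X_{1:n} <= ... <= X_{n:n}
  u   <- xs[n - k]               # threshold X_{n-k:n}
  top <- xs[(n - k + 1):n]       # the k largest observations, ascending
  if (approx == "Weissman") {
    stopifnot(u > 0)             # logarithms require a positive threshold
    z <- log(top) - log(u)
  } else {
    stopifnot(!is.null(gamma), !is.null(sigma), sigma > 0)
    if (abs(gamma) < 1e-8) {
      z <- (top - u) / sigma     # limit of the transform as gamma -> 0
    } else {
      z <- log1p(gamma * (top - u) / sigma) / gamma
    }
  }
  z
}

set.seed(1)
x <- runif(1000)^(-0.5)          # exact Pareto sample, extreme-value index 0.5
z <- tail_residuals(x, k = 100)  # Weissman residuals
mean(z)                          # their mean is the Hill estimate at k = 100
```

With approx = "Weissman" the residuals are exponential with mean gamma rather than mean 1, so the test applied to them has to be scale-free.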
A goodness-of-fit test of the exponential distribution is applied to this sample of k residuals. The selected value of k is the largest k for which the null hypothesis is not rejected at level α.
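The selection rule can be sketched as a loop over the candidate values of k. The code below is a minimal illustration under stated assumptions, not the package's implementation: it uses Weissman residuals only, and it tests exponentiality with the Gini statistic and its usual normal approximation (mean 1/2, variance 1/(12(k-1)) under the null, two-sided). The names gini_pvalue and choose_k_sketch are hypothetical.

```r
# Hypothetical helper: two-sided p-value of the Gini test of exponentiality,
# based on the normal approximation of the Gini statistic.
gini_pvalue <- function(z) {
  k  <- length(z)
  zs <- sort(z)
  i  <- seq_len(k - 1)
  g  <- sum(i * (k - i) * diff(zs)) / ((k - 1) * sum(zs))
  2 * pnorm(-abs(sqrt(12 * (k - 1)) * (g - 0.5)))
}

# Hypothetical sketch of the rule: largest k whose residuals pass the test.
choose_k_sketch <- function(x, k = 10:(length(x) - 1), alpha = 0.5) {
  xs <- sort(x)
  n  <- length(xs)
  p <- vapply(k, function(kk) {
    z <- log(xs[(n - kk + 1):n]) - log(xs[n - kk])  # Weissman residuals
    gini_pvalue(z)
  }, numeric(1))
  ok <- which(p > alpha)                 # null hypothesis not rejected
  i0 <- if (length(ok) > 0) ok[which.max(k[ok])] else NA_integer_
  list(p = p, i0 = i0, k0 = k[i0])
}

set.seed(1)
x   <- runif(2000)^(-0.5)                # exact Pareto sample, index 0.5
res <- choose_k_sketch(x, k = 10:500)
res$k0                                   # selected k (NA if every k is rejected)
```

The returned list mirrors the p, i0 and k0 components described for the function's value, so that k0 = k[i0] holds whenever a value is selected.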
The argument test specifies which test will be used: "Cox-Oakes", "Gini", "Anderson-Darling", "Cramer-von Mises", "correlation", "score". See Henze and Meintanis (2005) and Stephens (1974) for more details on all of these tests but "score". The test corresponding to "score" is the score test for c = 0 in the mixture model
F(x) = 1 - (1-c)exp(-α x) - c exp(-2 α x)
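Here α denotes the rate parameter of the exponential components, not the nominal level of the test. Essentially, this is a test for the presence of a bias term of the form predicted by the theory of second-order regular variation.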
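A score statistic for c = 0 can be derived directly from the mixture model, and the sketch below does so; it is not taken from the package, which may normalise the statistic differently or use a one-sided alternative. Under c = 0 the rate is estimated by 1 / mean(z), the derivative of the log-density with respect to c at c = 0 is 2 exp(-rate * z) - 1, and after accounting for the estimated rate the sum of these scores has asymptotic variance k / 12. The name score_test_sketch and the two-sided p-value are assumptions made for illustration.

```r
# Sketch only; `score_test_sketch` is a hypothetical helper.
score_test_sketch <- function(z) {
  k    <- length(z)
  rate <- 1 / mean(z)              # ML estimate of the rate when c = 0
  u    <- 2 * exp(-rate * z) - 1   # score contributions for c at c = 0
  stat <- sqrt(12 / k) * sum(u)    # approximately N(0, 1) under exponentiality
  list(statistic = stat, p.value = 2 * pnorm(-abs(stat)))
}

set.seed(1)
z <- rexp(200, rate = 3)
score_test_sketch(z)$p.value
```

Because the rate is re-estimated from the residuals, the statistic does not change when the residuals are rescaled, which is what makes it applicable to Weissman residuals with unknown gamma.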
If plot = TRUE, then two graphs are shown:
If the argument choose.k is set to TRUE in the functions Hill, ML or Moment, then a value of k is selected by a call to ChooseK. This is the main use of this function.
Simulation experience shows that the "score" test works best and that α should be chosen much larger than the usual values for the type-I error, lest the selected value of k be too large. This is why the default value is alpha = 0.5, which seems to give good results overall. But see ‘Notes’.
A list with the following components:
p |
A numeric vector of the same length as k with the p-values of the test at the corresponding thresholds. |
k0, i0 |
The selected value of k, specified in two ways: k0 = k[i0] . |
g0, s0 |
The estimated parameters of the generalized Pareto distribution at the selected threshold (only if approx = "GPD"). |
z0 |
The residuals at the selected value of k. |
test |
The name of the goodness-of-fit test. |
alpha |
The nominal level of the test. |
This method is still experimental. No supporting theory exists yet. For questions or suggestions, please feel free to write to johan.segers@uclouvain.be.
Johan Segers
Henze, N. and Meintanis, S.G. (2005). Recent and classical tests for exponentiality: a partial review with comparisons. Metrika 61, 29-45.
Stephens, M.A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 730-737.
Weissman, I. (1978). Estimation of parameters and large quantiles based on the k largest observations. Journal of the American Statistical Association 73, 812-815.
x <- rburr(n = 1000, gamma = 0.5, rho = -0.5)
Hill(x, k = 10:500, log = "x", choose.k = TRUE)