
Credibility pseudo-estimators

Pages 770-791 | Received 06 Apr 2017, Accepted 18 Mar 2018, Published online: 22 Mar 2018

Abstract

We treat a model with independent claim numbers and claim amounts, conditional on stochastic parameters. Groups are categorized into a smaller number of classes, which likely differ in risk premium. The collective claim frequency and mean claim for a group are modeled as those of the class the group belongs to. For each group we find the Best Linear Predictor, also known as Credibility Estimator, in a generic model covering claim frequency and mean claim, as a weighted mean of the group’s individual estimate and the collective estimate. Assuming Poisson distributed claim numbers and some distributional properties of claim amounts, we find estimators of variance components, estimation errors of the collective claim frequency and mean claim, and covariances between observations, estimators, and stochastic parameters. Pseudo-estimators, i.e. estimators which are defined by expressions that contain them and which must be solved numerically, are given for between-groups variance components. Simulation results, where some of the assumptions are violated, indicate when they are preferable to non-pseudo-estimators.

1. Introduction and summary of results

In tariff analysis, we use the term multi-level factor (MLF) for a rating factor where some classes have too few claims to admit basing the premium on the class alone. An example is geographical parish, when there are several thousand of them. Credibility analysis should be used for such an argument. To distinguish the MLF from arguments (rating factors) with sufficiently many claims in each class, we call a class in it a group.

We have a prior categorization of groups into a smaller number of classes by some property, which we have prior reasons to believe affects the risk premium. An example of groups is geographical parishes with the property population density in five classes, where higher population density likely implies higher risk premium. We call the classes of this property an auxiliary argument, or auxiliary, following the terminology of Ohlsson & Johansson (2010, Section 4.2.3, p. 87). Those authors treat the same setting as we do.

The input is claims and exposures for some time period. Best Linear Predictors (BLPs; see Note 1) are deduced. Under an essentially Compound Poisson assumption and suitable distributional assumptions for claim amounts, we also derive estimators of between-groups variance components.

The paper is organized as follows. Section 2 recapitulates credibility models found in the literature. Section 3 summarizes and gives reasons for our model. Section 4 states the notation. Section 5 sets up a generic model covering both claim frequency and mean claim. Section 5.1 gives the BLP. We take account of how estimators’ variances and covariances with observations and stochastic parameters (i.e. random effects) affect the BLP, thereby arriving at a possibly new and more exact expression than previously known. Section 5.2 gives a pseudo-estimator for the variance component between groups, which is optimal under certain conditions. A pseudo-estimator is one that is defined by an expression that contains the estimator itself, which thus must be found by numerical root finding. In Section 6, the specifics of claim frequency are treated under the assumption of Poisson distributed claim numbers. Section 7 treats the specifics of mean claim under assumptions of some distributional properties of claim amounts. In Section 8, the separate claim frequency and mean claim results are combined into risk premium results. Section 9 describes simulation results for the goodness of estimators of between-groups variance components, when some assumptions are violated to test robustness. Appendix 1 gives proofs. Appendix 2 gives tables from the simulations.

The free language Rapp, covering credibility by the methods of this paper and other methods, GLM for non-life insurance pricing, claim reserve algorithms, bignum multiprecision computing, etc., is found at www.stigrosenlund.se/rapp.htm.

2. Overview of some credibility models

Bühlmann & Straub (1970) give the classical Bühlmann–Straub estimator for a non-parametric credibility model with one MLF and no other rating factors. Norberg (1980) treats best linear unbiased prediction in empirical Bayes credibility. Campbell (1986) combines the MLF with groupings of it by auxiliary arguments which are functions of the MLF, e.g. median income and population density for parishes, and weight and power for car models. The grouping is made by cluster analysis; in the example rendered, exposure is normalized duration. For the Bühlmann–Straub model, De Vylder (1996, III, Chapter 3, Section 3.4.7) gives pseudo-estimators credited to Bichsel & Straub (unpublished). Frees (2003) treats credibility with a multivariate approach to groupings into different lines of business. Overviews of credibility are found in Bühlmann & Gisler (2005), including the Poisson model for claim numbers and a pseudo-estimator for Pareto credibility, and in Kaas et al. (2009). Ohlsson (2008) and Ohlsson & Johansson (2010, Chapter 4) treat a setting with multiplicative rating factors including the MLF and auxiliary arguments, without a Poisson assumption. We cite Ohlsson & Johansson (2010) frequently below due to its use as a textbook in actuarial education, although the results might have been found by other authors before them.

3. Summary of and reasons for model

For some of our results we assume that, conditional on stochastic parameters (random effects), the claim cost of any group of the MLF is Compound Poisson distributed, with the slight weakening that individual claim amounts are independent with the same mean and variance but not necessarily identically distributed. This is why we write essentially Compound Poisson above.

We have available a categorization of the groups, called the Auxiliary. Given independent random effects per group with mean 1 and the same variance, the expected claim frequency or mean claim is the product of a factor specific to the auxiliary and the random effect.

See Rosenlund (2010) for reasons to use pure Poisson in tariff analysis rather than Overdispersed Poisson for claim numbers, even with macroscopic fluctuations affecting large parts of the portfolio in the same way. It is also a mistake to assume the Negative Binomial – Mixed Poisson with a gamma mixing distribution – for claim numbers. While the Negative Binomial distribution is appropriate if we draw a customer at random from the portfolio, it is inappropriate if used for tariff analysis and predictions of next year’s claim numbers. We should instead condition with respect to the different customers’ mixing variables, whether we regard them as stochastic or not. In this paper, we condition with respect to stochastic parameters, which were realized in the past and do not change. So we get the pure Poisson distribution. The Negative Binomial would only be right if all customers were to leave at year-end and be replaced by a random sample of customers, distributed as this year’s sample and independent of it. Normally only a fraction of the customers leaves each year, making the claim number distribution somewhat more dispersed than Poisson, but not as dispersed as the Negative Binomial. This thought experiment suffices to clarify the matter.
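
The thought experiment can be illustrated numerically. The sketch below (plain Python, with hypothetical portfolio parameters and the claim frequency scaled to 1 for illustration) draws fixed gamma mixing factors once, then two years of Poisson counts conditional on them: across randomly drawn customers the counts are overdispersed as for a Negative Binomial, but the year-to-year behavior of the same customers shows pure Poisson dispersion.

```python
import math
import random
from statistics import mean, pvariance

random.seed(1)

# Hypothetical portfolio: each customer i has a fixed mixing factor theta_i,
# a random effect realized in the past that does not change.
n = 20000
theta = [random.gammavariate(2.0, 0.5) for _ in range(n)]  # mean 1, variance 0.5

def rpois(lam):
    """Knuth's Poisson sampler, adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

year1 = [rpois(t) for t in theta]  # observed year
year2 = [rpois(t) for t in theta]  # next year, same customers

# Drawing a customer at random mixes over theta: counts are overdispersed,
# variance/mean = 1 + Var(theta)/E(theta) = 1.5 here (Negative Binomial).
pooled_ratio = pvariance(year1) / mean(year1)

# Conditional on theta each count is pure Poisson, so the year-to-year
# difference has variance E[2*theta], with no between-customer component.
diff_var = pvariance([b - a for a, b in zip(year1, year2)])
```

With these parameters pooled_ratio comes out near 1.5 while diff_var stays near 2·E(theta), supporting prediction conditional on the realized mixing variables.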

4. Notation for observables

We define the following observables in (4.1)–(4.9); subscripts distinguish claim Frequency and Mean claim, respectively.

With no Auxiliary we set . Then is the total exposure and is the total number of claims.

5. Generic BLP and pseudo-estimator

A generic notation will be employed to deduce a BLP and a pseudo-estimator. Certain functionals and their estimators will then be specified for claim frequency and mean claim. This notation is as follows.

We condition on Nj and on for mean claim. This conditioning is implicit and is not written out below. See the remarks in Section 8 for a justification.

Here and are generic exposures. For mean claim, the claim numbers take the role of exposures. is the generic claim rate. We make the following assumptions.

Assumption 1:

Conditional on stochastic variables , with expectation E and variance has expectation , where is the expected claim rate of Auxiliary class k.

Assumption 2:

are independent.

Objective. To predict as well as possible.

Here and is a between-groups variance component.

Let (5.1)

Define the functionals (5.2)–(5.5). (5.2) and (5.3) are used both for the BLP and the pseudo-estimator, while (5.4) and (5.5) are used only for the pseudo-estimator.

Estimators are named as the corresponding functionals with a above them. They are obtained by plugging estimators into the expressions above.

The most laborious work in deducing a new pseudo-estimator is establishing the estimator required by (5.4). This is done in Sections 6.1 and 7.1.

Define the estimator (5.6)

It is immediate that this estimator is unbiased. It will be shown in Appendix A.1 that (5.7) holds.

5.1. Best linear predictor

We first establish a non-observable predictor of in the form (5.8)

for an optimal . Having established this predictor, we obtain an observable estimated predictor in the form (5.9)

where is a suitable estimator of .

We seek the best linear combination of the observations to predict in the mean square sense, i.e. the BLP or Credibility Estimator. It is shown in Appendix A.1 to be of the form (5.8). That is, is to be determined so that is minimized. Then is the BLP. The following is proved in Appendix A.1.

Theorem 5.1:

If Assumptions 1 and 2 hold, then by (5.8) is the BLP of if we set when , otherwise (5.10)

We obtain from by replacing unknown functionals with suitable estimators. We name such an estimator as the corresponding functional with a hat above it. An estimator marked otherwise (e.g. with an *) can also be used. One is found above in (5.6). The remaining ones will be specified for claim frequency and mean claim later.

Some terms of (5.10) are part of the classical Bühlmann & Straub (1970) estimator, while other terms might be new. See Remark A1.

The exposure-weighted total of the non-observable predictors is easily shown to be unbiased. The replacement with estimators in the observable predictors can cause a bias. It is corrected by first computing a simple correction factor and then multiplying each predictor by it.
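
As a concrete sketch (not the paper's exact (5.10), which carries further variance and covariance terms), the classical Bühlmann–Straub form of the predictor and a bias correction can be written as follows. Here w, Y, mu_k, sigma2 and s2 are assumed names for exposures, group means, the auxiliary-class estimate and the two variance components, and the correction factor is taken as the ratio of exposure-weighted observed to predicted totals, one natural reading of the lost expression.

```python
def credibility_predictor(w, Y, mu_k, sigma2, s2):
    """Classical credibility weights z_j = w_j*sigma2 / (w_j*sigma2 + s2),
    predictor V_j = z_j*Y_j + (1 - z_j)*mu_k (Buhlmann-Straub form)."""
    z = [wj * sigma2 / (wj * sigma2 + s2) if sigma2 > 0 else 0.0 for wj in w]
    return [zj * yj + (1 - zj) * mu_k for zj, yj in zip(z, Y)]

def bias_correct(w, Y, V):
    """Rescale so the exposure-weighted total of predictors matches the
    exposure-weighted total of observations (assumed form of the correction)."""
    c = sum(wj * yj for wj, yj in zip(w, Y)) / sum(wj * vj for wj, vj in zip(w, V))
    return [c * vj for vj in V]

# Toy example: three groups with very different exposures.
w = [10.0, 200.0, 3000.0]
Y = [0.15, 0.09, 0.105]
V = credibility_predictor(w, Y, mu_k=0.10, sigma2=0.0004, s2=0.1)
V = bias_correct(w, Y, V)
```

The small-exposure group is pulled almost entirely to the collective estimate 0.10, while the large-exposure group keeps most of its individual estimate.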

5.2. Pseudo-estimator

We give an estimator of that contains itself, i.e. a pseudo-estimator in the sense of Bichsel & Straub (unpublished). See De Vylder (1996, III, Chapter 3, Section 3.3.7) for a description of a pseudo-estimator in a simple setting. Equation (5.11) must be solved numerically, and the solution is the pseudo-estimator. We have to find a zero of the rather complicated function given by the left side of (5.11) minus the right side.

The reason for a pseudo-estimator is to be able to use inverse variance weighting of squared deviations of observations from means, so that a minimum variance estimator can be obtained. These inverse variances require the variance component itself, hence the need for numerical root finding. This is laid out in Appendix A.2. Non-pseudo estimators cannot make use of such weighting. Sufficiently many groups and observations are needed for a pseudo-estimator to be better than a non-pseudo one, however.

See Appendix A.3 for the solution. We define as the largest solution. It can be 0, indicating no variance between groups. We give a sufficient condition for a positive solution, but could not prove its uniqueness or that the condition is necessary.

The iterative procedure for pseudo-estimators described in the literature is not suitable for finding the solution. Bisection (interval halving) or some faster but more complicated method should be used.
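
A minimal bisection sketch, assuming the right-hand side of (5.11) has been wrapped as a function f of the variance component; f and the upper limit (cf. Remark A2) are placeholders to be supplied by the model at hand.

```python
import math

def solve_pseudo(f, upper, tol=1e-10):
    """Bisection for a fixed point of f on [0, upper], i.e. a zero of
    g(x) = f(x) - x, where f stands for the right side of the
    pseudo-estimator equation (5.11) viewed as a function of the
    variance component itself."""
    g = lambda x: f(x) - x
    if g(upper) > 0:
        return 0.0          # no sign change on [0, upper]: report no positive root
    lo, hi = 0.0, upper     # g(0) >= 0 is assumed, as for the estimators here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy illustration: solve x = 0.5*sqrt(x) + 0.1.
root = solve_pseudo(lambda x: 0.5 * math.sqrt(x) + 0.1, upper=10.0)
```

Bisection is robust though slow; a safeguarded faster method (e.g. Brent's) can replace it once a bracket [0, R] is known.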

We need the concept of excess (excess kurtosis) as defined in Cramér (1946, Chapter 15, Section 15.8, Equation (15.8.2)) and in De Vylder (1996, III, Chapter 2, Section 2.1.2).

We will assume that the 3rd central moment and the excess of are 0. These assumptions admit a relatively simple and mathematically consistent estimator, and they imply approximate optimality of the estimator for a large number J of groups. However, in Section 9, we study by simulations how the estimator performs when J is not so large and when these moment assumptions are not true.

Theorem 5.2:

Let Assumptions 1 and 2 be true. Let , and be estimators, to be specified later, of the corresponding functionals in (5.2), (5.4) and (5.5). Define the pseudo-estimator (5.11)

It holds that . If and , then is approximately optimal in the sense of having the smallest mean square error among estimators that are linear combinations of .

6. Claim frequency specifics

Here we have , and . We use the first two below, while retaining the generic notation for the rest. In addition, a specific function is introduced.

Assumption 3:

Conditional on is Poisson distributed.

The properties of the Poisson distribution entail (6.1)–(6.3).

6.1. Variance of squared observation deviations

To use (5.11) we must have an estimator , which requires an estimator of .

Lemma 6.1:

Let Assumptions 1–3 be true and assume that and . Let (6.4)

Then by (5.4) is (6.5)

As before, plug estimators into (6.5) to get and then , using (6.3).

6.2. Non-pseudo-estimator

With denoting the total number of claims, define a non-pseudo-estimator by (6.6)

The adaptation of (4.27) in Ohlsson & Johansson (2010) into (6.6) is as follows. On p. 82, firstly we set and , in accordance with the Poisson assumption. Secondly, we divide the expression by the square of an estimate of the base factor . This is necessary, since we deal with claim frequency estimates without splitting them into base factors and argument factors. Thirdly, we truncate the estimator from below at 0. This will decrease its mean square error. One might think that this would entail a positive bias, but our simulations nevertheless showed a negative bias.
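
For orientation, here is a sketch of a non-pseudo moment estimator of the classical Bühlmann–Straub type with the truncation at 0; the paper's (6.6) additionally divides by a squared base-factor estimate and uses the Poisson form of the within-group variance, details not reproduced here. The names w, Y and s2 (exposures, group observations, within-group variance scale) are assumptions of this sketch.

```python
def nonpseudo_sigma2(w, Y, s2):
    """Classical Buhlmann-Straub-type moment estimator of the between-groups
    variance component, truncated from below at 0."""
    W = sum(w)
    Ybar = sum(wj * yj for wj, yj in zip(w, Y)) / W  # weighted overall mean
    J = len(w)
    num = sum(wj * (yj - Ybar) ** 2 for wj, yj in zip(w, Y)) - (J - 1) * s2
    den = W - sum(wj * wj for wj in w) / W
    return max(0.0, num / den)  # truncation at 0 lowers mean square error
```

With equal weights and no within-group variance the estimator reduces to the ordinary sample variance of the group means; a large s2 drives it to the truncation point 0.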

7. Mean claim specifics

Here we have , and . We use the first two below, while retaining the generic notation for the rest. In addition, a specific function is introduced.

We need the following assumptions. Assumption 4 implies the generic Assumption 1.

Assumption 4:

Conditional on stochastic variables , with expectation E and variance for any specific j the are independent with expectation , where is the expected mean claim of Auxiliary class k.

Assumption 5:

for some and , with independent of j.

Here Assumption 5 implies . In Rosenlund (2014) we argued against such an assumption in ordinary non-credibility multiplicative tariff analysis. But when many groups have few claims, we risk overparametrization without the assumption.

We fix p in Assumption 5 initially. Normally is suitable by virtue of giving claim amounts a constant coefficient of variation (CV), conditional on .

We refer to Ohlsson & Johansson (2010) for the following three expressions.

Define the functional (7.1)

Then it holds that (7.2)

The assumptions entail the following estimator (7.3)

7.1. Variance of squared observation deviations

To be able to use (5.11) we must have an estimator . It is much more complicated for mean claim than for claim frequency.

We make this assumption. For , it follows from Assumption 5 if .

Assumption 6:

For it holds for some , with independent of j.

The assumption implies Assumption 5 with . It is true if are only random IID scale factors for IID claim amounts with such that .

Here are the central moments of conditional on , and is as in Assumption 5. We use estimators of based on the sample central moments for . This can be done without first or simultaneously estimating , which is an advantage. The standard credibility procedure to estimate uses (7.3), which is a combination of sample central moments of order 2. The advantage of this procedure extends to higher order central moments.

The following lemma is a counterpart to Lemma 6.1 for claim frequency. It uses functions of x and y, where x will take the value and y the value

Lemma 7.1:

Let Assumptions 1, 4 and 6 be true and assume that and . Set (7.4)

Let the sample central moments per j be (7.5)

Set (7.6)

Then . In Appendix A.6.1, unobservable estimators are given in which the true values are used; they are shown to be unbiased estimates of .

Define the weighted totals of these expressions, the overall estimators of , to be (7.7)

Furthermore, let (7.8)

In Appendix A.6.1 we show that are suitable estimators of . They are not in general unbiased, due to the substitution of for and of for .

Let also (7.9)

Now define a counterpart to (6.4), namely (7.10)

Then by (5.4) is (7.11)

As before, plug estimators into (7.11) to get and then , using (5.5).

7.2. Pseudo-estimator for gamma-lognormal mixture

If specific parametric forms hold for the conditional claim amount distribution, we can get better -estimators.

We give an estimator under the following assumption, which covers a range of distributions between short-tailed and long-tailed ones.

Assumption 7:

Assumption 5 is true and its exponent . Conditional on , is distributed as a mixture of a gamma distribution and a lognormal distribution with probability q for gamma, both with mean .

Then Assumption 6 in Section 7.1 is true. For gamma and lognormal distributions, the 3rd and 4th moments are determined by the 1st and 2nd ones. The idea is to use the empirical 3rd central moment to estimate q, which then is used to estimate the 4th central moment. We refer to Appendix A.7 for the computations.
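
The moment-matching idea can be sketched as follows, under the reading that both mixture components share the mean and the CV implied by Assumption 5: gamma skewness is 2c and lognormal skewness is 3c + c^3, so the mixture's 3rd central moment is linear in the gamma weight q, and inverting it gives an estimate of q. This is a sketch of the idea only, not the paper's exact estimator in Appendix A.7.

```python
def mixture_q_from_mu3(mu3_emp, m, c):
    """Estimate the gamma weight q from an empirical 3rd central moment,
    assuming a gamma/lognormal mixture whose components both have mean m
    and coefficient of variation c (an assumption of this sketch)."""
    sd3 = (c * m) ** 3                    # sd^3, shared by both components
    mu3_gamma = 2.0 * c * sd3             # gamma skewness: 2*c
    mu3_logn = (3.0 * c + c ** 3) * sd3   # lognormal skewness: 3c + c^3
    q = (mu3_emp - mu3_logn) / (mu3_gamma - mu3_logn)
    return min(1.0, max(0.0, q))          # clip to a valid probability
```

The estimated q then weights the two components' 4th central moments, each determined by m and c, to give the 4th moment needed in Section 7.1.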

Corollary 7.1:

Let Assumptions 1, 4 and 7 be true. Let (7.12)–(7.15)

Define the pseudo-estimator by replacing with and with , as given in (7.15), in the estimator of in (5.5) and in (5.11). It holds that . If and , then is approximately optimal in the sense of having the smallest mean square error among estimators that are linear combinations of .

Remark 7.1:

The use of the 4th central moment can cause an unstable , if there are too few claims. The simulation results of Section 9 illustrate this. Even if Assumption 7 is only approximately satisfied, can be preferable to if the latter is unstable.

7.3. Non-pseudo-estimator

With denoting the total number of claims, as in the claim frequency non-pseudo-estimator, and the number of groups with claims, we define this non-pseudo-estimator. (7.16)

The adaptation of (4.27) in Ohlsson & Johansson (2010) into (7.16) is as follows. Firstly, we specialize p on p. 82 to , in order to have a suitable classical estimator to compare the pseudo one to in simulations. Secondly, we divide the expression by the square of an estimate of the base factor in Ohlsson & Johansson (2010), as for the claim frequency non-pseudo-estimator. Thirdly, we truncate the estimator from below at 0. As for claim frequency, our simulations nevertheless showed a negative bias. Note that the denominator in (7.16) is similar to the one in (6.6).

8. Combining claim frequency and mean claim

The claim frequency and mean claim predictors are combined if we wish to use these results for risk premium. To demonstrate this, we put subscripts into the and . We then define (8.1)

It can be corrected by the simple factor to make the exposure-weighted total of estimated predictors unbiased. It serves as the final rating factor for risk premium.

We have to assume that and are independent. Claim numbers are S-ancillary for the mean claim parameters in the Compound Poisson model. This property implies that inference for the mean claim parameters should be made conditionally on the claim numbers. Together with independence, it justifies (8.1).

9. Robustness test simulations

9.1. Setup of comparison

To get guidelines for the choice of estimator, depending on the situation, we have compared our pseudo-estimators with the non-pseudo classical-type ones. As the non-pseudo-estimator we used the one given for claim frequency by (6.6) and for mean claim by (7.16).

The basic model stated in Assumptions 1–6 was obeyed. But, except for the case of zero , we did not let the distributions of , for both claim frequency and mean claim, have zero excess or even, in most cases, zero third central moment, as our pseudo-estimator theorems presuppose. From a practical viewpoint these are artificial assumptions, but ones that admit relatively simple and mathematically consistent pseudo-estimators. These estimators have to be reasonably robust against departures from the assumptions in order to be useful, though.

Three J-values 200, 1000, and 2000 were studied. For each value a fictitious insurance file was made with widely varying exposure sizes per group j. An Auxiliary with five classes was assigned to each j.

Certain expected claim frequencies and mean claims per class were fixed. We set the base factor for claim frequency to 0.01, the base factor for mean claim to 2000, and the following class factors.

The Auxiliary was assigned to the groups successively with 1, 2, 3, 4, 5, 1, 2, 3, ... .

Exposures per group j were assigned via the remainder operation %100, which gives the remainder after division by 100, so that exposures follow the arithmetic series 10, 110, 210, ..., restarting from the beginning every 100 groups.
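
The description above is partly garbled in the source; under one reading, the exposure for group j is 10 + 100·(j % 100), which reproduces the series 10, 110, 210, ... restarting every 100 groups, with the Auxiliary cycling through classes 1–5:

```python
# One reading of the assignment rules (hypothetical reconstruction):
# exposure_j = 10 + 100 * (j % 100), Auxiliary class = (j % 5) + 1.
J = 200
exposures = [10 + 100 * (j % 100) for j in range(J)]
aux = [j % 5 + 1 for j in range(J)]
```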

One simulation generated about 30,200 claims for and about 302,000 claims for . We made as many simulations as were necessary to establish the best method, unless run times would have been too long.

As a measure of the goodness of an estimate, we used an estimate of the expected mean square deviation of the estimate from the true parameter. These measures, in square-root form, are tabulated in the tables of Appendix 2. Let be the observed square deviation of and let . Set and let be the value of the tth simulation. We estimated by , where S is the number of simulations. If this is negative, then is denoted as Best, and vice versa. If the 99% level confidence interval for contains 0, then a question mark is added.
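
The comparison scheme can be sketched as follows, with assumed names: sq_ps and sq_nps are the observed squared deviations of the two estimators over S simulations, d is their paired difference, and the winner is flagged as uncertain (the question mark) when the 99% confidence interval for the mean of d, using a normal approximation, contains 0.

```python
from statistics import mean, stdev

NORM99 = 2.5758  # two-sided 99% standard normal quantile

def compare(sq_ps, sq_nps):
    """Paired comparison of squared deviations over S simulations.
    Returns (name of the better estimator, whether the 99% CI excludes 0)."""
    d = [a - b for a, b in zip(sq_ps, sq_nps)]
    S = len(d)
    half = NORM99 * stdev(d) / S ** 0.5   # CI half-width for mean(d)
    dbar = mean(d)
    best = 'Ps' if dbar < 0 else 'Nps'
    return best, abs(dbar) > half
```

A False second component corresponds to the question mark added in the tables.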

9.2. Distributions and results

The results are given in the tables of Appendix 2. Our pseudo-estimator is denoted by Ps, the non-pseudo one by Nps.

Let U(a,b) be a random variable having the uniform distribution on (a,b).

Let be the usual gamma distribution parameter, such that the mean is and the variance is . Let have the gamma distribution of (kk).

The table below lists -distributions D1, ..., D9 in ascending CV order.

We let claim amounts be distributed as U(meanclaim/50.5, meanclaim·100/50.5) with CV 0.56592, or lognormally distributed with CV = 1, conditional on the s.

We give an estimate of .

For = 0, the confidence intervals (confidence level 95%) are for the parameter itself. This is marked with a . Otherwise the confidence intervals (95%) are for the biases in percent of the Ps and Nps estimates, i.e. for 100 ( ).

9.3. Conclusions from simulations

The pseudo-estimators for mean claim have mixed positive and negative biases. Otherwise almost all estimators have negative bias, except of course for zero . This is remarkable for , since they were truncated from below at 0. The estimator with the smaller absolute bias mostly also has the smaller mean square deviation.

Overall, the advantage of the pseudo-estimators over the classical estimators increases with increasing J, in line with Remark 7.1.

9.3.1. Claim frequency

The claim frequency tables in Appendix 2 show that the pseudo-estimator is generally the best one, except for some cases with small . For the smaller number of groups J = 200 it is possibly worse than the non-pseudo-estimator also for the large in distribution D9. A guideline would be to recommend , unless J is small and the suspected is also small.

9.3.2. Mean claim

For the light-tailed uniform conditional claim amount distribution, the classical estimator is best for the smaller J-numbers 200 and 1000 when is large. This holds also for the heavy-tailed lognormal distribution when . In a typical mass consumer credibility application J is 2000 or larger, and the conditional claim amount distribution is more heavy-tailed than the uniform distribution. For those applications the pseudo-estimators can be recommended, while for applications with a few large customers the case is not so clear.

10. Conclusion

We give sharp results for the BLP (Credibility Estimator) in a generic credibility model covering claim frequency and mean claim. The model has an auxiliary argument, which is a function of an MLF. An optimal pseudo-estimator for the between-groups variance component is given under some moment conditions on the random parameters. The pseudo-estimators are shown to be reasonably robust against departures from the moment assumptions.

Acknowledgements

Thanks are due to the referee for many valuable suggestions, which helped to improve the paper considerably. Thanks also to Dr. Adhitya Ronnie Effendie, Universitas Gadjah Mada, Yogyakarta, Indonesia, for valuable suggestions and encouragement.

Notes

No potential conflict of interest was reported by the author.

1 What is predicted is the expected claim frequency or mean claim of a group, conditional on a random effect that occurred in the past and will not change. Thus the word predictor is something of a misnomer. The actuarial tradition is to call the BLP the Credibility Estimator, but the word estimator for a random variable, such as this conditional expectation, is equally questionable. See Ohlsson & Johansson (2010, Section 4.1, p. 75) for a discussion of terminology.

References

  • Bichsel, F. & Straub, E. (Unpublished). Erfahrungstarifierung in der Kollektiv-Krankenversicherung. Internal note of Swiss Re.
  • Bühlmann, H. & Gisler, A. (2005). A course in credibility theory and its applications. Berlin: Springer.
  • Bühlmann, H. & Straub, E. (1970). Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries 70, 111–133.
  • Campbell, M. (1986). An integrated system for estimating the risk premium of individual car models in motor insurance. ASTIN Bulletin 16(2), 165–184.
  • Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
  • De Vylder, F.E. (1996). Advanced risk theory. Brussels: Editions de l’Université de Bruxelles.
  • Frees, E. (2003). Multivariate credibility for aggregate loss models. North American Actuarial Journal 7(1), 13–37.
  • Kaas, R., Goovaerts, M. & Dhaene, J. (2009). Modern actuarial risk theory using R. Berlin: Springer.
  • Norberg, R. (1980). Empirical Bayes credibility. Scandinavian Actuarial Journal 1980, 177–194.
  • Ohlsson, E. (2008). Combining generalized linear models and credibility models in practice. Scandinavian Actuarial Journal 2008(4), 301–314.
  • Ohlsson, E. & Johansson, B. (2010). Non-life insurance pricing with generalized linear models. Berlin: Springer.
  • Rosenlund, S. (2010). Dispersion estimates for Poisson and Tweedie models. ASTIN Bulletin 40(1), 271–279. doi:10.2143/AST.40.1.2049229.
  • Rosenlund, S. (2014). Inference in multiplicative pricing. Scandinavian Actuarial Journal 2014(8), 690–713. doi:10.1080/03461238.2012.760885.

Appendix 1

Proofs

Below, an estimator to be plugged into an expression is written with a hat above it. This can also mean an estimator marked with an * in the previous sections, depending on assumptions and the suitability of different estimators in different situations.

Proof of Theorem 5.1

The optimality of the BLP form (5.8), as a linear combination of the individual and the collective mean, is a key result in basic credibility. For an extension to the case with Auxiliaries, see e.g. Ohlsson & Johansson (2010, Section 4.2, Theorem 4.3), which applies to the present model. Their final rating factor for group j following (4.24) is of the form (5.8). The difference between methods lies in how to arrive at the .

The resulting expression (A4) can be applied to total claim cost, letting and interpreting functionals accordingly.

We now compute . Since only one j is treated at a time, we drop the subindex j, setting z = , Y = , = , = , and = . From (5.8) we get

Set 1/2 of the derivative of this expression with respect to z equal to 0. The resulting linear equation in z has only one solution, which gives the minimum. We obtain (A1)

To simplify (A1), note that from Assumption 1 we get (A2)

For any stochastic variable X and σ-algebra the following identities hold.

Below we will use these identities with , the σ-algebra induced by .

First term of (A1)

We obtain

Second term of (A1)

By using we obtain

It holds (A3)

Therefore

Third term of (A1)

We have

Thus (A1) reduces to

i.e.

i.e.

Reinstating the subindex j, again writing for z etc., we get (A4)

We can compute the variances and covariances. Assumption 1 gives (A5)

Hence, by the independence condition Assumption 2, (A6)

Thus we have proved (5.7).

Again by the independence condition Assumption 2, for we can eliminate all terms in not containing . Therefore, (A7)

In the same way, we obtain from (A3) (A8)

Inserting (A6), (A7), and (A8) in (A4), we obtain (A9)

This gives (5.10) after some rearrangement.

Remark A1:

The Var- and Cov-terms in (A3) might be new. We could not find them in the literature, even for the case with no Auxiliary. However, since the literature on credibility is so large, they are possibly already known. The Bühlmann & Straub (1970) estimator is retrieved by omitting them. The terms are often small in applications. The premise of credibility analysis is normally that the collective observed mean has so small a variance that it can be equated with the true mean for practical purposes. If j’s Auxiliary class (the whole sample with no Auxiliary) comprises sufficiently many claims, and the exposure of each group in the Auxiliary class of j is sufficiently small relative to the total exposure of its Auxiliary class, the premise is justified.

Proof of Theorem 5.2

For the optimal pseudo-estimator, we note that by (A2) and (A5)

Define the random variables

We seek the optimal estimator of in the form of a linear combination

Since are independent the minimum variance standard solution is const, for j with . (For mean claim we can have .) We obtain

Here the factor cancels out. Thus, with given by (5.5), we have

The optimal estimator of using unknown true parameters is then, with some rewriting, (A10)

It is unbiased. Substituting estimators for true values in the right sides of (5.5) and (A10), we can obtain from and an estimator of . This estimator will be biased.

One source of bias that can be dealt with is the use of in after plugging in estimators. We have

where we used Equations (A5), (A7) and (A6). Define

Then from the above it holds that . Other bias effects on the expectation from plugging in estimators are not so easy to reduce.

Here is similar to , but uses an estimate in one place and has an extra factor and term in the denominator. An estimator based on is likely to have less absolute bias and less mean square error than the estimator of obtained by substituting estimators for true values in .

Let be obtained from according to (5.5) by plugging in estimators. We are content to use these weights, since a more precise estimate of than the approximation by is difficult to compute, and since are dependent within Auxiliary classes.

The final estimator will then be, after plugging in estimators,(A11)

This gives Theorem 5.2. There we use the symbol and write approximately, due to the plugging in of estimators. These statements could be formulated as limit theorems as , provided some conditions were imposed to guarantee that the influence of any individual j vanishes in the limit.

For use in the next section we note that can be written as a linear expression in as (A12)

as is seen in (A6).

Solutions of pseudo-estimator equations

We seek solutions of (5.11).

We will rewrite Equation (A11) in a way showing the dependence on in a simplified form. With and the coefficients of the partitioning of in Equation (A12), let

Set

With we have to solve the equation

Here is a solution. For possible positive solutions, let , i.e.

Let and . Then . Hence , and for . If , no positive solution exists. If , we can take R as the right endpoint of the interval containing the solution; the left endpoint is 0. The solution thus lies in the closed interval [0, R].

If we can show that g(x) is strictly increasing for , then has at most one positive solution. This is not obvious. All simulated cases in Appendix 2 have g(x) strictly increasing. We challenge researchers to prove that this is always true, or else find a counter-example.
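As a numerical illustration (our own sketch, not the paper's code), an equation g(x) = 0 on a bracket [0, R] of this kind can be solved by bisection once a sign change is established; the function `g` below is a hypothetical stand-in for the actual estimating equation:

```python
def solve_pseudo(g, R, tol=1e-10, max_iter=200):
    """Find a root of g in [0, R] by bisection.

    Returns 0.0 if g has no sign change on the bracket,
    corresponding to taking the boundary solution x = 0.
    """
    lo, hi = 0.0, R
    if g(lo) * g(hi) > 0:        # no sign change: accept the zero solution
        return 0.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:  # root lies in the left half
            hi = mid
        else:                    # root lies in the right half
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Toy estimating equation whose positive root is known (x^2 = 2):
root = solve_pseudo(lambda x: x * x - 2.0, R=2.0)
```

If g is strictly increasing on the bracket, as in all simulated cases of Appendix 2, the root found this way is the unique positive solution.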

From the definition (5.5) and some calculations we get for claim frequency

A more complicated expression for g(0) holds for mean claim. If g(0) is negative, a positive solution exists. It remains to show that there is no positive solution if . It also remains to show that a positive solution is unique, or else to find cases with several positive solutions.

Remark A2:

The upper limit might be too large for the computation of g(R) for numerical reasons. Instead we use the classical estimator . If , we search, by stepping up, for an upper limit x not too far from , where . If , the upper limit is taken as .

Higher moments of stochastic parameters

If has 3rd central moment 0 and excess 0, then its moments of order 2 to 4 are easily shown to be the following. The second-order moment is always as stated. (A13)
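In generic notation, for a random variable $\theta$ with $E\theta = 1$ (the mean-1 normalization is our assumption here), $\operatorname{Var}\theta = \sigma^2$, third central moment 0 and excess 0 (so the fourth central moment is $3\sigma^4$), the raw moments of order 2 to 4 are:

```latex
E\theta^2 = 1 + \sigma^2, \qquad
E\theta^3 = 1 + 3\sigma^2, \qquad
E\theta^4 = 1 + 6\sigma^2 + 3\sigma^4 .
```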

We will in the sequel make frequent use of the Cramér (1946) formulas (15.10.4) and (15.10.5), which connect moments and central moments via semi-invariants up to order 4, in order to establish estimators for , for both claim frequency and mean claim. The addition property of semi-invariants for sums of independent variables is very useful here.
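For reference, with $\alpha_t$ the raw moments, $\mu_t$ the central moments and $\kappa_t$ the semi-invariants (cumulants), these standard relations read, up to order 4:

```latex
% (15.10.4): moments from semi-invariants
\alpha_1 = \kappa_1, \qquad
\alpha_2 = \kappa_2 + \kappa_1^2, \qquad
\alpha_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3,
\alpha_4 = \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2
         + 6\kappa_2\kappa_1^2 + \kappa_1^4 .
% (15.10.5): central moments and semi-invariants
\mu_2 = \kappa_2, \qquad \mu_3 = \kappa_3, \qquad
\mu_4 = \kappa_4 + 3\kappa_2^2 .
```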

Proof of Lemma 6.1 for claim frequency

We show here that , with as defined in (6.4).

Suppressing j and introducing some expressions to simplify calculations, we set

From (A13) we obtain

Now Poi(). All Poisson semi-invariants are equal to the mean. From Equation (15.10.4) in Cramér (1946), which gives moments in terms of semi-invariants, we get

Since , we obtain

For the 2nd and 4th central moments of N we have
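Since for $N \sim \mathrm{Poi}(\lambda)$ all semi-invariants equal $\lambda$, formulas (15.10.4) and (15.10.5) give, in standard notation:

```latex
E N = \lambda, \qquad
E N^2 = \lambda + \lambda^2, \qquad
E N^3 = \lambda + 3\lambda^2 + \lambda^3, \qquad
E N^4 = \lambda + 7\lambda^2 + 6\lambda^3 + \lambda^4,
\mu_2(N) = \lambda, \qquad \mu_3(N) = \lambda, \qquad
\mu_4(N) = \lambda + 3\lambda^2 .
```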

We are interested in

After some calculations we arrive at the comparatively simple expression

Therefore we get, replacing a with ,

Returning completely to the original notation, we have and , which gives, with by (6.4),

Proof of Lemma 7.1 for mean claim

Central moment estimators

We will develop estimators of , which are defined in Assumption 6. These are suitable for computing an estimator of . We have

We use the sample central moments for . Let be as defined in expression (7.5). Then are the sample central moments for , albeit containing the unknown functionals .

Cramér (1946, p. 352) gives unbiased central moment estimators of order for a sample of IID random variables. Using these we define

By Assumption 6 and Cramér (1946, p. 352), it holds

and hence

where is defined by (7.4).
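For orders 2 and 3 the unbiased central-moment estimators in question are classical; a minimal sketch (our own illustration, not the paper's notation) that verifies their unbiasedness exactly by enumerating all equally likely IID samples from a tiny population:

```python
from itertools import product

def m_central(xs, r):
    """Biased sample central moment of order r."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n

def mu2_hat(xs):
    """Unbiased estimator of the 2nd central moment (needs n >= 2)."""
    n = len(xs)
    return n / (n - 1) * m_central(xs, 2)

def mu3_hat(xs):
    """Unbiased estimator of the 3rd central moment (needs n >= 3)."""
    n = len(xs)
    return n * n / ((n - 1) * (n - 2)) * m_central(xs, 3)

# Exact check of unbiasedness: average each estimator over all ordered
# samples of size n drawn with replacement from a small population.
# That average equals the expectation under IID sampling.
pop = [0.0, 1.0, 3.0]
n = 3
mu = sum(pop) / len(pop)
mu2 = sum((x - mu) ** 2 for x in pop) / len(pop)   # true 2nd central moment
mu3 = sum((x - mu) ** 3 for x in pop) / len(pop)   # true 3rd central moment

samples = list(product(pop, repeat=n))
avg2 = sum(mu2_hat(s) for s in samples) / len(samples)
avg3 = sum(mu3_hat(s) for s in samples) / len(samples)
```

The averages agree with the population central moments up to rounding error, in line with the unbiasedness statements above.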

When weighting together for total estimators , albeit with unknown functionals in them, we observe that is defined only for . So we use weights , giving the estimators

with (recall that is independent of j).

Since are unknown, we use estimates , giving by (7.6). They will be approximately unbiased estimators of . We employ the weights for also here, giving the estimators by (7.7).

Using as the weight for , we obtain by (7.3), which illustrates the feasibility of the weighting. We are, however, uncertain whether some other weighting, presumably equal to this one when specialized to , might be generally better without imposing more assumptions.

For we obtain the non-observable estimators and, for suitable estimators , the observable ones (A14)

The assumption of Lemma 7.1 is that and . Then we get estimates by using for in (A13), e.g. . It follows that in (A14) will be those stated in (7.8).

Variance estimator using semi-invariants

We shall compute an estimate of . For brevity we suppress indices etc. in the notation below. Let

It holds . We seek

Now and . This yields (A15)

Let be the semi-invariants of conditional on . Cramér (1946) gives these in terms of central moments in (15.10.5). We obtain

The semi-invariant of order t for N V conditional on is , by virtue of the addition property of semi-invariants for sums of independent variables. We deduce the moments of N V conditional on from its semi-invariants, again with the help of (15.10.4) in Cramér (1946). This gives
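If the quantity in question is the aggregate of N independent copies of V with N Poisson distributed, which is our reading of the compound sum here, the addition property yields the classical compound-Poisson formula (stated in generic notation for reference):

```latex
% Cumulant generating function of S = V_1 + \dots + V_N,
% N \sim \mathrm{Poi}(\lambda), V_i IID with mgf M_V:
K_S(s) = \lambda \,\bigl(M_V(s) - 1\bigr)
\quad\Longrightarrow\quad
\kappa_t(S) = \lambda \, E V^t, \qquad t = 1, 2, 3, \dots
```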

The unconditional moments of V are thus .

To go further we need the moment assumptions for , namely that and . Then the moments of are given by (A13). Hence, with as defined in (7.9) with and , we obtain

We have from (7.10) the following expression, where we again write for N.

Then by (A15) we have

The estimator of is obtained by plugging in estimators.

Remark A3:

There are high powers and mixed positive and negative terms in the expressions above. This will cause severe numerical problems in ordinary computer arithmetic. Multiprecision arithmetic must be used.
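In Python, one readily available option for the multiprecision arithmetic mentioned above is the standard library's decimal module; a toy illustration of the cancellation problem and its cure (not the paper's computation):

```python
from decimal import Decimal, getcontext

# Mixed large positive and negative terms of nearly equal size
# cancel catastrophically in 64-bit floats: the small difference
# is lost entirely.
a, b = 1e16 + 1.0, 1e16
float_diff = a - b          # the added 1.0 has vanished

# With enough decimal digits the small difference survives.
getcontext().prec = 50
exact_diff = (Decimal(10) ** 16 + 1) - Decimal(10) ** 16
```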

Proof of Corollary 7.1 for gamma-lognormal mixture

Assumption 7 implies that the distribution of , conditional on , is a gamma distribution with probability q and a lognormal distribution with probability , both with mean 1. Let be a gamma-distributed and a lognormally distributed random variable, with . Let W be distributed as . We then have

The following expressions, giving the 3rd and 4th moments as functions of the 2nd one, can be deduced from the properties of the two distributions. The expressions under Gamma are those valid for , and those under Lognormal are valid for .
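In our own notation with $m_k = E X^k$, the standard moment formulas of the two mean-1 distributions give $m_3 = m_2(2m_2 - 1)$, $m_4 = m_2(2m_2 - 1)(3m_2 - 2)$ for the gamma case and $m_3 = m_2^3$, $m_4 = m_2^6$ for the lognormal case; a sketch checking these relations against the exact moments:

```python
import math

def gamma_moments(v):
    """Raw moments E X^k, k = 2..4, of a Gamma variable with mean 1
    and variance v (shape a = 1/v, scale v):
    E X^k = prod_{i=0}^{k-1} (a + i) / a."""
    a = 1.0 / v
    def m(k):
        p = 1.0
        for i in range(k):
            p *= (a + i) / a
        return p
    return m(2), m(3), m(4)

def lognormal_moments(s2):
    """Raw moments E X^k of a lognormal variable with mean 1
    (log-scale mean -s2/2, log-scale variance s2):
    E X^k = exp(k (k - 1) s2 / 2)."""
    m = lambda k: math.exp(k * (k - 1) * s2 / 2.0)
    return m(2), m(3), m(4)

# Third and fourth moments as functions of the second one.
m2g, m3g, m4g = gamma_moments(v=0.7)
m2l, m3l, m4l = lognormal_moments(s2=0.4)
```

For a mixture with weight q on the gamma component, the mixture's third moment is then linear in q, which is what allows solving for q by the method of moments.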

For the mixture we thus have

We can solve q from the expression for , namely

The method of moments, i.e. estimating a parameter that is a function of the moments of the distribution by the same function of the moment estimates, is here the most practical one. Using the estimates and in (A14) of Appendix A.6.1 thus gives a q-estimate. It has to be truncated to lie between 0 and 1. Thus we get the estimates given by expressions (7.12), (7.13) and (7.14) in Corollary 7.1. Due to the truncation of to [0,1], is not always equal to .

The desired estimate of is then given by (7.15), to be used in the pseudo-estimator that follows from Corollary 7.1.

An estimate of under the assumption that and is purely gamma-distributed is obtained by setting identically regardless of the value of . The pure lognormal case is obtained by setting .

Appendix 2

Simulation results

Pseudo-estimators are denoted by Ps, non-pseudo-estimators by Nps. The tables are further explained in Section 9.

Table B1. Claim frequency comparison of -estimates.

Table B2. Mean claim comparison of -estimates.

Table B3. Mean claim comparison of -estimates.