Statistical Estimation of Corruption Indicators in the Firm

A new statistical procedure for anti-corruption control of economic activity is proposed in the paper. The task of checking a firm for corruption is formulated in terms of statistical hypothesis testing. To make the checking procedure more rigorous, it is proposed to formulate the hypothesis in terms of the current observations rather than the average data, as is customary in classical statistics.
Mathematics Subject Classification: 62P20


Introduction
In recent decades, mathematical modeling of socioeconomic systems and processes has acquired a well-defined, legitimate status (e.g. and references therein). Here we demonstrate how methods of mathematical statistics help to reveal corrupt activity in firms. The ability to counteract the corrupt activities of firms is limited by their hidden nature, their prevalence and, accordingly, the lack of resources to identify them. In this regard, sufficiently reliable criteria for evaluating the presence of indirect indicators of corruption in firms are of particular relevance. Several studies have shown [21] that circumstantial evidence of corruption in economic activity includes: a short tendering period, an inflated tender volume, violation of the application filing procedure, violation of the application submission deadline, complaints filed by losing bidders, changes in the contract compared with the bid, failure of the winning bidder to declare all of its participants, and so on.

The estimation of the corruption area boundary
Suppose that $k$ indirect corruption indicators $x_1, x_2, \ldots, x_k$ are selected as the components of the vector $X = (x_1, x_2, \ldots, x_k)^T$ ($T$ denotes transposition). Let $R_c \subset \mathbb{R}^k$ be the corruption area. Assume that the vector $X$ obeys a multivariate normal distribution; this assumption is justified when the distribution is sufficiently localized within the area of admissible values. The boundary of the region $R_c$ can be determined statistically from the sample $\{X_j, j = 1, 2, \ldots, n\}$ obtained from the accounting statements of firms whose economic activity is of a corrupt nature.
Compute the estimate of the vector of mean values and the matrix of cross-covariances:

$$\Theta_c = \frac{1}{n}\sum_{j=1}^{n} X_j, \qquad (1)$$

$$V_c = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \Theta_c)(X_j - \Theta_c)^T. \qquad (2)$$

Degeneracy of the matrix $V_c$ means that the selected indirect corruption indicators are dependent, and their list must then be reduced. Given that, by assumption, $X$ obeys a multivariate normal distribution, i.e. $X \sim N(\Theta_c, V_c)$, the boundary of the corruption area $R_c$ can be defined in two different ways: as an elliptic hypersurface or as a hyperplane. In the first case we have

$$W_c = \{X : f_c(X) = c_\alpha\}, \qquad R_c = \{X : f_c(X) \ge c_\alpha\}.$$

Here the probability density function $f_c(X) = f(X, \Theta_c, V_c)$ is defined by the formula

$$f(X, \Theta, V) = (2\pi)^{-k/2} (\det V)^{-1/2} \exp\left(-\tfrac{1}{2}(X - \Theta)^T V^{-1} (X - \Theta)\right).$$

The value of the constant $c_\alpha$ is chosen so that the probability measure of the distribution $N(\Theta_c, V_c)$ inside the ellipsoid $W_c$, i.e. of $R_c$, equals $1-\alpha$. Note that the ellipsoid $W_c$ may extend beyond the region of admissible values of the vector $X$; the probabilities must then be calculated as conditional probabilities. This remark also applies to the probability estimates discussed below.

The second approach to evaluating the boundary of $R_c$ is similar to E. Altman's method [1], [17]. Let the orthogonal matrix $P$ be computed so that

$$P^T V_c P = \Lambda_c = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k).$$

The columns of $P$ can be ordered so that $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k$. Then the hyperplane $\varphi_c = \{X : P_k^T X = \delta_\alpha\}$, where $P_k$ is the last column of $P$, corresponding to the smallest eigenvalue $\lambda_k$, can be taken as the boundary of $R_c$. The sum of the squared deviations of all values $X_j$ from the hyperplane $\varphi_0 = \{X : P_k^T X = P_k^T \Theta_c\}$ attains the minimum value $(n-1)\lambda_k$:

$$\sum_{j=1}^{n} \left(P_k^T (X_j - \Theta_c)\right)^2 = (n-1)\, P_k^T V_c P_k = (n-1)\lambda_k,$$

i.e. the hyperplane $\varphi_0$ reproduces the original sample with minimal quadratic error. The value of the constant $\delta_\alpha$ should be calculated from the selected confidence level $1-\alpha$ as the boundary of the left-sided critical region for the Student random variable

$$t = \frac{P_k^T X - P_k^T \Theta_c}{\sqrt{\lambda_k}}, \qquad \delta_\alpha = P_k^T \Theta_c - t_{n-1}\sqrt{\lambda_k},$$

where $t_{n-1}$ is the quantile of order $1-\alpha/2$ of the Student distribution with $n-1$ degrees of freedom. This means that for corrupt firms the probability of falling below the level $\delta_\alpha$ equals $\alpha/2$, since the Student test is two-sided.
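The boundary construction can be sketched in a few lines. This is a minimal, non-authoritative illustration assuming NumPy, assuming (1) and (2) are the sample mean and the $(n-1)$-normalized covariance, and assuming $\delta_\alpha = P_k^T\Theta_c - t_{n-1}\sqrt{\lambda_k}$. The function names are hypothetical, and the Student quantile is passed in rather than computed so that only NumPy is needed.

```python
import numpy as np


def estimate_parameters(sample):
    """Sample mean (1) and covariance (2) of an n-by-k data matrix."""
    X = np.asarray(sample, dtype=float)
    theta = X.mean(axis=0)
    V = np.cov(X, rowvar=False)  # default normalization is n - 1
    return theta, V


def hyperplane_boundary(sample, t_quantile):
    """Altman-style boundary p_k^T X = delta_alpha (hypothetical helper).

    t_quantile: Student quantile t_{n-1} of order 1 - alpha/2, for example
    scipy.stats.t.ppf(1 - alpha / 2, n - 1), supplied by the caller.
    """
    theta, V = estimate_parameters(sample)
    if np.linalg.matrix_rank(V) < V.shape[0]:
        raise ValueError("V_c is degenerate: reduce the list of indicators")
    eigvals, eigvecs = np.linalg.eigh(V)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # lambda_1 >= ... >= lambda_k
    lam, P = eigvals[order], eigvecs[:, order]
    p_k, lam_k = P[:, -1], lam[-1]           # smallest-variance direction
    # The sign of p_k is arbitrary; either way most corrupt observations
    # lie on the side p_k^T X >= delta.
    delta = p_k @ theta - t_quantile * np.sqrt(lam_k)
    return theta, V, p_k, lam_k, delta


def inside_ellipsoid(x, theta, V, r2):
    """Elliptic variant: f_c(x) >= c_alpha is equivalent to a squared
    Mahalanobis distance <= r2 (r2: chi-square quantile, k dof, order 1 - alpha)."""
    d = np.asarray(x, dtype=float) - theta
    return float(d @ np.linalg.solve(V, d)) <= r2


# Usage on simulated (not real) data; 2.023 is roughly t_39 for alpha = 0.05.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([50.0, 60.0], [[4.0, 1.5], [1.5, 2.0]], size=40)
theta, V, p_k, lam_k, delta = hyperplane_boundary(sample, t_quantile=2.023)
residuals = (sample - theta) @ p_k
print(np.isclose(residuals @ residuals, (len(sample) - 1) * lam_k))  # True
```

The final check reproduces the least-squares property of $\varphi_0$: the summed squared deviations along $P_k$ equal $(n-1)\lambda_k$.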

Probability estimation of the absence of corruption in the analyzed firm
Suppose that for some firm B under analysis for corruption, observations $\{X_t, t = 1, 2, \ldots, T_B\}$ are given for a time period $T_B$. Assume that the values $X_t$ do not belong to $R_c$, because otherwise the economic activity of the firm would be considered corrupt. Using formulas (1) and (2), we calculate similar estimates of the parameters $\Theta_B$ and $V_B$, assuming that the vector of indicators $X$ of the firm under assessment obeys a multivariate normal distribution $X \sim N(\Theta_B, V_B)$, which indirectly corresponds to the assumption that corruption is absent.
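The probability measure of the absence of corruption may be defined as the probability measure (under the distribution $N(\Theta_B, V_B)$) of the interior of the ellipsoid

$$W_B = \{X : f_B(X) = f_B(X^*)\}, \qquad X^* = \arg\max_{X \in \partial R_c} \ln f_B(X), \qquad (3)$$

where $f_B(X) = f(X, \Theta_B, V_B)$ and $\partial R_c$ is the boundary of $R_c$: the ellipsoid $W_c$ when the boundary is defined as an ellipsoid (the logarithm in (3) is used to simplify the differentiation), or the hyperplane $\varphi_c$ when it is defined as a hyperplane. In the latter case the solution is quite simple:

$$X^* = \Theta_B + \frac{\delta_\alpha - P_k^T \Theta_B}{P_k^T V_B P_k}\, V_B P_k.$$

Geometrically, $X^*$ is the point of contact of the ellipsoid $W_B$ with the boundary of $R_c$ [17], [18]. Now, taking a confidence level $1-\alpha$ (here the choice does not depend on how the boundary of $R_c$ is constructed), compare with it the probability measure $P(\mathrm{Int}\,W_B)$ of the interior of the ellipsoid $W_B$. Treating the estimates as exact, $P(\mathrm{Int}\,W_B)$ is the $\chi^2$ distribution function with $k$ degrees of freedom evaluated at $(X^* - \Theta_B)^T V_B^{-1} (X^* - \Theta_B)$. If $P(\mathrm{Int}\,W_B) > 1-\alpha$, the economic activity of the firm is not heavily corrupt; if $P(\mathrm{Int}\,W_B) \le 1-\alpha$, this should be treated as a signal for more rigid control.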
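The tangency point $X^*$ and the measure $P(\mathrm{Int}\,W_B)$ can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes NumPy and treats the estimates $\Theta_B, V_B$ as exact, so that $P(\mathrm{Int}\,W_B)$ is a $\chi^2_k$ probability. The hyperplane case uses the closed form from the text. For the elliptic boundary, a dense angular grid search for $k = 2$ stands in for solving (3) analytically; this is a swapped-in numerical technique. All function names are hypothetical.

```python
import math

import numpy as np


def chi2_cdf(x, k):
    """Chi-square CDF with k dof from the power series of the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0.0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (s + n)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def ml_point_on_hyperplane(theta_b, V_b, p_k, delta):
    """X* maximizing ln f_B(X) subject to p_k^T X = delta (Lagrange solution)."""
    Vp = V_b @ p_k
    return theta_b + (delta - p_k @ theta_b) / (p_k @ Vp) * Vp


def ml_point_on_ellipse_2d(theta_b, V_b, theta_c, V_c, r2, grid=20000):
    """k = 2 only: approximate X* on W_c = {(X-theta_c)^T V_c^{-1} (X-theta_c) = r2}
    by scanning a grid of angles (numerical stand-in, not a closed form)."""
    L = np.linalg.cholesky(V_c)
    phi = np.linspace(0.0, 2.0 * np.pi, grid, endpoint=False)
    circle = np.vstack([np.cos(phi), np.sin(phi)])         # unit vectors, 2 x grid
    pts = theta_c + math.sqrt(r2) * (L @ circle).T         # points on W_c, grid x 2
    d = pts - theta_b
    m2 = np.einsum("ij,ij->i", d @ np.linalg.inv(V_b), d)  # Mahalanobis^2 w.r.t. V_B
    return pts[np.argmin(m2)]                              # max f_B <=> min m2


def prob_no_corruption(x_star, theta_b, V_b):
    """P(Int W_B): mass of N(theta_b, V_b) inside the ellipsoid through x_star."""
    d = np.asarray(x_star, dtype=float) - theta_b
    return chi2_cdf(float(d @ np.linalg.solve(V_b, d)), len(theta_b))


def needs_rigid_control(p_int, alpha):
    """Decision rule of the text: signal when P(Int W_B) <= 1 - alpha."""
    return p_int <= 1.0 - alpha


# Usage with hypothetical firm-B estimates and the line 0.75 x1 + 0.25 x2 = 52.
theta_b = np.array([40.0, 50.0])
V_b = np.array([[9.0, 3.0], [3.0, 4.0]])
x_star = ml_point_on_hyperplane(theta_b, V_b, np.array([0.75, 0.25]), 52.0)
p_int = prob_no_corruption(x_star, theta_b, V_b)
print(x_star, p_int, needs_rigid_control(p_int, alpha=0.05))
```

The closed form does not require a unit normal vector: scaling $P_k$ and $\delta_\alpha$ by the same factor leaves $X^*$ unchanged.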

Check for corruption in terms of statistical hypotheses
According to the classical theory of hypothesis testing, one would test here the hypothesis $H_c: \Theta_B \in R_c$. As shown in [17], it suffices to verify the hypothesis $H^*: \Theta_B = X^*$. However, under the hypothesis $H^*$ the estimate of $\Theta_B$ obeys the distribution $N(X^*, T_B^{-1} V_B)$, so, as is well known, its covariance is $T_B$ times smaller (and its standard deviations $\sqrt{T_B}$ times smaller) than that of the individual values $X$. This makes the test relatively insensitive when checking a firm whose corrupt economic actions were episodic. Therefore we should test the hypothesis $H_c: X \in R_c$ ($H^*: X = X^*$). This hypothesis uses the distribution of the indirect corruption indicators themselves (the components of $X$) and is interpreted as the firm committing individual economic actions of a corrupt character. This approach is consistent with the probabilistic estimates of section 2. The appropriate statistics under these assumptions are the components of the vector

$$\tau = \Lambda_B^{-1/2} Q^T (X^* - \Theta_B), \qquad (4)$$

each of which has Student's distribution with $T_B - 1$ degrees of freedom. Here $Q$ is an orthogonal matrix such that $Q^T V_B Q = \Lambda_B$ is diagonal. In addition, as the sample size $T_B$ increases, the distribution of the quantity

$$\tau^T \tau = (X^* - \Theta_B)^T V_B^{-1} (X^* - \Theta_B) \qquad (5)$$

asymptotically approaches the $\chi^2$ distribution with $k$ degrees of freedom. If the hypothesis $H_c: X \in R_c$ ($H^*: X = X^*$) is not rejected by the statistical test, the company falls into the category of those suspected of corrupt activities.
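The test statistics can be sketched as below. This sketch assumes NumPy and assumes that statistic (4) is $\tau = \Lambda_B^{-1/2} Q^T (X^* - \Theta_B)$ and that (5) is $\tau^T\tau$. The critical value is supplied by the caller, and the helper names are hypothetical.

```python
import numpy as np


def student_statistics(x_star, theta_b, V_b):
    """Statistic (4): components of Lambda_B^{-1/2} Q^T (X* - Theta_B),
    with Q orthogonal and Q^T V_B Q = Lambda_B diagonal."""
    lam, Q = np.linalg.eigh(V_b)
    return (Q.T @ (np.asarray(x_star, dtype=float) - theta_b)) / np.sqrt(lam)


def chi2_statistic(x_star, theta_b, V_b):
    """Statistic (5): sum of squared components of (4), which equals
    (X* - Theta_B)^T V_B^{-1} (X* - Theta_B)."""
    tau = student_statistics(x_star, theta_b, V_b)
    return float(tau @ tau)


def suspected_of_corruption(x_star, theta_b, V_b, t_crit):
    """H*: X = X* is not rejected when every |tau_i| <= t_crit, where t_crit is
    the two-sided Student quantile with T_B - 1 degrees of freedom."""
    tau = student_statistics(x_star, theta_b, V_b)
    return bool(np.all(np.abs(tau) <= t_crit))
```

Rotating by $Q$ decorrelates the deviation, so each component can be compared separately with the Student critical value. The sum of their squares recovers the Mahalanobis form used for the $\chi^2$ approximation.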

Example
Let us consider a numerical example. For simplicity of exposition, assume that the number of indirect corruption indicators is reduced to two ($k = 2$) and that the boundary of the corruption region is a straight line. Table 1 shows a sample of values of the company's activity indicators within a certain observation period, together with the calculated mean values, which are the coordinates of the test point. For illustrative purposes, we assume that the boundary of the corruption area is the straight line $0.75x_1 + 0.25x_2 = 52$ (Altman's boundary). In this case the test point falls in the safe region. Table 2 shows the coordinates of the boundary point nearest to the test point and of the boundary point of maximum likelihood (all three points are shown in Fig. 1), together with the values of the Student test statistics. The sample size is not large enough for the distribution of the quantity (5) to be considered sufficiently close to $\chi^2$. As can be seen from Table 2, for the nearest boundary point one of the Student statistics exceeds the critical value, which can give false confidence that the tested company is free of corruption. However, for the point of maximum likelihood both statistics lie in the confidence region. Thus, although on average the tested firm is estimated as free of corruption, it may in principle commit acts of a corrupt character. Such a firm must be subject to appropriate additional monitoring.
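The contrast between the nearest boundary point and the maximum-likelihood boundary point can be reproduced qualitatively. This sketch assumes NumPy and uses the boundary line from the example. The test point $\Theta_B$ and covariance $V_B$ are hypothetical values chosen on the side $0.75x_1 + 0.25x_2 < 52$. They are not the Table 1 data, which are not reproduced here, so the printed numbers will not match Table 2.

```python
import numpy as np

# Hypothetical inputs (NOT the data of Table 1): boundary line from the text,
# an assumed test point Theta_B and covariance V_B of firm B.
p = np.array([0.75, 0.25])
delta = 52.0
theta_b = np.array([40.0, 50.0])
V_b = np.array([[9.0, 3.0], [3.0, 4.0]])


def nearest_point(theta, p, delta):
    """Euclidean projection of theta onto the line p^T X = delta."""
    return theta + (delta - p @ theta) / (p @ p) * p


def ml_point(theta, V, p, delta):
    """Boundary point maximizing the N(theta, V) density (tangency point)."""
    Vp = V @ p
    return theta + (delta - p @ theta) / (p @ Vp) * Vp


def t_stats(x, theta, V):
    """Decorrelated Student-type statistics Lambda^{-1/2} Q^T (x - theta)."""
    lam, Q = np.linalg.eigh(V)
    return (Q.T @ (x - theta)) / np.sqrt(lam)


x_near = nearest_point(theta_b, p, delta)
x_ml = ml_point(theta_b, V_b, p, delta)
for name, x in [("nearest", x_near), ("max likelihood", x_ml)]:
    t = t_stats(x, theta_b, V_b)
    print(f"{name:15s} X = {np.round(x, 3)}  |t| = {np.round(np.abs(t), 3)}  chi2 = {t @ t:.3f}")
```

By construction, the maximum-likelihood point has the smallest Mahalanobis distance to $\Theta_B$ among all boundary points. Its $\chi^2$ value can therefore never exceed that of the Euclidean nearest point, which is why testing at the nearest point can understate how plausible a boundary observation is.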

Conclusion
The statistical estimation procedure proposed in the paper significantly expands the possibilities for controlling firms' economic activity with regard to corruption. At the same time, the estimation procedure becomes more rigorous and more sensitive to changes in the indirect corruption indicators of a firm's economic activity. The procedure can also be used in other fields of mathematical modeling. It is shown that checking a system with a large number of parameters for corruption may require testing a complex hypothesis that arises from restrictions on the parameters, and that verification of this complex hypothesis reduces to verification of a simple hypothesis about the boundary point that gives the maximum likelihood. This fact and the algorithm proposed here significantly expand the possibilities of adequate testing for corruption in complex multiparameter structures and systems, increasing the reliability of the findings and conclusions.

Figure 1: The arrangement of the elements of the statistical analysis: 1 - test point; 2 - nearest boundary point; 3 - boundary point of maximum likelihood.

Table 1: Simulated data of test results

Table 2: The results of the statistical test