Evaluating parameters of econometric models with linear limitations and a rank deficient observation matrix

Ganna Gryshchenko, Department of Economics and Mathematical Modeling, PhD, Assistant
Oleksandr Kutovyi, Department of Mathematics, Dr. hab., Prof.
Oleg Shutovskyi, Department of Information Technologies for Design and Applied Mathematics, PhD, Ass. Prof.
Mathematics and Statistics. Underwater Technologies: Industrial and Civil Engineering (ПІДВОДНІ ТЕХНОЛОГІЇ: промислова та цивільна інженерія), 2020, Issue 10, pp. 3-12.

Abstract. The paper considers an approach to estimating the parameters of linear econometric dependences when a set of special conditions arising in the modeling process occur together. These conditions address the most important problems met in practice when implementing several classes of mathematical models that are constructed from a matrix of explanatory variables. In most cases, the vectors that make up this matrix are closely correlated, which makes it necessary to perform calculations with a rank-deficient matrix. The conditions of the Gauss-Markov theorem may also be violated. The list of special conditions is completed by additional constraints on the model parameters. The Cobb-Douglas production function and the Solow model are known economic problems of this type; in this research, the need to impose additional constraints on the model parameters is extended to a wider range of tasks. The economic formulation of the problem with these features is presented in general form.
Known ways of solving such tasks are discussed. The approach proposed by the authors takes the whole spectrum of these features into account. It is based on pseudoinverse matrices and the singular value decomposition. The proposed mathematical tools make it possible to improve the quality of model parameter estimates obtained from real economic data. An analytical expression is found for the estimator of the parameter vector of a linear econometric model with all the above-mentioned features. Analysis of this expression determines the conditions that must be satisfied by the matrix describing the additional constraints on the model parameters. An expression for the variance of the estimated parameter vector of the linear econometric model is also obtained.
The results obtained can be used in machine learning systems for constructing econometric dependences or discriminant models.

FORMULATION OF THE PROBLEM
The current state of society as a whole, and of the economy in particular, is characterized by a drastic increase in digital information. Therefore, in recent decades the search for adequate methods of analyzing and processing vast data sets has become one of the most urgent issues. Generalizing existing developments in data analysis has led to the creation of Data Mining techniques. In most cases, Data Mining refers to a set of procedures for finding useful, non-trivial information that is understandable and can be applied in decision-making processes. There are several conventional ways of classifying Data Mining tasks. As a rule, the basic classes of Data Mining tasks include regression construction and the classification of economic objects. Regression tasks establish a relationship between a continuous variable, which describes the behavior of an economic indicator, and the influence of a selected list of factors. Classification tasks also determine the dependence of a particular variable on a selected list of factors; however, unlike in regression, the dependent variable takes only discrete values and can describe, for example, some characteristics of economic objects. For both task classes, the initial data is a matrix of dimension $n \times m$ consisting of the values of the explanatory variables for each object. Therefore, methods for constructing mathematical models for both classes share a common problem related to the requirements for the initial data array: there may be a strong correlation between two or more explanatory variables (the multicollinearity phenomenon). Neglecting such a relationship when constructing mathematical models leads to significant negative consequences, which is why special algorithms have been developed to check for multicollinearity.
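A common way to check for multicollinearity is to compute variance inflation factors (VIFs) and the condition number of the matrix of explanatory variables. The sketch below assumes NumPy and uses synthetic data in which the third column nearly duplicates the first; the function name `vif` and the data are illustrative and are not taken from the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (X has no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from regressing column j on the other columns plus an intercept.
    """
    n, m = X.shape
    out = np.empty(m)
    for j in range(m):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # An (almost) exact linear dependence gives R_j^2 = 1 and an infinite VIF.
        out[j] = np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2, x3])

print(vif(X))             # large values for the first and third columns
print(np.linalg.cond(X))  # a large condition number signals near rank deficiency
```

Removing one of the nearly collinear columns would lower the VIFs, which is exactly the practice the next paragraph questions.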
In most cases, when there are close correlations between the explanatory variables, some variables are removed from the initial data matrix to eliminate multicollinearity. However, retaining the complete list of factors when constructing a model can provide more valuable information, so approaches that allow an arbitrary matrix of explanatory variables to be used for constructing mathematical models are very important. This research describes an approach to constructing linear econometric models for the case where the matrix of explanatory variables is rank deficient.
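Let there be a linear relationship between the explained variable $Y$, the explanatory variables $X_1, \dots, X_m$ ($m$ is the number of explanatory variables) and a perturbation $\varepsilon$. Suppose that there is a sample of $n$ observations of these variables, and that each observation of each explanatory variable corresponds to a value $x_{ij}$, the value of the $j$-th variable in the $i$-th observation. Then the linear relationship between $Y$ and $X = (x_{ij})$ can be represented as

$$Y = X\beta + \varepsilon,$$

where $Y$ and $\varepsilon$ are column vectors of $n$ elements and $\beta = (\beta_1, \dots, \beta_m)^T$ is the vector of model parameters. The superscript $T$ denotes transposition; for example, $X^T$ is the matrix transposed to $X$. Let the following conditions be met, the first being that $X$ is a matrix whose elements are deterministic (non-random) numbers.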

ANALYSIS OF RECENT RESEARCH AND PUBLICATIONS
At first sight there seems to be little doubt that only a slight modification of known numerical methods is required to solve the system in the general case where condition (6) is violated. Such modifications are based on the following idea. Let us solve system (1), (2) by any direct method, for example, Gaussian elimination with the choice of the principal (pivot) element. If the matrix has incomplete rank, then in the course of the actual computations we obtain a system in which all the elements of the last rows are small. We discard these equations and find a solution of the resulting system; it serves as a sufficiently good approximation to the solution of the exact system.
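The idea can be sketched as follows, assuming NumPy and reading "the choice of the principal element" as complete pivoting. The function name and the tolerance `tol`, which decides when the remaining rows count as "small", are hypothetical. As the following paragraphs explain, this naive procedure is not reliably stable for rank-deficient matrices, so the sketch only illustrates the idea.

```python
import numpy as np

def gauss_drop_small_rows(A, b, tol=1e-10):
    """Gaussian elimination with complete pivoting on A x = b.

    Elimination stops once every remaining entry is below tol in magnitude;
    those 'small' equations are discarded and the free unknowns are set to
    zero, giving one particular solution of the reduced system.
    """
    A = np.array(A, dtype=float)   # work on copies
    b = np.array(b, dtype=float)
    n, m = A.shape
    perm = np.arange(m)            # perm[k] = original index of the unknown in column k
    rank = 0
    for k in range(min(n, m)):
        sub = np.abs(A[k:, k:])
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] < tol:
            break                  # all remaining rows are small: drop them
        i, j = i + k, j + k
        A[[k, i]] = A[[i, k]]      # row swap
        b[[k, i]] = b[[i, k]]
        A[:, [k, j]] = A[:, [j, k]]    # column swap (reorders the unknowns)
        perm[[k, j]] = perm[[j, k]]
        for row in range(k + 1, n):
            f = A[row, k] / A[k, k]
            A[row, k:] -= f * A[k, k:]
            b[row] -= f * b[k]
        rank += 1
    # Back substitution on the leading rank x rank triangular block.
    y = np.zeros(m)
    for k in range(rank - 1, -1, -1):
        y[k] = (b[k] - A[k, k + 1:rank] @ y[k + 1:rank]) / A[k, k]
    x = np.zeros(m)
    x[perm] = y                    # undo the column permutation
    return x, rank

A = np.array([[1.0, 2.0], [2.0, 4.0]])   # rank 1
b = np.array([3.0, 6.0])
x, rank = gauss_drop_small_rows(A, b)
print(rank, x)   # rank 1, x = (0, 1.5): one solution of x1 + 2 x2 = 3
```

The returned solution is only one of infinitely many; it is generally not the normal (minimum-norm) pseudo-solution discussed below.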
Many works based on this idea have been published [1][2][3][4]. They differ only in the transformations applied to the initial system and in the criteria used to replace the "small" elements of the transformed system with zeros. However, this idea did not immediately lead to an effective method for solving general systems of linear algebraic equations. Moreover, whether a stable process can be constructed for solving systems with rank-deficient matrices when the input data are perturbed and rounding errors are present remained unresolved until recently. A positive result was obtained only after a thorough study of the instability mechanism and the discovery of guaranteed means of reducing its impact [3].
To find the normal pseudo-solution, it is advisable to use unitary transformations of the initial system. However, unlike the case of full-rank matrices, applying these transformations no longer guarantees overall stability.
Thus, if the matrix of the exact system is rank deficient, small perturbations of the input data and rounding errors will not necessarily lead to the appearance, during the transformation process, of rows or columns consisting entirely of small elements. This is the main, but not the only, difficulty in developing numerical methods for solving systems with rank-deficient matrices that are built on equivalent transformations of the original system.
Another obstacle is the justification of further transformations of systems whose matrices have rows or columns of small elements.
If the input data of a system with a rank-deficient matrix are given with errors, no increase in computational accuracy and no transformation can guarantee the desired accuracy of the normal pseudo-solution [3]; this requires additional information about the exact problem. Suppose, however, that after unitary transformations a system with small rows or columns is obtained. Replacing these rows and columns with zeros is equivalent to a small perturbation of the initial system matrix. If the normal pseudo-solution of the resulting system can be found accurately, then the projection of the normal pseudo-solution of the exact system onto one of the subspaces spanned by singular vectors is computed sufficiently accurately. Without additional information, there is no reason to expect a better result.
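The effect described above can be illustrated with the singular value decomposition, which is itself built from unitary transformations. The following is a minimal NumPy sketch, not the authors' algorithm: singular values not exceeding a threshold `tol` (a hypothetical parameter that in practice has to come from knowledge of the input-data errors) are set to zero, and the normal pseudo-solution of the perturbed system is returned.

```python
import numpy as np

def normal_pseudo_solution(A, b, tol):
    """Minimum-norm least-squares solution after zeroing singular values <= tol.

    Dropping singular values <= tol changes A by at most tol in the spectral
    norm, so the result is the normal pseudo-solution of a slightly perturbed
    system. It lies in the subspace spanned by the retained right singular vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol
    coeffs = (U[:, keep].T @ b) / s[keep]
    return Vt[keep].T @ coeffs, int(keep.sum())

# Exact system: rank 1, x1 + 2 x2 = 3, normal pseudo-solution (0.6, 1.2).
A0 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b = A0 @ np.array([1.0, 1.0])
rng = np.random.default_rng(1)
A = A0 + 1e-9 * rng.standard_normal(A0.shape)   # perturbed input data

x_trunc, rank = normal_pseudo_solution(A, b, tol=1e-6)
x_naive, *_ = np.linalg.lstsq(A, b, rcond=None)  # keeps the tiny singular value
print(rank, x_trunc, x_naive)
```

Because the exact system is known here, the truncated result can be compared with its normal pseudo-solution (0.6, 1.2). The unmodified `np.linalg.lstsq` solution `x_naive` keeps the singular value of order 1e-9 and can therefore deviate considerably from it.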
The need to use additional information to solve unstable systems leads to certain difficulties in designing the corresponding computational algorithms.

TASK SETTING
When certain economic problems are solved on the basis of econometric models, it is necessary to take into account conditions that impose additional constraints on the regression coefficients.
Let an enterprise produce goods using $m$ types of resources. The enterprise is characterized by a technological set $Z \subset \mathbb{R}^n$ that describes all possible sets of resources needed to produce a given product. For example, the set $Z$ may be defined by conditions in which $a_i$ and $b_i$ are given numbers with $0 < a_i < b_i$; that is, the proportions of resources must lie within certain limits.
Let there be a production function whose unknown parameters $\beta_0, \dots, \beta_m$ are subject to such conditions.

Suppose the production function gives the amount of goods produced by the enterprise with a given set of resources. One can then consider the task of determining $\beta_0, \dots, \beta_m$ [1]. When the classical Cobb-Douglas production function of the form

$$Q = A L^{\alpha} K^{\beta} \tag{8}$$

is applied, where $Q$ is the production volume, $L$ is labor costs, $K$ is capital costs, and $A$, $\alpha$, $\beta$ are model parameters, the requirement of constant returns to scale is implemented by introducing a constraint of the type $\alpha + \beta = 1$.

PRESENTING THE MAIN CONTENT
There are two ways of taking such constraints into account [1]. The first is to solve the problem without the additional constraints and then, for the result obtained, test the hypothesis that the estimated coefficients satisfy the required conditions.
The hypothesis is formulated as follows: for the true values of the model coefficients, the condition $C^T\beta = r$ holds, where $C$ is a vector of constants that describes the existing additional conditions and the constant $r$ is given by the condition. Let $\hat\beta$ denote the estimates of the model parameters found without taking the additional conditions into account.
Using the model parameter estimates, the value $C^T\hat\beta$ is calculated, for which the statistic

$$t = \frac{C^T\hat\beta - C^T\beta}{s\sqrt{C^T (X^T X)^{-1} C}} \tag{9}$$

is checked, where $s$ is the standard error of the model perturbation.
To test the hypothesis, $r$ is substituted into (9) instead of $C^T\beta$. The obtained value is compared with the critical value of the $t$-distribution with $n - m$ degrees of freedom. The second way is to take the additional constraint into account directly in the model parameter estimation process. In the simplest cases, the regression equation can be transformed so that the additional constraint is built into the model structure itself.
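The first way (unconstrained estimation followed by a test of $C^T\beta = r$) can be sketched in NumPy as follows. The sketch assumes that $X$ has full column rank, so that $(X^T X)^{-1}$ exists, and uses hypothetical simulated data for the logarithmic form of a Cobb-Douglas function with $\alpha + \beta = 1$; the function and variable names are illustrative.

```python
import numpy as np

def restriction_t_stat(X, Y, C, r):
    """t statistic for H0: C^T beta = r, assuming X has full column rank."""
    n, m = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y               # estimates without the constraint
    resid = Y - X @ beta_hat
    s = np.sqrt(resid @ resid / (n - m))       # standard error of the perturbation
    se = s * np.sqrt(C @ XtX_inv @ C)          # standard error of C^T beta_hat
    return (C @ beta_hat - r) / se, n - m, beta_hat

# Hypothetical data: ln Q = ln A + alpha ln L + beta ln K with alpha + beta = 1.
rng = np.random.default_rng(3)
n = 100
lnL, lnK = rng.normal(size=n), rng.normal(size=n)
lnQ = np.log(2.0) + 0.3 * lnL + 0.7 * lnK + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), lnL, lnK])
C = np.array([0.0, 1.0, 1.0])    # selects alpha + beta

t, df, beta_hat = restriction_t_stat(X, lnQ, C, r=1.0)
```

The value `t` would then be compared with a critical value, for example `scipy.stats.t.ppf(1 - 0.05 / 2, df)` for a two-sided test at the 5% level.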
Consider the case [1] of estimating the parameters of (8) under the condition $\alpha + \beta = 1$. Taking the logarithm of (8) leads to a linear dependence of the form

$$\ln Q = \ln A + \alpha \ln L + \beta \ln K.$$

The constraint $\alpha + \beta = 1$ can be represented in the form $\beta = 1 - \alpha$ and included directly in the model:

$$\ln\frac{Q}{K} = \ln A + \alpha \ln\frac{L}{K}. \tag{10}$$

For (10), we find the estimates of the parameters $A$ and $\alpha$ by minimizing the sum of squared deviations [1].
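Estimating (10) reduces to an ordinary regression of $\ln(Q/K)$ on a constant and $\ln(L/K)$. A minimal NumPy sketch, assuming hypothetical data generated with $A = 2$, $\alpha = 0.3$, $\beta = 0.7$ and small multiplicative noise; the function name is illustrative.

```python
import numpy as np

def fit_cobb_douglas_crs(Q, L, K):
    """Estimate A and alpha in (8) under alpha + beta = 1 via regression (10)."""
    y = np.log(Q / K)                                      # ln(Q/K)
    Z = np.column_stack([np.ones_like(y), np.log(L / K)])  # [1, ln(L/K)]
    (lnA, alpha), *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.exp(lnA), alpha, 1.0 - alpha                 # beta from the constraint

# Hypothetical data generated with A = 2, alpha = 0.3, beta = 0.7.
rng = np.random.default_rng(4)
L = rng.uniform(1.0, 10.0, size=50)
K = rng.uniform(1.0, 10.0, size=50)
Q = 2.0 * L**0.3 * K**0.7 * np.exp(0.01 * rng.normal(size=50))

A_hat, alpha_hat, beta_hat = fit_cobb_douglas_crs(Q, L, K)
```

By construction, the estimates satisfy $\hat\alpha + \hat\beta = 1$ exactly, which is the point of building the constraint into the model structure.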
If one of the methods does not provide the necessary accuracy for solving a system of linear algebraic equations, there is no reason to expect that another method will produce better results for the same system; most likely, such a system should be considered unstable. It is known [1][2][3][4] that for an $n \times m$ matrix $X$ one considers a matrix $X^+$ satisfying the Moore-Penrose conditions

$$X X^+ X = X, \tag{12}$$

$$X^+ X X^+ = X^+, \tag{13}$$

$$(X X^+)^T = X X^+, \tag{14}$$

$$(X^+ X)^T = X^+ X, \tag{15}$$

and one can prove that such a matrix $X^+$ always exists and is unique [2]. If $X$ is a nondegenerate square matrix, then $X^+ = X^{-1}$ obviously satisfies conditions (12)-(15). If $X$ is rectangular and has full column rank, then $X^+ = (X^T X)^{-1} X^T$.

The least squares estimates of the parameters $\beta$ in (2) are defined as the values $\beta_1, \dots, \beta_m$ minimizing

$$(Y - X\beta)^T A (Y - X\beta), \tag{16}$$

where $A = (a_{ik})$ is a symmetric positive definite matrix. The solution $\hat\beta$ of (16) will be called the pseudo-solution of problem (1), (2). The solution is linear with respect to $Y$. In addition, provided (3) holds, $\hat\beta$ is an unbiased estimate of $\beta$ in (1), that is, $M(\hat\beta) = \beta$. Generally speaking, the solution will not be unique. We will require that the sum $\sum_{j=1}^{m} \beta_j^2$ be minimal; then the solution of (16) is unique. Generally speaking, when condition (4) is violated, unbiased estimates of $\beta$ cannot be obtained.
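The Moore-Penrose conditions and the minimum-norm pseudo-solution can be checked numerically. This NumPy sketch assumes $A = E$ (the identity) in (16), i.e. the ordinary least-squares criterion, and uses a hypothetical design matrix whose third column is exactly twice the second, so that $\operatorname{rank} X = 2 < m = 3$. The explicit `rcond` passed to `np.linalg.pinv` is a chosen cutoff for treating singular values as zero.

```python
import numpy as np

def satisfies_moore_penrose(X, Xp, atol=1e-10):
    """Check the four Moore-Penrose conditions (12)-(15) for a candidate Xp."""
    return (np.allclose(X @ Xp @ X, X, atol=atol)             # X X+ X = X
            and np.allclose(Xp @ X @ Xp, Xp, atol=atol)       # X+ X X+ = X+
            and np.allclose((X @ Xp).T, X @ Xp, atol=atol)    # X X+ symmetric
            and np.allclose((Xp @ X).T, Xp @ X, atol=atol))   # X+ X symmetric

rng = np.random.default_rng(2)
x1 = rng.normal(size=30)
X = np.column_stack([np.ones(30), x1, 2.0 * x1])   # rank 2 < m = 3
Y = 1.0 + 3.0 * x1 + 0.1 * rng.normal(size=30)

Xp = np.linalg.pinv(X, rcond=1e-10)
beta_plus = Xp @ Y                        # minimum-norm least-squares solution (A = E)
null_vec = np.array([0.0, 2.0, -1.0])     # X @ null_vec = 0
```

Adding any multiple of `null_vec` leaves the fitted values unchanged but increases the norm of the parameter vector, which is why the extra requirement of a minimal $\sum_j \beta_j^2$ makes the solution unique.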
The case where, for problem (1)-(2) without the linear constraints, condition (3) is not fulfilled is considered in [9] for $M(\varepsilon) = 0$; there, a compromise was found between the bias of $\hat\beta$ and its variance $D(\hat\beta)$. The case where (2) is not fulfilled was considered by Aitken [10], who proposed the generalized least squares method, provided that the matrix $X$ has full rank. In this work, Aitken's method is extended to problem (1)-(2) in the presence of linear constraints, and also to the case where (6) is not fulfilled but $\operatorname{rank} X = t < m$, together with conditions in which $\sigma^2$ is an unknown parameter and $D$, $W$ are known symmetric positive definite matrices of order $n \times n$. Then $D$ admits the representation $D = PP$, where the matrix $P$ is nondegenerate, symmetric and positive definite, so that $P^{-1} D P^{-1} = E$. In this article, we consider the general case of linear constraints for problem (1), (2) in the form

$$R\beta = r, \tag{18}$$

where $r$ is a known column vector consisting of $g \le m$ elements, $g = \operatorname{rank} R$, and $R$ is a known matrix of order $g \times m$.
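The role of the representation $D = PP$ in Aitken's method can be illustrated as follows: multiplying the model by $P^{-1}$ gives a perturbation with covariance $\sigma^2 E$, after which ordinary least squares applies. This NumPy sketch computes $P$ as the symmetric square root of $D$ via an eigendecomposition; the AR(1)-type matrix $D_{ij} = 0.5^{|i-j|}$, the data and the function names are hypothetical. Using the pseudoinverse in the transformed problem is an assumption of the sketch that keeps it usable when $X$ is rank deficient.

```python
import numpy as np

def sym_sqrt(D):
    """Symmetric positive definite P with P @ P = D (D symmetric positive definite)."""
    w, V = np.linalg.eigh(D)
    return V @ np.diag(np.sqrt(w)) @ V.T

def aitken_estimate(X, Y, D):
    """Aitken (GLS) estimate by transforming the model with P^{-1}, where D = P P.

    After the transformation, the perturbation has covariance sigma^2 E, so
    ordinary least squares applies; the pseudoinverse returns the minimum-norm
    solution of the transformed problem if X is rank deficient.
    """
    P_inv = np.linalg.inv(sym_sqrt(D))
    return np.linalg.pinv(P_inv @ X) @ (P_inv @ Y)

# Hypothetical example with D_ij = 0.5^|i-j|.
rng = np.random.default_rng(5)
n = 40
idx = np.arange(n)
D = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(D) @ rng.normal(size=n)

beta_gls = aitken_estimate(X, Y, D)
D_inv = np.linalg.inv(D)
beta_closed = np.linalg.solve(X.T @ D_inv @ X, X.T @ D_inv @ Y)   # full-rank GLS formula
```

For a full-rank $X$, both estimates coincide, since $P^{-1}P^{-1} = D^{-1}$.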
It is important that the matrix $R$ have the following property:

It is necessary to find estimates of the parameters of the model $Y = X\beta + \varepsilon$ such that the constraints (18) are satisfied. To do this, it is necessary to minimize the expression

$$(Y - X\beta)^T (Y - X\beta) - 2\lambda^T (R\beta - r),$$

where $\lambda$ is a column vector of Lagrange multipliers.
When the sum of squared deviations is expanded, the second and third terms coincide, since $\beta^T X^T Y$ is a scalar and therefore equals its transpose $Y^T X \beta$. From (19) we then obtain the condition of minimization. Using Lagrange multipliers, we estimate the vector $\beta$ so that the condition $R\hat\beta - r = 0$ is satisfied; to do this, $\hat\beta$ is selected so as to minimize $L$ under the condition $R\hat\beta = r$.
The condition $\min L$ is expressed by the first-order conditions, and from (20), (21), (22) we obtain the system of equations for $\hat\beta$ and $\lambda$. Solving (24) with respect to $\lambda$ using the pseudoinverse matrix gives (25); substituting this expression for $\lambda$ into (23) gives the estimate of $\beta$. The second term is independent of $\beta$, and applying (28) and (27) together with the Moore-Penrose conditions (12)-(15) yields the final expression for the estimate. In addition, $H(\hat\beta)$ denotes the residual sum of squares obtained in the absence of a relation between the parameters.
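The stationarity conditions of the Lagrangian can be solved numerically even when $X$ is rank deficient. The following NumPy sketch is the standard bordered (KKT) formulation of least squares under $R\beta = r$, solved with a pseudoinverse; it is not necessarily identical to the authors' closed-form expression. The design matrix (third column twice the second), the constraint that the slope coefficients sum to 1, and all names are hypothetical.

```python
import numpy as np

def restricted_ls(X, Y, R, r):
    """Least squares subject to R beta = r via the Lagrangian stationarity conditions.

    Setting the derivatives of (Y - X b)^T (Y - X b) - 2 lam^T (R b - r) to zero gives
        X^T X b - R^T lam = X^T Y   and   R b = r.
    The bordered matrix is nonsingular when R has full row rank and the stacked
    matrix [X; R] has full column rank, even if X itself is rank deficient.
    """
    m, g = X.shape[1], R.shape[0]
    K = np.block([[X.T @ X, -R.T],
                  [R, np.zeros((g, g))]])
    rhs = np.concatenate([X.T @ Y, r])
    sol = np.linalg.pinv(K) @ rhs
    return sol[:m], sol[m:]            # estimate of beta, Lagrange multipliers

# Hypothetical rank-deficient design: the third column is twice the second.
rng = np.random.default_rng(6)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, 2.0 * x1, x2])    # rank 3 < m = 4
beta_true = np.array([0.5, 0.2, 0.2, 0.6])             # slopes sum to 1
Y = X @ beta_true + 0.01 * rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0, 1.0]])                   # constraint: sum of slopes = 1
r = np.array([1.0])

beta_r, lam = restricted_ls(X, Y, R, r)
```

Here the constraint removes the ambiguity caused by the collinear columns, because $R$ is not orthogonal to the null space of $X$ (spanned by $(0, 2, -1, 0)^T$); this illustrates why conditions on the matrix $R$ matter.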
CONCLUSIONS
The complexity of real economic processes requires continuous improvement of existing mathematical tools so that adequate mathematical models can be constructed. It is necessary to search constantly for new approaches in mathematical modeling that expand the possibilities of constructing models of real economic processes, since known methods may have rather limited applicability. Thus, a well-developed method of estimating the parameters of linear econometric models may in some cases be unsuitable for modeling real economic processes: the classical least-squares method gives stable and efficient estimates only if the conditions of the Gauss-Markov theorem are fulfilled, whereas in most studies these conditions are not fulfilled. Therefore, developments that allow existing mathematical modeling approaches to be adapted to a wider range of problems are important. In addition, the modern spread of digital information necessitates its automated processing. Machine learning technology for building mathematical models is becoming more commonplace, so approaches that can be used to solve common classes of problems are becoming more relevant. The approach considered in this paper meets these requirements: it extends the ability to estimate the parameters of linear econometric models in the presence of a number of problematic issues that may arise in model construction, and it can be conveniently implemented in machine learning systems.