Editorial

Editorial to the special issue: Recent statistical methods for data analysis, applied economics, business & finance


The special issue 'Recent Statistical Methods for Data Analysis, Applied Economics, Business & Finance' was established mainly for participants of the '11th International Statistics Congress', held on 4–8 October 2019 in Bodrum, Turkey. The primary purpose of the issue is to gather papers on various aspects of theoretical work, supported mostly by applications and simulations, from many different research fields of statistics, mainly from those who presented at the congress. However, the special issue also drew considerable attention from researchers around the globe, and more than one-third of the papers were submitted by authors who had not participated in the congress. The congress was such a great success that a large number of submissions arrived soon after it ended. The editorial process has been ongoing for about two years. Six guest editors, Atila Göktaş, Özge Akkuş, Ayşen Apaydin, Erol Eğrioğlu, Yüksel Akay Ünvan and Özlem Türkşen, have worked hard to bring the special issue to a high standard, with 41 papers accepted for publication. The special issue is distributed across three issues in Volume 48 of the Journal of Applied Statistics. Nearly half of the accepted papers are in theoretical statistics; nevertheless, we have categorized all 41 papers more specifically, some of which fall into two or more categories, under the following main topics:

  • Theoretical Statistics (10 Papers)

  • Regression Analysis (6 Papers)

  • Time-Series Analysis (4 Papers)

  • Economics, Business & Finance (9 Papers)

  • Machine Learning and Other Topics (12 Papers)

The issue aims to serve readers with recent and novel advances and future trends in statistics, with applications in various areas. We hope that readers enjoy and benefit from the special issue.

Theoretical statistics

Theoretical statistics concerns the study and development of the mathematical, computational, and philosophical foundations of statistics. Many papers within the special issue have a theoretical component; here, we present ten representative papers in this category.

Gündüz and Aydın [Citation1] provided a simulation-based exploration and characterization of the two most crucial kernel density functionals that play a central role in kernel density estimation, considering probability density functions that are members of the location-scale family. They presented an alternative approach, called the Cauchy-scale estimators, to obtain preliminary bandwidth estimates, providing a simulation study with a comprehensive characterization of different contamination levels, constructed from random samples of normal distributions with various parameters. They showed that the proposed preliminary bandwidth selection has lower variance in both mixture and contaminated data. Yılmaz, Kara and Özdemir [Citation2] focused on determining the best estimators of the unknown parameters of the extreme value distribution, considering both classical and Bayesian methods. For classical estimation, they used maximum likelihood, method of moments, least squares, weighted least squares, percentile, best linear unbiased, L-moments, trimmed L-moments, and Bain and Engelhardt [Citation3] estimators. For Bayesian estimation, they used Lindley's approximation and Markov chain Monte Carlo methods. Finally, they conducted a simulation study to compare the performances of these estimation methods with respect to their biases and mean square errors. Eini and Khaloozadeh [Citation4] proposed a theorem that extends the tail conditional moment (TCM) measure from elliptical distributions to the broader class of skew-elliptical distributions, suitable for modeling asymmetric phenomena. The authors obtained an analytical formula for the nth TCM for skew-elliptical distributions, to help characterize risk behavior along the tail of loss distributions, by deriving four significant results, followed by generalizations of the tail conditional skewness (TCS) and tail conditional kurtosis (TCK) measures for generalized skew-elliptical distributions, to determine the skewness and kurtosis in the tail of loss distributions. Arnold and Manjunath [Citation5] targeted bivariate pseudo-Poisson distributions, with one marginal and the other family of conditionals being of the Poisson form. They also discussed the distributional features of such models, explored inferential aspects, and included an example application of the pseudo-Poisson model to sets of overdispersed data. Türkşen [Citation6] studied the parameter estimation procedure for seemingly unrelated nonlinear regression (SUNR) based on the nonlinear least squares (NLS) method and the L2-norm. The novelty of the study includes demonstrating the applicability of the least absolute deviation (LAD) method, defined in the L1-norm, together with the NLS method for obtaining parameter estimates of the SUNR model from a multi-objective perspective using soft computing methods. Yamaguchi, Yamaguchi and Nishii [Citation7] provided predictions of regression coefficients in which the prediction error follows a generalized Gaussian distribution. In the proposed method, they minimize the expected value of an asymmetric loss while lowering the variance of the loss.
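
To make the bandwidth-selection theme of [Citation1] concrete, the following minimal Python sketch computes a kernel density estimate with a classical rule-of-thumb preliminary bandwidth on a contaminated normal sample. It illustrates the general setting only; the Cauchy-scale estimator itself is not reproduced here, and all names and parameter values are illustrative.

```python
import numpy as np
from scipy import stats

def silverman_bandwidth(x):
    """Rule-of-thumb preliminary bandwidth: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(x)
    sd = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * n ** (-0.2)

def gaussian_kde_on_grid(x, grid, h):
    """Gaussian kernel density estimate evaluated on `grid` with bandwidth h."""
    u = (grid[:, None] - x[None, :]) / h
    return stats.norm.pdf(u).mean(axis=1) / h

rng = np.random.default_rng(1)
# 10% contaminated normal sample, echoing the simulation designs discussed above
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(0.0, 5.0, 100)])
h = silverman_bandwidth(x)
grid = np.linspace(-6.0, 6.0, 200)
density = gaussian_kde_on_grid(x, grid, h)
print(f"n = {len(x)}, preliminary bandwidth h = {h:.3f}")
```
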
Çakmak and Doğru [Citation8] proposed applying the optimal B-robust (OBR) estimation method, which is resistant to outliers, to estimate the parameters of the power Lindley (PL) distribution, providing a simulation study and a real-data example to compare the performance of the OBR estimators with that of the maximum likelihood (ML), least squares (LS), and regression M estimators. Akdur [Citation9] proposed a unit-Lindley mixed-effect model with appropriate likelihood analysis methods for parameter estimation. In the study, parameter estimates of the unit-Lindley mixed-effect model are obtained with Laplace and adaptive Gaussian quadrature approximation methods. The work provides a simulation study and an application of the proposed unit-Lindley mixed-effect model to a real dataset to show that it fits better than the unit-Lindley regression model and the beta-mixed model. Evkaya et al. [Citation10] created a CD-vine mixture model expressing the dependencies between variables in temporal order. To accomplish this, cumulative distribution function values generated within the time components are tied together probabilistically with a D-vine. With this approach, the dependence structure between variables at each time point is explained by a C-vine, and the dependence among the time points is captured by the D-vine model [Citation11].
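
As a baseline for the setting of [Citation8], a plain maximum likelihood fit of the power Lindley distribution can be sketched as follows. The OBR estimator itself is not reproduced, the data below are placeholders, and the sketch assumes the standard PL density f(x; a, b) = a b^2/(b+1) x^(a-1) (1+x^a) exp(-b x^a) for x > 0.

```python
import numpy as np
from scipy.optimize import minimize

def pl_negloglik(theta, x):
    """Negative log-likelihood of the power Lindley(a, b) distribution."""
    a, b = theta
    if a <= 0 or b <= 0:
        return np.inf
    xa = x ** a
    return -np.sum(np.log(a) + 2 * np.log(b) - np.log(b + 1)
                   + (a - 1) * np.log(x) + np.log1p(xa) - b * xa)

rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=500) + 0.01  # placeholder positive data, illustration only
res = minimize(pl_negloglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("ML estimates (alpha, beta):", np.round(res.x, 3))
```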

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for modeling functional relationships between a dependent (response) variable and one or more independent (explanatory) variables. The most widely encountered form of regression analysis is linear regression, used in various disciplines of science for estimation and forecasting. For this topic, we have selected six studies to highlight.

Deliorman and İnan [Citation12] developed a new diagnostic tool for identifying influential observations that requires no distributional assumptions and is unaffected by masking and swamping effects, making it robust; the tool uses the meta-heuristic binary particle swarm optimization algorithm. A different aspect of robustness in linear regression analysis was studied by Göktaş, Akkuş and Kuvat [Citation13]. They developed a ridge parameter, obtained through a search method, that is robust to any sample size or collinearity level. Ebegil, Özdemir and Gökpınar [Citation14] adapted the Ridge and Liu-type estimators into shrinkage estimators using a median ranked set sample in the presence of multicollinearity. Sancar and İnan [Citation15] proposed a Liu-type logistic estimator as a two-parameter estimator to overcome the multicollinearity problem in the logistic model, arguing that two-stage approaches are not guaranteed to overcome multicollinearity. They suggested a new alternative method based on particle swarm optimization to estimate the (k, d) pair in the Liu-type logistic estimator simultaneously. Altun [Citation16] introduced a new regression model, the Lomax regression model, and evaluated its performance against the gamma regression model using a Monte Carlo simulation. The model was used to analyze an insurance dataset, illustrating the superiority of the proposed regression model over the gamma regression model. Muñoz-Pichardo et al. [Citation17] proposed a multivariate model based on the Poisson distribution that allows positive and negative correlations between the components, extending the log-linear Poisson model to the multivariate case through the conditional distributions. They illustrated the application of the proposed method on several datasets: various simulated datasets and a count dataset of multiple fossil species.
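
As a reference point for the ridge-parameter work in [Citation13] and [Citation14], here is a minimal numpy sketch of the ridge estimator with the classical Hoerl-Kennard-Baldwin ridge parameter. This is a textbook choice, not the robust parameter or the search method proposed in [Citation13], and the data are simulated for illustration.

```python
import numpy as np

def ridge_hkb(X, y):
    """Ridge estimator with the Hoerl-Kennard-Baldwin parameter
    k = p * sigma^2 / (b_ols' b_ols); a textbook choice, not the
    robust ridge parameter of [Citation13]."""
    n, p = X.shape
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_ols
    sigma2 = resid @ resid / (n - p)
    k = p * sigma2 / (b_ols @ b_ols)
    b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    return b_ridge, k

rng = np.random.default_rng(2)
n, p = 100, 4
z = rng.normal(size=(n, 1))
X = z + 0.05 * rng.normal(size=(n, p))   # nearly collinear columns
beta = np.array([1.0, 0.5, -0.5, 2.0])
y = X @ beta + rng.normal(size=n)
b_ridge, k = ridge_hkb(X, y)
print(f"k = {k:.4f}, ridge coefficients: {np.round(b_ridge, 3)}")
```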

Time-series analysis

Dependence is the fundamental nature of time series. The use of highly correlated, high-dimensional time-series data introduces many complications and challenges. The methods and theories developed to address these issues make up much of the content of time-series analysis [Citation18]. On this topic, we present four papers.

Gil-Alana and Yaya [Citation19] presented, in the context of nonlinear time-series analysis, a testing procedure for fractional orders of integration that performs well in finite samples, with the nonlinear terms approximated by Fourier functions. Eroğlu and Yıldırım [Citation20] introduced a new unit root test that combines the variance ratio framework with the Flexible Fourier Form under a generalized least squares detrending mechanism. Their proposed test is designed to detect unit roots correctly in the presence of serial correlation or structural breaks in the time-series data. Cetin and Yavuz [Citation21] focused on a recently proposed forecasting method, ATA, as an alternative to exponential smoothing. They studied the method on real and simulated datasets with no trend or a linear trend; according to the results, the ATA approach outperforms exponential smoothing for both types of time-series data when forecasting the near and distant future. Evkaya and Kurnaz [Citation22] mainly studied models for forecasting a drought index. They considered the time-series data together with its wavelet transformation to investigate Nonlinear Auto-Regressive and Nonlinear Auto-Regressive with External Input (NARX)-type Neural Network (NN) models. Their findings hold promise for increasing drought-index forecasting capacity.
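
To illustrate the Fourier-approximation idea underlying [Citation19] and [Citation20], the following sketch detrends a series with a constant, a linear trend, and a single-frequency Fourier term, then applies a standard augmented Dickey-Fuller regression to the residuals. This is not the variance-ratio test of [Citation20], and the tabulated ADF p-values are only indicative after such detrending.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def fourier_detrend(y, k=1):
    """OLS-remove a constant, linear trend, and one Fourier frequency
    (the Flexible Fourier Form idea); returns the residual series."""
    T = len(y)
    t = np.arange(1, T + 1)
    Z = np.column_stack([np.ones(T), t,
                         np.sin(2 * np.pi * k * t / T),
                         np.cos(2 * np.pi * k * t / T)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef

rng = np.random.default_rng(3)
T = 200
t = np.arange(1, T + 1)
# stationary noise around a smooth structural change approximated by a Fourier term
y = 2.0 * np.sin(2 * np.pi * t / T) + rng.normal(size=T)
res = adfuller(fourier_detrend(y), regression="n")
# caveat: standard ADF critical values are only indicative after detrending
print(f"ADF statistic = {res[0]:.3f}, p-value = {res[1]:.3f}")
```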

Economics, business and finance

Economics, business and finance have always attracted researchers who want to work in the financial sector to explore the relationships between economics and finance. Researchers studying economics deal mostly with recessions, inflation, and interest rates, learning how they affect one another and the outside world. Within this topic, we have nine papers, summarized as follows.

Balcı, Akgüller and Güzel [Citation23] studied the hierarchical evolution of market communities throughout the Brexit referendum, a known stress period for the stock market. The period is divided into pre-referendum and post-referendum sub-periods to obtain communities and hierarchical structures. Their results indicate that financial companies are the leading elements of the clusters. Yıldırım et al. [Citation24] investigated the environmental quality impacts of social capital and central government expenditures on environmental protection, taking the spatial dimension into account, for Turkey from 2009 to 2017. They adopted a general-to-specific approach in which spatial variations in the relationships are examined with a dynamic spatial Durbin model, using panel data at the NUTS3 level. Atukalp [Citation25] examined the relationship between the stock returns and financial performance of Turkish deposit banks from 2014 to 2018 via the CRITIC method, the TOPSIS method, and Spearman's rank correlation analysis. The results show no statistically significant correlation between the stock return ranking and the financial performance rankings of deposit banks in Turkey. Metin et al. [Citation26] studied calibration approaches and weight trimming processes used in large datasets with extreme values and different correlation structures. They applied these approaches to the 2017 Annual Industry and Service Statistics data and found that restricted calibration estimators were more efficient than the generalized regression estimator in estimating variables with high variance, such as turnover. Momtaz [Citation27] discussed the endogeneity problem and methods to 'debias' time-to-event models in entrepreneurship. The simulation and empirical results indicate that only the frailty approach yields consistently unbiased parameter estimates. Külekci and Selcuk-Kestel [Citation28] investigated future mortality and longevity risk with different age structures for three different countries. Additionally, they proposed a credibility approach to establish a reliable estimate of the annuity net single premium. Chen et al. [Citation29] revitalized the investigation of the classical cusp catastrophe model, which is challenging because its associated transition density, and hence the likelihood function, is analytically intractable. They proposed a novel Bayesian approach combining Hamiltonian Monte Carlo with two likelihood approximation methods, namely Euler approximation and Hermite expansion. The approach was validated via a series of simulation studies and applied to a real USD/EUR exchange rate dataset. Ehrhardt et al. [Citation30] presented theoretical and empirical results reinforcing the idea that classical reject inference methods should be used carefully, and that future research should be invested in designing model-based reject inference methods. Evren et al. [Citation31] aimed to provide new insight into concentration and dominance indices as concerns grow about increasing concentration in markets around the world. They proposed two normalized dominance measures that can be derived from the Wilcoxon index of qualitative variation. Through simulations, the asymptotic behaviors of these indices are analyzed under various assumptions about market structure. They also supported their study with an application to Turkish car sales in 2019.
They concluded that one of the dominance indices has the advantages of smaller estimation error, less sensitivity to small market shares, and lower sampling variability.
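
The normalized dominance measures of [Citation31] are not reproduced here, but the standard Herfindahl-Hirschman index (HHI) offers a simple reference point for measuring market concentration; the shares below are hypothetical, not the 2019 car-sales data.

```python
import numpy as np

def hhi(shares):
    """Herfindahl-Hirschman index: the sum of squared market shares.
    Ranges from 1/m (m firms with equal shares) to 1 (monopoly)."""
    s = np.asarray(shares, dtype=float)
    s = s / s.sum()          # normalize to proportions
    return float(np.sum(s ** 2))

# hypothetical market shares of five firms (illustrative only)
print(hhi([0.30, 0.25, 0.20, 0.15, 0.10]))  # 0.225, moderately concentrated
```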

Artificial intelligence, machine learning, big data and other topics

Kabran and Ünlü [Citation32] studied the prediction of bubbles in the S&P 500 stock market with a two-step machine learning approach that employs a real-time bubble detection test and a support vector machine (SVM), a widely used nonparametric binary classification technique in financial time-series forecasting. Bubble prediction is achieved in two steps, and the experimental results demonstrate that the proposed approach, with its high predictive power, could be a favorable alternative for bubble prediction. Güler and Güler [Citation33] proposed a Mixed Lasso (M-Lasso) estimator that incorporates stochastic linear restrictions into big datasets, selecting the true model and estimating parameters simultaneously. The proposed approach was validated with a simulation study; according to the results, M-Lasso is superior in terms of MSE and generally dominates the compared estimators with respect to the model selection criteria. Yıldırım et al. [Citation34] proposed a new artificial neural network type, called a threshold single multiplicative neuron artificial neural network, with two data generation processes. Yerlikaya-Özkurt et al. [Citation35] proposed a new procedure for estimating the parameters of partially nonlinear models. They considered a penalized profile nonlinear least squares problem in which the nonparametric components are expressed via B-spline basis functions. The estimation problem is then expressed as a conic quadratic programming problem, a continuous optimization problem solvable by interior point methods. Erdiş et al. [Citation36] proposed a method for addressing the mode-mixing problem in which the Itakura–Saito distance, a measure of the similarity of stationary signals based on Fourier spectra, is modified by applying a Kaiser filter to short-time signals. O'Brien and Silcox [Citation37] provided far-reaching practical design strategies for dose-response model fitting and the estimation of relative potency using key illustrations. These strategies are supported by both theoretical and simulation results. Gerger and Firuzan [Citation38] applied a conceptual Six Sigma/design of experiments hybrid framework to integrate the Taguchi method and Six Sigma for process improvement in a complex industrial environment. In this context, the Six Sigma methodology was used by a company operating in the automotive industry to improve a manufacturing process that had caused a customer complaint. Yazıcı and Çavuş [Citation39] presented two approaches to computing the p-value of the GF test, based on beta and chi-squared random numbers. From the prior literature, the two computational approaches to generalized tests appear to be equivalent. In their study, an extensive Monte Carlo simulation in terms of Type I error probability and penalized power was conducted to investigate the equivalence of these approaches. Rusyda et al. [Citation40] discussed horticultural multicrop insurance products based on revenue risk triggered by low prices, low yields, or a combination of both. They stated that in designing multicrop insurance products it is important to model the variability of revenue risk through a copula applied to crop yield and price, and to estimate the indemnity of the revenue-based multicrop insurance. Their results show that multicrop revenue insurance tends to reduce the price of agricultural insurance in Indonesia. The main goal of Karaibrahimoglu et al.
[Citation41] was to evaluate the descriptive statistics of diagnosis, operation, and last examination dates in gastric carcinoma patients by circular analysis methods, with a total of 502 gastric carcinoma patients. They also checked the distributional fit of all variables using von Mises, Rayleigh, and Kuiper's tests. When they analyzed the days and months by classical descriptive statistics, the results were completely different from the circular analysis results. They therefore concluded that, for certain diseases, dates and times should be analyzed circularly to inform physicians. Chen et al. [Citation42] proposed a comprehensive marketing mix model that captures the hierarchical structure and the carryover, shape, and scale effects of certain marketing activities, together with sign restrictions on certain coefficients that are consistent with common business sense. The proposed approach estimates all the unknown parameters simultaneously using constrained maximum likelihood and a Hamiltonian Monte Carlo algorithm. Gündüz and Fokoué [Citation43] used Nonnegative Matrix Factorization (NMF) and several other state-of-the-art statistical machine learning techniques to provide an in-depth study of university professor evaluations by their students. In their study, they specifically used the Kullback-Leibler divergence as the loss function, in keeping with the type of the data, and extracted revealing patterns consistent with the educational objectives underlying the questionnaire design. Applying their methods to a dataset gathered at Gazi University in Turkey revealed compelling patterns, such as the strong association between students' seriousness and dedication (measured by attendance) and the kind of scores they tend to assign to the courses and the corresponding professors.
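
In the spirit of [Citation43], the following sketch fits an NMF with the Kullback-Leibler loss using scikit-learn. The data are random placeholders standing in for a students-by-items score matrix, not the Gazi University dataset, and the dimensions are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
# rows = students, columns = questionnaire items, entries = scores 1..5
V = rng.integers(1, 6, size=(200, 28)).astype(float)

# multiplicative-update solver is required for the Kullback-Leibler beta-loss
model = NMF(n_components=3, solver="mu", beta_loss="kullback-leibler",
            init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)   # student loadings on latent response profiles
H = model.components_        # item profiles of each latent component
print("KL reconstruction error:", round(model.reconstruction_err_, 2))
```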

We sincerely thank Prof. Jie Chen, the Editor-in-Chief of the Journal of Applied Statistics, for her endless support from the establishment of the special issue to its completion. We also thank our team of guest editors for their tremendous effort during the peer review of submissions to the special issue.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

References
