
Journal of Small Business Management Volume 61, 2023 - Issue 5: Special Issue on Entrepreneurship Processes


Open access


Research Articles

Predicting New Venture Gestation Outcomes With Machine Learning Methods

Paris Koumbarakis, University of St Gallen, Swiss Institute for Small Business & Entrepreneurship, Switzerland

https://orcid.org/0000-0003-0078-1962

Thierry Volery, Zurich University of Applied Sciences, School of Management & Law, Switzerland (corresponding author)

Pages 2227-2260 | Published online: 15 Jun 2022

https://doi.org/10.1080/00472778.2022.2082453

ABSTRACT

This study explores the use of machine learning methods to forecast the likelihood of firm birth and firm abandonment during the first five years of new business gestation. The predictive performance of traditional logistic regression is compared with that of several machine learning methods, including k-nearest neighbors, random forest, extreme gradient boosting, support vector machines, and artificial neural networks. While extreme gradient boosting shows the best overall model performance, neural networks perform well in correctly classifying entrepreneurs who have not abandoned their business venture in the early stage of the gestation process. In addition, this study provides insights into the start-up activities leading to firm emergence. Entrepreneurs who perform a greater number of activities and who can orchestrate them at the right rate, concentration, and time are more likely to successfully launch a new business venture.

KEYWORDS:

New venture creation
forecasting
machine learning

Introduction

Launching a new business venture requires considerable time and effort. According to one estimate, US business angels invested more than $26 billion in start-ups in 2018 (WBAF, 2020), and the time entrepreneurs devote to starting new firms amounts to 2.7% of total paid work (Reynolds & Curtin, 2011). The related sunk costs are correspondingly high, considering that the majority of entrepreneurs abandon their business idea within the first five years of new business creation (Reynolds, 2017). Against this backdrop, investors and service providers continuously face the decision of whether to invest additional time, effort, and money to support entrepreneurs and their fledgling business ventures. But how do they know whether these entrepreneurs will launch a profitable business venture, quickly abandon their business idea, or spend years pottering about with a business idea that never gains traction? Accurate prediction models of venture gestation could help investors and other stakeholders optimize their resource allocation.

In this study, we contribute to the literature on new venture creation by tackling two research questions: (1) Which model(s) best predict firm birth and firm abandonment? and (2) Which single conditions lead to firm emergence? To this end, we use artificial intelligence (AI) techniques to forecast the likelihood of firm emergence and firm abandonment during the first five years of new business gestation. By exploring combinations of single conditions and by using machine learning methods to forecast firm emergence, we aim to generate new insights into the start-up process rather than merely test theory. This approach is warranted because, despite the large body of research on firm emergence, the extant literature on start-up processes remains fragmented (Davidsson & Gruenhagen, 2020). In addition, previous research seems to have reached an empirical dead end in trying to identify which combinations of single conditions are more likely to explain firm emergence (Arenius et al., 2017). The need to uncover the necessary conditions for firm emergence is thus more pressing than ever.

Many prediction models have identified start-up success and failure factors across different countries (for example, Lussier & Claudia, 2010; Mayr et al., 2020). However, these models have several limitations. First, they relate to the later stage of the start-up process rather than to the whole gestation period (Reynolds, 2017). For example, past studies often adopted a firm perspective, drawing on data collected from owners of already established or liquidated companies (Lussier, 1995). Second, most prediction models have traditionally drawn on linear or logistic regression methods. Recently, entrepreneurship scholars have begun using machine learning (AI) techniques to address multiple research questions related to new venture dynamics (Antretter et al., 2019; Van Witteloostuijn & Kolkman, 2019; Weinblat, 2018). These AI techniques often outperform traditional regression-based models, providing higher prediction accuracy (Loureiro et al., 2018) and detecting complex interaction and nonlinear effects in input data (Gerasimovic & Bugaric, 2018). Third, AI models, and supervised learning models in particular, add considerable value in predictive tasks because they are specifically designed for such purposes (Obschonka & Audretsch, 2019).

We provide a longitudinal, multifaceted perspective on the start-up process by drawing on a large set of harmonized international panel data, which allows us to consider multiple factors affecting firm emergence, including start-up activities and their evolution over time, as well as business, industry, and entrepreneur characteristics. In this context, AI can help reveal unexpected patterns in a data set and potential connections between otherwise unrelated issues, which in turn could serve as a basis for developing new theories in entrepreneurship (Lévesque et al., 2020; Obschonka & Audretsch, 2019). By exploring combinations of single factors with AI methods to forecast firm emergence, we do not aim to test theories but to garner important new insights into the field of entrepreneurship.

The remainder of this paper is structured as follows: the second section provides a review of the factors influencing the venture creation process and a rationale for the quantitative exploratory approach. The third section describes the methodology, including the data set, variables, and empirical methods. The fourth section details the results, followed by a discussion section and the conclusion.

Background

Factors influencing venture gestation outcome

The prediction of venture gestation outcomes has been a central question in entrepreneurship research over the past two decades (Tornikoski & Newbert, 2007; Van Gelderen et al., 2005). Given the central role of the entrepreneur and the lack of financial information in the early stages of the business gestation process, past research has focused predominantly on nonfinancial elements, such as the occurrence and sequence of gestation activities (for example, prototype development, market validation, team recruitment, raising capital), which may affect the likelihood of firm birth and firm abandonment (Burnaev et al., 2015; Newbert, 2005).

Empirical research in this field has mainly drawn on Panel Study of Entrepreneurial Dynamics (PSED) research programs or studies following a similar design (Davidsson & Gordon, 2012; Reynolds & Curtin, 2011). PSED provides reliable and generalizable data on the process of business formation. It includes information on the characteristics of the adult population attempting to start new businesses, the kinds of activities nascent entrepreneurs undertake during the business start-up process, and the proportion and characteristics of the start-up efforts that are launched. Even though the results regarding the factors that affect the gestation outcome are diverse, the following patterns have emerged from this stream of research (Table 1).

Table 1. Predictors of firm emergence.


First, both the type and the number of gestation activities appear to affect the venture gestation outcome (Chwolka & Raith, 2012; Honig & Samuelsson, 2012). In fact, “what nascent entrepreneurs do may be more important than whom they are and what product-markets they intend to serve” (Tornikoski & Newbert, 2007, p. 313). Among single start-up activities, business planning and raising funds have been closely related to the likelihood of firm birth. Longitudinal studies suggest that business planning plays an important role in the gestation outcome (Liao & Gartner, 2007; Newbert & Tornikoski, 2012). Specifically, business planning facilitates goal attainment because it helps founders undertake more valuable actions to develop their fledgling enterprises (Delmar & Shane, 2003). With regard to start-up capital and funding, the percentage of ownership as well as external equity influence the birth outcome (Hechavarria et al., 2012; Van Gelderen et al., 2005). For example, financial capital significantly decreases the odds of discontinuance (Liao et al., 2009). Furthermore, networks and activities that connect the nascent entrepreneur with others appear to positively affect firm birth (Newbert & Tornikoski, 2012; Newbert et al., 2013). However, the effect of specific activities is inconsistent (Chwolka & Raith, 2012; Delmar & Shane, 2003), and even though some level of activity is needed, no single gestation activity appears necessary to achieve firm birth (Arenius et al., 2017; Shim & Davidsson, 2018).

Beyond specific activity types, three factors (timing, rate, and concentration) have been studied to assess the “complexity dynamics” (Lichtenstein et al., 2007) of the gestation process. There is some evidence that the timing of start-up activities (whether the bulk of the organizing activities is accomplished earlier or later during the start-up process), their rate (the number of start-up activities undertaken over a period of time), and their concentration (how closely start-up activities are undertaken in relation to each other) have an impact on firm emergence (Hopp & Sonderegger, 2015; Lichtenstein et al., 2007). New ventures are more likely to emerge when entrepreneurs conduct gestation activities at a faster rate, in lower concentration, and with an average timing at a later stage in the gestation process.

In addition to start-up activities, it is widely recognized that personal characteristics, and in particular personal agency such as self-directedness and self-efficacy, are positively related to firm emergence and negatively related to firm abandonment (Dimov, 2010; Hechavarria et al., 2012; Khan et al., 2014). Personal background characteristics, including entrepreneurial experience (Rotefoss & Kolvereid, 2005; Van Gelderen et al., 2005), industry experience (Dimov, 2010), educational attainment (Hopp & Sonderegger, 2015), and age (Liao et al., 2009), can also affect the start-up activities conducted and accelerate progress toward business launch.

With regard to the founding team, initial team size (Chandler et al., 2005), team resource heterogeneity (Muñoz-Bullon et al., 2015), and balanced team experience (Delmar & Shane, 2006; Thiess et al., 2016) positively influence firm birth. Conversely, team homogeneity, including homogeneous start-up experience, is negatively related to long-term firm performance (Steffens et al., 2012). Although contradictory results persist (Tornikoski, 2008), the influence of the team on the venture outcome has largely been validated.

Beyond the human capital perspective, there is wide agreement that social capital (the resources embedded in entrepreneurs’ personal networks) is critical for the emergence of new firms. For instance, network connections enable entrepreneurs to identify new business opportunities, marshal resources, and secure legitimacy from external stakeholders (Clough et al., 2019; Hallen, 2008). In their study of nascent entrepreneurship in Sweden, which compared individuals engaged in start-up activities with a control group of nonentrepreneurs, Davidsson and Honig (2003) found that social capital variables were very strong and consistent predictors of firm emergence. Both bonding social capital based on strong ties, such as having parents or close friends who owned businesses, and bridging social capital based on weak ties were found to be good predictors of nascent entrepreneurship. Recent research on accelerator and entrepreneurship education programs (Hallen et al., 2020) has documented how nascent entrepreneurs interact and learn within an accelerator, further expanding their social networks in the process. These programs are important mechanisms for nascent entrepreneurs to attract resources and to convey quality and legitimacy.

Context variables matter too. For example, the industry in which the venture evolves affects the gestation outcome: ventures in the service industry are likely to become operational and profitable more rapidly (Steffens et al., 2012; Van Gelderen et al., 2005). Market dynamism is another important contextual variable. For example, nascent entrepreneurs operating in low-velocity markets undertake more gestation activities than those operating in moderate-velocity markets (Newbert, 2005). The use of technology is a key determinant of market dynamism. Technology-based entrepreneurs typically create business ventures operating in more dynamic and uncertain environments, and they engage in more planning, legitimacy establishment, and resource acquisition activities (Liao & Welsch, 2008). Economic crises tend to have a negative impact on the emergence of new business ventures through a drastic drop in demand for goods and services (Giotopoulos et al., 2017; Vegetti & Adăscăliţei, 2017).

Rationale for an exploratory quantitative approach

Despite the breadth of research on firm emergence, this literature is surprisingly limited in volume, and its results remain fragmented (Davidsson & Gruenhagen, 2020). Scholars have recently suggested that no particular gestation activity is necessary to achieve firm birth and that only a small number of activities is necessary to reach initial profits after 24 months of gestation (Arenius et al., 2017). Three main reasons explain this fragmentation. First, past studies have typically adopted a single perspective or theoretical anchor, such as creative agency (Hechavarria et al., 2012; Khan et al., 2014), planning theory (Honig & Samuelsson, 2012; Liao & Gartner, 2007; Newbert et al., 2013), or human capital (Hopp & Sonderegger, 2015; Muñoz-Bullon et al., 2015; Steffens et al., 2012), thereby shedding light on only one aspect of the new venture creation process at a time. Second, as a corollary, previous studies of gestation activities have mostly been content with a partial understanding of organizing activities as sufficient conditions (Arenius et al., 2017); the relative importance of each activity, and which activities constitute necessary conditions for firm emergence, have yet to be established. Third, because of the temporal heterogeneity of venture creation processes, scholars have often focused on the achievement of partial milestones, such as receiving external funding or generating first sales. As a result, “it has been very difficult to find either general patterns in or explanations for the entire sequence of gestation activities” (Davidsson & Gordon, 2012, p. 858).

Recognizing that prior research may have reached an empirical dead end in trying to identify gestation activities as sufficient conditions for firm emergence (Arenius et al., 2017), we seek to explore the combinations of single conditions that are more likely to explain firm emergence. Our approach thus departs from most recent research on new venture gestation, which is characterized by a proliferation of quantitative work aiming to extend or add theoretical understanding, a practice that could explain the slow rate of cumulative research progress in the field (Wennberg & Anderson, 2020).

To this end, we use machine learning methods to forecast the likelihood of firm emergence and firm abandonment during the first five years of new business gestation. By exploring combinations of large sets of variables and adopting new methods to forecast firm emergence, we aim to generate new insights rather than merely test theories. Applying machine learning methods to a large data set (PSED) means that we are engaging in exploratory, data-driven empirical research (Coad & Srhoj, 2019; Wennberg & Anderson, 2020), a fact-finding exercise that could help trigger novel theories that run counter to existing ones, or broaden the scope of existing ones by identifying variables and relationships from areas ignored in the past (Lévesque et al., 2020).

Methodology

Data set

This study draws on five longitudinal data sets from the Panel Study of Entrepreneurial Dynamics (PSED) research program in four countries: the US PSED I (1998–2004) and US PSED II (2005–2008), the Swedish Panel Study of Entrepreneurial Dynamics, the Comprehensive Australian Study of Entrepreneurial Emergence, and the Chinese PSED. These data sets have been harmonized into a single data set comprising 3537 nascent entrepreneurs (Reynolds et al., 2016).

PSED provides valid and reliable data on the process of business formation based on nationally representative samples of nascent entrepreneurs. Its design is based on a population screening interview to identify nascent entrepreneurs and a series of follow-up interviews to track their progress toward business launch. In this study, entrepreneurs were tracked over a period of 60 months following their first identification as “nascent” in the screening interview. To be classified as nascent, entrepreneurs had to have performed at least two gestation activities (for example, developing a prototype, drafting a business plan, registering a business, opening a bank account, or recruiting the first employee) in the 12 months prior to the screening interview. Based on this entry date, nascent entrepreneurs provided information at three-month intervals about the completion of subsequent gestation activities and the outcome of the gestation process (that is, firm birth, firm abandonment, or ongoing gestation). As a result of this selection procedure, our data set consisted of 1457 nascent entrepreneurs.

In terms of gestation outcome, PSED defines firm birth as the presence of monthly profits that cover expenses and owner salaries, firm abandonment as having stopped working on the business idea, and ongoing gestation as still pursuing the business idea without having either set up a profitable new firm or abandoned the idea (Reynolds et al., 2016). In this study, we focus on two gestation outcomes: firm birth and firm abandonment. Note that firm abandonment occurs during the gestation process, before the firm emerges and becomes profitable. We discarded nascent entrepreneurs in ongoing gestation. These individuals have often been characterized as “dilettante dreamers” or “hobbyists” (Davidsson & Gordon, 2012; Reynolds & Curtin, 2011): they meet the screening criteria but show low levels of activity and, in follow-up interviews, do not seem very serious about taking their start-up idea to the market (or to termination).

Variables

In addition to the two dependent variables, firm birth and firm abandonment, 40 independent variables were included in our modeling (Appendix A). They fall into six broad types: (1) personal characteristics (for example, gender, age, education, start-up experience, industry experience, management experience), (2) motivational elements (for example, motivation to start the business, growth preference), (3) venture-related characteristics (for example, team size, ownership structure, industry, hi-tech venture), (4) start-up activities (for example, business plan prepared, outside funding received, own money invested in the business, patent or trademark applied for, employees or managers hired), (5) a market-related measure (crises), and (6) four complexity measures (rate, concentration, timing, and effort), which we explain below. We did not include any variables measuring social capital because they were not available in our data set.

Rate is defined as the total number of start-up activities undertaken by the nascent entrepreneur divided by the duration of the gestation process of the new business. For example, if a nascent entrepreneur conducted five different activities in months 1, 3, 6, 9, and 12, this sequence equals five activities within a time span of 12 months, resulting in a rate of 5/12 ≈ .42. Rate thus reflects the average pace of organizing activities across the gestation process.
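A minimal Python sketch of the rate measure follows. It assumes that each activity is recorded as the month in which it occurred and that the gestation duration is supplied separately (12 months in the worked example); the function name is illustrative, not taken from the authors' code.

```python
def activity_rate(activity_months: list[int], duration_months: int) -> float:
    """Rate = number of start-up activities / gestation duration in months.

    Hypothetical helper; activity_months holds the month of each activity.
    """
    if duration_months <= 0:
        raise ValueError("duration_months must be positive")
    return len(activity_months) / duration_months


# Worked example from the text: five activities over 12 months.
rate = activity_rate([1, 3, 6, 9, 12], duration_months=12)
print(round(rate, 2))  # 0.42
```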

Concentration reflects how closely start-up activities are undertaken in relation to each other. It is operationalized as the variance of the months in which activities occur. High values reflect a high dispersion of activities, whereas low values indicate that more of the start-up activities are bundled together (for example, the variance is 0 if all activities are conducted in the same month). For instance, activity sequences of {1, 1, 1, 1} and {1, 3, 6, 6} have concentrations of 0 and 6, respectively.
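The value of 6 for {1, 3, 6, 6} is reproduced by the sample variance: the squared deviations from the mean of 4 sum to 18, and 18 divided by n − 1 = 3 gives 6 (the population variance would give 4.5). The sketch below therefore assumes the sample variance; the handling of fewer than two activities is a hypothetical choice not covered by the text.

```python
def concentration(activity_months: list[int]) -> float:
    """Concentration as the sample variance (n - 1 denominator) of activity months.

    Returns 0.0 for fewer than two activities (an assumption; the text does not
    cover this case).
    """
    n = len(activity_months)
    if n < 2:
        return 0.0
    mean = sum(activity_months) / n
    return sum((m - mean) ** 2 for m in activity_months) / (n - 1)


print(concentration([1, 1, 1, 1]))  # 0.0
print(concentration([1, 3, 6, 6]))  # 6.0
```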

Timing indicates whether the bulk of start-up activities is accomplished earlier or later during the start-up process. It is measured by taking the average event time divided by the duration of the gestation process. For example, the average event time related to the start-up activities {1, 3, 3, 6, 12, 12} with a duration of 12 months is 4.5. This figure is divided by the duration of 12 months, resulting in a timing of .375. Values close to 1 indicate that start-up activities occurred at the end of the gestation process whereas values close to 0 mean that activities occurred in the first months of the gestation process.
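The figure of 4.5 in the worked example equals the median of {1, 3, 3, 6, 12, 12}; the arithmetic mean of the same six values is 37/6 ≈ 6.17, which would give a timing of about .51. The sketch below therefore lets the caller choose the central-tendency function, defaulting to the median so that it reproduces the worked example; which statistic the original operationalization uses is an assumption here.

```python
from statistics import mean, median
from typing import Callable, Sequence


def timing(
    activity_months: Sequence[int],
    duration_months: int,
    center: Callable[[Sequence[int]], float] = median,
) -> float:
    """Timing = central event time / gestation duration (0 = early, 1 = late).

    `center` selects how the 'average' event time is computed (assumption).
    """
    return center(activity_months) / duration_months


seq = [1, 3, 3, 6, 12, 12]
print(timing(seq, 12))                  # 0.375 (median event time 4.5)
print(round(timing(seq, 12, mean), 3))  # 0.514 (mean event time ≈ 6.17)
```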

Effort captures the development of start-up activities over the 60-month period. It is calculated as the difference in the total number of activities conducted between two points in time, divided by the time elapsed between them. For example, if an entrepreneur had conducted 2 activities at t₀ (the minimum number required to be considered nascent) and 5 activities six months later at t₁, the effort invested over this period is (5 − 2)/6 = .5. The higher the value, the greater the effort in the time sequence.
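A short sketch of the effort calculation, assuming cumulative activity counts are available at each observation point. The helper name and the quarterly counts in the second example are hypothetical.

```python
def effort(activities_t0: int, activities_t1: int, months_between: float) -> float:
    """Effort = change in the cumulative number of activities per month."""
    if months_between <= 0:
        raise ValueError("months_between must be positive")
    return (activities_t1 - activities_t0) / months_between


# Worked example from the text: 2 activities at t0, 5 activities six months later.
print(effort(2, 5, 6))  # 0.5

# Hypothetical cumulative counts at three-month interviews -> effort per interval.
counts = [2, 3, 5, 5]
print([effort(a, b, 3) for a, b in zip(counts, counts[1:])])
```

Computing effort per interval, as in the second example, shows how the measure can rise and fall across the gestation process rather than summarizing it in a single number.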

The data set consists of numerical variables (for example, team size and the rate, concentration, timing, and effort of the conducted start-up activities) as well as categorical variables (for example, education and motivation). Start-up activities (for example, writing a business plan) are coded as binary variables (0 = not conducted; 1 = conducted). External economic conditions such as the dotcom and financial crises are captured with a binary variable (0 = noncrisis year; 1 = crisis year).

Empirical methods

The goal of this study is to predict the likelihood of firm birth and firm abandonment over a 60-month gestation period using different machine learning techniques. The models are computed using the independent variables observed after 12 months (t₁), 24 months (t₂), 36 months (t₃), and 48 months (t₄). The dependent variable firm birth was coded as 1 = firm birth and 0 = otherwise, and firm abandonment was coded as 1 = firm abandonment and 0 = otherwise.

We followed three steps to conduct our analysis: (1) data preprocessing, as outlined above, (2) optimization, and (3) evaluation. The optimization phase included selecting the optimal set of independent variables and tuning the hyperparameters of each technique using either a grid search (GridSearchCV) or a random search (RandomizedSearchCV) with k-fold cross-validation (k = 10). Both are common techniques for optimizing a model’s hyperparameters and deriving an optimized model (Bergstra & Bengio, 2012; Vo et al., 2019). The choice between them was based on the computational power required: random search for k-nearest neighbors, decision tree, random forest, XGBoost, support vector machine, and artificial neural networks, and grid search for logistic regression. The evaluation step included testing and comparing the models. An overview of the methodology is shown in Figure 1.

Figure 1. Schematic outline of the analytical approach.

After data preprocessing, the first step was to split the data into training, validation, and test sets. Training and validation together comprised 75% of the observations (n = 1089), while the remaining 25% (n = 363) served as the hold-out test set for assessing model performance. Given the imbalanced data set, the split was stratified by the outcome variable.
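The stratified 75/25 split can be sketched in plain Python as below; in scikit-learn the same idea is usually expressed as `train_test_split(X, y, test_size=0.25, stratify=y)`. The helper, the seed, and the 40/160 class counts in the example are hypothetical and serve only to show that each class keeps its share in both parts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical outcome vector: 40 firm births (1) among 200 cases.
labels = [1] * 40 + [0] * 160
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 150 50
print(sum(labels[i] for i in test_idx))  # 10
```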

To further optimize the models, different variable combinations were tested to identify better-performing models. To this end, we applied seven feature selection methods: Pearson correlation coefficient, chi-squared, random forest, decision tree, lasso regression, XGBoost, and recursive feature elimination (RFE), the latter using three different classifiers (logistic regression, k-nearest neighbors, support vector machine), for a total of nine techniques. A feature was considered important if it was selected by at least seven of the nine techniques. Because one of the goals of this study was to analyze differences in feature importance across the time sequences (t₁–t₄), this procedure was repeated four times, once for each set of independent variables.
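The consensus rule can be sketched as a simple vote count over the nine techniques. The selector labels, feature names, and selector outputs below are hypothetical, and reading "seven out of nine" as a threshold of at least seven votes is an assumption.

```python
from collections import Counter

# Hypothetical labels for the nine selection techniques described in the text.
SELECTORS = [
    "pearson", "chi2", "random_forest", "decision_tree", "lasso", "xgboost",
    "rfe_logistic", "rfe_knn", "rfe_svm",
]


def consensus_features(selections: dict[str, set[str]], min_votes: int = 7) -> list[str]:
    """Keep features chosen by at least `min_votes` of the selection techniques."""
    votes = Counter(feature for chosen in selections.values() for feature in chosen)
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical selector outputs: 'rate' picked by all nine, 'business_plan' by 7,
# 'team_size' by 6, 'gender' by 2.
selections = {s: {"rate"} for s in SELECTORS}
for s in SELECTORS[:7]:
    selections[s].add("business_plan")
for s in SELECTORS[:6]:
    selections[s].add("team_size")
for s in SELECTORS[:2]:
    selections[s].add("gender")

print(consensus_features(selections))  # ['business_plan', 'rate']
```

Running the same function once per time sequence (t₁–t₄) would yield the four feature sets the text describes.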

Following feature selection, the training and validation procedure builds on k-fold cross-validation (k = 10) to compute the highest F1-score for each model and time sequence. Cross-validation reduces the risk that results depend on a single, possibly biased, sampling split of the training data (Topuz et al., 2018). To ensure that the selected features improve model performance, we compared the mean values across the k folds for the full and the feature-selected model.
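A plain-Python sketch of 10-fold cross-validation indices follows. It assumes the folds are stratified like the hold-out split (the text only says k-fold); in scikit-learn this corresponds to `StratifiedKFold(n_splits=10, shuffle=True, random_state=0)`, whose folds can be passed to `cross_val_score(model, X, y, cv=..., scoring="f1")`. The labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    for f in range(k):
        val = set(folds[f])
        train = [i for i in range(len(labels)) if i not in val]
        yield train, sorted(val)


labels = [1] * 40 + [0] * 160  # hypothetical outcome vector with a 20% minority class
splits = list(stratified_kfold(labels, k=10))
print([sum(labels[i] for i in val) for _, val in splits])  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
```

Each observation lands in exactly one validation fold, so averaging a per-fold score such as F1 over the ten folds gives the mean value used to compare the full and feature-selected models.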

Because we aimed to identify entrepreneurs who are more likely to achieve firm birth or to abandon the venture, predictions for the minority classes of firm birth and firm abandonment needed to be accurate. Thus, during model training, we applied four class balancing techniques (Burnaev et al., 2015) to put more weight on the firm birth and firm abandonment predictions: synthetic minority oversampling technique (SMOTE; Han et al., 2005), adaptive synthetic sampling (ADASYN), neighborhood cleaning rule (Laurikkala, 2002), and edited nearest neighbor (Raniszewski, 2010). For each model, we applied the technique that increased the F1-score and provided reasonable values for accuracy and the area under the receiver operating characteristic curve (ROC AUC), while keeping recall and precision at relatively acceptable levels. The F1-score is defined as the harmonic mean of precision and recall; the closer its value to 1, the better the model.
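The core idea of SMOTE, creating synthetic minority cases by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the exact variant the authors used; in practice a library class such as imbalanced-learn's `SMOTE` (`fit_resample`) is typical, and resampling is normally applied to training folds only so that synthetic cases do not leak into validation data. The feature vectors are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical minority-class feature vectors (e.g. rate, timing).
minority = [(0.40, 0.30), (0.45, 0.35), (0.50, 0.40), (0.42, 0.50)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```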

We used three criteria to evaluate the models: (1) accuracy, the proportion of correct classifications; (2) precision and recall, derived from the confusion matrix, for recognizing nascent entrepreneurs in a given group (that is, firm birth or firm abandonment); and (3) ROC AUC and F1-score for evaluating overall performance (Topuz et al., 2018; Veganzones & Séverin, 2018). These measures were calculated for each of the k folds (k = 10) and averaged for each classification technique and time sequence to obtain an overall estimate of model performance. Finally, all trained models were evaluated on the hold-out test set.
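The evaluation metrics can be computed from a confusion matrix and predicted scores as sketched below; scikit-learn offers equivalents (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`). ROC AUC is computed here through its probabilistic interpretation: the chance that a randomly chosen positive case is scored above a randomly chosen negative one. All labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = firm birth), predicted classes and predicted probabilities.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05]
print(classification_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))  # 0.933
```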

Classification methods

This paper tackles a typical classification problem using an imbalanced data set. We used several classification methods, including classical logistic regression and five machine learning methods (k-nearest neighbors, random forest, XGBoost, support vector machine, artificial neural network) to predict the likelihood of firm birth and firm abandonment.

K-nearest neighbors algorithm

Popular in pattern recognition and for solving classification problems, the k-nearest neighbors (k-NN) algorithm is considered an efficient, relatively simple, easy-to-implement supervised machine learning algorithm (Wang et al., 2013). The algorithm is based on the idea that data points of the same class should lie close together in the feature space: a new observation is assigned the class that prevails among the k training samples closest to it (k-nearest neighbor learning). Because of its simplicity and effectiveness, this algorithm has been applied in a variety of settings and has generally provided robust results (Li & Wang, 2015).

Using RandomizedSearchCV, our configuration included the number of neighbors ranging from 2 to 70 in steps of two, four algorithm types (auto, ball tree, k-d tree, and brute force), leaf size ranging from 10 to 40 in steps of 5, weights (uniform or distance), and a power parameter (1 or 2, corresponding to Manhattan and Euclidean distance, respectively).
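A brute-force k-NN classifier makes the tuned parameters concrete: `k` corresponds to the number of neighbors, `p` to the power parameter of the Minkowski distance, and `weights` to uniform or inverse-distance voting (scikit-learn's `KNeighborsClassifier` uses the names `n_neighbors`, `p`, and `weights`). The algorithm type and leaf size only change how neighbors are searched, so this sketch omits them. The training data are hypothetical, and the exact-match rule for distance weighting is a simplifying assumption.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_X, train_y, query, k=5, p=2, weights="uniform"):
    """Majority (or inverse-distance weighted) vote among the k nearest samples."""
    nearest = sorted(
        ((minkowski(x, query, p), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        if weights == "distance":
            if dist == 0:
                return label  # exact match treated as decisive (simplifying assumption)
            votes[label] += 1.0 / dist
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical two-feature training data with two well-separated classes.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))                           # 0
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3, p=1, weights="distance"))  # 1
```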

Random forest

This technique consists of a large number of individual decision trees that operate as an ensemble and overcome weaknesses of single decision trees, such as high sensitivity to small variations in the data (Loureiro et al., 2018). Each tree is grown on a bootstrap sample of the observations, and at each split a random sample of predictors is examined. Each tree is allowed to grow fully, so no pruning is required. In addition, random forest (RF) is very user-friendly, as it requires the researcher to set only two main parameters (that is, the number of variables used for building the individual trees and the number of trees) (Antretter et al., 2019). Random forest has performed well in recent entrepreneurship studies (Sabahi & Parast, 2020; Xu et al., 2018).

The number of trees was tested for the values 10, 50, 100, 150, 200, 250, 500, 1000, 1500, and 2000. Maximum depth ranged from 2 to 50 in steps of 2, while the splits ranged from 10 to 200 in steps of 25.
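The random forest search space illustrates why random search was preferred for the more expensive models: the full grid has 10 × 25 × 8 = 2,000 combinations, or 20,000 model fits under 10-fold cross-validation. The sketch below encodes the space as a dictionary, the form accepted as `param_distributions` by scikit-learn's `RandomizedSearchCV(estimator, param_distributions, n_iter=..., cv=10, scoring="f1")`, and draws a random subset of configurations. Mapping "splits" to `min_samples_split` and the value `n_iter=50` are assumptions; the text does not report the number of iterations.

```python
import random
from itertools import product

# Search space as described in the text; mapping "splits" to scikit-learn's
# min_samples_split is an assumption.
rf_space = {
    "n_estimators": [10, 50, 100, 150, 200, 250, 500, 1000, 1500, 2000],
    "max_depth": list(range(2, 51, 2)),             # 2, 4, ..., 50
    "min_samples_split": list(range(10, 201, 25)),  # 10, 35, ..., 185
}


def grid_size(space):
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def sample_configs(space, n_iter, seed=0):
    """Draw n_iter distinct configurations uniformly from the full grid."""
    names = list(space)
    grid = list(product(*(space[name] for name in names)))
    rng = random.Random(seed)
    return [dict(zip(names, combo)) for combo in rng.sample(grid, n_iter)]


print(grid_size(rf_space))       # 2000
print(grid_size(rf_space) * 10)  # 20000 fits for an exhaustive 10-fold grid search
configs = sample_configs(rf_space, n_iter=50)  # hypothetical budget of 50 draws
```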

Extreme gradient boosting

Extreme gradient boosting (XGBoost) is an efficient implementation of the gradient boosting approach (Friedman, 2001). Simply put, boosting combines weak learners (that is, decision trees) into a strong learner, and gradient boosting does so by adding weak learners one at a time, each fitted to reduce the model’s remaining loss following a gradient descent approach. XGBoost extends this approach with a regularized model formalization, which reduces overfitting and increases predictive performance. Among other applications, XGBoost has been used to predict new venture creation (Antretter et al., 2019) and survival (Climent et al., 2019).

We tuned a variety of parameters to compute XGBoost. A “gbtree” booster was applied, and the tuned parameters included the number of trees (500, 1000), depth (3, 5, 7, 9), learning rate (.01, .1, .2, .3), gamma (0–.4), and subsampling ratio, with values ranging from .5 to .9 in steps of .1.
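To make the boosting idea concrete, the sketch below implements plain gradient boosting for binary log-loss with decision stumps: each round fits a stump to the negative gradient (y minus the predicted probability) and adds it, shrunk by the learning rate, to the log-odds. This is a simplified Friedman-style illustration, not XGBoost itself, which additionally uses second-order gradient information, regularized leaf weights, and the gamma penalty on splits. The data are hypothetical; the learning rate of .3 is one of the values tuned in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, targets):
    """Least-squares regression stump: (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, targets) if row[j] <= thr]
            right = [t for row, t in zip(X, targets) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]


def gradient_boost(X, y, n_rounds=50, learning_rate=0.3):
    """Plain gradient boosting for binary log-loss with decision stumps."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
    base = math.log(p0 / (1 - p0))  # initial log-odds
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss w.r.t. the log-odds: y - sigmoid(F).
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        F = [fi + learning_rate * (lv if row[j] <= thr else rv) for fi, row in zip(F, X)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict_proba(model, row):
    f = model["base"]
    for j, thr, lv, rv in model["stumps"]:
        f += model["learning_rate"] * (lv if row[j] <= thr else rv)
    return sigmoid(f)


# Hypothetical one-feature example (e.g. number of activities) with a clean split.
X = [(0,), (1,), (2,), (3,), (4,), (5,)]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(X, y)
print([round(predict_proba(model, row), 2) for row in X])
```

In the xgboost package, the tuned settings map onto `XGBClassifier` arguments such as `booster`, `n_estimators`, `max_depth`, `learning_rate`, `gamma`, and `subsample`.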

Support vector machine

Support vector machine (SVM) uses only a subset of the training points in the decision function (called support vectors), which makes it memory efficient (Kraus & Feuerriegel, 2017). Because of their relative simplicity and flexibility in addressing a range of classification problems, SVMs have been effective even with limited sample sizes (Tu et al., 2019) and often outperform other classical statistical models (Chaudhuri & Bose, 2020). This learning method has been widely applied in different research fields, including entrepreneurship (Blanco-Oliver et al., 2014; Tu et al., 2019), to predict outcomes such as credit ratings (Huang et al., 2004) and financial distress (Blanco-Oliver et al., 2014).
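The role of the support vectors can be seen in the standard kernel SVM decision function, f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b, in which only training points with αᵢ > 0 contribute. The sketch below evaluates this function with an RBF kernel for two hypothetical support vectors and made-up coefficients; it illustrates prediction only and does not train an SVM.

```python
import math

def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision_function(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over support vectors only."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

def predict(x, support_vectors, dual_coefs, bias, gamma):
    """Class label is the sign of the decision function."""
    return 1 if decision_function(x, support_vectors, dual_coefs, bias, gamma) >= 0 else -1

# Hypothetical model: two support vectors with alpha_i * y_i = -1 and +1, bias 0, gamma 0.5.
SV = [(0.0, 0.0), (2.0, 2.0)]
COEFS = [-1.0, 1.0]
```

Points equidistant from both support vectors, such as (1, 1), lie on the decision boundary, where f(x) = 0.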

Specifically, we used RandomizedSearchCV to examine the influence of two kernel types (linear and rbf) as well as the cost and gamma parameters required for each kernel. For the cost parameter, values ranging from .1 to 2 in fixed steps were considered; for gamma, in addition to the “auto” and “scale” settings, test values ranged from .5 to 5 in steps of .5.
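The “auto” and “scale” gamma settings follow scikit-learn's conventions, which, to our understanding, set γ = 1 / n_features and γ = 1 / (n_features · Var(X)), respectively, where the variance is taken over all entries of the training matrix. The following sketch reproduces these rules in plain Python and lists the resulting gamma candidates; the helper name `svm_gamma` is hypothetical, and the zero-variance fallback of 1.0 is assumed to mirror scikit-learn.

```python
from statistics import pvariance

def svm_gamma(X, setting):
    """Resolve an RBF gamma setting: "auto", "scale", or a numeric value."""
    n_features = len(X[0])
    if setting == "auto":
        return 1.0 / n_features
    if setting == "scale":
        variance = pvariance([v for row in X for v in row])  # variance over all entries of X
        return 1.0 / (n_features * variance) if variance != 0 else 1.0
    return float(setting)

# Gamma candidates described above: "auto", "scale", and .5 to 5 in steps of .5.
gamma_candidates = ["auto", "scale"] + [0.5 * k for k in range(1, 11)]
```

Because “scale” depends on the spread of the inputs, its value changes with feature standardization, whereas “auto” depends only on the number of features.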

Artificial neural network

We used a multilayer perceptron (MLP) model with two hidden layers trained with backpropagation. Backpropagation computes the gradient of the loss with respect to the network weights by propagating the prediction error backward through the layers, and it is currently one of the most widely used neural network training algorithms (Huang et al., 2004; LeCun et al., 2015). Note that an artificial neural network (ANN) with more than one hidden layer is often considered a deep neural network. ANNs have previously been applied in business contexts such as bankruptcy prediction (Veganzones & Séverin, 2018) and offer several useful properties, such as nonlinearity, learning from examples, adaptivity, and fault tolerance (Friedman, 2001).
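To illustrate what backpropagation computes, the sketch below implements the forward and backward passes of a toy network with the same two-hidden-layer structure (two inputs, three tanh units per hidden layer, and a sigmoid output trained with binary cross-entropy) and checks one analytic gradient against a central finite difference. The layer sizes, activations, and initialization are assumptions chosen for brevity, not the configuration used in this study.

```python
import math
import random

def init_params(n_in=2, n_h1=3, n_h2=3, seed=0):
    """Random weights in [-1, 1] for a toy two-hidden-layer MLP."""
    rng = random.Random(seed)
    u = rng.uniform
    W1 = [[u(-1, 1) for _ in range(n_in)] for _ in range(n_h1)]
    b1 = [u(-1, 1) for _ in range(n_h1)]
    W2 = [[u(-1, 1) for _ in range(n_h1)] for _ in range(n_h2)]
    b2 = [u(-1, 1) for _ in range(n_h2)]
    w3 = [u(-1, 1) for _ in range(n_h2)]
    return W1, b1, W2, b2, w3, u(-1, 1)

def forward(params, x):
    W1, b1, W2, b2, w3, b3 = params
    h1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    h2 = [math.tanh(sum(w * hi for w, hi in zip(row, h1)) + b) for row, b in zip(W2, b2)]
    p = 1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(w3, h2)) + b3)))
    return h1, h2, p

def loss(params, x, y):
    """Binary cross-entropy for a single example."""
    _, _, p = forward(params, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backprop(params, x, y):
    """Gradients (dW1, db1, dW2, db2, dw3, db3), computed layer by layer from the output back."""
    _, _, W2, _, w3, _ = params
    h1, h2, p = forward(params, x)
    dz3 = p - y  # dL/dz3 for a sigmoid output with cross-entropy loss
    dw3 = [dz3 * h for h in h2]
    dz2 = [dz3 * w3[j] * (1 - h2[j] ** 2) for j in range(len(h2))]  # tanh' = 1 - tanh^2
    dW2 = [[dz2[j] * h1[k] for k in range(len(h1))] for j in range(len(h2))]
    dz1 = [sum(dz2[j] * W2[j][k] for j in range(len(h2))) * (1 - h1[k] ** 2)
           for k in range(len(h1))]
    dW1 = [[dz1[k] * xi for xi in x] for k in range(len(h1))]
    return dW1, dz1, dW2, dz2, dw3, dz3  # bias gradients equal the dz terms

def numeric_grad_W1(params, x, y, k, i, eps=1e-6):
    """Central finite-difference estimate of dL/dW1[k][i]."""
    W1 = params[0]
    original = W1[k][i]
    W1[k][i] = original + eps
    plus = loss(params, x, y)
    W1[k][i] = original - eps
    minus = loss(params, x, y)
    W1[k][i] = original  # restore the weight
    return (plus - minus) / (2 * eps)

params = init_params()
x, y = [0.5, -1.0], 1
```

Agreement between `backprop(params, x, y)[0][0][1]` and `numeric_grad_W1(params, x, y, 0, 1)` is a standard sanity check that the backward pass is implemented correctly.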

For the MLP model, the parameters were configured using a random search with 10-fold cross-validation. The search covered the number of hidden neurons (25 to 200 in steps of 25 for each hidden layer), activation functions (relu, tanh, sigmoid, hard sigmoid, swish), solvers (sgd, adam, nadam, adagrad, adadelta, rmsprop, and lbfgs), the number of iterations (25 to 300 in steps of 25), batch sizes (10 to 150 in steps of 10), and three learning rate schedules (constant, adaptive, and invscaling).
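A random search with k-fold cross-validation, in the spirit of Bergstra and Bengio (2012), can be sketched as follows: each sampled configuration is scored on every fold, and the configuration with the best mean validation score is retained. The search space mirrors the values listed above; the dictionary keys, the callback `evaluate`, and the default number of sampled configurations are hypothetical, and training the actual MLP is left to that callback.

```python
import random

MLP_SPACE = {
    "hidden_units_layer_1": list(range(25, 201, 25)),
    "hidden_units_layer_2": list(range(25, 201, 25)),
    "activation": ["relu", "tanh", "sigmoid", "hard_sigmoid", "swish"],
    "solver": ["sgd", "adam", "nadam", "adagrad", "adadelta", "rmsprop", "lbfgs"],
    "iterations": list(range(25, 301, 25)),
    "batch_size": list(range(10, 151, 10)),
    "learning_rate": ["constant", "adaptive", "invscaling"],
}

def sample_configs(space, n_iter, seed=0):
    """Draw n_iter configurations uniformly at random from a discrete search space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n_iter)]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k shuffled, non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]

def random_search(evaluate, space, n_samples, n_iter=20, k=10, seed=0):
    """Return (best mean validation score, config); evaluate(config, train, val) is hypothetical."""
    best = None
    for config in sample_configs(space, n_iter, seed):
        scores = [evaluate(config, tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, config)
    return best
```

In practice, `evaluate` would train the network on the training indices and return a validation metric such as the F1-score; the same folds are reused for every configuration so that scores remain comparable.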

Results

Comparing prediction models

Table 2 summarizes the results for each prediction model and sequence (t₁–t₄) of the new venture gestation. When benchmarked against a simulated random classification, all models significantly outperform that baseline over all observed periods (that is, ROC AUC > .5). In general, the power to predict firm birth or firm abandonment increases over time, in line with the additional information processed by the models over the gestation stages (t₁–t₄). In addition, using the logistic regression model as a benchmark, the analysis generated several important insights for firm birth and firm abandonment.
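ROC AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is why a random classifier scores about .5. The sketch below computes AUC through this pairwise (Mann–Whitney) formulation, counting ties as one half, and simulates a random-score baseline; it illustrates the metric and one way such a baseline could be simulated, not the authors' exact procedure.

```python
import random

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def random_baseline_auc(y_true, seed=0):
    """AUC of a classifier that assigns uniformly random scores (simulated baseline)."""
    rng = random.Random(seed)
    return roc_auc(y_true, [rng.random() for _ in y_true])

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for an illustration; a rank-based computation is preferable for large samples.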

Table 2. Evaluation of classification models on test data set.


For firm birth, logistic regression achieves the lowest predictive power of all models after 12 months (accuracy of 54.27%, ROC AUC of 62.48%, and F1-score of 39.42%). Conversely, XGBoost achieves the highest predictive power for the same period (accuracy of 72.18%, ROC AUC of 76.71%, and F1-score of 52.58%). The predictive power of the logistic regression model increases over the subsequent periods but remains lower than that of the best-performing machine learning model, XGBoost. Overall, our results reveal that XGBoost is the best model for predicting firm birth across the different stages of venture gestation.

For firm abandonment, a different model performs best in each time sequence. After 12 months, XGBoost achieves an accuracy of 52% with an F1-score of 55%. After 24 months, k-NN provides the best results with an accuracy of 60.06% and an F1-score of 55.11%, while SVM achieves an accuracy of 57.30% and an F1-score of 57.06% after 36 months. The logistic regression model achieves comparably good predictability after 48 months, with an accuracy of 71% and an F1-score of 62%. The performance of ANN in predicting firm abandonment was in the mid-range of all models across the different gestation sequences.

Translating these numbers into a practical context requires considering several aspects. In this study, we aimed for balanced precision and recall, so we tuned the models to increase the overall F1-score and evaluated them on a reasonable balance between F1-score, accuracy, and ROC AUC. From a practical perspective, however, this tuning decision could differ, because a higher precision for firm birth and a higher recall for identifying firm abandonment may be preferred.
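The precision–recall trade-off can be made explicit with the F-beta score, F_β = (1 + β²)·P·R / (β²·P + R), of which the F1-score is the special case β = 1. Values of β < 1 weight precision more heavily (as suggested above for firm birth), and values of β > 1 weight recall (as suggested for firm abandonment). The confusion-matrix counts in the usage example are hypothetical.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall and F-beta from confusion-matrix counts (beta=1 gives the F1-score)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
for beta in (0.5, 1.0, 2.0):
    p, r, f = precision_recall_fbeta(8, 2, 8, beta)
    print(f"beta={beta}: precision={p:.2f}, recall={r:.2f}, F={f:.3f}")
```

Because precision (.8) exceeds recall (.5) in this example, the score falls as β grows, that is, as recall is weighted more heavily.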

In the case of firm birth, if the model wrongly predicts that an investment in a profitable venture is bad (false negative), an opportunity to invest is missed. This generates opportunity losses, that is, forgone financial gains from the missed investment in a newly profitable business venture. If the model predicts that an entrepreneur achieves firm birth but they do not (false positive), for example, because they abandon the venture, the costs relate directly to the monetary and nonmonetary resources invested. Thus, when the aim is to identify a successful nascent entrepreneur, the model with the higher precision (that is, with fewer false positives among the predicted successes) should be preferred.

In this respect, after 36 months XGBoost identifies 120 entrepreneurs as profitable with a precision of 53.33%: 64 of these 120 entrepreneurs are correctly identified as profitable, while 56 are wrongly classified as profitable, that is, they have either abandoned the venture or are still in the gestation process after 60 months. In the same gestation period, the neural network achieves a precision of 45.03% and identifies 151 entrepreneurs as profitable, of which 68 are identified correctly and 83 are wrongly classified as profitable. Although the neural network identifies more profitable cases, it also misclassifies more entrepreneurs; XGBoost therefore achieves the better precision score. It is important to note that although models can be optimized for precision, this optimization comes at the expense of lower recall.
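The precision figures above follow directly from the reported counts, since precision is the share of correct cases among all cases predicted as profitable, TP / (TP + FP). The helper name below is illustrative.

```python
def precision(true_positives, predicted_positives):
    """Share of correct cases among all cases predicted as profitable."""
    return true_positives / predicted_positives

# Counts reported for the 36-month predictors (firm birth).
for model, tp, predicted in [("XGBoost", 64, 120), ("ANN", 68, 151)]:
    print(f"{model}: precision = {precision(tp, predicted):.2%}, "
          f"false positives = {predicted - tp}")
```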

In the case of firm abandonment, the model with the higher recall should be preferred if the goal is to identify entrepreneurs who do not abandon the venture. In other words, it is acceptable to have more false positives (for example, nascent entrepreneurs who do not abandon the venture are considered to have abandoned it) than false negatives (for example, nascent entrepreneurs who abandon the venture are not identified as such). For example, after 12 months, XGBoost predicts that 104 entrepreneurs will not have abandoned the venture after 60 months. Of those, 81 are correctly identified, while 23 are misclassified, that is, they do in fact abandon the venture. This corresponds to a recall value of 82.31%. In other words, if an investor invested in these 104 cases, only 17.69% of the invested resources would be lost to entrepreneurs who abandoned the venture within five years. ANN achieves a recall value of 81.54% and identifies 89 entrepreneurs as not abandoning the venture; of these 89 cases, 65 are correctly classified and 24 are misclassified. Because ANN identifies fewer cases as not having abandoned the venture, XGBoost achieves the better recall score.

Factors leading to firm emergence and firm abandonment

Figure 2 summarizes the 10 most important factors predicting firm birth. The factor importance shown in the figures is based on the results of random forest (RF), one of the classification methods used in our prediction models. RF is characterized by high robustness against overfitting and has delivered high prediction accuracy in a variety of entrepreneurship studies (Antretter et al., 2019; Sabahi & Parast, 2020; Xu et al., 2018).
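The article does not specify which importance measure underlies the figures. A common choice for random forests is the mean decrease in impurity, in which each split credits its feature with the weighted reduction in Gini impurity, the totals are normalized to sum to one, and, in a forest, the results are averaged over trees. The sketch below computes this for a single hypothetical tree; the feature names merely echo the factors discussed here, and the class labels are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def mdi_importance(splits, n_total):
    """Normalized mean decrease in Gini impurity for one tree.

    splits: (feature, left_labels, right_labels) for every internal node.
    n_total: number of training samples at the root.
    """
    raw = Counter()
    for feature, left, right in splits:
        parent = left + right
        n_t = len(parent)
        decrease = (gini(parent)
                    - len(left) / n_t * gini(left)
                    - len(right) / n_t * gini(right))
        raw[feature] += n_t / n_total * decrease  # weight by the share of samples reaching the node
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}

# Hypothetical tree over 8 entrepreneurs (1 = firm birth, 0 = no firm birth).
toy_splits = [
    ("rate", [0, 0, 0, 1], [0, 1, 1, 1]),  # root split
    ("effort", [0, 0, 0], [1]),            # left child
    ("timing", [0], [1, 1, 1]),            # right child
]
importance = mdi_importance(toy_splits, n_total=8)
```

In this toy tree, the root split on rate yields a smaller impurity reduction than the two child splits, so rate receives an importance of .25, and effort and timing .375 each.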

Figure 2. Factor importance of different firm birth prediction sequences.

As shown in Figure 2, the number of start-up activities conducted, the complexity dynamics (rate, effort, concentration, and timing of activities), and a few specific activities (having achieved first sales, having asked for supplier credit, having formed a start-up team, and having hired initial employees) are crucial for firm birth. Comparing the different sequences of the gestation process, making financial projections, hiring employees, and having formed a start-up team appear to play a bigger role at an early stage, while asking for supplier credit was important at a later stage (36 and 48 months).

In relation to complexity dynamics, our findings suggest that the rate of start-up activities (average pace of organizing), the concentration (extent to which the pace is unstable or constant), and the timing (degree to which activities are carried out earlier or later in the process) influence the likelihood of achieving firm birth. Specifically, and in line with previous research (Lichtenstein et al., 2007), we find evidence that a high rate of start-up activities, a minimum pace of activities over time, and a tendency to conduct activities later rather than earlier in the process are positively related to firm birth.

We further contribute to the complexity dynamics perspective by adding a new factor, organizing effort, which captures the change in the number of start-up activities over a period. Our results suggest that organizing effort plays a crucial role in predicting firm birth, along with the rate, concentration, and timing of activities. Looking at the dynamics over time (t₁–t₄) yields several insights. First, after 12 months in the gestation process (t₁), all complexity dynamics (rate = .18, effort = .15, concentration = .14, and timing = .09) play an important role in predicting firm birth. In the following periods (t₂–t₃), the importance of rate for predicting firm birth increases (t₃ = .22), while the importance of timing (t₃ = .09), concentration (t₃ = .13), and effort (t₃ = .13) remains mostly stable. In the last period, the importance of effort increases again (t₄ = .14) and that of rate remains high (t₄ = .19), while the importance of the other two dynamics factors remains stable. From a process perspective, this indicates that complexity dynamics play a crucial, enduring role throughout the venture creation process. Moreover, initial sales and team formation variables (that is, initial employees hired) become increasingly important over time.
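As an illustration only, the four complexity dynamics could be operationalized from a founder's monthly activity counts roughly as follows. The formulas are our hypothetical simplifications of the verbal definitions above (rate as activities per month, concentration as the coefficient of variation of monthly counts, timing as the activity-weighted mean month scaled to the window length, and effort as the change in the number of activities relative to the previous period); they are not the measures used by Lichtenstein et al. (2007) or in this study.

```python
from statistics import pstdev

def complexity_dynamics(monthly_counts, previous_total):
    """Hypothetical rate/concentration/timing/effort measures for one observation window.

    monthly_counts: number of start-up activities completed in each month of the window.
    previous_total: number of activities completed in the preceding window of equal length.
    """
    months = len(monthly_counts)
    total = sum(monthly_counts)
    rate = total / months                                           # activities per month
    concentration = pstdev(monthly_counts) / rate if rate else 0.0  # 0 = perfectly even pace
    timing = (sum((m + 1) * c for m, c in enumerate(monthly_counts)) / (total * months)
              if total else 0.0)                                    # closer to 1 = later activities
    effort = total - previous_total                                 # change in activity count
    return {"rate": rate, "concentration": concentration, "timing": timing, "effort": effort}

late = complexity_dynamics([0, 0, 0, 6], previous_total=2)
even = complexity_dynamics([2, 2, 2, 2], previous_total=8)
```

In this toy comparison, both founders have similar activity totals, but the first completes everything in the final month, which shows up as a higher concentration and a later timing score.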

For firm abandonment (Figure 3), three individual organizing activities (achieving initial sales, investing one's own money, and installing phone lines), personal characteristics (educational attainment and industry experience), complexity dynamics, the industry type, and the number of start-up activities conducted appear to be the most important predictors. In terms of timing, key activities such as generating initial sales or investing one's own money, as well as certain personal characteristics such as educational attainment, are more important for predicting firm abandonment during the earlier stages, whereas the importance of complexity dynamics for predicting firm abandonment increases during the later stages. Finally, the importance of some variables for predicting firm abandonment differs across sequences.

Figure 3. Factor importance of different firm abandonment prediction sequences.

Discussion

By exploring combinations of single conditions and by using machine learning methods to forecast firm emergence, we generate insights from methodological, theoretical, and practical perspectives.

From a methodological point of view, our results suggest that machine learning methods significantly outperform a simulated random classification and thus provide a valid option for predicting the likelihood of firm birth as well as firm abandonment. In addition, we found evidence that certain machine learning algorithms can outperform traditional regression-based models in predicting firm birth, especially at an early stage with ambiguous information, while preserving interpretability. One explanation could be the capability of complex models to detect interaction and nonlinear effects in the input data (Gerasimovic & Bugaric, 2018), especially when the information available for a clear classification at an early stage is scarce.

Among the methods we examined, XGBoost was one of the most promising, while neural networks provided comparable performance metrics, suggesting that they can still be used for relatively small data sets. This is consistent with the observation that XGBoost often achieves state-of-the-art results and outperforms artificial neural networks, especially on small, structured data sets (Chen & Guestrin, 2016; Climent et al., 2019). However, ANNs have a complex set of tunable hyperparameters, and given the random search approach applied, additional optimization yielding even better results cannot be ruled out. Finally, neural networks often achieve better results when trained on larger data sets and further optimized using deeper network structures (D’souza et al., 2020). Even though ANN performed worse than XGBoost, it remains promising because, at an early stage of the start-up process (for example, 12 months), it correctly classifies most of the entrepreneurs who have not abandoned their business venture.

From a theoretical perspective, our results provide insight into the critical activities carried out at different stages of the gestation process and a better understanding of the factors leading to firm birth and firm abandonment. For example, our results provide evidence that certain aspects of human capital (that is, education or industry experience) are more relevant for predicting firm abandonment than firm birth and are more important at an early stage of the gestation process for predicting emergence outcomes. While the reasons for this could be manifold (for example, a relationship between human capital and high-velocity decision-making to quit), these insights contribute to existing theories and empirical generalizations related to human capital in entrepreneurship. More pointedly, while some research on the impact of human capital on entrepreneurial outcomes has been inconclusive (Bosma et al., 2004), scholars have started to focus on two ways to reflect upon and reconcile this mixed evidence (Dimov, 2017). The first relates to the complexity of different entrepreneurial outcomes, which calls for further research to identify new possible moderators. The second relates to the nature of the human capital construct itself and the way it is constructed and measured (Dimov, 2017). With our analysis, we add a possible third explanation to this discussion by connecting temporal elements to the relationship between certain indicators of human capital and a specific entrepreneurial outcome (that is, firm abandonment).

In a similar vein, we provide new insights into new venture team formation and the likelihood of achieving firm birth (Held et al., 2018; Klotz et al., 2014). Our results suggest that the presence of a founding team is a stronger predictor of firm birth in the first and last stages of the gestation process. It is essential to note that the term “stages” here refers to a temporal dimension used in the context of forecasting firm birth or firm abandonment. When considering the “three stage model” outlined by Davidsson and Gruenhagen (2020, p. 17), for example, the human capital dimensions included in the models can occur in any of the three proposed stages, “prospecting, developing or exploiting,” and potentially lead to firm birth. However, from a temporal perspective, team formation (that is, hiring employees) seems to be more important at either a very early or a later stage and, similar to achieving a first sale, can probably indicate a “critical incident” (Davidsson & Gruenhagen, 2020, p. 18) for predicting firm birth.

In addition, our results contribute to the discussion of whether entrepreneurship education should be considered a method or a process (Neck & Greene, 2011). The process perspective follows “one of identifying an opportunity, developing the concept, understanding resource requirements, acquiring resources, implementation, and exit” (Neck & Greene, 2011, p. 59) and thus has at its core opportunity evaluation, feasibility analysis, business planning, and financial forecasting, whereas the method perspective “represents a body of skills or techniques” (Neck & Greene, 2011, p. 61). Given our research design, the nature of our work at first sight relates to a process perspective with a “planning and prediction” character using AI. One can reasonably ask: “If AI allows one to predict the likelihood of, for example, firm birth during different stages, should it not inevitably follow a process?” The answer is equivocal. Our results highlight the possibility of critical incidents that allow firm birth to be predicted either early or later in the process. As such, teaching from a process perspective may aim at getting the best out of such temporally aligned incidents (for example, sales classes). Despite these insights, it is important to note that certain critical incidents can also occur at any time during the process, which makes the mentioned “three stage model” from a process perspective intriguing. Moreover, the most crucial prediction variables relate to complexity dynamics such as rate, timing, concentration, or effort. These variables cannot be assigned to any of the stages of the process perspective described above; they are better understood as spanning the whole gestation phase. As such, a method approach to teaching these skills can be beneficial.
For example, in relation to rate, education programs should focus on helping would-be entrepreneurs maintain or even increase the pace of entrepreneurial activities, given its importance for firm birth. A design thinking (Linton & Klinton, 2019) or even a design sprint approach (Hilliard, 2021) could help in developing such skills.

From a practical point of view, the combination of all proposed firm birth and firm abandonment prediction models provides a valuable system to foster a better allocation of resources to successful entrepreneurs while reducing resource misallocation to entrepreneurs who abandon the venture. The allocation of third-party resources to potentially successful entrepreneurs is inherently speculative given the high failure rate during the start-up process, the often patchy product or service offerings, and unproven technologies (Drover et al., 2017). Because information is often scarce at such an early stage, uncertainty and information asymmetry prevail (Dunkelberg et al., 2013; Nguyen et al., 2020). This asymmetry can lead to agency problems (Jensen & Meckling, 1976) that arise from hidden information and hidden actions between the parties involved.

The proposed models provide stakeholders with a viable early stage screening and monitoring system to mitigate agency problems during the venture creation process. Specifically, the models can help reduce not only the costs of resource misallocation but also the costs of the selection process for external parties such as incubators, accelerators, angel investors, and venture capitalists (Drover et al., 2017; Yin & Luo, 2018). For example, a seed venture capitalist must often use more visible and quickly accessible information to efficiently discern which ventures are worth moving to due diligence (Drover et al., 2017). Given the considerable resources expended in properly vetting a venture during due diligence (Drover et al., 2017), our models provide a parsimonious way to distinguish, during the screening process, between entrepreneurs who are likely to achieve firm birth and those who are likely to abandon the venture.

Moreover, identifying entrepreneurs who are likely to abandon their project could also help business advisers engage with them in time, before an abandonment decision even arises. This could help identify, at an early stage, crucial factors that boost the nascent entrepreneur's motivation or guide them to pivot their idea.

Conclusion

Predicting new venture gestation outcomes is a complex endeavor. More accurate forecasting tools can serve as a support and an early warning system that helps stakeholders identify promising entrepreneurs while providing valuable insights into the gestation process. Against this backdrop, we used machine learning methods to forecast the likelihood of firm birth and firm abandonment during the first five years of new business gestation and to identify which individual factors can lead to firm emergence. Our results suggest that applying AI techniques to predict firm birth and firm abandonment is very promising. We provide evidence that machine learning algorithms outperform traditional regression-based models while preserving interpretability. Among the methods we examined, XGBoost was one of the most promising, while neural networks provided comparable performance metrics, showing that they can be used for relatively small data sets.

In addition, by identifying key factors for predicting firm birth and firm abandonment, we gained valuable insights into the start-up activities leading to firm emergence. Achieving sales, as well as the rate, timing, and concentration of activities, are key elements in predicting the venture gestation outcome. Looking at the whole gestation process, some activities are more significant at an early stage (for example, forming a start-up team), while others are more important at a later stage (for example, asking for supplier credit) for predicting the outcome of the venture. This dynamism underpins the complexity of the gestation process and further underlines the importance for external parties of gaining in-depth knowledge of the entrepreneur, their status, and their business environment. Combining these elements, we contribute to theory and practice by providing further insights, as outlined in the discussion section, into the entrepreneurial process and the practical need for such understanding (that is, the costs related to resource misallocation).

This study has several limitations. First, machine learning methods are not a panacea for decision-making, as they are constrained in processing and interpreting “soft” information (information that cannot be quantified) and in making predictions in uncertain situations (Dellermann et al., 2017). Although these models can provide some guidance, entrepreneurship often requires intuitive decision-making, and heuristics enable entrepreneurs to function effectively in such situations. Similarly, experts such as business angels or venture capitalists, with their industry knowledge and sensitivity to entrepreneurial personality, can provide informed advice. Second, firm emergence is a complex process in which further variables may play an essential role in the venture’s outcome. For nascent entrepreneurs, cognitive capacities can have a significant effect on the likelihood of succeeding (SBA, 2012), and venture networking (that is, connections to incubators, accelerators, research centers, and universities, or the founders’ strong and weak ties) may positively affect the venture outcome (Woolley & MacGregor, 2021).

Because such variables are missing from the harmonized data set used, future research could include them in the proposed models. Third, as outlined in the sequential analysis, stakeholders using classification models during the start-up process must be aware of changes over time. Hence, the models need to be retrained at least once a year to provide the necessary information.

Finally, users should be aware of the complexities associated with the different models, especially regarding their implementation. Given that data-mining techniques such as neural networks and support vector machines are often considered black boxes (Cortez & Embrechts, 2013), models with better visualization possibilities, such as random forests, may provide a cost-effective alternative. To improve the predictive power of such models, future research would benefit from including further variables such as cognitive factors, competitor data, and information about the quality of the business idea and perceived product–market fit. Further model optimizations, such as adding layers to the neural network model, should also be considered. Combining text mining of, for example, business plans and pitches with machine learning models could further increase predictive power.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Antretter, T., Blohm, I., Grichnik, D., & Wincent, J. (2019). Predicting new venture survival: A Twitter-based machine learning approach to measuring online legitimacy. Journal of Business Venturing Insights, 11, e00109. https://doi.org/10.1016/j.jbvi.2018.e00109
Arenius, P., Engel, Y., & Klyver, K. (2017). No particular action needed? A necessary condition analysis of gestation activities and firm emergence. Journal of Business Venturing Insights, 8, 87–92. https://doi.org/10.1016/j.jbvi.2017.07.004
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2), 281–305.
Blanco-Oliver, A., Pino-Mejías, R., & Lara-Rubio, J. (2014). Modeling the financial distress of microenterprise start-ups using support vector machines: A case study. Innovar, 24(1Spe), 153–168. https://doi.org/10.15446/innovar.v24n1spe.47615
Bosma, N., Van Praag, M., Thurik, R., & De Wit, G. (2004). The value of human and social capital investments for the business performance of startups. Small Business Economics, 23(3), 227–236. https://doi.org/10.1023/B:SBEJ.0000032032.21192.72
Burnaev, E., Erofeev, P., & Papanov, A. (2015). Influence of resampling on accuracy of imbalanced classification. In A. Verikas, P. Radeva, & D. Nikolaev (Eds.), Eighth International Conference on Machine Vision (Vol. 987521). https://doi.org/10.1117/12.2228523
Chandler, G. N., Honig, B., & Wiklund, J. (2005). Antecedents, moderators, and performance consequences of membership change in new venture teams. Journal of Business Venturing, 20(5), 705–725. https://doi.org/10.1016/j.jbusvent.2004.09.001
Chaudhuri, N., & Bose, I. (2020). Exploring the role of deep neural networks for post-disaster decision support. Decision Support Systems, 130, 113234. https://doi.org/10.1016/j.dss.2019.113234
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785
Chwolka, A., & Raith, M. G. (2012). The value of business planning before start-up: A decision-theoretical perspective. Journal of Business Venturing, 27(3), 385–399. https://doi.org/10.1016/j.jbusvent.2011.01.002
Climent, F., Momparler, A., & Carmona, P. (2019). Anticipating bank distress in the eurozone: An extreme gradient boosting approach. Journal of Business Research, 101, 885–896. https://doi.org/10.1016/j.jbusres.2018.11.015
Clough, D. R., Fang, T. P., Vissa, B., & Wu, A. (2019). Turning lead into gold: How do entrepreneurs mobilize resources to exploit opportunities? Academy of Management Annals, 13(1), 240–271. https://doi.org/10.5465/annals.2016.0132
Coad, A., & Srhoj, S. (2019). Catching gazelles with a lasso: Big data techniques for the prediction of high-growth firms. Small Business Economics, 55(3), 1–25. https://doi.org/10.1007/s11187-019-00203-3
Cortez, P., & Embrechts, M. J. (2013). Using sensitivity analysis and visualization techniques to open black box data mining models. Information Sciences, 225, 1–17. https://doi.org/10.1016/j.ins.2012.10.039
D’souza, R. N., Huang, P. Y., & Yeh, F. C. (2020). Structural analysis and optimization of convolutional neural networks with a small sample size. Scientific Reports, 10(1), 834. https://doi.org/10.1038/s41598-020-57866-2
Davidsson, P., & Honig, B. (2003). The role of social and human capital among nascent entrepreneurs. Journal of Business Venturing, 18(3), 301–331. https://doi.org/10.1016/S0883-9026(02)00097-6
Davidsson, P., & Gordon, S. R. (2012). Panel studies of new venture creation: A methods-focused review and suggestions for future research. Small Business Economics, 39(4), 853–876. https://doi.org/10.1007/s11187-011-9325-8
Davidsson, P., & Gruenhagen, J. H. (2020). Fulfilling the process promise: A review and agenda for new venture creation process research. Entrepreneurship Theory and Practice, 45(5), 1083–1118. https://doi.org/10.1177/1042258720930991
Dellermann, D., Lipusch, N., Ebel, P., Popp, K. M., & Leimeister, J. M. (2017). Finding the Unicorn: Predicting Early Stage Startup Success through a Hybrid Intelligence Method. International Conference on Information Systems (ICIS), Seoul, South Korea. https://doi.org/10.48550/arXiv.2105.03360
Delmar, F., & Shane, S. (2003). Does business planning facilitate the development of new ventures? Strategic Management Journal, 24(12), 1165–1185. https://doi.org/10.1002/smj.349
Delmar, F., & Shane, S. (2006). Does experience matter? The effect of founding team experience on the survival and sales of newly founded ventures. Strategic Organization, 4(3), 215–247. https://doi.org/10.1177/1476127006066596
Dimov, D. (2010). Nascent entrepreneurs and venture emergence: Opportunity confidence, human capital, and early planning. Journal of Management Studies, 47(6), 1123–1153. https://doi.org/10.1111/j.1467-6486.2009.00874.x
Dimov, D. (2017). Towards a qualitative understanding of human capital in entrepreneurship research. International Journal of Entrepreneurial Behaviour and Research, 23(2), 210–227. https://doi.org/10.1108/IJEBR-01-2016-0016
Drover, W., Wood, M. S., & Zacharakis, A. (2017). Attributes of angel and crowdfunded investments as determinants of VC screening decisions. Entrepreneurship: Theory and Practice, 41(3), 323–347. https://doi.org/10.1111/etap.12207
Dunkelberg, W., Moore, C., Scott, J., & Stull, W. (2013). Do entrepreneurial goals matter? Resource allocation in new owner-managed firms. Journal of Business Venturing, 28(2), 225–240. https://doi.org/10.1016/j.jbusvent.2012.07.004
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. https://www.jstor.org/stable/2699986
Gerasimovic, M., & Bugaric, U. (2018). Enrollment management model: Artificial neural networks versus logistic regression. Applied Artificial Intelligence, 32(2), 153–164. https://doi.org/10.1080/08839514.2018.1448146
Giotopoulos, I., Kontolaimou, A., & Tsakanikas, A. (2017). Drivers of high-quality entrepreneurship: What changes did the crisis bring about? Small Business Economics, 48(4), 913–930. https://doi.org/10.1007/s11187-016-9814-x
Hallen, B. L. (2008). The causes and consequences of the initial network positions of new organizations: From whom do entrepreneurs receive investments? Administrative Science Quarterly, 53(4), 685–718. https://doi.org/10.2189/asqu.53.4.685
Hallen, B. L., Cohen, S. L., & Bingham, C. B. (2020). Do accelerators work? If so, how? Organization Science, 31(2), 378–414. https://doi.org/10.1287/orsc.2019.1304
Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In D. S. Huang, X. P. Zhang, & G. B. Huang (Eds.), Advances in intelligent computing (pp. 878–887). Springer-Verlag.
Hechavarria, D. M., Renko, M., & Matthews, C. H. (2012). The nascent entrepreneurship hub: Goals, entrepreneurial self-efficacy and start-up outcomes. Small Business Economics, 39(3), 685–701. https://doi.org/10.1007/s11187-011-9355-2
Held, L., Herrmann, A. M., & van Mossel, A. (2018). Team formation processes in new ventures. Small Business Economics, 51(2), 441–464. https://doi.org/10.1007/s11187-018-0010-z
Hilliard, R. (2021). Start-up sprint: Providing a small group learning experience in a large group setting. Journal of Management Education, 45(3), 387–403. https://doi.org/10.1177/1052562920948924
Honig, B., & Samuelsson, M. (2012). Planning and the entrepreneur: A longitudinal examination of nascent entrepreneurs in Sweden. Journal of Small Business Management, 50(3), 365–388. https://doi.org/10.1111/j.1540-627X.2012.00357.x
Hopp, C., & Sonderegger, R. (2015). Understanding the dynamics of nascent entrepreneurship: Prestart-up experience, intentions, and entrepreneurial success. Journal of Small Business Management, 53(4), 1076–1096. https://doi.org/10.1111/jsbm.12107
Huang, Z., Chen, H., Hsu, C. J., Chen, W. H., & Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems, 37(4), 543–558. https://doi.org/10.1016/S0167-9236(03)00086-1
Jensen, M. C., & Meckling, W. H. (1976). Theory of the firm: Managerial behavior, agency costs and ownership structure. Journal of Financial Economics, 3(4), 305–360. https://doi.org/10.1016/0304-405X(76)90026-X
Khan, S. A., Tang, J., & Joshi, K. (2014). Disengagement of nascent entrepreneurs from the start‐up process. Journal of Small Business Management, 52(1), 39–58. https://doi.org/10.1111/jsbm.12032
Klotz, A. C., Hmieleski, K. M., Bradley, B. H., & Busenitz, L. W. (2014). New venture teams: A review of the literature and roadmap for future research. Journal of Management, 40(1), 226–255. https://doi.org/10.1177/0149206313493325
Kraus, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural networks and transfer learning. Decision Support Systems, 104, 38–48. https://doi.org/10.1016/j.dss.2017.10.001
Laurikkala, J. (2002). Instance-based data reduction for improved identification of difficult small classes. Intelligent Data Analysis, 6(4), 311–322. https://doi.org/10.3233/ida-2002-6402
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lévesque, M., Obschonka, M., & Nambisan, S. (2020). Pursuing impactful entrepreneurship research using artificial intelligence. Entrepreneurship Theory and Practice. Advance online publication. https://doi.org/10.1177/1042258720927369
Li, J., & Wang, Y. (2015). A new fast reduction technique based on binary nearest neighbor tree. Neurocomputing, 149, 1647–1657. https://doi.org/10.1016/j.neucom.2014.08.028
Liao, J. J., & Gartner, W. B. (2007). The influence of pre-venture planning on new venture creation. Journal of Small Business Strategy, 18(2), 1–22.
Liao, J., & Welsch, H. (2008). Patterns of venture gestation process: Exploring the differences between tech and non-tech nascent entrepreneurs. The Journal of High Technology Management Research, 19(2), 103–113. https://doi.org/10.1016/j.hitech.2008.10.003
Liao, J., Welsch, H., & Moutray, C. (2009). Start-up resources and entrepreneurial discontinuance: The case of nascent entrepreneurs. Journal of Small Business Strategy, 19(2), 89–103. https://libjournals.mtsu.edu/index.php/jsbs/article/view/112
Lichtenstein, B. B., Carter, N. M., Dooley, K. J., & Gartner, W. B. (2007). Complexity dynamics of nascent entrepreneurship. Journal of Business Venturing, 22(2), 236–261. https://doi.org/10.1016/j.jbusvent.2006.06.001
Linton, G., & Klinton, M. (2019). University entrepreneurship education: A design thinking approach to learning. Journal of Innovation and Entrepreneurship, 8(1), 1–11. https://doi.org/10.1186/s13731-018-0098-z
Loureiro, A. L. D., Miguéis, V. L., & da Silva, L. F. M. (2018). Exploring the use of deep neural networks for sales forecasting in fashion retail. Decision Support Systems, 114, 81–93. https://doi.org/10.1016/j.dss.2018.08.010
Lussier, R. N. (1995). A nonfinancial business success versus failure prediction model for young firms. Journal of Small Business Management, 33(3), 8.
Lussier, R. N., & Halabi, C. E. (2010). A three-country comparison of the business success versus failure prediction model. Journal of Small Business Management, 48(3), 360–377. https://doi.org/10.1111/j.1540-627X.2010.00298.x
Mayr, S., Mitter, C., Kücher, A., & Duller, C. (2020). Entrepreneur characteristics and differences in reasons for business failure: Evidence from bankrupt Austrian SMEs. Journal of Small Business and Entrepreneurship, 33(5), 539–558. https://doi.org/10.1080/08276331.2020.1786647
Muñoz-Bullon, F., Sanchez-Bueno, M. J., & Vos-Saz, A. (2015). Startup team contributions and new firm creation: The role of founding team experience. Entrepreneurship and Regional Development, 27(1–2), 80–105. https://doi.org/10.1080/08985626.2014.999719
Neck, H. M., & Greene, P. G. (2011). Entrepreneurship education: Known worlds and new frontiers. Journal of Small Business Management, 49(1), 55–70. https://doi.org/10.1111/j.1540-627X.2010.00314.x
Newbert, S. L. (2005). New firm formation: A dynamic capability perspective. Journal of Small Business Management, 43(1), 55–77. https://doi.org/10.1111/j.1540-627X.2004.00125.x
Newbert, S. L., & Tornikoski, E. T. (2012). Supporter networks and network growth: A contingency model of organizational emergence. Small Business Economics, 39(1), 141–159. https://doi.org/10.1007/s11187-010-9300-9
Newbert, S. L., Tornikoski, E. T., & Quigley, N. R. (2013). Exploring the evolution of supporter networks in the creation of new organizations. Journal of Business Venturing, 28(2), 281–298. https://doi.org/10.1016/j.jbusvent.2012.09.003
Nguyen, B., Le, C., & Vo, X. V. (2020). The paradox of investment timing in small business: Why do firms invest when it is too late? Journal of Small Business Management, 1–43. https://doi.org/10.1080/00472778.2020.1816436
Obschonka, M., & Audretsch, D. B. (2019). Artificial intelligence and big data in entrepreneurship: A new era has begun. Small Business Economics, 55, 529–539. https://doi.org/10.1007/s11187-019-00202-4
Raniszewski, M. (2010). The edited nearest neighbor rule based on the reduced reference set and the consistency criterion. Biocybernetics and Biomedical Engineering, 30, 31–40.
Reynolds, P. D., & Curtin, R. T. (2011). Overview and commentary. In P. D. Reynolds & R. T. Curtin (Eds.), New business creation (International Studies in Entrepreneurship, Vol. 27, pp. 295–334). Springer. https://doi.org/10.1007/978-1-4419-7536-2_11
Reynolds, P. D., Hechavarria, D., Tian, L. R., Samuelsson, M., & Davidsson, P. (2016). Panel study of entrepreneurial dynamics: A five cohort outcomes harmonized dataset. http://www.psed.isr.umich.edu/psed/data
Reynolds, P. D. (2017). When is a firm born? Alternative criteria and consequences. Business Economics, 52(1), 41–56. https://doi.org/10.1057/s11369-017-0022-8
Rotefoss, B., & Kolvereid, L. (2005). Aspiring, nascent and fledgling entrepreneurs: An investigation of the business start-up process. Entrepreneurship and Regional Development, 17(2), 109–127. https://doi.org/10.1080/08985620500074049
Sabahi, S., & Parast, M. M. (2020). The impact of entrepreneurship orientation on project performance: A machine learning approach. International Journal of Production Economics, 226, 107621. https://doi.org/10.1016/j.ijpe.2020.107621
SBA. (2012). Frequently asked questions about small business. Retrieved February 6, 2020, from https://www.sba.gov/sites/default/files/FAQ_Sept_2
Shim, J., & Davidsson, P. (2018). Shorter than we thought: The duration of venture creation processes. Journal of Business Venturing Insights, 9, 10–16. https://doi.org/10.1016/j.jbvi.2017.12.003
Steffens, P., Terjesen, S., & Davidsson, P. (2012). Birds of a feather get lost together: New venture team composition and performance. Small Business Economics, 39(3), 727–743. https://doi.org/10.1007/s11187-011-9358-z
Thiess, D., Sirén, C., & Grichnik, D. (2016). How does heterogeneity in experience influence the performance of nascent venture teams? Insights from the US PSED II study. Journal of Business Venturing Insights, 5, 55–62. https://doi.org/10.1016/j.jbvi.2016.04.001
Topuz, K., Zengul, F. D., Dag, A., Almehmi, A., & Yildirim, M. B. (2018). Predicting graft survival among kidney transplant recipients: A Bayesian decision support model. Decision Support Systems, 106, 97–109. https://doi.org/10.1016/j.dss.2017.12.004
Tornikoski, E. T., & Newbert, S. L. (2007). Exploring the determinants of organizational emergence: A legitimacy perspective. Journal of Business Venturing, 22(2), 311–335. https://doi.org/10.1016/j.jbusvent.2005.12.003
Tornikoski, E. T. (2008). Legitimating characteristics and firm emergence. Journal of Enterprising Culture, 16(3), 233–256. https://doi.org/10.1142/S0218495808000144
Tu, J., Lin, A., Chen, H., Lin, Y., & Li, C. (2019). Predict the entrepreneurial intention of fresh graduate students based on an adaptive support vector machine framework. Mathematical Problems in Engineering, 1–16. https://doi.org/10.1155/2019/2039872
van Gelderen, M., Thurik, R., & Bosma, N. (2005). Success and risk factors in the pre-startup phase. Small Business Economics, 26(4), 319–335. https://doi.org/10.1007/s11187-004-6837-5
van Witteloostuijn, A., & Kolkman, D. (2019). Is firm growth random? A machine learning perspective. Journal of Business Venturing Insights, 11, 1–5. https://doi.org/10.1016/j.jbvi.2018.e00107
Veganzones, D., & Séverin, E. (2018). An investigation of bankruptcy prediction in imbalanced datasets. Decision Support Systems, 112, 111–124. https://doi.org/10.1016/j.dss.2018.06.011
Vegetti, F., & Adăscăliţei, D. (2017). The impact of the economic crisis on latent and early entrepreneurship in Europe. International Entrepreneurship and Management Journal, 13(4), 1289–1314. https://doi.org/10.1007/s11365-017-0456-5
Vo, N. N. Y., He, X., Liu, S., & Xu, G. (2019). Deep learning for decision making and the optimization of socially responsible investments and portfolio. Decision Support Systems, 124, 113097. https://doi.org/10.1016/j.dss.2019.113097
Wang, J. S., Lin, C. W., & Yang, Y. T. C. (2013). A k-nearest-neighbor classifier with heart rate variability feature-based transformation algorithm for driving stress recognition. Neurocomputing, 116, 136–143. https://doi.org/10.1016/j.neucom.2011.10.047
WBAF. (2020). Global fundraising stage - GFRS 2020 an international co-investment platform. World Business Angels Investment Forum. https://www.wbaforum.org/upload/07GFRS_2020_745.pdf
Weinblat, J. (2018). Forecasting European high-growth firms: A random forest approach. Journal of Industry, Competition and Trade, 18(3), 253–294. https://doi.org/10.1007/s10842-017-0257-0
Wennberg, K., & Anderson, B. S. (2020). Editorial: Enhancing the exploration and communication of quantitative entrepreneurship research. Journal of Business Venturing, 35(3), 1–11. https://doi.org/10.1016/j.jbusvent.2019.05.002
Woolley, J. L., & MacGregor, N. (2021). The influence of incubator and accelerator participation on nanotechnology venture success. Entrepreneurship Theory and Practice. Advance online publication. https://doi.org/10.1177/10422587211024510
Xu, B., Yang, J., & Sun, B. (2018). A nonparametric decision approach for entrepreneurship. International Entrepreneurship and Management Journal, 14(1), 5–14.
Yin, B., & Luo, J. (2018). How do accelerators select startups? Shifting decision criteria across stages. IEEE Transactions on Engineering Management, 65(4), 574–589. https://doi.org/10.1109/TEM.2018.2791501

Appendix A.

List of variables included in the study.
