Start-ups survival through a crisis. Combining machine learning with econometrics to measure innovation

Pages 468-493 | Received 03 Nov 2019, Accepted 24 Feb 2020, Published online: 12 Jun 2020
 

ABSTRACT

This paper shows how data science can improve empirical research in economics by leveraging large datasets and extracting information otherwise unsuitable for a traditional econometric approach. As a test-bed for our framework, we use machine learning algorithms to create a new holistic measure of innovation, following a 2012 Italian law aimed at boosting new high-tech firms. We adopt this measure to analyse the impact of innovativeness on a large population of Italian firms which entered the market at the beginning of the 2008 global crisis. The methodological contribution is organised in three steps. First, we train seven supervised learning algorithms to recognise innovative firms on 2013 firmographics data and select the combination of models with the best predictive power. Second, we apply this combination to the 2008 dataset and predict which firms would have been labelled as innovative according to the definition of the 2012 law. Finally, we adopt this new indicator as the regressor in a survival model to explain firms' ability to remain in the market after 2008. The results suggest that innovative firms are more likely to survive than the rest of the sample, but the survival premium is likely to depend on location.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Although a stream of literature tries to develop models that overcome the trade-off between the prediction error due to a simple model and the variance of estimates in out-of-sample predictions (Pearl and Mackenzie Citation2018), in statistical learning, the trade-off is still binding.

2 The debate on the use of patents dates back at least to Pavitt (Citation1985).

3 For a review and future perspectives on history-friendly models see Capone et al. (Citation2019).

4 The 221/2012 Legislative Decree was adopted, and went into force, on 17 December 2012.

6 Alternatively, as a measure of performance, we can compare the area under the ROC curve (AUC). For further details on the interpretation of ROC curves, see Alpaydin (Citation2014).
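As an illustration of the AUC mentioned in note 6, the following is a minimal sketch, not the authors' code. It uses the standard interpretation of the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher = more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the sample size, so it suits small illustrations; a rank-based computation would be preferable on a large dataset.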

7 We assigned weights of 0.77 and 0.23 to BAG and ANN, respectively, according to a function which maximises the separation between the predicted probabilities for INNs and NOINNs and the area under the ROC curve (AUC). As a robustness check, we also tested mixes of different algorithms, but there was no substantial improvement in performance. See Appendix 2 for further details.
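The weighted combination in note 7 can be sketched as a convex blend of the two models' predicted probabilities. This is a hedged illustration only: the weights 0.77 and 0.23 come from the note, while the function name `blend_probabilities` and the input values are hypothetical, and the objective function used to choose the weights (described in Appendix 2) is not reproduced here.

```python
def blend_probabilities(p_bag, p_ann, w_bag=0.77, w_ann=0.23):
    """Weighted average of two models' predicted probabilities.

    p_bag, p_ann: equal-length sequences of probabilities in [0, 1].
    w_bag, w_ann: non-negative weights that must sum to 1.
    """
    if len(p_bag) != len(p_ann):
        raise ValueError("both models must score the same firms")
    if abs(w_bag + w_ann - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [w_bag * a + w_ann * b for a, b in zip(p_bag, p_ann)]


# Hypothetical predicted probabilities for three firms.
bag_probs = [0.80, 0.30, 0.10]
ann_probs = [0.60, 0.50, 0.20]
blended = blend_probabilities(bag_probs, ann_probs)
```

Because the weights sum to one, each blended value stays between the two input probabilities and can still be read as a probability.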

8 We estimate the variance with Greenwood's formula using the Delta method, and we use log-minus-log transformation for the confidence interval (Borgan and Liestøl Citation1990).
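To make note 8 concrete, the sketch below assumes that the survival function is estimated with the Kaplan-Meier product-limit estimator, which is the usual setting for Greenwood's formula; the note itself does not name the estimator. The confidence interval is built on the log-minus-log scale, so it always stays inside [0, 1]. The function name, the toy data and the default `z = 1.96` (an approximate 95% interval) are assumptions for illustration.

```python
import math


def kaplan_meier_loglog(durations, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood variance and log-minus-log CI.

    durations: observed times; events: 1 = exit observed, 0 = censored.
    Returns a list of (time, survival, lower, upper), one per event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0  # running sum of d_i / (n_i * (n_i - d_i))
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            if 0.0 < surv < 1.0:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
                # Delta method: se of log(-log S) = sqrt(sum) / |log S|.
                se = math.sqrt(greenwood_sum) / abs(math.log(surv))
                lower = surv ** math.exp(z * se)
                upper = surv ** math.exp(-z * se)
            else:
                # The transformation is undefined when S is exactly 0.
                lower = upper = surv
            out.append((t, surv, lower, upper))
        n_at_risk -= removed
    return out


# Toy data (hypothetical): four firms, the second one censored.
km = kaplan_meier_loglog([1, 2, 3, 4], [1, 0, 1, 1])
```

On the log-minus-log scale the interval is symmetric, and back-transforming gives asymmetric bounds on the survival scale that cannot fall below 0 or exceed 1.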

9 Note that NAs are too widely spread across variables and observations, so multiple imputation would add extra, unjustified variability to the observed variables. Even if we limit multiple imputation to some crucial variables, we still do not have enough complete observations in the dataset to complete the missing values.

10 Note that management variables, which contain a huge amount of unstandardised text, are discarded from the beginning of the data construction process.

11 Note that, without this last MVA step, only 18,078 firms would have been left in the 2008 sample, representing less than 28% of the initial number of 2008 start-ups.
