Original Articles

Chasing entrepreneurial firms

Pages 479-507 | Published online: 17 May 2018
 

Abstract

The search for a reliable data-set of entrepreneurial firms is ongoing. We analyze and assess longitudinal data on startups from two data sources – the National Establishment Time-Series (NETS) database and the Secretary of State (SOS) business registry data. Our primary purposes in this paper are to assess the usefulness and reliability of these databases in measuring startup activity along several quality indicators and to explore the possibility of integrating these large databases using both automated and manual processes. The NETS identifies a firm’s employment, sales, and industry but is expensive and suffers from a temporal lag. The SOS data provide up-to-date startup counts but offer limited variables. We conclude that policymakers and researchers will benefit from combining both the SOS and adjusted NETS since they provide complementary information on startups. We carefully document our methodology and make suggestions for use of the data for future research.

Acknowledgement

The authors wish to thank the Kauffman Foundation, especially Evan Absher, for his advice and support of this research. Evan Johnston, Jayson Varkey, and Alyse Polly, the research assistants who worked on preparing complex data for analysis as well as figures and tables, deserve special recognition. They displayed great intelligence in analyzing, organizing, and interpreting the detailed data. We thank Scott Stern and Jorge Guzman for sharing data from the North Carolina Secretary of State and Lori Castro, Briana Godbey, and Renee Guerrero at the Texas Secretary of State for their time and advice. We also thank the participants of the Innovative Data for Economic Analysis Workshop organized by the Frank Hawkins Kenan Institute of Private Enterprise at the Kenan-Flagler Business School, University of North Carolina at Chapel Hill.

Notes

1 Don Walls, president of Walls & Associates and administrator of the NETS data, provided insight on the coverage, collection, and trends of the NETS data through multiple emails and phone conversations.

2 Haltiwanger, Jarmin, and Miranda (Citation2013), citing Neumark, Wall, and Zhang (Citation2011), note that the NETS reports between 13.1 and 14.7 million establishments on average annually between 1992 and 2004, the LBD reports between 6 and 7 million paid employer establishments in a typical year, and the Census Bureau reports more than 15 million nonemployer businesses in a typical year for the United States.

3 Most recent address refers to the address of the establishment in its last active year.

4 Jurisdiction refers to the state where the entity first registers and forms, while registration refers to the state or states in which the entity can conduct business. Jurisdiction is important because the laws governing the formation of an entity vary by state. For example, a company may form under the legal jurisdiction of Delaware and register with the Delaware Secretary of State while also registering to conduct business in Texas with the Texas Secretary of State.

5 Venture capital firms requiring or encouraging Delaware jurisdiction was mentioned repeatedly in many of the more than 50 interviews conducted with local entrepreneurs and influencers in Austin.

6 Guzman and Stern (Citation2015b) define a growth outcome as a company’s achieving an initial public offering (IPO) or acquisition at a meaningful positive valuation within six years of registration.

7 Guzman and Stern (Citation2015a, Citation2015b, Citation2016) provide a rich and detailed overview of these data in the data appendix of their publications.

8 We use 135 five-digit ZIP codes to define the Austin MSA. The ZIP code to MSA (or CBSA) crosswalk is taken from the U.S. Department of Housing and Urban Development. These data may be provided by the authors upon request. The five-county composition of the Austin MSA has held constant between 1990 and the present and includes Bastrop, Caldwell, Hays, Travis, and Williamson counties (Geffen Citation2003; LMCI of the Texas Workforce Commission Citation2015).

9 We use the following 13 counties to define the Research Triangle region: Chatham, Durham, Franklin, Granville, Harnett, Johnston, Lee, Moore, Orange, Person, Vance, Wake, and Warren.

10 Neumark, Zhang, and Wall (Citation2005) removed NETS observations for 1990 and 1991, as D&B drastically improved its methodology for data collection in 1992, when it began using yellow pages to identify business units. We chose to retain these years, as they do not show a divergence from the trends seen in the SOS data.

11 NAICS: 92 (government and armed forces), 8131 (religious and charitable organizations), 4821 (railroad employment), 6111 (private and public elementary and secondary schools), 1141 (commercial fish and shellfish related sectors), 8141 (domestic workers), and 11 (agricultural workers on small farms).
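
As an illustration only, an exclusion list like the one in this note could be applied by matching each establishment's NAICS code against the listed prefixes. The names `EXCLUDED_PREFIXES` and `is_excluded` are hypothetical, and prefix matching on string-typed codes is an assumption, not a description of the authors' procedure. Note also that the prefix 11 covers all of sector 11, which is coarser than "agricultural workers on small farms".

```python
# Hypothetical sketch: flag establishments whose NAICS code starts with
# one of the prefixes listed in the note. Codes are assumed to be
# strings of digits (e.g. "541511"), not integers.
EXCLUDED_PREFIXES = ("92", "8131", "4821", "6111", "1141", "8141", "11")


def is_excluded(naics: str) -> bool:
    """Return True if the code falls under any excluded prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return naics.strip().startswith(EXCLUDED_PREFIXES)


codes = ["921110", "541511", "114111", "813110", "511210"]
kept = [c for c in codes if not is_excluded(c)]
print(kept)  # ['541511', '511210']
```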

12 The 13-county Research Triangle region comprises 172 five-digit ZIP codes, which may be provided upon request.

13 Although sole proprietors and non-employers are different categories conceptually, the number of non-employers traditionally tracks very closely with the number of sole proprietors (Small Business Administration Citation2013).

14 We decided to remove only establishments with one employee in their first year rather than those with one or two employees, as startup trends from the NETS shift significantly below the SOS startup trends when we exclude these additional establishments. This seems to indicate that removing establishments with two employees will remove a large portion of LLP, LLC, LP, and corporations with one owner and one employee or with two owners.

15 According to personal communication with Don Walls on March 27, 2017, he found that the full NETS, which covers the country, reveals that the spike in startup activity in 2010 is an explosion in the birth of small businesses (one or two employees) after the 2008–2009 recession.

16 Neumark, Wall, and Zhang (Citation2011) note the NETS sometimes detects new establishments with a delay; therefore, they do not use the two most recent years in their analysis.

17 Of the 359,018 Austin establishments in the 2013 NETS, only 359,009 were found in the 2014 NETS. This accounts for the difference of nine establishments between the two Austin datasets, resulting in 37,262 added establishments. Similarly, of the 324,915 Research Triangle establishments in the 2013 NETS data-set, only 324,510 were found in the 2014 NETS. This accounts for the difference of 405 establishments and results in 28,108 added establishments for the Research Triangle.

18 After converting Hecker’s (Citation2005) 46 four-digit 2002 high-tech NAICS to 173 six-digit 2007 NAICS and estimating technology-oriented workers’ intensity by industry in 2012 and 2014 using the 2007 NAICS, they found that 147 of the 173 high-tech NAICS (85%) overlapped in the three years of analysis: 2002, 2012, and 2014.

19 A complete list of the high-tech 2012 NAICS can be found in Table 1 in Appendix 1. We converted Hecker’s (Citation2005) high-tech NAICS codes to 2012 NAICS codes using the Census NAICS concordance available here: https://www.census.gov/eos/www/naics/concordances/concordances.html.

20 Authors’ calculations using the American Community Survey 2015 five-year aggregated 5% sample via the Integrated Public Use Microdata Series (IPUMS) (Ruggles et al. Citation2017).

21 The NAICS codes that define startups in the high-tech business services and biotechnology sectors are based on Osman (Citation2015) and Echeverri-Carroll et al. (Citation2016), while those that define the software sector are based on definitions of this sector in Osman (Citation2015), Spigel (Citation2013), Bessen and Hunt (Citation2007), Rosenthal and Strange (Citation2006), and Saxenian (Citation1994). Table 1 in Appendix 1 provides a list of all high-tech 2012 NAICS codes and their subsectors at the six-digit level.

22 A string variable is a variable that can contain letters, numbers, and other characters.

23 Online business databases included business information aggregator websites such as www.manta.com, www.buzzfile.com, and www.smallbusinessdb.com, all of which provide information on small businesses within a geographical area using a company name-based search. Information provided includes company industry and sector, address, name of owner(s), and company aliases. However, these sites do not contain an exhaustive list of all businesses.

24 The measure of dissimilarity is obtained by computing the number of single-character deletions, insertions, or substitutions needed to transform one of the strings being compared into the other string (also referred to as the Levenshtein distance).
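
For readers unfamiliar with the measure, the following is a minimal sketch of the standard Levenshtein distance, which counts single-character insertions, deletions, and substitutions. It is not the authors' matching code; the function name `levenshtein` and the company names in the example are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Illustrative names only: three inserted periods give a distance of 3.
print(levenshtein("ACME SOFTWARE LLC", "ACME SOFTWARE L.L.C."))  # 3
```

A smaller distance means the two names are more similar; a distance of 0 means they are identical.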

25 Because companies registered with the Texas SOS pay franchise taxes collected by the Texas Comptroller of Public Accounts, this organization provides data on registration with the Texas Secretary of State through taxable entity searches based on name, filing number, or tax identification number via the following website: https://mycpa.cpa.state.tx.us/coa.

26 The SOSDirect online database contains the administrative universe of all entities registered to conduct business in Texas. The Austin SOS datasets in this paper are subsets of the SOSDirect online database, which can be accessed freely at the secretary of state office in Austin or through the Austin Public Library System found at the following website: https://direct.sos.state.tx.us/acct/acct-login.asp.

27 Some entity address ZIP codes include a four-digit extension (e.g. 78746–7482). A total of 95 entities in our sample do not correctly separate the extension from the five-digit ZIP code (e.g. 787467482). The Texas Secretary of State extracted the Austin sample of registrations using the provided list of 135 five-digit ZIP codes that define Austin MSA and therefore omitted entities that do not separate the extension. The staff at the Texas Secretary of State has informed us they are aware of this issue and are taking measures to correct it.
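
The issue described in this note can be illustrated with a short sketch that reduces a ZIP field to its five-digit form whether or not the four-digit extension is separated. This is an assumption about one possible fix, not the procedure used by the Texas Secretary of State; the function name `five_digit_zip` is hypothetical.

```python
import re
from typing import Optional


def five_digit_zip(raw: str) -> Optional[str]:
    """Return the five-digit ZIP from a ZIP or ZIP+4 string, else None."""
    # Drop every non-digit, so hyphens, en dashes, and spaces are ignored.
    digits = re.sub(r"\D", "", raw)
    # Accept a plain ZIP (5 digits) or a ZIP+4 (9 digits); anything else
    # is treated as unusable rather than guessed at.
    if len(digits) in (5, 9):
        return digits[:5]
    return None


for value in ["78746", "78746-7482", "787467482"]:
    print(five_digit_zip(value))  # prints 78746 each time
```

Matching the normalized value against the list of five-digit ZIP codes would then pick up entries whose extension was not separated.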

28 The sample of SOS registrations used in the matching exercise only includes registrations between 1990 and 2010 to be comparable with NETS establishments that started (variable: firstyear) between 1990 and 2010. As previously mentioned, our full sample of SOS registrations includes registrations from 1990 to 2015. These 64 entities registered outside of the 1990–2010 interval and therefore were omitted from the sample of SOS registrations used in the matching exercise.

29 Though the majority of for-profit entities file with the secretary and report an address either to the SOS or the CPA, which communicate address and other information to each other, 29 entities elected not to report an entity address in their filing forms, and 21 entities chose not to register with the SOS but rather to only file with the CPA. We used the SOSDirect online database, which contains digital scans of filing documents, to find the 29 entities without entity addresses, and the CPA’s website to identify the 21 entities.

30 On August 6, 2001, the Texas Secretary of State transitioned from storing files manually to the digital Business Entity and Secured Transactions (BEST) database. During this conversion process, only information on active entities was included in the new database. After 2001, information on an inactive entity was input into the system only if it was revived or an individual specifically requested the information on the entity. These 49 entities were inactive on August 6, 2001, so they do not have entity address information.

31 The remaining 159 entities cannot be grouped into meaningful categories and are missing from our sample due to other differences including registration outside Texas or Delaware and registration as not-for-profit entities among others.
