ABSTRACT
Many current and future data scientists will be “isolated”: working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters that newly minted data scientists are likely to encounter and offer advice about how to navigate them. Neither the issues raised nor the advice given is particular to sports, and both should be applicable to a wide range of knowledge domains.
Acknowledgments
I am grateful to Nicole Lazar, Lance Waller, Nicholas Horton, Luke Bornn, Jenny Bryan, and Hadley Wickham for helpful comments on previous drafts of this article.
Notes
1 There is a well-read Isolated Statistician mailing list for statisticians in industry and academia in similar positions.
2 https://twitter.com/gshotwell/status/577485681146097664
3 Please see GitHub Help (https://help.github.com/articles/good-resources-for-learning-git-and-github/) for a list of resources for learning how to use git and GitHub.
4 https://en.wikipedia.org/wiki/Extract,_transform,_load
5 https://en.wikipedia.org/wiki/Apache_Spark
6 https://en.wikipedia.org/wiki/Cron
7 https://cran.r-project.org/web/packages/etl/index.html
8 https://en.wikipedia.org/wiki/GNU/Linux_naming_controversy
9 https://en.wikipedia.org/wiki/.NET_Framework
10 https://en.wikipedia.org/wiki/PHP
11 https://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt
12 http://www.bloomberg.com/news/articles/2015-06-16/my-time-with-the-architect-of-the-astros-ground-control-database
13 For a terrifying story of a rogue data scientist, read about Chris Correa (The Associated Press 2016). Correa held a job similar to mine with the St. Louis Cardinals, but was convicted of hacking into the Houston Astros’ proprietary database and was subsequently sentenced to nearly four years in prison (http://www.si.com/mlb/2016/07/18/cardinals-chris-correa-hacks-astros-prison-sentence).
14 https://www.coursera.org/
15 http://www.datacamp.com
16 http://www.espn.com/blog/sweetspot/post/_/id/48166/moneyball-before-moneyball-was-cool
17 http://www.si.com/vault/2011/09/26/106111997/the-art-of-winning-an-even-more-unfair-game
18 http://grantland.com/the-triangle/pittsburgh-pirates-mike-fitzgerald-mit-sabermetric-road-show/
19 http://www.bloomberg.com/news/articles/2014-08-28/extreme-moneyball-houston-astros-jeff-luhnow-lets-data-reign
20 https://twitter.com/octonion
21 http://www.nytimes.com/2016/07/25/sports/baseball/michael-fishman-solving-the-yankee-equation.html
22 http://www.espn.com/blog/new-york/mets/post/_/id/46665/stat-guru-baumer-leaving-mets-to-teach
23 https://en.wikipedia.org/wiki/Svengali
24 https://www.baseballamerica.com/today/features/050107debate.html
25 https://en.wikipedia.org/wiki/Domain_knowledge
26 http://www.r-bloggers.com
27 https://en.wikipedia.org/wiki/Multicollinearity
28 https://en.wikipedia.org/wiki/Homogeneity_(statistics)
29 I was introduced to this term by Andrew Bray of Reed College.
30 http://mlb.com/team/front_office.jsp?c_id=mil
31 http://mlb.com/team/front_office.jsp?c_id=ana
32 http://mlb.com/team/front_office.jsp?c_id=phi
33 https://www.linkedin.com/in/robertsebastian
34 https://www.linkedin.com/in/agaldi1