Research Article

Advancing intelligence analysis: using natural language processing on East Pakistani intelligence documents

Pages 740-763 | Published online: 13 Feb 2023
 

ABSTRACT

This article demonstrates how natural language processing (NLP) can be used by intelligence practitioners and scholars to analyse text. Using decades of unredacted East Pakistani intelligence reports declassified and released by the Government of Bangladesh, the article shows how machine learning can provide insight into intelligence documents. In particular, it offers a case study of how NLP can supply quantitative analysis that complements qualitative analysis. The article also demonstrates how NLP can give historians a quantitative methodology for better understanding historical government records.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. Special thanks to the reviewers and ASM Ali Ashraf of Dhaka University, Bangladesh, for their comments and critiques.

2. Obermeier, Natural Language Processing Technologies in Artificial Intelligence, 25.

3. Blei, et al, “Latent Dirichlet Allocation”, 993–1022.

4. “How Intelligence Works”, IntelligenceCareers.gov.

5. Ashraf, “Bangladesh: Intelligence Culture and Reform Priorities”.

6. Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh.

7. For a review of the volumes and their significance, see: Shaffer, “Secret Documents of Intelligence Branch on Father of The Nation”.

8. “PM Hasina: Secret documents on Bangabandhu to be nice resources for researchers”.

9. Officially, the first ten volumes have 2,065 reports according to the table of contents. However, there are publishing errors: pages 480–485 of the first volume are blank and contain no actual reports. The final output is 2,063 reports: volume 1 has 316, volume 2 has 286, volume 3 has 230, volume 4 has 234, volume 5 has 159, volume 6 has 150, volume 7 has 233, volume 8 has 252, volume 9 has 124 and volume 10 has 79.

10. For example, a survey of Intelligence and National Security articles and book reviews finds just one book that provides details about Bangladesh’s intelligence community. It is: Shaffer, “Intelligence, National Security and Foreign Policy”.

11. Schendel, A History of Bangladesh, 79.

12. Ibid., 107.

13. Ibid., 109.

14. Ibid., 113.

15. Ibid., 117.

16. Ali, From East Bengal to Bangladesh, 253.

17. “PM Hasina: Secret documents on Bangabandhu to be nice resources for researchers”.

18. Rahman, The Unfinished Memoirs.

19. Schendel, A History of Bangladesh, 119.

20. For more on the genocide and targeted killings, see: Bass, The Blood Telegram.

21. Hasina was the main interviewee in Hasina: A Daughter’s Tale, a 2018 docudrama about the assassination of Rahman and the rest of her family.

22. For example, see: Shaffer, “Islamist Attacks Against Secular Bloggers in Bangladesh”.

23. Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume I (1948–1950), xiv.

24. Weaver, “Translation” in Machine Translation of Languages, 15. For more on Weaver and his contributions to machine translation, see: Hutchins, “Warren Weaver and the Launching of MT”, in Early Years in Machine Translation.

25. Hutchins, “The First Decades of Machine Translation: Overview, Chronology, Sources”, in Early Years in Machine Translation, 7, 8.

26. Barr, “Natural Language Understanding”, 6.

27. Ibid., 6.

28. “How Intelligence Works”.

29. For example: “RCA/ACSIMATIC Briefing”; “Task Team VI – Research & Development”.

30. Heuer, “Adapting Academic Methods and Models to Governmental Needs”, 1, 2.

31. “Memorandum for the National Foreign Intelligence Committee Principals”.

32. Ibid.

33. “25th Anniversary: 1988 Annual Report”.

34. Richelson, “The Wizards of Langley”, 98.

35. Ibid., 98.

36. “About”, History Lab.

37. “Projects”, History Lab.

38. Connelly, et al. “Diplomatic Documents Data for International Relations”, 777, 778. For similar methods used in different work by scholars affiliated with the History Lab, see: Chaney, et al, “Detecting and Characterizing Events”.

39. Connelly, et al. “New Evidence and New Methods for Analyzing the Iranian Revolution as an Intelligence Failure”, 786.

40. Katagiri and Min, “The Credibility of Public and Private Signals”, 162, 170.

41. Heuer, “Preface”, ix.

42. For example, see: “About LexisNexis”, LexisNexis.

43. For example, see the Government of Bangladesh’s published timeline. “Timeline: 100 Years of Mujib”.

44. All the pre-processing steps except stop word removal were performed with the spaCy pipeline, which is a pretrained deep neural network model that quickly performs common NLP tasks with a high degree of accuracy and precision. For spaCy documentation, see: “Industrial-Strength Natural Language Processing”, https://spacy.io/.

45. Blei, et al, “Latent Dirichlet Allocation”, 993–1022.

46. Grimmer, “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases”, 1–35; Roberts, et al, “A Model of Text for Experimentation in the Social Sciences”, 988–1003; Greene, et al, “Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach”, 77–94; Beauchamp, “Predicting and Interpolating State‐Level Polls Using Twitter Textual Data”, 490–503; Gerrish, et al, “Predicting Legislative Roll Calls from Text”, 489–496; Mueller, et al, “Reading between the Lines: Prediction of Political Violence Using Newspaper Text”, 358–75.

47. This article used the Gensim implementation of LDA in the Python programming language. For documentation, see: “Topic Modeling for Humans”, Gensim.

48. Mimno, et al., “Optimizing Semantic Coherence in Topic Models”, 262–72.

49. Antoniak, et al, “Evaluating the Stability of Embedding-Based Word Similarities”, 107–19.

50. Ibid.

51. The co-occurrence matrices were generated with the quanteda R package; the PPMI matrices with the wordspace package. See Benoit, et al., “Quanteda: An R Package for the Quantitative Analysis of Textual Data”, 774; Evert, “Distributional Semantics in R with the Wordspace Package”, 110–14.

52. SVD matrices were generated with the rsvd package in R. See Erichson, et al., “Randomized Matrix Decompositions Using R”.

53. For a detailed explanation of the PPMI-SVD method, see: Jurafsky, et al, “Speech and Language Processing (Draft), 3rd”, Ch. 6.

54. Rodman, “A Timely Intervention”, 87–111; Garg, et al, “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes”, 3635–44; Kozlowski, et al, “The Geometry of Culture”, 905–49; Rheault and Cochrane, “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora”, 112–33.

55. See the Government of Bangladesh’s published timeline. “Timeline: 100 Years of Mujib”.

56. See Rahman, The Unfinished Memoirs.

57. “Timeline: 100 Years of Mujib”.

58. For example, Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume II (1951–1952), 12.

59. Some of the report types are listed in the abbreviations section. Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume I (1948–1950), xvii – xviii.

60. There are numerous speeches throughout the volumes. For example, Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 18, 21.

61. Rahman, The Unfinished Memoirs.

62. Ali, From East Bengal to Bangladesh, 91; Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 19.

63. Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume V (1958–1959), 237.

64. Ibid., 15.

65. Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 9.

66. For example, Ashrafuddin was watched, see: Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 241, 242, 254. For Zahur examples, see: Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 272, 275; and Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume VI (1960–1961), 336, 407.

67. For example, see: Hasina (ed.), Secret Documents of Intelligence Branch on Father of The Nation, Bangladesh: Bangabandhu Sheikh Mujibur Rahman: Volume IV (1954–1957), 212.

68. Schafer, “Issues in Assessing Psychological Characteristics at a Distance”, 511–27.
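
As an illustration of the PPMI step named in notes 51 and 53, the following minimal Python sketch builds windowed co-occurrence counts and converts them to positive pointwise mutual information (PPMI) scores. It is not the authors' code (they report using the quanteda and wordspace R packages); the function names, the window size and the toy token lists are assumptions made purely for illustration, and the SVD step from note 52 is left out.

```python
import math
from collections import Counter


def cooccurrence_counts(docs, window=2):
    """Count symmetric co-occurrences within +/- `window` tokens.

    `docs` is a list of token lists. The window size is an illustrative
    assumption, not a setting reported in the article.
    """
    pair_counts = Counter()
    for tokens in docs:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """Convert (word, context) counts into PPMI scores.

    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ); negative values are
    clipped to zero, which is what makes the measure "positive" PMI.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2((n * total) / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(0.0, pmi)
    return scores


# Toy, invented token lists standing in for pre-processed reports.
toy_docs = [
    ["officer", "watched", "meeting", "dacca"],
    ["officer", "reported", "speech", "dacca"],
    ["students", "attended", "meeting", "dacca"],
]
toy_scores = ppmi(cooccurrence_counts(toy_docs, window=2))

# Show the highest-scoring pairs for the toy data.
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 3))
```

In a fuller pipeline the PPMI scores would be arranged as a word-by-context matrix and reduced with a truncated SVD to obtain dense word vectors; in Python that could be done with, for example, `numpy.linalg.svd`.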

Additional information

Notes on contributors

Ryan Shaffer

Ryan Shaffer has a PhD in history with expertise in extremism and security. He has published hundreds of articles and reviews in numerous journals. Shaffer is an editorial board member for the Journal of Policing, Intelligence and Counter Terrorism. His books include African Intelligence Services: Early Postcolonial and Contemporary Challenges and The Handbook of Asian Intelligence Cultures.

Benjamin Shearn

Benjamin Shearn is a PhD candidate in public policy at George Mason University and has published chapters about the role of cyber security in international conflict escalation and the intelligence profession. After an early career in information technology and security, he became a computational social scientist specialising in applied machine learning for political and military analysis. His current research focuses on measuring policy-maker identity and threat perception with natural language processing techniques.
