Following up on the Octoverse clues, I uncovered this gem: Markovtsev, Vadim, and Waren Long. “Public git archive: a big code dataset for all.” In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 34-37. ACM, 2018. You can look at the arXiv version here.
This study points to the following as the most popular programming languages:
- C
- JS
- C++
- Java
- PHP
- Go
- Python
- Obj-C
- C#
- Ruby
If you’re into data mining and analysis of REALLY large public datasets, this one offers lots to work with. According to the authors, the Public Git Archive occupies 3.0 TB on disk. Enjoy!