A more serious study of the Public Git Archive (PGA)

Following up on the Octoverse clues, I uncovered this gem: Markovtsev, Vadim, and Waren Long. “Public git archive: a big code dataset for all.” In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 34-37. ACM, 2018. You can look at the arXiv version here.

This study points to the following as the most popular programming languages:

  1. C
  2. JS
  3. C++
  4. Java
  5. PHP
  6. Go
  7. Python
  8. Obj-C
  9. C#
  10. Ruby

If you’re into data mining and analysis of REALLY large public datasets, this one offers lots to work with. According to the authors, the Public Git Archive occupies 3.0 TB on disk. Enjoy.

 
