About

Motivation:

As the digital economy evolves rapidly, policymakers need a clear understanding of where their nation stands in software development and technological innovation. This knowledge is crucial for informing policy and investment decisions that foster productivity growth and competitiveness in the global market.

The real challenge lies in measuring software development accurately. GitHub, the largest code-hosting platform with more than 83 million developers worldwide, is increasingly regarded as a barometer of online innovation and research. For instance, the World Intellectual Property Organization (WIPO) incorporated an indicator based on GitHub commits into the 2022 edition of its Global Innovation Index (GII), underscoring the platform’s relevance for gauging developers’ creative output. More recently, GitHub introduced the Innovation Graph, an open data and insights platform that gives technology researchers, policymakers, and developers reliable data on global software development trends and open-source activity.

The Innovation Graph is updated quarterly and reports software development activity for each economy with at least 100 developers actively building software. According to the Innovation Graph, several low- and middle-income countries (LMICs), such as Bangladesh, Nigeria, and Pakistan, have seen the most rapid growth in their developer populations. However, counting developer accounts alone can be misleading: creating an account does not guarantee ongoing activity, so not every account represents an active developer. This research therefore aims to develop a more nuanced metric of software development across all LMICs, one that gives a more accurate representation and understanding of the landscape.
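To make “active” concrete, one option is to count an account as active only if it records a minimum number of public events within a trailing time window. The Python sketch below is a minimal illustration under stated assumptions: the `Account` record, the country field (which presumes each account’s location has already been inferred), the 365-day window, and the five-event threshold are all hypothetical choices, not a settled definition.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Account:
    """Hypothetical per-account record assembled from public GitHub activity."""
    login: str
    country: str  # inferred location, e.g. an ISO 3166-1 alpha-2 code such as "BD"
    activity_dates: list[date] = field(default_factory=list)  # dates of public events


def count_active_developers(accounts, as_of, window_days=365, min_events=5):
    """Count, per country, accounts with at least `min_events` public events
    in the `window_days` days ending on (and including) `as_of`.

    Both thresholds are illustrative assumptions, not a validated definition.
    """
    window_start = as_of - timedelta(days=window_days)
    active = Counter()
    for account in accounts:
        recent_events = sum(
            1 for d in account.activity_dates if window_start < d <= as_of
        )
        if recent_events >= min_events:
            active[account.country] += 1
    return active


# Illustrative records only, not real data.
accounts = [
    Account("dev_a", "BD", [date(2024, 1, day) for day in range(1, 7)]),  # 6 recent events
    Account("dev_b", "BD", [date(2020, 5, 1)]),  # account exists but is dormant
    Account("dev_c", "NG", [date(2023, 6, 1)] * 5),  # 5 events inside the window
]
print(count_active_developers(accounts, as_of=date(2024, 3, 31)))
```

Raising `min_events` or shortening `window_days` gives a stricter definition; reporting results under several thresholds would show how sensitive the LMIC rankings are to this choice.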

Research Questions/Objectives:

1. How do LMICs rank by the number of currently active developer accounts? How has the growth rate of active developers in LMICs evolved over the past five years, and which countries show significant acceleration in technological innovation?

2. How do LMICs rank by the average level of activity (e.g., commits and other contributions) per developer account, and how has this average grown over the past five years?

3. Using the Global Innovation Index (GII) metrics, how do LMICs compare with one another in terms of ‘online innovation’, particularly in software development and technological progress? How do these GII results compare with our findings?

4. How do our findings on software innovation compare with the Innovation Graph data released by GitHub?

5. Based on our findings, is there a correlation between software development and economic growth among LMICs?
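Research questions 1 and 2 reduce to a few simple computations once yearly counts of active developers and public events per country are available. The sketch below assumes such per-country aggregates already exist; the function names are hypothetical, compound annual growth rate (CAGR) is only one possible summary of five-year growth, and treating a strictly rising sequence of year-over-year growth rates as “acceleration” is likewise an assumption.

```python
def cagr(start_count, end_count, years):
    """Compound annual growth rate between two yearly counts."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1 / years) - 1


def yoy_growth(yearly_counts):
    """Year-over-year growth rates for a chronological list of yearly counts."""
    return [(curr - prev) / prev for prev, curr in zip(yearly_counts, yearly_counts[1:])]


def is_accelerating(yearly_counts):
    """One possible reading of 'acceleration': each year's growth rate exceeds the last."""
    rates = yoy_growth(yearly_counts)
    return len(rates) >= 2 and all(later > earlier for earlier, later in zip(rates, rates[1:]))


def activity_per_developer(events_by_country, active_by_country):
    """Average public events per active developer, per country (research question 2)."""
    return {
        country: events_by_country.get(country, 0) / n_active
        for country, n_active in active_by_country.items()
        if n_active > 0
    }


def rank(metric_by_country):
    """Countries sorted by a metric, highest first."""
    return sorted(metric_by_country.items(), key=lambda item: item[1], reverse=True)


# Illustrative yearly active-developer counts (oldest first), not real data.
history = {"BD": [100, 110, 132, 170, 230, 320], "PK": [200, 230, 260, 285, 300, 310]}
latest = {country: counts[-1] for country, counts in history.items()}
print(rank(latest))
print({c: round(cagr(h[0], h[-1], len(h) - 1), 3) for c, h in history.items()})
print({c: is_accelerating(h) for c, h in history.items()})
```

CAGR smooths over year-to-year swings, so pairing it with the year-over-year series helps separate steady growth from a recent surge.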

Potential Findings and Answers:

  • Variations in the Level of Engagement

Preliminary analysis may reveal significant variations in the level of engagement and activity among developers across LMICs, pointing to a diverse landscape of software innovation. Certain LMICs, potentially those with burgeoning tech hubs and government support for IT education, may rank significantly higher in both active developer accounts and average activity per developer. These countries could show a consistent increase in active developers over the past five years, signaling a thriving and expanding local software industry. Significant acceleration in technological innovation may be most evident in countries that have invested in digital infrastructure and education, reflected in their growing numbers of active developers.

  • Comparison with Existing Reports

Our findings might align closely with those of the GitHub Innovation Graph and the WIPO GII, reinforcing the reliability of public data as a reflection of software development trends. However, because we discard inactive accounts, discrepancies may arise, prompting a discussion of the factors behind these differences. The analysis could provide a nuanced understanding of how LMICs fare in online innovation relative to global benchmarks, potentially highlighting under-recognized centers of innovation.

  • Relationship Between Developer Activity and Economic Growth

A correlation analysis might indicate a positive relationship between the density of active developers (or the intensity of their activity) and economic growth indicators within LMICs, suggesting that a vibrant software development sector contributes to broader economic development. Conversely, the findings could reveal that high developer activity in some countries does not translate directly into immediate economic growth, pointing to mediating factors such as the business environment, regulatory frameworks, and access to capital.
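A minimal sketch of the correlation step is shown below. It assumes a hypothetical density measure (active developers per 100,000 inhabitants) and one economic indicator per country, such as annual GDP growth, and computes the Pearson coefficient directly. With a small sample of countries, a rank-based measure such as Spearman’s may be more robust, and neither measure establishes causation, consistent with the mediating factors noted above.

```python
from math import sqrt


def developer_density(active_developers, population):
    """Active developers per 100,000 inhabitants (a hypothetical density measure)."""
    return active_developers * 100_000 / population


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Synthetic inputs only, not real data: one (active developers, population) pair per country.
densities = [developer_density(a, p) for a, p in [(1_000, 1_000_000), (2_000, 1_000_000), (3_000, 2_000_000)]]
gdp_growth = [2.0, 4.0, 3.0]  # annual GDP growth, percent (synthetic)
print(round(pearson(densities, gdp_growth), 3))
```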

Expected Limitations: 

  • Underrepresentation Due to Missing Data on Private Activity

One of the primary limitations of this research stems from its reliance on GitHub data, which predominantly covers activity in public repositories. GitHub’s privacy policies restrict access to private repository data, so the full spectrum of software development and innovation activity may be underrepresented. Many developers and organizations keep their repositories private, for reasons ranging from intellectual property concerns to the early stage of a project. As a result, our analysis might not capture a comprehensive view of software innovation in LMICs, since significant portions of development work could be occurring in private repositories. This gap makes it difficult to gauge the total volume and quality of software development across regions, so findings must be interpreted cautiously, with explicit acknowledgment of unobserved activity.

  • Developer Identification

Identifying developers as belonging to specific LMIC countries can be complex due to the global and often anonymous nature of the GitHub community. Developers may not always specify their location, or they might contribute from different locations over time.

  • Quantitative vs Qualitative Activity Measurement

Defining and measuring “activity” on GitHub (such as commits, pull requests, and contributions) can be challenging. Not all activity is equally significant, and high commit counts don’t always correlate with high-quality or impactful development work.

  • Bot Activity 

The presence of bots on GitHub makes it harder to ensure data accuracy: bots automate workflows such as dependency updates and can significantly inflate activity metrics. Identifying and filtering out bot accounts is an important task that may require additional research.
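As a first pass at bot filtering, the sketch below flags accounts that GitHub itself marks as bots: GitHub App accounts have logins ending in `[bot]` (e.g. `dependabot[bot]`), and GitHub’s REST API reports their user `type` as `"Bot"`. The additional name-based rule is a hypothetical heuristic for automation accounts registered as ordinary users; it will miss some bots and misclassify some humans, so it complements rather than replaces the further research noted above.

```python
import re

# GitHub App accounts have logins ending in "[bot]", e.g. "dependabot[bot]".
APP_BOT_SUFFIX = "[bot]"

# Hypothetical heuristic: a login that is exactly "bot" or ends in "-bot"/"_bot".
# It deliberately ignores names like "abbot" but will still produce errors.
NAME_HEURISTIC = re.compile(r"(^|[-_])bot$", re.IGNORECASE)


def looks_like_bot(login, account_type=None):
    """Return True if an account is likely automated."""
    if account_type == "Bot":  # value reported by GitHub's REST API for app accounts
        return True
    if login.endswith(APP_BOT_SUFFIX):
        return True
    return bool(NAME_HEURISTIC.search(login))


def drop_bots(accounts):
    """Keep only accounts (dicts with 'login' and optional 'type') that do not look automated."""
    return [a for a in accounts if not looks_like_bot(a["login"], a.get("type"))]


# Illustrative account records, not real data.
sample = [
    {"login": "alice", "type": "User"},
    {"login": "dependabot[bot]", "type": "Bot"},
    {"login": "release-bot", "type": "User"},
]
print([a["login"] for a in drop_bots(sample)])
```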

  • Non-GitHub OSS Activities

Open-source software (OSS) development is not limited to GitHub. It also takes place on numerous other platforms and venues, such as GitLab, Bitbucket, and project-specific forums, so GitHub metrics might not capture the entirety of OSS efforts and could give an incomplete picture.

  • Dependency on Third-party Services

Before GitHub released the Innovation Graph, researchers had to rely on third-party services for GitHub data, which were not always up to date or complete. This dependency could skew research outcomes and insights.