China’s Hyperscalers Strive to Keep Pace in Open Source

We have a good sense of which projects U.S. companies open source. The picture is far less clear for Chinese webscale companies, most notably the big three (Baidu, Alibaba, and Tencent), whose open source ecosystem is less public and rarely discussed.

For instance, even if you work in open source at a large company, it is unlikely you’ve heard of projects like X-Deep Learning, Alibaba’s production ML framework for sparse datasets, or Mars, the cross-platform network component developed at Tencent. Conversely, most of our readers will have heard of TensorFlow, which started inside Google; of well-known Facebook projects like Hydra for application configuration or Hermes for mobile; or of the boatload of tools and platforms AWS built internally before releasing them into the open source wild.

Occasionally an announcement from one of China’s majors crosses the U.S./China divide and garners interest, such as Alibaba’s “Alink” platform for machine learning data collation, but news like this is a once-per-year event at most.

In short, U.S. webscale companies have been leaders in open sourcing their own tooling, but the view into how similarly sized (if not larger) companies in China approach open source has been muddled. A multi-university study focused on China’s big three tech companies and their various open source undertakings now provides at least some transparency.

Although it is based on a relatively small sample size for the qualitative section, the study does reveal something that might be lost on us here in the U.S.: China’s largest web companies are emulating the West’s emphasis on open source as beneficial for a host of reasons, but where these companies choose to put their open source foot forward is different, as highlighted below. For instance, compare Google’s list of open source machine learning tools, or even its list for networking, among its 1,957 repositories on GitHub, with Alibaba’s 428 repositories for cloud-focused efforts.

The researchers began with a collection of 1,000 open source projects from those companies, along with a survey of internal developers working on those projects. They examined both the technical rollout of these open source initiatives and how developers at the Chinese webscale giants perceived open versus closed source development.

The chart below summarizes where open source innovation is most rooted, though there were also some more nuanced findings about how these developers approach open source (more on that momentarily). Of the 1,000 open source projects, the majority came from Alibaba (520), with 380 from Baidu and only 100 from Tencent.

The dominant category of projects that were open sourced by BAT is frontend development, which accounts for 71.8% of the total. The second most common category is backend development, which takes up 10.2% of the total. Although the backend development category accounts for the second highest number of occurrences, the number of such projects is far less than the frontend development category. Conversely, operating system and management and monitoring categories have the least occurrence in our studied projects, which only account for 1.1% and 1.6% of the total, respectively.
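To put those figures in perspective, here is a minimal Python sketch that turns the numbers quoted in this article into per-company shares and approximate per-category project counts. The variable and function names are ours for illustration and do not come from the study. The rounding to whole projects is an assumption, and categories the article does not break out are left out.

```python
# Figures as quoted in the article; names here are illustrative only.
TOTAL_PROJECTS = 1000

projects_by_company = {"Alibaba": 520, "Baidu": 380, "Tencent": 100}

# Only the four categories the article cites; the rest are not broken out.
category_share_pct = {
    "frontend development": 71.8,
    "backend development": 10.2,
    "management and monitoring": 1.6,
    "operating system": 1.1,
}


def company_shares(counts):
    """Return each company's share of the sample, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def category_counts(shares_pct, total):
    """Convert percentage shares into approximate project counts."""
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


if __name__ == "__main__":
    for name, share in company_shares(projects_by_company).items():
        print(f"{name}: {share}% of the sample")
    for name, count in category_counts(category_share_pct, TOTAL_PROJECTS).items():
        print(f"{name}: about {count} projects")
```

Read this way, the gap between first and second place is stark: roughly seven frontend projects for every backend one in the sample.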

Among the key findings: frontend projects were more likely to be open sourced and, of those, most fell into the tooling and frameworks category. When it came to the rationale for open sourcing, gaining “fame,” expanding influence, and recruitment advantages topped the list. Perhaps most interesting was that respondents with the most experience inside these companies were less positive about open sourcing software projects, but this response was in the minority (88% of respondents were positive about open sourcing software efforts).

“88% of respondents are positive towards open sourcing software projects, respondents with high experience are more negative to open sourcing, while respondents with low experience are more positive. There is also room for future research to dig in-depth on why participants with different demographics show different attitudes towards open sourcing effort, why Chinese companies prefer to open source frontend development projects, etc.”

Also noteworthy: respondents found that having English versions of the readme and the code comments was key to internationalization, and the push to make these projects international in reach is keenly felt. “Results show 73% of the respondents said that they had considered internationalization of open source projects in our survey, while our project analysis shows that only 552 out of 1,000 (55.2%) open source projects have considered internationalization (i.e., including English readme files).”

“It implies that companies and developers need to do more things to improve open source projects’ internationalization process in practice. Meanwhile, our results show that open source projects with high internationalization effort show higher popularity than projects with low internationalization effort. It also highlights the value that Chinese technology companies can reap to pursue internationalization.”

If you follow open source software for cloud platforms and webscale infrastructures, test yourself: how many of Alibaba’s projects have you heard of, compared with those on Google’s list or Facebook’s? Is it just a lack of cultural exchange and language barriers, or is there something else preventing a freer exchange between open source communities among top cloud contenders? More information about this study can be found here.


1 Comment

  1. I think most of the opensource projects from the Chinese hyperscalers just have less users and exposure in the English-speaking community. Moreover, in some cases, their AI/ML frameworks are geared more toward edge-processing (at least in my experience with Tencent AI framework). So, you have two barriers: use case and some (but not insurmountable) language barrier.
