Second-tier languages on the web.

I just finished reading the ‘Net Lang’ UNESCO report on ‘the multilingual cyberspace’. I didn’t find the report earth-shattering overall, but I did note a few points of interest, which I’d like to share here.

The first thing I’d like to note is that nobody really seems to know what’s happening on the web, or to be able to measure it. I posted charts from Wikipedia in a previous post, and commented on the discrepancy between the share of English-language content and the share of English speakers online. All authors agree that the proportion of English content is disproportionately high, but the exact measure is unclear. In spite of various projects, there seem to be no clear figures about language on the web. Given the abundance of content, including huge quantities of dynamic content (blogs, Facebook pages, Twitter…), it is, at present, impossible to give any clear calculations: without a reliable measure of the whole, there is no way to compute percentages. This initial remark serves as a caveat for what follows.

The second point is something I already noted when reflecting on language policy within Australia: there is remarkably little talk of ‘second-tier’ languages, and of how to engage with them strategically. In rough terms, the situation is as follows: there are about 6000 to 7000 languages spoken on the planet, but a small number of them dominate the offline world – and even more so the online world. Roughly speaking, on the internet, one language (English) accounts for about 50% of all content, and about 60 to 70 languages account for over 90% of all content. Many authors defend ‘minority languages’ – those within the under-represented 10%. But no one seems to really focus on the ‘second-tier’ dominant languages – Chinese, Japanese, Russian, Arabic, Spanish, etc. – which are alternately lumped in with the rest of the under-represented languages or, more frequently, bundled together with English among the ‘privileged’ languages that benefit from an established set of standards and are recognised by multilingual browsers and translators.

These ‘second-tier languages’ are precisely those I am most interested in. They represent about one third of all contents, and two thirds of all users. What will happen to that proportion in the near future? Are they going to challenge the dominance of English? Chinese in particular, but also Spanish, Portuguese, Arabic… Is the web heading towards an equal proportion of English and Chinese? Or is English going to remain the dominant form, a necessary koine for web communication?

While I prepare further reflections about machine-assisted language learning vs automatic translation, and scenarios for the future of digital multilingualism, I’d love to hear your thoughts on this – and to get links to any material on that question!

3 thoughts on “Second-tier languages on the web.”

  1. “These ‘second-tier languages’ are precisely those I am most interested in. They represent about one third of all contents, and two thirds of all users”

    Hey Julien, this is really interesting. I believe that we are going to witness a rebalancing. The reason is simple: it’s what the market wants.

    I mean, let’s say that you want to start a business and you speak both English and Chinese. What would be easier to monetize?

    A saturated market where it’s basically impossible to get to the first page of Google for decent keywords (because there are too many established English websites already), or a market that is still under-developed, like the Chinese one?

    The latter market seems way easier to “exploit.”

    I have some first-hand experience.

    I blog in both Italian and English which, according to your statistics, are both saturated markets (that is, more internet presence than real people in the world). However, the English market is much more saturated than the Italian one, and the overall quality of its websites is superior.

    I post exactly the same articles on both sites and, although there are so many English speakers on the web, my traffic is split 50/50 between English and Italian.

    Why? Because it’s easier to rank in Italian. Also, I get far more interaction from Italian readers than from English-speaking ones (70% of my email subscribers are from Italy).

    I believe the reason is that while there are so many excellent resources in English, in Italian you often struggle to find them. Most of the pages you see are still web 1.0 or plain spam. So when people find a good resource (I believe my website is good), they stick around.

    This is why I decided to focus more on second-tier languages (in particular Spanish, because it’s the one I have mastered) and less on English.

    Actually, I just started the same blog in Spanish (with a Catalan friend), a language which, like Chinese, is under-developed on the web.

    I’m curious to see what will happen in 6 months to 1 year. Since there are so many Spanish speakers and so few websites in Spanish, I actually expect it to be the most “successful” version of the blog (in terms of traffic), even though I started it a year later.

  2. Pingback: ¿Qué es el proyecto Marco Polo? Entrevista con su fundador Julien Leyre

  3. Pingback: What’s Marco Polo Project? Interview with the founder Julien Leyre
