Introduction to i18n
There are two terms that you may have heard which are sometimes (incorrectly) used interchangeably: internationalization and localization. These terms refer to adapting software in order to cater to different locales.
You may have also heard these terms abbreviated as i18n and L10n respectively, which Wikipedia explains:
The terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and last n in internationalization, a usage coined at DEC in the 1970s or 80s) and L10n respectively, due to the length of the words.
While these two terms are directly related, they are distinct in purpose.
According to the W3C:
Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).
i18n is what makes L10n possible. If L10n is the cupcakes, i18n is the recipe. You cannot localize an application without i18n in place. Likewise, the best i18n solution in the world is of little value if it's not put to use.
Going forward, anytime I refer to L10n it comes with the assumption that i18n is present and accounted for, and vice versa.
While i18n can be applied to applications and documents as well, for the sake of this article I will focus specifically on how it pertains to the web.
Why is i18n important?
Contrary to popular belief, the interwebs is more than just a series of tubes. It allows us to connect to an international network. Search engines such as Google allow users to control what languages their results may include. However, with the advent of social media and email, links are easily shared and viewed from anywhere in the world. As such you cannot guarantee that visitors to your site will be familiar with your language, much less fluent in it. In the few short weeks that I've had this blog it has already been viewed in the United States, Great Britain, Germany and Brazil. That's without any attempt to promote my site, and no, I haven't put any effort into localizing the content ;-)
It's a safe assumption that your site serves some sort of purpose. Whether it be a personal blog such as this, a full-blown web application or anything in between, it's been built with the intent of attracting traffic and ideally resulting in some sort of a conversion. While that conversion may mean a subscription to your RSS feed or adding revenue to your business, if a visitor to your site can't make sense of the content they're going to walk away. More on this later.
What needs to be localized?
There are four main types of information that need to be localized for your users:
- text
- dates
- numbers
- currency
You may also consider localizing units of measure (metric system vs English units), temperature (Fahrenheit vs Celsius) and direction of text (left to right vs right to left) depending on which locales you support and the content of your site.
Text
When it comes to localizing the content of your website, written text probably stands out as the most obvious thing to address. It is also the one that requires the least amount of explanation.
If a visitor to your site speaks English they will expect to be greeted with Welcome, while a visitor who speaks French will expect Bienvenue. In order for a user to properly navigate your site and understand the content presented, written text will need to be presented in a language that they can read.
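To make that concrete, here is a minimal sketch of a message catalog with a fallback language. The names MESSAGES, DEFAULT_LANGUAGE and translate are my own inventions for illustration, not part of any particular i18n library, and a real catalog would hold far more than one key.

```python
# Hypothetical in-memory catalog: language code -> message key -> text.
MESSAGES = {
    "en": {"welcome": "Welcome"},
    "fr": {"welcome": "Bienvenue"},
}

# Assumption: English is the language the site was originally written in.
DEFAULT_LANGUAGE = "en"


def translate(key: str, locale: str) -> str:
    """Return the text for `key` in the locale's language, falling back to the default."""
    # "fr-CA" and "fr_FR" both reduce to the language code "fr".
    language = locale.replace("_", "-").split("-")[0].lower()
    catalog = MESSAGES.get(language, MESSAGES[DEFAULT_LANGUAGE])
    # If the key is missing in that language, fall back to the default, then to the key itself.
    return catalog.get(key, MESSAGES[DEFAULT_LANGUAGE].get(key, key))


print(translate("welcome", "en-US"))
print(translate("welcome", "fr-FR"))
```

Falling back to the default language rather than raising an error means a visitor from an unsupported locale still sees something readable.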
Dates
If you see a date formatted as 05/10/12 what date do you interpret that to be? If you are from the United States you probably read it as May 10, 2012. If you are from Great Britain on the other hand it is likely read as October 5, 2012. Yet again if you are from Japan you may have seen it as October 12, 2005.
Properly formatting dates for a user's locale is important: if you invite them to an event or tell them that a package will be delivered by a certain date, you will want to be certain that they understand when that date actually is.
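The ambiguity is easy to reproduce. The sketch below parses the same string with three different field orders using Python's standard datetime module. The PATTERNS table and the interpret function are hypothetical helpers of mine, and the locale-to-pattern mapping is a simplification of what a real locale database would provide.

```python
from datetime import datetime

AMBIGUOUS = "05/10/12"

# Field order conventionally associated with each locale (illustrative only).
PATTERNS = {
    "en-US": "%m/%d/%y",  # month/day/year
    "en-GB": "%d/%m/%y",  # day/month/year
    "ja-JP": "%y/%m/%d",  # year/month/day
}


def interpret(text: str, locale: str) -> str:
    """Parse `text` using the locale's field order and return an ISO 8601 date."""
    return datetime.strptime(text, PATTERNS[locale]).date().isoformat()


for locale in PATTERNS:
    print(locale, interpret(AMBIGUOUS, locale))
```

Printing the result in ISO 8601 form (year-month-day) sidesteps the ambiguity, which is one reason that format is a common choice for storing dates before they are localized for display.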
Numbers
Number formatting takes into consideration what characters are used to represent groups, decimals and negative values. Michigan Stadium in Ann Arbor, Michigan is the home of the Wolverines football team. It is the largest stadium in the United States, with a capacity of one hundred nine thousand, nine hundred and one people. If this number were formatted for various locales, with decimal places included to illustrate the point (pun intended), it would be 109,901.00 in the United States, 109 901,00 in France or 109.901,00 in Germany.
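As a rough sketch of what a formatter has to do, the snippet below swaps in each locale's group and decimal separators by hand. The SEPARATORS table and format_number are names I made up for this example. A production site would lean on a locale library rather than a hand-written table, and French typography normally uses a non-breaking space where I use a plain one.

```python
# (group separator, decimal separator) per locale; hand-written for illustration.
SEPARATORS = {
    "en-US": (",", "."),
    "fr-FR": (" ", ","),
    "de-DE": (".", ","),
}


def format_number(value: float, locale: str, places: int = 2) -> str:
    """Format `value` with the group and decimal separators of `locale`."""
    group, decimal = SEPARATORS[locale]
    # Format US-style first, then replace the separators with the locale's own.
    us_style = f"{value:,.{places}f}"
    whole, _, fraction = us_style.partition(".")
    whole = whole.replace(",", group)
    return f"{whole}{decimal}{fraction}" if fraction else whole


for locale in SEPARATORS:
    print(locale, format_number(109901, locale))
```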
Currency
Formatting currency is essentially the same as formatting numbers, with the addition that the currency symbol has to be accounted for. This includes both which symbol is used for the currency (e.g., $, €, £) and where that symbol should be displayed (i.e., before or after the amount). An amount of 123.45 would be displayed as $123.45 in USD, as opposed to 123,45 € in EUR.
Additionally you may need to accommodate currency conversion. This involves storing currency values in a single standard currency, for sake of example let's say USD. Then when displaying you would perform a conversion to show the amount in the currency of the user's locale at the current exchange rate. Assuming your site sells a product that costs $123.45 and you ship to Europe, you will want to show your European customers that it costs 90,24 € (based on an exchange rate of 1 USD = 0.731 EUR at time of this writing).
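Here is a small sketch of the store-in-one-currency, convert-on-display idea. The function names are hypothetical, the exchange rate is the one quoted above rather than a live figure, and I use Decimal instead of float because binary floating point is a poor fit for money.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: prices are stored in USD and converted only for display.
USD_TO_EUR = Decimal("0.731")  # rate quoted in the article, not a live rate


def convert(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert `amount` at `rate`, rounded to two decimal places."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def format_usd(amount: Decimal) -> str:
    """Symbol before the amount, period as the decimal separator."""
    return f"${amount:,.2f}"


def format_eur(amount: Decimal) -> str:
    """Symbol after the amount, comma as the decimal separator."""
    return f"{amount:.2f}".replace(".", ",") + " €"


price_usd = Decimal("123.45")
print(format_usd(price_usd))
print(format_eur(convert(price_usd, USD_TO_EUR)))
```

Note that format_eur is deliberately simplistic: it ignores thousands grouping, which a real formatter would also have to localize.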
Should I localize my site?
After all the preaching about why you should localize, let me get down off my soapbox and be realistic. Frankly it doesn't always make sense to localize your website. As I said before, I haven't attempted to localize this blog, nor do I intend to.
It's difficult to categorically say one type of website needs i18n and another doesn't. It should really be handled on a case-by-case basis by performing a litmus test. Ask the following questions:
- What is the likelihood that the site will receive traffic from a different locale?
- How much do you stand to gain by getting a conversion?
- How much will it cost you to enable i18n for your site?
Alternatively, you could use the following formula:
\[ m = \frac{c}{t \times L \times r \times p} \]
Where c is the cost to add i18n support to your site, t is the total amount of monthly traffic to your site, L is the percentage of that traffic coming from different locales, r is the rate of all traffic that results in a conversion as a percentage and p is the average dollar amount that you stand to profit from each conversion. This will result in m, the number of months to recoup the investment to support i18n.
Let's assume your site receives 1,000 visits a month, 5% of those visits are from varying locales, you have a 2% conversion rate and you make $25 per conversion on average. In this scenario you would be making $25 per month from users coming from different locales.
Now you need to compare that against how much it costs you to facilitate i18n. If you spend $2,500 to refactor your site (a one-time cost) it would take you 100 months (over eight years!) to break even. Hardly worth it.
On the other hand let's say that all things remain the same but your traffic from other locales increases to 33%. You would recoup your cost in just over a year.
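If you'd rather let a script do the arithmetic, the two scenarios above can be sketched as follows. The function and parameter names are my own. Percentages are passed as fractions (5% is 0.05), and traffic is assumed to be a monthly figure, as in the examples.

```python
def monthly_locale_profit(visits: float, locale_share: float,
                          conversion_rate: float, profit_per_conversion: float) -> float:
    """Monthly profit from visitors in other locales: t * L * r * p."""
    return visits * locale_share * conversion_rate * profit_per_conversion


def months_to_break_even(cost: float, visits: float, locale_share: float,
                         conversion_rate: float, profit_per_conversion: float) -> float:
    """Months needed to recoup the one-time i18n cost c: c / (t * L * r * p)."""
    monthly = monthly_locale_profit(visits, locale_share,
                                    conversion_rate, profit_per_conversion)
    if monthly <= 0:
        raise ValueError("no profit from other locales, so the cost is never recouped")
    return cost / monthly


# 5% of traffic from other locales
print(round(months_to_break_even(2500, 1000, 0.05, 0.02, 25), 1))
# 33% of traffic from other locales
print(round(months_to_break_even(2500, 1000, 0.33, 0.02, 25), 1))
```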
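This brings us back to the point made earlier regarding why i18n is important. By modifying this formula slightly, multiplying the monthly profit from other locales by twelve instead of dividing the cost by it, we can also see how much money is being left on the table each year by not supporting i18n:
\[ t \times L \times r \times p \times 12 \]
Using this formula in the case of 33% traffic from foreign locales you are missing out on $1,980 a year.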
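The same check in code, as a quick sketch. The function name is hypothetical, and as before percentages are given as fractions and traffic is monthly.

```python
def annual_missed_profit(visits: float, locale_share: float,
                         conversion_rate: float, profit_per_conversion: float) -> float:
    """Yearly profit forgone from other locales: t * L * r * p * 12."""
    return visits * locale_share * conversion_rate * profit_per_conversion * 12


# 1,000 visits a month, 33% from other locales, 2% conversion, $25 per conversion
print(round(annual_missed_profit(1000, 0.33, 0.02, 25), 2))
```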
Admittedly these formulas are flawed in that they don't account for an increasing or decreasing trend in site traffic. They should however provide a conservative baseline.
Up next
This intro has focused primarily on the what, when and why of i18n. In the next article we'll get more technical as we look at how to detect locale.