Web development gets considerably more troublesome when you have documents in languages other than American English. The onus is on us, web developers and server administrators, to make sure browsers and search engines can detect the right language. Here is how you can declare the language of your document in HTML 5.
What is language declaration?
This is a way to specify what language an HTML document or a snippet of HTML text is in. A language declaration does not provide information on character encoding or text direction (right to left or left to right). Those need to be declared separately.
Why specify a language?
Language information can be used for:
- Text-to-speech converters (e.g. speak Canadian French rather than French)
- Selecting the right fonts for display (e.g. use Traditional Chinese script instead of Simplified)
- Selecting the right dictionary for browser spell-checks in forms (e.g. use UK English rather than US English)
- Rendering the page correctly; in short, delivering the document in the form most natural for its language.
In HTML 5, there are 3 ways to declare the language of an HTML document:
As a pragma directive, e.g. below. (W3C’s HTML5 validator now reports the following error for it: “Using the meta element to specify the document-wide default language is obsolete. Consider specifying the language on the root element instead.”)
<meta http-equiv="content-language" content="en">
As part of a header in the HTTP response, e.g. below:
HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Vary: negotiate,accept-language,accept-charset
TCN: choice
P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=iso-8859-1
Content-Language: en
(Example from the W3C article on Internationalization Best Practices.)
As a lang attribute on an HTML element, e.g. <div lang="fr">, or as an xml:lang attribute in XML documents like MathML and SVG.
The first two ways of specifying language are used to identify the intended audience of the HTML document. This information is used in the following ways:
- Search engines use it when deciding which documents to include in search results (e.g. a document with content-language set to Chinese will not be shown for a search that is looking for English documents, although most search engines rely on more than these two declarations to decide which documents to show).
- Content negotiation by Apache servers, based on the language preference users have set in their browsers.
- Identifying the default language of a document. This concept is new in HTML 5. If you specify only one language using the above two methods (i.e. <meta http-equiv="content-language" content="en"> instead of <meta http-equiv="content-language" content="en, fr">), then the text of the entire document is processed as that language (except for text contained in an element that has its own lang attribute, which is processed as the language given in that attribute).
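The "only one language" condition can be sketched in Python. This is a rough illustration rather than how any browser implements it: the function name `single_language_tag` is made up for this example, and it only splits the comma-separated value without checking that the tags are valid.

```python
from typing import Optional


def single_language_tag(content_language: Optional[str]) -> Optional[str]:
    """Return the tag if the value lists exactly one language, else None.

    `content_language` is the raw value of a Content-Language header or of
    the content attribute of the meta pragma, e.g. "en" or "en, fr".
    """
    if content_language is None:
        return None
    # Multiple language tags in the value are separated by commas.
    tags = [tag.strip() for tag in content_language.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


print(single_language_tag("en"))      # en
print(single_language_tag("en, fr"))  # None
```

With "en, fr" the declaration still describes the intended audience, but it no longer yields a default language for text processing.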
The last method explicitly declares the language the user agent should use for text processing. Use the lang attribute if you want the browser to process the text in that HTML element in a specific language.
The language code used in the meta http-equiv pragma or in the lang attribute needs to be built from subtags in the IANA language subtag registry. You can read more on choosing language values here.
Default Language of a Document
HTML 5 specifies the following inheritance rules, applied in order, to determine the language of an HTML element:
- The HTML element has a lang attribute (e.g. <span lang="en">); if not,
- The nearest ancestor of that element has a lang attribute; if not,
- The document has a single language tag set through the pragma directive (e.g. <meta http-equiv="content-language" content="en">); if not,
- The HTTP header Content-Language contains a single language tag; if not,
- The document is treated as that of an unknown language.
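The fallback order above can be sketched as a short Python function. This is a simplified model under stated assumptions, not the algorithm as worded in the HTML 5 specification: `Element` is a hypothetical stand-in for a DOM node, `resolve_language` is an invented name, and the two optional arguments are assumed to be None unless the pragma or header carried exactly one language tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element; holds only what the rules need."""
    tag: str
    lang: Optional[str] = None
    parent: Optional["Element"] = None


def resolve_language(
    element: Element,
    pragma_language: Optional[str] = None,
    header_language: Optional[str] = None,
) -> str:
    """Apply the fallback rules in order; return a language tag or "unknown"."""
    # First and second rule: the element itself, then its ancestors, nearest first.
    node: Optional[Element] = element
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    # Third rule: a single tag from the meta pragma.
    if pragma_language is not None:
        return pragma_language
    # Fourth rule: a single tag from the Content-Language HTTP header.
    if header_language is not None:
        return header_language
    # Fifth rule: nothing was declared anywhere.
    return "unknown"


html = Element("html")
body = Element("body", parent=html)
quote = Element("blockquote", lang="fr", parent=body)
span = Element("span", parent=quote)

print(resolve_language(span, header_language="en"))  # fr
print(resolve_language(body, header_language="en"))  # en
print(resolve_language(body))                        # unknown
```

The span has no lang attribute of its own, so it inherits "fr" from the enclosing blockquote; the body has no declaring ancestor and falls through to the header value.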
This is not the last word on detecting the language of a document, but for the time being, if your document has content that is mostly not English, use the lang attribute on the <html> element to specify the language. If parts of the document use a language other than the one specified for the whole document, use the lang attribute on each such element.
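One way to check a page against this advice is a small script. The sketch below uses Python's standard html.parser module; the class name `LangAudit` and the sample markup are made up for illustration, and it only records declarations without judging whether the language tags are valid.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LangAudit(HTMLParser):
    """Record the lang on <html> and every lang set on other elements."""

    def __init__(self) -> None:
        super().__init__()
        self.root_lang: Optional[str] = None
        self.overrides: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        lang = dict(attrs).get("lang")
        if lang is None:
            return
        if tag == "html":
            self.root_lang = lang
        else:
            self.overrides.append((tag, lang))


SAMPLE = """<!DOCTYPE html>
<html lang="fr">
  <head><title>Exemple</title></head>
  <body>
    <p>Bonjour tout le monde. <span lang="en">Hello, world.</span></p>
  </body>
</html>"""

audit = LangAudit()
audit.feed(SAMPLE)
print(audit.root_lang)   # fr
print(audit.overrides)   # [('span', 'en')]
```

The sample page declares French for the whole document on the <html> element and marks the one English phrase with its own lang attribute, which is the pattern recommended above.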