Divya Manian


Declaring Languages in HTML 5

Web development is considerably more troublesome when your documents are in languages other than American English. The onus is on us, web developers and server administrators, to make sure browsers and search engines can detect the right language. Here is how you can declare the language of your document in HTML 5.

What is language declaration?

This is a way to specify what language an HTML document, or a snippet of HTML text, is in. A language declaration does not provide information on character encoding or text direction (right to left or left to right). Those need to be declared separately.

Why specify a language?

Language information can be used for:

  • Text-to-speech converters (e.g. speak Canadian French rather than French)
  • Selecting the right fonts for display (e.g. use Traditional Chinese script instead of Simplified)
  • Selecting the right dictionary for browser spell-checks in forms (e.g. use UK English rather than US English)
  • Rendering the page correctly: in short, presenting the document in a way that is as natural as possible for its language.

Language processing

In HTML 5, there are three ways to declare the language of an HTML document:

  • As a pragma directive, e.g. <meta http-equiv="content-language" content="en">. (W3C’s HTML5 validator now reports the following error for this: “Using the meta element to specify the document-wide default language is obsolete. Consider specifying the language on the root element instead.”)
  • As a header in the HTTP response, e.g.:

    HTTP/1.1 200 OK
    Date: Wed, 05 Nov 2003 10:46:04 GMT
    Server: Apache/1.3.28 (Unix) PHP/4.2.3
    Content-Location: CSS2-REC.en.html
    Vary: negotiate,accept-language,accept-charset
    TCN: choice
    P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
    Cache-Control: max-age=21600
    Expires: Wed, 05 Nov 2003 16:46:04 GMT
    Last-Modified: Tue, 12 May 1998 22:18:49 GMT
    ETag: "3558cac9;36f99e2b"
    Accept-Ranges: bytes
    Content-Length: 10734
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
    Content-Language: en

    (Example from the W3C article on Internationalization Best Practices.)
  • As a lang attribute on an HTML element, e.g. <div lang="fr">, or as an xml:lang attribute in XML documents such as MathML and SVG.

The first two ways of specifying the language identify the intended audience of the HTML document. This information is used in the following ways:

  • Search engines use it to decide which documents to include in search results (e.g. a document with content-language set to Chinese will not be shown for a search looking for English documents, although most search engines rely on more than these two signals to decide which documents to show).
  • Servers such as Apache use it for content negotiation, based on the language preferences users have set in their browsers.
  • Identifying the default language of a document. This concept is new in HTML 5. If you specify only one language using either of the above two methods (i.e. <meta http-equiv="content-language" content="en"> rather than <meta http-equiv="content-language" content="en, fr">), then the text of the entire document is processed as that language (except for text contained in an element that has its own lang attribute, which is processed as the language given by that attribute).
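To make the content negotiation point a little more concrete, here is a rough Python sketch of how a server might match the browser's Accept-Language preferences against the translations it has. This is not Apache's actual algorithm; the function names parse_accept_language and pick_language are made up for illustration, and the matching is deliberately simplified (exact match first, then the primary language subtag).

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, weight) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        weight = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # assumption: treat an unparsable weight as "not wanted"
        prefs.append((tag.strip().lower(), weight))
    # sorted() is stable, so tags with equal weight keep their header order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def pick_language(header: str, available: list[str]) -> str | None:
    """Return the first available language that matches the client's preferences."""
    by_lower = {tag.lower(): tag for tag in available}
    for tag, weight in parse_accept_language(header):
        if weight <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        # simplified fallback: "fr-ca" may be served by a plain "fr" document
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return None


print(pick_language("fr-CA, fr;q=0.8, en;q=0.5", ["en", "fr"]))
```

A server that finds no match would typically fall back to a default document; that choice is left out of the sketch.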

The last method is to explicitly declare a language to be used for text processing by the user agent. Use the lang attribute if you want the browser to process the text in that HTML element in a specific language.

The language code used in the Content-Language header, in the content attribute of meta http-equiv, or in the lang attribute needs to be built from subtags in the IANA Language Subtag Registry. You can read more on choosing language values here.
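As a rough illustration of what "built from subtags" means, the Python sketch below splits a tag of the common language-script-region shape (such as zh-Hant-TW) into its parts. It is only a shape check under stated assumptions: the helper name split_tag is hypothetical, variants, extensions and private-use subtags are ignored, and nothing is looked up in the registry, so a tag that passes here may still use an unregistered subtag.

```python
from __future__ import annotations

import re

# Assumption: only the common "language[-script][-region]" shape is handled.
_SIMPLE_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"              # e.g. "en", "zh"
    r"(?:-(?P<script>[a-z]{4}))?"             # e.g. "Hant"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",  # e.g. "GB" or "419"
    re.IGNORECASE,
)


def split_tag(tag: str) -> dict[str, str | None] | None:
    """Split a simple language tag into subtags, or return None if it does not fit."""
    match = _SIMPLE_TAG.match(tag.strip())
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        # conventional casing: language lower, script title case, region upper
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(split_tag("zh-hant-tw"))
print(split_tag("en-GB"))
```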

Default Language of a Document

Unless you explicitly use the lang attribute to define the language of the document, HTML 5 specifies the following rules, checked in order, to determine the language of an HTML element:

The HTML element has a lang attribute (e.g. <span lang="en">). If not:

An ancestor of that element has a lang attribute (the nearest one wins). If not:

The document has a single language tag set through a pragma directive (e.g. <meta http-equiv="content-language" content="en">). If not:

The HTTP header Content-Language contains a single language tag. If not:

The document is treated as being in an unknown language.
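The fallback chain above can be sketched in a few lines of Python. This is an illustration of the rules as described in this post, not the spec's exact algorithm; resolve_language and single_tag are made-up names, and the element's lang values are assumed to arrive as a list running from the element itself outward to the root, with None wherever the attribute is absent.

```python
from __future__ import annotations


def single_tag(value: str | None) -> str | None:
    """Return the tag if the value holds exactly one language tag, else None."""
    if value is None:
        return None
    tags = [tag.strip() for tag in value.split(",") if tag.strip()]
    return tags[0] if len(tags) == 1 else None


def resolve_language(
    lang_chain: list[str | None],
    pragma: str | None = None,
    http_header: str | None = None,
) -> str | None:
    """Apply the rules in order; None stands for "unknown language"."""
    # Rules 1 and 2: the element itself, then its nearest ancestor with lang
    for lang in lang_chain:
        if lang is not None:
            return lang
    # Rule 3: a single tag in <meta http-equiv="content-language">
    tag = single_tag(pragma)
    if tag is not None:
        return tag
    # Rule 4: a single tag in the Content-Language HTTP header
    tag = single_tag(http_header)
    if tag is not None:
        return tag
    # Rule 5: nothing usable was declared
    return None


# <span> without lang, inside <p lang="fr">, inside <html lang="en">
print(resolve_language([None, "fr", "en"]))
```

Note how a pragma value such as "en, fr" is skipped because it names more than one language, so the lookup moves on to the HTTP header.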

Bottom line

This is not the last word on detecting the language of a document, but for the time being, if your document has content that is mostly not English, use the lang attribute on the <html> element to specify the language. If parts of the document use a language other than the one specified for the whole document, use the lang attribute on each such element.
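If you want to check your own pages against this advice, something like the following Python sketch could help. It uses the standard library's html.parser to report the lang value on <html> and any elements that override it; the class name LangAudit, the audit helper and the sample markup are all invented for this example.

```python
from html.parser import HTMLParser


class LangAudit(HTMLParser):
    """Record the lang of <html> and of every other element that sets one."""

    def __init__(self):
        super().__init__()
        self.root_lang = None  # lang on <html>, if any
        self.overrides = []    # (tag, lang) pairs for other elements

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if lang is None:
            return
        if tag == "html":
            self.root_lang = lang
        else:
            self.overrides.append((tag, lang))


def audit(markup):
    """Return (root language or None, list of per-element overrides)."""
    parser = LangAudit()
    parser.feed(markup)
    parser.close()
    return parser.root_lang, parser.overrides


# Hypothetical page: mostly French, with one English paragraph
SAMPLE = (
    '<!DOCTYPE html><html lang="fr"><body>'
    "<p>Bonjour</p>"
    '<p lang="en">Hello</p>'
    "</body></html>"
)

print(audit(SAMPLE))
```

A root language of None in the result means the page is relying on the meta element or the HTTP header, or declares nothing at all.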
