In a discussion with Taiwen Jiang, the author of Pi Engine (Zend Framework 2 powered CMS) at GitHub regarding multi-lingual site structure, I mentioned some possible solutions. This inspired me to write the article about multi-lingual content organization in databases in general.
No matter how a CMS is popular it doesn’t mean that it has all features. In fact, many of the most popular CMS systems (read WordPress, Drupal) are NOT able to create multilingual content at all! Maybe there are some plugins for this purpose but I’ve tried them all (free and paid) and they’re far away from real-life usage. You know, core systems are designed in one way and plugins simply cannot change the database structure.
Caveats
There are several ways to model multilingual content in a relational database. It is important to understand these differences in order to make an informed decision about which one fits your content model best.
1) Storage in language columns
Each translated value is stored in a new database column alongside its original record, e.g. a Content column gets extended to Content_en and Content_de.
Advantages: Translation can be limited to certain columns.
Disadvantages: If applied to complex data like hierarchical pages, it only works if the content structure is very similar between languages. It would be difficult to e.g. have a new page section just in one language, and still retain all the framework’s features (e.g. permission checks).
2) Storage in language tables
Similar to “Storage in language rows”, but creates a new table for each
Disadvantages: All-or-nothing approach to column translation (including columns where translation doesn’t make much sense, like numeric values). More complex data model with relational tables.
3) Storage in language rows with relation to master content
Each translated record gets copied to a new row in the same table, retaining the original database column layout.
Advantages: Allows for flexible structures like page trees per language, and permission checks per language. Works transparently with most other modules which modify queries (e.g. the “subsites” module).
4) Storage in language rows without relation to master content
This approach doesn’t have nothing similar with above solutions because it assumes content in different languages just as another new content.
Advantages: It allows creating completely independent site structure for each language separately.
Disadvantages: Content is separated so for each content in one language so you need to manually create it for all other languages. Note that this approach differs from “Storage in language rows with relation to master content” because records are not linked, i.e. page-one-en is not linked in any way to page-one-de.
What approach is the best
From the possibilities mentioned above it’s easy to conclude that “Storage in language rows with relation to master content” is far the best approach suitable for almost all needs. This solution is used in SilverStripe CMS and OpenCart (I think even in ExpressionEngine, but not sure). I’m very satisfied how easy is to edit content in different languages with this approach.
Alternatives
If you know any other alternatives I’ll appreciate if you mention them in the comment form below.
I would suggest having a a table for the master content that conatin only neutral values like id, category_id, user_id, default_lang etc.
then in a separate table called content_translation having everything else:
content_id (FK, to content)
language
content_body
content_preview
etc.