How do well-known MySQL users such as Facebook, Twitter, and GitHub store four-byte Unicode characters?
-
MySQL's utf8 encoding uses at most three bytes per character, which rules out characters outside the Basic Multilingual Plane (BMP), such as emoji. MySQL 5.5.3 added the utf8mb4 encoding, which supports them, but that version was only released in 2010, long after Facebook, Twitter, and GitHub got started. How do big sites like these store four-byte UTF-8 characters in a way that is indexable? Do they simply use a binary column?
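To make the three-byte limit concrete, here is a minimal Python sketch (the language is an assumption; the question names none) that reports how many bytes a code point needs in standard UTF-8 and flags text containing characters above U+FFFF, which a three-byte-per-character utf8 column cannot hold. The helper names `utf8_length` and `needs_utf8mb4` are hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single code point occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any code point lies outside the BMP (above U+FFFF).

    Such code points need a 4-byte UTF-8 sequence, which does not fit
    a column encoding limited to 3 bytes per character.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# One example from each UTF-8 length class: ASCII, Latin-1, BMP symbol, emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

print(needs_utf8mb4("caf\u00e9"))          # BMP only
print(needs_utf8mb4("smile \U0001F600"))   # contains an emoji
```

The four characters print 1, 2, 3, and 4 bytes respectively; only the emoji pushes the text past what a three-byte encoding can store.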
-
Answer:
I've never heard of a database actually scanning 8-bit string input for illegal characters and rejecting the insert or update. However, if you use a UTF-16 interface to the database (many programming languages, including Java and JavaScript, and the standard Microsoft and Apple libraries for C use UTF-16), the database has to do a conversion. A non-BMP character, which UTF-16 represents as a surrogate pair of two 16-bit units, may then become a pair of 3-byte UTF-8-style sequences, one per surrogate unit: 6 bytes of incorrect UTF-8 rather than the correct 4 bytes. This format is common enough that the Unicode Consortium acknowledges and names it (see https://en.wikipedia.org/wiki/CESU-8), though only in a technical report, not as part of the Unicode Standard itself.

A common situation with character-encoding problems is that characters are stored in a nonstandard way that still supports round-trip conversion, so users can store and retrieve the character and compare it for equality, although there may be subtler bugs. However, if you fix part of the system to be correct but don't finish the job, you can break the existing data in the database, all of which then needs to be converted to the new representation.
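The surrogate-pair conversion described above can be sketched in Python (an assumed language; the answer gives no code). This is an illustration of the CESU-8 byte layout, not how any particular database driver does it; `to_cesu8` and `from_cesu8` are hypothetical helper names. It relies on Python's standard `surrogatepass` error handler, which lets lone surrogate code points be written and read as 3-byte sequences.

```python
def to_cesu8(text: str) -> bytes:
    """Encode text as CESU-8.

    Each non-BMP character is first split into its two UTF-16 surrogate
    units; each 16-bit unit is then written as its own 3-byte sequence.
    BMP characters come out exactly as in standard UTF-8.
    """
    utf16 = text.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    # 'surrogatepass' emits lone surrogates instead of raising an error.
    return "".join(chr(u) for u in units).encode("utf-8", "surrogatepass")


def from_cesu8(data: bytes) -> str:
    """Decode CESU-8 by reading the surrogates, then re-pairing them."""
    with_surrogates = data.decode("utf-8", "surrogatepass")
    return with_surrogates.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


emoji = "\U0001F600"  # U+1F600, outside the BMP
print(emoji.encode("utf-8").hex(" "))   # standard UTF-8: 4 bytes
print(to_cesu8(emoji).hex(" "))         # CESU-8: 6 bytes
print(from_cesu8(to_cesu8(emoji)) == emoji)

try:
    to_cesu8(emoji).decode("utf-8")     # a strict UTF-8 decoder
except UnicodeDecodeError:
    print("strict UTF-8 decoding rejects the encoded surrogates")
```

This prints `f0 9f 98 80` for standard UTF-8 and `ed a0 bd ed b8 80` for CESU-8. The round trip succeeds, which is why such data can appear to work. A strict UTF-8 decoder rejects it, however, which is the kind of breakage a partial fix can expose.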
Answer by Joseph Boyle on Quora.