TIL JS: Safely Reversing Unicode Strings

spec Dec 16, 2019

TL;DR:  
Instead of using:
str.length
str.split('').reverse().join('')

You should be using:
Array.from(str).length
Array.from(str).reverse().join('')

There is a lesser-known trick when it comes to handling Unicode. Many of the standard functions and methods we've come to know and love don't correctly handle characters like emojis. Many native JavaScript functions assume your text contains only the first 65,536 Unicode characters ("code points"). This initial range is called the Basic Multilingual Plane (BMP), and everything outside it belongs to the "astral planes". Spooky terminology really.
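To make the BMP boundary concrete, here is a small sketch that inspects the caterpillar emoji used later in this post. The isAstral helper is a made-up name for illustration, not a built-in; codePointAt and charCodeAt are standard String methods.

```javascript
// Hypothetical helper (not a built-in): true when the first character
// of `ch` has a code point above the BMP limit of 0xFFFF.
function isAstral(ch) {
  return ch.codePointAt(0) > 0xffff;
}

const bug = '🐛'; // U+1F41B

// The full code point, read with the ES6 codePointAt method.
console.log(bug.codePointAt(0).toString(16)); // "1f41b"

// The two UTF-16 code units (a surrogate pair) that actually store it.
console.log(bug.charCodeAt(0).toString(16)); // "d83d"
console.log(bug.charCodeAt(1).toString(16)); // "dc1b"

console.log(isAstral(bug)); // true
console.log(isAstral('a')); // false
```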

In practice, the trick is knowing that properties and methods like String.prototype.length (AKA: 'hello'.length) and String.prototype.split (AKA: 'hello'.split('')) count UTF-16 code units rather than code points. A dry distinction really, but an important one: a character outside the BMP is stored as two code units (a surrogate pair), so length counts it as 2 and split('') tears it in half. The thing to commit to memory is that ES6's Array.from method iterates a string by code point, so characters like emojis stay in one piece.
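A quick sketch of the difference, using two sample strings chosen for illustration (plain and fancy are just example names):

```javascript
const plain = 'hello';
const fancy = 'hi 🐛';

// For BMP-only text, code units and code points agree.
console.log(plain.length);             // 5
console.log(Array.from(plain).length); // 5

// The emoji occupies two code units but is a single code point.
console.log(fancy.length);             // 5
console.log(Array.from(fancy).length); // 4

// split('') cuts between the two halves of the surrogate pair.
console.log(fancy.split('').length);   // 5
```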


ASIDE: Jump down the Unicode rabbit hole and check out Awesome Unicode, a really comprehensive, and surprisingly exciting, guide to Unicode.

Here's how it all works out.



const str = 'taco 🐛 cat';

str.split('').reverse().join('');
:> "tac �� ocat"


Array.from(str).reverse().join('');
:> "tac 🐛 ocat"
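If you find yourself doing this often, the two tricks can be wrapped in small helpers. This is only a sketch: reverseString and countCodePoints are hypothetical names, not part of any library. One caveat worth stating: Array.from works per code point, so characters built from several code points (a letter plus a combining accent, for example) will still be pulled apart.

```javascript
// Hypothetical helper: reverse a string by code point instead of by
// UTF-16 code unit, so surrogate pairs stay together.
function reverseString(str) {
  return Array.from(str).reverse().join('');
}

// Hypothetical helper: count code points rather than code units.
function countCodePoints(str) {
  return Array.from(str).length;
}

console.log(reverseString('taco 🐛 cat'));   // "tac 🐛 ocat"
console.log(countCodePoints('taco 🐛 cat')); // 10
console.log('taco 🐛 cat'.length);           // 11
```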
