First Look at Encodings

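Extended character sets are encoded into smaller ones by a variety of methods. Recognize each method by what it does to common characters. In the quoted-printable passage below, for example, the Windows-1252 curly quotes and apostrophe appear as =93, =94 and =92.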

Any nominee who had told either of us that he had a =93fiduciary responsibility=94 as a businessman or to his family to pay as little tax as possible, as Mr. Trump put it, would have been told to stop wasting the president=92s time.
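Here is a minimal Ruby sketch of both directions. It assumes the passage above is quoted-printable text whose escaped bytes are Windows-1252. `unpack1('M')` decodes quoted-printable into raw bytes, which are then labelled and converted to UTF-8. The second half shows how the same curly apostrophe (U+2019) looks in a few other common encodings, which is often enough to tell them apart.

```ruby
require 'uri'

# Excerpt in the style of the passage above; the =XX escapes are Windows-1252 bytes.
qp = 'a =93fiduciary responsibility=94 and the president=92s time'

# Decode quoted-printable to raw bytes, label them Windows-1252, convert to UTF-8.
decoded = qp.unpack1('M').force_encoding('Windows-1252').encode('UTF-8')
puts decoded

# Fingerprints of the curly apostrophe U+2019 in other encodings.
apos = "\u2019"
utf8_hex = apos.bytes.map { |b| format('%02X', b) }.join(' ')
cp1252_qp = apos.encode('Windows-1252').bytes.map { |b| format('=%02X', b) }.join
url = URI.encode_www_form_component(apos)
# UTF-8 bytes misread as Windows-1252 ("mojibake").
mojibake = apos.b.force_encoding('Windows-1252').encode('UTF-8')

puts "UTF-8 bytes:        #{utf8_hex}"
puts "Windows-1252 as QP: #{cp1252_qp}"
puts "URL-encoded:        #{url}"
puts "UTF-8 read as 1252: #{mojibake}"
```

Seeing `â€™` where an apostrophe belongs is the usual sign that UTF-8 text was decoded as Windows-1252.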

While UTF-8 has become the norm online, string operations may raise errors on byte sequences that are not valid UTF-8. Look for encoding conversion functions that offer an option to replace or drop such sequences, and apply them as needed.
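To see the failure mode in Ruby, this sketch labels a Latin-1 byte as UTF-8 (the sample string is invented for illustration). Regular-expression matching raises `ArgumentError` on the invalid sequence, while `valid_encoding?` lets you check for the problem before deciding whether to clean the text.

```ruby
# "café au lait" with é stored as the single Latin-1 byte 0xE9, labelled as UTF-8.
text = "caf\xE9 au lait".b.force_encoding('UTF-8')

# false: in UTF-8, 0xE9 must be followed by continuation bytes, not a space.
puts text.valid_encoding?

error = begin
  text =~ /lait/
  nil
rescue ArgumentError => e
  e.message
end
puts error # invalid byte sequence in UTF-8
```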

text = File.read 'data.json', encoding: 'UTF-8'
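The `encoding:` option only labels the bytes that are read. With no default internal encoding set, `File.read` neither validates nor converts them, so a stray byte survives the read. This sketch writes a throwaway file with `Tempfile`; the file name and contents are invented for illustration.

```ruby
require 'tempfile'

text = Tempfile.create(['data', '.json']) do |f|
  f.binmode
  f.write("{\"name\": \"caf\xE9\"}".b) # 0xE9 is Latin-1 é, not valid UTF-8
  f.flush
  File.read(f.path, encoding: 'UTF-8') # block value is returned by Tempfile.create
end

puts text.encoding        # UTF-8
puts text.valid_encoding? # false: the bad byte was read as-is
```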

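Ruby's String#encode has options for replacing invalid bytes, but older Ruby versions apply them only when an actual conversion takes place: converting UTF-8 to UTF-8 is a no-op that leaves invalid bytes untouched. One trick is to convert to UTF-16 and back again.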

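# Replace invalid UTF-8 byte sequences with '❖' by converting to UTF-16 and back.
# Mutates text in place; returns nil when text is nil.
def utf(text)
  return if text.nil?
  text.encode!('UTF-16', 'UTF-8', invalid: :replace, replace: '❖')
  text.encode!('UTF-8', 'UTF-16')
end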
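Here is a usage sketch, with the method repeated so the block runs on its own. It calls `utf` only when `valid_encoding?` reports a problem, so already clean text is never rewritten. The sample string is invented for illustration.

```ruby
# Same round-trip method as above: invalid UTF-8 sequences become '❖'.
def utf(text)
  return if text.nil?
  text.encode!('UTF-16', 'UTF-8', invalid: :replace, replace: '❖')
  text.encode!('UTF-8', 'UTF-16')
end

raw = "caf\xE9 au lait".b.force_encoding('UTF-8')
utf(raw) unless raw.valid_encoding?

puts raw                 # caf❖ au lait
puts raw.valid_encoding? # true
puts raw.encoding        # UTF-8
```

Converting to the encoding named `UTF-16` writes a byte-order mark, and converting back from `UTF-16` consumes it, so the round trip leaves no stray BOM in the result.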