Right now Geograph uses a basic character set (Latin1) for storing and processing text. This mostly works, but it has limited support for accented characters, ligatures and other special characters, so their display is intermittent (sometimes they work, sometimes they get corrupted).
... in theory it should be possible to convert EVERYTHING (website frontend, database, custom code, search engine system, etc.) over to UTF-8/Unicode for maximum compatibility.
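To show why Latin1 falls short, here is a minimal Python sketch that checks whether a string can be stored in Latin1 at all. It uses Python's `latin-1` codec, which is strict ISO-8859-1. If the database is MySQL, its `latin1` is actually closer to Windows cp1252, which adds a few characters such as `œ`. The sample strings are arbitrary illustrations, and `fits_latin1` is a hypothetical helper.

```python
# Hypothetical helper: can this text be stored in strict ISO-8859-1 (Python's "latin-1")?
def fits_latin1(text: str) -> bool:
    try:
        text.encode("latin-1")  # raises if any character is outside ISO-8859-1
        return True
    except UnicodeEncodeError:
        return False


# Arbitrary sample place names; UTF-8 can encode every one of them.
samples = ["Café", "Tŷ Newydd", "Łódź"]
for s in samples:
    status = "latin-1 ok" if fits_latin1(s) else "needs UTF-8"
    print(f"{s!r}: {status}")
```

Western European accents such as `é` fit, but characters like the Welsh `ŷ` or the Polish `Ł` do not. These are exactly the ones that end up displaying intermittently or corrupted.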
(There are perhaps three major components to this: 1) changing all underlying systems to store UTF-8 correctly, 2) testing/fixing all systems to process it correctly (not corrupt it!), and, importantly, 3) cycling through all the already-corrupted data and fixing it!)
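For step 3, one common kind of damage is UTF-8 bytes that were read back as Latin1 or cp1252, turning `é` into `Ã©` or `’` into `â€™`. This sketch assumes that specific kind of corruption, and `repair_mojibake` is a hypothetical helper. It reverses one round of that mistake and leaves text unchanged when the reversal does not produce valid UTF-8. Text where characters were already replaced by `?` cannot be recovered this way.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes having been decoded as Latin-1 or cp1252.

    Assumes that is the kind of damage present; returns the text unchanged
    if re-encoding and decoding as UTF-8 fails or changes nothing.
    """
    # Try strict Latin-1 first, then cp1252 (which maps bytes 0x80-0x9F to
    # characters like the euro sign and curly quotes).
    for codec in ("latin-1", "cp1252"):
        try:
            # Recover the original bytes, then decode them as UTF-8 was intended.
            fixed = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # not representable, or not valid UTF-8: try the next codec
        if fixed != text:
            return fixed
    return text


for sample in ["CafÃ©", "itâ€™s", "Café", "plain ascii"]:
    print(f"{sample!r} -> {repair_mojibake(sample)!r}")
```

Genuine Latin1 text such as `Café` is left alone, because its bytes are not valid UTF-8. When run over stored data, it would be sensible to write back only the values that actually change and keep a log for review. Legitimate text that merely looks like mojibake would otherwise be "fixed" wrongly, and data that was double-encoded more than once would need repeated passes.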
This is a big job with lots of interrelated systems to change and test for compatibility, so ideally it could be done as a specialized project.
Created: Mon, 21 Mar 2016, Updated: Fri, 10 Feb 2017