Grapheme-to-Phoneme (G2P) conversion is the process of translating written characters (e.g., "hello") into phonemes that indicate how to pronounce the word (e.g., HH AH L OW or HH EH L OW).
English has so many quirks that even state-of-the-art systems have error rates as high as 20%–30% (see, for example, CMU Sphinx's g2p-seq2seq). I'm currently working on a small, embeddable G2P system.
An interesting application of G2P is finding words that have surprising pronunciations.
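The post doesn't say how "surprise" is measured. One plausible approach, assumed here purely for illustration, is to run the G2P model on each dictionary word, compare its prediction with the dictionary pronunciation using a phoneme-level edit distance, and rank words by that distance normalized by pronunciation length. The functions `phoneme_edit_distance`, `surprise_score` and `most_surprising` are hypothetical names. The "predictor" is a stand-in lookup table of naive guesses, not a real G2P model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance counted over phoneme tokens, not characters."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr[j] = min(prev[j] + 1,         # delete pa
                          curr[j - 1] + 1,     # insert pb
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[len(b)]


def surprise_score(predicted: Sequence[str], actual: Sequence[str]) -> float:
    # Normalize by the dictionary pronunciation's length so long words
    # are not ranked as surprising just because they have more phonemes.
    return phoneme_edit_distance(predicted, actual) / max(len(actual), 1)


def most_surprising(dictionary: Dict[str, List[str]],
                    predict: Callable[[str], List[str]],
                    n: int) -> List[Tuple[str, float]]:
    """Return the n words whose predicted pronunciation is furthest from the dictionary's."""
    scored = [(word, surprise_score(predict(word), pron))
              for word, pron in dictionary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy data in ARPAbet without stress markers (illustrative only).
toy_dict = {
    "hello": "HH AH L OW".split(),
    "colonel": "K ER N AH L".split(),
}
# Hypothetical naive guesses standing in for a real G2P model's output.
naive_guess = {
    "hello": "HH EH L OW".split(),
    "colonel": "K AA L AH N EH L".split(),
}

ranking = most_surprising(toy_dict, naive_guess.__getitem__, 2)
print(ranking)
```

With a real model, `predict` would wrap the G2P system and `dictionary` would be a full pronouncing dictionary. Normalizing by length is one design choice among several; weighting substitutions by phonetic similarity would be another.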
Just for fun, below is a list of the 250 words that my G2P system found most surprising.
Published 15 May 2019 by Benjamin Johnston.