Parsing a Wikipedia XML file of all articles to lots of raw txt files, and remove most of wiki markup (not perfect: see issues first). For more info on wiki markup, see: https://en.wikipedia.org/wiki/Wikipedia:Tutorial/Formatting#Wiki_markup -
View it on GitHub