XHTML (its media type/MIME is application/xhtml+xml) was intended to clean up the mess that was, and still is, regular HTML, as well as pages written with XHTML doctypes but still incorrectly served with the media type/MIME text/html. The major obstacle to XHTML was Microsoft's decade-long delay in supporting it, along with the fact that most people who write code simply don't have either the time or the enthusiasm to live up to higher standards. XHTML2 and (X)HTML5 diverged when several groups (mainly browser vendors) disagreed with the direction the W3C was taking XHTML2.
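To make the media type point concrete, here is a minimal sketch in Python of how a server might decide which Content-Type to send for an XHTML page, based on the browser's Accept header. The function name choose_media_type is hypothetical and the parsing is deliberately simplified (it ignores wildcards and relative q ranking), so treat it as an illustration rather than a complete content negotiation implementation.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def choose_media_type(accept_header: str) -> str:
    """Pick the media type to serve an XHTML page with.

    Returns application/xhtml+xml only when the client lists it
    explicitly with a q value above 0; otherwise falls back to text/html.
    Hypothetical helper, simplified on purpose.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != XHTML:
            continue
        q = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q value as "not acceptable"
        return XHTML if q > 0 else HTML
    # The client never mentioned XHTML, so serve the page as plain HTML.
    return HTML


if __name__ == "__main__":
    header = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    print("Content-Type:", choose_media_type(header))
```

A browser that never advertises application/xhtml+xml gets text/html from this sketch, which is the fallback many sites used while older versions of Internet Explorer could not handle the XHTML media type.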
You can read more at Wikipedia. You won't want to cite it in your paper, but you may want to read the pages
it references for this or any related article:
http://en.wikipedia.org/wiki/WHATWG