Yes, the idea is not to implement a spec-compliant web engine, “only” the wikitext parser. The thing is, as you say, that the core, and probably the plugins as well, assume there is an HTML/DOM object underneath. To what extent do they assume it exists, and where in the code?
However, if this is true:
excluding plugins, the UI and its internals should be increasingly easier to implement in another environment without HTML, JS, CSS
There is also this answer to help identify the different parts of the so-called “wikitext”: