Convert code attributes earlier
What does this MR do and why?
Related to #385036 (closed)
Currently our markdown parser passes the language for code blocks as, for example, <pre lang="ruby">. This is due to a special flag called GITHUB_PRE_LANG, which is not usually supported in other potential markdown parsers. In addition, the use of the lang attribute is not semantically correct; see the discussion in #385036 (closed).
However, we rely on that attribute to find certain entities in the DOM, such as math blocks.
This MR refactors the language parsing out of the syntax filter and into its own filter at the beginning of the pipeline. This way we can set our own attribute, data-canonical-lang, and use that for searching the DOM.
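The idea of the early conversion step can be sketched roughly as follows. This is a simplified illustration, not the code in this MR: the method names are hypothetical, it assumes the early step replaces lang with data-canonical-lang, and it rewrites strings where a real filter would operate on a parsed DOM.

```ruby
# Hypothetical sketch: names and the regex-based approach are illustrative only.
# A real filter would walk a parsed DOM rather than rewrite strings.
PRE_LANG_PATTERN = /<pre lang="([^"]*)"/

# Moves the language from `lang` to `data-canonical-lang` on <pre> tags.
def convert_code_attributes(html)
  html.gsub(PRE_LANG_PATTERN) do
    %(<pre data-canonical-lang="#{Regexp.last_match(1)}")
  end
end

# Later steps can then look for blocks by the new attribute,
# for example to detect math blocks.
def math_block?(pre_tag)
  pre_tag.include?('data-canonical-lang="math"')
end
```

Because the conversion runs first, every later step in the pipeline can rely on a single attribute regardless of what the markdown parser emits.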
To break up and isolate changes, this is the first part, which changes how the handling is done on the backend. In the syntax highlighting filter, we add back the lang attribute (which was being done anyway) so that any frontend code continues to work.
A future MR will tackle the frontend piece.
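As a rough illustration of the compatibility step in the syntax highlighting filter, the sketch below re-adds lang from data-canonical-lang. The method name is hypothetical, the string-based rewrite stands in for DOM manipulation, and keeping both attributes in the final output is an assumption.

```ruby
# Hypothetical sketch: string-based for brevity, not the code in this MR.
CANONICAL_LANG_PATTERN = /<pre data-canonical-lang="([^"]*)"/

# Re-adds `lang` next to `data-canonical-lang` so that frontend code
# still keyed on `lang` keeps working until it is migrated.
def restore_lang_attribute(html)
  html.gsub(CANONICAL_LANG_PATTERN) do
    lang = Regexp.last_match(1)
    %(<pre data-canonical-lang="#{lang}" lang="#{lang}")
  end
end
```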
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.