%% This is part of the OpTeX project, see http://petr.olsak.net/optex \_codedecl \hisyntax {Syntax highlighting of verbatim listings <2022-04-04>} % preloaded in format \_doc ----------------------------- The macros `\replfromto` and `\replthis` manipulate the verbatim text that is already stored in the `\_tmpb` macro. \`\replfromto` `{}{}{}` finds the first occurrence of `` and the first occurrence of `` following it. The `` between them is packed into `#1` and available to `` which ultimately replaces ``. `\replfromto` continues by finding next ``, then, next `` repeatedly over the whole verbatim text. If the verbatim text ends with opening `` but has no closing ``, then `` is appended to the verbatim text automatically and the last part of the verbatim text is replaced too. The first two parameters are expanded before use of `\replfromto`. You can use `\csstring\%` or something else here. \_cod ----------------------------- \_def\_replfromto #1#2{\_edef\_tmpa{{#1}{#2}}\_ea\_replfromtoE\_tmpa} \_def\_replfromtoE#1#2#3{% #1=from #2=to #3=replacement \_def\_replfrom##1#1##2{\_addto\_tmpb{##1}% \_ifx\_fin##2\_ea\_replstop \_else \_afterfi{\_replto##2}\_fi}% \_def\_replto##1#2##2{% \_ifx\_fin##2\_afterfi{\_replfin##1}\_else \_addto\_tmpb{#3}% \_afterfi{\_replfrom##2}\_fi}% \_def\_replfin##1#1\_fin{\_addto\_tmpb{#3}\_replstop}% \_edef\_tmpb{\_ea}\_ea\_replfrom\_tmpb#1\_fin#2\_fin\_fin\_relax } \_def\_replstop#1\_fin\_relax{} \_def\_finrepl{} \_doc ----------------------------- The \`\replthis` `{}{}` replaces each `` by ``. Both parameters of `\replthis` are expanded first. \_cod ----------------------------- \_def\_replthis#1#2{\_edef\_tmpa{{#1}{#2}}\_ea\_replstring\_ea\_tmpb \_tmpa} \_public \replfromto \replthis ; \_doc ----------------------------- The patterns ``, `` and `` are not found when they are hidden in braces `{...}`. E.g. \begtt \replfromto{/*}{*/}{\x C{/*#1/*}} \endtt replaces all C comments by `\x C{...}`. The patterns inside `{...}` are not used by next usage of `\replfromto` or `\replthis` macros. The \`\_xscan` macro replaces occurrences of `\x` by `\z` in the post-processing phase. The construct `\x {}` expands to `\_xscan {}^^J^`. If `#3` is `\_fin` then it signals that something wrong happens, the `` was not terminated by legal `` when `\replfromto` did work. We must to fix this by using the \`\_xscanR` macro. \_cod ----------------------------- \_def\_xscan#1#2^^J#3{\_ifx\_fin#3 \_ea\_xscanR\_fi \z{#1}{#2}% \_ifx^#3\_else ^^J\_afterfi{\_xscan{#1}#3}\_fi} \_def\_xscanR#1\_fi#2^{^^J} \_doc ----------------------------- The \`\hicolor` ` ` defines `\_z:{}` as `{}`. It should be used in the context of `\x {}` macros. \_cod ----------------------------- \_def\_hicolor #1#2{\_sdef{_z:#1}##1{{#2##1}}} \_doc ----------------------------- \`\hisyntax``{}` re-defines default \^`\_prepareverbdata```, but in order to do it does more things: It saves `` to `\_tmpb`, appends `\n` around spaces and `^^J` characters in pre-processing phase, opens `hisyntax-.opm` file if `\_hisyntax` is not defined. Then `\_the\_hisyntax` is processed. Finally, the post-processing phase is realized by setting appropriate values to the `\x` and `\y` macros and doing `\_edef\_tmpb{\_tmpb}`. \_cod ----------------------------- \_def\_hisyntax#1{\_def\_prepareverbdata##1##2{% \_let\n=\_relax \_let\b=\_relax \_def\t{\n\_noexpand\t\n}\_let\_start=\_relax \_adef{ }{\n\_noexpand\ \n}\_edef\_tmpb{\_start^^J##2\_fin}% \_replthis{^^J}{\n^^J\b\n}\_replthis{\b\n\_fin}{\_fin}% \_let\x=\_relax \_let\y=\_relax \_let\z=\_relax \_let\t=\_relax \_hicomments % keeps comments declared by \commentchars \_endlinechar=`\^^M \_lowercase{\_def\_tmpa{#1}}% \_ifcsname _hialias:\_tmpa\_endcsname \_edef\_tmpa{\_cs{_hialias:\_tmpa}}\_fi \_ifx\_tmpa\_empty \_else \_unless \_ifcsname _hisyntax\_tmpa\_endcsname \_isfile{hisyntax-\_tmpa.opm}\_iftrue \_opinput {hisyntax-\_tmpa.opm} \_fi\_fi \_ifcsname _hisyntax\_tmpa\_endcsname \_ifcsname hicolors\_tmpa\_endcsname \_cs{_hicolors\_tmpa}=\_cs{hicolors\_tmpa}% \_fi \_ea\_the \_csname _hisyntax\_tmpa\_endcsname % \_the\_hisyntax \_the\_hicolors % colors which have precedece \_else\_opwarning{Syntax "\_tmpa" undeclared (no file hisyntax-\_tmpa.opm)} \_fi\_fi \_replthis{\_start\n^^J}{}\_replthis{^^J\_fin}{^^J}% \_def\n{}\_def\b{}\_adef{ }{\_dsp}% \_bgroup \_lccode`\~=`\ \_lowercase{\_egroup\_def\ {\_noexpand~}}% \_def\w####1{####1}\_def\x####1####2{\_xscan{####1}####2^^J^}% \_def\y####1{\_ea \_noexpand \_csname ####1\_endcsname}% \_edef\_tmpb{\_tmpb}% \_def\z####1{\_cs{_z:####1}}% \_def\t{\_hskip \_dimexpr\_tabspaces em/2\_relax}% \_localcolor }} \_public \hisyntax \hicolor ; \_doc ----------------------------- Aliases for languages can be declared like this. When `\hisyntax{xml}` is used then this is the same as `\hisyntax{html}`. \_cod ----------------------------- \_sdef{_hialias:xml}{html} \_sdef{_hialias:json}{c} \_endcode % ------------------------------------------- The user can write \begtt \adef/{\_csstring\\} \begtt \hisyntax{C} ... /endtt \endtt to colorize the code using C syntax. The user can also write `\everytt={\hisyntax{C}}` to have all verbatim listings colorized. \^`\hisyntax``{}` reads the file `hisyntax-.opm` where the colorization is declared. The parameter `` is case insensitive and the file name must include it in lowercase letters. For example, the file `hisyntax-c.opm` looks like this: \printdoc hisyntax-c.opm \OpTeX/ provides `hisyntax-{c,lua,python,tex,html}.opm` files. You can take inspiration from these files and declare more languages. Users can re-declare default colors by \^`\hicolors``={}`. This value has precedence over `\_hicolors` values declared in the `hicolors-.opm` file. For example \^`\hicolors``={`\^`\hicolor`` S \Brown}` causes all strings in brown color. Another way to set non-default colors is to declare `\newtoks\hicolors` (without the `_` prefix) and set the color palette there. It has precedence before `\_hicolors` (with the `_` prefix) declared in the `hicolors-.opm` file. You must re-declare all colors used in the corresponding `hisyntax-.opm` file. \secl4 Notes for hi-syntax macro writers The file `hisyntax-.opm` is read only once and in a \TeX/ group. If there are definitions then they must be declared as global. The file `hisyntax-.opm` must (globally) declare `\_hisyntax` token list where the action over verbatim text is declared typically by using the `\replfromto` or `\replthis` macros. The verbatim text is prepared by the {\em pre-processing phase}, then `\_hisyntax` is applied and then the {\em post-processing phase} does final corrections. Finally, the verbatim text is printed line by line. The pre-processing phase does: \begitems * Each space is replaced by {\visiblesp`\n\ \n`}, so `\n\n` is the pattern for matching whole words (no subwords). The `\n` control sequence is removed in the post-processing phase. * Each end of line is represented by `\n^^J\n`. * The `\_start` control sequence is added before the verbatim text and the `\_end` control sequence is appended to the end of the verbatim text. Both are removed in the post-processing phase. \enditems Special macros are working only in a group when processing the verbatim text. \begitems * `\n` represents nothing but it should be used as a boundary of words as mentioned above. * `\t` represents a tabulator. It is prepared as `\n\t\n` because it can be at the boundary word boundary. * `\x {}` can be used as replacing text. Consider the example \begtt \replfromto{/*}{*/}{\x C{/*#1*/}} \endtt This replaces all C comments `/*...*/` by `\x C{/*...*/}`. But C comments may span multiple lines, i.e. the `^^J` should be inside it. The macro `\x {}` is replaced by one or more occurrences of `\z {}` in the post-processing phase, each parameter `` of `\z` is from from a single line. Parameters not crossing line boundary are represented by `\x C{}` and replaced by `\z C{}` without any change. But: \begtt \catcode`\<=13 \x C{^^J^^J} \endtt is replaced by \begtt \catcode`\<=13 \z C{}^^J\z C{}^^J\z C{} \endtt `\z {}` is expanded to `\_z:{}` and if `\hicolor ` is declared then `\_z:{}` expands to `{}`. So, required color is activated for each line separately (e.g. for C comments spanning multiple lines). * `\y {}` is replaced by `\` in the post-processing phase. It should be used for macros without a parameters. You cannot use unprotected macros as replacement text before the post-processing phase, because the post-processing phase is based on the expansion of the whole verbatim text. \enditems \endinput 2022-05-12 \hicolors can include only selected re-declarations. 2020-04-04 Introduced.