Parsing html and other tags

Folks,

I know this has been covered before, but I just cannot locate it. 40+ minutes of searching so far.

Someone published a way to parse a tiddler containing one or more tags such as <atag>something</atag>. It also took account of attributes, such as <atag some=attribute>something</atag>.

Of course TiddlyWiki’s prolific use of “tags” is possibly making it difficult for me.

Any pointers appreciated.

For anyone interested in why:

  • I want to develop a method where you can use an arbitrary html tag such as
<quicklist>
one
two
three
</quicklist>
  • The above can already have css applied to it, directly or with style/class etc…
    • Examples may include “display: none;” to hide content, or “hard line breaks”, etc…
    • title=‘tooltip text’ also works on such areas.
  • I would like, for example, a view template to parse the tiddler and find the content between each pair of quicklist tags within it.
    • I could then parse each line and act on the content of each line as desired.
  • Perhaps one day the content of the tag / section could be edited and saved back like Mohammad’s section editor does with headings.
  • I believe even eventcatcher can respond to a click on such an area defined by an arbitrary tag.

[Edited] Easter egg for those interested in “arbitrary tags” or wrapping content:
EditorToolbar-wrapper.json, which is the subject of this thread.

That was me, possibly. I do a lot of that kinda stuff.

And yes, you can style custom tags. As for making them “do things”, that’s up to you. For me, I wrote a bunch of macros to filter them out and present their content in other ways in other places.

Example use:

Create what’s commonly referred to as a “tag cloud”. Hide the element itself, but add terms inside it that you might want the search tool to find (that you don’t necessarily mention in the tiddler).

<tag-cloud>
Interesting terms not necessarily mentioned in the text above
</tag-cloud>

Elsewhere…

tag-cloud { display:none; }

Another example:

<dull>Your text.</dull>

CSS:

dull { color:#aaa; }

@CodaCoder good ideas; perhaps we can list useful ones in this thread. However, can you share the methods you use to filter them out and present their content elsewhere?

  • I don’t mind if they are a little raw, I just want to see how to extract one or more of these from within a tiddler.

Of note;

  • There already exists a range of html tags that you can consider first.

Ideas for arbitrary tags;

  • Apply css to anything inside the tag
    • Hide content
    • Emphasise/Deemphasise content
    • Alter font in block or inline
    • Use to introduce borders, shadows and other styles to parts of your text
  • Treat content differently, hide it and use the content in the viewTemplate
    • e.g. a quick list with only line breaks separating to-do items

I will add other ideas in this reply, and a stylesheet.

  • Use as a logical section you can search all instances of in the wiki
    • Use as a way to delimit content to act as an excerpt

The HTML specification already defines Custom Elements. The spec contains a lot of information about JavaScript code, but that’s not important for what I want to explain.

We should respect the rules from the spec, since otherwise we can cause naming problems.

The tag names you suggest are not valid. IMO you should change that in the OP.


There are some rules in the spec which are important:

  • Custom elements have to start with a lowercase ASCII letter [a-z]
  • They have to be all-lowercase
  • They have to contain a hyphen (-) somewhere in the name
  • They have to have a closing tag. eg: <my-element> something </my-element>
  • Self-closing elements are not allowed

The following names are reserved, since they are used already in SVGs and MathML

  • annotation-xml
  • color-profile
  • font-face
  • font-face-src
  • font-face-uri
  • font-face-format
  • font-face-name
  • missing-glyph
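As a rough illustration of these naming rules, here is a small JavaScript check. The function name isValidCustomElementName is just an illustration, not part of TiddlyWiki or any browser API. It is deliberately simplified: it only accepts lowercase ASCII letters, digits, ".", "_" and "-", whereas the spec also allows many non-ASCII characters after the first letter.

```javascript
// Names reserved by SVG and MathML, as listed above.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Simplified rule: starts with [a-z], uses only lowercase ASCII letters,
// digits, ".", "_" and "-", and contains at least one hyphen.
// (The real spec also permits many non-ASCII characters.)
const ASCII_CUSTOM_NAME = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;

function isValidCustomElementName(name) {
  return ASCII_CUSTOM_NAME.test(name) && !RESERVED_NAMES.has(name);
}

// Check some of the tag names suggested in this thread.
for (const name of ["quicklist", "quick-list", "tag-cloud", "dull", "font-face", "Quick-List"]) {
  console.log(name, isValidCustomElementName(name));
}
```

Run in Node or a browser console, it reports quicklist, dull, font-face and Quick-List as invalid, and quick-list and tag-cloud as valid.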

These rules ensure that custom elements never clash with any future HTML spec, since no new HTML, SVG or MathML element will ever contain a hyphen in its name.

Modern browsers treat custom elements similarly to SPAN elements (inline by default). So if they should be treated as “block elements”, display: block; has to be assigned with a stylesheet.

As I wrote, it doesn’t matter if there is any JS code. The suggested elements in the OP are possible, since modern browsers treat them as custom elements.

You are brilliant at this kind of thing!
That was an epic overview that saves days of work!

I hope you are making money from that skill? :slight_smile:

TT

TBH @TW_Tones I tend to think the flexibility you are looking for is already there in @pmario’s Custom Markup???

You could use it both for all extant elements and for …

<invented-elements>:slight_smile:</invented-elements>

Or maybe I am missing the point? If so, please correct me!

Just a comment
TT

Yes, hence this whole thread. I have long had my eye on what I have called “arbitrary html tags”, but Custom Markup goes a lot further (and in fact could complement this). Here is part of an example stylesheet I was just playing with:

section-title {
  background-color: #e6e6e6;
  width: 100%;
  display: block;
  text-align: left;
  font-size: 1.2em;
  border: 1px solid;
  padding: 0 .5em 0 .5em;
}

section-title-centered {
  background-color: #e6e6e6;
  width: 100%;
  display: block;
  text-align: center;
  font-size: 1.2em;
  border: 1px solid;
  padding: 0 .5em 0 .5em;
}

.thin {
  line-height: 0.2em;
}

section-dim {
 color: #d9d9d9;
}

hidden { display: none; }

shadow-box {
  width: 96%;
  display: block;
  padding: 0 .5em 0 .5em;
  border: 1px solid;
  padding: 10px;
  box-shadow: 5px 10px #888888;
  margin: 0 5px 0 5px;
}

inset-box {
  width: 95%;
  display: block;
  padding: 0 .5em 0 .5em;
  border: 3px solid;
  border-color: black;
  border-style: inset;
  margin: 0 5px 0 5px;
}

outset-box {
  width: 95%;
  display: block;
  padding: 0 .5em 0 .5em;
  border: 3px solid;
  border-color: black;
  border-style: outset;
  margin: 0 5px 0 5px;
}

lhs {
  float: left;
  width: 49%;
}

rhs {
  float: right;
  width: 49%;
}

rhs-aside {
  float: right;
  width: 33%;
}

/* need to clear floats: https://www.w3schools.com/css/css_float_clear.asp */

no-wrap {
  word-break: normal; 
  word-wrap: break-word;
  white-space: pre-wrap;
}

However, this CSS for HTML elements is not even half of the issue. Without wishing to harp on it, the key to this whole idea is to parse and extract such sections from the text.

For example, let’s say I have this:

quick-list {
  word-break: normal; 
  word-wrap: break-word;
  white-space: pre-wrap;
}
  • Now I want a viewTemplate add-on to find each:
<quick-list>
Item 1
item 2
item 3
</quick-list>

Incidentally, except for hidden and rhs/lhs, I am complying with the advice that @pmario gave. Thanks Mario, yes, I suppose I can just make quicklist into quick-list, and dull into dull-section etc… :nerd_face:

As I shared earlier I expect to make a separate version of EditorToolbar-wrapper.json to separately curate these.

  • Perhaps even to automatically find them in a named stylesheet tiddler, so you add a new html tag to the stylesheet and it appears in a custom wrapper dropdown.
  • By the way, an honorable mention needs to go to @Mohammad’s Shiraz and other element options. In fact, I was looking through his work because I thought he had documented the parsing process, which I had contributed to in the past; I just can’t find it. arghhh…

If you remember, I used the find macro to do this!
Utility plugin and Refnotes both use the find macro (kookma/find-macro on GitHub: “The find macro finds part of text separated by delimiters. This is a powerful macro to extract text snippets.”)

I have also tried to use the method by @stobot and @EricShulman (discussed in “Crazy todo concept / replacing current tiddler’s text” on Talk TW).

But I think the most powerful is the one proposed by @Gk0Wk in “Feature suggestion: use custom filter to split sections” (Discussion #7, kookma/TW-Section on GitHub).
This method was later partly implemented in the TW-Section dev branch and is quite capable of parsing any html tag. It is JavaScript code and, due to lack of time, it has been frozen there.

Gk0Wk (Ke Wang) on GitHub has developed amazing tools for TiddlyWiki; he may have good advice/solutions here. We know him here as @Sttot :wink:

Thanks @Mohammad, I am working through these resources now.

I am interested in the custom filter to split sections, but I am finding it hard to see whether I can get a working version and use it with arbitrary section tags.

  • Also that may include attributes in the opening tag.

[Edited] I have the find macro working for me on simple sections without any attributes in the opening tag. I am not sure the find macro would be easily extended for this. It seems the regex approach is needed, but sadly I am only learning regex.
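For flat (non-nested) sections, a regular expression can pick up both the attribute string and the content. The sketch below is plain JavaScript, not TiddlyWiki code, and extractSections and escapeRegExp are hypothetical helper names. It assumes sections of the same tag are never nested and that no attribute value contains a ">" character.

```javascript
// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every <tagName ...>...</tagName> section in a piece of text.
// Assumptions: sections are NOT nested, and attribute values contain no ">".
function extractSections(text, tagName) {
  const name = escapeRegExp(tagName);
  // Group 1: optional attribute string (must start with whitespace, so
  // <quick-listing> is not mistaken for <quick-list>).
  // Group 2: the content, matched lazily up to the first closing tag.
  const re = new RegExp("<" + name + "(\\s[^>]*)?>([\\s\\S]*?)</" + name + "\\s*>", "g");
  const sections = [];
  for (const match of text.matchAll(re)) {
    sections.push({ attributes: (match[1] || "").trim(), content: match[2] });
  }
  return sections;
}

const text = [
  "Some text",
  "<quick-list class=todo>",
  "Item 1",
  "item 2",
  "</quick-list>",
  "More text <quick-list>item 3</quick-list>",
].join("\n");

console.log(extractSections(text, "quick-list"));
```

Each result holds the raw attribute string ("class=todo" for the first section, "" for the second) and the untouched content, leaving it to the caller to split lines or parse the attributes.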

FYI:

I found a good looking resource on custom elements here, but do not yet understand the whole picture.

I have realised however once we have a way to parse arbitrary html tags, able to accept attributes and even extract the “attributes string” in the opening tag we can do quite a lot.

  • Arguably we can add arbitrary/custom attributes as well, which could allow us to pass a template name, some variables with values and more that the template can use

e.g.

<todolist tw-template=tiddlername tw-user="user name" tw-domain=personal>
item 1
Item 2
Item three
</todolist>
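Once the raw attribute string has been taken from an opening tag, it still has to be split into name/value pairs. A minimal sketch in plain JavaScript, assuming double-quoted, single-quoted or unquoted values; parseAttributes is a hypothetical name, and it does not decode HTML entities or understand TiddlyWiki-style <<macro>> or {{transclusion}} attribute values.

```javascript
// Parse an attribute string such as
//   tw-template=tiddlername tw-user="user name" tw-domain=personal
// into a plain object. Bare attributes (no "=") get the value "".
// Simplified sketch: no entity decoding, no TiddlyWiki <<macro>> values.
function parseAttributes(attrString) {
  const attrs = {};
  // Group 1: name. Groups 2/3/4: double-quoted, single-quoted, unquoted value.
  const re = /([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;
  for (const m of attrString.matchAll(re)) {
    attrs[m[1]] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attrs;
}

// Logs tw-template, tw-user and tw-domain with their values.
console.log(parseAttributes('tw-template=tiddlername tw-user="user name" tw-domain=personal'));
```

A template could then read, say, the tw-template value to decide how to render the section.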

As I said before, when you combine it with an easy wrapper tool, or an asymmetric wrapper tool so your wrapper can include attributes, such “extensions” would be easy to use and read.

Thank you! I removed the wrong link.

The lines inside quicklist are not HTML elements, so you can’t really “find” them individually. They are stored in a single text node as one string. You’d have to either put them inside HTML elements or declare that the contents of quicklist is a non-HTML markup language with newline as a delimiter and so on. That could end up in a mess when you want to add attributes to different lines and so on.
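Taking the second route, the body of a quicklist can be treated as a tiny line-based format with the newline as the delimiter. A minimal sketch; the function name and the choice to trim whitespace and drop blank lines are assumptions, not an agreed format.

```javascript
// Treat the content of a quicklist section as one item per line.
// Assumptions: surrounding whitespace is trimmed and blank lines are dropped.
function splitListItems(content) {
  return content
    .split(/\r?\n/)               // accept both \n and \r\n line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Returns ["Item 1", "item 2", "item 3"]
console.log(splitListItems("\nItem 1\n  item 2\n\nitem 3\n"));
```

As noted above, per-line attributes would not fit this scheme without inventing more syntax.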

I’m not sure why you don’t use a TiddlyWiki widget or macro or the existing <ul> tag.

I’m not really following along here, but don’t use regular expressions to parse HTML. It will work fine when you’re testing it in isolation, but be riddled with bugs for edge cases in the wild.
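To make that edge case concrete: a lazy regex stops at the first closing tag, so nested sections of the same tag come back truncated. Counting opening and closing tags keeps the pairs balanced. The sketch below (hypothetical name extractOutermost) still uses a regex to find the tags themselves, so it is not an HTML parser either: it ignores comments, code blocks and ">" inside quoted attribute values.

```javascript
// Return the content of each outermost <tagName>...</tagName> pair,
// keeping nested pairs of the same tag intact.
// Assumes tagName contains only lowercase letters, digits and hyphens.
function extractOutermost(text, tagName) {
  // Matches <tag>, <tag attr ...> and </tag>; group 1 is "/" for closing tags.
  const tagRe = new RegExp("<(/?)" + tagName + "(?:\\s[^>]*)?>", "g");
  const results = [];
  let depth = 0;
  let start = -1;
  for (const m of text.matchAll(tagRe)) {
    if (m[1] !== "/") {
      // Opening tag: remember where the outermost content starts.
      if (depth === 0) start = m.index + m[0].length;
      depth++;
    } else if (depth > 0) {
      // Closing tag: once depth is back to zero, the outer pair is complete.
      depth--;
      if (depth === 0) results.push(text.slice(start, m.index));
    }
  }
  return results;
}

const nested = "<note-box>outer <note-box>inner</note-box> tail</note-box>";

// Naive lazy regex; logs: outer <note-box>inner
console.log(/<note-box>([\s\S]*?)<\/note-box>/.exec(nested)[1]);

// Depth counting; logs one item: outer <note-box>inner</note-box> tail
console.log(extractOutermost(nested, "note-box"));
```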

In my Markdown export plugin, I’m using this code to let TiddlyWiki parse the WikiText to a tree of nodes (translated from Typescript to Javascript; it’s possible I did something wrong). It could probably be simplified for your use case.

/** Let TiddlyWiki parse the tiddler text and build a widget tree */
function renderWidgetTree(title) {
    // Imports built-in macros and custom macros in the tiddler, including the $:/tags/Macro/View tag
    var macroImport = "[[$:/core/ui/PageMacros]] [all[shadows+tiddlers]tag[$:/tags/Macro]!has[draft.of]] [all[shadows+tiddlers]tag[$:/tags/Macro/View]!has[draft.of]]";

    var widgetOptions = {
        document: $tw.fakeDocument,
        mode: "block",
        importVariables: macroImport,
        recursionMarker: "yes",
        variables: {
            currentTiddler: title
        }
    };

    var widgetNode = $tw.wiki.makeTranscludeWidget(title, widgetOptions);
    var container = $tw.fakeDocument.createElement("div");
    widgetNode.render(container, null);
    // Get the first-level nodes in the tree
    return container.children[0].children;
}

You could then traverse the returned tree to find any custom HTML element and text nodes. Maybe not as easy as you had hoped, and possibly the answer to another question, but anyway… :slight_smile:

Here’s a simpler way to parse only HTML using Javascript. It won’t parse macros or widgets or even WikiText.

var markup = $tw.wiki.getTiddler("Quicklist Test").fields.text;
var doc = new DOMParser().parseFromString(markup, "text/html");
var quickLists = doc.querySelectorAll("quicklist");
for (var i = 0; i < quickLists.length; i++) {
    console.log(quickLists[i].attributes);
    console.log(quickLists[i].classList);
    console.log(quickLists[i].firstChild);
}

They can be whatever you want; the designer can decide what they do with the result. In this case even the tag is arguably “not HTML”, but you can use CSS to style it as in my examples. As you may be aware, TiddlyWiki is great at handling pieces of text; normally this is the whole text field, which is wikified. All I want to do is provide access to things so “wrapped”.

  • Ultimately I would like to also handle attributes and even nesting.

Your script looks interesting but it needs to accept parameters and return the content, and possibly the attributes on the opening tag.

  • Either I/we make use of CSS applied to the content
  • Or we do some other post processing
    • Which can include taking attribute value pairs as additional parameters.

I am not more than a script kiddie so far, so if you could make your code into a macro it would help. Perhaps one day I will do so.

  • However, my request is a little more fundamental: build the tools so we can build more than just html, and also make use of arbitrary tags and/or arbitrary attribute/value pairs.

I am not sure there is any viable alternative here. But happy to see them.

  • Actually, the regex I need to build in is very stable, so I do not think it is fragile.

Thanks for the feedback @cdaven

To bump this subject, for which I need help, let us imagine a widget “extract-tag”:

<$extract-tag tag=quicklist variable=list-content attributes=attrib-var>
   <<list-content>> contains everything between `<quicklist ...>` and `</quicklist>`
   <<attrib-var>> contains everything between `<quicklist` and `>`
   Operates similarly to the list widget: if the body is empty a default template is used, and it is possible to provide template=...
</$extract-tag>
  • This example has a default tiddler=<<currentTiddler>> and default field=text
  • This example would search the current tiddler’s text field and extract each occurrence of items inside the tag “quicklist”.
  • If one wanted to process nested tags one would have to use a nested “extract-tag”.
  • The rest of the work is left to the designer, e.g. extracting the attribute myclass and applying it to content through a span or div.
  • There would be value in allowing an input-variable=varname if the content you wish to parse is within a variable and not in a tiddler/field. However, this can possibly be introduced via a filter, as the current list widget does.

I am about to revisit all the references shared in this thread, but this is approaching the end product I am looking for.

  • With this tool in my hands I have a wide range of interesting solutions I can develop.