Wouldn't you have to make one big string out of all the titles first?
Also, shouldn't you use the title field instead of the text field (which is the main text typed into the main field in the tiddler)?
Maybe the first part of your filter run could be like this:
[!is[system]get[title]join[ ]]
That seems to make a huge string with all the titles together. Then maybe you can count it. Or maybe you don't need to join it first before counting it. I don't know. Good luck.
Thank you very much. That does count the total number of words in all the headings. I seem to have been under the mistaken impression that this did not work when I tried it before, which is a little strange.
I suspect what you actually want is a filter that will count space-separated words (English) + individual characters (Chinese). But I don't know the regex to find Chinese characters only, so someone else will have to help you with that.
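For what it's worth, here is a minimal sketch of that counting rule in Python, outside TiddlyWiki, just to pin down the idea. It assumes "Chinese character" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out the rarer extension blocks, and `count_mixed` is a made-up name.

```python
import re

# Assumption: "Chinese character" = CJK Unified Ideographs, U+4E00..U+9FFF.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# A Latin "word" here is a run of ASCII letters or digits.
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")


def count_mixed(text: str) -> int:
    """Count each Latin word as 1 and each Chinese character as 1."""
    return len(LATIN_WORD.findall(text)) + len(CJK_CHAR.findall(text))


# 2 Latin words (TiddlyWiki, test) + 2 Chinese characters = 4
print(count_mixed("TiddlyWiki 笔记 test"))
```

JavaScript regular expressions accept the same `[\u4e00-\u9fff]` character class, so it may carry over to a regexp-based filter, but I have not tried that.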
Are you looking for the total number of words/characters used in all your titles, or the total number of unique words? If you care about unique words, you'll probably also want to strip out numbers and punctuation (so Tiddlywiki and Tiddlywiki: aren't considered different words).
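To make the total-versus-unique distinction concrete, here is a rough Python sketch. The sample titles and the name `title_words` are invented for illustration, and lower-casing the words before comparing them is my own assumption, not something asked for above.

```python
import re


def title_words(titles):
    """Return the words from all titles, with digits and punctuation dropped."""
    words = []
    for title in titles:
        # Keep only runs of ASCII letters; lower-casing is an assumption
        # so that "Filters" and "filters" count as the same word.
        words.extend(w.lower() for w in re.findall(r"[A-Za-z]+", title))
    return words


titles = ["Tiddlywiki", "Tiddlywiki: filters", "Filters 101"]  # made-up sample
words = title_words(titles)
print(len(words))       # total words: 4
print(len(set(words)))  # unique words: 2
```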
I'm afraid there may be some language barriers here.
When you say "notes", do you mean "tiddlers"? Or is this some subset of the tiddlers, perhaps those tagged "Note"?
When you say "headings" do you mean the "title" field? Or are you talking about the (possibly generated) H1, H2, ... H6 tags, which might come from wikitext such as !! Heading Level Two, or !!!! Heading Level Four?
Finally, I know that breaking Chinese text into words is more complex than in English or other Latin alphabet languages. So I would expect that splitting on spaces is not enough, but it's the only technique I know. So would this be acceptable?:
That's a somewhat bizarre result. When I try that, I do not get the two and's, just the expected e's.
But if you want to split English strings (or those of any Latin-script language, I believe) into words, you can split on spaces, with split[ ]. (Note the single space between the brackets.) That will have some flaws, for instance, including starting/ending punctuation with the words. I don't think there's anything entirely perfect, capturing all possible strings, but this should be pretty good: splitregexp[\W+].
For instance
[[This--and that--for "those" 123 people who care?]splitregexp[\W+]]
yields
This
and
that
for
those
123
people
who
care
(Note, though, that this still has issues. daughter-in-law would become three distinct words.)
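One possible way around the hyphen issue is to match words instead of splitting on non-word characters. This is only a Python sketch of the idea, not a TiddlyWiki filter; the pattern and the name `find_words` are my own assumptions.

```python
import re

# A word is a run of ASCII letters/digits, optionally joined by SINGLE hyphens.
# A double hyphen ("--") is therefore still treated as a separator.
WORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def find_words(text):
    """Return the words in text, keeping hyphenated words in one piece."""
    return WORD.findall(text)


print(find_words('This--and that--for "those" 123 people who care?'))
# ['This', 'and', 'that', 'for', 'those', '123', 'people', 'who', 'care']
print(find_words("my daughter-in-law"))
# ['my', 'daughter-in-law']
```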
But this doesn't work with Chinese characters. I don't know Chinese, so I may be mistaken, but I don't think there is any simple rule to split a string of Chinese characters into distinct words.
I imagine there's a technique with regex to split a string into sections of Latin characters and Chinese ones. So if you have a string-to-words technique for Chinese, we might be able to break the text into language-grouped sections, apply the appropriate rule to each section, and combine the results back into a single string. But I don't know such a technique.
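Here is a rough Python sketch of what such language-grouped splitting could look like. Everything in it is an assumption on my part: the CJK Unified Ideographs block (U+4E00 to U+9FFF) stands in for "Chinese", the names `split_runs` and `tokenize` are invented, and the default Chinese rule simply splits into single characters as a placeholder for a real word segmenter.

```python
import re

# Assumption: U+4E00..U+9FFF stands in for "Chinese characters".
RUNS = re.compile(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+")


def split_runs(text):
    """Break text into ("zh" | "latin", run) pairs of consecutive characters."""
    runs = []
    for run in RUNS.findall(text):
        lang = "zh" if "\u4e00" <= run[0] <= "\u9fff" else "latin"
        runs.append((lang, run))
    return runs


def tokenize(text, segment_chinese=list):
    """Apply a per-language rule to each run and collect the tokens.

    segment_chinese is a placeholder: list() yields one token per character.
    A real Chinese word segmenter could be passed in instead.
    """
    tokens = []
    for lang, run in split_runs(text):
        if lang == "zh":
            tokens.extend(segment_chinese(run))
        else:
            tokens.extend(re.findall(r"[A-Za-z0-9]+", run))
    return tokens


print(tokenize("TiddlyWiki 笔记 and 标题"))
# ['TiddlyWiki', '笔', '记', 'and', '标', '题']
```

The tokens come back as a list; joining them with spaces would give the single combined string.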
Yes, but that's a big library, not a "simple rule". To do this in Tiddlywiki, we could import the JS port of jieba to do this work, but that would involve including an 11+ MB dictionary. This is not for the faint-of-heart.