Fun Challenge: How would I count the number of characters/words in a TW?

TiddlyTitch · January 6, 2022, 12:14pm

@jeremyruston has a minimalist writing style. All to the good. Yet tiddlywiki.com is a serious wodge of texts.

The challenge …

How would I in TW count how many characters/words he has written?

That would be for only words/characters visible, not for code.

Just a fun idea for people who like that kind of thing.

TT

CodaCoder · January 6, 2022, 1:44pm

That’s a solved problem – I use @telmiger’s EditorCounter plugin (sorry, I don’t have a link, but I’m sure he’ll be along soon enough).

TiddlyTitch · January 6, 2022, 2:57pm

Ciao @CodaCoder, @telmiger’s excellent Counter is, unfortunately, for single Tiddlers.

The challenge is to get the character & word totals for the entire tiddlywiki.com.

A dopo,
TT

CodaCoder · January 6, 2022, 3:31pm

OIC

telumire · January 6, 2022, 4:16pm

Around 163357 words :

{{{ [!is[system]get[text]split[ ]splitregexp[\n]!is[blank]count[]] }}}

This filter counts words by splitting on spaces and newlines, so it’s not perfect (to be more accurate I should use a wikify widget as well, but I don’t want to crash my browser…). It also only counts words in the text field.
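For anyone who wants to test the logic outside TW, here is a rough Python sketch of the same pipeline: split on spaces, split on newlines, drop empty strings, count. The `tiddlers` list is made-up sample data and `count_space_separated` is a hypothetical helper, not TW's implementation. It also assumes `!is[blank]` simply drops empty strings.

```python
# Hypothetical sample data: each string stands in for one tiddler's text field.
tiddlers = [
    "Hello world\nthis is ~TiddlyWiki",
    "* a list item\n* another one !",
]

def count_space_separated(texts):
    """Rough mirror of: split[ ] -> splitregexp[\\n] -> !is[blank] -> count[]."""
    total = 0
    for text in texts:
        for chunk in text.split(" "):        # split on single spaces
            for piece in chunk.split("\n"):  # then split on newlines
                if piece:                    # drop empty strings (assumed meaning of !is[blank])
                    total += 1
    return total

word_like_items = count_space_separated(tiddlers)
print(word_like_items)  # 13, although only 10 of the items are real words
```

The three extra items are the two `*` list markers and the lone `!`, which hints at how markup and punctuation can inflate a space-based count.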

I wasn’t able to calculate the number of characters: too many arguments! You can try it yourself if you want (filter taken from PhantomYdn):

{{{ [!is[system]get[text]split[ ]splitregexp[(.)]!is[blank]count[]] }}}
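One possible way around the problem, sketched in Python rather than as a TW filter: add up the length of each text instead of producing one list item per character. This assumes the failure is related to the size of the intermediate list, which the thread does not confirm. The `tiddlers` list and the `count_characters` helper are hypothetical.

```python
# Hypothetical sample data standing in for tiddler text fields.
tiddlers = ["Hello world", "TW is fun\n"]

def count_characters(texts, include_whitespace=True):
    """Sum character counts per text without building a list of single characters."""
    total = 0
    for text in texts:
        if include_whitespace:
            total += len(text)
        else:
            # Count only characters that are not spaces, tabs or newlines.
            total += sum(1 for ch in text if not ch.isspace())
    return total

print(count_characters(tiddlers))                            # 21
print(count_characters(tiddlers, include_whitespace=False))  # 17
```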

TiddlyTitch · January 6, 2022, 4:45pm

Whoah, that is great! Very interesting. It does not matter if it is not 100% correct. It is indicative in exactly the right way. 163K words is a lot already! I wonder if @jeremyruston himself grasps how much he has written? 163K is a lot for a minimalist!

Thank you for taking the interest to do that! Fun & indirectly informative.

Best, TT

jeremyruston · January 6, 2022, 5:28pm

As a point of comparison, I copied all the text from https://tiddlywiki.com/alltiddlers.html and pasted it into a blank word processing document, which reported a total of 85,148 words.

TiddlyTitch · January 6, 2022, 6:36pm

Ha, very interesting! It is a kinda obscure thing but I think it is still intellectually interesting. TBH, I am actually quite taken by your writing style which I think is unusually compact.

It would be great if in TW we could natively, occasionally, get such figures.

TT

telumire · January 6, 2022, 6:37pm

Oh wow, that’s an enormous margin of error ((163357-85148)/85148 * 100 = 92%!) … I definitely need to improve that filter.

telmiger · January 6, 2022, 9:40pm

@jeremyruston your point of reference might contain many double counts in the form of transcluded content. The OT asks for written (as opposed to displayed) words and characters. How much difference might there be?

jeremyruston · January 6, 2022, 10:51pm

I suspect that the estimate from the word processor would be fairly conservative in terms of not counting punctuation as words, and that that would be significant given the large amount of non-prose content. Perhaps also interesting to experiment with the concatenated source of all tiddlers too.

TW_Tones · January 7, 2022, 7:48am

Personally I think most code tiddlers would not really be counting true words. Only textual tiddlers are relevant in many ways. There is a way to use splitregexp with the \w pattern to retrieve words that is more effective because it recognises new lines, punctuation, spaces, etc.
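To illustrate the idea in general terms, here is a small Python sketch that counts runs of `\w` characters instead of splitting on whitespace. It is not a TW filter; the `sample` string and the `count_word_runs` helper are made up for the example.

```python
import re

def count_word_runs(text):
    """Count runs of word characters (letters, digits, underscore)."""
    return len(re.findall(r"\w+", text))

# Hypothetical sample with list markers and punctuation.
sample = "* Hello, world!\n* It's a list -- item."

print(count_word_runs(sample))  # 7: Hello, world, It, s, a, list, item
print(len(sample.split()))      # 9: whitespace splitting also counts "*", "*" and "--"
```

Punctuation and list markers no longer count, but the approach has its own quirks: the apostrophe splits "It's" into two runs.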

pmario · January 7, 2022, 10:52am

IMO the formula for relations is:

163357 relates to 100% in the same way as 85148 relates to "x"

In math terms 163357 : 100 = 85148 : x
which resolves to 163357 * x = 85148 * 100.
If we resolve for x we get x = 85148/163357 * 100 = ~ 52%

That’s still a big difference, which would be interesting to explore.

telumire · January 7, 2022, 12:18pm

Hm I thought the formula for % of error was

Percent error = ((Approximate or experimental Value - Exact or known Value) / Exact or known Value)∗100

I verified with an online tool and it seems correct :
https://www.calculator.net/percent-error-calculator.html?observedvalue=163357&truevalue=85148&x=42&y=10
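Both figures in this exchange can be reproduced from the same two numbers. A quick Python check, with variable names chosen only for illustration:

```python
estimate = 163357   # word count reported by the filter
reference = 85148   # word count reported by the word processor

# Percent error: how far the estimate overshoots, relative to the reference.
percent_error = (estimate - reference) / reference * 100

# Ratio: the reference expressed as a share of the estimate.
share_of_estimate = reference / estimate * 100

print(round(percent_error))      # 92
print(round(share_of_estimate))  # 52
```

So 92% and 52% do not contradict each other. The first says the filter overshoots the reference by about 92%; the second says the reference is about 52% of the filter's figure.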

I think the error comes from all the punctuation symbols incorrectly counted, plus tiddlers with code that I shouldn’t count. Wikify should help with that but it’s too much text; maybe it would be possible with several “loops”…?

@TW_Tones \w does not account for accented characters, but in this specific case, since the words are in English, this shouldn’t be an issue I guess…
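A small Python sketch of the accent issue. Python's `\w` is Unicode-aware by default, so the `re.ASCII` flag is used here to imitate an ASCII-only `\w` (letters a-z, digits, underscore), which is the behaviour being assumed for the regexp in question. The sample text is made up.

```python
import re

# "café naïve résumé", written with escapes so the accented letters are unambiguous.
text = "caf\u00e9 na\u00efve r\u00e9sum\u00e9"

# ASCII-only \w: accented letters act as separators and words get broken up.
ascii_runs = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware \w: accented letters count as word characters.
unicode_runs = re.findall(r"\w+", text)

print(len(ascii_runs))    # 5 fragments: caf, na, ve, r, sum
print(len(unicode_runs))  # 3 words
```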