Fun Challenge: How would I count the number of characters/words in a TW?

@jeremyruston has a minimalist writing style. All to the good. Yet tiddlywiki.com is a serious wodge of text.

The challenge …

  • How would I, in TW, count how many characters/words he has written?

That would count only visible words/characters, not code.

Just a fun idea for people who like that kind of thing :smiley:

TT

That’s a solved problem – I use @telmiger’s EditorCounter plugin (sorry, I don’t have a link, but I’m sure he’ll be along soon enough).

Ciao @CodaCoder, @telmiger’s excellent Counter is, unfortunately, for single Tiddlers.

The challenge is to get the character & word totals for the entire tiddlywiki.com.

A dopo,
TT

OIC

:face_with_monocle:

Around 163357 words:

{{{ [!is[system]get[text]split[ ]splitregexp[\n]!is[blank]count[]] }}}

This filter counts the number of words based on spaces, so it’s not perfect (to be more accurate I should use a wikify widget as well, but I don’t want to crash my browser…), and it only counts words in the text field.
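For anyone who wants to reproduce this outside the browser, here is a rough Python sketch of what the filter does. It assumes the texts of all non-system tiddlers have already been exported into a list of strings (that export step is hypothetical and not shown). It splits on spaces, then on newlines, and drops empty strings, so stray punctuation such as "-" still counts as a word.

```python
def naive_word_count(texts):
    # Rough equivalent of: split[ ] splitregexp[\n] !is[blank] count[]
    total = 0
    for text in texts:
        for chunk in text.split(" "):          # split[ ]
            for token in chunk.split("\n"):    # splitregexp[\n]
                if token != "":                # !is[blank], assumed to drop empty strings
                    total += 1                 # count[]
    return total


print(naive_word_count(["Hello world", "one\ntwo  three", "a - b"]))  # 8
```

Note that the lone "-" in "a - b" is counted as a word, which is one source of overcounting.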

I wasn’t able to calculate the number of characters: too many arguments! You can try it yourself if you want (filter taken from PhantomYdn):

{{{ [!is[system]get[text]split[ ]splitregexp[(.)]!is[blank]count[]] }}}
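The character count can be approximated the same way, again as a Python sketch over a hypothetical list of exported tiddler texts. My guess (unverified) is that the "too many arguments" error comes from the filter building one enormous list of single characters; summing a count per tiddler never builds that list. Whitespace is excluded by default, which is roughly what the split[ ] and !is[blank] parts of the filter are aiming for.

```python
def char_count(texts, include_spaces=False):
    # Sum per-text counts instead of building one huge list of single characters.
    total = 0
    for text in texts:
        if include_spaces:
            total += len(text)
        else:
            total += sum(1 for ch in text if not ch.isspace())
    return total


texts = ["ab c", "d\ne"]
print(char_count(texts))                       # 5
print(char_count(texts, include_spaces=True))  # 7
```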

Whoa, that is great! Very interesting. It does not matter if it is not 100% correct. It is indicative in exactly the right way. 163K words is a lot already! I wonder if @jeremyruston himself grasps how much he has written? 163K is a lot for a minimalist! :smiley:

Thank you for taking the interest to do that! Fun & indirectly informative.

Best, TT

As a point of comparison, I copied all the text from https://tiddlywiki.com/alltiddlers.html and pasted it into a blank word processing document, which reported a total of 85,148 words.

Ha, very interesting! It is a kinda obscure thing but I think it is still intellectually interesting. TBH, I am actually quite taken by your writing style which I think is unusually compact.

Wouldn’t it be great if in TW we could natively, occasionally, get such figures?

TT

Oh wow, that’s an enormous margin of error ((163357 - 85148) / 85148 × 100 ≈ 92%!) … I definitely need to improve that filter :sweat_smile:

@jeremyruston your point of reference might contain many double counts in the form of transcluded content. The original post asks for written (as opposed to displayed) words and characters. How much difference might there be?

I suspect that the estimate from the word processor would be fairly conservative in terms of not counting punctuation as words, and that this would be significant given the large amount of non-prose content. It might also be interesting to experiment with the concatenated source of all tiddlers.

Personally, I think counting most code tiddlers would not really be counting true words; in many ways only textual tiddlers are relevant. There is a way to use splitregexp with a \w pattern to retrieve words, which is more effective because it recognises new lines, punctuation, spaces, etc…
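To illustrate the difference, here is a small Python comparison of splitting on spaces/newlines versus matching runs of \w characters. It uses Python’s re module rather than TiddlyWiki’s splitregexp, so treat it as an analogy; note also that Python’s \w matches accented letters by default, whereas JavaScript’s does not.

```python
import re

WORD_RE = re.compile(r"\w+")  # runs of letters, digits and underscores


def space_split_count(text):
    # Split on spaces and newlines, drop empty tokens (punctuation still counts).
    return sum(1 for tok in re.split(r"[ \n]", text) if tok)


def regex_word_count(text):
    # Only runs of word characters count, so "--" and trailing punctuation are ignored.
    return len(WORD_RE.findall(text))


sample = "Hello, world! -- see\nthe docs."
print(space_split_count(sample))  # 6
print(regex_word_count(sample))   # 5
```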

IMO the right formula here is a proportion:

163357 relates to 100% in the same way as 85148 relates to "x"

In math terms: 163357 : 100 = 85148 : x,
which resolves to 163357 × x = 85148 × 100.
Solving for x gives x = 85148 / 163357 × 100 ≈ 52%
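A minimal Python check of that proportion (the function name is just for illustration):

```python
def proportion_percent(part, whole):
    # Solve whole : 100 = part : x for x, i.e. x = part / whole * 100
    return part / whole * 100


x = proportion_percent(85148, 163357)
print(f"{x:.1f}%")  # 52.1%
```

So the word-processor count is roughly half of the filter count.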

That’s still a big difference, which would be interesting to explore.

Hm, I thought the formula for percent error was:

Percent error = ((Approximate or experimental value - Exact or known value) / Exact or known value) × 100

Absolute error = |Estimate - Exact|
Then relative error = Absolute error / |Exact| = |(Estimate - Exact) / Exact|
Thus the percent error is 100% × |(Estimate - Exact) / Exact|
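The same formula as a short Python sketch, plugging in the two counts from this thread (163357 from the filter as the estimate, 85148 from the word processor as the reference):

```python
def percent_error(estimate, exact):
    # 100 * |(estimate - exact) / exact|
    return abs((estimate - exact) / exact) * 100


err = percent_error(163357, 85148)
print(f"{err:.1f}%")  # 91.9%
```

The ~92% and ~52% figures are consistent with each other: 85148 / 163357 = 1 / (1 + 0.9185), so they describe the same gap, once relative to the word-processor count and once relative to the filter count.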

I checked with an online tool, and it seems correct:
https://www.calculator.net/percent-error-calculator.html?observedvalue=163357&truevalue=85148&x=42&y=10

I think the error comes from all the punctuation symbols being incorrectly counted as words, plus tiddlers with code that I shouldn’t count. Wikify should help with that, but it’s too much text; maybe it would be possible with several “loops”…?
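A sketch of the “several loops” idea, in Python rather than wikitext: process the tiddlers in batches and add up per-tiddler counts, so no single step has to handle the whole wiki at once. The strip_code helper is a hypothetical and very crude stand-in for wikify: it only removes triple-backtick code blocks and does not render transclusions, macros or any other markup.

```python
import re
from itertools import islice

FENCE = "`" * 3  # triple backtick, built this way to keep the snippet fence-safe
CODE_BLOCK_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
WORD_RE = re.compile(r"\w+")


def strip_code(text):
    # Hypothetical, crude stand-in for wikify: only drops triple-backtick blocks.
    return CODE_BLOCK_RE.sub(" ", text)


def word_count(text):
    return len(WORD_RE.findall(strip_code(text)))


def batched(iterable, size):
    # Yield lists of at most `size` items.
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def count_in_batches(texts, batch_size=100):
    total = 0
    for chunk in batched(texts, batch_size):  # one "loop" per batch
        total += sum(word_count(t) for t in chunk)
    return total


texts = ["one two", FENCE + "\nx = 1\n" + FENCE + "\nthree four", "five"] * 3
print(count_in_batches(texts, batch_size=2))  # 15
```

The batch size does not change the result, only how much work is done per step.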

@TW_Tones \w does not match accented characters, but in this specific case, since the words are in English, this shouldn’t be an issue, I guess…
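To see what this means in practice: JavaScript’s \w essentially matches only [A-Za-z0-9_], so with splitregexp (TiddlyWiki runs in JavaScript) accented words get broken apart. In Python, the re.ASCII flag reproduces that behaviour, so the comparison below approximates what the filter would see rather than running TiddlyWiki itself.

```python
import re

text = "café naïve résumé"

# re.ASCII makes \w mean [A-Za-z0-9_], like JavaScript's \w
print(re.findall(r"\w+", text, flags=re.ASCII))  # ['caf', 'na', 've', 'r', 'sum']

# Python's default (Unicode) \w keeps accented letters inside words
print(re.findall(r"\w+", text))                  # ['café', 'naïve', 'résumé']
```

Three words become five fragments, so non-English text would be overcounted.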
