Is there any spyware JavaScript in an empty TiddlyWiki that sends statistics to the developers? And how can I disable it?

Hi! I live in Russia. I use TiddlyWiki for making small sites, and I share my TW5 templates with people.
Last week I saw some very strange statistics for the templates people used (these statistics are private).


It looks like a robot coming from a VPN opens, one by one, all the sites (and their #pages) that were created from TiddlyWiki templates many years ago. And I don't know where this robot gets its data about these sites: practically none of them are indexed by Google or any other search engine.

There used to be Google Analytics, but that was removed ages ago. What version are those sites running?

GA is a plugin that was part of tw-com for some time, but it was removed when GDPR became a thing.

Empty.html by default only contains the $:/core plugin.

As far as I know, there is no code in the core source that sends any info to any developer site.

TW code is public and can be reviewed at: https://github.com/Jermolene/TiddlyWiki5/tree/master/core

Just to be very, very clear:

  1. The empty TiddlyWiki at https://tiddlywiki.com/empty.html has never and will never contain any tracking code
  2. The documentation TiddlyWiki at https://tiddlywiki.com/index.html included a Google Analytics beacon until 2018

The core plugin library includes a Google Analytics plugin that has been updated to the latest v4, and also a Consent Plugin which presents a β€œcookie banner”, and integrates with the Google Analytics plugin to ensure that data is not sent to Google unless the user consents.

All of this is very, very important to me. It’s also the reason that TiddlyWiki doesn’t include a popup telling the user that a new version is available; polling tiddlywiki.com would necessarily alert others on the same network that somebody was using TiddlyWiki. So we are very, very conservative, and don’t do anything by default that could conceivably result in users being unknowingly tracked.


Hi @Siniy-Kit I wanted to check that I understand exactly what has happened.

It seems that via your visitor logs you are seeing an unexpected spike of visits to a site from a robot when you expect the site to not be crawled by robots. Is that correct?

This looks to me like what you would expect from a search engine, which you can ask not to index the site. If links to the templates were posted anywhere else on the internet, that could have led it there.
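For a cooperative crawler, the standard way to make that request is a `robots.txt` file at the site root. A minimal sketch (note that this is only a request: well-behaved search engines honour it, but scrapers and malicious bots simply ignore it, so it is not access control):

```text
# robots.txt served at https://example.com/robots.txt
# (example.com is a placeholder for your own domain)
User-agent: *
Disallow: /
```

`Disallow: /` asks all crawlers to skip the entire site; you can instead list individual paths to keep the rest of the site indexable.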

Every site I have put up has experienced the same kind of activity from all over the world, and the only way to stop it is to put the content behind a login.

A deeper analysis could possibly be undertaken.

I read forums. Many different sites have problems with these IPv6 bots. But I don't understand why they open these old "garbage" sites many times every day. Where do they get these sites' URLs? And how can they navigate by # ?


It could be a problem with free hosting providers.


My guess is that they are doing something Google has done for many years: crawling the web with a headless browser that is capable of executing JavaScript. The motivation for them going to that much trouble is that so much of the content that a bad guy might want to crawl is buried in JS-only websites.
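That would also explain the #-navigation question: once a JS-capable crawler has let the wiki's scripts render the page, the internal links are just ordinary `<a href="#...">` elements in the DOM, and the crawler can harvest the fragments from them. A minimal sketch of that harvesting step (the tiddler names here are illustrative, not taken from any real site):

```python
from html.parser import HTMLParser

class FragmentLinkCollector(HTMLParser):
    """Collect the fragment ('#...') part of every <a href> in a page."""
    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        if "#" in href:
            self.fragments.append(href.split("#", 1)[1])

# HTML as a JS-executing crawler might see it *after* the wiki's
# scripts have rendered the story river (hypothetical example links):
rendered = '<a href="#HelloThere">Hello</a> <a href="#GettingStarted">Start</a>'
collector = FragmentLinkCollector()
collector.feed(rendered)
print(collector.fragments)  # ['HelloThere', 'GettingStarted']
```

Each harvested fragment gives the bot another "page" to request, which would produce exactly the pattern of repeated #-page visits described above.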

I run a website with a large number of visitors, and I use bespoke hand-written analytics which I can adapt to suit a particular emergency or attack; I always keep a very careful eye on bots.

I mean this nicely, but I think it is very naive to assume that bots will not visit a particular page on your site: you only need one person to reveal the URL of a page, and a bot may end up trying to access it.

Expect bots to try to visit every page that your server offers. The only choice you have is what you serve an unauthorised user when they try to visit a page you consider private: do you serve them the real content, or an "Access denied" page instead?
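That choice can be sketched as a tiny dispatch function. Everything here is hypothetical (the path names and page bodies are made up for illustration); the point is only that a private page should return a uniform refusal rather than its real content:

```python
# Hypothetical set of paths the site owner considers private.
PRIVATE_PATHS = {"/templates/stats.html"}

def respond(path: str, authenticated: bool) -> tuple:
    """Return (status, body) for a request. Private pages get a uniform
    403 body, so an unauthenticated bot learns nothing about them."""
    if path in PRIVATE_PATHS and not authenticated:
        return 403, "Access denied"
    return 200, f"contents of {path}"

print(respond("/templates/stats.html", authenticated=False))  # (403, 'Access denied')
print(respond("/index.html", authenticated=False))            # (200, 'contents of /index.html')
```

Serving the same 403 body for every private path also avoids leaking which private pages exist.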

There was a time, in my experience, when you could expect a page to remain "quiet" simply because only a few people knew about it and you did not link to it anywhere. But with social media, Google and the Chrome browser, your "privacy" becomes very leaky very quickly as soon as you tell one other person about your page.

You may have been lulled into a false sense of security because a particular page was quiet for a long time, but again, bots can ignore a page for ages and then all of a sudden they are all over it.

You may also be experiencing spam/scam bots, which are often naively written and will keep trying to load the same page over and over. These are the sort of bots that try to inject SQL and probe for vulnerabilities. I often look at the code they are trying to inject, and it is frequently hopelessly out of date, assuming people are still running vulnerable versions of MySQL, which makes me think there are enough unprotected devices out there to ensure that old viruses never die out.
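Those naive probes are usually easy to spot in access logs because the payloads are so formulaic. A small sketch of the kind of pattern check bespoke analytics might use (the patterns and log lines are illustrative, not an exhaustive or production filter):

```python
import re
from urllib.parse import unquote_plus

# A few classic, dated SQL-injection payload shapes that old probe
# bots still replay verbatim (illustrative, far from complete).
PROBE_PATTERNS = [
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
    re.compile(r";\s*drop\s+table", re.IGNORECASE),
]

def looks_like_probe(request_line: str) -> bool:
    """Decode URL escapes, then flag lines matching known probe payloads."""
    decoded = unquote_plus(request_line)
    return any(p.search(decoded) for p in PROBE_PATTERNS)

log = [
    "GET /index.html HTTP/1.1",
    "GET /page.php?id=1+UNION+SELECT+password+FROM+users HTTP/1.1",
]
print([looks_like_probe(line) for line in log])  # [False, True]
```

Decoding with `unquote_plus` first matters, because probe payloads usually arrive URL-encoded (`+` or `%20` instead of spaces).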

They are like flies: nothing for months, and then all of a sudden you see what looks like a denial-of-service attack. On examination, though, it looks more and more like a very naively written bot on a remote device somewhere that simply does not have the imagination or knowledge to break out of a cyclic loop of links; it is literally banging its head on the door repeatedly. You would think it would get bored and move on, but no, it just keeps on knocking.

I learnt a lot from writing and adapting my own analytics code.
