TiddlyWiki for Data Analytics / Data Science / R

There have been a few posts here recently that give me the impression I’m not the only one here with a day job in analytics. Most recently @Zheng_Bangyou posted about an R package which I’m trying to get working, and I think @talha131 mentioned something else R-related.

Just curious whether TiddlyWiki is part of the data analytics / data science workflow for anyone here. I’ve played with building data catalogs in TiddlyWiki for my team, and other things like that, with some success. Any other ideas?

@stobot I am not a professional in data analytics, but I have a history of analysing and manipulating large or complex data and converting between sources. I also have some non-professional insight into big data.

Perhaps you can bring your skills and knowledge about data analytics and its relationship to TiddlyWiki to us.

  • People like myself will always be happy to help find innovative ways to make use of data in TiddlyWiki.

But here are a few observations, should they be of help.

  • The free and open ways to manipulate, parse and reorganise data with TiddlyWiki may help.
  • Such use of TiddlyWiki may require a fairly broad and deep knowledge of it, although I find it easy to learn something new as I go.
  • Without media such as audio/video and images, it is amazing how much information can be stored in a few megabytes, but I imagine that for a lot of business cases TiddlyWiki may be best used towards the end of the analytics process, after the volume has been reduced to more manageable levels.
    • The ease of creating another wiki for a subset of data processing, then using this to funnel into a final dashboard, is helpful.
  • TiddlyWiki is empowering insofar as it can be used to capture in one tool the input, interface, tools, processes and workflows of most “algorithms”, and to look at the very same data from different perspectives using alternative “data models”, e.g. random, hierarchical, indexed, searchable, etc.
  • TiddlyWiki’s power may be in the way you can present the result of the data analytics process so that the “end user” can interact with the data: from dashboards to expert systems, drilling into the data, and exploring it.
    • One way to use it is to think of it as an interactive report.
  • I can’t recommend the JSON Mangler plugin enough; I keep it in a utility wiki to first “inject” and prepare the data, which I then move into another wiki.

TiddlyWiki is perhaps not suitable for very large data sets. You can automate a lot from the command line with Node.js, but I expect the need to fragment the data, even if you bring it back together later, results in diminishing returns.

  • As I understand it, a lot of algorithms used in “big data” do the same thing, dividing and conquering across large data stores and multiple compute resources.
  • If the destination for the analysed data is humans, rather than driving automation, then I expect TiddlyWiki could excel at this.

I hope this helps. I expect knowing when not to use TiddlyWiki could be as important to understand as when to use it.

I am reproducing some examples of the R leaflet package, using R Markdown to generate a new tiddler in markdown format which can be imported into TW.

A few examples can be found here: tw-widgets — Examples for htmlwidgets from R

This is the source code of the R Markdown file.

This method might be a way to conduct data analysis in R and manage all reports in TW.

Ah, TiddlyWiki as a place to store / organize the reports, that’s clever! Assuming part of that is sharing your reports, what mechanism (saving / hosting method) are you using to share them with others?

Unfortunately for me, I’m trying to figure out why I can’t get rtiddlywiki installed at the moment…

[screenshot of the rtiddlywiki installation error]

Thanks for your feedback @TW_Tones. Actually I wasn’t thinking of doing much actual analytics within TiddlyWiki, but you’re right: for some small datasets and with some of those charting add-ins, it could be done.

One thing I’d love, and have thought about often, is extending the data tiddler to multi-column situations: specifically, being able to dump a delimited dataset into a tiddler and then do multi-column filtering. I’ve seen some chatter about it in the forums here going back a few months.

I’m well aware of all the objections to this; I’ve read comments from the core team suggesting that even creating the data tiddler was maybe a mistake. I’m also well aware of, and largely agree with, the “one data point per tiddler” philosophy. However, there are situations that tip the scales the other way.

One example I use all the time is in my work-related wiki, where it’s handy to have an “offline” company directory to look up people’s contact information etc. For me that’s more than 40,000 employees to house, and I don’t need it often enough to warrant adding 40,000 tiddlers to my wiki. While I’m sure the file size isn’t much different, the overall performance change is very noticeable. I either use a single big tiddler and rely purely on regex text search (appending special character strings to relevant columns to make them easy to find), or make a separate data tiddler for each column and then :intersect the results together so I can piece things back into a table. The filtering is only possible using the keyvalues plugin from @pmario.
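
To give a concrete flavour of the per-column approach, here is a core-operators-only sketch (the data tiddlers dir-last and dir-dept are hypothetical, each holding id: value pairs, and I have not tested this at 40,000 rows; the keyvalues plugin does the real work more efficiently):

[[dir-last]indexes[]]
:filter[[dir-last]getindex<currentTiddler>match[Skywalker]]
:filter[[dir-dept]getindex<currentTiddler>match[Engineering]]

Each :filter run keeps only the row ids whose value in that column matches, so chaining the runs acts like intersecting the per-column results.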

The obvious problem for me is that I don’t know JavaScript at all, so it’s very slow going. The simple things I’ve done with <<dateadd>> and [adddays[]] (link) were so simple they didn’t take much understanding. I know there’s a lot of activity around the JSON Mangler plugin that you advertise heavily, but it seems like the filtering capabilities are not quite there yet. Anyway, just thoughts based on what you mentioned.

I think a big part of that referred to dictionary tiddlers. With the next release adding more support for JSON tiddlers, and given that JSON is how both plugins and the internal tiddler store now work, JSON is the way to go. But what about CSV?

  • I think we could extend the CSV view to support multi-column sort and filtering.
    • Here I go again: with the JSON Mangler plugin installed, importing a CSV makes a nice viewable table. I think CSV may be a good way to store large reference data sets as well.
  • This would be a good way to add your contact list. Then, when you make use of or reference a contact, create a contact tiddler, i.e. create on demand from the data set, with an ability to discover changes; see the sketch below.
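
For example, something along these lines could create a contact tiddler on demand from the data set (a rough sketch only; the dictionary tiddler contacts and its id: Full Name layout are hypothetical):

<$list filter="[[contacts]indexes[]]" variable="id">
  <$button>
    <$action-createtiddler $basetitle=<<id>> caption={{{ [[contacts]getindex<id>] }}} tags="Contact"/>
    Create <<id>>
  </$button>
</$list>

Each generated contact then behaves like a normal tiddler for linking and tagging, while the bulk of the directory stays in the single data tiddler.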

After playing with the new JSON operators in 5.2.4, it seems like I’m looking for a combination of jsonget and search-replace.

data=

{
"lukeskywalker": {"first":"Luke","last":"Skywalker"},
"anakinskywalker": {"first":"Anakin","last":"Skywalker"},
"obiwankenobi": {"first":"Obi-Wan","last":"Kenobi"},
"hansolo": {"first":"Han","last":"Solo"}
}

So what I can do is get the first name for Anakin Skywalker with:
[{data}jsonget[anakinskywalker],[first]] = Anakin

What I’d want to do is something like
[{data}jsonsearch:last[Skywalker],[first]] = Luke, Anakin

Or even more interesting
[{data}jsonsearchregex:last:[^Sky.*],[first]] = Luke, Anakin

I’ll put it on my someday / maybe list to see if I can figure it out… If anyone has already built something like this, certainly let me know!
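
In the meantime, something close might be possible by combining jsonindexes with the :filter and :map run prefixes; an untested sketch against the data tiddler above:

[{data}jsonindexes[]]
:filter[{data}jsonget<currentTiddler>,[last]match[Skywalker]]
:map[{data}jsonget<currentTiddler>,[first]]

which should give Luke and Anakin; swapping match[Skywalker] for regexp[^Sky.*] would cover the regex variant.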

In the past I might have split the data tiddler into lines with splitregexp[\n], searched each line for the search string to obtain the key, and then retrieved the record, but I am yet to play with the new operators.
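
Against the JSON above, and assuming one record per line, that brute-force line scan might look something like this untested sketch:

[{data}splitregexp[\n]search[Skywalker]splitbefore[:]removesuffix[:]removeprefix["]removesuffix["]]

yielding the keys lukeskywalker and anakinskywalker, which could then be fed back into jsonget to retrieve each record. It is brittle, since it depends on the one-record-per-line layout, which is why the new operators appeal.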

That’s interesting. I did propose implementing JSON-PATH, which is a syntax similar to the XPATH used to query XML … see: [Proposal] implement JSON-PATH into the core - may be as a plugin · Issue #7116 · Jermolene/TiddlyWiki5 · GitHub

The library mentioned there has some interesting possibilities. … The test data they use is quite big: the small JSON dataset is 32 MB and the big one is 182 MB, and it still performs at a decent speed.

I think the use case you describe is a bit different. You want to query the dataset in TW, but prepare the data outside of TW.

Data tiddlers are problematic if they are modified from within TW.

Just some thoughts.
