Searching/indexing web site/page

Not specifically a TW topic, but this group seems to be populated with intelligent people with wide experience.

I am reviewing the use of TW for my Central Street Archive TW app (Central Street Archive). My proposed solution, of a tiddler per document populated with tags for each named person, place, organisation and exhibition, is too onerous for me to complete in any reasonable timeframe.

So I am looking at a web site of searchable PDFs covered by a search facility. Now this is the issue: I cannot find a web site search facility (preferably free). I don’t just want to have Google/Bing/etc. index the site, as then search results from this document set get interspersed with results from other web sites/pages.

So, can anyone point me to a site search facility that I could implement?

thanks in advance

bobj

I have found the Google Search Console as a way of limiting searches to my web site and am trying that to see if it works for me. If so, then I assume I’ll just have to add a front end to implement Google searches.

If you want to search your wiki in Google, you can try

  1. Search your nodejs wiki in Google if you use nodejs based wiki
  2. Use the Site Search Operator if it is indexed by google
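
As a minimal sketch of the second option, a front end could simply build a Google query that carries the `site:` operator. The domain `archive.example.org` and the helper name `siteSearchUrl` are placeholders for illustration, and this only helps once Google has actually indexed the site.

```javascript
// Build a Google search URL restricted to one site via the `site:` operator.
// `archive.example.org` is a placeholder domain, not the real archive.
const siteSearchUrl = (domain, terms) => {
  const url = new URL('https://www.google.com/search')
  // searchParams takes care of URL-encoding the query text.
  url.searchParams.set('q', `site:${domain} ${terms}`)
  return url.toString()
}

const example = siteSearchUrl('archive.example.org', 'town hall exhibition')
console.log(example)
```

A search box on the page would only need to call something like this with the user's terms and open the resulting URL.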

In addition to lineonetwo’s comments, Google has in the past provided a way to embed a search limited to your own site:

  • You may need to use the $:/tags/RawMarkupWikified/TopBody to define a script, then use a div to display the search box.

I will see if I can share an example tomorrow.

I don’t know your skills (do you program?) or your data, but if you already have this data in another reasonable format, then you might find my workflow useful: For several large wikis, I write simple JS modules to convert the old format to my desired tiddler format, and save the results as .tid files inside a Node wiki, under a specific folder either in tiddlers or plugins. As I build up those modules, I keep overwriting these folders, until I’m happy with the results. While I do this in JS/Node, if you’re a programmer, you should be able to do something similar in nearly any programming language.

I used to see a lot of this, although it seems to be disappearing. One place I still see it is at Search - FactCheck.org. They might serve as a useful model.

Scott,

thanks for your thoughts. I have been programming for over 40 years, but the problem is that I need to read each document to extract proper nouns. They are not in any other format. This takes time and time and time, and I am hoping that I can short-circuit things by providing searchable PDFs.

I have converted all PDFs so far (165 of them) to searchable format using PDF2Go.com, have registered the subdomain with Google, and will now try to play with searching their content.

bobj

4 posts were merged into an existing topic: Identifying proper nouns

Scott,

could you expand on your notion of developing .tid files as this sounds intriguing.

I assume once we have a tid file we can merely import it into a TW and it would create the tiddler structure, tags and all. Would this also update an existing tiddler so that if the skeleton was already in the TW, this would add missing content?

Can you provide an example of such a tid file?

bobj

It’s about bedtime for me, and I’ll try to write more tomorrow, but very briefly, .tid files are how a wiki is stored when running on Node. You’re right that they can easily be imported into any wiki, but in Node, these files are the tiddlers.

For instance, this tiddler: https://tiddlywiki.com/#Drag%20and%20Drop

Is stored directly in this format:

created: 20170328143119836
modified: 20170328173846754
tags: Features
title: Drag and Drop
type: text/vnd.tiddlywiki

~TiddlyWiki uses drag and drop to power two separate features:

* [[Importing Tiddlers]] into ~TiddlyWiki 
* Manipulating tiddlers within a ~TiddlyWiki 

Tiddler manipulation via drag and drop is supported by the core user interface in the following contexts:

* Entries in the "Open" tab of the sidebar can be reordered by drag and drop; new tiddlers can be opened by dragging their titles into the list
* Entries within a tag pill dropdown can be reordered by drag and drop; new tiddlers can be assigned the tag by dragging their titles into the list
* Entries in the [[control panel|$:/ControlPanel]] "Appearance"/"Toolbars" tab can be reordered by drag and drop. (Less usefully, new entries can be added to the toolbars by dragging their titles into the list)

All tiddler links are draggable by default. They can be dragged within a browser window for manipulating tiddlers, or dragged to a different browser window to initiate an [[import operation|Importing Tiddlers]]

If you want to drag a link, first move it vertically, because horizontal movement is recognized by the browser as text selection.

Tag pills are also draggable, and are equivalent to simultaneously dragging all of the individual tiddlers carrying the tag.

Some common scenarios for drag and drop tiddler manipulation are available as reusable macros:

* [[list-links-draggable Macro]] for reordering the entries in a tiddler ListField
* [[list-tagged-draggable Macro]] for reordering the tiddlers that carry a specified tag

See DragAndDropMechanism for details of how to use the low level drag and drop primitives to build more complex interactions.

The standard HTML 5 drag and drop APIs used by ~TiddlyWiki are not generally available on mobile browsers on smartphones or tablets. The [[Mobile Drag And Drop Shim Plugin]] adds an open source library that implements partial support on many mobile browsers, including iOS and Android.

in this file: https://github.com/Jermolene/TiddlyWiki5/blob/master/editions/tw5.com/tiddlers/features/Drag%20and%20Drop.tid

If your tiddlers are more data-oriented than text-oriented, then it might be simple to build them in a custom transformation from a raw data source, storing them in individual files.
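
To make that concrete, here is a rough sketch of such a transformation for a tiddler-per-document archive. The record shape, the field names (`people`, `places`, `pdf`) and the `Document` tag are all invented for illustration; only the `.tid` layout itself comes from the example above.

```javascript
// Hypothetical record for one archive document; the field names are
// illustrative and not taken from any real archive.
const doc = {
  title: 'Documents/0001',
  caption: 'Minutes of a committee meeting',
  people: ['Jane Smith', 'Alderman'],
  places: ['Central Street'],
  pdf: 'pdfs/0001.pdf',
}

// In a tags field, a tag containing spaces is wrapped in [[double brackets]].
const toTag = (t) => /\s/.test(t) ? `[[${t}]]` : t

// One "name: value" line per field, ending with a newline.
const toTid = (d) => [
  `title: ${d.title}`,
  `tags: ${['Document', ...d.people, ...d.places].map(toTag).join(' ')}`,
  `caption: ${d.caption}`,
  `pdf: ${d.pdf}`,
].join('\n') + '\n'

console.log(toTid(doc))
```

The string returned by `toTid` is what would be written to a single `.tid` file.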

I will try to expand more on this idea after I get some sleep.

I’m going to describe parts of how I built a wiki for my local political party. I think the analogies should be obvious. But if not, please ask.


Input Data

Among other things I want in my wiki is a list of voters, so we can query various things about them without going out to an external tool. In fact only a few of the intended users of this wiki have access to that tool, but I was able to use it to get an extract, which looks like this:

            RawData/VAN_Extract.txt

Voter File VANID|LastName|FirstName|MiddleName|Suffix|Sex|DOB|Age|Party|mAddress|mCity|mState|mZip5|mZip4|mAddressID|Address|City|State|Zip5|Zip4|AddressID|Preferred Phone
123|Bear|Yogi|Da||M|05/01/1958|66|U |123 Jellystone Park |Hanna Barbera|ST|12345|6789|294191234|123 Jellystone Park |Hanna Barbera|ST|12345|6789|294191234|8005551234
234|Rubble|Betty|Marie||F|11/01/1960|64|U |347 Cave Stone Rd |Hanna Barbera|ST|12345|6789|294191491|347 Cave Stone Rd |Hanna Barbera|ST|12345|6789|294191491|8005552345
345|Doo|Scooby|Joseph||M|05/01/1960|63|D |5 Mystery Machine Ave |Hanna Barbera|ST|12345|6789|294191899|5 Mystery Machine Ave |Hanna Barbera|ST|12345|6789|294191899| 
456|Flintstone|Fred|C||M|03/01/1960|64|R |345 Cave Stone Rd |Hanna Barbera|ST|12345|6789|294195505|345 Cave Stone Rd |Hanna Barbera|ST|12345|6789|294195505|8005553456
567|Flintstone|Wilma|J||F|03/01/1960|64|D |345 Cave Stone Rd |Hanna Barbera |ST|12345|6789|294190525|345 Cave Stone Rd |Hanna Barbera|ST|12345|6789|294195505|800055553456

(One row for each of the approximately 2,500 voters in my town.)

Okay, perhaps I changed the details to protect the innocent guilty.

I extracted all the addresses from there and used an online service to get latitude and longitude information for each address, and stored it alongside the above, in this format:

            RawData/LatLong.json

{
    "Hanna Barbera": {"latitude": -14.599400, "longitude": -28.673100},
    
    "123 Jellystone Park": {"latitude": -14.592123, "longitude": -28.654321}, 
    "345 Cave Stone Rd": {"latitude": -14.598888, "longitude": -28.684666},
    "347 Cave Stone Rd": {"latitude": -14.597777, "longitude": -28.684567}, 
    "5 Mystery Machine Ave": {"latitude": -14.600600, "longitude": -28.678901} 
}

(and the lat/long’s have been moved to the middle of the Atlantic to protect them :wink: )
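
Roughly, the lookup against that file works like the sketch below. The helper name `locate` is made up here, and I am assuming that the entry keyed by the town name serves as the fallback for any address that was not geocoded.

```javascript
// Subset of the sample RawData/LatLong.json shown above.
const latLong = {
  'Hanna Barbera': {latitude: -14.5994, longitude: -28.6731},
  '345 Cave Stone Rd': {latitude: -14.598888, longitude: -28.684666},
}

// Strip any apartment suffix, then fall back to the town's coordinates
// (an assumption of this sketch) when the street address is not in the file.
const locate = (address, town = 'Hanna Barbera') =>
  latLong[address.replace(/ (?:Apt\.?|#).*$/i, '')] || latLong[town]

console.log(locate('345 Cave Stone Rd Apt 2'))
console.log(locate('999 Unknown Ln'))
```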

Conversion Process

I wrote some Node.js code to parse these two files and convert to the tiddler .tid format, one for each Voter, one for each Address. The specific code is probably not very helpful, but it’s available if you want to see it:

Code

            scripts/buildContent.js

const {writeFile, mkdir, rm, readFile: rf} = require ('fs/promises')
const tap = (fn) => (x) => ((fn (x)), x)
const map = (fn) => (xs) => xs .map (x => fn (x))
const call = (fn, ...args) => fn (...args)
const display = msg => tap (() => console .log (msg))
const allPromises = (ps) => Promise .all (ps)
const readFile = (filename) => () => rf(filename, 'utf8')

const main = (fileName, latLong) =>
  deleteOutputDirs()   // ensure there's no detritus from previous runs
    .then (createOutputDirs)
    .then (readFile(fileName))
    .then (delay(500))
    .then (display ('Built directories'))
    .then (psv2arr)
    .then (handleVoters)
    .then (handleAddresses(latLong))
    .then (() => console .log ('Completed!'))
    .catch (console .warn)

const deleteOutputDirs = () => 
    rm ('./tiddlers/HannaBarbera/Voters', {force: true, recursive: true})
    .then (() => rm ('./tiddlers/HannaBarbera/Addresses', {force: true, recursive: true}))
  
const createOutputDirs = () =>
  mkdir ('./tiddlers/HannaBarbera/Addresses', {recursive: true})
  .then (() => mkdir ('./tiddlers/HannaBarbera/Voters', {recursive: true}))

const delay = (t) => (v) => new Promise (r => setTimeout(() => r(v), t))

const psv2arr = ( 
  psv, [headers, ...rows] = psv.split('\n').filter(Boolean).map((r => r.split('|')))
) => rows.map((r) => Object.fromEntries(r.map((c, i) => [headers[i], c.trim()])))

const handleVoters = (rs) => Promise.resolve(rs)
  .then (map(getOverview))
  .then (map(writeTiddler))
  .then (allPromises)
  .then (tap (ps => console .log (`Wrote ${ps.length} Voter tiddlers`)))
  .then (() => rs)

const getOverview = (r) => [
  `./tiddlers/HannaBarbera/Voters/van-${r['Voter File VANID']}-${r.FirstName}_${r.LastName}.tid`,
  convertPerson(r)
]

const convertPerson = r => `title: Voters/${r['Voter File VANID']}
tags: Voter
caption: Voters/${r.FirstName + ' ' + r.LastName + (r.Suffix ? (' ' + r.Suffix) : '')}
first-name: ${r.FirstName}
last-name: ${r.LastName}
middle-name: ${r.MiddleName}
suffix: ${r.Suffix || ''}
full-name: ${r.FirstName + ' ' + r.LastName + (r.Suffix ? (' ' + r.Suffix) : '')}
gender: ${r.Sex}
age: ${r.Age}
party: ${getParty(r.Party)}
phone: ${makePhone(r['Preferred Phone'])}
address: ${r.Address}
`

const makePhone = (p) => p
  ? `${p.slice(0, 3)}-${p.slice(3, 6)}-${p.slice(6, 10)}`
  : ''
 
  
const getParty = (p) => ({
  'U': '', 'D': 'Democratic', 'R': 'Republican', 
  'I': 'Independent', 'G': 'Green', 'L': 'Libertarian',
}) [p] || ''  // unknown codes give an empty field rather than "undefined"

const handleAddresses = (latLong) => (rs) => 
  Promise.resolve(rs)
    .then (convertAddresses(latLong))
    .then (map(writeTiddler))
    .then (allPromises)
    .then (tap (ps => console .log (`Wrote ${ps.length} Address tiddlers`)))
    .then (() => rs)

const convertAddresses = (latLong) => (rs, loc) => Object .entries (Object .fromEntries (rs.map (r => [ // `entries` dance for uniqueness
`./tiddlers/HannaBarbera/Addresses/${r.Address.replace(/\s/g, '_')}.tid`,

`title: Address/${r.Address}
tags: Address
caption: Address/${r.Address}
address: ${r.Address}
street-number: ${r.Address.split(' ')[0]}
street: ${r.Address.split(' ').slice(1).join(' ').replace(/ (?:Apt.?|#).*$/i, '')}
${addApt(r.Address)
}city: ${r.City}
state: ${r.State}
zip5: ${r.Zip5}
zip4: ${r.Zip4}
${(
  loc = latLong[r.Address.replace(/ (?:Apt.?|#).*$/i, '')] || latLong['Hanna Barbera'], // fall back to the town's own entry
`lat: ${loc.latitude}
long: ${loc.longitude}
alt: 0`
)}`
])))

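// Same apartment pattern as the street/lat-long regexes above ("Apt", "Apt." or "#", any case)
const addApt = (a, m = a.match(/ (?:Apt\.?|#)\s*(.*)$/i)) => m ? `apt: ${m[1]}
` : ''  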

const writeTiddler = ([fileName, content]) => writeFile (fileName, content, 'utf8')

   
main ('./RawData/VAN_Extract.txt', require('../RawData/LatLong.json'))

And I run this code with node scripts/buildContent
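
If the parsing step is the unclear part, here is the same idea as `psv2arr` written out more slowly, on a three-column sample. The name `psvToRecords` is just for this sketch, and it additionally trims the header cells, which is my own tweak so that Windows line endings do not leave a stray `\r` on the last column name.

```javascript
// The first line supplies the keys; every later line becomes one object.
// Cells and headers are trimmed, which also removes a trailing '\r'.
const psvToRecords = (psv) => {
  const [headers, ...rows] = psv
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => line.split('|').map((cell) => cell.trim()))
  return rows.map((row) =>
    Object.fromEntries(row.map((cell, i) => [headers[i], cell])))
}

// Three columns only, to keep the example short.
const sample = 'Voter File VANID|LastName|Party\r\n123|Bear|U \r\n345|Doo|D \r\n'
const records = psvToRecords(sample)
console.log(records)
```

Each resulting object is keyed by the header names, which is why the script can refer to `r.LastName` or `r['Voter File VANID']`.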

Tiddler format

This reasonably simple script creates files that look like this:

            tiddlers/HannaBarbera/Voters/van-123-Yogi_Bear.tid

title: Voters/123
tags: Voter
caption: Voters/Yogi Bear
first-name: Yogi
last-name: Bear
middle-name: Da
suffix: 
full-name: Yogi Bear
gender: M
age: 66
party: 
phone: 800-555-1234
address: 123 Jellystone Park

and like this:

            tiddlers/HannaBarbera/Addresses/5_Mystery_Machine_Ave.tid

title: Address/5 Mystery Machine Ave
tags: Address
caption: Address/5 Mystery Machine Ave
address: 5 Mystery Machine Ave
street-number: 5
street: Mystery Machine Ave
city: Hanna Barbera
state: ST
zip5: 12345
zip4: 6789
lat: -14.6006
long: -28.678901
alt: 0

This is the .tid format. Fields are given in lines like field-name: Field value. title is a required TW field. tags and caption are very common ones. The others in these samples are specific to my wiki. Note that here I don’t include a text field in these tiddlers, but if you want one, it simply appears last, and without any key, separated from the other fields by a blank line:

title: My Tiddler
tags: Demo [[This is temporary]]

And here we begin the multi-line
`text` field, with //whatever// wikitext
we choose.
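
Going the other way, a small reader for this layout might look like the following. It is only a sketch of the format as described here, not TiddlyWiki's own parser, and it assumes every header line has the `name: value` shape and that lines end in `\n`.

```javascript
// Split a .tid string into its header fields and its text body.
// Everything before the first blank line is treated as "name: value" fields;
// everything after it is the text field.
const parseTid = (source) => {
  const blank = source.indexOf('\n\n')
  const header = blank === -1 ? source : source.slice(0, blank)
  const text = blank === -1 ? '' : source.slice(blank + 2)
  const fields = Object.fromEntries(
    header.split('\n').filter(Boolean).map((line) => {
      const colon = line.indexOf(':')  // only the first colon separates name and value
      return [line.slice(0, colon).trim(), line.slice(colon + 1).trim()]
    })
  )
  return {...fields, text}
}

const tiddler = parseTid(
  'title: My Tiddler\ntags: Demo [[This is temporary]]\n\nAnd here we begin\nthe text field.\n'
)
console.log(tiddler.title)
console.log(tiddler.tags)
console.log(tiddler.text)
```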

Folder Structure

Note that these files get dropped into subfolders inside the tiddlers folder. If you’re running in Node, all tiddlers live inside tiddlers, and the internal folder structure is simply a convenience. They all end up in TW’s flat namespace – based on internal title and not the file name. But if you’re not running in Node, these files can still be dragged and imported into any wiki.
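
Since the file name is only a convenience, any scheme that yields a legal file name will do. The helper below is a hypothetical one, not what TiddlyWiki itself uses when it saves tiddlers; it just shows that the title inside the file and the name of the file can differ freely.

```javascript
const path = require('node:path')

// Hypothetical helper: turn a tiddler title into a file-system-safe name
// by replacing reserved characters and collapsing whitespace.
const fileNameFor = (title) =>
  title.replace(/[\\\/:*?"<>|]/g, '-').replace(/\s+/g, '_') + '.tid'

// posix.join keeps forward slashes regardless of platform.
const target = path.posix.join(
  'tiddlers', 'HannaBarbera', 'Addresses',
  fileNameFor('Address/5 Mystery Machine Ave')
)
console.log(target)
```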

(In actual practice, I don’t drop these in the tiddlers folder, but in the plugins one instead. That treats them essentially as data-only tiddlers. But it’s not important here.)

Now by adding various templates and cascade options, I can view and search this data in an interconnected manner. You can see it in action at http://scott.sauyet.com/Tiddlywiki/Demo/HannaBarbera.html.


So that’s my practice. I’ve done similar things on a number of wikis. I don’t know if you can get your data into good enough shape for such an automated conversion. I have my doubts, seeing the text generated by pdf2go, but there’s much I don’t know. In any case, I find this a useful way to initiate data-heavy wikis, based on an external source.

Scott,

Thank you for the explanation. I will have to think about how I might utilise this approach.

Bobj