Chinese, Japanese, Korean Character Lookup Database Project

AlfieA · August 18, 2025, 2:33am

Thank you. Yes. I’ll be adding jp-en dictionary as well. In addition to that, there will be vocabulary lists on topics which in turn are tagged.

While I’m here, and so I don’t waste your precious time, I’ll give you a few more examples: 種 (tane = kind), which is on my beer can as I write (3 kinds of hops; non-alcoholic, if you’re wondering); 錠 (jou = tablet), from the vitamin C tablets on my cabinet; and 都 (miyako = metropolis), my favourite kanji (for reasons I don’t really know; maybe shape and sound?).

Can you try to guess what the codes would be? 種 has 14 strokes. 錠 16 and 都 11.

With 種, the first stroke starts at 2. The last stroke, at the bottom, starts at 8. So it would be 2814jo. 錠 is 2516jo and 都 is 5511mi. I know the last one was difficult.

Ste_W · August 18, 2025, 6:37am

@Charlie_Veniot might be worth reaching out to for a chat, as he’s hammering away at a French-Acadian TW dialect project (hope I got that right) and might have some useful thoughts or solutions.

(Edit: spelling corrected.)

AlfieA · August 18, 2025, 10:33am

Yes. I am certainly aware of his work. Acadian. Especially in regards to tagging words. Words as tiddlers. Phrases as tiddlers. So many questions.

Springer · August 19, 2025, 1:12am

Both @etardiff here and I have some significant study of Japanese in our experience (maybe others as well?)

I’m intrigued by the idea, and it sounds daunting!

It seems the first numeric part of your system (“2402” in your sample) has the advantage of being potentially intelligible to someone who does not know the readings (and to learners coming from a Chinese-language background, whose guesses at Japanese on-readings are hit-and-miss). The only challenges would be around strokes that may seem positionally ambiguous. Maybe someone looks at the character for the number 9, thinks the second stroke seems to start at the line between the top third and the middle third, and guesses “wrong”. Indeed, where would we put 井? Err towards the middle third, if the stroke doesn’t court the edge? This number system, if it could be intuitive, would bypass needing to be confident with the radical system (the traditional dictionary approach). Of course, someone would still need to be confident with stroke direction norms, so the numeric part would not be entirely beginner-proof…

Having to categorize by reading (on-yomi?), though, seems to be a potentially rougher path. Perhaps you have a cascade of ideas about what trumps what (on-yomi over kun-yomi, more common readings over less common?).

At any rate, the power of a TiddlyWiki for info about characters (including compounds, historical variants, cross-references, fuzzy-logic connections, etc.) is enormous, and I wish you great luck with this project!

(And one beautiful thing about using TiddlyWiki like this is that you could easily superimpose index systems, so that the same character could be identified with your new system and also through radical-based lookup, or Joyo numbers, and/or other reference works you’re inclined to incorporate…)

Have you already gotten your yomigana integration set up for your TiddlyWiki resource? (Ruby markup, or…?)

AlfieA · August 19, 2025, 3:43am

Thank you. Yes, the idea is to create a system that even people with limited knowledge could navigate.

On-yomi, kun-yomi. Thank you for pointing that out. My answer to that is that there would be multiple entries for the same kanji. That’s the beauty of TW!

九 kyuu, ku, kokono are the three readings for “nine”. It follows that there would be three entries: 2402ky, 2402ku and 2402ko respectively.
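As a rough illustration of the "one entry per reading" idea, here is a minimal Python sketch. The function name `entry_codes` and its parameters are made up for this example; it simply assumes the code is the first-stroke digit, the last-stroke digit, a two-digit stroke count, and the first two letters of the romanised reading.

```python
def entry_codes(first: int, last: int, strokes: int, readings: list[str]) -> list[str]:
    """Build one six-character code per reading (hypothetical helper).

    first/last: grid digits (1-9) for where the first and last strokes start.
    strokes:    stroke count, written as two digits (2 -> "02").
    readings:   romanised readings; only the first two letters are used.
    """
    prefix = f"{first}{last}{strokes:02d}"
    return [prefix + reading[:2].lower() for reading in readings]


# 九: first stroke starts at 2, last stroke starts at 4, 2 strokes in total.
print(entry_codes(2, 4, 2, ["kyuu", "ku", "kokono"]))
```

Each returned string would become its own entry, so the same kanji can be found under any of its readings.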

There are other readings for 九: 九十九 (tsukumo), 九十九折り (tsudzuraori) and やつがしら (yatsugashira). These are beyond the scope of my initial set of joyo (daily-use) kanji, but their entries would be 2402ts, 2402ts and 2402ya.

There is a second code I did not mention. It is built the same way as the first code, from the first and last strokes, but it answers a different question: where to?

So can I explain the first code a little more…

As I mentioned, it is a 3x3 grid. But imagine that grid as having the 5 (middle of the grid) taking up everything except for the fringes:

12223
45556
78889

So, unless the stroke starts at a very edge or corner, it will be counted as 5.
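The fringe rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `start_cell`, the normalised coordinates (0 to 1, with y growing downwards) and the fringe width `edge=0.1` are all invented for the example and would need tuning against real characters.

```python
def start_cell(x: float, y: float, edge: float = 0.1) -> int:
    """Return the grid digit (1-9) for the point where a stroke starts.

    x and y are assumed to be normalised to the character's bounding box:
    (0, 0) is the top-left corner and (1, 1) is the bottom-right corner.
    `edge` is a hypothetical fringe width; anything not inside a fringe is 5.
    """
    def band(v: float) -> int:
        # 0 = inside the low fringe, 2 = inside the high fringe, 1 = wide middle
        if v < edge:
            return 0
        if v > 1 - edge:
            return 2
        return 1

    return band(y) * 3 + band(x) + 1


print(start_cell(0.5, 0.5))    # a stroke starting well inside the character
print(start_cell(0.5, 0.02))   # a stroke starting on the top edge
```

Making the fringe narrow is what keeps most starting points in cell 5; only strokes that really hug an edge or a corner get another digit.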

Now for the second code; the “where to”. That is where the stroke is directed from a “centre” of 5.

九. The last stroke’s final direction is up. That is, from the centre position of 5: 2.

That grid could be thought of as the inverse of the first grid mentioned, in that the outer parts make up the majority of the grid. So:

1112333
4445666
7778999

Actually, in this case a “5” would almost certainly not be possible.
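One way to picture the "where to" grid is as a direction lookup. The Python sketch below is an assumption-laden illustration, not the project's method: `direction_digit`, the direction vector `(dx, dy)` with negative `dy` meaning "up", and the tolerance `tol=0.2` are all hypothetical.

```python
def direction_digit(dx: float, dy: float, tol: float = 0.2) -> int:
    """Return the digit (1-9) a stroke is heading towards, seen from a centre of 5.

    (dx, dy) is the stroke's final direction; dx > 0 is rightwards and
    dy < 0 is upwards (screen-style coordinates, an assumption).
    `tol` is a hypothetical narrow band treated as "no movement" on that axis,
    which is why the outer cells cover most directions.
    """
    def band(v: float) -> int:
        # 0 = clearly negative, 2 = clearly positive, 1 = roughly no movement
        if v < -tol:
            return 0
        if v > tol:
            return 2
        return 1

    return band(dy) * 3 + band(dx) + 1


print(direction_digit(0.0, -1.0))  # a stroke that finishes heading straight up
```

A result of 5 would only come back for a stroke with no discernible direction, which matches the remark that a "5" should almost never occur here.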

I know the second code is a little challenging to take in, but it gives the kanji a “fingerprint”. With TW, you would not need the “on” and “kun” yomi (“readings”). No need for ruby, either.

Ambiguous kanji. Yes! Great point. 井. Great kanji. My wife’s family name! My favourite kanji?? But a great example you put out there. What would it be? Well, the first stroke is the top horizontal. It is clearly shorter than the second horizontal. It would be 5. Of course, the last vertical stroke would be coming from the top and would be 2. So: 52. 5204ii. Oh, I doubled the “i” because the reading is just “i”, and I need 6 characters for consistency. The second code, the “where to”, would be 68. So the whole code, the “kanji fingerprint”, would be 5204ii-6804ii. In fact, the entry in the TW dictionary (I’m calling it the “Hexidecimal Dictionary”) would be: 5204ii-6804ii–井–二–せい–しょう–い–sei–shou–i–well. You’ll have to guess that well means a water well. My wife’s name means “above the water well”. How romantic is that?
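To make the assembly of the fingerprint concrete, here is a small Python sketch. All names (`pad_reading`, `six_char_code`, `fingerprint`, `entry_title`) are hypothetical, and the separator is simply copied from the entry as it appears in this post; the real dictionary may well build its titles differently.

```python
def pad_reading(reading: str) -> str:
    """First two letters of a reading; a one-letter reading is doubled ("i" -> "ii")."""
    if not reading:
        raise ValueError("reading must not be empty")
    prefix = reading[:2].lower()
    return prefix * 2 if len(prefix) == 1 else prefix


def six_char_code(first: int, last: int, strokes: int, reading: str) -> str:
    """Two grid digits, a two-digit stroke count, and a two-letter reading."""
    return f"{first}{last}{strokes:02d}{pad_reading(reading)}"


def fingerprint(start: tuple[int, int], where_to: tuple[int, int],
                strokes: int, reading: str) -> str:
    """Join the 'where from' code and the 'where to' code with a hyphen."""
    return (six_char_code(*start, strokes, reading) + "-"
            + six_char_code(*where_to, strokes, reading))


def entry_title(fp: str, fields: list[str], sep: str = "–") -> str:
    """Append the remaining fields (kanji, readings, meaning) to the fingerprint."""
    return sep.join([fp, *fields])


fp = fingerprint((5, 2), (6, 8), 4, "i")
print(entry_title(fp, ["井", "二", "せい", "しょう", "い", "sei", "shou", "i", "well"]))
```

Keeping the padding rule in one helper means every code comes out at exactly six characters, whatever the length of the reading.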

Thank you for the question, Springer. Important points you made!

Alfie

mlp · August 17, 2025, 11:04pm

I speak Japanese for work, and I’m very interested! That seems cool! Can it be a sort of en-nihongo (and vice-versa) dictionary too?