
Friday, October 10, 2008

Kanjimemo brainstorming and input request

漢字メモ?

[Text above added to try to attract some attention to this post]


Ever since I posted about Kanamemo, there have been quite a few requests for a "Kanjimemo", a tool based on the same idea, but for Kanji.

Even before, I had considered writing something like that... But I never started because I couldn't quite figure out all the details of how it'd work. In this post, I'll talk about some of the ideas that I had for it. If you're interested in a "Kanjimemo", please leave your feedback and suggestions in the comments!

Programming Language

First of all, I'm not sure which programming language to write it in. At first, I considered C++, since that would be the easiest for me and allow the maximum flexibility, at least as far as PCs are concerned. The problem is that I'm already fairly experienced with C++, and so it wouldn't be much of a learning experience (which is always a plus :)) unless I went for Direct3D.

Then I pondered about Java: with all the cell phones supporting J2ME, it seemed like a good idea - Kanjimemo on the go? Great! The real problem came when I realized that J2ME is *REALLY* limited - you often have less than 1 MB of heap memory available (!) for your application, which makes a program like Kanjimemo almost impossible to implement. I also lack a J2ME-enabled cell phone, so I couldn't even work on a J2ME port right away.

A few other languages crossed my mind. C# is something that I've always wanted to learn, but its cross-platform support is quite bad (I'm looking at you, Mono). It's also much slower than Java. Python is another "to learn" language, but I question the sanity of doing complicated data analysis in such a high-level, slow language... Plus all the horrible dependencies. Same goes for Ruby.

So, any thoughts on the "language barrier" might be useful.

Basics

On to how the program would ACTUALLY work... Learning kanji is nowhere near as easy as learning kana. The problem with kanji is that most of them have multiple (typically two) readings, depending on the word... but some (like 日, one of the most basic kanji) can have many more. So my idea is to have an algorithm that works like this:

  1. Select a group of five or so kanji for each level (like Kanamemo)
  2. Mine EDICT for all words marked as [Common] that use each of those kanji
  3. Perhaps attempt to extract the pronunciation of the kanji within each word? If that doesn't work, just go with individual words.
  4. Create a list of all the different unique pronunciations and associated words.
  5. Have the user learn all the unique pronunciations, preferably by using words that contain nothing but that kanji and kana.
  6. If there's no word with that kanji by itself, make sure that the user already "learned" all the other kanji in the word displayed.
Of course, steps 3 and 6 might be very tricky to code. All of this will require mining data from EDICT and possibly KANJIDIC. If it becomes necessary, I might use a SQLite database to store this information.
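As a rough illustration of steps 2 and 4, here is a minimal Python sketch. It assumes the EDICT line layout `WORD [READING] /gloss/.../` and that common entries carry a `(P)` marker; the sample lines, `SAMPLE_EDICT`, `common_words` and `words_by_kanji` are all made up for this example, and it does not attempt step 3 (splitting a reading per kanji).

```python
import re
from collections import defaultdict

# Hypothetical sample in the assumed EDICT layout: WORD [READING] /gloss/.../
# "(P)" is assumed to be the marker for common entries.
SAMPLE_EDICT = """\
日 [ひ] /(n) day/sun/(P)/
日本 [にほん] /(n) Japan/(P)/
毎日 [まいにち] /(n) every day/(P)/
日銀 [にちぎん] /(n) Bank of Japan/
本 [ほん] /(n) book/(P)/
"""

ENTRY_RE = re.compile(r"^(\S+) \[([^\]]+)\] /(.*)/$")


def common_words(edict_text):
    """Yield (word, reading) for every entry flagged as common."""
    for line in edict_text.splitlines():
        match = ENTRY_RE.match(line)
        if match and "(P)" in match.group(3):
            yield match.group(1), match.group(2)


def words_by_kanji(edict_text, kanji_group):
    """Map each kanji of the group to the common words that contain it."""
    result = defaultdict(list)
    for word, reading in common_words(edict_text):
        for kanji in kanji_group:
            if kanji in word:
                result[kanji].append((word, reading))
    return dict(result)


table = words_by_kanji(SAMPLE_EDICT, ["日", "本"])
```

Words made of a single kanji (like 日 read as ひ) give a reading directly; for compounds, the whole-word reading is all this sketch keeps.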

Progression
Progression would work similarly to Kanamemo, with a new set of 5 kanji unlocked with each memorized set. Ideally, the user could choose profiles to control the new kanji: perhaps follow the JLPT progression, or the Japanese school system progression, or how common a given kanji is, or a combination of them (e.g. start with all JLPT4 kanji sorted by frequency, then all JLPT3 sorted by frequency, etc). The user should also be able to customize the list of kanji that they want to learn.
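The "JLPT level first, then frequency" ordering could be sketched like this in Python. The `KanjiInfo` record, the level numbers and the frequency ranks below are placeholders invented for the example, not real data.

```python
from typing import NamedTuple


class KanjiInfo(NamedTuple):
    char: str
    jlpt: int       # 4 = easiest level in the four-level JLPT
    freq_rank: int  # 1 = most frequent (made-up ranks for illustration)


def jlpt_then_frequency(kanji):
    """Easiest JLPT level first; within a level, most frequent first."""
    return sorted(kanji, key=lambda k: (-k.jlpt, k.freq_rank))


def batches(ordered, size=5):
    """Split the ordered list into sets that get unlocked one at a time."""
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


sample = [
    KanjiInfo("会", 3, 4),
    KanjiInfo("人", 4, 5),
    KanjiInfo("日", 4, 1),
    KanjiInfo("一", 4, 2),
]
ordered = jlpt_then_frequency(sample)
levels = batches(ordered, size=2)
```

Other profiles (school grade, plain frequency, a custom list) would only need a different sort key or a fixed order.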

Given this system, it'd be possible to simply consider kana as being kanji, and have the program work in the same way for those, so you'd be entering actual Japanese words when learning kana. This has the advantage of making your Japanese reading skill progress.

Multiple fonts
One problem that I noticed with Kanamemo is that it was easy to just memorize the font glyph, as opposed to the more abstract shape of the kana. This could prove to be an issue with kana that are very different depending on how they're written (such as さ and ふ). This program would fix that problem by using different types of fonts (cursive, brush, type) randomly, or perhaps by forcing you to learn all the different variations before progressing.

Translation
Since the concept of the program is word-focused, it might feel strange to be learning how to read words without learning what they mean. If you're an anime watcher, then perhaps you already have a relatively big vocabulary of words, but you won't know all of them, and not everyone is an anime watcher. EDICT provides translations, but I'm not sure if just slapping the translations there will do any good... Thoughts on this?

Voice
Finally, it might be useful to have someone read the words out loud for you whenever you get them right. I'm not sure how hard it would be to add support for some third-party voice synthesizer, but it might be worth the trouble.

Other ideas
Perhaps the program should be designed to look more like a game? A little mascot cheering for you, a scrolling background, some background music? Perhaps this game could have multiple "stages" that you would do in alternating order: First learn to read the kanji, then what the word means, then perhaps a speed typing test? Maybe even a grammar test mode?

Development
Of course, what this needs the most right now are IDEAS! If you have any, please share them with us. If you know of somebody who might be interested in this sort of thing, link them to this page! If you want to help with the development itself, drop by IRC and let us know. The idea is that this should be an open, free project.

26 comments:

  1. You could do it in C#/Mono with a GTK# user interface. This would allow it to run on all platforms supported by GTK and Mono, and you'd probably learn a lot. Mono 2.0 was recently released and it supposedly offers a ton of improvements.

  2. The problem is that GTK is quite horrible under Windows.

  3. wxWidgets can also be used with Mono/C# : http://wxnet.sourceforge.net/

  4. I've been looking for such a program for a while now, never found one I really like.

    I think it's really important to allow the user to enter a custom list of kanji since every single book or course has its own order.

    Turning the whole thing into a game may give extra motivation to some, but others might want to study kanji while doing something else (like watching a so-so anime series or pretending to be earning their living) and will prefer a very simple interface. Having two different programs using the same progress data might be a solution.

    About the translations, I believe they must be learned too, but kanji usually convey a general idea or two, and their meaning changes when they're used in different verbs/nouns... Also, when asked for the translation of a word, you might think of a synonym of the 'answer' and will end up being very frustrated... Querying a synonym dictionary is probably overkill though. Just having a few translations displayed when a correct reading is entered might be enough to actually remember the meaning of the character/word.

    Special readings are another frustrating thing: you may know 今 and 日 perfectly, but that doesn't mean you can write "kyou". Having the program ask for special readings when you know the two or three kanji composing the word is important, I believe.

    Another interesting thing to do is making sure the learner doesn't mix up kanji that kinda look the same (牛 午 and 年 for example) by increasing the probability that known kanji similar to an unknown one appear. For example, if you failed on 午, 年 and 牛 are very likely to appear in the next few draws (though it probably wouldn't be fair to decrease your score for the "look-alike" kanji). To get this working you'd need to use some kind of OCR, and I'm not sure whether it's worth the extra dependencies or not. (I'm not even sure how well OCR works with kanji.)
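A minimal Python sketch of that boosted draw, assuming the look-alike groups come from a hand-made table instead of OCR. `LOOKALIKES`, `draw_weights`, `draw` and the boost factor of 5 are all hypothetical.

```python
import random

# Hypothetical hand-made table of look-alike kanji; how to build it
# automatically is exactly the open question raised in the comment.
LOOKALIKES = {
    "午": {"牛", "年"},
    "牛": {"午", "年"},
    "年": {"午", "牛"},
}


def draw_weights(known, last_failed=None, boost=5.0):
    """Give every known kanji weight 1, boosted if it resembles the last failure."""
    weights = {k: 1.0 for k in known}
    for similar in LOOKALIKES.get(last_failed, ()):
        if similar in weights:
            weights[similar] *= boost
    return weights


def draw(known, last_failed=None, rng=random):
    """Pick the next kanji to show, favouring look-alikes of the last failure."""
    weights = draw_weights(known, last_failed)
    chars = list(weights)
    return rng.choices(chars, weights=[weights[c] for c in chars], k=1)[0]


weights_after_fail = draw_weights(["牛", "年", "日"], last_failed="午")
```

The boost only changes how often the look-alikes come up; it does not touch their score, in line with the fairness remark above.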

    Yet another aspect of learning kanji is learning to write them, in the right order and with the correct number of strokes, but once again, it requires some kind of visual recognition and is probably not worth the effort since most people only want to be able to *read* kanji.

    Sorry for the wall of text >_>;

  5. Thanks for the feedback. On the "mixup" thing, Kanamemo implemented the following algorithm: every kana starts at 0 points. You must reach 3 points to "learn" it and 10 to "master" it. When you learn all on a given level, you get another batch of 5 kana, and the better you know a kana, the less it will show up in the future. When you mix up two (say that it shows さ and you enter "chi"), it deducts 10 points from both. I found this to be very effective as it forced the "problem" kana to show up a lot, both of them, ensuring that I learned the difference between them.

    For kanji, this would be a lot trickier, since there would be no easy way to tell WHICH kanji the user mistook it for, but it's something to consider...
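The Kanamemo scoring described above might look roughly like this in Python. The thresholds (3 and 10) and the 10-point mix-up penalty come from the description; the +1 per correct answer, the penalty for a wrong answer that matches no other kana, and letting scores go negative are assumptions made for this sketch.

```python
LEARNED, MASTERED, MIXUP_PENALTY = 3, 10, 10


class Scores:
    def __init__(self, kana_to_romaji):
        self.romaji = dict(kana_to_romaji)
        self.points = {kana: 0 for kana in self.romaji}

    def answer(self, shown, typed):
        """Record one answer; return True if it was correct."""
        if typed == self.romaji[shown]:
            self.points[shown] += 1  # assumed: +1 per correct answer
            return True
        # Assumed: the shown kana is always penalised on a wrong answer.
        self.points[shown] -= MIXUP_PENALTY
        # If the typed reading belongs to another kana, penalise that one too.
        for kana, romaji in self.romaji.items():
            if romaji == typed and kana != shown:
                self.points[kana] -= MIXUP_PENALTY
        return False

    def is_learned(self, kana):
        return self.points[kana] >= LEARNED

    def is_mastered(self, kana):
        return self.points[kana] >= MASTERED

    def level_complete(self):
        """True once every kana is learned, i.e. time for the next batch of 5."""
        return all(self.is_learned(kana) for kana in self.points)


scores = Scores({"さ": "sa", "ち": "chi"})
for _ in range(3):
    scores.answer("さ", "sa")
learned_before_mixup = scores.is_learned("さ")
scores.answer("さ", "chi")  # mix-up: both kana lose 10 points
```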

  6. Oh, and about using wx for C#, I don't think that I even want something with a user interface similar to that of traditional apps... besides, since I already know wx, I'd be tempted to just use C++ for that. :)

  7. For programming languages, well, it depends on your goal and you didn't make this clear. Does it need to be something new? It seems to me like C++ would be best suited for the program, mostly for performance reasons. If traydict is any indicator, searching through EDICT can be pretty slow.

    Step 3 does seem like it would be difficult. I have a few ideas, but I'd need a giant pile of tests to see if they actually work.

    Why is 6 hard? You have your new kanji, the word you are checking, and a list of kanji the user has learned. Check that all kanji except your new kanji are in the learned list. Maybe I am missing something?
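The check described in this comment is short to write down. Here is a Python sketch, assuming "kanji" means any character in the main CJK Unified Ideographs block (so kana are skipped automatically); `is_kanji` and `is_eligible` are names made up for the example.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block.

    Rarer kanji in extension blocks are ignored in this sketch.
    """
    return "\u4e00" <= ch <= "\u9fff"


def is_eligible(word, new_kanji, learned):
    """A word can be shown if every kanji in it, except the new one, is learned."""
    others = {ch for ch in word if is_kanji(ch) and ch != new_kanji}
    return new_kanji in word and others <= set(learned)
```

For example, `is_eligible("毎日", "日", {"毎"})` is true, while the same call with an empty learned set is false.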

    I might be interested in contributing, give me a poke if you get anything started.

  8. Well, six might be tricky because it might be that a kanji "up next" in the list doesn't show up in any words that don't involve more advanced kanji, and we can even get a deadlock here: It's possible that kanji A only shows up in combination with kanji B, and vice-versa. If that does happen, we'd be forced to teach both at once.
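One way to handle that case, sketched in Python: for the next kanji in line, look for the word that needs the fewest still-unknown kanji and introduce all of them together. `next_batch` and the word lists are hypothetical, and this greedy choice is only one of several possible strategies.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def next_batch(target, words, learned):
    """Return the smallest set of kanji that must be taught together with target.

    If some word needs only the target itself, the result is {target};
    in the A/B deadlock case it contains both kanji.
    """
    best = None
    for word in words:
        if target not in word:
            continue
        needed = {ch for ch in word if is_kanji(ch)} - set(learned)
        if best is None or len(needed) < len(best):
            best = needed
    # No word contains the target at all: fall back to teaching it alone.
    return best if best is not None else {target}


alone = next_batch("日", ["日", "毎日"], learned=set())
paired = next_batch("挨", ["挨拶"], learned=set())
```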

    For performance, it's likely that all relevant data would be precomputed (and perhaps stored in a SQLite database?), so I don't think that it will matter much.
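A small sketch of what "precomputed and stored in SQLite" could mean, using Python's standard `sqlite3` module. The `word_kanji` table, its columns and the helper names are invented for this example.

```python
import sqlite3


def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_db(entries, path=":memory:"):
    """Store one (kanji, word, reading) row per distinct kanji of each word."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_kanji (kanji TEXT, word TEXT, reading TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_kanji ON word_kanji (kanji)")
    rows = [
        (ch, word, reading)
        for word, reading in entries
        for ch in set(word)
        if is_kanji(ch)
    ]
    conn.executemany("INSERT INTO word_kanji VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def words_for(conn, kanji):
    """Look up every stored word containing the given kanji."""
    cur = conn.execute(
        "SELECT word, reading FROM word_kanji WHERE kanji = ? ORDER BY word",
        (kanji,),
    )
    return cur.fetchall()


conn = build_db([("日本", "にほん"), ("毎日", "まいにち")])
```

With the index in place, the lookup per kanji no longer requires scanning the whole dictionary at run time.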

  9. Maybe I should point out that I'm now considering using Python for this.

  10. I've decided to just stop trying to do things the hard way and go with C++ (if that statement sounds contradictory to you, you should take into account the fact that I'm insane).

  11. <--- slow

    Anyway, for different displays, I'd suggest using the 6 "main" font styles, "Gothic", "Kaisho", "MaruGothic", "Mincho", "Pop" and "Textbook".

    You can see the differences in hiragana on the following page.

    http://simnet.is/benegaut/InstantzJP/shape.html

    Also, I suggest that you set up options for how many points the user wants to have to get to "Master" a kana/kanji. Also, an import function for different setups of kanji. You could have your system of 5 at a time, an option to have them appear in the JLPT order and in the order of James W. Heisig (See: http://kanji.koohii.com/ ) -- And an ability/function for users to build these themselves. (Maybe some place where you can get more/different settings?)

    Just a couple things that I think would be great to include. :3

  12. Thanks, I'm sure that site will be handy. :)

    Being able to customize your own Kanji list is pretty much a given, and will be one of the first features.

  13. The main point of that was for it to be exportable and shareable, in case other people want it. (Maybe someone would make one for a textbook, and someone else with the same book wouldn't have to assign all the kanji again, but could import the already-made order instead. :)

  14. My idea is that it would list "profiles". On that same screen, you could add, remove or edit profiles.

    Each profile would also list what percentage of it you have completed. So, for example, it would ship with a few standard profiles like JLPT4, JLPT3, Grade 1, Grade 2, etc... Completing one would also count as progress for others.

    Perhaps the program would let you enable or disable profiles with checkboxes, so you could study more than one at a time (although nothing stops you from studying one, then the other, as progress on the first would count towards the shared kanji in the second).

  15. [Oh, and yes, profiles would be simple .txt files that you can just share.]
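A Python sketch of the profile idea from the last two comments. The file format is an assumption (plain text where every kanji found counts, in order of first appearance), and the profile contents below are placeholders, not the real JLPT or school-grade lists.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def parse_profile(text):
    """Return the kanji of a profile in order of first appearance.

    Assumed format: free-form text; anything that is not a kanji
    (newlines, spaces, notes in kana or Latin letters) is ignored.
    """
    ordered = []
    for ch in text:
        if is_kanji(ch) and ch not in ordered:
            ordered.append(ch)
    return ordered


def progress(profile, learned):
    """Percentage of the profile's kanji that the user has already learned."""
    if not profile:
        return 0.0
    return 100.0 * sum(1 for k in profile if k in learned) / len(profile)


# Placeholder profiles: learning 日 and 一 counts towards both of them.
profile_a = parse_profile("日\n一\n人\n")
profile_b = parse_profile("一 日")
learned = {"日", "一"}
```

Because progress is computed from one shared set of learned kanji, finishing one profile automatically moves the percentage of every profile that overlaps with it.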

  16. I don't know if anyone is still keeping up with this thread over two months later, but I'll post anyway...

    Anyway, as far as content goes, I would consider following the Japanese Kanji Certification test (日本漢字能力検定) steps rather than the JLPT. The JLPT doesn't really focus that much on kanji, plus having only four (4) steps means a lot gets grouped together. The "Kanken" (漢検) goes from level 10 to level 1, with a level 2.5 and 1.5 for a total of 12 steps; level 10 is the lowest, of course.

    The problem is finding the breakdown of kanji/level. Short of buying (importing) some practice books/tests, I'm not sure how you'd find the breakdown.

    Here's the official Kanken site: www.kanken.or.jp But unless you're already pretty fluent in Japanese, you probably won't be able to navigate it very well.


