
Tuesday, February 10, 2009

Kumaji explained

Kumaji is an advanced subtitle rendering engine, in development.

At the time of writing, Kumaji does not render any subtitles whatsoever and is, in general, at a very early stage.

There are several reasons I started the Kumaji project, and I will try to cover them in this post.

The name

First, what does "Kumaji" even mean? It's derived from Japanese, where it would be written クマジ. If you reverse that, you get ジマク, "jimaku" (字幕), which means subtitles. Kumaji can also be understood as 熊字, "bear" + "writing", hence Kumaji has a bear for a mascot! (Lots of people have suggested using Pedobear, but that is the wrong idea; I don't want that association.)

"Kumaji" should be reasonably easy to pronounce for most people, and as far as I know it doesn't have the potential to offend people, as "libass" might. The name is also format-agnostic, like the actual renderer will be.

The goals

The key goals are:

  1. Portable code without sacrificing compatibility

  2. Maintainable and hackable code

  3. Speed

  4. Flexibility

Portability is the first and foremost goal. All current subtitle renderers have major problems with this. Those that do compile and work on multiple platforms (libass and the abandoned asa) are strongly tied to details of text and font handling on UNIX-like systems, which means they fail on Windows and Mac platforms, because those handle fonts in very different ways that FontConfig either doesn't wrap properly or over-complicates. The result is far from optimal. VSFilter depends not just on Win32 (and Wine doesn't implement everything it requires yet) but also on MFC and COM. Perian's subtitle renderer is written in Objective-C and entirely dependent on Cocoa text APIs. Kumaji will achieve portability by plugging in platform-specific code where appropriate. The motto is to do the right thing on each platform, whatever the cost.

Maintainable and hackable code is important. It must be possible to jump into the code without having a great understanding of the entire system beforehand, and it must be possible to learn good techniques from reading the code. The code must be well-commented or self-explanatory all around. (Portions of the code with poor, little or no explanations should be treated as bugs and reported.) VSFilter is a prime example of how I do not want the code to end up.

Speed is obviously important, to a certain degree. Reasonable subtitle scripts should render in realtime so Kumaji can be used for softsubbing. This may mean writing some critical routines in multiple versions optimised for different systems, using SIMD intrinsics or hand-optimised assembly code. However, good algorithms and data structures still take priority over SIMD and assembly tricks.

Finally, Kumaji must be flexible. It should be possible to implement support for new subtitle formats without providing more than a parser for them. If a format requires special rendering support that is not present, it must be possible to add that without taking the entire system apart and jeopardising the previous goals. It should also be possible to use Kumaji as a framework to write custom special-purpose renderers. For example, one can imagine creating a Lua interface for Kumaji's internal functions and using that for scripting advanced karaoke effects.
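To make the "nothing more than a parser" idea concrete, here is a minimal sketch of what such a plug-in point could look like. Everything in it is an assumption made up for illustration: the names SubtitleEvent, SubtitleParser, PipeParser and visible_at, and the toy "start|end|text" format, are not part of Kumaji. The point is only that the code asking "what is on screen at time t?" never sees the input format.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical format-neutral event; a real renderer would carry style data too.
struct SubtitleEvent {
    long start_ms;
    long end_ms;
    std::string text;
};

// The only thing a new format has to provide in this sketch: a parser
// that fills the neutral structure above.
class SubtitleParser {
public:
    virtual ~SubtitleParser() {}
    virtual std::vector<SubtitleEvent> parse(const std::string &data) const = 0;
};

// Toy format invented for this example: one event per line, "start_ms|end_ms|text".
class PipeParser : public SubtitleParser {
public:
    std::vector<SubtitleEvent> parse(const std::string &data) const {
        std::vector<SubtitleEvent> events;
        std::istringstream in(data);
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string start, end, text;
            // Skip lines that do not contain both timing fields.
            if (!std::getline(fields, start, '|')) continue;
            if (!std::getline(fields, end, '|')) continue;
            std::getline(fields, text);  // rest of the line; may be empty
            SubtitleEvent ev;
            ev.start_ms = std::atol(start.c_str());
            ev.end_ms = std::atol(end.c_str());
            ev.text = text;
            events.push_back(ev);
        }
        return events;
    }
};

// Format-agnostic consumer: texts visible at time t_ms (start inclusive, end exclusive).
std::vector<std::string> visible_at(const std::vector<SubtitleEvent> &events, long t_ms) {
    std::vector<std::string> texts;
    for (std::size_t i = 0; i < events.size(); ++i)
        if (events[i].start_ms <= t_ms && t_ms < events[i].end_ms)
            texts.push_back(events[i].text);
    return texts;
}
```

Supporting another format in this sketch would mean writing one more class derived from SubtitleParser; visible_at and anything built on top of it would stay untouched.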

Help wanted!

Currently, Kumaji is pretty much my own pet project, but I would really like to have some help. What's needed right now is data structure design. If you want to help, I expect you to have some knowledge of digital typography, the intricacies of Unicode complex scripts and the Bidi algorithm, OpenType, as well as general data structure and algorithm design. Or you should have, or be able to take, an interest in those topics and read lots and lots about them! (It's interesting stuff, really!)

Kumaji is being written in C++ using just the STL (no TR1 libraries, Boost or otherwise), so if you want to help with reviewing or writing code you should be familiar with that.

The project is being hosted at under the name kumaji.


  1. I would love to help, but my C++ is kinda limited at the moment. I will try my best anyway, if it's not good enough, then so be it. :)

  2. Sure programming (or code review!!!!) help would be appreciated, but right now it's really data structure and algorithms design that's more important IMO.

  3. Most of Perian is quite portable via GNUStep: I've used it on Linux for a dead project ( Of course, that doesn't include the renderer, which uses a pre-Cocoa API; I'll have to port it to CoreText sometime...

  4. What will you use for font rendering? FreeType? Because the system renderers give different results for the same input, it is real hell (if not impossible) to try to match them all. I currently work on a cross-platform project where we ended up using FreeType, but it's quite slow.

  5. does this mean you'll eventually abandon aegisub and move onto kumaji?

  6. Aegisub and Kumaji are completely different.

    Aegisub is a subtitle editor.
    Kumaji is a subtitle renderer.

    Rather, Kumaji will be used by Aegisub at some point.

    For rendering, Kumaji uses custom routines. It does not (and will not) use rendering routines from any system font library. It will, however, use system font libraries to obtain the outlines for fonts. Simplified overview of the pipeline:
    1. Kumaji reads subtitle data into appropriate structures.
    2. Kumaji uses a system font library and system text layout library to generate laid-out outlines for the fonts.
    3. Kumaji transforms the outlines according to the style information.
    4. Kumaji renders (scan-converts, rasterises and fills) the processed outlines.
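Steps 3 and 4 of the overview above can be sketched roughly as follows. This is not Kumaji's code: the types Point, Outline and Style and all function names are hypothetical, the "glyph" is a hard-coded unit square standing in for what a system font library would return in step 2, and the fill is a naive even-odd test at pixel centres with no curves and no anti-aliasing.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };
typedef std::vector<Point> Outline;  // one closed polygon, straight segments only

// Hypothetical style: uniform scale plus a translation.
struct Style { double scale, dx, dy; };

// Stand-in for step 2: a unit square instead of a real laid-out glyph outline.
Outline placeholder_glyph_outline() {
    Outline o;
    Point p;
    p.x = 0; p.y = 0; o.push_back(p);
    p.x = 1; p.y = 0; o.push_back(p);
    p.x = 1; p.y = 1; o.push_back(p);
    p.x = 0; p.y = 1; o.push_back(p);
    return o;
}

// Step 3: transform the outline according to the style information.
Outline transform(const Outline &in, const Style &s) {
    Outline out(in);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i].x = in[i].x * s.scale + s.dx;
        out[i].y = in[i].y * s.scale + s.dy;
    }
    return out;
}

// Even-odd rule: count outline edges crossed by a ray going right from (px, py).
bool inside(const Outline &o, double px, double py) {
    bool in = false;
    for (std::size_t i = 0, j = o.size() - 1; i < o.size(); j = i++) {
        if ((o[i].y > py) != (o[j].y > py)) {
            double x_cross = o[j].x + (py - o[j].y) * (o[i].x - o[j].x) / (o[i].y - o[j].y);
            if (px < x_cross) in = !in;
        }
    }
    return in;
}

// Step 4 (very simplified): set a pixel to 1 when its centre lies inside the outline.
std::vector<unsigned char> rasterise(const Outline &o, int width, int height) {
    std::vector<unsigned char> bitmap(static_cast<std::size_t>(width) * height, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (inside(o, x + 0.5, y + 0.5))
                bitmap[static_cast<std::size_t>(y) * width + x] = 1;
    return bitmap;
}
```

Calling rasterise(transform(placeholder_glyph_outline(), style), 8, 8) with a scale of 4 and an offset of (2, 1) fills a 4x4 block of the 8x8 bitmap. A real rasteriser would work scanline by scanline on curves and produce coverage values rather than testing every pixel like this.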

  7. I am looking forward to this...
    but I'm afraid I can't be of any help...
    Anyway, please do your best!

  8. Please make support for MPEG-4 Part 17 (aka MPEG-4/3GPP Timed Text) as well.
    This subtitle format is *much* more flexible, powerful and extensible than ASS, and it's also an open ISO standard (unlike ASS, which is proprietary and very poorly documented, leading to incompatibilities). Its only current disadvantage is lack of good implementations (VLC and MPC support is incomplete), which is the thing that you are able to fix.
    Thank you.

  9. So then, how is the Aegisub version for Mac going?
    I desperately need something that will make subtitles for Mac! All the other ones I've tried were not as efficient as Aegisub is >_<

  10. Hello jsf, I am the MPC-HC project manager.

    I'm really glad that you are trying to create a new multi-platform open-source subtitle renderer.

    I know you are still in the early planning stage, but I have some requests:

    1. Could you try to make your subtitle renderer compatible with hardware decoding? (DXVA, VDPAU, CUDA, OpenCL)
    2. Would it be possible to replace the MPC-HC built-in decoder with your version if/when it gets better?

    I wonder if it would be possible to split the current MPC-HC subtitle renderer/VSFilter into a platform-dependent and a platform-independent part,
    and then replace the independent code with your new codebase.
    This would make the MPC-HC built-in decoder much better than it is now; VSFilter would give you Windows support, and you would only need to add the APIs for Linux/Unix and Mac OS X.

    Then again, if you manage to create a standalone filter that supports all hardware decoding flavors, then MPC integration would not be needed at all.

    You might be able to find willing coders in the MPC-HC, XBMC and Mediaportal pool..

  11. I replied to the above comment here:

  12. I think I may have misunderstood and posted something in the kumaji project's feature request tracker that didn't belong. sorry about that. a better description, summary and feature list on the project page would help prevent future flubs of that sort.

  13. Will Kumaji be multi-threaded? This is the biggest failing of VSFilter IMO as even minimal effects at any high-ish resolution quickly result in a maxed out core.

  14. Yes, as much as possible will be put into worker threads, i.e. depending on what granularity ends up being practical, "work units" are created and put into queues, from which worker threads then pick them up.
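As an illustration of the "work units in a queue, picked up by worker threads" idea, here is a minimal sketch. It is not Kumaji's design: the WorkQueue class and the square_all helper are hypothetical, and it uses std::thread and friends from C++11, which postdates this post and goes beyond the "just the STL" rule stated above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work queue: units are pushed by the producer and run by worker threads.
class WorkQueue {
public:
    explicit WorkQueue(std::size_t workers) : closing_(false) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back(&WorkQueue::run, this);
    }

    // Lets the workers drain whatever is still queued, then joins them.
    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closing_ = true;
        }
        ready_.notify_all();
        for (std::size_t i = 0; i < threads_.size(); ++i)
            threads_[i].join();
    }

    void push(std::function<void()> unit) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            units_.push(std::move(unit));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> unit;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return closing_ || !units_.empty(); });
                if (units_.empty()) return;  // closing and nothing left to do
                unit = std::move(units_.front());
                units_.pop();
            }
            unit();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()> > units_;
    bool closing_;
    std::vector<std::thread> threads_;  // declared last: started after the rest exists
};

// Example use: each unit writes only its own slot, so units share no mutable data.
std::vector<int> square_all(const std::vector<int> &in, std::size_t workers) {
    std::vector<int> out(in.size());
    {
        WorkQueue queue(workers);
        for (std::size_t i = 0; i < in.size(); ++i)
            queue.push([&in, &out, i] { out[i] = in[i] * in[i]; });
    }  // leaving the scope drains the queue and joins the workers
    return out;
}
```

In a renderer, a unit might be something like "one subtitle line" or "one band of scanlines"; which granularity is practical is exactly the open question mentioned above. Note that with zero workers nothing would ever run the queued units.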

  15. That sounds brilliant. I only wish I could help out, but as it stands I shall just have to wish you well in your endeavour and look forward to the results!

  16. I guess you are aware of it, but still: it would be great if the renderer could work in a setup where an output sample queue is used between the decoder and the video renderer for improved handling of bitrate spikes and other decoding slowdowns.

    Good luck with the project!

  17. could you please fix aegisub's bugs instead of wasting time with this?

  18. Before making claims that work isn't being done on Aegisub, I urge you to look at the recent commit logs for Aegisub. The past couple of weeks lots of work has been done, bugs have been closed.

    But there's also some of the more serious problems with Aegisub that lay so deep they can't just be patched around. Some of them are related to poor rendering support, and that's where Kumaji comes in.

  19. I would guess he's griping on the fact that there's been no status updates on the news page for us non-programmers, given that 2.1.6 is now just about 7 months old. and before you say "you could go check the bug tracker", try to remember I did just say "non-programmers". just a current change log from 2.1.6 till now would be very much appreciated.

  20. After reading through the full explanation of what you hope to accomplish in your project, I felt compelled to say, hats off to you sir. You are attempting a very intricate and complicated project. However, I thought that you might want some advice from a fellow programmer, so here's my two cents.

    As every decision for a new project starts from the decision of what platform the subsequent iterations are going to be encoded in, I think you are needlessly limiting yourself to C++ without the TR1 libraries. I can understand the reasons behind choosing a limited subset of a language you are comfortable with, however the inclusion of libraries are to a programming language as patches are to boxed software. If you want to maintain portability, flexibility, compatibility and maintainability, the inclusion of libraries would most likely be a boon rather than hindrance. Also, although I am more proficient in Java than C++, and more in C++ than C#, I still think C# is a better language to design any software intended to manipulate softsubs at runtime. With C++, you run into a lot of tedious garbage collecting maintenance, and in Java you are somewhat limited by the abstracted nature of the environment where you have to research every special task you want to use to accomplish runtime performance enhancing measures.
    C# has the benefit of being heavily supported, as well as concise. I remember creating a parser from scratch in Java because my TA wanted us to, and then having to remake it in C# when our professor got back from a conference; in my experience, the Java version took about 1000 more lines of code to do the same things.

    Anecdotal evidence aside, I know Java really well, and still prefer C# for its elegance. Another thing to keep in mind is that your goals are in competition with each other no matter what language you use or design decisions. Portability will conflict with maintainability, and readability will conflict with everything else. Make sure to structure your code with an object oriented approach (most programmers can relate to that design decision) and make sure that it is at least readable to you. Comment your code for each input, output stating where each is coming and going to, as well as clearly comment each method and object. Commenting may seem tedious, but it has saved me from having to scrap more code than I care to think about.

    I'm not sure if C++ has anything like Javadocs, but you might want to look into that. Also, don’t hesitate to save multiple versions of your program in iterations of functionality. If you ever get to the point where you break everything to fix one aspect, you can always fall back to a previous fully functional or partially functional version without having to start over. Granted, there are times when you may have to start over. Out of all the projects I did in my major, I had to start from scratch around 3 times on average. Starting over after you have figured out where you are running into problems allows you to optimize your approach in order to minimize the tricky areas you have had trouble with. Also, it allows you to take shortcuts with the object-oriented design approach in that you can reuse functional code as long as it links properly to your main function.

  21. My previous comment was initially too long to post at once, so I culled through it, and decided to simply post twice. All of this is meant to be constructive advice, I don't have any expectations for you to adhere to any of it in the Kumaji project. Here's the rest of my two cents:

    Try to stay away from Assembly (ISA level) instructions unless you want to label every line thoroughly, as even the average programmer will have to look up most of the assembly code individually to know what you are doing. Remember that your goal is to create an easily accessible and usable timing device, not everyone has a degree in computer science or any knowledge of what that encompasses.

    I would suggest that once you are ready to create a GUI for Kumaji, you should follow a design similar to Premiere, which separates video into stills, sound, and moveable sets of placards. A developer would much rather have the flexibility to input text fluidly, rendering in stages, than to have to do additional separate tasks along with timing, quality checking, encoding and translating. (This is under the assumption that Kumaji is to be used by fansubbers as well as other commercial applications).

    Because of the breadth of media players available to windows/mac/unix/linux users, your program should maintain the ability to hardcode subtitles and softcode them as well. This will require the program to save an x/y location generated upon the aspect ratio of the video's playback, length of display time, a value associated to what kind of font it is, as well as a value associated for what color the text is, for every sentence/placard that is generated in the playback. With this amount of static elements determined on the fly, it would save speed in rendering to enable a default font, default color, default location, and default aspect ratio that is changeable by counter-increments calculated on the fly at discrete times. Aspect ratio would at the front, coordinate location offset immediately following, and color/type-of-font as a buffered constant on-the-fly calculation incurred each time a placard is displayed according to the offsets (x and y each need an offset).

    The bulk of the work you will be doing is essentially making sure that the entire GUI works properly, as well as making sure that the final video format conforms to the media standards and is readable depending on how the transcoding is done. On top of that, you have to make sure your parser is efficient as well as thorough.

    Finally, if you need any help with programming theory or want some general advice for any problems you are having, feel free to contact me through my gmail account. I tend to check it sporadically, but I have a decent amount of free time, and I don't mind helping.

  22. Casey,

    You are completely misunderstanding what Kumaji is.
    Kumaji is not a subtitle editing software, it is a subtitle rendering library. Aegisub is a subtitle editor, which includes timing etc., and has been fully functional for years.

    Kumaji is a subtitle renderer, intended to replace existing renderers which are quite poor (in my opinion) and plug into all kinds of software.
    Kumaji will do nothing but read subtitle files and paint the subtitles on top of video.

    You write "don't hesitate to save multiple versions of your program". This suggests that you don't know about version control systems such as Subversion (SVN) and Git. Any serious software developer uses a version control system to be able to see a detailed history of changes, roll back through it and perform branching when required.

    You sound sincere in your advice, but it's the wrong advice for the wrong subjects. Sorry.

  23. Where can I submit long angry code reviews?

  24. So it's basically like a "new and improved" VSFilter?

  25. hoboX10: very basically, yeah... or rather, a replacement of the mess which is vsfilter

  26. Yes! Someone wants to write a render library! I just hope you won't abandon the project.
    I wanted to use some simple effects in my subs, but VSFilter can't take more than two effects at once or so, whereas libass had no problems. >_> I'd gladly help, but I don't have enough knowledge on the matter. :/

  27. is there any progress on it?



If you need help with Aegisub or have a bug report please use our forum instead of leaving a comment here. If you have a feature request, please go to our UserVoice page.

You will get better help on our forum than in the blog comments.