KDIC Japanese Dictionaries and Notes

Important Note: the files I "created" were really just transformed from Jim Breen's excellent Edict and Enamdict projects at Monash University. The Monash project represents many years of work; this project of mine represents a week or so of OCD. Prof. Breen and Monash deserve all the credit for these fantastic resources being available for us to use and adapt free of charge. Furthermore, all terms and conditions of Edict and Enamdict, copyright Monash University EDRG, apply to these files. Please see the copyright and license information on the EDRG site.

Note

I really recommend you take a look at WDIC instead; everything KDIC does, WDIC does better. It's more elegant, more convenient, more useful. KDIC is great, but once you get to know WDIC you won't be back. Using the PDIC program, it shouldn't be too difficult to convert any special dictionaries to WDIC format. If you have only the .pdb files and not the original text files, there are programs to convert .pdb files back to text format. From there it's easy to use PDIC to convert them to .dic format for PDIC on the desktop, WDIC on the Palm PDA, and other programs on Pocket PC. The KDIC files here are older versions and won't be updated. Take a look at my WDIC page for more information.

Installation

Installing: You need to have either a Japanese-language Palm OS PDA or a non-Japanese Palm OS PDA with J-OS or CJKOS or some other system installed in order to input and display Japanese on your PDA. Download KDIC and put the kdic.prc, kdic_da.prc, and locjkdic.prc in your main memory (these are all pretty small). If you've got tons of main memory, you can install the dictionaries to your main memory, too. But if you don't, install them to /Palm/Programs/MSFILES on your memory card or memory stick. I'm running mine off a memory stick pro and searches run pretty much instantly, so there doesn't seem to be a speed penalty for installing to the card instead of the main memory.

Dictionaries for KDIC. There are two separate dictionaries, one for words and one for Japanese proper names (of people, places, companies, etc.); you can probably live without the name dictionary if you're short of space, especially if you don't live in Japan or meet a lot of Japanese. Big_Edict allows look-ups from English and from Japanese by either kanji or kana; Big_Enamdict from Japanese by either kanji or kana.

The Big Dictionaries (updated May 17, 2005--some changes from 01MAY05 Edict, reworked about 1100 entries to be easier to look up):
Big_Edict: Word dictionary built from Dec. 2004 full Edict. 93K kanji-to-English entries, 109K kana-to-English entries, 175K English-to-Japanese words, and 73K non-Japanese names of people, places, etc. from the Enamdict (e.g., “How do you write 'New York' in katakana?”). About 11 MB.
Big_Enamdict: Japanese name dictionary built from Jan. 2005 Enamdict. To look up proper names by kanji or reading (kana). 350K kanji-to-kana-to-English entries, 411K kana-to-kanji-to-English entries, 9 MB.

This is a good English-only dictionary file (like a pretty complete Webster's on your Palm Pilot):
Wordnet: ~10MB.

Use

KDIC's documentation is pretty clear. Here's some advice about using my dictionaries.

To search Japanese-to-English: By reading, if you know how the word is pronounced: Just enter the reading of the word (in kana) in the search line, using your IME (built-in on a Japanese Palm, or the one you installed). By kanji, to look up a word you see written: KDIC itself doesn't have any kind of function for entering kanji, so you'll have to either (1) enter the kanji through your IME or (2) look up the kanji using another program (PAdict is the best), then copy and paste it into the KDIC search line. In either case, you can tap "L" for "List" to scroll through the list of all entries at the point of your search term (say, to look for all compounds beginning with a kanji you've entered--generally you'll find a lot more than you will in any electronic dictionary).

To search English-to-Japanese: Enter a term and search. KDIC will only show the first result for that search, even though there may be quite a few, so tap “L” to scan through all the entries beginning with that word. The first thing to come up may not be just what you want, but by scanning through that area of the list you can almost always find what you're looking for very quickly. Also, KDIC alphabetizes upper case and lower case entries separately, so be sure to use the right case: you won't find “new york” or “Dog,” only “New York” and “dog.” Finally, there are quite a few proverbs and sayings in Edict, but they may not have all been handled the same way in my automatic formatting (see below); if you're looking for a proverb or saying and it doesn't come up, try some simple variations: with the first word capitalized or not, with and without any prepositions or articles at the beginning, etc.

Minor bugs in KDIC (and easy work-arounds): When I select some text and then slash in the graffiti area to bring up the command window, then tap the “copy” icon, I get a message saying that this function is not yet enabled. Ditto “paste.” The functions DO work; you just have to write “C” and “P” using Graffiti instead of tapping the icons. Also, when there is more than one entry in the list starting with the same word, if I tap any of those entries, KDIC will open the first entry starting with that word, even if I've tapped the second or third entry. Instead of getting frustrated, from inside that wrong entry just scroll down to the entry you want using the up-down triangle icons in the lower right corner. It rarely takes more than a second to scroll down to the entry you want.

For a key to the grammar notes, see the Edict documentation page. Terms like "v5r" refer to specific verb conjugations; I kept them in in case someone later figures out a way to link these to Jim Breen's tables showing the conjugations for each verb (look up a verb at Prof. Breen's WWWJDIC then click on "V" next to one of the results if you want to see how these can work).

If you want to use these dictionaries on Pocket PC, I can send you the .txt files of the databases so you can convert them to run on the Pocket PC dictionary software of your choice (I don't know much about that). You can download the originals from the Monash site if you like, but the English-to-Japanese file in particular was a pain in the neck to make (some of the strings of regular expressions I used in the search and replace ran to a couple of hundred characters) and I'd be happy to save you the trouble of duplicating my effort. Ditto if you want to use my files as a base for some customized version (e.g., a romaji version). Ask and you shall receive.

Technical Information (for the technically curious only, not necessary to use these dictionaries or KDIC)

Software: Step 1 in making your own dictionary file is to format the .txt files, as in the transformations listed below. Microsoft Word and its otherwise excellent open source alternative Open Office proved inadequate (they allow regular expressions in the search, but not in the replace, making them incapable of the steps detailed below). EmEditor turned out to be a powerful and graceful tool for editing with regular expressions, and a comfortable tool for regular word-processing (it doesn't make asinine guesses about what I'm trying to do, and then forbid me from doing anything else, the way Word does). It also proved to be faster than either Word or Open Office at huge operations (global search and replaces on documents running tens of thousands of pages), sometimes by more than an order of magnitude. And, finally, EmEditor supports different languages and encodings seamlessly. I was so impressed with this software that even though I was able to complete this project within the free trial period and even though Open Office fulfills all my other needs at home and MS Word at work, all for free (at least to me), I decided to register my copy of EmEditor before the trial period ran out. I'm a cheap SOB, so that's a big endorsement.

Step 2 is to convert the huge .txt file to a .pdb formatted for KDIC. KDIC includes the easy and effective command line program gendic.exe to do this. Not much I can say: it was easy to use and it worked every time.

Why didn't I combine the Word and Name dictionaries? If I put them together, then you'd have to wade through long lists of names when looking for a word and vice versa; it would make scrolling through the lists looking for just the word (or name) you want a tedious task.

Big_Edict is just the kanji-to-English, kana-to-English, and English-to-Japanese Edict (word) dictionaries described below, all lumped into one file for convenience. Big_Enamdict is just the kanji-to-English and kana-to-English Enamdict (name) dictionaries lumped together.

Why isn't there an English-to-Japanese name dictionary? The entries in Enamdict are Japanese personal, place, etc. names, so listing them in English pretty much amounts to writing Japanese in romaji (I did pull all the katakana names out of Enamdict and put them into Big_Edict, indexed by English spelling, on the theory that these are mostly non-Japanese names). Which brings up another question: Why can't I look up a Japanese word by its romaji spelling? It would duplicate the same thing already written in kana, thus taking up a lot of space unnecessarily. Plus, I share the dislike of most Japanese teachers I've spoken with of what would essentially be using romaji for what should be written in some form of Japanese; I see it as unnatural, and a crutch for the new learner that quickly becomes an impediment to becoming comfortable in Japanese. Kana just aren't that difficult; why put off learning them? If you have to read them every time you use the dictionary, you'll soon become proficient at it. Some have suggested that it would be technically difficult to create a romaji version, since there's no romaji in the source Edict file, but with regular expressions it would be a trivial matter of an extremely boring day or two to transliterate the kana-first entries into romaji-first ones (most of the time would be spent waiting for your software to change 10 million occurrences of “か” to “ka,” etc.).

Kanji to English Edict and Enamdict: Minimal changes from the original Edict file, other than formatting for KDIC (replace the first slash in the Edict entry with "space slash slash slash space" [" /// "]; dealing with KDIC's policy of truncating the first term in the entry [the one you'd be searching for] after only 16 characters [see discussion near the end of the English to Japanese section, below]). Note that here and below “entry” means a complete dictionary entry, consisting of the indexed term (the one you'd look it up by) and all its definitions; each entry is one line of text in the .txt document (it may wrap on your display, but has no breaks or returns). I've manipulated the way KDIC handles definitions, so in my files, “definition” just means a subsection, ending in a single slash, of an entry; it's not necessarily a definition of the indexed term.

Kana to English Edict and Enamdict: Same KDIC formatting as the Kanji to English version, plus switching the positions of the kana and kanji versions of the Japanese word to allow looking the entry up by kana only (by reading).

齟齬 [そご] /(n) inconsistency/  becomes  そご [齟齬] /// (n) inconsistency/
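The two Japanese-first transformations described above can be sketched in a few lines of Python. This is only an illustration, not the EmEditor search-and-replace strings actually used for these files; the function names and the exact regular expression are assumptions, and the sketch assumes every source line has the layout "KANJI [KANA] /definition/.../" (or "KANA /definition/.../" for words with no kanji).

```python
import re

# Assumed source layout: KANJI [KANA] /definition/definition/
KANJI_KANA = re.compile(r"^(?P<kanji>[^\[/]+?) \[(?P<kana>[^\]]+)\] /(?P<defs>.+)$")


def kanji_first(line):
    """Replace the first slash of an Edict line with the ' /// ' separator."""
    head, _, defs = line.strip().partition("/")
    return f"{head.strip()} /// {defs}"


def kana_first(line):
    """Swap kanji and kana so the entry is indexed by its reading."""
    m = KANJI_KANA.match(line.strip())
    if m is None:
        # No bracketed reading: the headword is already kana, so only
        # the separator needs to change.
        return kanji_first(line)
    return f"{m['kana']} [{m['kanji']}] /// {m['defs']}"


print(kanji_first("齟齬 [そご] /(n) inconsistency/"))
print(kana_first("齟齬 [そご] /(n) inconsistency/"))
```

Run over a whole file line by line, the first function would give the kanji-to-English layout and the second the kana-to-English layout.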

Why was the English to Japanese dictionary such a pain in the neck to make (or "How to make your own KDIC dictionary")? First off, I have to say that I'm pleased at how well it does work; it's turned out to be an extremely useful and convenient tool. But, of course, it took some work and it's still not perfect. Why? KDIC indexes and searches the very beginning of the entry (unlike PAdict, which looks for your English search word everywhere in the entry). The Edict file I started with is a Japanese-to-English dictionary, so the beginning of every entry is Japanese, followed by an English explanation of that word. Most of the time the English definitions are just one word or short phrase, so all I had to do was switch it around to put the English first:

齟齬 [そご] /(n) inconsistency/  becomes  inconsistency /// 齟齬 [そご] (n)/

And then I had to do it for the other 248,834 word and name entries. That was easy. However, sometimes the key word in the English definition doesn't come first, requiring a bit of adjustment:

引攣る [ひきつる] /(v5r) to twitch/  becomes  twitch /// 引攣る [ひきつる] (v5r)/

That was pretty easy, too (the grammar code shows that it's a verb, so I don't feel bad about losing the “to”). Obviously, a lot of other changes were more difficult. And, there are 175,544 word entries in the final product, so of course I didn't want to dedicate the next few months of my life to editing each and every one of them by hand. Therefore, I had to design search-and-replaces to put the important English word first in as many entries as possible, automatically. I've always been amazed at the power of regular expressions, but to be honest, I'm still shocked at how well the dictionary turned out. Scan through a list of entries and you'll still find some that need improvement, but the bad entries are surprisingly few, considering they were edited not by some skilled editor but by simple lines of code that would sweep through 175,000 lines of text in a minute and a half (actually, some of the more complicated ones took 20-30 minutes to run).
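For the simplest case (one definition per entry), the reversal shown in the two examples above might look like the following Python sketch. The pattern and the function name are assumptions made for illustration; the real work was done with much longer search-and-replace expressions, and this sketch deliberately gives up (returns None) on anything with more than one definition.

```python
import re

# Assumed layout: JAPANESE /(tag) (tag) single gloss/
SINGLE = re.compile(
    r"^(?P<jp>[^/]+?) /(?P<tags>(?:\([^)]*\) )*)(?P<gloss>[^/]+)/$"
)


def english_first(line):
    """Turn a one-definition Edict line into an English-first KDIC entry."""
    m = SINGLE.match(line.strip())
    if m is None:
        return None  # several definitions, or an unexpected layout
    tags = m["tags"].strip()
    gloss = m["gloss"]
    # Drop the leading "to " only when the grammar code marks a verb,
    # since the code already tells the reader it is one.
    if tags.startswith("(v"):
        gloss = re.sub(r"^to ", "", gloss)
    tail = f" {tags}" if tags else ""
    return f"{gloss} /// {m['jp']}{tail}/"


print(english_first("齟齬 [そご] /(n) inconsistency/"))
print(english_first("引攣る [ひきつる] /(v5r) to twitch/"))
```

Entries like the "hikikomori" one below still come out indexed under whatever word happens to be first, which is exactly the limitation of doing this automatically.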

Just for an example of the kind of thing that could use a human editor, though:

引篭り [ひきこもり] /(n) people who withdraw from society (e.g. retire to the country)/

KDIC will index this under “people,” which obviously isn't the best choice—first, we end up with over 60 entries for “people”; second, if you're looking for a word for something like “recluse” or “agoraphobe,” “people” won't be the first term to pop into your head to enter into a search line; and, third, there really isn't a word for "hikikomori," a unique Japanese phenomenon, in English—the term really needs even more of an explanation than this; it's more of an encyclopedia entry than a dictionary entry (the above definition is out of phase with how the term is used now). But the real issue is this: "hikikomori" is a tough call for me, so how do I get my computer to deal with it automatically? Now try to figure out how to tell a computer program to find the important words in the following entries.

場所布団 [ばしょぶとん] (n) /waiting wrestler's sitting cushion/

舌触りが良い [したざわりがよい] (n) /be soft and pleasant on the tongue/

Entries like this don't get the treatment they deserve. On the other hand, terms like this are Japanese things that we'd hear or read in Japanese and look up in the Japanese-to-English dictionary, not the other way around, so if the entries are a little vague in the English-to-Japanese dictionary, probably the worst that can be said is that they're wasting your PDA's memory. If I could find a way to get the computer to identify such items automatically, the best bet would probably be to simply delete them to cut down the file size. As a fringe benefit of their presence, though, if you take a stroll in the list around general terms like “people,” you'll find some interesting, random stuff.

By the way, Edict has 109,329 entries, so how'd I get 175,544? Since KDIC searches by the beginning of the entry, whenever one Japanese entry had several English definitions, I had to make each English definition a separate entry in order to allow KDIC to find it:

thus

野趣 [やしゅ] /(n) rural beauty/rusticity/rustic beauty/

becomes three separate entries:

rural beauty /// 野趣 [やしゅ] (n) /rusticity/rustic beauty/

rusticity /// 野趣 [やしゅ] (n) /rural beauty/rustic beauty/

rustic beauty /// 野趣 [やしゅ] (n) /rural beauty/rusticity/

You'll see I also kept the other definitions of each word after the Japanese. Why? They provide context in case the precise meaning of the search term isn't clear (if you look up "creep," you'd like to know if the Japanese word that pops up means "move stealthily" or "spooky pervert"; note that I cut the entries off after the fifth "extra" definition to save memory). A little math shows that ~66,000 new entries had to be created. One more time, thank you, regular expressions, for doing months of work in a few minutes (these were the longest and most involved strings of regular expressions used in the entire project).
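The one-entry-per-definition expansion, including the cap of five "extra" definitions kept for context, can be sketched as follows. The names split_definitions and MAX_EXTRA are made up for this example, and the sketch assumes the grammar codes appear only at the start of the first definition, which is not true of every Edict line.

```python
import re

HEAD = re.compile(r"^(?P<jp>[^/]+?) /(?P<rest>.+)/$")
TAGS = re.compile(r"^((?:\([^)]*\) )*)(.*)$")
MAX_EXTRA = 5  # other definitions kept after the Japanese, for context


def split_definitions(line):
    """Return one English-first entry per definition in an Edict line."""
    m = HEAD.match(line.strip())
    if m is None:
        return []
    parts = m["rest"].split("/")
    tag_match = TAGS.match(parts[0])
    tags = tag_match.group(1).strip()
    glosses = [tag_match.group(2)] + parts[1:]
    head = " ".join(p for p in (m["jp"], tags) if p)
    entries = []
    for i, gloss in enumerate(glosses):
        others = (glosses[:i] + glosses[i + 1:])[:MAX_EXTRA]
        context = "".join(f"{other}/" for other in others)
        entries.append(f"{gloss} /// {head} /{context}")
    return entries


for entry in split_definitions("野趣 [やしゅ] /(n) rural beauty/rusticity/rustic beauty/"):
    print(entry)
```

The count works out the same way as in the text: 175,544 final entries minus 109,329 source entries leaves 66,215 entries that had to be generated.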

Then, I copied all the katakana entries from the name dictionary (since they are mostly non-Japanese things one might wonder how to write in Japanese), reformatted them to match the English to Japanese entries, and added them to the English to Japanese dictionary, bringing the total number of entries to 248,835.
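Picking out the katakana entries is the kind of step a single regular expression handles well. The sketch below is an assumption about how one might do it in Python, not a record of what was done: it treats a line as a katakana entry when the headword consists only of characters from the Unicode Katakana block (U+30A0 to U+30FF, which includes the long-vowel mark and the middle dot) and is followed directly by the first slash. The sample lines in it are made up for illustration and are not quoted from Enamdict.

```python
import re

# Headword made only of Katakana-block characters, followed by " /"
KATAKANA_HEADWORD = re.compile(r"^[\u30A0-\u30FF]+(?= /)")


def is_katakana_entry(line):
    """True when the indexed term of the line is written purely in katakana."""
    return KATAKANA_HEADWORD.match(line) is not None


def katakana_entries(lines):
    """Keep only the katakana-headword lines, in their original order."""
    return [line for line in lines if is_katakana_entry(line)]


sample = [
    "ニューヨーク /New York/",      # hypothetical line, layout assumed
    "齟齬 [そご] /(n) inconsistency/",
]
print(katakana_entries(sample))
```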

The last difficulty is that KDIC displays only up to a certain number of characters in the indexed term (the term your search will look for--English in the E-to-J dictionaries, Japanese in the J-E). The entries are structured as "Indexed term /// definition/definition/definition/." That certain number is 31 for English indexed terms, 16 for Japanese. Everything after the 31st character (or the 16th) disappears. However, I noticed that KDIC displays only a small space between the indexed term and the first definition, so here's what I did. The final form of a normal entry in the text file looks like this:

aunt /// おばはん (ksb:) (n,adj-na) /middle-aged lady/

and will display like this on your PDA in KDIC:

aunt おばはん (ksb:) (n,adj-na) /middle-aged lady/

But a term like this

self-aggrandizement under pretense of aiding another /// お為ごかし [おためごかし] (n) /

would be truncated after 31 characters in KDIC:

self-aggrandizement under prete お為ごかし [おためごかし] (n) /

So I split the first part after 31 characters and put the rest of it as the first definition,

self-aggrandizement under prete /// nse of aiding another / お為ごかし [おためごかし] (n) /

so that it displays very much like a normal entry in KDIC:

self-aggrandizement under prete nse of aiding another / お為ごかし [おためごかし] (n) /

The only visible differences are the small space where it was split and the slash between the index term and the Japanese definition. Most of these long entries fall under the description above of Japanese terms with no precise English counterparts, terms that would only be looked up from Japanese to English and not vice versa, but I thought some of them might be useful, and of course there were too many (7635) to go through and make decisions about each one, so I kept them; deleting them would have cut the file size by only a few percent. I split the Japanese-first entries in the J-E Edict and Enamdict the same way (but after the 16th character).
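The split described above is simple enough to show in Python. This is a sketch under the assumption that the limits are counted in characters, as the text describes them (31 for English indexed terms, 16 for Japanese); the function and constant names are hypothetical.

```python
SEP = " /// "
ENGLISH_LIMIT = 31   # characters KDIC shows for an English indexed term
JAPANESE_LIMIT = 16  # characters KDIC shows for a Japanese indexed term


def split_long_term(entry, limit):
    """Move the overflow of a too-long indexed term into the first definition."""
    term, sep, body = entry.partition(SEP)
    if not sep or len(term) <= limit:
        return entry  # nothing to do
    return f"{term[:limit]}{SEP}{term[limit:]} / {body}"


long_entry = (
    "self-aggrandizement under pretense of aiding another"
    " /// お為ごかし [おためごかし] (n) /"
)
print(split_long_term(long_entry, ENGLISH_LIMIT))
```

Calling the same function with JAPANESE_LIMIT would handle the Japanese-first files.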

Future development will be limited once I'm sure I've got the bugs worked out (and I'm pretty sure I'm there).

PAdict and some other tools search the whole entry, so they don't need each definition to be an entry all by itself, and they don't need separate Eng-Jap, Kana-Eng, and Kanji-Eng dictionaries. Having three separate, complete versions of Edict is a memory killer, so it'll have to run off a card unless you've got monster memory, but it's the only way to access all of Edict by kanji, kana, and English look-ups on your Palm until the new version of PAdict, using the FULL Edict, is ready later this spring or early this summer (fingers crossed). Even now, with PAdict still using barely more than half of the Edict database, PAdict is my main dictionary tool; this project is intended to be a backup for when PAdict can't find the word I want. (PAdict was the whole reason I bought a PDA to begin with). On the other hand, KDIC running these dictionaries has turned out to be a lot more effective than I'd expected; PAdict is quickly becoming the back-up to this, although that will probably change again when PAdict is able to run the full Edict.

Minor detail: How did I put the English and kana entries into alphabetical order? EmEditor doesn't have this function, so I tried it in MS Word, which told me the file was too big, and the free Open Office, which tried mightily but gave up after page 794. Apparently, you CAN alphabetize 794 pages of single-spaced roman alphabet and kana entries in Open Office (the English version), which is pretty impressive (note: this was with an older version of Open Office; perhaps the new one can do even better). I generally recommend Open Office over MS Word (resist the overlords). In the end, it wasn't necessary: KDIC takes care of this automatically, either when the gendic.exe program included with KDIC converts .txt files to KDIC-compatible .pdb files, or when KDIC generates an index file of all the entries the first time you run a dictionary. So if you undertake this sort of project, don't worry about alphabetizing.

Another minor detail. It's important to have an entry KDIC can default to if you search for something that isn't in the dictionary. Such an entry should read " ? ? ? ? /// " and the part after the three slashes is a good place to put creator, copyright, etc. information; e.g., see the first entry in each of my dictionaries. Otherwise, when you search for something that isn't in the dictionary (and this includes switching to the dictionary if you don't have anything written in the search line yet), KDIC will crash your PDA, requiring a soft reset (not a disaster, but a pain). And of course each definition within an entry, especially the last, should end in a single slash. An entry that doesn't end in a slash will also make KDIC angry.
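Both pitfalls are easy to check for before running the conversion to .pdb. The checker below is a hypothetical helper, not something shipped with KDIC; it assumes one entry per line and only looks for the two problems just described (a missing "? ? ? ?" default entry and entries that don't end in a slash), plus a missing "///" separator.

```python
DEFAULT_TERM = "? ? ? ?"


def check_dictionary(lines):
    """Return a list of problems found in a KDIC source text, one entry per line."""
    problems = []
    entries = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not any(e.strip().startswith(DEFAULT_TERM + " ///") for e in entries):
        problems.append("no default '? ? ? ?' entry")
    for number, entry in enumerate(entries, start=1):
        if "///" not in entry:
            problems.append(f"entry {number}: missing '///' separator")
        if not entry.rstrip().endswith("/"):
            problems.append(f"entry {number}: does not end in a slash")
    return problems


good = [
    " ? ? ? ? /// sample creator and copyright notice/",  # made-up notice text
    "aunt /// おばはん (ksb:) (n,adj-na) /middle-aged lady/",
]
print(check_dictionary(good))
```

An empty list means none of these particular problems were found; it says nothing about whether the entries themselves are any good.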

If you feel any desire to part with a few bucks out of gratitude for such a nice set of dictionaries being available, please consider donating to the dictionary project at Monash, which is responsible for the real work that went into making these dictionaries; and of course if you find the program useful you should register your copy of KDIC.
