How to create an EPWING dictionary or eBook

Step one is to download and install EBStudio.

Step two is to arrange the text of your source file into a format that EBStudio can understand.

Step three is to use EBStudio to convert that source file into an EPWING dictionary or eBook.

Note there's one crucial change from my earlier advice (changed Mar 14).

Formatting the text of your source file

     There are a number of options for formatting the text of your source file before converting it to EPWING. Note that if you have a dictionary file with tens or hundreds of thousands of entries, you will have to change the format of every single one of them--if you have no idea how to do that easily, you might want to rethink taking on this project--you'll have to find advice and technical help for that editing somewhere else. I'll just give you two hints to point you in the right direction. You need to teach yourself how to write regular expressions, and you want to use a text editor that is comfortable with Japanese and allows full use of regular expressions--NOT anything that describes itself as a word processor (EmEditor is a great choice; but with a really large file, you'd be much better off using a GREP tool than a text editor--PowerGREP is excellent for Japanese but it's not cheap). Again, the following instructions assume you know how to do such large-scale editing; I cannot offer any tech support on how to edit your documents. Anyway, of your formatting choices, here are the two I'd consider for a dictionary project (regular HTML would be fine if you want to create an eBook). Whichever format you choose, you should make sure the document uses the Shift-JIS (SJIS) encoding (usually best to edit as Unicode, then convert to SJIS as the last step). EPWING and Unicode don't play well together.
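If you'd rather script the final encoding step than do it in an editor, here is a minimal Python sketch. It assumes your working file is UTF-8 and that Python's "cp932" codec (the Windows flavor of Shift-JIS) is acceptable to EBStudio; that choice, and the function names, are my assumptions, not something the EBStudio manual specifies. It first reports any characters that cannot be represented, since those are the ones you'd have to replace or turn into gaiji.

```python
from pathlib import Path


def find_unencodable(text: str, encoding: str = "cp932") -> list[tuple[int, str]]:
    """Return (line number, character) pairs that the target encoding cannot hold."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                problems.append((lineno, ch))
    return problems


def convert_to_sjis(src: Path, dst: Path, encoding: str = "cp932") -> None:
    """Read a UTF-8 file and write the same text in Shift-JIS.

    Bytes are read and written directly so line endings are left untouched.
    """
    text = src.read_bytes().decode("utf-8")
    problems = find_unencodable(text, encoding)
    if problems:
        lineno, ch = problems[0]
        raise ValueError(
            f"{len(problems)} character(s) cannot be encoded; "
            f"first is {ch!r} on line {lineno}"
        )
    dst.write_bytes(text.encode(encoding))
```

Typical use would be something like convert_to_sjis(Path("mydic_utf8.txt"), Path("mydic.txt")), where both filenames are placeholders.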

     The simple option is PDIC1行テキスト形式 format, a text file with one line per entry, formatted as:

面影 【おもかげ】 /// countenance / face / vestige / trace

Note the spaces around the triple and single slashes. Both the normal written form (here in kanji) and the pronunciation in the funky full-width brackets will be indexed, meaning you can enter either to search for this word. However, the pronunciations in brackets will appear in the list of results in the order in which the entries appear, not in kana order, which can make it harder to scan that list of results, especially in a big dictionary. To get optimal search results, you need to make two versions of the entry, e.g.,

おもかげ /// 面影 / countenance / face / vestige / trace

面影 /// おもかげ / countenance / face / vestige / trace

(this is quite easy to do, even for huge files, but having twice as many entries takes up more memory card space). You can play with the format a bit, as in the two alternatives shown. You should then put all the entries (each line) in the Japanese version of alphabetical order. PowerGREP does this very well. While most editing programs, especially greppers, work best with text in UTF8 encoding, remember that before you use EBStudio to turn the file into an EPWING dictionary, you need to convert the text into SJIS encoding and, obviously, as a text file, the filename should end with the ".txt" extension.
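For anyone scripting this instead of using a grep tool, the following Python sketch shows one way to turn each bracketed entry into the two versions shown above. The regular expression assumes every line looks exactly like the 面影 【おもかげ】 example (single spaces, full-width brackets); lines without a bracketed reading are passed through unchanged. The final sort is plain code-point order, which puts kana before kanji but is only a rough stand-in for proper Japanese dictionary order, so treat it as a placeholder for whatever sorting tool you trust.

```python
import re

# word, then reading in full-width brackets, then the definitions
ENTRY = re.compile(r"^(?P<word>.+?) 【(?P<reading>.+?)】 /// (?P<gloss>.+)$")


def split_entry(line: str) -> list[str]:
    """Return a kana-first and a kanji-first version of one entry line."""
    line = line.strip()
    m = ENTRY.match(line)
    if m is None:
        return [line]  # no bracketed reading: keep the line as it is
    word, reading, gloss = m.group("word", "reading", "gloss")
    return [
        f"{reading} /// {word} / {gloss}",
        f"{word} /// {reading} / {gloss}",
    ]


def expand_entries(lines: list[str]) -> list[str]:
    """Expand every non-empty line and sort the result by code point."""
    out = []
    for line in lines:
        if line.strip():
            out.extend(split_entry(line))
    return sorted(out)
```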

     A more flexible option is HTML. The format is more complicated, but you can do more with it. Here's a basic example:

<html>

<head>

<title>Title of your dictionary--whatever you like</title>

</head>

<body>

<dl>



<p><dt>重苦しい【おもくるしい】</dt></p>

<dd><p>heavy, gloomy, oppressive</p>

<p><dfn>胸が重苦しい</dfn> one's chest feels constricted</p>

<p><dfn>心が重苦しい</dfn> feel gloomy</p>

<p><dfn>重苦しい雰囲気</dfn> an oppressive atmosphere</p>

<p><dfn>重苦しい空模様</dfn> a gloomy [sullen] sky</p></dd>



<p><dt> 面影【おもかげ】 </dt></p>

<dd>countenance, vestige, trace</dd>

</dl>

</body>

</html>

Each entry in the dictionary consists of the keyword in <dt> tags followed by the definition in <dd> tags; skip one line before the next entry (the skipped line may not be obvious in the above in your browser). As shown, write the keyword the way it's normally written first, then if there are yomigana, put those in the funny brackets as shown. You will then be able to look the word up by either the regular written form (kanji in these cases) or by the yomigana. Terms in <dfn> tags will also be indexed along with those in <dt> tags. In the first case above (from Kenkyusha's Intermediate J>E dictionary), this allows the end user to look up entries in a list of related terms after the main entry (probably the most common use for <dfn> tags). A lot of other standard HTML will also work--for examples, look in the EBStudio manual, 五 ファイル形式\本文:HTML形式\EBStudioでサポートするHTML要素. Note that each entry looks more cumbersome than those in PDIC1行テキスト形式 format, but these are just the source files, not the final EPWING files you'll install on your PDA. The converted EPWING file might actually be smaller since you don't have to have duplicate entries (kana first and kanji first). Finally, remember that the name of your source file should end with the ".html" extension and that the encoding must be SJIS.
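If your data already lives in some structured form, generating this HTML with a script may be easier than regex editing. The Python sketch below reproduces the layout of the example above (keyword plus bracketed yomigana in <dt>, definition and <dfn> examples in <dd>, one blank line between entries). The function names are mine, and I'm assuming EBStudio reads the usual &amp;, &lt; and &gt; entities; check the manual if your text contains those characters.

```python
from html import escape
from typing import Iterable, Sequence


def entry_to_html(word: str, reading: str, definition: str,
                  examples: Sequence[tuple[str, str]] = ()) -> str:
    """Build one <dt>/<dd> entry; examples are (phrase, translation) pairs."""
    # quote=False escapes only &, < and >, so apostrophes stay readable.
    head = f"{word}【{reading}】" if reading else word
    body = [f"<p>{escape(definition, quote=False)}</p>"]
    for phrase, translation in examples:
        body.append(
            f"<p><dfn>{escape(phrase, quote=False)}</dfn> "
            f"{escape(translation, quote=False)}</p>"
        )
    return (
        f"<p><dt>{escape(head, quote=False)}</dt></p>\n"
        "<dd>" + "\n".join(body) + "</dd>\n"
    )


def build_document(title: str, entries: Iterable[str]) -> str:
    """Wrap entries in the html/head/body/dl skeleton, one blank line apart."""
    header = (
        "<html>\n<head>\n"
        f"<title>{escape(title, quote=False)}</title>\n"
        "</head>\n<body>\n<dl>\n\n"
    )
    footer = "\n</dl>\n</body>\n</html>\n"
    return header + "\n".join(entries) + footer
```

When saving the result, encode it as Shift-JIS, for example with build_document(...).encode("cp932"), and give the file the ".html" extension.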

Using EBStudio to convert a file to EPWING or eBook file format

     In the step above, "format" referred to how the text in a file was arranged. However, "format" can also mean "file format," which is how the bits and bytes of the file itself are arranged. Each program can open and use only those file formats it was designed to work with; when it tries to open a format it doesn't understand, it sees millions of ones and zeros and has no idea what they mean. To make a dictionary file usable to EBPocket or any other EPWING reader, one must convert it to the only dictionary file format EBPocket understands, EPWING.

     First, create a new, empty folder for the dictionary you're about to make (call it whatever you like, but make it easy to find). Open EBStudio and from the ファイル(F) menu in the main window, choose the first option 新規作成(N) ("Make a new book"); or click on the first icon in the toolbar, the blank sheet of paper. The Output Dictionary window will pop up:

1. What you want to call your dictionary (up to you--short and simple is nice)

2. Name for the subfolder in which the new EPWING files will be located (up to 8 alphanumeric characters)

3. Type of dictionary. 英和辞典 is English--->Japanese; 和英辞典 is Japanese--->English; 一般書物 is general book (if you're making an eBook instead of a dictionary). If you're using a file with both E>J and J>E entries, choose either 英和辞典 or 和英辞典.

4. Copyright information file (in HTML format). Push ellipsis button on right to browse for file. Good idea to create such a file if you're using copyrighted material (and you almost certainly are, even if it's free for use), but optional. As a model, see the copyright file for my adaptation of Edict.

5. Index term/keyword definition file (in XML format). Push ellipsis button on right to browse for file. Optional--if you know what this is, how to create it, and how to use it, have fun; otherwise, LEAVE THIS BLANK.

6. Gaiji information. Unless you've created Gaiji for a file, just leave this alone and don't change the defaults. (Gaiji are small image files of archaic characters that are not included in the JIS X 0208 character set that EPWING uses.)

Click OK, and then the 入力ファイルの登録 (Input File) window pops up:

1. Name of input file (the one you want to turn into a dictionary). Ellipsis button to browse for file.

2. The name you want to appear in the menu of your dictionary reader software (anything is OK).

3. Format of the input file (the source dictionary you edited above).

Click OK, and now you're back to the main window:

1. Leave this alone.

2. 入力ファイル (Input file). You entered this already in the Input File window above, so just leave it alone now.

3. Directory where all your input files are. This should be the folder containing the input file you chose in the 入力ファイルの登録 (Input File) window. As on every line that has one, the ellipsis (three dots) button to the right lets you browse to the location.

4. Index definition file. Same as line 5 in the "make a new book" window (yup, you gotta enter it twice).

5. Copyright file. Same as line 4 in the "make a new book" window.

6-7. Gaiji font file and Gaiji file. If you know what these are and have made them, enter them here. Otherwise, just leave these blank.

8. Output path. Where you want the finished files to be created (the folder I asked you to make at the very beginning of this process).

9. Format of the book/dictionary you're trying to create. Choose JIS-X4081(EPWING) for a dictionary, 電子ブック(EBXA) for an eBook.

10. Indexes to create. When you search for a term, the search engine searches the indexes rather than the entire file (this is why a huge dictionary can be searched almost instantly--actually searching the dictionary itself would take much longer). You must create an index for each type of search you want to allow.

If you are using the free, unregistered version of EBStudio, you can only create the first two types of search, for terms beginning with the search term. However, it would also be useful to be able to search for words ending with a particular character or characters--probably worth the 1000 yen it costs to register the program. Cross search is extremely useful if your dictionary includes phrases and example sentences as keywords, as Eijiro does. The more types of search you enable, the more flexible the final product will be, but the huge amount of indexing required will make the final dictionary take up more space on your memory card.
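To see why each kind of search needs its own index, and why more indexes mean more space, here is a toy Python model. It is purely illustrative and says nothing about how EPWING actually stores its indexes on disk: a "begins with" search is a binary search in a sorted list of keywords, and an "ends with" search does the same thing in a second sorted list of reversed keywords.

```python
from bisect import bisect_left


def build_prefix_index(words: list[str]) -> list[str]:
    """Sorted keywords: enough to answer 'begins with' queries quickly."""
    return sorted(words)


def build_suffix_index(words: list[str]) -> list[str]:
    """Sorted reversed keywords: a second index for 'ends with' queries."""
    return sorted(w[::-1] for w in words)


def prefix_search(index: list[str], prefix: str) -> list[str]:
    """Return every key starting with prefix, using two binary searches."""
    lo = bisect_left(index, prefix)
    # The highest code point sorts after any real continuation of prefix.
    hi = bisect_left(index, prefix + "\U0010ffff")
    return index[lo:hi]


def suffix_search(suffix_index: list[str], suffix: str) -> list[str]:
    """Search the reversed index with the reversed suffix, then flip back."""
    hits = prefix_search(suffix_index, suffix[::-1])
    return sorted(w[::-1] for w in hits)
```

Note that the suffix index is a full second copy of the keyword list, which is the space cost in miniature.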

Now, set the options. Pull down the オプション(O) menu and choose the only thing there (or just click on the crossed-hammer-and-wrench icon in the toolbar) to pop up the Option menu, below. First, click on the 本字 tab.

1. Check this

2. Specify the type of brackets around kana in your source file (【】 are a good choice because they're unlikely to appear anywhere else)

3. Is it 漢字【よみがな】 or よみがな【漢字】? Choose 表記【かな】 (and format your source file accordingly: this will also work for English entries and entries without yomigana, but the other option won't).

4. If you choose this and don't specify an index scheme (前方一致表記形見出し、etc.) then it’ll create a lightweight version indexed by just the keyword. Not sure how limited the ability to search would be if you did this.

5. If the input file has subsections, this option will pop up a box asking you which subsection you want when you try to choose this dictionary. DON'T CHECK.

6. Inserts an extra line between paragraphs (two line feeds instead of one, creating an empty line between the two paragraphs). OPTIONAL--See how display looks with and without it

7. If you check this, words in dfn tags within entries will also be indexed as keywords--UNNEEDED unless you've formatted your source file to use <dfn> tags.

8. Index keywords in dt tags; be sure to also check “見出し要素(<Hn><dt>)からインデックス作成” (probably under the INDEX tab) if you check this. CHECK THIS

9. Creates a menu, like a table of contents, from the headings in your file. Useful if you're making an eBook with chapters and sections with names in heading tags, not so much if you're making a dictionary.

10. If you check this, headings beginning with numbers, 第n, chapter-section numbers, and 丸数字 (circled numbers such as ①) will NOT be indexed. For a dictionary, leave UNCHECKED. For an eBook, might be good to check both (9) and (10).

11. Number each paragraph within an entry. OPTIONAL; maybe good for numbering subdefs.

12. Use rubies (furigana). USELESS unless input file has rubies.

13. Use the right arrow symbol for links; eBooks must choose (A); dics choose なし(N) unless they use links.

14. When searching a book, the results display shows either the chapters and sections [章・節] or the paragraphs [段落] containing the search term; for a dictionary, choose [章・節]; either is probably OK for an eBook.

Next, click the INDEX tab.

1, 6, 7, and 8 seem to be grayed out because I haven't registered yet; not sure, but I think checking 6-9 would result in much larger file sizes.

1. Don't index by these words (only in registered version).

2-4. Check (2) and (3). Characters within indexed terms to ignore when making the index (e.g., if the keyword in the dictionary actually is written "lib・er・ty" it will index it as though it were written "liberty" if you ignore the dots). (2) is to ignore spaces, (3) to ignore dots, apostrophes, and hyphens, and (4) to ignore whatever you enter here, up to 64 characters. However, the way it actually works is counterintuitive--unless you check (2) and (3), you won't be able to look up words or phrases with spaces, dots, or hyphens at all (however, you don't have to enter the spaces, hyphens, etc.--leave them out of the search term and you'll still find the word). In short, you MUST check (2) and (3). (4) is optional: if there's something else you want to be treated the same way as spaces, hyphens, and dots, put it here.

5. There is no five. Oops.

6. Automatically enter the results for a conditional search (only in registered version). If there are a lot of results for a search, automatically filter the results by quality. I'm not bothering to figure this one out--leave it blank unless you know how to use this.

7. Similar to (6) but lets you do a cross search of the results (only in registered version). Again, leave blank unless you know what to do with this.

8. Allow conditional search of text (two or more Japanese characters) within the text of the search results. Again, leave blank unless you know what to do with this.

9. Same as 8 but for an English word of two or more letters. Leave blank unless you know what to do with this.

10. Create indexes from the keywords and yomigana in <dt> and <hn> (heading) tags. CHECK THIS.

11. Show the search term in results of conditional searches. I leave it unchecked, but you can try it and see if you like it.

12. In the results, show the syllable separation dots in English keywords. Optional, but more geared toward Japanese--English speakers don't need this. It's checked by default, but I uncheck this.

13. Kana search level. Set to Level 2 (this makes hiragana and katakana interchangeable in the index and searching--e.g., if you search for うさぎ you'll be able to find the word for rabbit even if the dictionary yomigana read "ウサギ," or vice versa). In Level 1, you can enter hiragana and find either hiragana or katakana (if you enter katakana you can find only katakana), and in Level 0, you must enter hiragana to find hiragana and katakana to find katakana. The manual says some EPWING viewers don't let you enter katakana, in which case you MUST choose level 1 or 2. Level 2 gives the most flexibility and error tolerance.

14. Change from previous advice! If you're making a dictionary with keywords in Japanese (e.g., a Japanese-to-English or Japanese-to-Japanese dictionary), you MUST check this box; otherwise some kana-only words won't show up in search results. Not necessary for dictionaries with only non-Japanese keywords (e.g., an English-to-Japanese or English-to-English dictionary).
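Two of the options above are easier to grasp with a small model. The Python sketch below imitates, under my own simplifying assumptions, what items (2) to (4) and item (13) are described as doing: strip ignorable characters from a keyword before indexing, and compare kana at Level 0, 1 or 2. It is a model of the behavior described here, not EBStudio's actual code, and the function names are hypothetical.

```python
def strip_ignored(key: str, extra: str = "") -> str:
    """Drop spaces, dots, apostrophes and hyphens, plus any extra characters."""
    ignored = set(" \u3000.・'-" + extra)
    return "".join(ch for ch in key if ch not in ignored)


def kata_to_hira(text: str) -> str:
    """Map katakana (ァ to ヶ) onto hiragana; everything else is unchanged."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch for ch in text
    )


def kana_match(query: str, key: str, level: int) -> bool:
    """Exact-match comparison of a query against an indexed reading."""
    if level == 0:
        # hiragana finds hiragana, katakana finds katakana
        return query == key
    if level == 1:
        # hiragana query also finds katakana keys, but not the reverse
        return query == key or query == kata_to_hira(key)
    # level 2: the two scripts are fully interchangeable
    return kata_to_hira(query) == kata_to_hira(key)
```

For example, strip_ignored("lib・er・ty") gives "liberty", and kana_match("ウサギ", "うさぎ", level) is true only at Level 2.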

Finally, click the 作業領域 (Buffer) tab.

The defaults are far too low for large dictionary files, so the program will stop working when it hits one of its limits. If you don't have any of the stuff mentioned in (6) to (10), just keep the defaults (or set them lower if you're having memory problems).

1. Buffer size. As it says, set it to at least 150% of the combined size of the input files.

2. Word index. Set a bit bigger than the number of entries in your dictionary.

3. Kana (yomigana) index. Set a bit bigger than the number of entries in your dictionary.

4. Not sure what this is, but the size shown above seemed to work for me.

5. Number of gaiji (image files to display characters not included in the encoding EPWING uses). Unless you made gaiji files, you have zero gaiji, so the default is fine. Maximum is 8836.

6. How many image files (not including gaiji) do you have?

7. Ditto audio files?

8. Ditto anchors?

9. Ditto links?

10. How many items in the menu (table of contents--how many headings in your source file; this is more for eBooks than dictionaries)?

NOTE on the options on this window. These numbers tell EBStudio how much memory to set aside after you hit "GO." If you set them too high, the program will try to grab more memory than your computer has available, causing it to stop and display an error message reading "memory allocate error."
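If you don't want to guess at items (1) to (3), a short script can measure your source files for you. This Python sketch is only a rough aid: it counts non-empty lines in a .txt source and <dt> tags in an .html source, applies the 150% rule to the combined file size, and adds a 10% margin to the entry count. The 10% figure is my own arbitrary reading of "a bit bigger", the sizes are reported in bytes (check which unit the dialog expects), and <dfn> terms are not counted, so raise the numbers if you index those too.

```python
import math
from pathlib import Path


def count_entries(path: Path, encoding: str = "cp932") -> int:
    """Count <dt> tags in HTML sources, non-empty lines in text sources."""
    text = path.read_bytes().decode(encoding)
    if path.suffix.lower() in {".html", ".htm"}:
        return text.lower().count("<dt>")
    return sum(1 for line in text.splitlines() if line.strip())


def suggest_settings(paths: list[Path], margin: float = 1.1) -> dict[str, int]:
    """Suggest starting values for buffer size and the two index sizes."""
    total_bytes = sum(p.stat().st_size for p in paths)
    entries = sum(count_entries(p) for p in paths)
    return {
        "buffer_bytes": math.ceil(total_bytes * 1.5),  # the 150% rule
        "word_index": math.ceil(entries * margin),
        "kana_index": math.ceil(entries * margin),
    }
```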

Finally, click OK to save and close all your options and go back to the main window.

Click the red exclamation point icon in the main window toolbar to start building your EPWING file. You can follow the status in the box in the bottom of the window; when it says "カタログを作成しました。処理を完了しました" it's done.

The final product is the folder you created at the very beginning of this process and entered in line 8 of the main window. Copy the entire folder to wherever you want to use it. It contains a file called "CATALOGS" and a subfolder (whose name you chose in another step) containing two subfolders, GAIJI and DATA, which in turn contain the main data files. If you didn't use any gaiji, the GAIJI folder will be empty, but don't delete it.
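Before copying the folder to your device, you can sanity-check it against the layout just described. The Python sketch below only checks what this page states (a CATALOGS file, at least one subfolder, and DATA and GAIJI folders inside it, with DATA not empty); it knows nothing about the contents of the files, and the function name is hypothetical. Names are compared exactly as written here, so adjust the case if your copy ended up in lowercase.

```python
from pathlib import Path


def check_epwing_folder(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    if not (root / "CATALOGS").is_file():
        problems.append("missing CATALOGS file")
    subdirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not subdirs:
        problems.append("no dictionary subfolder")
    for sub in subdirs:
        for name in ("DATA", "GAIJI"):
            if not (sub / name).is_dir():
                problems.append(f"{sub.name}: missing {name} folder")
        data = sub / "DATA"
        # GAIJI may legitimately be empty, DATA should not be
        if data.is_dir() and not any(data.iterdir()):
            problems.append(f"{sub.name}: DATA folder is empty")
    return problems
```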

Congratulations. You've just made an EPWING dictionary.