J_Inflect Development Release 0.3.6 - July 21, 2000
Japanese Verb and Adjective Inflection/Analysis Engine
by Collin McCulley (cmmcculley@charter.net)

--------------------------------------------------------------------------------

Special thanks to everyone on sci.lang.japan who has contributed to my understanding of Japanese, and in particular of verbs, for the purpose of this project, including, but not limited to, Gerald (Bart) Mathias and Sho Yamazaki.

Particular thanks to Jim Breen, for creating and providing the EDICT file, which forms the basis of the data distributed with this software.

Also thanks to the following people who have contributed suggestions, corrections, or bug reports: Nicolas Simonet

--------------------------------------------------------------------------------

This code is a work in progress and has been made available for comment by potential users. Please send me questions, suggestions, bugs, content corrections, etc. In particular, I have no Japanese language support on my Macintosh, so feedback on problems with the EUC output would be helpful.

The current working version will be kept available at: http://www.epochrypha.com/japanese/

WHAT IT IS & WHAT IT SHALL BE
-----------------------------

J_Inflect runs as a command-line program from a terminal emulator for either Windows or Mac. (The terminal emulator is built in to the program, not something separate that the user needs.) [See the Special Note for Windows Users below]

It is my intent to shore up the inner workings and data structures, and then build a GUI face to this program (though I will happily leave it in command-line form for unix-types). A web interface at a later date is also a possibility.

The software and its command-line interface are written in 100% ANSI C++ (with the exception of a small amount of code controlling the SIOUX console environment).
I will decide at a later time if it becomes slick enough to warrant a small shareware fee, but that's long term, and will only apply to GUI versions. Anything I release prior to that is freeware.

SPECIAL NOTE FOR WINDOWS USERS
------------------------------

The PC/Win version of this program has been recast as a console (DOS-Window) application in order to make this release available while problems are worked out with the Metrowerks WinSioux environment.

It is suggested that under the "Layout" tab in the Console control panel you set both width values to 85. Set the window height value to make the window as tall as possible while still fitting on your screen. Set as large a buffer height as is comfortable (500 lines or more) to enable the console scroll bar and enable you to see results that scroll off the screen.

That being done, you should be able to double-click the application (.exe) file, as usual, to start a console window with J_Inflect.

WHAT IT CAN DO
--------------

Currently J_Inflect can:

* Forward conjugate Japanese verbs and adjectives into many different inflectional forms.
* Reverse conjugate an inflected form, matching it against possible candidates from a dictionary file.

Inflectional forms are defined in rules files which the user can edit to add new inflection cases, or change current cases to suit preferences.

Results can be output with inflected kanji and kana to an EUC text file.

For words listed in the dictionary files with certain additional information fields, fully realized English translations can be provided. For instance, "tabemasita" would be listed with the translation "I ate".

The copula can be conjugated by specifying "da" or "desu" to the verb conjugator.

J_Inflect, for the most part, will automatically recognize the type of verb given to it (as long as it's in dictionary form) and choose the correct conjugation for it.
This includes godan verbs ending in -iru or -eru. However, if any have been missed, the user can force the conjugation to follow the desired type. The user can also add the omitted verb to Rules_VClass.dat, or add an entry to the Words_Verbs.dat dictionary file.

The two standard irregular verbs kuru and suru are supported, as well as irregularities in verbs such as iku, gozaru, kureru, aru, etc.

Verbs that conjugate with suru are supported, although for Sino-Japanese single-kanji verbs that conjugate with suru but have godan-like conjugations in certain tenses, irregularity rules may not be complete or correct. In the current version the ruleset contains exceptions for the potential and for all -nai forms. The user may edit the file Rules_VClass.dat to add verbs for which the anomalous conjugation should apply, or add appropriately typed entries to the Words_Verbs.dat dictionary file.

PLANNED FUNCTIONALITY
---------------------

* Verb and adjective dictionary lookup: J -> E and E -> J.
* Verb analysis and dynamic rule explanation (i.e. "Tell me how you got that answer.")
* Learning drills and exercises.

Also, as stated above, this software has a command-line-style interface, running in a console emulator on Windows or Mac, or directly compiled for the command line on a Unix/Linux machine. While the current version supports writing kanji and kana inflected forms to an EUC coded file, the GUI version will have support for on-screen kanji and kana display.

DISCLAIMER
----------

This software is provided AS IS. NO WARRANTIES are granted, either expressed or implied, and NO GUARANTEES are made for merchantability or fitness for a particular purpose. The author shall not be liable for any special, incidental, consequential or punitive damages arising out of the installation, performance or use of this software.

The author makes no claim of expertise in the Japanese language, and has written this as part of a learning exercise and desire to have a learning tool.
Follow the information produced by this tool at your own risk.

COPYRIGHT
---------

This software and its accompanying data files, except for Words_Verbs.dat and Words_Adj.dat, are COPYRIGHT (c) 2000 by Collin McCulley. This current version is provided free of charge and may be freely distributed by private individuals by any means so long as no money is charged, but must remain in its original form and must be accompanied by this document. It may not be included in freeware/shareware archives of any type or be distributed by commercial enterprises without my written permission.

The Words_Verbs.dat and Words_Adj.dat files use information extracted from EDICT, and as such their content is COPYRIGHT (c) 2000 by The Electronic Dictionary Research and Development Group, Monash University, Clayton, Victoria, Australia. This information is used under the terms and conditions specified in the Electronic Dictionary Research and Development Group General Dictionary Licence Statement, the current copy of which may be available at: http://www.csse.monash.edu.au/groups/edrdg/newlic.html. The version of this license statement in force at the time of this release is the appendix of this document.

--------------------------------------------------------------------------------

BASICS OF THE COMMAND LINE INTERFACE
------------------------------------

The command line interface is provided by the author's CLAM_Shell library. Following is a general description of how command syntax works, but the reader may want to skip down and just read over the J_Inflect commands and examples.

The CLAM_Shell command-line interface uses the following syntax:

    command_keyword [switches] [required_arguments] [optional_arguments]

Command keywords are single words with no spaces and are case sensitive.

Switches are used to specify or modify the behavior of the command or provide keyword input. All switches are optional and so may be left out of any command.
Switches are preceded by a switch symbol, usually a dash (-), and are specified by a keyword, usually, but not necessarily, a single letter. For example:

    -a

Defined switches may be included in a command in any order. Single-letter switches which do not require arguments may be appended to a single switch symbol, for example if "a", "b", and "c" are individual switches which do not require arguments, the following is valid, and tells the parser to apply all three switches to the command.

    -abc

A switch may require an argument, or one may be optional. A switch which requires an argument must appear alone after the switch symbol and its argument must directly follow. A switch which can optionally take an argument can appear with other switches only when an argument is not supplied. A switch which can take an optional argument appearing alone will assume that the next non-switch item in the command string, if any, belongs to itself.

For example, if "a" requires an argument, "b" can take an optional argument, and "c" cannot have an argument:

    -a -bc    (OK)
    -ab       (Wrong: -a must appear alone.)
    -a -c     (Wrong: -a requires an argument.)
    -b -c     (OK)
    -a -b     (OK, but note belongs to -b, and is not a command argument.)
    -c        (OK, but is the first required/optional command argument and doesn't belong to -c.)
    -c -b     (Probably wrong: -b will parse as a command argument, not a switch because all switches must come first.)

A command argument is identified by its position in the command string, and must come after all switches. Any item in the command string after a non-switch (i.e. anything not preceded by a switch symbol) has appeared is assumed to be an argument. Any required arguments must come first, in order, followed by any optional arguments. To use a given optional argument, any optional arguments prior to it must also be supplied.

Switch arguments and command arguments may contain no spaces unless surrounded by quotes. Without quotes, white space delimits arguments.
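As an illustration only, the rules so far (quoted items stay together, all switches come first, bundled single-letter switches, a switch with a required argument standing alone) can be sketched in a few lines of Python. This is not the CLAM_Shell source, which is C++ and is not reproduced in this document. The function name split_command and the takes_arg parameter are hypothetical, and optional switch arguments and multi-letter switch keywords are left out to keep the sketch short.

```python
import re

# A token is either a double-quoted run (quotes dropped) or a run of non-spaces.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(line):
    """Split on white space, keeping anything in double quotes as one item."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(line)]


def split_command(line, takes_arg=()):
    """Return (keyword, switches, arguments) for one command string.

    takes_arg is a hypothetical collection naming the single-letter
    switches that REQUIRE an argument. Optional switch arguments are
    not modelled in this sketch.
    """
    tokens = tokenize(line)
    keyword, rest = tokens[0], tokens[1:]
    switches = {}
    i = 0
    # Switches must all come first; the first non-switch ends the scan.
    while i < len(rest) and rest[i].startswith("-") and len(rest[i]) > 1:
        name = rest[i][1:]
        if name in takes_arg:
            # The argument must directly follow the switch.
            if i + 1 >= len(rest) or rest[i + 1].startswith("-"):
                raise ValueError(f"-{name} requires an argument")
            switches[name] = rest[i + 1]
            i += 2
        else:
            # Bundled single-letter switches, e.g. -abc.
            for letter in name:
                if letter in takes_arg:
                    raise ValueError(f"-{letter} must appear alone")
                switches[letter] = None
            i += 1
    return keyword, switches, rest[i:]
```

With "a" declared as requiring an argument, split_command('cmd -a x -c "two words" last', {"a"}) yields the keyword "cmd", the switches a (with argument "x") and c, and the two command arguments "two words" and "last".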
Anything in quotes is assumed to comprise a single argument.

--------------------------------------------------------------------------------

J_INFLECT COMMANDS
--------------------

conj
    Conjugates a supplied verb or adjective.

Syntax:
    conj [[-v ]|[-a ]] [-s ] [-p ] [-i|-g|-rs] [-f ] [[-m ]|[-t ]]

Switches:

-v  Conjugate a verb. Requires an argument that is the dictionary form (i.e. plain indicative, or -u form). Must be in lowercase Nihon (waapuro-style) roomaji.

-a  Conjugate a true (-i) adjective. Adjective must be in the (-i) form. Must be in lowercase Nihon (waapuro-style) roomaji.

Note: Either -v or -a may be specified, but not both. If neither is specified, the command will produce output for the current word. A current word is the last word conjugated, or if a single result is produced from a conjugation, the last result. The system indicates when a current word is available. The current word may be sent back for further conjugation if applicable (i.e. if the previous word or the result of the last operation was a verb or true adjective), though not all conjugations may be available. For instance, the program tries to stop nonsense like "taberarerareraremasu" (repeated conjugation using the potential inflection), but this may not be bug-free or complete. In the case of sending a result back in for a non-applicable conjugation, the system will tell you there is no rule match.

-s  Limit to a given sense. The sense is "affirmative" or "negative" (e.g. "I did" vs. "I didn't", respectively). Requires an argument that is any leading portion of the word "affirmative" or "negative" (e.g. -s neg or -s n will work).

-p  Limit to a given politeness level. Politeness levels are defined as "abrupt", "plain", or "polite". This switch requires an argument which is any non-ambiguous leading portion of one of these words (e.g. -p pl or -p pol will work, but -p p will not).

Note: Not all inflections have cases for all senses or politeness levels, particularly: "abrupt" applies only to the imperative.
Furthermore, results which do not have an inherent sense or politeness level (such as inflection cases which produce another verb) are screened out when applying these switches. Leaving out the switches -s or -p will show all available senses or politeness levels, respectively.

-f  Output the result of the conjugation to an EUC-encoded text file. This output will include kanji and kana inflections in addition to what is displayed on the screen. Supplying the filename is an optional argument to this switch. If no filename is specified, the filename will be the same as the roomaji, suffixed by ".euc". Output to this file is formatted for 80-column monospaced text, so depending on the viewing environment the user will probably want to reformat to a 9-point Courier font for proper viewing.

-i  Force regular itidan conjugation.

-g  Force regular godan conjugation.

-rs Force "regular" suru conjugation. The program recognizes that certain Sino-Japanese verbs using the auxiliary -suru have several irregularities in their conjugation, i.e. in certain inflections they conjugate as a root+su godan. For roomaji spellings which may be one of these verbs, the program may also trap homophonic verbs that do not have these characteristics, unless they are explicitly specified in the dictionary file. This switch allows the user to force the conjugation back to the normal suru pattern. It can only be used with verbs that end in suru in the first place.

Note: Switches -i, -g and -rs are mutually exclusive and cannot be used together. Additionally, they can only be used when switch -v is used, or a conjugable verb is the current word. As far as the -i and -g switches, the inflection engine automatically detects verb type, including irregular verbs, and verbs with occasional irregularities, and should typically not be overridden except in cases where a verb in dictionary form could work with either conjugation (e.g. "kiru").
One of these switches _must_ be specified in such a case, unless a match is found and chosen from the dictionary file. The -i and -g switches may also be useful if the user has found an -iru/-eru godan that has not already been specified in the system, or the system does not recognize that such a verb could also be itidan. In either case, please report this to the author so it may be corrected. You can also self-correct this condition by editing the Rules_VClass.dat file, or explicitly adding cases to the Words_Verbs.dat file. Specifying one of the forcing switches will bypass dictionary lookup (also disabling kanji/kana output for the result).

-m  Present a menu of available inflection cases and allow the user to choose which is used. Takes an optional argument that is a list delimited by vertical bars (|) which correspond to choices from the menu. If specified, the menu will be skipped and the named choices will automatically be used. If no argument is specified, the menu of cases is presented. The user may enter one or more choices from the menu (delimited by whitespace). In either case, menu choices may be specified in any order, and results from inflection cases will be output in that order.

-t  Directly specify the inflection case to be used. The argument must be the exact, case-sensitive name of the inflection case as it exists in the rules file: first letters capitalized. Put multi-word names in quotes (e.g. "Past Indicative").

Note: Switches -m and -t are mutually exclusive. Only one should be used for a given command.

Usage Examples:

conj -v taberu
    Conjugates taberu, using results from all applicable rules.
conj -a oisii -s aff
    Conjugates adjective oisii, using all rules producing affirmative senses.
conj -v kiku -s neg -p pol
    Conjugates kiku, using all rules producing negative senses at the normal polite level of speech.
conj -v hanasu -p pol -t Indicative
    Conjugates hanasu into the indicative, showing only normal polite results.
conj -a uresii -t "Past Indicative"
    Conjugates adjective uresii into the past indicative.
conj -v iku -m
    Conjugates iku into the conjugation to be selected from the menu.
conj -v iku -m 2|13|19
    Conjugates iku into the conjugations corresponding to inflection cases 2, 13 and 19 from the menu (which is not presented).
conj -v kiru -i
    Conjugates kiru as an itidan verb, using results from all applicable rules.
conj -v kiru -gmf
    Conjugates kiru as a godan verb into the conjugation to be selected from the menu, and outputs the result of the conjugation to a file encoded as EUC in addition to displaying it on the screen.
conj -v kiru -s af -p pl -gm
    Conjugates kiru as a godan verb into the conjugation to be selected from the menu, only showing results that are affirmative and in the plain mode of speech.
conj -v taberu -t Potential
conj -p pol -t Indicative
    First command conjugates taberu to the potential, producing taberareru, which becomes the current word. The second command conjugates taberareru to the polite indicative, producing taberaremasu and taberaremasen.

----------------------------------------

rconj
    Reverse conjugates a supplied inflected form back to the dictionary form, provided that it exists in one of the dictionary files.

Syntax:
    rconj [-a] [-b]

Operation:
    By default this command searches on verbs only. Note that the command can take as long as a minute depending on the verb it is given. The author is working on efficiency issues and making this routine more intelligent. Note that it will find _every_ word in the searched dictionary for which one of the available rules will produce the supplied inflection.

Switches:

-a  Search for a match to an adjective form.

-b  Search both dictionaries for a match.

Optional Argument:

inflected_form
    The inflected word for which the user wants to find the dictionary form. If none is specified, the current word, if available, is used.
    The supplied inflected form does not need to match the spacing characteristics of the inflected form that would be reported by J_Inflect, e.g. "ohanasinasaru" will find "hanasu", even though a conjugation of "hanasu" would have produced "ohanasi nasaru".

Usage Examples:

rconj taberarenai
    (finds taberu)
rconj "ai shimasu"
    (finds aisuru)
rconj -a oisikunai
    (finds oisii)
rconj -b tabeyasui
    (finds taberu from the verbs dictionary, probably produces no additional hit from the adjective dictionary)

----------------------------------------

cw
    View or set the current word.

Syntax:
    cw [-h|-r]

Operation:
    If no switch is specified this command simply displays the current word.

Switches:

-h  Set the current word to one of the last 10 words that were conjugated.

-r  Set the current word to one of the words returned as the result of the last conjugation.

Note: Either -h or -r may be specified, but not both.

----------------------------------------

dtest
    Execute a comprehensive test of one or both of the dictionary files.

Syntax:
    dtest [-v] [-a]

Operation:
    This test can take up to 20 minutes to run for the verbs file. The test involves conjugating every word using every applicable rule. The word's roomaji form, along with the result of the last rule executed, is displayed. Note that failure at some point in the test is likely to manifest itself as a program crash (via an unhandled exception). Such failure can result from improper editing of dictionary OR rules files as well as bugs remaining in the program. If this test fails for the version of the dictionary and rules files AS PROVIDED WITH THE SOFTWARE please report it, along with the last word successfully conjugated. Otherwise, reinstall the original data files and try the test again to determine if the error was caused by user editing.

Switches:

-v  Test the verbs file Words_Verbs.dat.

-a  Test the adjectives file Words_Adj.dat.
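Both rconj and dtest are described above as running every applicable rule over every dictionary word. The following Python sketch shows that generate-and-compare idea on a toy scale, including the space-insensitive comparison mentioned for rconj. It is a guess at the general approach, not the program's actual code: the three-word dictionary, the class labels, and the handful of hard-coded forms are hypothetical stand-ins for Words_Verbs.dat and Rules_Verbs.dat, whose real formats are not documented here.

```python
# Toy data: NOT the contents or format of Words_Verbs.dat or Rules_Verbs.dat.
DICTIONARY = {"taberu": "itidan", "miru": "itidan", "hanasu": "godan-su"}


def itidan_forms(word):
    """A few inflections of an itidan verb (drop the final "ru")."""
    stem = word[:-2]
    return [stem + "masu", stem + "nai", stem + "rarenai"]


def godan_su_forms(word):
    """A few inflections of a godan verb ending in "su" (hanasu -> hanasi)."""
    stem = word[:-2] + "si"
    return [stem + "masu", "o" + stem + " nasaru"]


RULES = {"itidan": itidan_forms, "godan-su": godan_su_forms}


def squeeze(text):
    """Remove all white space so that spacing differences do not matter."""
    return "".join(text.split())


def reverse_conjugate(inflected):
    """Return every dictionary word for which some rule yields the input."""
    target = squeeze(inflected)
    return [word for word, verb_class in DICTIONARY.items()
            if any(squeeze(form) == target for form in RULES[verb_class](word))]
```

Here reverse_conjugate("taberarenai") returns ["taberu"], and both "ohanasinasaru" and "ohanasi nasaru" return ["hanasu"]. Because every word is conjugated through every rule before comparing, the cost grows with dictionary size times rule count, which is consistent with the long run times noted for these commands.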
--------------------------------------------------------------------------------

ROOMAJI NOTE
------------

The roomaji used in the program is a modified Nihon roomaji. Nihon is more amenable than the Hepburn or Kunrei systems because there is a (mostly) one-to-one correspondence between the roomaji representation and the kana characters, and because beginning letters do not change across rows. A future version may support other systems through internal translation.

One modification made is that direct kana correspondence is made. For instance, output of lengthened vowels looks like "ou" not "oo": "tabemasyou" not "tabemasyoo" (UNLESS a word has "oo" in the kana, as in the adjective "ookii"). This is sometimes called "waapuro roomaji". All input and any words in the rule or dictionary files should follow this modified Nihon system.

To make syllables distinct, some other differences may occur in katakana-only syllables. The following is the table of supported roomaji representations for kana characters. If there are omissions of kana that may occur in modern words, please inform the author. Note that in parsing from the rules files any roomaji equivalent not on the following table is likely to crash the program. The user interface traps such mistakes, however, so don't worry about mistyping in the program itself. Note also that these are case sensitive, for instance wo and WO refer to two different kana representations. Refer to the file "kana_system.euc" for a pairing of kana and roomaji.
    a i u e o
    ka ki ku ke ko    ga gi gu ge go
    sa si su se so    za zi zu ze zo
    ta ti tu te to    da di du de do
    na ni nu ne no
    ha hi hu he ho    ba bi bu be bo    pa pi pu pe po
    ma mi mu me mo
    ya yu yo
    ra ri ru re ro
    wa wi we wo
    n

    kya kyu kyo    gya gyu gyo
    sya syu syo    zya zyu zyo
    tya tyu tyo
    nya nyu nyo
    hya hyu hyo    bya byu byo    pya pyu pyo
    mya myu myo
    rya ryu ryo

    katakana only:
    ye
    WI WE WO
    kwa kwi kwe kyo    gwa gwi gwe gyo
    sye zye tye
    tji tju twu    dji dju dwu
    dya dyu dyo
    tsa tsi tse tso
    fa fi fe fo
    va vi vu ve vo vyu

    katakana vowel elongation bar is a hyphen: -
    katakana word separator dot is a single quote: '

AVAILABLE INFLECTION CASES
--------------------------

The inflection cases currently supported by the tool are named as follows. Remember that these are case-sensitive with respect to the -t switch of the conj command.

    Indicative*
    Imperative
    Presumptive I
    Presumptive II
    Provisional
    Gerund
    Past Indicative*
    Past Presumptive I
    Past Presumptive II
    Conditional
    Alternative
    Passive
    Potential
    Causative
    Causative Passive
    Honorific
    Humble
    Progressive
    Easy to ~

Cases marked by * are also currently supported for adjectives.

Inflection cases can be added or altered by careful editing of the rules files. Documentation will be made available at a later date, however, there are some notes at the bottom of these files the user may find instructive.

NOTE: The program may produce some inflections even when they make no sense. For some regular verbs, some of the inflections may not ever be used, even though the rule can be applied to form them. A blocking rule can be added to the respective rule file to prevent a given verb or adjective from conjugating into a specific tense, even if it is otherwise regular. A blocking rule overrides any other rule that is applicable to a given word for a given inflection.
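The override behavior of a blocking rule can be pictured with a short Python sketch. Since the rules file format is not documented here, everything about the representation below is hypothetical: the Rule record, its field names, and the idea that a rule targets either a verb class or one specific word. Only the precedence (a matching blocking rule suppresses all other matches for that word and inflection case) comes from the note above, and the choice of "dekiru" as a blocked word is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    case: str               # inflection case name, e.g. "Potential"
    applies_to: str         # hypothetical: a verb class or one specific word
    suffix: str = ""        # hypothetical: ending that replaces the final "ru"
    blocking: bool = False  # True means "produce nothing for this case"


def select_rules(word, verb_class, case, rules):
    """Return the rules to apply, or [] if a blocking rule matches."""
    applicable = [r for r in rules
                  if r.case == case and r.applies_to in (word, verb_class)]
    # A blocking rule overrides every other applicable rule.
    if any(r.blocking for r in applicable):
        return []
    return applicable


RULES = [
    Rule("Potential", "itidan", suffix="rareru"),
    Rule("Potential", "dekiru", blocking=True),  # illustrative blocked word
]
```

With these two rules, select_rules("taberu", "itidan", "Potential", RULES) returns the regular itidan rule, while the same call for "dekiru" returns an empty list even though the regular rule would otherwise apply.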
CONJUGATION RESULTS
-------------------

Results are displayed in 80-column lines and each separate rule that is applied produces output showing the conjugated form, the description of the inflection case, the part of speech of the inflected result, and the English pseudo-translation. For the EUC output option, kanji and kana inflections are also included.

Format:

--------------------------------------------------------------------------------
(kanji inflection)
(kana inflection)
(roomaji inflection)
(inflection description)
(Part of Speech)
(English translation)
--------------------------------------------------------------------------------

Example:

--------------------------------------------------------------------------------
(kanji)
(kana)
tabemasen
Negative Polite Indicative
(Verb Inflec.)
X does not [do], X will not [do]
--------------------------------------------------------------------------------

The part of speech displayed for the result is one of:

    (Itidan)         an itidan verb
    (Godan)          a godan verb
    (Suru Aux)       a verb that conjugates using the auxiliary "suru"
    (Irreg.)         an irregular verb (or verb composed with an irregular auxiliary like "suru")
    (Irreg. Itidan)  a mostly regular itidan verb with some irregularities
    (Irreg. Godan)   a mostly regular godan verb with some irregularities
    (Irreg. Suru)    a suru verb which has godan-like conjugations in some tenses (like "aisuru")
    (True Adj.)      an -i (conjugable) adjective

Any of the above results may be reprocessed to give further conjugations. The remaining three are terminal results:

    (Verb Inflec.)   an inflected form (e.g. tabemasu). Note that a result such as "taberu" for the plain indicative is considered an inflection and is not capable of being reprocessed.
    (Quasi Adj.)     an adjectival noun, or "-na adjective".
    (Noun)           a noun

NOTE: I have marked negative verb forms (such as "nomanai", "nomanakatta", etc.) as "verb inflections" even though I realize they are adjectival forms.
This is done because their conjugations are (supposed to be) fully specified within the system, so they can be thought of as the negative inflection of the verb for a particular tense, rather than inflections of nai. It is also less confusing not to have to go through the negative indicative to get at the negative conditional, etc. However, the system is customizable by careful editing of the rules file, so that the negative indicative could be reported back and subsequently reprocessed as an adjective, if that is the way the user wants to do it. Or the user may manually send the -nai form into the adjective conjugator. In either case, however, the program will also produce negative forms from the negative, which it ain't got no business doing.

ATTITUDE AND COLLOQUIALITY MARKERS
----------------------------------

In addition to sense, politeness level and part of speech, words in J_Inflect can be marked by attitude and colloquiality. Attitude refers to words which are honorific, humble or neither (neutral). Colloquial refers to words which may fall outside of formally correct language but are nevertheless in use.

Both of these markers are "sticky" in the sense that once a word is inflected to a form that produces a non-neutral attitude or a colloquial result, words conjugated from those results retain those qualities. For example, if a verb is conjugated to the humble inflection and one of the results is subsequently conjugated to the affirmative polite indicative, that result will be marked as a humble affirmative polite indicative.

There is no filtering switch for attitude as for politeness level, because there is only one inflectional form that produces each attitude.

ENGLISH TRANSLATIONS
--------------------

For all words that can be found in one of the two dictionary files, dynamic English translations can be provided for each of the inflected forms. Provided translations are a loose guideline, and may not be appropriate to all situations.
Furthermore, the author is not an expert and not even remotely experienced with some of these tenses, so some of the translations may have room for improvement. If the user can suggest better alternatives, please e-mail them to the address at the top of this document.

Translations will particularly not be accurate when a verb is reprocessed, e.g. in converting taberu to taberareru, then to taberaremasu, the English translation will still reflect the basic indicative (X eats), not the potentialized indicative (X can eat). The user must bear this in mind. A later version will give some indication that the translation should be read based on a previous inflection.

English translations for verbs have been made using X as a generic subject placeholder. When conjugating verbs for which there is no dictionary entry, some form of "to do" in brackets appears in the translations. For a given verb, replace the bracketed expression with the equivalent form of the verb you are conjugating to get the English meaning. For example, oyogu (to swim) with the present indicative translation of "X [does], X will [do]" for oyogimasu gives "X swims, X will swim", of course modified appropriately for agreement with whatever the English subject is (e.g. "I swim").

For adjectives, the translation is some form of "to be" with [X] representing where the English adjective should be substituted (and is dynamically substituted for words appearing in the adjective dictionary file).

Note that dynamic translations can only be provided if several additional fields appear for entries in the dictionaries that have not been provided in the current version. For verbs, this includes the infinitive (without "to"), 3rd person present (-s form), present participle (-ing form), plain past, and past participle (-ed form) of the primary English translation of the verb. These are directly substituted for [do], [does], [doing], [did], and [done] in the translation template, respectively.
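The placeholder substitution for verbs can be shown with a short Python sketch. The template string and the oyogu example come from the text above; the function name fill_template and the dictionary layout for the five English forms are assumptions made for the illustration, and do not reflect how the program stores them.

```python
import re

# The five English forms for one verb, keyed by the placeholder they replace.
# This layout is hypothetical; the dictionary file stores them as extra fields.
SWIM = {"do": "swim", "does": "swims", "doing": "swimming",
        "did": "swam", "done": "swum"}

PLACEHOLDER = re.compile(r"\[(do|does|doing|did|done)\]")


def fill_template(template, forms=None):
    """Replace [do], [does], [doing], [did], [done] with a verb's forms.

    With no forms (a verb missing from the dictionary) the template is
    returned unchanged, leaving the bracketed "to do" forms in place.
    """
    if forms is None:
        return template
    return PLACEHOLDER.sub(lambda match: forms[match.group(1)], template)
```

For instance, fill_template("X [does], X will [do]", SWIM) gives "X swims, X will swim", while calling it without forms leaves the bracketed template as is.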
For adjectives, a single additional field in the dictionary specifies the primary English translation to be substituted for [X] in the rules file.

The dictionary files are tab-delimited (NOT whitespace-delimited) files. To include the additional entries, use an editor that won't break the lines or mangle the EUC codes and use tabs between entries and between the end of the current line and the start of the new fields. In the case of verbs, if you include the fields you must include all of them, or else the program will likely crash trying to read the file. Do not place a tab at the end of the last field.

The author will work on supplying files that have this information included, but this is a lot of manual work and will take time.

DATA FILES
----------

All data files may be edited by the user to allow the system to produce desired results, add new inflection cases, and correct oversights in the data preparation. Full format information and examples will be provided at a later time, although some of the files contain some information at the bottom, below the lines of data. Be aware that improper editing can result in program lockups, and possibly crashes. Note that all formats use numbers to indicate how many records are available to read. If you add or delete records, be sure to change the corresponding number. Also, while the user interface will trap errors in roomaji input, the modified Nihon system AS PRESENTED ABOVE _must_ be used in data files or a lockup or crash could result.

All of the files, with the exception of Rules_VClass.dat, contain EUC encoded text, so the user should be careful to only open these files with an editor that won't mangle it. If changes are to be made, such Microsoft tools as NotePad, WordPad and MS Word are NOT good choices, since Microsoft has unilaterally decided that everything should be converted to Unicode.
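Before running the program on an edited dictionary file, the cautions above (EUC encoding intact, tab-delimited fields, no tab after the last field, verb translation fields all present or all absent) could be checked with something like the Python sketch below. It is only a sketch under assumptions: the number of base fields per entry is not stated in this document, so it is passed in as a parameter, and the sample entry in the usage note is invented rather than copied from Words_Verbs.dat.

```python
VERB_TRANSLATION_FIELDS = 5   # for [do], [does], [doing], [did], [done]


def check_verb_entry(raw, base_fields):
    """Return a list of problems found in one dictionary line (as bytes).

    base_fields is the number of fields in an entry WITHOUT the extra
    translation fields; the caller must supply it, since the real count
    is not documented here.
    """
    try:
        line = raw.decode("euc_jp")
    except UnicodeDecodeError:
        return ["not valid EUC-JP (was the file re-saved in another encoding?)"]
    problems = []
    line = line.rstrip("\r\n")
    if line.endswith("\t"):
        problems.append("tab after the last field")
    count = len(line.rstrip("\t").split("\t"))
    if count not in (base_fields, base_fields + VERB_TRANSLATION_FIELDS):
        problems.append(f"{count} fields: translation fields are all-or-nothing")
    return problems
```

As a usage example with an invented three-field entry, check_verb_entry("食べる\tたべる\ttaberu".encode("euc_jp"), 3) reports no problems, while appending only two of the five translation fields, or a trailing tab, produces a message for each issue.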
Note that specifications for auxiliary-based words should write the roomaji string without any space between the main word(s) and the auxiliary. For instance, always "benkyousuru" and never "benkyou suru". The program will automatically recognize either one from the command line, but only if the former is what appears in the files.

Following are the five data files (.dat extension) currently used by J_Inflect.

Rules_Verbs
    Specifies rules for conjugating from the dictionary form to many different inflection cases. Rules are classified based on inflection case, sense, politeness, irregularity characteristics, etc, allowing the inflection engine to select appropriate ones for the verb under consideration.

Rules_VClass
    Classifies verbs by certain characteristics useful to the system. Allows these verbs to be classified into the correct conjugation pattern.

    NOTE: The verb "suru", by itself, should never, under any circumstances, appear in this file. Suru is fully treated in the internal code since its function, not unlike "to do" in English, is so integral to Japanese.

Rules_Adj
    Specifies rules for conjugating -i adjectives to many different inflection cases. Rules are classified based on inflection case, sense, politeness, etc, allowing the inflection engine to select appropriate ones for the adjective under consideration.

Words_Verbs
    Optional dictionary file containing a listing of verbs with kanji, kana reading, part of speech indicator (i.e. to select proper conjugation), inherent attitude and English meaning. It also can contain information needed for dynamic English translation. This file is necessary for kanji/kana inflection file output and for reverse conjugation. If the file is available, input words are screened against the dictionary, and available matches are presented to the user for a selection. If there are no matches, or if the user chooses, the automatic classifier routine is used instead to select the conjugation (and any file output is kana only).
The Words_Verbs.dat file provided with the software was constructed using data from the EDICT file created by Jim Breen. See the copyright statements at the top and bottom of this document.

Words_Adj
Optional dictionary file containing a listing of adjectives with kanji, kana reading, roomaji, and English meaning. It also can contain information needed for dynamic English translation. This file is necessary for kanji/kana inflection file output and for reverse conjugation. If the file is available, input words are screened against the dictionary, and available matches (if more than one) are presented to the user for a selection.

The Words_Adj.dat file provided with the software was constructed using data from the EDICT file created by Jim Breen. See the copyright statements at the top and bottom of this document.

--------------------------------------------------------------------------------

APPENDIX
--------

MONASH UNIVERSITY
SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING
ELECTRONIC DICTIONARY RESEARCH AND DEVELOPMENT GROUP

GENERAL DICTIONARY LICENCE STATEMENT

Copyright (C) 2000 The Electronic Dictionary Research and Development Group, Monash University.

Introduction

In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to the The Electronic Dictionary Research and Development Group at Monash University (hereafter "the Group"), on the understanding that the Group will foster the development of the dictionary files, and will utilize all monies received for commercial use of the files for the further development of the files, and for research into computer lexicography and electronic dictionaries.

This document outlines the licence arrangement put in place by The Group for usage of the files. It replaces all previous copyright and licence statements applying to the files.
Application

This licence statement and copyright notice applies to the following dictionary files, the associated documentation files, and any data files which are derived from them.

JMDICT - Japanese-Multilingual Dictionary File
EDICT - Japanese-English Electronic DICTionary File
ENAMDICT - Japanese Names File
COMPDIC - Japanese-English Computing and Telecommunications Terminology File
KANJIDIC - File of Information about the 6,355 Kanji in the JIS X 0208 Standard (special conditions apply)
KANJD212 - File of Information about the 5,801 Supplementary Kanji in the JIS X 0212 Standard

Copyright over the documents covered by this statement is held by James William BREEN and The Electronic Dictionary Research and Development Group at Monash University.

Usage, Copying and Distribution

a. General Permission

Any person or organization in possession of a copy of any of the files covered by this statement, whether they have received the copy via free distribution or purchase:

i. may use the file for personal purposes such as to assist with reading texts, research, translation services, etc.
ii. may, subject to the restriction detailed below, supply extracts or small portions of the files to other persons or organizations in the form of written documents, electronic mail, etc. Such an activity may result in a fee, e.g. in the case of professional translation services.
iii. must as part of the receiving of the copy undertake to be bound by all the conditions below relating to the distribution of the files.

b. Free-of-charge distribution

Permission is granted to make and distribute verbatim copies of these files provided the full documentation of the files and this copyright notice and permission notice is distributed with all copies. Any distribution of the files must take place without a financial return, except a charge to cover the cost of the distribution medium.
Permission is granted to place copies of these files on WWW and ftp servers for subsequent distribution under the same conditions listed above.

Permission is granted to make and distribute extracts or subsets of the EDICT file under the same conditions applying to verbatim copies.

Permission is granted to translate elements of the files into other languages, and to make and distribute copies of those translations under the same conditions applying to verbatim copies.

c. Incorporation into Free Software and Server Systems

Permission is granted to use these files as part of software which is distributed free-of-charge, provided all the conditions associated with free-of-charge distribution as described above are met, and provided that full acknowledgement of the source of the files is made in the software documentation.

Permission is granted to incorporate the files into any WWW or similar server system which operates without charge or commercial advertising, provided all the conditions associated with free-of-charge distribution as described above are met. In addition, full acknowledgement of the source of the files must be prominently displayed on the server, and links must be made available on the "front" page of the particular server to the full documentation and copyright statements of the files.

d. Shareware Software

Permission is granted to use these files as part of software which is distributed on a "shareware" basis, and for which a charge of no more than $US100 is levied, on the following conditions:

i. the distributors of the software notify the Group prior to the release of the package.
ii. the distributors undertake to remit to the Group 10% of the gross receipts for the software, or such other amount agreed by the Group prior to the release of the software. The remittance is to take place at least once each 3 months.
iii. the full documentation of the files and this copyright notice and permission notice is distributed with all copies.
iv.
full acknowledgement of the source of the files is made in the software documentation.

e. Commercial WWW Servers

The contents of these file must not be made available via WWW services for which a charge is levied, or which make use of commercial advertising without the prior permission of the Group. Permission for such usage will normally be granted in return for a fee based on a small proportion of either the advertising revenue or the subscription charges. The other conditions applying to free WWW servers also apply to commercial WWW servers.

f. Linux Operating System

Permission is granted for these files to be included free-of-charge in distributions of the GNU/Linux operating system and associated utilities and packages, e.g. the Red Hat, Debian, Turbo-Linux, etc. distributions, provided the conditions stated above under "Incorporation into Free Software and Server Systems" are met.

g. Commercial Software Packages

The contents of these files must not be incorporated into a software package that is sold commercially or used exclusively within a company or organization without the prior permission of the Group. Permission will normally be granted for a non-exclusive licence to incorporate the files in return for either a one-off payment to The Group or a royalty on sales. The other conditions applying Shareware software apply.

h. Published Dictionaries

The contents of these files must not be published on paper or other media such as CD-ROM without the prior permission of the Group.

Prior Permission

All permissions for use of the files granted by James William Breen prior to March 2000 will be honoured and maintained, however the placing of the KANJD212 and EDICTH files under the GNU GPL has been withdrawn as of 25 March 2000.

Enquiries

All enquiries to:

The Electronic Dictionary Research and Development Group
(Attn: Assoc. Prof.
Jim Breen)
School of Computer Science and Software Engineering
Monash University
CLAYTON VIC 3168
Australia
(jwb@csse.monash.edu.au)

[NOTE: The portion of this license statement pertaining to KANJIDIC has been omitted, since it does not pertain to J_Inflect which makes no use of, nor distributes any portion of that file. At the time of this release, the full text may be found at: http://www.csse.monash.edu.au/groups/edrdg/newlic.html]