J_Inflect
Development Release 0.3.6 - July 21, 2000
Japanese Verb and Adjective Inflection/Analysis Engine

by Collin McCulley (cmmcculley@charter.net)

--------------------------------------------------------------------------------

Special thanks to everyone on sci.lang.japan who has
contributed to my understanding of Japanese and
in particular of verbs for the purpose of this project,
including, but not limited to Gerald (Bart) Mathias and
Sho Yamazaki.

Particular thanks to Jim Breen, for creating and providing
the EDICT file, which forms the basis of the data distributed
with this software.

Also thanks to the following people who have contributed
suggestions, corrections, or bug reports:
   Nicolas Simonet

--------------------------------------------------------------------------------


This code is a work in progress and has been made available
for comment by potential users.  Please send me questions,
suggestions, bugs, content corrections, etc.  In particular,
I have no Japanese language support on my Macintosh so
feedback on problems with the EUC output would be helpful.

The current working version will be kept available at:
   http://www.epochrypha.com/japanese/


WHAT IT IS & WHAT IT SHALL BE
-----------------------------
J_Inflect runs as a command-line program from a terminal
emulator for either Windows or Mac.  (The terminal emulator
is built into the program, not something separate that the
user needs.)  [See the Special Note for Windows Users below]
   It is my intent to shore up the inner workings and data
structures, and then build a GUI face to this program (though
I will happily leave it in command-line form for unix-types).
A web interface at a later date is also a possibility.  The
software and its command-line interface are written in 100%
ANSI C++ (with the exception of a small amount of code
controlling the SIOUX console environment).
   I will decide at a later time if it becomes slick enough
to warrant a small shareware fee, but that's long term, and
will only apply to GUI versions.
   Anything I release prior to that is freeware.


SPECIAL NOTE FOR WINDOWS USERS
------------------------------
The PC/Win version of this program has been recast as a
console (DOS-Window) application in order to make this
release available while problems are worked out with the
Metrowerks WinSioux environment.  It is suggested that
under the "Layout" tab in the Console control panel you
set both width values to 85.  Set the window height value
to make the window as tall as possible while still fitting
on your screen.  Set as large a buffer height as is
comfortable (500 lines or more) to enable the console
scroll bar and enable you to see results that scroll off
the screen.  That being done, you should be able to
double-click the application (.exe) file as usual to
start a console window with J_Inflect.


WHAT IT CAN DO
--------------
Currently J_Inflect can:

*  Forward conjugate Japanese verbs and adjectives into
   many different inflectional forms.  

*  Reverse conjugate an inflected form, matching it
   against possible candidates from a dictionary file.

Inflectional forms are defined in rules files which the
user can edit to add new inflection cases, or change current
cases to suit preferences.  Results can be output with
inflected kanji and kana to an EUC text file.
   For words listed in the dictionary files with certain
additional information fields, fully realized English
translations can be provided.  For instance, "tabemasita"
would be listed with the translation "X ate".   The copula
can be conjugated by specifying "da" or "desu" to the
verb conjugator.
   J_Inflect, for the most part, will automatically
recognize the type of verb given to it (as long as it's
in dictionary form) and choose the correct conjugation
for it.  This includes godan verbs ending in -iru or -eru;
however, if any have been missed, the user can force the
conjugation to follow the desired type.  The user can also
add the omitted verb to Rules_VClass.dat, or add an entry
to the Words_Verbs.dat dictionary file.
   The two standard irregular verbs kuru and suru are
supported, as well as irregularities in verbs such as iku,
gozaru, kureru, aru, etc.  
   Verbs that conjugate with suru are supported, although
for Sino-Japanese single-kanji verbs that conjugate with
suru but have godan-like conjugations in certain tenses,
irregularity rules may not be complete or correct.  In the
current version the ruleset contains exceptions for the
potential and for all -nai forms.  The user may edit the
file Rules_VClass.dat to add verbs for which the anomalous
conjugation should apply, or add appropriately typed entries
to the Words_Verbs.dat dictionary file.   


PLANNED FUNCTIONALITY
---------------------

* Verb and adjective dictionary lookup: J -> E and E -> J.
* Verb analysis and dynamic rule explanation (i.e. "Tell me
  how you got that answer.")
* Learning drills and exercises.

   Also, as stated above, this software has a command-line-style
interface, running in a console emulator on Windows or Mac,
or directly compiled for the command line on a Unix/Linux
machine. While the current version supports writing kanji
and kana inflected forms to an EUC coded file, the GUI
version will have support for on-screen kanji and kana
display.


DISCLAIMER
----------
   This software is provided AS IS.  NO WARRANTIES are
granted, either expressed or implied, and NO GUARANTEES
are made for merchantability or fitness for a particular
purpose.  The author shall not be liable for any special,
incidental, consequential or punitive damages arising out of
the installation, performance or use of this software.
   The author makes no claim of expertise in the
Japanese language, and has written this as a learning
exercise and out of a desire to have a learning tool.
Follow the information produced by this tool at your
own risk.


COPYRIGHT
---------
   This software and its accompanying data files, except for
Words_Verbs.dat and Words_Adj.dat, are COPYRIGHT (c) 2000 by
Collin McCulley.
   This current version is provided free of charge and may
be freely distributed by private individuals by any means so
long as no money is charged, but must remain in its original
form and must be accompanied by this document.  It may not be
included in freeware/shareware archives of any type or be
distributed by commercial enterprises without my written
permission.
   The Words_Verbs.dat and Words_Adj.dat files use information
extracted from EDICT, and as such their content is COPYRIGHT
(c) 2000 by The Electronic Dictionary Research and Development
Group, Monash University, Clayton, Victoria, Australia. This
information is used under the terms and conditions specified in
the Electronic Dictionary Research and Development Group General
Dictionary Licence Statement, the current copy of which may be
available at: http://www.csse.monash.edu.au/groups/edrdg/newlic.html.
The version of this license statement in force at the time of
this release is the appendix of this document.

--------------------------------------------------------------------------------


BASICS OF THE COMMAND LINE INTERFACE
------------------------------------
   The command line interface is provided by the author's
CLAM_Shell library.  Following is a general description of
how command syntax works, but the reader may want to skip
down and just read over the J_Inflect commands and
examples.


The CLAM_Shell command-line interface uses the following
syntax:

command_keyword [switches] [required_arguments] [optional_arguments]

   Command keywords are single words with no spaces and are
case sensitive.
   Switches are used to specify or modify the behavior
of the command or provide keyword input.  All switches are
optional and so may be left out of any command. Switches are
preceded by a switch symbol, usually a dash (-), and are
specified by a keyword, usually, but not necessarily, a single
letter.  For example:

      -a

Defined switches may be included in a command in any order.

Single-letter switches which do not require arguments may
be appended to a single switch symbol.  For example, if "a",
"b", and "c" are individual switches which do not require
arguments, the following is valid, and tells the parser to
apply all three switches to the command:

      -abc

A switch may require an argument, or one may be optional.
A switch which requires an argument must appear alone after
the switch symbol and its argument must directly follow. A
switch which can optionally take an argument can appear with
other switches only when an argument is not supplied.  A
switch which can take an optional argument appearing alone
will assume that the next non-switch item in the command
string, if any, belongs to itself.  For example, if "a"
requires an argument, "b" can take an optional argument,
and "c" cannot have an argument:

   -a <arg> -bc               (OK)
   -ab                        (Wrong: -a must appear alone.)
   -a -c                      (Wrong: -a requires an argument.)
   -b <arg> -c                (OK)
   -a <arg1> -b <arg2>        (OK, but note <arg2> belongs to -b,
                               and is not a command argument.)
   -c <arg>                   (OK, but <arg> is the first
                               required/optional command argument
                               and doesn't belong to -c.)
   -c <arg> -b                (Probably wrong: -b will parse as
                               a command argument, not a switch
                               because all switches must come first.)

A command argument is identified by its position in the command string,
and must come after all switches.  Any item in the command string
after a non-switch (i.e. anything not preceded by a switch symbol)
has appeared is assumed to be an argument.

Any required arguments must come first, in order, followed by
any optional arguments.  To use a given optional argument, any
optional arguments prior to it must also be supplied.

Switch arguments and command arguments may contain
no spaces unless surrounded by quotes.  Without quotes, white space
delimits arguments. Anything in quotes is assumed to comprise a
single argument.
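
   The parsing rules above can be sketched in a few lines of
Python.  This is only an illustration of the rules as described
in this document, not the CLAM_Shell source: the function name
parse, the SPEC table, and the error messages are all
hypothetical.  The switch table mirrors the "a", "b", "c"
example used above.

```python
import shlex

# Hypothetical switch table for illustration: "a" requires an argument,
# "b" takes an optional one, "c" takes none (as in the examples above).
SPEC = {"a": "required", "b": "optional", "c": "none"}


def parse(line, spec=SPEC, symbol="-"):
    """Split a command string into (keyword, switches, arguments)."""
    tokens = shlex.split(line)  # quoted text stays together as one item
    if not tokens:
        raise ValueError("empty command")
    keyword, rest = tokens[0], tokens[1:]
    switches = {}
    i = 0
    # Switches are read only until the first unclaimed non-switch item.
    while i < len(rest) and len(rest[i]) > 1 and rest[i].startswith(symbol):
        body = rest[i][len(symbol):]
        i += 1
        # A defined keyword is taken whole; otherwise treat the item as
        # a run of single-letter switches sharing one switch symbol.
        names = [body] if body in spec else list(body)
        for name in names:
            if name not in spec:
                raise ValueError("unknown switch: " + name)
        if len(names) > 1:
            if any(spec[name] == "required" for name in names):
                raise ValueError("a switch requiring an argument must appear alone")
            for name in names:
                switches[name] = None
            continue
        name = names[0]
        has_next = i < len(rest) and not rest[i].startswith(symbol)
        if spec[name] == "required" and not has_next:
            raise ValueError("switch " + name + " requires an argument")
        if spec[name] != "none" and has_next:
            switches[name] = rest[i]  # the next item belongs to this switch
            i += 1
        else:
            switches[name] = None
    return keyword, switches, rest[i:]
```

With this sketch, parse("cmd -c x -b") yields the switch "c"
and the two command arguments "x" and "-b", matching the
"Probably wrong" case in the table above.
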
--------------------------------------------------------------------------------

J_INFLECT COMMANDS
------------------

conj   Conjugates a supplied verb or adjective.

Syntax:     conj [[-v <verb>]|[-a <adj>]] [-s <sense>] [-p <politeness>]
                 [-i|-g|-rs] [-f <filename>]
                 [[-m <numlist>]|[-t <inflection-type>]]

Switches:
   -v  Conjugate a verb.  Requires an argument that is the
       dictionary form (i.e. plain indicative, or -u form).
       Must be in lowercase Nihon (waapuro-style) roomaji.

   -a  Conjugate a true (-i) adjective.  Adjective must be
       in the (-i) form.
       Must be in lowercase Nihon (waapuro-style) roomaji.
 
   Note: Either -v or -a may be specified, but not both.  If
         neither is specified, the command will produce output
         for the current word. A current word is the last word
         conjugated, or if a single result is produced from a
         conjugation, the last result.  The system indicates when a
         current word is available.
            The current word may be sent back for further conjugation
         if applicable (i.e. if the previous word or the result of the
         last operation was a verb or true adjective), though not all
         conjugations may be available.  For instance, the program
         tries to stop nonsense like "taberarerareraremasu" (repeated
         conjugation using the potential inflection), but this may not
         be bug-free or complete.  In the case of sending a result
         back in for a nonapplicable conjugation, the system will
         tell you there is no rule match.
   
   -s  Limit to a given sense.  The sense is "affirmative" or
       "negative" (e.g. "I did" vs. "I didn't", respectively).
       Requires an argument that is any leading portion of the word
       "affirmative" or "negative" (e.g. -s neg or -s n will work).
   
   -p  Limit to a given politeness level.  Politeness levels are
       defined as "abrupt", "plain", or "polite". This switch requires
       an argument which is any unambiguous leading portion of one of
       these words (e.g. -p pl or -p pol will work, but -p p will not).
   
   Note: Not all inflections have cases for all senses or
         politeness levels, particularly: "abrupt" applies only
         to the imperative.  Furthermore, results which do not
         have an inherent sense or politeness level (such as
         inflection cases which produce another verb) are screened
         out when applying these switches.
            Leaving out the switches -s or -p will show all
         available senses or politeness levels, respectively.
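
   The abbreviation behavior of -s and -p can be pictured as
   a unique-prefix match.  The following Python sketch is an
   assumption about how such matching might work, based only
   on the examples given above; match_prefix and the two
   tuples are hypothetical names.

```python
SENSES = ("affirmative", "negative")
POLITENESS = ("abrupt", "plain", "polite")


def match_prefix(text, choices):
    """Return the one choice beginning with text, else raise ValueError."""
    hits = [choice for choice in choices if text and choice.startswith(text)]
    if len(hits) != 1:
        raise ValueError("ambiguous or unknown value: " + repr(text))
    return hits[0]
```

   Here match_prefix("pl", POLITENESS) gives "plain", while
   match_prefix("p", POLITENESS) is rejected because it fits
   both "plain" and "polite".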
   
   -f  Output the result of the conjugation to an EUC-encoded text
       file.  This output will include kanji and kana inflections
       in addition to what is displayed on the screen.  Supplying
       the filename is an optional argument to this switch. If no
       filename is specified, the filename will be the same as the
       roomaji, suffixed by ".euc".
          Output to this file is formatted for 80-column
       monospaced text, so depending on the viewing environment
       the user will probably want to reformat to a 9-point
       Courier font for proper viewing.
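
          If your viewer cannot display EUC text, the output
       file can be re-encoded.  The Python sketch below is
       not part of J_Inflect; it assumes the file is EUC-JP
       (Python's "euc_jp" codec), and the function names and
       the ".utf8.txt" suffix are made up for illustration.

```python
from pathlib import Path


def default_output_name(roomaji):
    """File name used when -f is given no argument, per the text above."""
    return roomaji + ".euc"


def euc_to_utf8(src, dst=None):
    """Re-encode an EUC-JP text file as UTF-8 and return the new path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".utf8.txt")
    text = src.read_bytes().decode("euc_jp")
    # Writing bytes keeps the original line endings untouched.
    dst.write_bytes(text.encode("utf-8"))
    return dst
```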

   -i  Force regular itidan conjugation.
   
   -g  Force regular godan conjugation.

   -rs Force "regular" suru conjugation.  The program recognizes
       that certain Sino-Japanese verbs using the auxiliary -suru
       have several irregularities in their conjugation, i.e. in
       certain inflections they conjugate as a root+su godan.  For
       roomaji spellings which may be one of these verbs, the
       program may also trap homophonic verbs that do not have
       these characteristics, unless they are explicitly specified
       in the dictionary file.
          This switch allows the user to force the conjugation
       back to the normal suru pattern.  It can only be used with
       verbs that end in suru in the first place.

   Note: Switches -i, -g and -rs are mutually exclusive and cannot
   be used together.  Additionally, they can only be used when switch
   -v is used, or a conjugable verb is the current word.
      As for the -i and -g switches, the inflection engine
   automatically detects verb type, including irregular verbs, and
   verbs with occasional irregularities, and should typically not be
   overridden except in cases where a verb in dictionary form could
   work with either conjugation (e.g. "kiru"). One of these switches
   _must_ be specified in such a case, unless a match is found and
   chosen from the dictionary file.
      The -i and -g switches may also be useful if the user has found
   an -iru/-eru godan that has not already been specified in the system,
   or the system does not recognize that such a verb could also be
   itidan. In either case, please report this to the author so it may
   be corrected.  You can also self-correct this condition by
   editing the Rules_VClass.dat file, or explicitly adding cases to
   the Words_Verbs.dat file.
      Specifying one of the forcing switches will bypass dictionary
   lookup (also disabling kanji/kana output for the result).
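
      The detection and forcing behavior described in this note
   can be pictured roughly as follows.  This Python sketch is a
   simplification under stated assumptions, not the program's
   logic: the two word sets are small illustrative samples (the
   real lists live in Rules_VClass.dat), guess_class is a
   hypothetical name, and verbs with occasional irregularities
   such as iku are not modeled.

```python
# Illustrative samples only; not the contents of Rules_VClass.dat.
GODAN_IRU_ERU = {"hairu", "hasiru", "siru"}  # godan despite -iru/-eru
AMBIGUOUS = {"kiru", "kaeru"}                # both classes exist


def guess_class(verb, force=None):
    """Guess the conjugation class of a dictionary-form verb."""
    if force in ("itidan", "godan"):   # like the -i and -g switches
        return force
    if verb in ("suru", "kuru"):
        return "irregular"
    if verb.endswith("suru"):
        return "suru"
    if verb in AMBIGUOUS:
        raise ValueError(verb + " is ambiguous; force itidan or godan")
    if verb.endswith(("iru", "eru")) and verb not in GODAN_IRU_ERU:
        return "itidan"
    if verb.endswith("u"):
        return "godan"
    raise ValueError(verb + " is not in dictionary form")
```

   In this sketch guess_class("kiru") raises an error, while
   guess_class("kiru", force="godan") returns "godan".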

   -m  Present a menu of available inflection cases and allow the user
       to choose which is used.  Takes an optional argument that is a
       list of choices from the menu, delimited by vertical bars (|).
       If this argument is specified, the menu will be skipped and
       the named choices will automatically be used.  If no argument
       is specified, the menu of cases is presented.  The user may
       enter one or more choices from the menu (delimited by whitespace).
         In either case, menu choices may be specified in any order, and
       results from inflection cases will be output in that order.
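
          As a small illustration, the -m argument can be read
       as an ordered list of menu numbers.  The Python helper
       below is hypothetical and only shows that the order
       given is the order kept.

```python
def parse_numlist(arg):
    """Turn "2|13|19" into [2, 13, 19], keeping the given order."""
    return [int(item) for item in arg.split("|")]
```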

   -t  Directly specify the inflection case to be used.  The argument
       must be the exact, case-sensitive name of the inflection case
       as it exists in the rules file: first letters capitalized.
       Put multi-word names in quotes (e.g. "Past Indicative").

   Note: Switches -m and -t are mutually exclusive. Only one should be
         used for a given command.


Usage Examples:

   conj -v taberu
      Conjugates taberu, using results from all applicable
      rules.

   conj -a oisii -s aff
      Conjugates adjective oisii, using all rules producing
      affirmative senses.

   conj -v kiku -s neg -p pol
      Conjugates kiku, using all rules producing negative senses
      at the normal polite level of speech.

   conj -v hanasu -p pol -t Indicative
      Conjugates hanasu into the indicative, showing only
      normal polite results.

   conj -a uresii -t "Past Indicative"
      Conjugates adjective uresii into the past indicative.

   conj -v iku -m
      Conjugates iku into the conjugation to be selected from
      the menu.

   conj -v iku -m 2|13|19
      Conjugates iku into the conjugations corresponding to
      inflection cases 2, 13 and 19 from the menu (which is
      not presented).

   conj -v kiru -i
      Conjugates kiru as an itidan verb, using results from
      all applicable rules.

   conj -v kiru -gmf
      Conjugates kiru as a godan verb into the conjugation to be
      selected from the menu, and outputs the result of the
      conjugation to a file encoded as EUC in addition to
      displaying it on the screen.

   conj -v kiru -s af -p pl -gm
      Conjugates kiru as a godan verb into the conjugation to be
      selected from the menu, only showing results that are
      affirmative and in the plain mode of speech.

   conj -v taberu -t Potential
   conj -p pol -t Indicative
      First command conjugates taberu to the potential, producing
      taberareru, which becomes the current word. The second command
      conjugates taberareru to the polite indicative producing
      taberaremasu and taberaremasen.

----------------------------------------

rconj
   Reverse conjugates a supplied inflected form back
   to the dictionary form, provided that it exists in
   one of the dictionary files.

Syntax:
   rconj [-a] [-b] [<inflected_form>]

Operation:
   By default this command searches on verbs only.
   Note that the command can take as long as a minute
   depending on the verb it is given.  The author is
   working on efficiency issues and making this routine
   more intelligent.  Note that it will find _every_ word
   in the searched dictionary for which one of the
   available rules will produce the supplied inflection.

Switches:
   -a  Search for a match to an adjective form.

   -b  Search both dictionaries for a match.

Optional Argument:

   inflected_form
      The inflected word for which the user wants to find
      the dictionary form.  If none is specified, the current
      word, if available, is used.  The supplied inflected
      form does not need to match the spacing characteristics
      of the inflected form that would be reported by
      J_Inflect, e.g. "ohanasinasaru" will find "hanasu",
      even though a conjugation of "hanasu" would have
      produced "ohanasi nasaru".
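
   The search described above (try every rule on every word,
   and compare with spaces ignored) can be sketched in Python.
   The three rules and three words below are toy stand-ins
   chosen for illustration; the real rules and dictionary
   files have their own formats, and rconj here is only a
   model of the command, not its implementation.

```python
def stem(verb, verb_class):
    """Continuative stem: taberu -> tabe, hanasu -> hanasi."""
    if verb_class == "itidan":
        return verb[:-2]
    return verb[:-1] + "i"  # godan, in Nihon roomaji


# Toy rule set: inflection case name -> function building the form.
RULES = {
    "Affirmative Polite Indicative": lambda v, c: stem(v, c) + "masu",
    "Negative Polite Indicative": lambda v, c: stem(v, c) + "masen",
    "Honorific": lambda v, c: "o" + stem(v, c) + " nasaru",
}

# Toy dictionary: dictionary form -> verb class.
WORDS = {"taberu": "itidan", "hanasu": "godan", "kiku": "godan"}


def rconj(inflected, words=WORDS, rules=RULES):
    """Return every (word, case) whose rule output matches the input."""
    target = inflected.replace(" ", "")  # spacing is not significant
    hits = []
    for verb, verb_class in words.items():
        for case, rule in rules.items():
            if rule(verb, verb_class).replace(" ", "") == target:
                hits.append((verb, case))
    return hits
```

   Because every rule is tried on every word, the cost grows
   with the size of both files, which is consistent with the
   long running times mentioned above.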


Usage Examples:

   rconj taberarenai
      (finds taberu)

   rconj "ai simasu"
      (finds aisuru)

   rconj -a oisikunai
      (finds oisii)

   rconj -b tabeyasui
      (finds taberu from the verbs dictionary,
       probably produces no additional hit from the
       adjective dictionary)
     

----------------------------------------

cw
   View or set the current word.

Syntax:
     cw [-h|-r]

Operation:
   If no switch is specified this command simply displays the
   current word.

Switches:
   -h  Set the current word to one of the last 10 words
       that were conjugated.

   -r  Set the current word to one of the words returned as
       the result of the last conjugation.
 
   Note: Either -h or -r may be specified, but not both.


----------------------------------------

dtest
   Execute a comprehensive test of one or both of the
   dictionary files.

Syntax:
     dtest [-v] [-a]

Operation:
   This test can take up to 20 minutes to run for the verbs
   file.  The test involves conjugating every word using
   every applicable rule.  The word's roomaji form, along
   with the result of the last rule executed is displayed.
   Note that failure at some point in the test is likely to
   manifest itself as a program crash (via an unhandled
   exception).  Such failure can result from improper editing
   of dictionary OR rules files as well as bugs remaining in
   the program.  If this test fails for the version of the
   dictionary and rules files AS PROVIDED WITH THE SOFTWARE
   please report it, along with the last word successfully
   conjugated.  Otherwise, reinstall the original data files
   and try the test again to determine if the error was caused
   by user editing.

Switches:
   -v  Test the verbs file Words_Verbs.dat.

   -a  Test the adjectives file Words_Adj.dat.

--------------------------------------------------------------------------------

ROOMAJI NOTE
------------
   The roomaji used in the program is a modified Nihon roomaji.
Nihon is more amenable than the Hepburn or Kunrei systems
because there is a (mostly) one-to-one correspondence between the
roomaji representation and the kana characters, and because
beginning letters do not change across rows.  A future version
may support other systems through internal translation.
   One modification is that the roomaji corresponds directly to the kana.
For instance, output of lengthened vowels looks like "ou" not "oo":
"tabemasyou" not "tabemasyoo" (UNLESS a word has "oo" in the kana,
as in the adjective "ookii").
   This is sometimes called "waapuro roomaji". All input and any
words in the rule or dictionary files should follow this modified
Nihon system.  To make syllables distinct, some other differences
may occur in katakana-only syllables.
   The following is the table of supported roomaji representations
for kana characters.  If there are omissions of kana that may
occur in modern words, please inform the author.
   Note that in parsing from the rules files any roomaji
equivalent not on the following table is likely to crash the
program.  The user interface traps such mistakes, however, so don't
worry about mistyping in the program itself.
   Note also that these are case sensitive, for instance wo and
WO refer to two different kana representations.  Refer to the
file "kana_system.euc" for a pairing of kana and roomaji.

   a       i       u       e       o
   ka      ki      ku      ke      ko
   ga      gi      gu      ge      go
   sa      si      su      se      so
   za      zi      zu      ze      zo
   ta      ti      tu      te      to
   da      di      du      de      do
   na      ni      nu      ne      no
   ha      hi      hu      he      ho
   ba      bi      bu      be      bo
   pa      pi      pu      pe      po
   ma      mi      mu      me      mo
   ya              yu              yo
   ra      ri      ru      re      ro
   wa      wi              we      wo
   n
   kya             kyu             kyo
   gya             gyu             gyo
   sya             syu             syo
   zya             zyu             zyo
   tya             tyu             tyo
   nya             nyu             nyo
   hya             hyu             hyo
   bya             byu             byo
   pya             pyu             pyo
   mya             myu             myo
   rya             ryu             ryo

katakana only:

                           ye
           WI              WE      WO
   kwa     kwi             kwe     kwo
   gwa     gwi             gwe     gwo
                           sye
                           zye
                           tye             
           tji                             tju
                   twu
           dji                             dju
                   dwu
   dya             dyu             dyo
   tsa     tsi             tse     tso
   fa      fi              fe      fo
   va      vi      vu      ve      vo      vyu

katakana vowel elongation bar is a hyphen: -
katakana word separator dot is a single quote: '
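
   To show how input can be checked against the table, here is
a Python sketch that splits roomaji into kana-sized pieces by
longest match.  It is an illustration only: it covers just the
main (non-katakana-only) table, the names SYLLABLES and tokenize
are hypothetical, and anything not in that table (including
doubled consonants, which the table does not list) is rejected.

```python
VOWELS = "aiueo"
ROWS = ["", "k", "g", "s", "z", "t", "d", "n", "h", "b", "p", "m", "r"]

# Main table: plain rows, then ya/wa rows and n, then the -y- combinations.
SYLLABLES = {c + v for c in ROWS for v in VOWELS}
SYLLABLES |= {"ya", "yu", "yo", "wa", "wi", "we", "wo", "n"}
SYLLABLES |= {c + "y" + v for c in "kgsztnhbpmr" for v in "auo"}


def tokenize(roomaji):
    """Split roomaji into table entries, longest match first."""
    pieces = []
    i = 0
    while i < len(roomaji):
        for size in (3, 2, 1):
            piece = roomaji[i:i + size]
            if len(piece) == size and piece in SYLLABLES:
                pieces.append(piece)
                i += size
                break
        else:
            raise ValueError("no kana for " + repr(roomaji[i:]))
    return pieces
```

   For example, "tabemasyou" splits into ta, be, ma, syo, u,
while Hepburn-style "shi" is rejected because only "si" is in
the table.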


AVAILABLE INFLECTION CASES
--------------------------
   The inflection cases currently supported by the
tool are named as follows. Remember that these are
case-sensitive with respect to the -t switch of the
conj command.

Indicative*
Imperative
Presumptive I
Presumptive II
Provisional
Gerund
Past Indicative*
Past Presumptive I
Past Presumptive II
Conditional
Alternative
Passive
Potential
Causative
Causative Passive
Honorific
Humble
Progressive
Easy to ~

Cases marked by * are also currently supported for adjectives.

Inflection cases can be added or altered by careful editing
of the rules files.  Documentation will be made available
at a later date; however, there are some notes at the
bottom of these files the user may find instructive.


NOTE:   The program may produce some inflections even when
        they make no sense.  For some regular verbs, some of
        the inflections may not ever be used, even though the
        rule can be applied to form them.  A blocking rule
        can be added to the respective rule file to prevent
        a given verb or adjective from conjugating into a
        specific tense, even if it is otherwise regular. A
        blocking rule overrides any other rule that is applicable
        to a given word for a given inflection.


CONJUGATION RESULTS
-------------------
Results are displayed in 80-column lines and each separate rule
that is applied produces output showing the conjugated form,
the description of the inflection case, the part of speech of the
inflected result, and the English pseudo-translation.  For the
EUC output option, kanji and kana inflections are also included.

Format:
--------------------------------------------------------------------------------
(kanji inflection)   (kana inflection)
(roomaji inflection)     (inflection description)               (Part of Speech) 
(English translation)
--------------------------------------------------------------------------------

Example:
--------------------------------------------------------------------------------
(kanji)              (kana)
tabemasen                Negative Polite Indicative             (Verb Inflec.) 
X does not [do], X will not [do]
--------------------------------------------------------------------------------

The part of speech displayed for the result is one of:

(Itidan)                 an itidan verb
(Godan)                  a godan verb
(Suru Aux)               a verb that conjugates using the auxiliary "suru"
(Irreg.)                 an irregular verb (or verb composed with an
                         irregular auxiliary like "suru")
(Irreg. Itidan)          a mostly regular itidan verb with some irregularities
(Irreg. Godan)           a mostly regular godan verb with some irregularities
(Irreg. Suru)            a suru verb which has godan-like conjugations in
                         some tenses (like "aisuru")
(True Adj.)              an -i (conjugable) adjective
 
  Any of the above results may be reprocessed to give
further conjugations.  The remaining three are terminal results:

(Verb Inflec.)           an inflected form (e.g. tabemasu).
                         Note that a result such as "taberu" for the plain
                         indicative is considered an inflection and is not
                         capable of being reprocessed.
(Quasi Adj.)             an adjectival noun, or "-na adjective".
(Noun)                   a noun


NOTE:      I have marked negative verb forms (such as "nomanai",
           "nomanakatta", etc.) as "verb inflections" even though
           I realize they are adjectival forms.
              This is done because their conjugations are
           (supposed to be) fully specified within the system, so they
           can be thought of as the negative inflection of the verb
           for a particular tense, rather than inflections of nai.
           It is also less confusing not to have to go through the
           negative indicative to get at the negative conditional, etc.
              However, the system is customizable by careful editing of
           the rules file, so that the negative indicative could be
           reported back and subsequently reprocessed as an adjective,
           if that is the way the user wants to do it.  Or the user
           may manually send the -nai form into the adjective conjugator.
           In either case, however,  the program will also produce
           negative forms from the negative, which it ain't got no
           business doing.


ATTITUDE AND COLLOQUIALITY MARKERS
----------------------------------
   In addition to sense, politeness level and part of speech,
words in J_Inflect can be marked by attitude and colloquiality.
Attitude refers to words which are honorific, humble or neither
(neutral).  Colloquiality refers to words which may fall outside of
formally correct language but are nevertheless in use.  Both of
these markers are "sticky" in the sense that once a word is
inflected to a form that produces a non-neutral attitude or a
colloquial result, words conjugated from those results retain
those qualities.  For example, if a verb is conjugated to the 
humble inflection and one of the results is subsequently conjugated
to the affirmative polite indicative, that result will be marked as
a humble affirmative polite indicative.
   There is no filtering switch for attitude as for politeness
level, because there is only one inflectional form that produces
each attitude.
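
   The "sticky" behavior can be modeled as markers that are
inherited from the word being conjugated.  The Python sketch
below is a guess at the idea, not the program's data structure;
Word and inflect are hypothetical names, and the choice that a
newly produced non-neutral attitude replaces an inherited one is
an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    roomaji: str
    attitude: str = "neutral"   # "honorific", "humble" or "neutral"
    colloquial: bool = False


def inflect(word, new_roomaji, attitude="neutral", colloquial=False):
    """Build a result that keeps the source word's sticky markers."""
    return Word(
        new_roomaji,
        attitude=attitude if attitude != "neutral" else word.attitude,
        colloquial=colloquial or word.colloquial,
    )
```

   Conjugating "hanasu" to a humble form and then taking the
polite indicative of that result leaves the final word still
marked humble.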


ENGLISH TRANSLATIONS
--------------------
   For all words that can be found in one of the two dictionary
files, dynamic English translations can be provided for each
of the inflected forms.  Provided translations are a loose
guideline, and may not be appropriate to all situations.
Furthermore, the author is not an expert and not even remotely
experienced with some of these tenses, so some of the translations
may have room for improvement.  If the user can suggest better
alternatives, please e-mail them to the address at the top of
this document.
   Translations will particularly not be accurate when a verb is
reprocessed, e.g. in converting taberu to taberareru, then to
taberaremasu, the English translation will still reflect the basic
indicatives, (X eats) not potentialized indicatives (X can eat).
The user must bear this in mind.  A later version will give some
indication that the translation should be read based on
a previous inflection.
   English translations for verbs have been made using X as
a generic subject placeholder.  When conjugating verbs for
which there is no dictionary entry, some form of "to do" in
brackets appears in the translations. For a given verb, replace
the bracketed expression with the equivalent form of the verb
you are conjugating to get the English meaning.
   For example, oyogu (to swim) with the present indicative
translation of "X [does], X will [do]" for oyogimasu gives
"X swims, X will swim", of course modified appropriately for
agreement with whatever the English subject is (e.g. "I swim").
   For adjectives, the translation is some form of "to be" with
[X] representing where the English adjective should be
substituted (and is dynamically substituted for words appearing
in the adjective dictionary file).
   Note that dynamic translations can only be provided if entries
in the dictionaries carry several additional fields, which have not
been provided in the current version.  For verbs, this includes
the infinitive (without "to"), 3rd person present (-s form), present
participle (-ing form), plain past, and past participle (-ed form)
of the primary English translation of the verb. These are directly
substituted for [do], [does], [doing], [did], and [done] in the
translation template, respectively. For adjectives, a single
additional field in the dictionary specifies the primary English
translation to be substituted for [X] in the rules file.
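
   The substitution just described can be sketched in a few lines
of Python.  This is a sketch under assumptions, not the program's
own code: fill_template() and PLACEHOLDERS are hypothetical names,
and only the bracketed markers and their order come from the text
above.

```python
import re

# Order follows the text: infinitive, 3rd person present, present
# participle, plain past, past participle.
PLACEHOLDERS = ("do", "does", "doing", "did", "done")


def fill_template(template, forms=None):
    """Replace [do], [does], ... with the matching English verb form.

    `forms` is a 5-tuple in PLACEHOLDERS order, or None when the verb
    has no dictionary entry, in which case the brackets are left alone.
    """
    if forms is None:
        return template
    table = dict(zip(PLACEHOLDERS, forms))
    # Unknown bracketed words (such as [X]) are passed through unchanged.
    return re.sub(r"\[(\w+)\]",
                  lambda m: table.get(m.group(1), m.group(0)),
                  template)


swim = ("swim", "swims", "swimming", "swam", "swum")
print(fill_template("X [does], X will [do]", swim))  # X swims, X will swim
print(fill_template("X [does], X will [do]"))        # brackets kept
```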
   The dictionary files are tab-delimited (NOT
whitespace-delimited) files.  To include the additional fields,
use an editor that won't break the lines or mangle the EUC codes.
Separate the new fields from each other with tabs, and put one tab
between the end of the existing line and the first new field.  In
the case of verbs, if you include the fields you must include all
of them, or else the program will likely crash trying to read the
file.  Do not place a tab at the end of the last field.
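
   For anyone adding the verb fields with a script rather than by
hand, the rules above might be applied as in the following Python
sketch.  The function name append_verb_fields() is hypothetical,
and the sample line uses made-up placeholder fields, not the real
layout of Words_Verbs.dat.  The line is handled as raw bytes so the
EUC codes are never decoded or altered.

```python
def append_verb_fields(line, forms):
    """Append the five English verb forms to one dictionary line.

    `line` is the raw bytes of an existing line; `forms` holds the
    infinitive, -s form, -ing form, plain past and past participle.
    """
    if len(forms) != 5:
        raise ValueError("all five verb fields are required")
    if any(f == "" or "\t" in f for f in forms):
        raise ValueError("fields must be non-empty and contain no tabs")
    # Split off the line ending so the new fields land before it.
    body = line.rstrip(b"\r\n")
    ending = line[len(body):]
    # One tab before the first new field, tabs between fields,
    # and no tab after the last one.
    extra = "\t".join(forms).encode("ascii")
    return body + b"\t" + extra + ending


sample = b"field1\tfield2\tfield3\n"   # placeholder, not a real entry
swim = ("swim", "swims", "swimming", "swam", "swum")
print(append_verb_fields(sample, swim))
```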
   The author will work on supplying files that have this
information included, but this is a lot of manual work and will
take time.


DATA FILES
----------
   All data files may be edited by the user to allow the system
to produce desired results, add new inflection cases, and
correct oversights in the data preparation.
   Full format information and examples will be provided at a
later time, although some of the files contain some information
at the bottom, below the lines of data.
   Be aware that improper editing can result in program
lockups, and possibly crashes.  Note that all formats use
numbers to indicate how many records are available to read.  If
you add or delete records, be sure to change the corresponding
number.  Also, while the user interface will trap errors in
roomaji input, the modified Nihon system AS PRESENTED ABOVE
_must_ be used in data files or a lockup or crash could result.
   All of the files, with the exception of Rules_VClass.dat,
contain EUC encoded text, so the user should be careful to only
open these files with an editor that won't mangle it.  If
changes are to be made, such Microsoft tools as NotePad,
WordPad and MS Word are NOT good choices, since Microsoft has
unilaterally decided that everything should be converted to
Unicode.
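
   One way to check that an edit has not mangled a file is to test
whether its bytes still decode as EUC.  The Python sketch below
assumes that "EUC" here means EUC-JP (Python's "euc_jp" codec); the
function names are hypothetical.  Because every byte of a multi-byte
EUC-JP character has its high bit set, a plain ASCII substring can
be replaced at the byte level without decoding the Japanese text.

```python
def is_clean_euc(data):
    """True if `data` decodes as EUC-JP and re-encodes to the same bytes."""
    try:
        return data.decode("euc_jp").encode("euc_jp") == data
    except UnicodeError:
        return False


def replace_ascii(data, old, new):
    """Replace an ASCII substring in EUC-JP bytes without decoding them."""
    return data.replace(old.encode("ascii"), new.encode("ascii"))


# "nihongo" in kanji, written with escapes to keep this listing ASCII.
kanji = "\u65e5\u672c\u8a9e"
line = kanji.encode("euc_jp") + b"\tto swim\n"

edited = replace_ascii(line, "to swim", "to paddle")
print(is_clean_euc(edited))                  # True
print(is_clean_euc(kanji.encode("utf-8")))   # False: converted to Unicode
```

   A strict round-trip check like this may reject unusual but
valid files, so treat a failure as a prompt to look closer rather
than as proof of damage.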
   Note that entries for auxiliary-based words should give the
roomaji string without any space between the main word(s) and
the auxiliary.  For instance, always "benkyousuru" and never
"benkyou suru".  The program will automatically recognize either
one from the command line, but only if the former is what appears
in the files.
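
   The reason for this rule can be illustrated with a small Python
sketch.  It assumes, purely for illustration, that command-line
input is normalized by removing spaces before it is compared with
the file entries; normalize_aux() and the sample set are
hypothetical, not the program's actual routine or data.

```python
def normalize_aux(roomaji):
    """Join main word(s) and auxiliary: 'benkyou suru' -> 'benkyousuru'."""
    return "".join(roomaji.split())


# Hypothetical set of roomaji keys as they would appear in a data file.
file_entries = {"benkyousuru"}

for typed in ("benkyousuru", "benkyou suru"):
    print(typed, "->", normalize_aux(typed) in file_entries)
```

   Under this assumption both inputs match, whereas a file entry
written as "benkyou suru" could never equal the normalized input.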
   Following are the five data files (.dat extension) currently
used by J_Inflect.

Rules_Verbs     Specifies rules for conjugating from the dictionary
                form to many different inflection cases.  Rules
                are classified based on inflection case, sense,
                politeness, irregularity characteristics, etc., allowing
                the inflection engine to select appropriate ones for
                the verb under consideration.

Rules_VClass    Classifies verbs by certain characteristics useful
                to the system.  Allows these verbs to be classified
                into the correct conjugation pattern.
                NOTE: The verb "suru", by itself, should never,
                under any circumstances, appear in this file.  Suru
                is fully treated in the internal code since its
                function, not unlike "to do" in English, is so
                integral to Japanese.

Rules_Adj       Specifies rules for conjugating -i adjectives
                to many different inflection cases.  Rules
                are classified based on inflection case, sense,
                politeness, etc., allowing the inflection engine to
                select appropriate ones for the adjective under
                consideration.

Words_Verbs     Optional dictionary file containing a listing of 
                verbs with kanji, kana reading, part of speech
                indicator (i.e. to select proper conjugation),
                inherent attitude and English meaning. It also can
                contain information needed for dynamic English
                translation.  This file is necessary for kanji/kana
                inflection file output and for reverse conjugation.
                   If the file is available, input words are screened 
                against the dictionary, and available matches are presented
                to the user for a selection. If there are no matches, or if
                the user chooses, the automatic classifier routine is
                used instead to select the conjugation (and any file output
                is kana only).
                   The Words_Verbs.dat file provided with the software
                was constructed using data from the EDICT file created
                by Jim Breen.  See the copyright statements at the top
                and bottom of this document.

Words_Adj       Optional dictionary file containing a listing of
                adjectives with kanji, kana reading, roomaji, and
                English meaning. It also can contain information needed
                for dynamic English translation.  This file is necessary
                for kanji/kana inflection file output and for reverse
                conjugation.
                   If the file is available, input words are screened
                against the dictionary, and available matches (if more
                than one) are presented to the user for a selection.            
                   The Words_Adj.dat file provided with the software
                was constructed using data from the EDICT file created
                by Jim Breen.  See the copyright statements at the top
                and bottom of this document.

--------------------------------------------------------------------------------

APPENDIX
--------

MONASH UNIVERSITY  SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING
ELECTRONIC DICTIONARY RESEARCH AND DEVELOPMENT GROUP
GENERAL DICTIONARY LICENCE STATEMENT

Copyright (C) 2000 The Electronic Dictionary Research and Development
Group, Monash University. 

Introduction 

In March 2000, James William Breen assigned ownership of the copyright
of the dictionary files assembled, coordinated and edited by him to
the The Electronic Dictionary Research and Development Group at Monash
University (hereafter "the Group"), on the understanding that the Group
will foster the development of the dictionary files, and will utilize
all monies received for commercial use of the files for the further
development of the files, and for research into computer lexicography
and electronic dictionaries. 

This document outlines the licence arrangement put in place by The
Group for usage of the files. It replaces all previous copyright and
licence statements applying to the files. 

Application 

This licence statement and copyright notice applies to the following
dictionary files, the associated documentation files, and any data
files which are derived from them. 

   JMDICT - Japanese-Multilingual Dictionary File 

   EDICT - Japanese-English Electronic DICTionary File 

   ENAMDICT - Japanese Names File 

   COMPDIC - Japanese-English Computing and Telecommunications
             Terminology File 

   KANJIDIC - File of Information about the 6,355 Kanji in the
              JIS X 0208 Standard (special conditions apply) 

   KANJD212 - File of Information about the 5,801 Supplementary
              Kanji in the JIS X 0212 Standard 

Copyright over the documents covered by this statement is held
by James William BREEN and The Electronic Dictionary Research and
Development Group at Monash University. 

Usage, Copying and Distribution 

a. General Permission 

Any person or organization in possession of a copy of any
of the files covered by this statement, whether they have received
the copy via free distribution or purchase: 

   i.   may use the file for personal purposes such as to assist
        with reading texts, research, translation services, etc. 

   ii.  may, subject to the restriction detailed below, supply
        extracts or small portions of the files to other persons or
        organizations in the form of written documents, electronic mail,
        etc. Such an activity may result in a fee, e.g. in the case
        of professional translation services. 

   iii. must as part of the receiving of the copy undertake to be
        bound by all the conditions below relating to the distribution
        of the files. 

b. Free-of-charge distribution 

Permission is granted to make and distribute verbatim copies
of these files provided the full documentation of the files
and this copyright notice and permission notice is distributed
with all copies. Any distribution of the files must take place
without a financial return, except a charge to cover the cost
of the distribution medium. 

Permission is granted to place copies of these files on WWW and
ftp servers for subsequent distribution under the same conditions
listed above. 

Permission is granted to make and distribute extracts or subsets
of the EDICT file under the same conditions applying to verbatim
copies. 

Permission is granted to translate elements of the files into other
languages, and to make and distribute copies of those translations
under the same conditions applying to verbatim copies. 

c. Incorporation into Free Software and Server Systems 

Permission is granted to use these files as part of software which
is distributed free-of-charge, provided all the conditions associated
with free-of-charge distribution as described above are met, and
provided that full acknowledgement of the source of the files is
made in the software documentation. 

Permission is granted to incorporate the files into any WWW or
similar server system which operates without charge or commercial
advertising, provided all the conditions associated with
free-of-charge distribution as described above are met. In addition,
full acknowledgement of the source of the files must be prominently
displayed on the server, and links must be made available on the
"front" page of the particular server to the full documentation and
copyright statements of the files. 

d. Shareware Software 

Permission is granted to use these files as part of software which
is distributed on a "shareware" basis, and for which a charge of no
more than $US100 is levied, on the following conditions: 

   i.   the distributors of the software notify the Group prior to the
        release of the package. 

   ii.  the distributors undertake to remit to the Group 10% of the
        gross receipts for the software, or such other amount agreed
        by the Group prior to the release of the software. The remittance
        is to take place at least once each 3 months. 

   iii. the full documentation of the files and this copyright notice
        and permission notice is distributed with all copies. 

   iv.  full acknowledgement of the source of the files is made in the
        software documentation. 

e. Commercial WWW Servers 

The contents of these file must not be made available via WWW
services for which a charge is levied, or which make use of
commercial advertising without the prior permission of the Group.
Permission for such usage will normally be granted in return for a
fee based on a small proportion of either the advertising revenue
or the subscription charges. The other conditions applying to free
WWW servers also apply to commercial WWW servers. 

f. Linux Operating System 

Permission is granted for these files to be included free-of-charge in
distributions of the GNU/Linux operating system and associated utilities
and packages, e.g. the Red Hat, Debian, Turbo-Linux, etc. distributions,
provided the conditions stated above under "Incorporation into Free
Software and Server Systems" are met. 

g. Commercial Software Packages 

The contents of these files must not be incorporated into a software
package that is sold commercially or used exclusively within a company
or organization without the prior permission of the Group. Permission
will normally be granted for a non-exclusive licence to incorporate
the files in return for either a one-off payment to The Group or a
royalty on sales.
   The other conditions applying Shareware software apply. 

h. Published Dictionaries 

The contents of these files must not be published on paper or other
media such as CD-ROM without the prior permission of the Group. 

Prior Permission 

All permissions for use of the files granted by James William Breen
prior to March 2000 will be honoured and maintained, however the
placing of the KANJD212 and EDICTH files under the GNU GPL has been
withdrawn as of 25 March 2000. 

Enquiries 

All enquiries to: 

The Electronic Dictionary Research and Development Group 
(Attn: Assoc. Prof. Jim Breen) 
School of Computer Science and Software Engineering 
Monash University 
CLAYTON VIC 3168 
Australia 
(jwb@csse.monash.edu.au) 


[NOTE: The portion of this license statement pertaining to
KANJIDIC has been omitted, since it does not pertain to J_Inflect
which makes no use of, nor distributes any portion of that file.
At the time of this release, the full text may be found at:
http://www.csse.monash.edu.au/groups/edrdg/newlic.html]