J_Inflect
Development Release 0.3.6 - July 27, 2000
Japanese Verb and Adjective Inflection/Analysis Engine
This is a program I wrote as a learning exercise and to
create a learning tool for inflective forms (verbs and
adjectives) in Japanese. It is a work in progress, and
is a command-line tool at the moment. I'm
making it available to solicit comments and corrections.
Please try it out and let me know if you like it.
As I develop, I will keep the latest working version here.
I plan to get back to development work on it soon, but
I can't make any promises.
What it does:
- Forward conjugate Japanese verbs and adjectives into
many different inflectional forms. Results can be output
with inflected kanji and kana to an EUC text file.
For words listed in the dictionary files, fully
realized English translations are provided.
- Reverse conjugate an inflectional form, matching it
against possible candidates from a dictionary file.
Downloads
Current as of July 27, 2000
Windows (95/98/NT) v.0.3.6: J_Inflect.zip (417 KB)
If you have a problem downloading this, please e-mail me at
cmmcculley AT charter DOT net,
and I can send you the file through e-mail.
Note [4/28/01]: I have packaged in a new rules file that
contains a number of new inflections and is better
organized. The original is still there, and you can
rename the files to use it instead if you want to.
I did not remake the Mac distribution file, but the
new verb rules file from this archive will work on the
Mac as well. (If you don't have something that will read
a ZIP file, e-mail me and I will send you the rules file.)
Macintosh PPC v.0.3.6 pre-release: J_Inflect.sit (360 KB)
Documentation for v.0.3.6
(Note: both download files include the documentation.)
Version 0.3.6 for Windows has been released as a
DOS console application.
The PC/Win version of this program previously used a console
emulator, which unfortunately had some bugs. J_Inflect has
therefore been recast as a console (DOS window) application
in order to make this release available.
On Win98 there isn't much you can do about information
scrolling off the screen, except perhaps limiting queries
to one tense at a time or sending output to a file.
On WinNT (or 2000, presumably), I suggest that under the
"Layout" tab in the Console control panel you set both
width values to 85. Set the window height value to make the
window as tall as possible while still fitting on your
screen. Set as large a buffer height as is comfortable
(500 lines or more) to enable the console scroll bar, so
that you can see results that scroll off the screen. That
being done, you should be able to double-click the
application (.exe) file as usual to start a console window
with J_Inflect.
The new command "exit" will stop the program.
Version 0.3.6 contains fully functional reverse conjugation,
made possible by the inclusion of full-sized verb and
adjective dictionary files extracted from Jim Breen's EDICT
file. Here is the full list of changes:
- Changed the format of the Words_Verbs.dat and Words_Adj.dat
files so that they no longer contain a redundant roomaji
entry. The program now converts internally to roomaji
from kana. In addition, dynamic English translation
entries are optional. If an entry doesn't have them, the
standard [to do] form translation is used instead.
- Distilled new Words_Verbs.dat and Words_Adj.dat files
from EDICT. There are now 8777 available verbs (including
about 3000 verbs using suru as an auxiliary), and 799
entries in the adjectives file. Because EDICT does not mark
true adjectives, the adjective list was pared down by hand
from a list of words ending in the syllable -i. It is
possible (probable, in fact) that some of the words removed
were actually adjectives with un-adjective-like definitions,
or that some of the words left in are not actually
adjectives.
Please report any omissions or non-adjectives that you
find so that corrections can be made to the next version of
the file.
- Added the command 'dtest' to do a full conjugation of the
contents of either of the dictionary data files. This has
been used to find and fix problems conjugating certain forms,
test the coherence of the data files, and discover problems
with the internal kana/roomaji table.
- Fixed problems with the kana/roomaji table, including missing
syllables, support for katakana, and support for
combinations and punctuation peculiar to katakana.
- Added a trap in the roomaji-to-kana translator to prevent
invalid roomaji strings from being translated. This keeps
a misapplied rule from crashing the program; instead, the
kana reading shows a message that the rule application
is invalid. (It should be obvious to someone reading the
roomaji, e.g. "sosnasaimase", that the rule produced a
nonsensical result.)
- Added a help command to give the user on-line reminders
of available commands, command syntax and available switches.
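
As a rough illustration of the trap described above (not the
program's actual code), here is a minimal Python sketch of a
roomaji-to-kana translator that reports an invalid string
instead of crashing. The table KANA, the function name
roomaji_to_kana, and the wording of the message are all
assumptions made for this example; the real internal table
is far larger and is not reproduced here.

```python
# Hypothetical, heavily abbreviated roomaji-to-hiragana table.
# J_Inflect's real internal table is not shown in this document.
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "te": "て", "na": "な", "be": "べ",
    "ma": "ま", "ru": "る",
}
MAX_LEN = max(len(key) for key in KANA)

# Placeholder text; the program's actual message is not documented here.
INVALID = "[invalid rule application]"


def roomaji_to_kana(roomaji):
    """Translate roomaji to kana by greedy longest match.

    Returns (kana, True) on success. If some part of the input matches
    no table entry, returns (INVALID, False) rather than raising.
    """
    out = []
    pos = 0
    while pos < len(roomaji):
        # Try the longest possible syllable first, then shorter ones.
        for size in range(MAX_LEN, 0, -1):
            chunk = roomaji[pos:pos + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                pos += len(chunk)
                break
        else:
            # No syllable starts here: this is the "trap".
            return INVALID, False
    return "".join(out), True


print(roomaji_to_kana("tabemasu"))      # ('たべます', True)
print(roomaji_to_kana("sosnasaimase"))  # ('[invalid rule application]', False)
```

The second call fails at "sn", which starts no syllable in
the table, so the caller gets a message in place of a kana
reading and can carry on with the next rule.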
Bugs Fixed/Features Added in Version 0.3.5
Released June 11, 2000:
- Removed the different infinitive/inflection kanji
entries from the dictionary, as it turns out these
are unnecessary.
- Added dynamic English translation. For verbs in the
Words_Verbs.dat file, translations are now given based
directly on the verb being conjugated. For instance,
conjugating "taberu" gives "I ate" for the past
indicative. To support this, the data file now
contains entries for the infinitive (without "to"),
3rd person present, present participle, plain past,
and past participle of the primary English translation
of the verb. These are directly substituted for [do],
[does], [doing], [did], and [done] in the translation
template, respectively. For adjectives, an additional
field in the dictionary specifies the primary English
translation to be substituted for [X] in the rules file.
- Restructured the verb rules file to remove redundancy
and make the file easier to read and edit.
- Added the command rconj, which does reverse conjugation.
Given an inflection as an argument, it finds all
words in the dictionary files that can conjugate to
that inflection and presents them to the user.
It searches verbs by default, but accepts the switch -a
to search the adjective dictionary, or the switch -b
to search both. If a single result is found, it becomes
the current word. If there are multiple results, the
user can choose one using the command cw with
switch -r.
NOTE: The dictionary files are only test versions
that are woefully incomplete, so although this functionality
works, it requires more available data to be really useful.
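
To make the dynamic translation and reverse conjugation
entries above more concrete, here is a small Python sketch
that combines the two. It is only an illustration under
stated assumptions: the rule layout, the dictionary layout,
the templates, and the function names (fill_template,
conjugate, rconj) are invented for this example and do not
reflect the actual format of the rules file or of
Words_Verbs.dat. The two sample rules are only valid for
ru-dropping verbs such as taberu and miru.

```python
# Placeholders named in the 0.3.5 notes, in the same order as the
# English forms stored for each verb.
PLACEHOLDERS = ("[do]", "[does]", "[doing]", "[did]", "[done]")

# Hypothetical dictionary: roomaji dictionary form -> (infinitive,
# 3rd person present, present participle, plain past, past participle).
VERBS = {
    "taberu": ("eat", "eats", "eating", "ate", "eaten"),
    "miru": ("see", "sees", "seeing", "saw", "seen"),
}

# Hypothetical rules: (name, ending to strip, ending to add, template).
RULES = [
    ("past indicative", "ru", "ta", "I [did]"),
    ("polite present", "ru", "masu", "I [do]"),
]


def fill_template(template, forms):
    """Substitute the five English forms for their placeholders."""
    for placeholder, form in zip(PLACEHOLDERS, forms):
        template = template.replace(placeholder, form)
    return template


def conjugate(word, rule):
    """Apply one rule to a dictionary form; None if it does not apply."""
    _name, strip, add, _template = rule
    if not word.endswith(strip):
        return None
    return word[:len(word) - len(strip)] + add


def rconj(inflection):
    """Brute-force reverse conjugation: conjugate every dictionary word
    with every rule and keep the combinations that yield the input."""
    hits = []
    for word, forms in VERBS.items():
        for rule in RULES:
            if conjugate(word, rule) == inflection:
                hits.append((word, rule[0], fill_template(rule[3], forms)))
    return hits


print(rconj("tabeta"))  # [('taberu', 'past indicative', 'I ate')]
print(rconj("mimasu"))  # [('miru', 'polite present', 'I see')]
```

Searching by conjugating every candidate forward is simple,
but its cost grows with the size of the dictionary times the
number of rules; whether J_Inflect works this way internally
is not stated here.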
Previous Changes