- Newsgroups: sci.lang.japan
- Path: sparky!uunet!cs.utexas.edu!usc!howland.reston.ans.net!bogus.sura.net!udel!gatech!swrinde!emory!sol.ctr.columbia.edu!The-Star.honeywell.com!umn.edu!i9!molenda
- From: molenda@i9.msi.umn.edu (Jason Molenda)
- Subject: Re: ``Kanji Education'' paper available by anonymous-ftp
- Message-ID: <C1G51H.8nG@news2.cis.umn.edu>
- Sender: news@news2.cis.umn.edu (Usenet News Administration)
- Nntp-Posting-Host: i9.msi.umn.edu
- Organization: University of Minnesota
- References: <C131II.2Cq@news2.cis.umn.edu> <C14FDI.8F0@news2.cis.umn.edu> <HUTTAR.93Jan21185653@hp750.itg.ti.com>
- Date: Tue, 26 Jan 1993 05:39:16 GMT
- Lines: 154
-
- huttar@hp750.itg.ti.com (Lars Huttar) writes:
-
- > As you said, the kind of writing done on Usenet is an "unusual
- >mix of spoken and written Japanese." As such, it may not well
- >represent the mainstream of written Japanese. It may be
- >representative of the kind of writing in manga, but not in newspapers.
- >(I don't know -- I'm just raising the question.) I think we need to
- >address what reading level, or style of document, the student is
- >aiming at being able to read. Then we can ask, how much study does
- >one need to be able to read 95% of a newspaper? a comic? a novel?
-
- Unfortunately (as you guess later in the message), Usenet was the only
- available on-line source of Japanese that I know of. In all honesty, I
- did this paper for fun originally and it just happened that I wanted to
- graduate so I used it as my senior project. If I had had time to stay
- at the University I had a much more interesting project I wanted to
- do. Anyway, since I did it for fun I wasn't about to go through the
- pain and agony of typing in any other sources of Japanese. A book
- would be great but I'd probably spend a zillion years typing it all
- in.
-
- As I mention in the paper, there was one study about the frequency of
- kanji in newspapers back in the early seventies which found similar
- overall numbers. I do not have the kanji list they came up with,
- though, so I'm unable to compare these two lists. I'll be living in
- Japan in about two weeks, so maybe I'll have a chance to contact the
- National Language Research Institute (?) and find that paper. (I think the
- full paper is about 90 pages, so it probably includes a list of the
- kanji found.)
-
- >This leads to another question: how efficient are the various written
- >media in terms of reinforcing common kanji, boosting confidence,
- >learning new kanji, learning spoken idioms, etc.? These criteria are
- >somewhat at odds.
-
- Empirically, you will see the same kanji and jukugo used over and over
- if you stick to a particular domain. Headline news will use `oil' and
- `economy' many, many times. `zebra' (or whatever) is not going to
- occur nearly as often.
-
- Although I didn't include this in the paper, I found that when I
- restricted the scan to a particular sub-group (e.g. fj.bikes), the
- graph had the same basic curve, but the number of kanji needed to cover
- 95% of kanji occurrences fell to around 700 or so, out of a total of
- maybe 1300 unique kanji (this is from
- memory, I can dig up/regenerate the data if you're interested).
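To make the 95% figure concrete: it is how far down the frequency-sorted list you have to go before the kanji seen so far account for 95% of all kanji occurrences. Below is a minimal Python sketch of that computation. It is not the program used for the paper; it assumes the articles are already decoded to Unicode, and it treats "kanji" as anything in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which ignores the extension blocks and the repetition mark 々. The function names are hypothetical.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: "kanji" means the CJK Unified Ideographs block
    # (U+4E00-U+9FFF); extension blocks and marks such as 々 are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_counts(text: str) -> Counter:
    """Count every kanji occurrence in already-decoded Unicode text."""
    return Counter(ch for ch in text if is_kanji(ch))


def kanji_for_coverage(counts: Counter, target: float = 0.95) -> int:
    """Smallest number of distinct kanji, taken in frequency order,
    whose occurrences add up to at least `target` of all kanji seen."""
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target * total:
            return n
    return len(counts)
```

Running it on one newsgroup at a time instead of all of fj gives the restricted-domain comparison described above.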
-
- > To facilitate research on these questions, are there other
- >electronic sources of Japanese text, which are more representative of
- >the mainstream, or at least biased differently?
-
- I think the newspaper study is the best I can cite. Someone sent me a
- mail message (I've been deluged recently, sorry I can't find it right
- now!) which noted similar results in Beginning Technical Japanese for
- a physics text. I think I have a copy of BTJ around here somewhere if
- someone would like me to look that up.
-
- > Looking at the appendix where the kanji [3396 unique kanji
- >found in a Usenet sample] are listed, I notice that the first two are
- >大 (`dai', big) and 学 (`gaku', school).
- [...]
- >boosted in these samples by the many names of universities and
- >commercial research institutions in the writers' addresses. So again
- >we have to ask how accurate these orderings are...
-
- To be truthful, I found it pretty suspicious myself that dai and gaku
- came out as the first two. I'm thinking about trying to toss .signatures
- before doing the analysis. It would still get all those "konnichi ha,
- matsumoto@wherever-daigaku desu" at the beginning of messages but
- shikata ga nai desho^.
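Tossing .signatures could be as simple as cutting each article at its signature separator. The sketch below is one hypothetical way to do it in Python, not anything from the paper. It relies on the usual Usenet convention that a signature follows a line containing just "-- ", and it also accepts a bare "--" because the trailing space is easily lost along the way. Greetings in the body itself would, as noted, still be counted.

```python
def strip_signature(body: str) -> str:
    """Drop everything from the last signature separator line onward."""
    lines = body.split("\n")
    # Search from the end so the article's own signature is the one removed.
    for i in range(len(lines) - 1, -1, -1):
        # rstrip() lets both "-- " (the convention) and a bare "--" match.
        if lines[i].rstrip() == "--":
            return "\n".join(lines[:i])
    return body
```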
-
- > To get some indication of the precision of the samples, I
- >would like to compare the kanji-frequency orders gained from the two
- >samples taken 6 months apart.
-
- Pretty easy to do. I did that, in fact. I think I noted that in my
- paper, but I might have edited it out.
-
- In fact, after I did my last scan (late November), I changed my news
- configuration and now I have about 100 days of fj news instead of 50
- (as I used to have). If you'd be interested in seeing the output of
- another run, I'd be happy to re-run it (although I think most of
- November would still be on disk) and send you the output. It's really
- rather painless to run and takes maybe 30-40 minutes to process the
- ~125MB of fj I have around.
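A rerun like this is mostly a matter of walking the spool and counting. Here is a rough Python sketch, not the actual program. It assumes one article per file (as in a traditional news spool), ISO-2022-JP (JIS) encoding as used on fj, LF line endings, and headers separated from the body by the first blank line. `count_article` and `scan_spool` are hypothetical names; `iso2022_jp` is a standard Python codec.

```python
import os
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: kanji = CJK Unified Ideographs, U+4E00-U+9FFF.
    return "\u4e00" <= ch <= "\u9fff"


def count_article(raw: bytes) -> Counter:
    """Count kanji in the body of one raw ISO-2022-JP article."""
    # errors="replace" keeps one damaged article from aborting the whole run.
    text = raw.decode("iso2022_jp", errors="replace")
    # Headers end at the first blank line; everything after it is the body.
    _headers, _sep, body = text.partition("\n\n")
    return Counter(ch for ch in body if is_kanji(ch))


def scan_spool(root: str) -> Counter:
    """Walk a news spool (one article per file) and total the kanji counts."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                totals.update(count_article(f.read()))
    return totals
```

Skipping the headers keeps Subject: and Organization: lines out of the counts; it does nothing about signatures.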
-
- > Do any kanji vary widely as to their
- >place in the list? (I suppose a statistician could come up with a
- >nice metric on this.) What if a student learned the first 1000
- >characters in sample 1, and tried to read sample 2 -- what percentage
- >would be readable?
-
- This is an interesting idea. I'm not sure how I would go about
- computing this. Maybe add up the absolute number of places each
- character moved and divide it by the total number of characters.
- After I get to Japan I'll have to find time to try this. (Anyone
- have a better idea? This is just off the top of my head.)
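One way to write that metric in Python, comparing only kanji that appear in both samples (an assumption: a kanji seen in only one sample has no second position to measure). `ranks` and `mean_rank_shift` are hypothetical helpers. A statistician would probably reach for Spearman's rank correlation instead, which measures the same kind of agreement on a -1 to 1 scale.

```python
from collections import Counter


def ranks(counts: Counter) -> dict:
    """Map each kanji to its 1-based position in frequency order."""
    return {ch: pos for pos, (ch, _) in enumerate(counts.most_common(), start=1)}


def mean_rank_shift(sample1: Counter, sample2: Counter) -> float:
    """Average absolute change in rank for kanji present in both samples."""
    r1, r2 = ranks(sample1), ranks(sample2)
    shared = r1.keys() & r2.keys()
    if not shared:
        raise ValueError("the two samples have no kanji in common")
    # Sum of |rank1 - rank2| divided by the number of kanji compared.
    return sum(abs(r1[ch] - r2[ch]) for ch in shared) / len(shared)
```

Ties are ranked in whatever order `Counter.most_common` returns them, so among kanji with equal counts the positions are essentially arbitrary; rank changes far down the list should be read with that in mind.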
-
- The idea is that the really frequent characters will be pretty static
- in their location. As you get farther down the frequency list, the
- probability (I would think) of a character moving is higher. All
- bets are off for the 1,000 (?) or so that occur only once.
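The other half of the question (if a student learned the first N kanji of sample 1, how much of sample 2 could they read?) is easier to pin down. A minimal Python sketch, assuming "readable" just means the kanji itself is known (readings, jukugo and kana are ignored) and that both samples are `Counter`s of kanji occurrences; `cross_sample_coverage` is a hypothetical name.

```python
from collections import Counter


def cross_sample_coverage(sample1: Counter, sample2: Counter, n: int) -> float:
    """Fraction of kanji occurrences in `sample2` covered by the `n` most
    frequent kanji of `sample1`."""
    known = {ch for ch, _ in sample1.most_common(n)}
    total = sum(sample2.values())
    if total == 0:
        return 0.0
    covered = sum(count for ch, count in sample2.items() if ch in known)
    return covered / total
```

Sweeping `n` over a range of values would show whether the coverage curve for sample 2 tracks the one measured on sample 1.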
-
- >lists with ones already published. The only one I know of off-hand is
- >the one of the Joyo kanji in the back of Halpern's Shin-kanei-jiten.
- >You probably know of others. These lists are probably based on more
-
- Hm. I hadn't seen anything like this. Please don't assume I know a
- great deal about this; I really only did it for fun. I have never
- used Halpern's kanei-jiten. I didn't know it had any frequency stuff
- in there.
-
- I really think that, for the individual studying Japanese, it is more
- important to realize the underlying theme than to focus on any one
- particular list (as it is easy and tempting to do). In
- each study I've seen, whether it's an unrestricted domain analysis of
- fj, restricted to just fj.bikes (or any one particular newsgroup), or
- newspaper stuff or even a physics book, all kanji lists showed the same
- characteristics (which most of our teachers have been harping on all
- along :-): Pick something of interest and focus on it. There may be
- 1,945 Joyo characters and a few thousand others that will turn up in
- literature, but you won't need more than 700 or so to read a motorcycle
- magazine. For me, at least, that is enough.
-
- For institutions, I think the most important thing, more important
- than any silly frequency list, is to maintain some kind of consistency in
- whatever list is picked over a multi-year program. This was lacking
- at the University of Minnesota (although there will be some pretty
- major changes here as of next year). It takes more than a raw list
- of kanji to teach students; a teacher needs some kind of comprehensive
- program with sample texts, characters that will build into easily
- taught jukugo, etc. I am not an educator and know nothing about this.
-
- > Initially, however, I think it's
- >important to learn to write some kanji well, and to write a lot of
- >kanji at least once, so that the student can learn kanji elements, and
- >thus be able to recognize them, and look them up, even when they don't
- >appear in the neatest book font.
-
- I agree. I think there is an important base of maybe 200-400, but
- after that maybe it isn't so important to remember that in `tsukau' (to
- use), on the right side the vertical line starts above the top
- horizontal line, but in BENRI no BEN (Nelson #451), it starts exactly on
- the horizontal line. (Yes, I was marked off a point for this in
- first-year Japanese and it always pissed me off. :)
-
- I should state a caveat: I'm just another student of Japanese who
- happens to also be a computer nerd. I have no experience as an
- educator and have only been studying Japanese for a little over three
- years. Although the numbers I have in my paper are pretty factual,
- please take the discussion and recommendations and such with a grain of
- salt.
- --
- Jason Molenda, University of Minnesota, Supercomputer Inst., Technical Support
- SGI Iris Admin molenda@jason.msi.umn.edu DoD #1867 '77 CB750F
-