JSSEARCH.INF: OS/2 Help File, 1999-03-08
═══ 1. Introduction ═══
This is a client-side JavaScript search engine for small sites that don't have
access to server-side CGI scripts.
It is not suitable for large sites because of the size of the page that needs
to be downloaded: the complete word list is embedded in the page's JavaScript.
The search specification is also fairly primitive.
There are options to limit the words that are used; you would not expect
someone to want to search on the word "the", for example!
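The size problem follows from how a purely client-side search has to work: every word, and the pages containing it, travel with the search page. The Python sketch below shows the general idea of such a word-to-pages index. The names `build_index` and `search` and the tokenising pattern are hypothetical, and this is not the format JSSEARCH actually generates (it produces JavaScript).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the sorted list of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[A-Za-z0-9]+", text):
            index[word.lower()].add(name)
    return {word: sorted(names) for word, names in index.items()}


def search(index, word):
    """Return the pages containing word, or an empty list."""
    return index.get(word.lower(), [])


pages = {"a.htm": "OS/2 Warp and REXX", "b.htm": "REXX scripting tips"}
idx = build_index(pages)
print(search(idx, "rexx"))  # ['a.htm', 'b.htm']
```

Every distinct word adds an entry to the page, which is why the options for limiting the word list matter.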
Note that some search engines, such as HotBot, allow you to restrict searches to
your site. The problem with these is that you can only search what was there
when the search robot last indexed your site; on a frequently updated site this
is not good enough. One possibility I'd suggest is that you supply both options:
a powerful search based on a past index, and a less powerful but up-to-date
search.
If you have used a previous version of this program, I recommend that you
examine the change history section to determine which changes might affect you.
Please see my web page "http://www.ozemail.com.au/~dbareis" for the latest
copy of this program, or contact me (Dennis Bareis) via email (db0@anz.com).
I'm open to any reasonable suggestions.
Please have a look at the proposed changes section and, after using this
program for a while, tell me what you think.
A condition of use of this program is that you keep the link to my homepage
that is inserted by default. If you wish to use my "TEXTEDIT" program to
automatically reformat the generated HTML, you must leave my link in. The link
should be visible and show the wording as generated.
═══ 1.1. Operating System Status ═══
You can only use OS/2 at this stage to create the search HTML.
═══ 1.2. Proposed Changes ═══
If I've got the time, I like to think about the best way of implementing
something (as well as questioning whether or not it should be implemented at
all). If I have a requirement, I like to implement as generic a solution as
possible. This is my "longer term, think about it" list.
Proposed Changes - Rough Priority
1. Better searching.
2. Fix problem with pressing "ENTER" on input field.
3. Allow easier customisation of look and feel of generated code (append
option or template based).
4. If a ".WRD" file is found for a ".HTM" file, then add its words to those
of the ".HTM" file.
5. Spell checker?
═══ 1.3. Change History ═══
Note that none of these changes were made in response to feedback from users
(apart from myself!). Can I assume everyone is happy?
1. Version 99.067
Fixed a bug which could cause JavaScript errors such as
"octal escape too large".
2. Version 98.365
Now uses a variable cookie name (unlikely to clash between multiple
search pages).
The '~' character is no longer a special character in an input
mask; a round-bracketed form is now used to determine what
should be prepended to URLs. This is simpler to code for,
easier for users to understand and much more flexible.
Now handles paths containing ":", "." and ".." better.
3. Version 98.356
Fixed a bug where a generated line could be much too long. Some
buggy browsers (such as OS/2 Netscape 2.02E) fail on such
lines.
Now points to my new web site (new ISP).
Other minor improvements.
4. Version 98.254
New switch to allow valid word chars to be defined.
5. Version 98.180
Files are now sorted (case-insensitively).
The occurs array is now compressed (where possible) using a new
offset-based technique.
Other relatively minor improvements.
6. Version 98.176
Much more compact; the overhead of each array element has been reduced.
Tried to prevent reload() but still can't do so. I'm not sure I can, even
though my early testing seemed OK (a required routine probably
marks the page as requiring a reload). At the moment I'm concentrating
on getting the page size down.
Am testing a new compression technique which will get us an extra
15% or more compression. I've not documented this yet as I'm
not 100% sure it will work.
Some minor bugs/features fixed.
═══ 1.4. Bugs ═══
Currently Known Bugs
1. You can't specify parameters (such as filenames) containing spaces on
the command line.
Reporting Bugs or Suggestions
If reporting bugs please supply:
1. All files involved (input, output and any batch files used to run the
preprocessor). Hopefully you have trimmed out everything which is
not required to reproduce the problem.
2. A detailed description of the problem.
The easier you make it for me, the faster I will be able to come up with a fix
or tell you what you're doing wrong, etc.
═══ 2. JSSEARCH.CMD Command Line ═══
JSSEARCH[.CMD] [whitespace]InputMask [whitespace][Options[:parms]][whitespace]
The "InputMask" can be the name of a single file or a filemask containing the
normal wildcard characters "?" & "*".
The InputMask may be followed by a bracketed HTML prefix such as "(html\)", in
which case the path "html\" is prepended to every HTML URL. Note that the case
of the prepended URL is never modified; you must supply it exactly as you
require it.
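As a rough illustration of the bracketed prefix, the Python sketch below splits an InputMask argument into the file mask and the URL prefix, then prepends the prefix unchanged. The function names are hypothetical, and treating the last bracketed group as the prefix is an assumption rather than documented JSSEARCH behaviour.

```python
import re


def split_input_mask(arg):
    """Split 'mask(prefix)' into (mask, prefix); prefix is '' if none is given."""
    # Assumption: the prefix is the final round-bracketed group of the argument.
    match = re.fullmatch(r"(.*)\(([^()]*)\)", arg)
    if match is None:
        return arg, ""
    return match.group(1), match.group(2)


def make_url(prefix, filename):
    """Prepend the prefix exactly as supplied; its case is never changed."""
    return prefix + filename


mask, prefix = split_input_mask(r"out\*.htm(html\)")
print(mask, prefix)                   # out\*.htm html\
print(make_url(prefix, "INDEX.htm"))  # html\INDEX.htm
```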
Options
You may specify one or more "Options" separated by whitespace. Options in the
optional environment variable "JSSEARCH_OPTIONS" are processed before any
specified on the command line. Valid options are:
1. /#
2. /EXCLUDEFILES
3. /EXCLUDEWORDS
4. /JUSTWORDS
5. /LOWER
6. /MAXPERCENT
7. /NOBUTTON
8. /NOLISTONNOMATCH
9. /OKINWORD
10. /OUTPUT
11. /PRETTY
12. /S
13. /TARGET
14. /UPPER
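A minimal Python sketch of the documented ordering: switches from JSSEARCH_OPTIONS come first, then those on the command line, and each is split into a name and an optional parameter after the first ":". The function names are hypothetical. Like JSSEARCH itself (see "Currently Known Bugs"), splitting on whitespace means a parameter cannot contain spaces.

```python
import os


def collect_options(argv, env=None):
    """Return JSSEARCH_OPTIONS switches followed by command-line switches."""
    env = os.environ if env is None else env
    return env.get("JSSEARCH_OPTIONS", "").split() + list(argv)


def parse_switch(option):
    """Split '/Name:parms' into ('NAME', 'parms'); only the first ':' separates."""
    name, _, parms = option[1:].partition(":")
    return name.upper(), parms


opts = collect_options(["/S", r"/ExcludeFiles:C:\tmp\x.htm"],
                       {"JSSEARCH_OPTIONS": "/pretty /nobutton"})
print([parse_switch(o) for o in opts])
```

Splitting at the first ":" only is what keeps drive letters such as "C:" intact inside a parameter.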
RETURN CODES
A return code of 0 indicates success.
Any other value indicates an error occurred.
EXAMPLE BATCH FILE
/********************************/
/* A Simplistic TEST batch file */
/********************************/
/*--- Initialization --------------------------------------------------------*/
address cmd '@echo off'
OutputFile = "out\SEARCH.HTM"
CloseRc = stream(OutputFile, 'c', 'close');
call RxFuncAdd 'SysFileDelete', 'RexxUtil', 'SysFileDelete'; /* Load RexxUtil function */
DosDelRc = SysFileDelete(OutputFile);
/*--- Generate start of HTML ------------------------------------------------*/
call GenerateLine '<HTML>'
call GenerateLine '<HEAD>'
call GenerateLine "<TITLE>Dennis Bareis' SITE SEARCH Page</TITLE>"
call GenerateLine '<meta Name="description" Content="Search Dennis Bareis` site">'
call GenerateLine '<meta Name="keywords" Content="Dennis, Bareis, Site, Search, Engine">'
call GenerateLine '</HEAD>'
call GenerateLine '<BODY BACKGROUND="os2warp.jpg">'
CloseRc = stream(OutputFile, 'c', 'close');
/*--- Work out what HTML files we don't want included in searches -----------*/
ExcludeFiles = '/excludefiles:C:\tmp\homepage.tst\sitesrch.htm /excludefiles:C:\tmp\homepage.tst\ssrchcmp.htm';
/*--- Combine all options we wish to apply ----------------------------------*/
AllOptions = '/pretty /nobutton /Output:+' || OutputFile || ' ' || ExcludeFiles;
/*--- Now start JSSEARCH.CMD to append to the html we began above -----------*/
/* To prepend "\html\" to each URL, use a bracketed prefix instead:          */
/* FullOs2Cmd = 'out\jssearch.cmd out\*.htm(\html\) ' || AllOptions;         */
FullOs2Cmd = 'out\jssearch.cmd out\*.htm ' || AllOptions;
say FullOs2Cmd;
address cmd 'cmd.exe /c ' || FullOs2Cmd;
ExitRc = Rc;
/*--- Generate End of HTML --------------------------------------------------*/
call GenerateLine '</BODY>'
call GenerateLine '</HTML>'
CloseRc = stream(OutputFile, 'c', 'close');
exit(ExitRc);
/*===========================================================================*/
GenerateLine:
/*===========================================================================*/
TheLine = translate(arg(1), "'", "`"); /* Restore Single quotes (coded as "`") */
call lineout OutputFile, TheLine;
return;
═══ 2.1. /ExcludeFiles ═══
Switch /ExcludeFiles:FileMask
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
This command can be used multiple times if required to specify files that
should not be processed. You can use this to prevent "hidden" html pages from
turning up in a search!
EXAMPLE
JsSearch.CMD C:\PROJECTS\HTML\~*.HTM /ExcludeFiles:C:\PROJECTS\HTML\NoShow*.HTM
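The filtering this switch performs can be sketched in Python with the standard `fnmatch` module. Matching case-insensitively is an assumption (OS/2 file systems do not distinguish case), and `drop_excluded` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def drop_excluded(files, exclude_masks):
    """Remove every file matching any /ExcludeFiles mask (case-insensitive)."""
    return [
        f for f in files
        if not any(fnmatchcase(f.upper(), mask.upper()) for mask in exclude_masks)
    ]


files = [r"C:\PROJECTS\HTML\NoShow1.htm", r"C:\PROJECTS\HTML\index.htm"]
print(drop_excluded(files, [r"C:\PROJECTS\HTML\NoShow*.HTM"]))
```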
═══ 2.2. /ExcludeWords ═══
Switch /ExcludeWords:FileContainingWords
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
This command can be used multiple times if required to specify files that
contain lists of words to be excluded. You would place words in this list which
a user is unlikely to select (or wish to match).
The file should contain one word per line with blank lines and lines starting
with ";" being ignored.
Note that the "/MAXPERCENT" switch can also be used to exclude common words
(and does so by default).
You can get a list of words using the "/JUSTWORDS" switch.
EXAMPLE
JsSearch.CMD C:\PROJECTS\HTML\~*.HTM /ExcludeWords:C:\PROJECTS\HTML\WORD.LST
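Reading such a word file follows directly from the format rules above. This Python sketch is hypothetical; folding words to lower case assumes that word matching ignores case, which this section does not state.

```python
def load_excluded_words(lines):
    """Parse an /ExcludeWords file: one word per line, blank and ';' lines ignored."""
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word.lower())  # assumption: matching ignores case
    return words


sample = ["; very common words\n", "\n", "the\n", "And\n"]
print(sorted(load_excluded_words(sample)))  # ['and', 'the']
```

Because a file object yields its lines, an opened WORD.LST can be passed to `load_excluded_words` directly.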
═══ 2.3. /JustWords ═══
Switch /JustWords
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
This switch causes the complete word list that would normally be embedded in
the JavaScript to be generated. This could be used to create a list of words
you are not interested in (to be loaded with /ExcludeWords) or to perform a
spell check.
═══ 2.4. /Lower ═══
Switch /Lower
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
Normally filenames are left in the case that they appear on the filesystem.
This switch forces all filenames to be in lower case. This could be required if
you force lower case when you ftp (to upload) your website.
═══ 2.5. /MaxPercent ═══
Switch /MaxPercent:Percentage
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
There are two ways to exclude words: the first is to list the word in a file
included with the "/ExcludeWords" switch; the other is this switch.
If a word appears in all files there seems little point in having it available
in the word list (particularly if you don't use the /NOLISTONNOMATCH switch).
If it appears in 80% of the files there is probably still little point.
This switch allows you to specify the percentage of files a word is allowed to
appear in before it is dropped. To drop no words, specify "0".
EXAMPLE
JsSearch.CMD C:\PROJECTS\HTML\~*.HTM /MaxPercent:80
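A minimal Python sketch of the percentage rule, assuming that a word is dropped when it appears in more than the given percentage of files and that "0" switches the check off. The exact boundary JSSEARCH uses is not documented here, and `drop_common_words` is a hypothetical name.

```python
def drop_common_words(index, file_count, max_percent):
    """Keep words found in at most max_percent of the files; 0 keeps everything."""
    if max_percent == 0:
        return dict(index)
    # Integer arithmetic avoids rounding surprises: 100 * hits <= pct * files.
    return {
        word: files
        for word, files in index.items()
        if 100 * len(files) <= max_percent * file_count
    }


index = {"the": ["a", "b", "c", "d", "e"], "warp": ["a", "b", "c", "d"], "rexx": ["a"]}
print(sorted(drop_common_words(index, 5, 80)))  # ['rexx', 'warp']
```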
═══ 2.6. /NoButton ═══
Switch /NoButton
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
By default the search dialog has a search button. It does not really need one,
as simply pressing Enter will cause a search to be performed.
═══ 2.7. /NoListOnNoMatch ═══
Switch /NoListOnNoMatch
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
By default if a search fails to find a match all the files are listed. This
switch prevents this from happening.
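The fallback can be pictured with this Python sketch. It is an illustration only, since the generated page implements its search in JavaScript; `results_for` and its parameters are hypothetical names.

```python
def results_for(index, all_files, word, list_on_no_match=True):
    """Return the files matching word; with no match, list every file by default."""
    matches = index.get(word.lower(), [])
    if matches or not list_on_no_match:
        return matches
    return list(all_files)


index = {"rexx": ["a.htm"]}
all_files = ["a.htm", "b.htm"]
print(results_for(index, all_files, "java"))         # ['a.htm', 'b.htm']
print(results_for(index, all_files, "java", False))  # []
```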
═══ 2.8. /OkInWord ═══
Switch /OkInWord:ExtraCharList
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
Letters and the digits '0' to '9' are always valid in words. You may supply an
additional list of characters.
EXAMPLE
JsSearch.CMD C:\PROJECTS\HTML\~*.HTM /OkInWord:#@/-
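How the extra characters change word splitting can be sketched in Python with a regular-expression character class. Restricting "letters" to A to Z is an assumption, and `split_words` is a hypothetical name.

```python
import re


def split_words(text, ok_in_word=""):
    """Split text into runs of letters, digits and any /OkInWord characters."""
    # re.escape makes characters such as '-' or ']' literal inside the class.
    word_class = "[A-Za-z0-9" + re.escape(ok_in_word) + "]+"
    return re.findall(word_class, text)


print(split_words("Email db0@anz.com about OS/2"))
# ['Email', 'db0', 'anz', 'com', 'about', 'OS', '2']
print(split_words("Email db0@anz.com about OS/2", "#@/-."))
# ['Email', 'db0@anz.com', 'about', 'OS/2']
```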
═══ 2.9. /Output ═══
Switch /Output:[+]FileName
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
This option specifies the name of the generated HTML file. The default is
"JsSearch.htm".
Unless "+" is specified, the file is deleted before anything is written to it;
otherwise code is appended. You would append if you want a specific
background etc. to be used for the page.
EXAMPLE
JsSearch.CMD C:\PROJECTS\HTML\~*.HTM /Output:+C:\TMP\SEARCH.HTML
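In Python terms, the "+" simply selects append mode instead of emptying the file first. The sketch below is hypothetical; only the default name "JsSearch.htm" and the meaning of "+" come from this section.

```python
def output_target(spec="JsSearch.htm"):
    """Map an /Output value to (filename, open mode): '+name' appends, else replaces."""
    if spec.startswith("+"):
        return spec[1:], "a"  # append to HTML you have already started
    return spec, "w"          # "w" empties the file before writing


def write_search_html(spec, html):
    """Write html to the file named by an /Output value."""
    filename, mode = output_target(spec)
    with open(filename, mode) as out:
        out.write(html)


print(output_target(r"+C:\TMP\SEARCH.HTML"))
```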
═══ 2.10. /Pretty ═══
Switch /Pretty
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
By default the generated code is fairly compact (approx. 20% compressed); if
you wish to see it in a "prettier" format, use this switch.
═══ 2.11. /S ═══
Switch /S
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
By default when scanning for files we don't look in subdirectories. This
option says to look in subdirectories.
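A Python sketch of the difference, using `pathlib`: `glob` looks only in the given directory, while `rglob` also descends into subdirectories. Sorting case-insensitively mirrors the change history note that files are sorted case-insensitively; the function name and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def find_files(directory, pattern, recurse=False):
    """List matching files, searching subdirectories only when recurse (/S) is set."""
    root = Path(directory)
    matches = root.rglob(pattern) if recurse else root.glob(pattern)
    return sorted((p for p in matches if p.is_file()), key=lambda p: str(p).lower())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sub").mkdir()
        for name in ("Index.htm", "about.htm", "sub/faq.htm"):
            (Path(tmp) / name).write_text("<html></html>")
        print([p.name for p in find_files(tmp, "*.htm")])
        # ['about.htm', 'Index.htm']
        print([p.name for p in find_files(tmp, "*.htm", True)])
        # ['about.htm', 'Index.htm', 'faq.htm']
```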
═══ 2.12. /Target ═══
Switch /Target:WindowName
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
When you select a file link, the default target for the HTML page is "_top";
this may or may not be suitable in your environment.
You can prevent the use of the "target" attribute altogether by specifying ""
as the window name. You could also open a new browser window or load the HTML
into a frame which you name.
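What the switch controls can be shown with a small Python sketch that builds one result link. `file_link` is a hypothetical helper and only an illustration, since JSSEARCH generates its own HTML and JavaScript; an empty window name omits the `target` attribute, as described above.

```python
from html import escape


def file_link(url, title, target="_top"):
    """Build one search-result link; an empty target omits the attribute."""
    target_attr = f' target="{escape(target)}"' if target else ""
    return f'<a href="{escape(url)}"{target_attr}>{escape(title)}</a>'


print(file_link("faq.htm", "FAQ"))             # <a href="faq.htm" target="_top">FAQ</a>
print(file_link("faq.htm", "FAQ", ""))         # <a href="faq.htm">FAQ</a>
print(file_link("faq.htm", "FAQ", "results"))  # <a href="faq.htm" target="results">FAQ</a>
```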
═══ 2.13. /Upper ═══
Switch /Upper
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
Normally filenames are left in the case that they appear on the filesystem.
This switch forces all filenames to be in upper case.
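A sketch of how /Lower and /Upper might combine with the URL prefix from the InputMask: only the filename's case is changed, while the prefix is kept exactly as supplied. The helper name and the "LOWER"/"UPPER" mode strings are hypothetical.

```python
def result_url(prefix, filename, case_mode=""):
    """Apply /Lower or /Upper to the filename only; the prefix keeps its case."""
    if case_mode == "LOWER":
        filename = filename.lower()
    elif case_mode == "UPPER":
        filename = filename.upper()
    return prefix + filename


print(result_url("Html/", "Index.Htm"))           # Html/Index.Htm
print(result_url("Html/", "Index.Htm", "LOWER"))  # Html/index.htm
print(result_url("Html/", "Index.Htm", "UPPER"))  # Html/INDEX.HTM
```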
═══ 2.14. /# ═══
Switch /#
This is a JSSEARCH.CMD command line switch. You can set up your own default
switches in the "JSSEARCH_OPTIONS" environment variable.
By default a word which is a decimal number is not considered to be a word.
Use this switch to have them show up in searches.
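A Python sketch of this filter, assuming that "a decimal number" means a word made up only of the digits 0 to 9; `keep_word` is a hypothetical name.

```python
import re


def keep_word(word, allow_numbers=False):
    """Without /#, a word consisting only of decimal digits is discarded."""
    return allow_numbers or re.fullmatch(r"[0-9]+", word) is None


words = ["1999", "OS2", "2.02E", "Warp"]
print([w for w in words if keep_word(w)])        # ['OS2', '2.02E', 'Warp']
print([w for w in words if keep_word(w, True)])  # ['1999', 'OS2', '2.02E', 'Warp']
```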