Introduction

The phpMySearch search engine is a complete World Wide Web indexing and searching system for a small domain or intranet. It is not meant to replace powerful internet-wide search systems such as Lycos, Infoseek, WebCrawler, and AltaVista. Instead, it is meant to cover the search needs of a single company, campus, or even a particular subsection of a web site.

The search engine uses PHP, MySQL, the cURL library, and a gateway that converts Adobe PDF documents to HTML for indexing.

 

System Requirements

*nix or Windows 95/98/NT system environment

PHP 4.0.2 or higher

MySQL 3.23.32 or higher

CURL 7.0.2 or higher (usually bundled with PHP, but must be enabled in php.ini)

Installation instructions
  1. Unzip the archive se.zip to the folder where you’d like to place the search engine.
  2. Edit the file DataBaseSettings.inc.php and set appropriate values for the following variables:

·         $DBName            Name of the MySQL database used by the search engine

·         $DBUser            MySQL user name

·         $DBPassword        MySQL password

·         $DBHost            Host where your MySQL server resides

NOTE: All required tables will be created automatically.
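As a sketch, a filled-in DataBaseSettings.inc.php might look like the following. The database name, credentials, and host shown here are placeholder values for illustration only; substitute your own:

```php
<?php
// DataBaseSettings.inc.php -- MySQL connection settings for the search engine
// All values below are examples only; replace them with your own.
$DBName     = "phpmysearch";  // name of the MySQL database used by the search engine
$DBUser     = "searchuser";   // MySQL user name
$DBPassword = "secret";       // MySQL password
$DBHost     = "localhost";    // host where your MySQL server resides
?>
```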

Installing CURL and CURL support in php [Unix]

Download the latest version of cURL from http://curl.haxx.se/download.html

At the time of writing, the current version is located at http://curl.haxx.se/download/curl-7.8.tar.gz

 

Uncompress it:

gunzip curl-7.x.tar.gz

tar xvf curl-7.x.tar.gz

 

NOTE: You must apply a small patch so that libcurl reports the correct version number.

Find the file lib/Makefile.am and make the following changes:

Line #2

# $Id: Makefile.am,v 1.22 2001/05/29 19:17:03 bagder Exp $

change to

# $Id: Makefile.am,v 1.22 2001/06/11 12:30:57  bagder Exp $

Line #19

libcurl_la_LDFLAGS = -version-info 2:0:1

change to

libcurl_la_LDFLAGS = -version-info 2:1:0

 

Build and install cURL with

./configure

make

make install

 

Compile cURL support into PHP

Go to the PHP source folder and run:

./configure --with-apxs --with-curl

make

make install

 

Make a symbolic link to libcurl.so.1 in /usr/lib:

ln -s /usr/local/lib/libcurl.so.1 /usr/lib/libcurl.so.1

 

If this is your first installation of PHP, recompile Apache to load PHP as a module:

cd ../apache_1.3.x

./configure --activate-module=src/modules/php4/libphp4.a

make

make install

 

Install PHP as a CGI/command-line binary as well, so that the spider can be run from the command line:

cd ../php-x.x.x

./configure --with-curl

make

make install

 

Don’t worry: PHP will still run as an Apache module as well.


Enabling CURL support in php [Win32]

Uncomment (remove the leading semicolon from) the following line in php.ini:

;extension=php_curl.dll

Find the following libraries in your PHP distribution:

php4ts.dll, SSLEAY32.dll, php_curl.dll, MSVCRT.dll, libeay32.dll

and copy them to your Windows\System folder.

Administration Interface

To access the administrative interface, run the script admin.php.

 

The default login and password are:

login:        admin

password:     admin

 

It is strongly recommended that you change these defaults immediately.

 

Table 1-1: Fields and buttons on the Administration page.

DB Main table name
    Name of the table used to store indexed documents. The default name is _Pages.

DB Settings table name
    Name of the table used to store search engine settings. The default name is settings.

DB Spider state table name
    Name of the table used to store the spider's working state. The default name is spider.

Search start URL's
    A list of URLs from which the crawler will start gathering information. To add a new URL, type it in the field below the list and press the ADD button. To remove URLs, check the checkboxes next to the ones you'd like to delete and press the REMOVE button.

Not indexed URL's (Black list)
    A list of URLs that the crawler will ignore. Add and remove entries the same way as for the start URLs.

Document extensions to index
    A list of document extensions the spider should try to index. Add and remove entries the same way as for the start URLs.

Search depth
    Tells the spider how many levels of links it should follow from the start pages:

    0 - don't follow any links
    1 - follow links only from the first page
    2 - follow links from the first page + 1 more level
    3 - follow links from the first page + 2 more levels

Reparse all
    If this box is checked, the spider will clear the database and re-crawl everything from scratch; otherwise it will parse only pages that have been updated. Unchecked by default.

Automatic spider start
    Check this box if you do not have access to the crontab tool on *nix systems or the task scheduler on Windows. When checked, each time a visitor uses the search script, it checks whether it is time to start the crawler, or whether a scheduled start was missed, and if so launches the spider script.

    It is recommended that you use the system scheduling utilities to start the spider instead.

Start time
    Time of day to start the spider.

Start spider each (days)
    Interval, in days, between spider runs.

Force crawling
    Click the Start Spider button to start the spider immediately.

Number of links per page
    Number of result links to display on a single page.

Max pages block
    How many page numbers should be visible in the paging menu.

Search Engine log file name
    Path to the file where the search engine logs its work.

Spider Engine log file name
    Path to the file where the spider logs its work.

Admin Tool log file name
    Path to the file where all changes made in the admin tool are logged.

Templates path
    Path to the template files.

Admin Login
    Login name of the administrator.

Admin Password
    Administrator's password.

Confirm Password
    Confirmation of the administrator's password.

Submit
    Click the Submit button to save all changes.
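If you schedule the spider with cron rather than the automatic start option, an entry like the following could run it nightly. The path to the PHP CGI binary and the spider script path are assumptions; adjust them for your installation:

```shell
# Example crontab entry (sketch): start the spider every night at 02:30.
# /usr/local/bin/php and /path/to/se/spider.php are placeholder paths.
30 2 * * * /usr/local/bin/php -q /path/to/se/spider.php
```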


Searching

Searching for a single word is simple: just type the word you’d like to search for and press Submit.

 

You can also use Boolean logic to narrow a search. See the table below for the allowed operators.

Table 2-1: Search Boolean logic.

AND
    Finds documents containing all of the specified words or phrases. peanut AND butter finds documents with both the word peanut and the word butter.

OR
    Finds documents containing at least one of the specified words or phrases. peanut OR butter finds documents containing either peanut or butter. The found documents may contain both terms, but not necessarily.

AND NOT
    Excludes documents containing the specified word or phrase. peanut AND NOT butter finds documents with peanut but not butter. NOT must be used together with another operator, such as AND: the engine does not accept 'peanut NOT butter'; specify peanut AND NOT butter instead.

OR NOT
    Finds documents that either contain one of the specified words or phrases or do not contain the other. peanut OR NOT butter finds documents that contain peanut or do not contain butter.

“”
    Quotation marks denote exact phrases. For example, a search on "New York Times" matches only documents containing the words as an exact phrase. It will not find pages with the words in a different order, such as "New times in York!"

{ }
    Braces denote folders. For example, a search on {CPAN/objects} matches only documents stored under www.servername.com/currentlocation/CPAN/objects.
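To illustrate the operators above, a few example queries (the search terms and folder name are illustrative only):

```
peanut AND butter        both words must appear
peanut OR butter         either word may appear
peanut AND NOT butter    peanut without butter
"New York Times"         the exact phrase only
{CPAN/objects}           documents under the CPAN/objects folder
```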

 

 

You can also navigate through the site’s folder structure. The dropdown box before the Submit button lists the subfolders of the current folder. Selecting a folder name localizes the search to that folder and its subfolders; selecting ‘..’ moves one level up.