To import a literature database into Bibliographix, the Bibliographix Import Filter will lead you through a four-step approach.
Do not be frustrated if you cannot find the Import Format File for the file you want to import. There are literally thousands of databases available, and we do not know which ones you need. So instead of producing hundreds of Import Format Files (probably including quite a few that no user of Bibliographix will ever need) and still missing several hundred more, we decided to handle it differently. Whenever a user of Bibliographix decides a database is important to them, we will do our best to add the relevant Import Format File. This means that if your data comes from a publicly available database and there is no Import Format File available at www/bibliographix.com/listofiff yet, you can let us do the work. Just email an example from this database to import@bibliographix.com, tell us the database's name and where to find it, and we will develop the Import Format File and make it available to all Bibliographix users. Just give us some time to do so.
So there are three ways to handle the declaration of the structure of an import file:
@ARTICLE{Agarwal1998a,
author = {Rajshree Agarwal},
year = 1998,
title = {Evolutionary Trends of Industry Variables},
journal = {International Journal of Industrial Organization},
volume = 16,
pages = {511-525},
keywords = {Evolutionary Theory,}
}
If you have manuscripts with ready-to-print bibliographies, they don't look that structured, although they also have a structure. They might look like this:
Agarwal, R. "Evolutionary Trends of Industry Variables", in: International Journal of Industrial Organization 1998, 16, 511-525.
The import module of Bibliographix can process both kinds. Once you have declared the format of a formatted database, you just have to push a button to import. With a "ready-to-print" bibliography you first need to spend some work on the data itself. Read the chapter Ready-to-Print-Bibliographies for details.
Two more general remarks.
When using the Bibliographix Import Filter in combination with an Import Format File, importing data into Bibliographix is a simple four-step approach:
You can play around with the Bibliographix Import Filter using the sample you received together with Bibliographix. In the folder where you installed Bibliographix you will find a folder called import containing the following files:
These format files can be found in the subdirectory \import in the Bibliographix directory. If an import database uses more than one publication type, you may find multiple files for the same format covering different publication types. Each format has a small sample.
Please also note that in some cases different formats exist for an identical database. For example, "Medline" exists in a variety of dialects. If you select the "wrong" Medline filter, the import will not work. This is even more important for formats that do not come from a commercial provider but rather from individual users. From our experience we can tell that every BibTeX user has a special dialect. BibTeX seems to be a standard nobody sticks to :) So you may need to use our filter as a starting point if your format deviates from our format as shown in the sample.
If you do not have an Import Format File available yet, you obviously can't open one. But do not forget to save your settings as a new Import Format File at a later time. You can do that at any time by pushing the button save and following a standard Windows dialog.
Now select the file that contains the actual data by pushing the button select import file on tabsheet Record, the only tabsheet available so far. In another standard Windows dialog you will be able to choose the file you want to import. Notice how the first lines of the import file are shown in the area at the bottom of your Bibliographix Import Filter once you are finished. Depending on the structure of your import file it could look like this:
\Sort{
Mode{on}
Collation{mixed}
SortTypeOrder{key,name}
NameOrder{ascending}
Key{{author,editor}}
KeyOrder{ascending,nulls first}
}
@ARTICLE{Agarwal1998a,
author = {Rajshree Agarwal},
year = 1998,
title = {Evolutionary Trends of Industry Variables},
journal = {International Journal of Industrial Organization},
volume = 16,
pages = {511-525},
keywords = {Evolutionary Theory,}
}
@BOOK{Aghion1998a,
author = {Philippe Aghion and Peter Howitt},
year = 1998,
title = {Endogenous Growth Theory},
publisher = {The MIT Press},
address = {Cambridge: MA},
keywords = {Croissance, Innovation, Investment, Emploi,}
}
The import filter can digest files with up to 5000 records. If your file contains more records, only the first 5000 are processed.
The next important step is to tell the filter which chunks of text are to be considered a single record. You need to set up a rule here. In our example (this format is called BibTeX by the way) every new record starts with a line containing "@" and "@" is not contained anywhere else. So just enter @ and push the button divide import file in datasets. The Bibliographix Import Filter will divide the import file into different records and will inform you about the number of records found in the data file using this rule.
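The dividing rule can be illustrated with a small sketch. This is not the filter's actual code; the function name split_records and the shortened sample are assumptions made for illustration only. Every line containing the sign starts a new chunk, so a heading in front of the first record is counted as a chunk too.

```python
from typing import List


def split_records(text: str, marker: str = "@") -> List[str]:
    """Cut the import text into chunks; a new chunk starts at every line containing `marker`."""
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A line containing the marker closes the chunk collected so far.
        if marker in line and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


# Shortened version of the sample file: a heading followed by two records.
sample = "\n".join([
    "\\Sort{",
    "Mode{on}",
    "}",
    "@ARTICLE{Agarwal1998a,",
    "author = {Rajshree Agarwal},",
    "}",
    "@BOOK{Aghion1998a,",
    "author = {Philippe Aghion and Peter Howitt},",
    "}",
])

chunks = split_records(sample)
print(len(chunks))  # 3: the \Sort heading plus the two records
```

In this sketch the heading shows up as the first chunk, which is why the reported number can be higher than the number of real records.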
Do not worry if the number the Bibliographix Import Filter reports exceeds the correct number. The dataset you want to import may not consist of records only but may have a heading too. The Bibliographix Import Filter might treat this heading as one or more records at this point. Choosing a publication type (see chapter 3) will usually solve this problem.
After pushing the button divide import file in datasets two more things will happen.
The basic idea of the Bibliographix import module is to process only one publication type at a time. You have to declare which publication type you refer to. If all records in the data file are of the same type, declare this type. If not, you have to declare which publication type you want to process now and how this type is labelled in your data. This is done on tabsheet PubType.
If you opened an Import Format File you can push the button Assign publication types right away, since all necessary information is entered already. If not, you will have to enter all information manually (and can save it at any time for later use).
If the database you want to import to Bibliographix contains one publication
type only, you just have to push the first radio button and choose the wanted
publication type. If the database contains more than one publication
type, then choose the second radio button. It is now necessary to tell the
Bibliographix Import Filter how to recognize what kind of publication type
a single record is. Let's go back to the old example but let us take care
of the records only:
@ARTICLE{Agarwal1998a,
author = {Rajshree Agarwal},
year = 1998,
title = {Evolutionary Trends of Industry Variables},
journal = {International Journal of Industrial Organization},
volume = 16,
pages = {511-525},
keywords = {Evolutionary Theory,}
}
@BOOK{Aghion1998a,
author = {Philippe Aghion and Peter Howitt},
year = 1998,
title = {Endogenous Growth Theory},
publisher = {The MIT Press},
address = {Cambridge: MA},
keywords = {Croissance, Innovation, Investment, Emploi,}
}
Here we have at least two kinds of publication types: articles and
books. Let us assume we want to start with the publication type article.
In our literature file this publication type is labelled ARTICLE (all capital letters). The Bibliographix label for this is Article. So type ARTICLE (all capital letters) in the first box and choose Article in the second one.
Now tell the Bibliographix Import Filter where to find the information on the publication type. In this case it is between the initial character @ and the final character {. It is necessary to tell the Bibliographix Import Filter the initial character, i.e. behind which single letter or combination of letters to start looking for the publication type. In our case this was the initial character @. It is not always necessary to add the final character, in our example {, as well. If no final character is entered, the Bibliographix Import Filter will just start after the initial character and take the rest of the line.
A third piece of information needed is whether Bibliographix will find the letters ARTICLE and ARTICLE only between the initial character @ and the final character { (choose radio button exactly matched) or whether other letters can be found there as well (choose radio button contained anywhere). Why these two buttons? The first button (exactly matched) should be used if one is looking for publication type BOOK but not for publication type BOOK-CHAPTER. The second button (contained anywhere) can be used if the data structure does not allow you to single out the publication type exactly. In our case one is able to single out the publication type exactly and therefore button exactly matched should be pushed.
After entering this information push the button Assign publication types. Bibliographix will now look through all records and check whether it can find ARTICLE (and ARTICLE only) between @ and the first { afterwards. If so, it will consider the record to be of type Article, otherwise it will not.
After the Bibliographix Import Filter has gone through all records it will inform you about the number of records which are of the relevant publication type. Only these records will be shown in the bottom part of the Bibliographix Import Filter from now on.
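The matching rule described above can be sketched as follows. The function names are hypothetical and the sketch only mirrors the description; how the filter behaves when the final character is missing from a record is an assumption (here it falls back to the end of the line). The record key Example1999 is invented to show the BOOK versus BOOK-CHAPTER case.

```python
from typing import List, Optional


def extract_label(record: str, initial: str, final: str = "") -> Optional[str]:
    """Return the text behind `initial`, up to `final` or (without `final`) the end of the line."""
    start = record.find(initial)
    if start == -1:
        return None
    start += len(initial)
    end = record.find(final, start) if final else -1
    if end == -1:
        # Assumption: no final character found, so take the rest of the line.
        end = record.find("\n", start)
    if end == -1:
        end = len(record)
    return record[start:end]


def is_type(record: str, label: str, initial: str, final: str = "", exact: bool = True) -> bool:
    """exact=True mimics 'exactly matched', exact=False mimics 'contained anywhere'."""
    found = extract_label(record, initial, final)
    if found is None:
        return False
    return found == label if exact else label in found


def assign_type(records: List[str], label: str, initial: str, final: str = "",
                exact: bool = True) -> List[str]:
    """Keep only the records of the wanted publication type."""
    return [r for r in records if is_type(r, label, initial, final, exact)]


records = [
    "@ARTICLE{Agarwal1998a,\nauthor = {Rajshree Agarwal},\n}",
    "@BOOK{Aghion1998a,\nauthor = {Philippe Aghion and Peter Howitt},\n}",
    "@BOOK-CHAPTER{Example1999,\n}",  # invented record for illustration
]

articles = assign_type(records, "ARTICLE", "@", "{")
books_exact = assign_type(records, "BOOK", "@", "{", exact=True)
books_loose = assign_type(records, "BOOK", "@", "{", exact=False)
print(len(articles), len(books_exact), len(books_loose))
```

With exact matching only the BOOK record is kept; with the looser rule the BOOK-CHAPTER record is picked up as well, which is the situation the first radio button is meant to avoid.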
Now the filter displays up to twenty tabsheets. These are all the tabsheets relevant to understanding the structure of the publication type chosen. The number and titles of the tabsheets vary with the publication type, since a book obviously has different data from a journal article.
If you do not have an Import Format File which does the work for you, you will have to go through all of these tabsheets. Fortunately they are all built the same way, so once you understand one you understand them all.
If you want to handle another publication type, let us say publication
type BOOK, you will
have to run the Bibliographix Import Filter one more time. You will have
to do the same steps as with publication type ARTICLE.
Fortunately you can save quite some work. Since the structure of different
publication types within one database is usually very similar, it makes
sense to load the Import Format File you saved for publication type Article first and
then change it wherever necessary. Don't forget to save this second filter
under a different name to avoid overwriting your old settings.
This chapter will explain what to do. If there is an import filter available for your database, explaining the structure is no work at all. In this case just skip this chapter and go straight to tabsheet Import (explained in chapter 5).
After you pushed button Assign publication types on tabsheet Pub.Type (see chapter 3) quite a few new tabsheets appeared. The number of tabsheets depends on the publication type you are working with. These tabsheets are necessary to understand the structure of the publication type you are working with, and you will have to go through all but the last one, Import, in this step. Since all tabsheets are very similar, we explain the first one only, i.e. tabsheet Author. To do so, assume the record shown at the bottom of the Bibliographix Import Filter is the following:
@ARTICLE{Androutsopoulos-etal95,
author = {I. Androutsopoulos and G. D. Ritchie and
P. Thanisch},
year = 1995,
title = {Natural Language Interfaces to Databases-an
introduction},
journal = {Journal of Language Engineering},
volume = 1,
number = 1,
When entering the initial and final character it is possible to copy the signs from the field at the bottom of the Bibliographix Import Filter into the relevant fields of the tabsheets. Just use the Ctrl + C and Ctrl + V combinations to copy and paste respectively. This makes sense since one cannot easily recognize whether there are blanks or tabs between the two signs = and { in the initial character author = {. By using copy and paste the right characters will be used automatically although they may look different. For example, if a tab is used, the beginning field will look like author =|{ because a tab is shown as |.
The Bibliographix Import Filter can now single out the string
I. Androutsopoulos and G. D. Ritchie and P. Thanisch
but it does not know how names are written and how they are separated. In our example names are written like a. John F. Kennedy and separated by and. This chapter will explain why. Just one remark in advance: It makes sense not to rely on one example only to define the structure of the database. We therefore recommend that when defining the structure you look at several records by using the buttons Previous and Next.
Names can be written in quite a few different ways:
a. John F. Kennedy
b. John F Kennedy
c. JF Kennedy
d. Kennedy, John F.
e. Kennedy, John F
f. Kennedy, JF
g. Kennedy John F.
h. Kennedy John F
i. Kennedy JF
In our example
I. Androutsopoulos and G. D. Ritchie and P. Thanisch.
names are written like a. John F. Kennedy.
One more point regarding c. JF Kennedy, f. Kennedy, JF and i. Kennedy JF. Here initials are written like JF, i.e. there is no blank between J and F. If a blank is entered between initials in the original database, the Bibliographix Import Filter will understand the structure as well, so do not worry about blanks between initials.
The information concerning the separation of authors is necessary to know how many authors are contained in one string. In our example the authors are separated by and. There are two separators included, so there are three different authors within the string. When entering a separator consisting of letters only, it makes sense to enter a blank in front of and behind the separator, i.e. entering blank and blank instead of and only. Otherwise the Bibliographix Import Filter would understand a single author such as Peter F. Gewandler as the two authors Peter F. Gew and ler because the string and is contained in the family name Gewandler.
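A minimal sketch shows why the blanks around the separator matter. The function names are hypothetical, and the conversion handles name form a. only, on the assumption that the last blank-separated word is the family name; the real filter supports all nine forms.

```python
from typing import List


def split_authors(raw: str, separator: str = " and ") -> List[str]:
    """Split an author string at every occurrence of the separator."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


def to_family_first(name: str) -> str:
    """Turn form a. ('John F. Kennedy') into 'Kennedy, John F.'."""
    given, _, family = name.rpartition(" ")
    return f"{family}, {given}" if given else family


def format_authors(raw: str, separator: str = " and ") -> str:
    """Return the authors family name first, separated by semicolons."""
    return "; ".join(to_family_first(name) for name in split_authors(raw, separator))


print(split_authors("Peter F. Gewandler", "and"))    # ['Peter F. Gew', 'ler']
print(split_authors("Peter F. Gewandler", " and "))  # ['Peter F. Gewandler']
print(format_authors("I. Androutsopoulos and G. D. Ritchie and P. Thanisch"))
```

The first call reproduces the Gewandler problem, the second avoids it by including the blanks in the separator.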
In our example we entered all the information needed. Pushing the button test would show us that we understood the structure of our database correctly. The formatted result would be
Androutsopoulos, I.; Ritchie, G. D.; Thanisch, P.
and this is exactly the way Bibliographix requires it:
%a De Leeuw, G
%a Davidson, K L
%a Gathman, S G
%a Noonkester, R V
%t Modeling of aerosols in the marine mixed layer
%j SPIE Proceedings
%v 1115
%d 1989
%y amj
%z 2020
%k Manual entry
%X SPIE Conference on Propagation Engineering
%Y 28-30 March
%Z Orlando, FL
%D Conference Proceeding
Here there are four authors but all in different lines.
You can process these authors the following way:
Now we have a single string
De Leeuw, G.; Davidson, K. L.; Gathman, S. G.; Noonkester, R. V.
and that is the way Bibliographix requires authors to be written.
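One way to picture this step is the sketch below. It is an illustration only, not the filter's procedure: the function names are hypothetical, and it assumes every author line starts with %a followed by a blank and uses name form f. (family name, comma, initials without periods).

```python
def add_periods(initials: str) -> str:
    """'K L' -> 'K. L.'; parts that already end with a period are left alone."""
    return " ".join(p if p.endswith(".") else p + "." for p in initials.split())


def collect_authors(record: str, tag: str = "%a ") -> str:
    """Gather all author lines of one record into a single semicolon-separated string."""
    names = []
    for line in record.splitlines():
        if line.startswith(tag):
            family, _, initials = line[len(tag):].partition(",")
            names.append(f"{family.strip()}, {add_periods(initials)}")
    return "; ".join(names)


record = "\n".join([
    "%a De Leeuw, G",
    "%a Davidson, K L",
    "%a Gathman, S G",
    "%a Noonkester, R V",
    "%t Modeling of aerosols in the marine mixed layer",
])

print(collect_authors(record))
```

Note that add_periods would also put a period behind a full first name, so this sketch is only suitable for data that contains initials.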
I.-Androutsopoulos and G.-D.-Ritchie and P.-Thanisch,
i.e. there is a - used instead of a blank in names. This is quite a popular format in online databases. Entering a - in the first field will solve this problem and lead to the result wanted.
These fields allow you to get rid of any additional information you do not need. Let us assume you are looking for the editors of a book and between beginning and ending sign the following string is contained
I. Androutsopoulos (ed.) and G. D. Ritchie (ed.) and P. Thanisch (ed.).
Here you want to get rid of (ed.), which is not part of the author's name but additional information Bibliographix stores differently. If you do not remove it, the Bibliographix Import Filter will convert the names to
(ed.), I. Androutsopoulos; (ed.), G. D. Ritchie; (ed.), P. Thanisch,
i.e. it will understand (ed.) as the family name and everything else as first names or initials. So just enter (ed.) in one of the fields and the problem is solved.
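Both clean-up steps, replacing a sign by a blank and removing unwanted additions, can be sketched like this. The function and parameter names are hypothetical and only mirror the description above.

```python
from typing import Iterable


def clean_names(raw: str, replace_with_blank: Iterable[str] = (),
                remove: Iterable[str] = ()) -> str:
    """Remove unwanted additions, turn the given signs into blanks, and tidy up the spacing."""
    for token in remove:
        raw = raw.replace(token, "")
    for token in replace_with_blank:
        raw = raw.replace(token, " ")
    # Collapse runs of blanks left behind by the replacements.
    return " ".join(raw.split())


hyphenated = "I.-Androutsopoulos and G.-D.-Ritchie and P.-Thanisch"
editors = "I. Androutsopoulos (ed.) and G. D. Ritchie (ed.) and P. Thanisch (ed.)"

print(clean_names(hyphenated, replace_with_blank=["-"]))
print(clean_names(editors, remove=["(ed.)"]))
```

Both calls yield the plain string I. Androutsopoulos and G. D. Ritchie and P. Thanisch. In this sketch a hyphen inside a double-barrelled family name would be turned into a blank as well, so it is worth checking a few records after such a replacement.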
AN: 2000-13010-004
DT: Journal-Article
TI: Attractor dynamics in word recognition: Converging evidence from errors by normal subjects, dyslexic patients and a connectionist model.
AU: McLeod,-Peter; Shallice,-Tim; Plaut,-David-C.
SO: Cognition. 2000 Jan; Vol 74(1): 91-113
IS: 0010-0277
PY: 2000
AB: Demonstrates a link between errors in the identification of written words by normal Ss, neurological patients, and a connectionist model.
In this case the field starting behind SO:
contains several pieces of information: Journal - year - month - volume - number
- pages. From a database point of view this is a very awkward way to handle
information. The golden rule is: "one piece of information, one field". We don't
know why very popular databases such as Medline use these mixed fields.
The conspiracy-theory answer would be that they use them because they don't
want people to make further use of the data.
So if you enter SO: as the beginning sign and do not enter an ending sign, the string the Bibliographix Import Filter will analyze is
Cognition. 2000 Jan; Vol 74(1): 91-113.
Let us stay away from the authors for a minute and assume we are interested in the pages.
To figure out the pages 91-113, more structure has to be explained. It is necessary to check the box in front of Field contains more information than ... only. Once we did that, two more areas are available to structure the data. In the first area we are asked where the relevant item begins. In our case pages are not in the beginning of the mixed field but after the first ":". Therefore it is necessary to click the second radio button and to enter : number 1. Now we have to enter where the pages end. In our example the pages are the last item of the mixed field. Therefore it is necessary to click the first radio button. Now the Bibliographix Import Filter has all the necessary information to figure out the pages.
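The rule entered above (item begins after ":" number 1 and runs to the end of the field) can be sketched as follows. The function and parameter names are hypothetical; in particular, looking for the ending sign only behind the start of the item is an assumption of this sketch.

```python
def nth_index(text: str, sign: str, n: int) -> int:
    """Return the position of the n-th occurrence of `sign` in `text`, or -1 if there is none."""
    pos = -1
    for _ in range(n):
        pos = text.find(sign, pos + 1)
        if pos == -1:
            return -1
    return pos


def extract_item(field: str, begin_sign: str = "", begin_n: int = 1, end_sign: str = "") -> str:
    """Cut one item out of a mixed field.

    Without begin_sign the item starts at the beginning of the field;
    without end_sign it runs to the end of the field.
    """
    start = 0
    if begin_sign:
        pos = nth_index(field, begin_sign, begin_n)
        if pos == -1:
            return ""
        start = pos + len(begin_sign)
    end = len(field)
    if end_sign:
        pos = field.find(end_sign, start)
        if pos != -1:
            end = pos
    return field[start:end].strip()


mixed = "Cognition. 2000 Jan; Vol 74(1): 91-113"
pages = extract_item(mixed, begin_sign=":", begin_n=1)
journal = extract_item(mixed, end_sign=".")
print(pages)    # 91-113
print(journal)  # Cognition
```

The same two settings, where the item begins and where it ends, are enough to pull the journal name out of the field as well.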
One remark regarding mixed fields: they might cause trouble. Especially
if the database is not well maintained and if the mixed field contains
quite a few different pieces of information, the result might not
be the right one for every single record you want to import. Whenever it
is necessary that all information is free of errors, we therefore recommend
that you test the structure with a few records and/or check every single
record the way it is described in chapter 5.
SO - Surgery 1965 Dec;58(6):969-78
In this example the year of publication is 1965. Using mixed fields (Year is the first item of the mixed field and Year ends in front of ;) one is able to single out the following string only:
Surgery 1965 Dec
Checking the box Select numbers only will correctly reduce this string to:
1965
This chapter 4 (except for the current chapter 4.9) describes how to handle tabsheet Authors. If you do not have an Import Format File available you must go through all tabsheets except for Import before you can import your data. All these tabsheets are built the same way but might not contain as many fields as tabsheet Authors (e.g. it does not make sense to ask for the way names are written if you are looking for the year). Usually you will not be able to fill in all tabsheets, since your database might not provide information on, let's say, the abstract. In this case just leave the tabsheet Abstract empty.
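The Select numbers only step from the Surgery example can be pictured with a short sketch. That the option simply keeps the digits and drops everything else is an assumption; the function name is hypothetical.

```python
def select_numbers_only(text: str) -> str:
    """Keep the digits of a string and drop every other character."""
    return "".join(ch for ch in text if ch.isdigit())


field = "Surgery 1965 Dec;58(6):969-78"
first_item = field.split(";", 1)[0]  # Year is the first item and ends in front of ";"
year = select_numbers_only(first_item)
print(first_item)  # Surgery 1965 Dec
print(year)        # 1965
```

Under this assumption the option only gives a clean year if the singled-out string contains no other digits, for example no number in the journal name.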
And one final remark. As with every work you are doing on your PC: Don't
forget to save your import filter from time to time.
Once you click on tabsheet Import you will see how record number one (at the bottom) is converted into a Bibliographix record (in the center). You can go through all records by clicking on the buttons Previous and Next. This way you can check every single record and make any changes you want in the center. Whenever you click on one of the buttons Previous, Next or Finish, all your changes to the current record will be accepted, i.e. the record will be saved the way you changed it, and once you return to this record your changes will still be there. If you do not consider these checks necessary you can click on the button Finish all. This way all records will be converted according to the rules and the last dataset will be shown.
In order to be accepted by Bibliographix, a record must contain information on at least author, title and year. Whenever there is a record that does not contain information in any of these fields, the Bibliographix Import Filter will warn you about it. This warning occurs independent of whether you are going through all the records via buttons Previous and Next or via Finish all. Whenever you are receiving a warning you must either add the missing information manually before clicking on one of the buttons Previous, Next or Finish all or the record will be discarded (By the way, you can use this to throw away any record you like. Just remove the entry behind either author, title or year).
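The acceptance rule can be sketched as a simple check. The dictionary representation of a record and the function names are assumptions made for illustration; only the three required fields come from the description above.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("author", "title", "year")


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name, "").strip()]


def accepted(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep complete records; incomplete ones are discarded unless fixed manually first."""
    return [r for r in records if not missing_fields(r)]


records = [
    {"author": "Agarwal, Rajshree", "title": "Evolutionary Trends of Industry Variables", "year": "1998"},
    {"author": "Aghion, Philippe; Howitt, Peter", "title": "Endogenous Growth Theory", "year": ""},
]

for r in records:
    gaps = missing_fields(r)
    if gaps:
        print("Warning, missing:", ", ".join(gaps))

print(len(accepted(records)))  # 1
```

Emptying one of the three fields on purpose therefore has the same effect as throwing the record away.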
Another point regarding author and title concerns the publication type edited volumes. This publication type has neither author nor title but editor volume and editor title. So that such records are accepted by Bibliographix nevertheless, the Bibliographix Import Filter will copy the information on editor volume and editor title into author and title.
When going through the records you will sometimes realize that there is a need to change the records on one or more of your tabsheets regarding the structure of your database. You can do that by clicking on these tabsheets and changing the records. You should just keep one thing in mind: When you are going back to tabsheet Import you will start with the first record again. This is done so that all records will be converted according to the new rules you just declared. But this additionally means that all the changes you already manually entered to records are lost.
Once you have gone through all the records and are confident of the quality of the data you generated, it is time to save it as a Bibliographix database. This is done either by clicking on button Create new db or by clicking on button Append to db. If you choose Create new db a new database containing your data will be created. If you choose to save it under the name of an existing database, this old database will be overwritten with the new data, i.e. the old data is lost. If you do not want to lose your old data but want to add new data to an old database, choose Append to db instead. This way the new data will be added to the old database.
Saving the data is the point where we differentiate between different
kinds of users. While you could perform all steps until now independent
of whether you are a user of Bibliographix Basic, Light or Pro, you will
be able to save your data only if you are a user of Bibliographix Pro.
(Do not confuse this with the import format file, i.e. the file which contains
information regarding the structure of your database. This file can be
saved by all users). This way you are able to play around with the Bibliographix
Import Filter and check, whether it will work with your database, but you
are not able to use it unless you are a Pro user. Fair enough we think.
Let's assume we have the following short bibliography (which was created using Bibliographix and style American Economic Review):
Akerlof, George A., "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism, " Quarterly Journal of Economics, 1970, 488-500.
Axelrod, Robert, The Evolution of Cooperation, New York: Random, 1984.
Dekel, Eddie, Scotchmer, Suzanne, "On the Evolution of Optimizing Behavior, " Journal of Economic Theory, 1992, 392-406.
Wilson, Edward O., Sociobiology, Cambridge, Massachusetts: Harvard University Press, 1980.
The first thing you have to do is to save it as text only with your word processor (choose save as and file type: text only). This way, if you open it from your Windows explorer, it looks like:
Akerlof, George A., "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism, " Quarterly Journal of Economics, 1970, 488-500.
Axelrod, Robert, The Evolution of Cooperation, New York: Random, 1984.
Dekel, Eddie, Scotchmer, Suzanne, "On the Evolution of Optimizing Behavior, " Journal of Economic Theory, 1992, 392-406.
Friedman, James W., Game theory with applications to economics, Oxford: 1986.
Wilson, Edward O., Sociobiology, Cambridge, Massachusetts: Harvard University Press, 1980.
Boldface, italic and so forth have vanished but this information was superfluous anyway.
Now it is necessary not only to explain the structure but to add some structure. The Bibliographix Import Filter must be able to recognize where a single record starts, what publication type every single record is and what kind of information can be found. The structure necessary for this task must be added manually in the text-only document. One way to do so is the following:
%1 Article
%2 Akerlof, George A.
%3 "The Market for 'Lemons':
Quality Uncertainty and the Market Mechanism, " Quarterly Journal of Economics,
1970, 488-500.
%1 Book
%2 Axelrod, Robert
%3 The Evolution of
Cooperation, New York:
%4 Random, 1984.
%1 Article
%2 Dekel, Eddie
%2 Scotchmer, Suzanne
%3 "On the Evolution
of Optimizing Behavior, " Journal of Economic Theory, 1992, 392-406.
%1 Book
%2 Wilson, Edward O.
%3 Sociobiology, Cambridge,
Massachusetts:
%4 Harvard University
Press, 1980.
This is enough structure for the Bibliographix Import Filter to understand the data:
And finally: Do not get scared by the sheer amount of tabsheets. Play around with the Bibliographix Import Filter and you will realize, it is not that difficult. For us programming even turned out to be fun. So we hope you have fun as well.