

RISC World

Data Analysis

Paul Webb examines the options for data analysis on RISC OS.

We live in a world that is awash with data. One only has to think of the importance of School League Tables or the FTSE index to appreciate that information of all types has an increasingly prominent role to play in our modern lives.

But do we have to accept the 'spin' on data which is presented to us in the media? Is there any way in which we can systematically evaluate accepted practices?

Fortunately, RISC OS users can answer 'no' to the first question and 'yes' to the second question because of the availability of 1st from Serious Statistical Software (SSS) and Analysis from Giovanni Lo Conti - two contrasting but complementary data analysis packages.

The Quantitative and Qualitative Divide

So why look at two packages rather than one? Well, 1st and Analysis meet different needs. 1st enables the user to examine data which is usually quantitative or numerical in nature, whereas Analysis can also deal with qualitative or textual data. Think, for example, of a person who tells a researcher how many hours he works each week and what he 'feels' like when he is working. The former piece of information - the number of hours worked - is an example of quantitative data, whereas the latter - a description of his emotions - exemplifies qualitative or textual data.

Of course the quantitative/qualitative distinction does not always apply. The quantitative data analyst may occasionally deal with examples of categorical data like social class or gender and the qualitative analyst may obtain numerical information from text.

But, in general, the distinction remains a useful one to bear in mind as you read this article.

First

1st - or Fully Interactive Regression STatistics to give the application its full title - arrives on two discs. The Program Disc contains the application itself whilst the Manual Disc contains an interactive manual and a demonstration program which shows off 1st's capabilities.

Installation is simplicity itself and merely entails creating a directory on your hard disc as a prelude to copying the contents of both discs into it. Users without a hard disc are also able to run 1st from the Program Disc.

The Sheet

Activating 1st is very straightforward. The user simply double-clicks on the 1st icon in its directory and the application's icon appears on the icon bar. CSV and 1st files can then be dragged onto this icon bar icon, at which point 1st displays your data in the form of a data matrix or sheet. The size of the sheet can also be specified where necessary. Entering data is likewise very straightforward: all you need do is click on an individual cell and type in a value. Figure one below shows a sample sheet which was generated with one of the data sets supplied with 1st.

Figure one: a sample 1st sheet
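The sheet is, in effect, a data matrix: one row per case and one column per variable. To illustrate how a CSV file maps onto such a matrix, here is a minimal Python sketch. It is not 1st's own loader; the function name load_sheet is hypothetical, and it assumes that the first row holds the variable names and that a blank cell means a missing value.

```python
import csv
import io

def load_sheet(text):
    """Parse CSV text into (column_names, rows) - a simple data matrix.

    Assumptions: the first row is a header, numeric cells become floats,
    non-numeric cells (categorical data) stay as text, blanks become None.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        row = []
        for cell in record:
            cell = cell.strip()
            if cell == "":
                row.append(None)          # missing value
            else:
                try:
                    row.append(float(cell))
                except ValueError:
                    row.append(cell)      # e.g. gender or social class
        rows.append(row)
    return header, rows

# Invented sample: hours worked per week and gender, one value missing
sample = "hours,gender\n37.5,F\n40,M\n,F\n"
names, sheet = load_sheet(sample)
print(names)     # ['hours', 'gender']
print(sheet[0])  # [37.5, 'F']
```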

Statistics

1st seems to be able to perform an extremely comprehensive range of statistical calculations on any data set. One can, for example, scrutinise one variable at a time or plump for a more ambitious multivariate analysis. What 1st will not do, however, is generate meaningful results from meaningless data. It therefore pays to plan your study very carefully before actually collecting and analysing any data. 'Garbage in, garbage out' is an expression which the budding data analyst would do well to remember.

1st's statistical facilities can be accessed by clicking the middle (Menu) mouse button over the sheet. This reveals a 'Data Ctrl' menu from which a range of statistics can be selected (see figure two below).

Figure two: selecting statistics in 1st

1st also generates textual or graphical report windows on the basis of the analysis that the user selects. Figure three (a Draw file) shows a regression plot generated by following one of the supplied tutorials, whilst the information in report one was produced by opening a textual report window and saving it as a text file.
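For readers new to regression, the straight line in a simple regression plot is usually the ordinary least-squares fit of y = a + bx. The Python sketch below shows that textbook calculation only; it says nothing about how 1st computes its regressions, and the function name linear_regression and the revision-hours data are invented for illustration.

```python
def linear_regression(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    # R^2 is undefined when y is constant; the flat line then fits exactly
    r_squared = sxy ** 2 / (sxx * syy) if syy else 1.0
    return a, b, r_squared

# Invented example data: hours of revision against test score
hours = [1, 2, 3, 4, 5]
score = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b, r2 = linear_regression(hours, score)
print(f"score = {a:.2f} + {b:.2f} * hours (R^2 = {r2:.3f})")
```

A graphical package would then draw this line over a scatter plot of the raw data; R^2 indicates what proportion of the variation in y the line accounts for.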

Analysis

Analysis is, in contrast, essentially a textual analysis program. It supports a wide range of analytic procedures, including concordances, keyword-in-context (KWIC) listings and the identification of co-occurrences. Figure four below shows the main window which is used to drive the application.

Figure four: the main window used to drive Analysis
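For anyone unfamiliar with the term, a keyword-in-context listing shows every occurrence of a chosen word together with a few words on either side, so that the different ways a word is used can be compared at a glance. The Python sketch below is a generic illustration, not Analysis's own code; the function name kwic, the three-word window and the plain-English (ASCII-only) word pattern are all assumptions.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words either side of each hit.

    Matching is case-insensitive on whole words; punctuation is discarded.
    Assumption: plain English text, so words are runs of ASCII letters and apostrophes.
    """
    words = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if word.lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

transcript = ("I work long hours. The work is hard but I enjoy the work "
              "because my colleagues value my work.")
for line in kwic(transcript, "work"):
    print(line)
```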

Installation is just as straightforward as it is with 1st. Analysis is simply placed in its own directory on your hard disc and run in the usual RISC OS way: the user double-clicks on the Analysis icon and the application's icon appears on the icon bar.

Analysis is then activated by dragging the plain text file that you wish to study onto the icon bar icon. The main window then appears, and the user selects the required analysis from the available menu before clicking on the 'OK' button. Finally, the application produces an output file which contains the results of the analysis.

Usage

As Analysis makes such an eclectic range of techniques available to the researcher, its target audience will be varied, although it will be particularly useful to anyone interested in linguistics. Researchers in other disciplines who deal with qualitative data can also make good use of its facilities.

Think, for example, of the researcher who wishes to investigate the values which a group of people hold. After interviewing each person, the researcher would carefully read through each interview transcript in order to identify points of commonality. Recurring themes would then be noted in the margin of each transcript before considering how they were interrelated. In this way, the purely qualitative researcher would be able to construct a picture of the 'conceptual universe' of the respondents by discovering key categories or values.

Analysis could assist in this endeavour through its 'List Frequency of Words' option. By counting word frequencies, Analysis allows the researcher to discover points of commonality between interview transcripts much more efficiently than traditional pen-and-paper methods. Figure five below illustrates the output produced by applying this option to a sample file.

Figure five: example output from Analysis
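The idea behind a word-frequency listing, and how it can point to themes shared across interviews, can be sketched in a few lines of Python. This is not how Analysis itself is implemented: the stop-word list, the function names and the extra step of keeping only words that occur in every transcript are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list: very common words that say little about themes.
STOP_WORDS = {"the", "a", "and", "i", "is", "to", "of", "my", "it", "but"}

def word_frequencies(text):
    """Count lower-cased words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def shared_words(transcripts):
    """Words found in every transcript, with combined counts, most frequent first."""
    if not transcripts:
        return []
    counts = [word_frequencies(t) for t in transcripts]
    common = set(counts[0]).intersection(*counts[1:])
    total = sum(counts, Counter())
    return sorted(((w, total[w]) for w in common), key=lambda p: (-p[1], p[0]))

# Invented interview extracts
interviews = [
    "Family matters most to me. My family and my faith.",
    "I value honesty, and family is important.",
    "Honesty at work and time with family.",
]
print(shared_words(interviews))  # [('family', 4)]
```

A word that turns up in every transcript is only a candidate theme; the researcher still has to read it in context, which is where a keyword-in-context listing comes in.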

Evaluation

So how do 1st and Analysis measure up?

1st

  • Ease of use: 1st is, without doubt, a very easy package to use. It is not entirely intuitive but this is not a particularly valid criticism when one bears in mind that it is such a highly specialised piece of software. In any case, it is certainly more user-friendly than the Statistical Package for the Social Sciences (SPSS) - the main stats package for Windows 98 and Mac OS - or its rival Minitab. As for comparisons with the Linux stats program xlispstat, I would say that 1st is also preferable, although this superiority could disappear as Linux distributions and packages become easier to install.

  • Facilities: 1st indisputably provides an extremely comprehensive range of statistical facilities. One only has to make a cursory examination of the options which can be accessed from the 'Statistics' menu to appreciate that 1st is very powerful. The social scientist will, for example, appreciate the availability of Cohen's kappa, Cramér's phi and the ever-useful chi-square test, whilst the oncologist will marvel at the availability of Kaplan-Meier survival analysis.

    In short, even the most discerning statistician should be satisfied with 1st's facilities.

  • Help: 1st also provides an excellent on-line manual which can either be accessed directly by the user or driven interactively from the main program as the user selects a particular statistical test. Figure six below provides an illustration of the manual in action. The manual is also very comprehensive and has the added advantage of being extensible should the user wish to add more information. I for one can see that this feature might prove invaluable in an educational setting. I do, however, prefer a paper manual, and the makers of 1st have catered for this preference too by supplying a 38-page printed manual. Indeed, the paper manual is a good place for the newcomer to 1st to start, because it contains two easy-to-follow tutorials.

    Figure six: the manual in action

    If all this were not enough, Serious Statistical Software additionally supply sample data sets in 1st and CSV formats as well as some interesting articles on introductory statistics and school performance data.

    To put it succinctly, 1st comes with a quite brilliant set of instructional tools.

  • Service: with regard to customer service, I can unreservedly say that it is excellent. On receipt of the program I was delighted to find a letter which addressed my own statistical needs. Serious also offer an advisory service for users of the software. This is an extremely valuable facility when one considers the expense of hiring a statistician. Finally, Mr Edwards of SSS reports that the program is under continuous development and that some statistical routines have been included in the program as a result of the suggestions of clients.

    Again, hats off to Serious Statistical Software.

  • Sub-sets: of course, a fully-featured stats package may be just a little too daunting for your requirements. SSS seem to have taken this eventuality into account by developing 1st Junior and 1st Elementary - two restricted versions of their main product. The former is designed to meet the needs of A-Level students, whilst the latter is aimed at the pre-A-Level ability range. Both products should be in great demand when one considers that the statistical analysis of data plays an increasingly important part in all school subjects.

  • Pricing: The price of 1st compares very favourably with that of its Windows rivals. A single non-educational copy of 1st costs £189, which is an absolute bargain when one bears in mind that a stand-alone, non-networkable copy of Minitab 12 for Windows currently sells for $975.

    So you make a considerable saving if you stick with RISC OS, and the saving is greater still if you only need one of 1st's sub-sets. A full price list is given at the end of this article.
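Of the measures listed under Facilities above, Cohen's kappa is perhaps the least familiar. It measures how far two raters agree beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected from each rater's marginal totals. The Python sketch below uses that standard textbook formula; it is not taken from 1st, and the function name and the teachers' ratings are invented.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same items.

    kappa = (p_o - p_e) / (1 - p_e): p_o is the observed agreement,
    p_e the agreement expected by chance from each rater's category totals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1:
        # Both raters used one identical category throughout: kappa is undefined
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Invented example: two teachers mark the same ten essays pass or fail
teacher_1 = ["pass"] * 6 + ["fail"] * 4
teacher_2 = ["pass"] * 5 + ["fail"] * 4 + ["pass"]
print(round(cohens_kappa(teacher_1, teacher_2), 3))  # 0.583
```

Here the teachers agree on 8 of the 10 essays (p_o = 0.8), but their marginal totals alone would produce agreement of 0.52, so kappa is a more modest 0.583.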

Analysis

  • Ease of use: in general, Analysis is relatively easy to use, although there are a number of points which the novice user should bear in mind. Where the researcher wishes to compare a piece of text with a dictionary (which may be loaded into the program), it is possible that Analysis will report an 'inadequate space error'. This can be rectified by exiting the Desktop, but this may be a bit too intimidating for the non-technical user. Moreover, the WimpSlot may need to be increased for heavy-duty analysis, and this can similarly seem a little off-putting. That said, Analysis is a very intuitive piece of software, provided that you know what you want to accomplish.

  • Facilities: Analysis provides a very interesting mix of features, although I personally would have liked a qualitative hypothesis-testing module; without one, it cannot compete with programs like NUD.IST and Ethnograph for Windows. Nevertheless, the choice of features is a valuable one and includes a readability index and lemmatisation. The program is also available for Windows and Digital Unix - cross-platform availability is, without doubt, another plus point in the program's favour.

  • Help: Analysis is bundled with a help file in Impression format, which is excellent but assumes a certain level of background knowledge.

  • Pricing: Analysis is freely available for trial purposes from HENSA, and there's a copy in the SOFTWARE directory on this CD-ROM. For further details contact the author.
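The review does not say which readability index Analysis uses. As one widely used example, the Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), where higher scores mean easier text. The Python sketch below substitutes that formula, together with a crude vowel-group syllable counter; both are assumptions for illustration rather than a description of Analysis.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one.

    A silent final 'e' (as in 'make') is dropped, but not '-le' (as in 'table').
    Real readability tools use better rules or dictionaries.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat."), 3))
print(round(flesch_reading_ease("Statistical analysis requires considerable concentration."), 3))
```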

Conclusion

I have no hesitation in recommending 1st or Analysis to the readers of RISC World. 1st is a superb piece of software which provides all the statistical facilities that the professional researcher or student is ever likely to need. If the RISC OS market recovers (and I for one remain optimistic), 1st should meet the statistical needs of many a high school and university student. One has only to think of the popularity of A-Level subjects like Maths with Statistics, Sociology and Psychology to appreciate that there is a potential market ready to be tapped. I am, however, slightly more circumspect in my praise for Analysis. Although it has the makings of an excellent textual analysis package, some important analytic techniques are missing.

But it seems uncharitable to be too critical. After all, both programs are obviously a testament to the commitment of RISC OS programmers to our platform. So if you are thinking of getting into data analysis on RISC OS, check out 1st and Analysis. You won't regret it.

Product details

Product: 1st Elementary, 1st Junior and 1st
Supplier: Serious Statistical Software
Standard pricing:
Product          Single use   Site licence
1st Elementary   £65          £199
1st Junior       £99          £299
1st              £189         £499

Educational use pricing:
Product          Single use   Site licence
1st Elementary   £49          £149
1st Junior       £79          £249
1st              £149         £349
Address: 19 Station Road, Blackwell, Bromsgrove B60 1QB
Tel: 0121-445 6887
E-mail: sss@argonet.co.uk
Web: www.serious-stats.co.uk

Product: Analysis
Supplier: Giovanni Lo Conti
Address: Via G. Bizzozero 7, 00123 Roma, Italy
Tel: Int +39 6 30 31 16 11
E-mail: mc4386@mclink.it
Web: http://micros.hensa.ac.uk/cgi-bin/browser/local/riscos/textprocess/

Paul Webb
