Descriptive Statistics


Descriptive and summary statistics

This tool reports a range of measures of central tendency and dispersion for each column of sample data. Reviewing these summaries is a useful first step before hypothesis testing, because the overall character of the sample data can guide the choice between parametric and non-parametric tests. Further details are given in the linked section on statistical analysis.

Script operation

This tool operates in the same way as most of the others and needs no departure from the usual methods.

See the information on general script usage for details.

Raw sample data may be entered as one or more samples, arranged in columns. Typical output for three example columns is shown below. The column titles 'Group A', 'Group B' and 'Group C' were included in the input data range, so they are carried through as the column headings of the output.

 
Raw data:

 Group A   Group B   Group C
      22       6.2        18
      15       5.0        16
       9       8.9        31
       7       7.8         8
       4       6.4         2
      45      11.4        36
      19       5.8        12
      26       6.2        16
      35       7.1        47
      49      10.4        22
               9.7        37
               8.7        52
               6.4
               7.7
               7.8
               7.4


Spreadsheet output:

 Statistics                      Group A            Group B   Group C
 Count:                               10                 16        12
 Sum:                                231              122.9       297
 Mean(Arith.):                      23.1             7.6813     24.75
 Mean(Geo.):                     17.7873             7.4992   18.8119
 Mean(Harm.):                    12.8471             7.3259   11.3004
 Mean(Quad.):                    27.4645             7.8688   28.9698
 Mode:                              None  Multi_(See_below)        16
 First Quartile:                       9                6.4        16
 Median:                            20.5               7.55        20
 Third Quartile:                      35                8.9        37
 Range:                               45                6.4        50
 Maximum:                             49               11.4        52
 Minimum:                              4                  5         2
 Std. Error:                      4.9519             0.4409    4.5396
 Std. Deviation:                 15.6592             1.7638   15.7256
 Mean Deviation:                   12.52             1.3688   13.2083
 Variance:                      245.2111              3.111  247.2955
 Skewness:                        0.4588             0.5742     0.355
 Kurtosis:                       -1.0607            -0.4642    -1.016
 Confidence Level (95%)-low:     13.3943              6.817   15.8524
 Confidence Level (95%)-high:    32.8057             8.5455   33.6476
 Confidence Level (99%)-low:     10.3242             6.5436   13.0378
 Confidence Level (99%)-high:    35.8758             8.8189   36.4622

                                                Mode  Count
                                                 6.2      2
                                                 6.4      2
                                                 7.8      2
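
As a cross-check on the figures above, the following Python sketch recomputes several of the tabulated statistics from the raw data. It is not the tool's own script: the function names `summarise` and `modes` are hypothetical, and the multipliers 1.96 and 2.58 are an assumption, chosen because mean ± multiplier × standard error reproduces the tabulated 95% and 99% limits for this example.

```python
import math
import statistics
from collections import Counter

# Raw data copied from the example columns above.
group_a = [22, 15, 9, 7, 4, 45, 19, 26, 35, 49]
group_b = [6.2, 5.0, 8.9, 7.8, 6.4, 11.4, 5.8, 6.2,
           7.1, 10.4, 9.7, 8.7, 6.4, 7.7, 7.8, 7.4]
group_c = [18, 16, 31, 8, 2, 36, 12, 16, 47, 22, 37, 52]


def summarise(sample, z95=1.96, z99=2.58):
    """Return a dict of summary statistics for one column of data.

    z95 and z99 are assumed multipliers (see the lead-in); they are
    not documented settings of the tool.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)        # sample SD (n - 1 divisor)
    se = sd / math.sqrt(n)               # standard error of the mean
    return {
        "count": n,
        "sum": sum(sample),
        "mean_arith": mean,
        "mean_geo": statistics.geometric_mean(sample),
        "mean_harm": statistics.harmonic_mean(sample),
        "mean_quad": math.sqrt(sum(x * x for x in sample) / n),
        "range": max(sample) - min(sample),
        "std_error": se,
        "std_dev": sd,
        "mean_dev": sum(abs(x - mean) for x in sample) / n,
        "variance": statistics.variance(sample),  # n - 1 divisor
        "ci95": (mean - z95 * se, mean + z95 * se),
        "ci99": (mean - z99 * se, mean + z99 * se),
    }


def modes(sample):
    """Return {value: count} for the most frequent values.

    An empty dict means no value occurs more than once.
    """
    counts = Counter(sample)
    top = max(counts.values())
    if top == 1:
        return {}
    return {value: c for value, c in counts.items() if c == top}


if __name__ == "__main__":
    for name, data in [("Group A", group_a), ("Group B", group_b),
                       ("Group C", group_c)]:
        stats = summarise(data)
        print(name, {k: v for k, v in stats.items() if k != "ci99"})
        print(name, "modes:", modes(data))
```

Calling `summarise(group_a)` should give values that round to the Group A column, and `modes(group_b)` should list 6.2, 6.4 and 7.8, each with a count of 2.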

Interpretation

There is much useful information contained within the output and the following points are particularly worth noting:

  1. Several mean values are calculated for each sample. Of these, the arithmetic mean is the one most widely used. The others have specialised applications (e.g., the geometric mean is used for averaging ratios and percentage changes, and is inappropriate if any data value is zero or negative).
  2. The median is the middle value of the ordered sample and divides it into two equal halves. Together with the median, the lower (1st) and upper (3rd) quartiles divide the ordered sample into four equal parts.
  3. In this example the variance of Group B (3.111) is far smaller than the variances of Group A (245.2111) and Group C (247.2955), and the samples also differ in modality: Group A has no mode, Group B has three and Group C has one. At first glance, larger samples would appear to be needed. If that is not possible, or if the samples show the same characteristics after further sampling, the data may be assumed to violate the assumptions required by parametric tests.

    In that case a non-parametric testing approach is preferable.

    Related tools:
    Goodness of fit (χ²) for normality: Test for normality using the chi-square goodness-of-fit test.
    F-Ratio test: Variance ratio test.

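  4. Skewness (a measure of the asymmetry of a data distribution) and kurtosis (a measure of the shape of its peak and tails) are also calculated. Neither statistic has units of measurement, but its relationship to zero is important. A skewness near zero indicates a roughly symmetric distribution, a positive value a longer right tail and a negative value a longer left tail. The kurtosis values in the example are consistent with excess kurtosis, for which zero corresponds to a normal distribution, positive values to heavier tails and negative values to lighter tails.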

    Related tools:
    Goodness of fit (χ²) for normality: Test for normality using the chi-square goodness-of-fit test.
    Frequency histograms: Analyse data frequency distributions.
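
The quartile, variance and shape points above can be explored with a short Python sketch. The function names are hypothetical and the formulas are assumptions inferred from the example output, not taken from the tool's script. The quartile position rule was chosen because it reproduces the tabulated quartiles for all three groups, and the moment-based skewness and kurtosis (population moments, with 3 subtracted from the kurtosis) because they reproduce the tabulated Group A values. Other common definitions give slightly different results.

```python
import statistics

# Raw data copied from the example columns above.
group_a = [22, 15, 9, 7, 4, 45, 19, 26, 35, 49]
group_b = [6.2, 5.0, 8.9, 7.8, 6.4, 11.4, 5.8, 6.2,
           7.1, 10.4, 9.7, 8.7, 6.4, 7.7, 7.8, 7.4]
group_c = [18, 16, 31, 8, 2, 36, 12, 16, 47, 22, 37, 52]


def quartiles(sample):
    """Return (Q1, median, Q3) using an assumed position rule.

    Q1 and Q3 are read directly from the ordered sample at the
    0-based indices n // 4 and 3 * n // 4 (no interpolation).
    """
    ordered = sorted(sample)
    n = len(ordered)
    q1 = ordered[n // 4]
    q3 = ordered[(3 * n) // 4]
    return q1, statistics.median(ordered), q3


def central_moment(sample, k):
    """k-th central moment with divisor n (population moment)."""
    mean = statistics.fmean(sample)
    return sum((x - mean) ** k for x in sample) / len(sample)


def skewness(sample):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    return central_moment(sample, 3) / central_moment(sample, 2) ** 1.5


def excess_kurtosis(sample):
    """Excess kurtosis: m4 / m2 ** 2 - 3 (zero for a normal curve)."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2 - 3.0


def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one.

    This is only the statistic an F-ratio test is based on;
    no p-value is computed here.
    """
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)


if __name__ == "__main__":
    print("Quartiles A:", quartiles(group_a))
    print("Skewness A:", round(skewness(group_a), 4))
    print("Kurtosis A:", round(excess_kurtosis(group_a), 4))
    print("Variance ratio A/B:", round(variance_ratio(group_a, group_b), 2))
```

A variance ratio far above 1, as for Group A against Group B here, is the kind of result that would prompt the F-ratio test listed under the related tools.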


