
WINKS Online Manual


Chapter 4 Part 1

Detailed Statistics and Histogram

This option calculates the mean, standard deviation, median, standard error of the mean, minimum, maximum, sum, variance and other descriptive statistics for a single variable (field) from a set of data.

If your data is already in a database, perform the analysis using the following steps. For example, suppose you want to calculate statistics for the TIME1 field in the EXAMPLE database.

Step 1: Choose Open Database from the File menu. (File icon) Select the EXAMPLE database.

Step 2: From the Analyze menu, choose the "Detailed/One Variable" option from the Descriptives sub-menu (or click the X-bar icon).

Step 3: Choose the TIME1 field to analyze. The results appear in the viewer. Example output is shown below:

 
   ---------------------------------------------------------------------------
   Descriptive Statistics                             
   ---------------------------------------------------------------------------
   Variable Name is TIME1

   N         = 50                    Missing or Deleted = 0
   Mean      = 21.268                     St. Dev (n-1) = 1.71696
   Median    = 21.30                        St. Dev (n) = 1.6997
   Minimum   = 17.00                             S.E.M. = 0.24281
   Maximum   = 24.20                          Variance  = 2.94793
   Sum       = 1063.40                       Coef. Var. = 0.08073
   ---------------------------------------------------------------------------
   Percentiles:                                Tukey Five Number Summary:
   0.0%          = 17.00   Minimum             Minimum  = 17.00
   0.5%          = 17.00                       25th     = 20.15
   2.5%          = 17.3025                     Median   = 21.30
   10.0%         = 18.91                       75th     = 22.60
   25.0%         = 20.15   Quartile            Maximum  = 24.20
   50.0%         = 21.30   Median
   75.0%         = 22.60   Quartile
   90.0%         = 23.50   
   97.5%         = 24.1175 
   99.5%         = 24.20                       Test for normality results:
   100.0%        = 24.20   Maximum             D = .093     p >= 0.20
 



   Five number summary consists of the 0, 25, 50, 75 and 100th percentiles.

   Confidence Intervals about the mean:
   ---------------------------------------------------------------------------
   80 % C.I. based on a t(49) critical value of 1.3 is (20.95234, 21.58366)
   90 % C.I. based on a t(49) critical value of 1.68 is (20.86007, 21.67593)
   95 % C.I. based on a t(49) critical value of 2.01 is (20.77994, 21.75606)
   98 % C.I. based on a t(49) critical value of 2.41 is (20.68282, 21.85318)
   99 % C.I. based on a t(49) critical value of 2.68 is (20.61726, 21.91874)

   The normality test suggests that the data are approx. normally distributed.
   The test for normality is a modified Kolmogorov-Smirnov test based on
   papers by Lilliefors and Dallal & Wilkinson. References in latenews.txt.
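Several figures in the output above can be derived from N, the mean, and the n-1 standard deviation. The short Python sketch below is not part of WINKS, and its variable names are illustrative. It recomputes the S.E.M., the n-divisor standard deviation, the coefficient of variation, and the 95% confidence interval, taking the t(49) critical value of 2.01 directly from the output rather than computing it.

```python
import math

# Values copied from the example output for TIME1.
n = 50
mean = 21.268
sd_n1 = 1.71696      # St. Dev (n-1)
t_crit_95 = 2.01     # t(49) critical value printed in the output for 95%

sem = sd_n1 / math.sqrt(n)               # standard error of the mean
sd_n = sd_n1 * math.sqrt((n - 1) / n)    # standard deviation with n as divisor
coef_var = sd_n1 / mean                  # coefficient of variation
ci_low = mean - t_crit_95 * sem          # lower 95% confidence limit
ci_high = mean + t_crit_95 * sem         # upper 95% confidence limit

print(f"S.E.M.      = {sem:.5f}")
print(f"St. Dev (n) = {sd_n:.4f}")
print(f"Coef. Var.  = {coef_var:.5f}")
print(f"95% C.I.    = ({ci_low:.5f}, {ci_high:.5f})")
```

The printed values should agree with the corresponding lines of the output to within rounding.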

When the output screen appears, click the Graph button to display a histogram of the data. An example of this graph is shown in Example 1 in Chapter 1.

Here are some definitions concerning the statistics reported:

C.I. - Confidence interval - A range that describes (with some confidence, usually 95%) where the actual mean of the population from which the data are drawn probably lies. In the example above, the true mean is somewhere between 20.78 and 21.76, with 95% confidence.

MAXIMUM - The largest number.

MEAN - A measure of central tendency; the arithmetic average. For example, the average of the three grades 82, 100 and 88 is (82+100+88)/3 = 90, so the mean is 90.

MEDIAN - A measure of central tendency. The median is a statistic such that 50% of all numbers in the sample are above the median and 50% are below it. For example, in the list 1, 2, 3, 4, 5 the median is 3.

MINIMUM - The smallest number.

MISSING - Reports how many numbers had a missing value code.

N - How many numbers were used to calculate the statistics.

PERCENTILES - A percentile is the value below which the stated percent of the numbers fall. For example, the 50th percentile is the median.

S.E.M. - The Standard Error of the Mean measures the precision of the sample mean as an estimate of the population mean.

ST. DEV. - Standard Deviation - measure of the spread of the data around the mean value. It is calculated two ways, using n-1 as a divisor and using n as a divisor. Usually, most people use the n-1 version.

SUM - The total of all the numbers added together.

TEST FOR NORMALITY - (Professional edition) - This is a test that the data are drawn from a normal distribution. The test statistic is D. If p <= 0.05, there is evidence that the data are NOT from a normal distribution.

TUKEY 5 NUMBER SUMMARY - Essentially, the 0th, 25th, 50th, 75th and 100th percentiles. See the Hoaglin, et al. reference.

VARIANCE - A measure of the spread of the data -- the square of the st. dev.  
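The definitions above can be tried out on the small examples they mention. The following is a minimal sketch using Python's standard `statistics` module. It is an illustration only, not the code WINKS uses, and the variable names are assumptions.

```python
import math
import statistics

# The three grades from the MEAN definition.
grades = [82, 100, 88]
grade_mean = statistics.mean(grades)

# The five numbers from the MEDIAN definition.
values = [1, 2, 3, 4, 5]
med = statistics.median(values)
sd_n1 = statistics.stdev(values)       # standard deviation, divisor n-1
sd_n = statistics.pstdev(values)       # standard deviation, divisor n
var_n1 = statistics.variance(values)   # variance: the square of sd_n1
sem = sd_n1 / math.sqrt(len(values))   # standard error of the mean

print("mean of grades:", grade_mean)
print("median:", med)
print("st. dev (n-1):", round(sd_n1, 4), " st. dev (n):", round(sd_n, 4))
print("variance:", var_n1, " S.E.M.:", round(sem, 4))
```

Note that the n-1 standard deviation is always slightly larger than the n version, as in the TIME1 output.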

Normal Probability Plot

To display a Normal Probability plot, first display the Graph from the Detailed Statistics screen, then click the "Distribution" button (second from left, next to the Options button). This button causes the display to cycle through several views of the data, including a probability plot.


Summary Statistics on a Number of Variables

This option allows you to calculate statistics on several variables (sample size, mean, standard deviation, minimum, maximum, and standard error of the mean). If you have a grouping variable in your database, you may request output of summary statistics by group.

Suppose you want to know the means of all the quantitative variables (AGE, TIME1, TIME2, TIME3, TIME4, STATUS) within each of the three groups (A, B, C) in the EXAMPLE database. Follow these steps:

Step 1: Choose Open Database from the FILE menu. (file icon) Select the EXAMPLE database.

Step 2: Choose the Descriptives option from the ANALYZE menu.

Step 3: Choose the "Summary/Several Variables" option from the Descriptives menu.

Step 4: Choose the data fields to analyze: Select the fields AGE, TIME1, TIME2, TIME3, TIME4.

Step 5: Select STATUS as the Group field.

The results viewer will appear displaying summary statistics by STATUS on the group of variables you have selected.
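The idea of summary statistics by group can be sketched in a few lines of Python. The records below are hypothetical and are not taken from the EXAMPLE database. The function name `summarize_by_group` is also an assumption made for illustration, not a WINKS feature.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (group, value) records; NOT the contents of the EXAMPLE database.
records = [
    ("A", 20.0), ("A", 22.0), ("A", 24.0),
    ("B", 18.0), ("B", 19.0),
    ("C", 21.0), ("C", 23.0),
]

def summarize_by_group(rows):
    """Return n, mean, sd (n-1), min, max and S.E.M. for each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    summary = {}
    for group, vals in sorted(groups.items()):
        n = len(vals)
        # The n-1 standard deviation is undefined for a single value.
        sd = statistics.stdev(vals) if n > 1 else float("nan")
        summary[group] = {
            "n": n,
            "mean": statistics.mean(vals),
            "sd": sd,
            "min": min(vals),
            "max": max(vals),
            "sem": sd / math.sqrt(n),
        }
    return summary

summary = summarize_by_group(records)
for group, stats in summary.items():
    print(group, stats)
```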

If you do not already have your data in a file, choose New Database from the FILE menu. Then, select the database type you want to create, enter and save the data, and perform the analysis as in the above example. 

Detailed Statistics from Data Entered by Counts

If your data is grouped so that you know how many of each number you have (for example, you have 12 people 13 years old, 5 people 14 years old, 6 people 15 years old, etc.), you can enter the data by counts.

If your data is already in a database, perform the analysis using the following steps. For example, suppose you want to calculate statistics for the Value field in the COUNTS database.

Step 1: Choose Open Database from the FILE menu. Select the COUNTS database.

Step 2: Choose the Descriptive Statistics option from the Analyze menu.

Step 3: Choose the "Detailed Statistics by Counts" option from the Descriptive Statistics sub-menu.

Step 4: Choose Value as the data field, and Counts as the count field.

If you do not already have your data in a file, choose New Database from the FILE menu. Then, select the database type you want to create (usually "Statistics From Count Data"), enter and save the data, and perform the analysis as in the above example.

The same output as for the Detailed Statistics option is displayed. When the output screen appears, click the Graph button to display a histogram of the data.
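Statistics from count data are the same as statistics on the expanded list of individual values. The sketch below shows this in Python, using only the three age/count pairs mentioned at the start of this section. The real COUNTS database may contain different data, and the variable names are illustrative.

```python
import statistics

# (value, count) pairs from the text: 12 people aged 13, 5 aged 14, 6 aged 15.
counts = [(13, 12), (14, 5), (15, 6)]

# Expand the counts into one observation per person.
expanded = [value for value, count in counts for _ in range(count)]

n = len(expanded)
mean = statistics.mean(expanded)
median = statistics.median(expanded)
sd_n1 = statistics.stdev(expanded)   # divisor n-1

# The mean can also be computed directly as a weighted average.
weighted_mean = sum(v * c for v, c in counts) / sum(c for _, c in counts)

print("N =", n, " Mean =", round(mean, 4), " Median =", median)
print("St. Dev (n-1) =", round(sd_n1, 4))
```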

Stem and Leaf Display

The Stem and Leaf Display is a graph created from a series of numbers. The stem part of the display is the leading digit for the data (such as the 5 in 54) and the leaf is the trailing digit (such as the 4 in 54). When larger numbers are used, the rightmost digits are often ignored. For example, if the numbers range from 241 to 845, the stems might be 2 to 8, representing 200 to 800, and the leaves would be 0 to 9, representing the 10's. The 1's place would be ignored. WINKS gives you options for choosing the magnitude of the stem and leaf values. The example below shows a display for DEFLATOR in the LONGLEY database:

  4     8|3889
  7     9|689
 (4)   10|0148
  5    11|02456

Numbers to the left of the vertical bar "|" are stem values. Each digit to the right of the | represents a leaf. The 3 on the first line represents 83. The scale to the left of the stem reports a cumulative count until the stem containing the median is found. The (4) marks the stem that contains the median and reports how many values are on that stem. In this display, we know the median is between 100 and 108. Following the median, the cumulative values count the number of values from the bottom of the display up to the median.

For example:

Step 1: Choose Open Database from the FILE menu. (file icon) Select the LONGLEY database.

Step 2: Choose the Descriptives option from the ANALYZE menu.

Step 3: Choose the "Stem and Leaf Display" option from the Descriptives menu.

Step 4: Choose DEFLATOR as the data field to analyze.

Step 5: A dialog box will appear. Select 1 for the "letter or digit" and leave the "Split stem value in half" option blank (unchecked).

A Stem and Leaf display as shown above will be displayed.
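The construction of the stems and leaves can be sketched in Python. The values below are read back from the DEFLATOR display in this section (stem times 10 plus leaf), so any digits WINKS dropped from the original data are not included. The function `stem_and_leaf` and its `leaf_unit` parameter are assumptions made for illustration. The sketch assumes non-negative data and omits the cumulative count column.

```python
from collections import defaultdict

# Values reconstructed from the display (e.g. stem 8, leaf 3 -> 83).
values = [83, 88, 88, 89, 96, 98, 99, 100,
          101, 104, 108, 110, 112, 114, 115, 116]

def stem_and_leaf(data, leaf_unit=1):
    """Return display lines 'stem|leaves'; digits below leaf_unit are ignored."""
    stems = defaultdict(list)
    for x in sorted(data):
        scaled = int(x // leaf_unit)             # drop digits right of the leaf
        stems[scaled // 10].append(scaled % 10)  # stem, then leaf digit
    return [
        f"{stem}|{''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

for line in stem_and_leaf(values):
    print(line)
```

With `leaf_unit=10`, a value such as 241 falls on stem 2 with leaf 4 and the 1's place is ignored, as described at the start of this section.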

Approximate p-value Determination

This option calculates p-values for four test statistics: normal (z), Student's t, F, and chi-square. Select the statistic, enter its calculated value and (where applicable) its degrees of freedom, and the program will tell you the p-value associated with that statistic. The p-value associated with a calculated test statistic is the probability of obtaining a test statistic as extreme as or more extreme than the calculated value if the null hypothesis is true.

The p-values calculated for the F and chi-square statistics are one-tailed because these tests are one-tailed when used in ANOVA or crosstabulation procedures. The p-values calculated for the z and t statistics are two-tailed. If you want the p-value for a one-tailed z or t-test, you must divide the two-tailed p-value in half. To calculate a p-value, follow these steps:

Step 1: From the Analyze pull-down menu, choose the Descriptive Statistics option.

Step 2: Select Approximate p-value determination.

Step 3: A dialog box will appear. Select what kind of value you want to calculate (t, z, Chi-Square or F). For example, if you want to find the p-value for a t-statistic with 20 degrees of freedom that equals 2.0, select the t-statistic option.

Step 4: For the Statistic Value, enter 2 (press tab), then enter 20 as the degrees of freedom.

Step 5: Select Calculate, and the p-value will appear.

 Step 6: Select Exit to end this procedure.  
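For readers who want to check a two-tailed p-value independently, the following Python sketch uses only the standard library. It is not the algorithm WINKS uses. The function names are assumptions, the t tail area is obtained by numerically integrating the t density with Simpson's rule, and `math.gamma` limits it to moderate degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_t(t, df, steps=2000):
    """Two-tailed p-value for a t statistic (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)

def two_tailed_z(z):
    """Two-tailed p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

p_t = two_tailed_t(2.0, 20)   # the t = 2.0, 20 degrees of freedom example
p_z = two_tailed_z(1.96)
print(f"t = 2.0, df = 20: two-tailed p = {p_t:.4f}, one-tailed p = {p_t / 2:.4f}")
print(f"z = 1.96:         two-tailed p = {p_z:.4f}")
```

For the t = 2.0, 20 degrees of freedom example, the two-tailed p-value comes out a little under 0.06, and halving it gives the one-tailed value.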

 

Probability Calculator 

This option allows you to create tables of critical values at specific alpha (significance) levels for the z, t, F and Chi-Square distributions.

For example, select the probability calculator from the Analyze/Descriptives menu. Select the "Table" button. This will display the Critical Value calculator. Choose the Student-t option. Notice the significance level text box. The default values are 0.1 and 0.05; however, you can enter any levels here you want to calculate.

You can also enter which degrees of freedom to use in the table. Click on Start and the table specified by the significance levels and degrees of freedom is calculated and displayed. To save a table, first click the "Capture" button, then "View." A table like the following (shown here for significance levels of 0.01 and 0.05) will be displayed:

Critical values for Students t-test (one-tail).

DF    .010     .050
----  -----    ----
1     31.83    6.31
2     6.96     2.92
3     4.54     2.35
4     3.75     2.13
5     3.37     2.01
10    2.76     1.81
15    2.60     1.75

This table will match closely the critical values table in the back of a statistics text. Thus, you can now create your own tables tailored for your own use.
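A table of one-tail critical values can also be approximated outside WINKS. The Python sketch below is an illustration under stated assumptions, not the method WINKS uses. It finds the t value whose upper-tail area equals alpha by bisection, with the tail area computed by Simpson's rule. The function names are hypothetical, and alpha is assumed to lie strictly between 0 and 0.5.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    steps = 2 * max(100, int(20 * t))   # even count; finer grid for larger t
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """One-tail critical value: the t with P(T > t) = alpha (0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > alpha:   # widen until the root is bracketed
        lo, hi = hi, hi * 2
    for _ in range(60):                 # bisection
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print("DF    .010     .050")
for df in (1, 2, 3, 4, 5, 10, 15):
    print(f"{df:<5d} {t_critical(0.01, df):<8.2f} {t_critical(0.05, df):.2f}")
```

The results should agree closely with the table above; small differences in the last digit are possible because both are approximations.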

Continue to Chapter 4. Part 2. (Graphs and Charts.)  

     

