Friday, January 7, 2011

Setting Up The Excel Sheet For Calculating P Values

Enter data into the cells. The following are some recommended layouts if functions are used. If the Tool-Pac routines are used, you only need to be able to identify the ranges of the existing data.
ONE SAMPLE TESTS
MEAN
Name
μ
Mean
St. Dev
n
df
t
Probability
(1)
(2)
(3)
(4)
(5)
(6)
(7)
FOR A HYPOTHESIS ON ONE POPULATION MEAN (KNOWN), NORMALLY DISTRIBUTED, ONE TAIL TEST
POPULATION σ KNOWN
(1) From Claim
(2) Function AVERAGE of data
(3) From Claim
(4) Function COUNT on data
(5) Cell(4) - 1
(7) Function NORMDIST( (2), (1), (3), TRUE )
z = (⎯X − μ) / ( σ / √ n)
σ NOT KNOWN, LARGE SAMPLE
(1) From Claim
(2) Function AVERAGE on data
(3) From STDEV on data
(4) Function COUNT on data
(5) Cell(4) - 1
(7) Function NORMDIST( (2), (1), (3), TRUE )
z = (⎯X − μ) / ( s / √ n)
σ NOT KNOWN, SMALL SAMPLE
(1) From Claim
(2) Function AVERAGE on Data
(3) From STDEV on data
(4) Function COUNT on data
(5) = (4) – 1
(6) = ( ( (1) – (2) ) ) / ( (4) / SQR (3) )
(7) Function TDIST( (6), (5), 1 or 2 from Claim )
t = (⎯X − μ) / ( s / √ n)
FOR A HYPOTHESIS ON ONE POPULATION MEAN OR MEDIAN (KNOWN), NORMALLY DISTRIBUTED, TWO TAIL TEST
POPULATION σ KNOWN
Name
μ
Sigma
Probability
(1)
(2)
(3)
(1) From Claim
(2) From Claim
(3) Function ZTEST( range of data, (1), (2) )
POPULATION σ UNKNOWN, LARGE SAMPLE
Name
μ
Probability
(1)
(2)
(1) From Claim
(2) Function ZTEST( range of data, (1) ..leave blank.. )
MEDIAN
Name
Population Median
# of + signs
# of – signs
n
Smaller of a or b
z Value
Probability
(1)
(2)
(4)
(3)
(5)
(6)
(7)
FOR A HYPOTHESIS ON ONE POPULATION MEDIAN VALUE, NONPARAMETRIC, LARGE SAMPLES (N > 25)
(1) Function MEDIAN on data
(2) Manually count the number of data values greater than or equal to (1)
(3) Function COUNT on data
(4) = (3) – (2)
(5) Smaller of (2) or (4)
(6) = ( (5) + 0.5 – (3) / 2) / ( SQR(3) / 2 )
(7) Function NORMSDIST( (6) )
z = [ x + 0.5 – n / 2 ] / [ √ n / 2 ]
FOR A HYPOTHESIS ON PROPORTION
Name
p
q
n
np
nq
Number of successes
p-hat
z
Probability
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
FOR A HYPOTHESIS ON ONE POPULATION PROPORTION, LARGE SAMPLES WHERE NP>5 AND NQ>5
(1) From Claim
(2) = 1 – (1)
(3) From Claim
(4) = (1) * (3)
(5) = (2) * (3)
(6) From Claim
(7) = (6) / (3)
(8) = ( (7) – (1) ) / SQR( (1) * (2) / (3) )
(9) Function NORMSDIST( (8) )
z = (p-hat − p) / √ (p × q / n)
FOR A HYPOTHESIS ON VARIANCE
Name
Population σ2
Variance
n
df
Chi
Probability
(1)
(2)
(3)
(4)
(5)
(6)
FOR A HYPOTHESIS ON ONE POPULATION VARIANCE
(1) From Claim
(2) From VAR on data
(3) Function COUNT on data
(4) = (3) - 1
(5) = (4) * (2) / (1)
(6) Function CHIDIST( (5), (4) )
χ2 = ( n − 1 ) × s2 / σ2
FOR A HYPOTHESIS ON CORRRELATION COEFFICIENT
Name
Correlation Coefficient r
n
df
t
Probability
(1)
(2)
(3)
(4)
(5)
(1) Function CORREL(Range of x data, Range of y data) or from Claim
(2) Function COUNT(Range of x) or from Claim
(3) = (2) - 2
(4) = (1) / SQR( (1 – (1) * (1) ) / (3) )
(5) Function TDIST( (4), (3), 1 or 2 from Claim )
t = r / √ ( (1 – r2 ) / (n – 2) )
TWO SAMPLE TESTS
FOR A HYPOTHESIS ON TWO POPULATIONS
MEANS, TWO INDEPENDENT SAMPLES
(1) Range of data set 1
(2) Range of data set 2
(3) Tails, either 1 or 2
1 is for a one tailed test
2 is for a two tailed test
(4) Characteristics of the two data sets
1 is for paired data values, having equal numbers of values (no missing values)
2 is for the equal variance (homoscedastic) characteristic
3 is for the unequal variance (heteroscedastic) characteristic
(5) Probability = TTEST( (1), (2), (3), (4) )
(6)
DIFFERENCES, PAIRED DEPENDENT SAMPLES
t = (⎯d − μd ) / ( sd / √ n)
Equal Variance
t = {(⎯X1 −⎯X2 ) −(μ1 − μ2)} / sm
sm = sp × √ ( 1/ n1 + 1/ n2 )
sp = √ (pooled variance)
(pooled variance) = { (df × s2 )1 + (df × s2)2 } / dfTotal
Unequal Variance
t = {(⎯X1 −⎯X2 ) −(μ1 − μ2)} / sm
sm = √ { (s2 / n)1 + (s2 / n)2 }
PROPORTIONS, RANDOM INDEPENDENT LARGE SAMPLES
(n1p>5, n1q>5, n2p>5 and n2q>5)
Subscript
Name
p
q
n
np
nq
Number of successes
p-hat
SD
z
Probability
1
(1)
(2)
(3)
(4)
(5)
(6)
(7)
2
(8)
(9)
(10)
(11)
(12)
(13)
(14)
Combined
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(1) From Claim
(2) = 1 – (1)
(3) From Claim
(4) = (1) * (3)
(5) = (2) * (3)
(7) From Claim
(8) = (6) / (3)
(8) From Claim
(9) = 1 – (1)
(10) From Claim
(11) = (1) * (3)
(12) = (2) * (3)
(13) From Claim
(14) = (6) / (3)
(17) = (3) + (10)
(18) = (6) + (13)
(15) = (18) / (17)
(16) = 1 – (15)
(19) = SQR( (15) * (16) * ( 1 / (3) + 1 / (10) ) )
(20) = ( (7) – (14) – (1) + (8) ) / (19)
(21) Function NORMSDIST ( (20) )
z = {(p-hat1 − p-hat2 ) – (p1 – p2 )} / σm
p = (x1 + x2) / (n1 + n2)
q = 1 – p
σm = √ { (p × q / n1 ) + (p × q / n2 }
VARIANCES, TWO INDEPENDENT SAMPLES For a hypothesis on two population variances
(1) Range of data set 1
(2) Range of data set 2
(3) Probability = FTEST( (1), (2) )
F = s21 / s22
The FTEST function can be used here, but it gives incorrect p values.
DATA ANALYSIS ROUTINE OUTPUTS
The data analysis routines output the following table:
df
Degrees of freedom
t Stat
Calculated t value
P(T<=t) one-tail P1 t Critical one-tail P(T<=t) two-tail P2 t Critical two-tail You have to translate the P1 and P2 values to determine the probability of the given hypothesis being true. The output table is confusing here. Hypothesis Symbol Actual Data Values B>A
Actual Values B=A
Actual Data Values BA
1 – p1
p1
p1
2
B>=A
1 – p1
p1
p1
3
B α: Fail to reject H0
ii. If Probability ≤ α: Reject H0
d. If the test is a true hypothesis test, the calculated p value is NOT reported, only the decision is reported.
e. Decision: Reject H0:
i. H0 is the claim: There is enough evidence at … to reject the claim of …
ii. Ha is the claim: There is enough evidence at … to support the claim of ...
f. Decision: Fail to Reject H0:
i. H0 is the claim: There is not enough evidence at … to reject the claim of …
ii. Ha is the claim: There is not enough evidence at … to support the claim of …
g. Changing a hypothesis “in mid stream” presents problems in assigning an appropriate p value under the Neyman-Pearson concept. This requires now an inductive approach to the problem, evaluating multiple hypotheses. This is beyond Excel. Goodman (1999) argues then that a Bayesian approach be taken, in which the likelihood ratios of each hypothesis on the data be calculated and combined with prior probability structures for obtaining the correct probability values. Excel does not have the capability to obtain likelihood ratios from a set of data.

Author : David Heiser [dheiser594@gmail.com]

Followers

Recent Comments

  © Blogger template 'Perfection' by Ourblogtemplates.com 2008

Back to TOP