The number that occurs most often in a set of data; not affected by extreme values
A measure of central tendency for one variable indicating the point or score at which half the cases are higher and half are lower.
Hypothesis that predicts NO relationship between variables. The aim of research is to reject this hypothesis
type 1 error
False Positive; (alpha) Rejecting the Null Hypothesis when it is true
type 2 error
Shows there is not an effect or difference when one exists: Stating that the null hypothesis is true, when it is in fact false
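The two error types on the cards above can be laid out as a small decision table. This is a minimal sketch only; the function name `classify_decision` and its arguments are hypothetical, and in real research the truth about the null hypothesis is not known.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label the outcome of a test, given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        # Rejecting a true null hypothesis: false positive.
        return "type 1 error"
    if not null_is_true and not null_rejected:
        # Failing to reject a false null hypothesis: false negative.
        return "type 2 error"
    return "correct decision"


print(classify_decision(null_is_true=True, null_rejected=True))
print(classify_decision(null_is_true=False, null_rejected=False))
```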
Categorical variables (qualitative)
Nominal (blood type A, B, AB, O), Ordinal (GCS 3-15)
Numerical variables (Metric, Quantitative) Discrete (counts – frequency of taking medicine), Continuous (measures – weight, height)
Ordinal variables NO UNITS – order (or rank) data in terms of degree. They do not establish the numeric difference between data points. They indicate only that one data point is ranked higher or lower than another
NO UNITS – Set of categories that have different names (“having to do with names”). Measurements on a nominal scale label and categorize observations but do not make quantitative distinctions between them. Examples include race, gender, etc. Always discrete.
COUNTED UNITS – can be placed in separate “bins” (ex: # of tv’s in a household, years spent in college, # of three-shots made)
MEASURED UNITS – can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
Characteristics of nominal variables
no unit of measurement, ARBITRARY ORDERING OF
blood type, often with frequencies – number of type A, AB
Characteristics of ordinal variables
e.g., allocating 90 patients to the rows of a table based on their Glasgow Coma Scale score. The DATA have NO UNITS OF MEASUREMENT
Other main points about ordinal categorical variables GCS –
The difference between any pair of adjacent scores is NOT NECESSARILY the same as the difference between any other pair of adjacent scores
Example of continuous metric variables
weight in kg of 6 individuals
Example of discrete metric variables
The number of times that a group of 6 children with asthma used their inhalers in the past 24 hours
Nominal variables have no units of measurement, e.g., kg, cm, inches etc
Ordering of nominal variables…
There IS NO ORDERING, as it would be meaningless, e.g., HAIR COLOR
Ordinal variables…main point is
They are in order. They are usually reported as FREQUENCIES and have NO UNITS OF MEASUREMENT, so they cannot be placed on a number line.
Unlike Nominal variables, the ordering is
NOT random, now we can order the categories in a MEANINGFUL way – severity of brain injury GCS
Metric continuous variables have the same
INTERVALS, e.g., blood pressure, birth weight, transaminase levels, BMI, peak flow, etc., with UNITS OF MEASUREMENT
eg of ordinal variables
degrees of pain, 1-10, socio economic status
is disease stage on the ordinal scale? Developmental stage? Yes, Yes
Are blood groups ordinal?
no, it is NOMINAL
DISCRETE DATA def.
Data that can only take certain values: always whole NUMBERS, not 2.1.
For example: the number of students in a class (you can’t have half a student).
(Opposite of Continuous Data).
Counted units for
Ordinal – commonly with frequencies
Number of children
Age at birth of first child
age at menarche
age at menopause
lifetime use of oral contraceptives
No of years taking oral contraceptives
Metric continuous…b/c decimals possible
No of months breastfeeding
Lifetime use of hormone replacement therapy
Mean years of hormone replacement therapy
metric discrete b/c no decimals
Family history of ovarian cancer
metric discrete b/c no decimals
History of benign breast disease
metric discrete b/c no decimals
Family history of breast cancer
metric discrete b/c no decimals
units of alcohol/week (%)
Ordinal for the ordering, but metric discrete for the units of alcohol
No of cigarettes/day
metric discrete b/c it is a frequency
Body mass index (kg/m²)
Simple bar chart is appropriate when
only one variable is to be shown, eg, hair color of children receiving Malathion in the nit lotion study – equal width and spaces between bars
Cholinergic agonist, indirect acting
clustered bar chart
present more than one group, good for comparing relative sizes of the groups WITHIN each category. EG, sex of children receiving Malathion, according to hair color.
stacked bar chart
good to compare total number of subjects in one group, not good for category sizes between groups
What chart is used for continuous metric data?
The histogram (but must first group the values)
Frequency histogram points –
frequency plotted on the vertical axis, and group size on the horizontal axis, NO GAPS between the bars – continuous nature of underlying variable
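Grouping the values before drawing a histogram can be sketched as follows. The helper `group_into_classes`, the class width of 500 g, and the birth weights are all made up for illustration; classes with no observations are simply not listed.

```python
from collections import Counter


def group_into_classes(values, start, width):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Lower bound of the class interval that contains v.
        lower = start + width * int((v - start) // width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical birth weights in grams.
birth_weights = [2450, 2800, 3100, 3150, 3400, 3420, 3600, 3900, 4100]
freq = group_into_classes(birth_weights, start=2000, width=500)

# Crude text histogram: one '#' per observation, classes in order with no gaps.
for lower, count in freq.items():
    print(f"{lower}-{lower + 499} g: {'#' * count}")
```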
Limitation of histograms
can present only ONE variable at a time
A display that shows the distribution of values in a data set separated into four equal-sized groups. A box plot is constructed from the five-number summary of the data (minimum, lower quartile, median, upper quartile, maximum) and displays the data set along a number line using the median and quartiles.
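A box plot is drawn from five summary values, which can be computed as in the sketch below. The function name `five_number_summary` is hypothetical, and the "inclusive" quartile method is an assumption: textbooks and software use several slightly different quartile conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```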
The only measure of location that could be used for categorical data
Divide the distribution into 50% below and 50% above it
For an odd number of observations, how to calculate median?
Position k = (n+1)/2; the median is the value of the kth observation in the ordered data. For a sample size of 5, k = (5+1)/2 = 3, so the median is the value of the 3rd observation.
for even number of observations, how to calculate median?
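It is the average of the middle 2 scores, i.e., the midpoint of the values at positions n/2 and (n/2)+1 in the ordered data. For a sample size of 6, the median is the average of the values at positions 6/2 = 3 and (6/2)+1 = 4, i.e., the 3rd and 4th observations.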
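The two position rules for the median (odd and even sample sizes) can be sketched in a few lines. The function name `median_by_position` and the example data are made up for illustration.

```python
def median_by_position(values):
    """Median found by position: (n+1)/2 for odd n, mean of n/2 and n/2 + 1 for even n."""
    data = sorted(values)  # positions only make sense in ordered data
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        k = (n + 1) // 2          # 1-based position of the middle observation
        return data[k - 1]
    lower = data[n // 2 - 1]      # value at 1-based position n/2
    upper = data[n // 2]          # value at 1-based position n/2 + 1
    return (lower + upper) / 2


print(median_by_position([7, 1, 5, 3, 9]))       # n = 5, uses the 3rd observation
print(median_by_position([1, 2, 3, 4, 5, 6]))    # n = 6, averages the 3rd and 4th
```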
is the median affected by extreme values or outliers?
what is the median best for?
skewed distributions, but it is not as RELIABLE as the mean, b/c it does not take all the values into account
Changing the value of a single score may not affect the mode or median, but it will affect the ____ mean
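A quick check of this card, using made-up scores: one extreme value replaces the highest score, and the three measures are compared before and after. The helper name `summarize` is hypothetical.

```python
import statistics


def summarize(scores):
    """Return the mean, median and mode of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
    }


before = summarize([1, 2, 2, 3, 4])
after = summarize([1, 2, 2, 3, 40])   # only the last score was changed

print(before)
print(after)
```

Only the mean moves here; the median and mode are unchanged.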
The mean is not suitable for___
most preferred measure of central tendency as DESCRIPTION OF DATA and ESTIMATE OF THE PARAMETER
In a symmetrical distribution
the mean = median = mode – no skewness
NEGATIVELY skewed distribution
MEAN < MEDIAN
mean < median < mode
POSITIVELY skewed distribution
MEAN > MEDIAN
Mode < Median < Mean
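The ordering of mean, median and mode under skew can be checked on small made-up data sets. This is an illustration under chosen data, not a proof; the orderings on the cards are rules of thumb for typical unimodal distributions.

```python
import statistics


def center_measures(data):
    """Return (mode, median, mean) for a data set."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)


# Hypothetical positively skewed data: long tail to the right.
pos_mode, pos_median, pos_mean = center_measures([1, 2, 2, 3, 4, 5, 20])

# Hypothetical negatively skewed data: long tail to the left.
neg_mode, neg_median, neg_mean = center_measures([1, 16, 17, 18, 19, 19, 20])

print(pos_mode < pos_median < pos_mean)   # Mode < Median < Mean
print(neg_mean < neg_median < neg_mode)   # mean < median < mode
```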
mean is negatively affected by
The range is
difference between highest and lowest score
In any normal curve, a constant proportion of the cases fall within
1,2, and 3 standard deviations of the mean
Within 1 sd:
Within 2 sd:
Within 3 sd:
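The constant proportions within 1, 2 and 3 standard deviations can be computed from the standard normal distribution. A minimal sketch, assuming an exactly normal curve; the function name `proportion_within` is hypothetical.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} sd: {proportion_within(k):.1%}")
```

This gives roughly 68%, 95% and 99.7%.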
With increase in sample size, standard deviation becomes
the larger the sample, the more
____ the data
meta analyses – different places, same study – combine data
to make the study more accurate and more sensitive, with lower variance
researchers observe the subjects involved (asking questions, taking measurements, looking at clinical records), but they do not manipulate data
involves active intervention with the subjects
A type of epidemiologic study where a group of exposed individuals (individuals who have been exposed to the potential risk factor) and a group of non-exposed individuals are followed OVER TIME to determine the incidence of disease
case control study
RETROSPECTIVE: A study type that uses cases (with the health problem) and compares them with controls (without the health problem) to find out what MAY HAVE caused the problem.
the study of causation, or origination
Two kinds of prospective studies
Experimental studies (e.g., clinical trials, lab experiment)
Studies the possible “cause and effect” relationship between two variables
repeated cross sectional study
CHANGES OVER TIME: Data from 2+ points in time from different samples
A population group unified by a specific common characteristic, such as age, and subsequently treated as a statistical unit.
4 types of observational studies
Case-series, Cross-sectional, Cohort, Case-control
studying several patients, similar but unusual symptoms, ex – Kaposi sarcoma in homosexual males
cross sectional study characteristics
collecting data only once from each subject, NOT for DIRECTION of any CAUSAL relationship, NOT GOOD for rare things
COHORT studies characteristics, AKA Follow-up, prospective, longitudinal
IDENTIFY RISK factors, FOLLOW over time, e.g., SMOKING (DOLL & HILL study on smoking doctors)
CASE-control studies characteristics
RETROSPECTIVE, longitudinal – TWO groups…one with, one without condition – but similar characteristics. Then BOTH groups are QUESTIONED about past exposure to possible RISK FACTORS.
Doll and hill
Both a cohort and case-control study on smoking and lung cancer – first a case-control study, then a cohort study
cohort vs case-control study
Plus: many potential cases, small sample size ok, cheap, quick results, NEG: controls, case selection, Recall bias of subjects, FINDINGS may conflict with other studies, COHORT more reliable, but not always a practical alternative
Cross sectional study limitations
Provide info on PREVALENCE, not INCIDENCE; no distribution data
Observational study PROS..
suitable for common diseases, prolonged study time, larger number of subjects, less selection bias, subjects usually volunteer, incidence is determined
Experiments to compare two or more clinical treatments
IDEAL is RANDOMIZED, DOUBLE-BLIND
Four phases of clinical trials
1. Establish safety, dose finding, PK studies 2. Establish biological activity or potential efficacy 3. Randomized comparison of treatment 4. Long-term surveillance in broader population
Blinding the pt
to eliminate response or placebo bias
Blinding the investigator
to eliminate treatment bias and researcher expectancy
Code of trial
Entrusting a disinterested third party to obtain the random numbers and decide on the allocation rules
The Gold Standard
double-blind randomized controlled trial
cross over randomized trial
Group 1: receives drug A
Group 2: receives drug B (or a placebo)
A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability.
Confidence interval: A way of admitting that any measurement from a sample is only ____
an ESTIMATE of the population. Although the estimate given from the sample is LIKELY to be close, the TRUE VALUES for the population may be ABOVE or BELOW the sample values.
The TRUE VALUES for the population in CONFIDENCE INTERVALS may be
ABOVE or BELOW the sample values
The TRUE mean of a confidence interval is most likely to be ____
SOMEWHERE within the specified range, ABOVE or BELOW the sample mean.
TWO parts of a confidence interval
1. Standard error of the mean 2. Standard Z-score
Standard error of the mean
An estimation of QUALITY of the SAMPLE for the estimate. If the mean is 10 and the standard error of the mean is 2, then the true score is likely to fall somewhere between 8 and 12 or 10 +/- 2.
The degree of confidence provided by the interval provided (Zorro was CONFIDENT that he could make that Z- score sign!!)
Confidence Interval (CI) of the mean calculated by:
Mean plus/minus Z-Score x Standard error
Z-Score for 95% confidence = 1.96
Z-Score for 99% confidence = 2.58
95% CI =
Mean +/- 2 x Standard error
99% CI =
Mean +/- 2.6 x Standard error
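The confidence interval formula (mean plus/minus Z-score x standard error) can be worked through in code. The sketch reuses the mean of 10 and standard error of 2 from the standard error card; the function name `confidence_interval` is hypothetical.

```python
def confidence_interval(mean, standard_error, z=1.96):
    """Return (lower, upper) limits: mean +/- z * standard error."""
    margin = z * standard_error
    return mean - margin, mean + margin


ci_95 = confidence_interval(10, 2, z=1.96)   # Z-score for 95% confidence
ci_99 = confidence_interval(10, 2, z=2.58)   # Z-score for 99% confidence

print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
print(f"99% CI: {ci_99[0]:.2f} to {ci_99[1]:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence costs precision.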
if p<0.05 then:
reject the null hypothesis (reached statistical significance)
if p>0.05 then:
do not reject the null hypothesis (has not reached statistical significance)
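The decision rule on these cards can be written as a short function. A minimal sketch: the name `decide` is hypothetical, the 0.05 cut-off is the conventional one used here, and treating a p-value of exactly 0.05 as "do not reject" is an assumption, since the cards do not cover that boundary.

```python
def decide(p_value, alpha=0.05):
    """Apply the conventional significance cut-off to a p-value."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    # Assumption: p exactly equal to alpha is treated as not significant.
    return "do not reject the null hypothesis"


print(decide(0.03))
print(decide(0.20))
```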
the chance of a type 1 error is 5 in 100, or 1/20
The chance of a ____ error cannot be directly estimated from the p-value
the capacity to detect a difference if there is one
Increasing sample size (n) leads to
Increase in power
why use p-value?
provides criterion for making decisions about the null hypothesis
limits to the p-value: the p-value does NOT tell us –
The chance that an individual will benefit, the percentage of pts who will benefit, the degree of benefit expected for a given pt
The most common method to increase the power is by
increasing the sample size
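How power grows with sample size can be illustrated numerically. The sketch below assumes a one-sided z-test with a known standard deviation, which is a simplification chosen for illustration and is not taken from these notes; the function name and the effect size are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true mean shift of `effect`."""
    std_normal = NormalDist()                 # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha)    # cut-off for rejecting the null
    shift = effect * (n ** 0.5) / sigma       # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)


# Same hypothetical effect (0.5) and sigma (1), increasing sample sizes.
powers = [power_one_sided_z(effect=0.5, sigma=1.0, n=n) for n in (10, 20, 40, 80)]
for n, p in zip((10, 20, 40, 80), powers):
    print(f"n = {n:3d}: power = {p:.2f}")
```

With the effect and alpha held fixed, each increase in n raises the power.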