# Biostatistics I – Rapid Review

mode
The value that occurs most often in a set of data; not affected by extreme values

median
A measure of central tendency for one variable indicating the point or score at which half the cases are higher and half are lower.

null hypothesis
Hypothesis that predicts NO relationship between variables. The aim of research is to reject this hypothesis

type 1 error
False Positive; (alpha) Rejecting the Null Hypothesis when it is true

type 2 error
False Negative; (beta) Concluding there is no effect or difference when one exists, i.e., failing to reject the null hypothesis when it is in fact false

Categorical variables (qualitative)
Nominal (blood type A, B, AB, O), Ordinal (GCS 3-15)

Numerical variables (Metric, Quantitative)
Discrete (counts – frequency of taking medicine), Continuous (measures – weight, height)

Ordinal variables
NO UNITS – order (or rank) data in terms of degree. They do not establish the numeric difference between data points. They indicate only that one data point is ranked higher or lower than another

nominal variables
NO UNITS – a set of categories that have different names ("having to do with names"). Measurements on a nominal scale label and categorize observations but do not make quantitative distinctions between them. Examples include race, gender, etc. Always discrete.

discrete variables
COUNTED UNITS – can be placed in separate "bins" (ex: # of TVs in a household, years spent in college, # of three-point shots made)

continuous variables
MEASURED UNITS – can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

Characteristics of nominal variables
no unit of measurement, ARBITRARY ORDERING OF CATEGORIES – e.g., blood type, often reported with frequencies (number of type A, AB, etc.)

Characteristics of ordinal variables
e.g., allocating 90 pts to the rows of a table based on their Glasgow Coma Scale score; the DATA have NO UNITS OF MEASUREMENT

Other main points about ordinal categorical variables GCS –
The difference between any pair of adjacent scores is NOT NECESSARILY the same as the difference between any other pair of adjacent scores

Example of continuous metric variables
weight in kg of 6 individuals

Example of discrete metric variables
The number of times that a group of 6 children with asthma used their inhalers in the past 24 hours
Nominal variables have no units of measurement, e.g., kg, cm, inches etc

Ordering of nominal variables…
There IS NO ORDERING, as any ordering would be meaningless! e.g., HAIR COLOR

Ordinal variables…main point is
They are in order. They are usually reported as FREQUENCIES and have NO UNITS OF MEASUREMENT, so the categories cannot be placed on a number line.

Unlike Nominal variables, the ordering is
NOT random, now we can order the categories in a MEANINGFUL way – severity of brain injury GCS

Metric continuous variables have the same
INTERVALS – blood pressure, birth weight, transaminase levels, BMI, peak flow, etc., with UNITS OF MEASUREMENT

eg of ordinal variables
degrees of pain, 1-10, socio economic status
is disease stage on the ordinal scale? Developmental stage? Yes, Yes

Are blood groups ordinal?
no, it is NOMINAL

DISCRETE DATA def.
always whole NUMBERS (e.g., not 2.1). Data that can only take certain values.
For example: the number of students in a class (you can’t have half a student).
(Opposite of Continuous Data).

Counted units for
DISCRETE DATA

Age
metric continuous

Social class
Ordinal – commonly with frequencies

Number of children
Discrete

Age at birth of first child
metric continuous

age at menarche
metric continuous

menopausal state
nominal

age at menopause
metric continuous

Discrete

No of years taking oral contraceptives
Metric continuous…b/c decimals possible

No of months breastfeeding
Metric continuous

Lifetime use of hormone replacement therapy
Nominal (ever used: yes/no)

Mean years of hormone replacement therapy
metric continuous b/c a mean can take decimals

Family history of ovarian cancer
nominal (yes/no) b/c it is a category, not a count

History of benign breast disease
nominal (yes/no)

Family history of breast cancer
nominal (yes/no)

units of alcohol/week (%)
Ordinal for the ordering, but metric discrete for the units of alcohol

No of cigarettes/day
metric discrete b/c it is a count

Body mass index (kg/m²)
Metric continuous

Simple bar chart is appropriate when
only one variable is to be shown, eg, hair color of children receiving Malathion in the nit lotion study – equal width and spaces between bars

malathion
Cholinergic agonist, indirect acting

clustered bar chart
present more than one group, good for comparing relative sizes of the groups WITHIN each category. EG, sex of children receiving Malathion, according to hair color.

stacked bar chart
good to compare total number of subjects in one group, not good for category sizes between groups

What chart is used for continuous metric data?
The histogram (but must first group the values)

Frequency histogram points –
frequency plotted on the vertical axis, and the groups (class intervals) on the horizontal axis; NO GAPS between the bars – reflects the continuous nature of the underlying variable
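
The grouping step that a frequency histogram needs can be sketched in Python. This is only an illustration: the function name `group_into_classes`, the birth weights, and the class width of 500 g are all assumed values, not taken from any study.

```python
from collections import Counter

def group_into_classes(values, start, width):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)      # which class the value belongs to
        lower = start + index * width
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical birth weights in grams (illustrative data only).
birth_weights_g = [2400, 2900, 3100, 3200, 3300, 3600, 3800, 4100]
classes = group_into_classes(birth_weights_g, start=2000, width=500)

# Text version of the histogram: one row per class, no gaps between classes.
for (lower, upper), frequency in classes.items():
    print(f"{lower}-{upper} g: {'#' * frequency}")
```

Each class starts where the previous one ends, which is why the bars of the drawn histogram touch.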

Limitation of histograms
can present only ONE variable at a time

Box plot
A display that shows the distribution of values in a data set separated into four equal-sized groups (quartiles). A box plot is constructed from the five-number summary of the data (minimum, lower quartile, median, upper quartile, maximum) and displays the data set along a number line.
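
A rough sketch of the five values a box plot is drawn from, using Python's standard `statistics` module. The weights are made-up illustrative data, and the function name `five_number_summary` is an assumption. Quartile conventions differ between textbooks and software, so the lower and upper quartiles here may not match a hand calculation exactly.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Hypothetical weights in kg (illustrative data only).
weights_kg = [52, 58, 61, 64, 67, 70, 73, 79, 95]
print(five_number_summary(weights_kg))
```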

The only measure of location that could be used for categorical data
mode
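
As a small illustration of why the mode works for categorical data, the sketch below counts made-up blood types and picks the most frequent one. No arithmetic on the category labels is needed.

```python
from collections import Counter

# Hypothetical blood types for seven patients (illustrative data only).
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]

frequencies = Counter(blood_types)                 # frequency of each category
mode_type, mode_count = frequencies.most_common(1)[0]
print(mode_type, mode_count)
```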

Divide the distribution into 50% below and 50% above it
median

For an odd number of observations, how to calculate median?
(n+1)/2 = position k of the median in the ordered data. Sample size is 5, so k = (5+1)/2 = 3 and the median is the value of the 3rd observation

for even number of observations, how to calculate median?
the average of the middle 2 scores = midpoint of the values at positions n/2 and (n/2)+1. Sample size = 6, so the median is the average of the values at positions 6/2 = 3 and (6/2)+1 = 4, i.e., the 3rd and 4th observations
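
The odd and even position rules can be sketched in Python as follows. The function name `median_by_position` and the sample scores are illustrative assumptions; the positions only make sense once the data are sorted.

```python
def median_by_position(scores):
    """Median via the (n+1)/2 rule (odd n) or the n/2 and (n/2)+1 rule (even n)."""
    ordered = sorted(scores)          # positions refer to the sorted data
    n = len(ordered)
    if n % 2 == 1:
        k = (n + 1) // 2              # 1-based position of the middle score
        return ordered[k - 1]
    lower = ordered[n // 2 - 1]       # value at position n/2
    upper = ordered[n // 2]           # value at position (n/2)+1
    return (lower + upper) / 2

odd_sample = [7, 3, 9, 1, 5]          # n = 5: median is the 3rd sorted value
even_sample = [7, 3, 9, 1, 5, 11]     # n = 6: median averages the 3rd and 4th
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```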

is the median affected by extreme values or outliers?
no

what is the median best for?
skewed distributions, but not as RELIABLE as the mean, b/c it does not use all the values in the distribution

Changing the value of a single score may not affect the mode or median, but it will affect the ____
mean
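
A quick check of this point with made-up scores: only the largest score is changed, and the three measures are recomputed with the standard `statistics` module.

```python
import statistics

scores = [2, 3, 3, 4, 5]
changed = [2, 3, 3, 4, 50]   # same data, but the largest score is now extreme

for label, data in (("original", scores), ("changed", changed)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```

In this example the median and mode stay the same while the mean moves toward the extreme score.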

The mean is not suitable for___
ordinal data

most preferred measure of central tendency as DESCRIPTION OF DATA and ESTIMATE OF THE PARAMETER
mean

In a symmetrical distribution
the mean = median = mode – no skewness

NEGATIVELY skewed distribution
MEAN < MEDIAN
mean < median < mode

POSITIVELY skewed distribution
MEAN > MEDIAN
Mode < Median < Mean
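
The orderings for skewed distributions can be illustrated with two small made-up data sets, one with a long right tail and one with a long left tail. The helper name `summarize` is an assumption for this sketch.

```python
import statistics

def summarize(data):
    """Return (mean, median, mode) for a list of scores."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

positive_skew = [1, 2, 2, 3, 4, 5, 20]        # long tail to the right
negative_skew = [1, 16, 17, 18, 19, 19, 20]   # long tail to the left

for label, data in (("positive", positive_skew), ("negative", negative_skew)):
    mean, median, mode = summarize(data)
    print(label, "mean:", round(mean, 2), "median:", median, "mode:", mode)
```

The mean is pulled furthest toward the tail in both cases, which is why it ends up on the tail side of the median.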

mean is negatively affected by
skewed distributions

The range is
difference between highest and lowest score

In any normal curve, a constant proportion of the cases fall within
1,2, and 3 standard deviations of the mean

Within 1 sd:
68%

Within 2 sd:
95.4%

Within 3 sd:
99.7%
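
These three percentages can be checked from the normal curve itself. The sketch below uses the error function from Python's `math` module; the function name `proportion_within` is an assumption.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * proportion_within(k):.1f}%")
```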

With increase in sample size, the standard error of the mean becomes
smaller

the larger the sample, the more ____ the data
accurate
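
One way to see why larger samples give more accurate estimates: the standard error of the mean is the standard deviation divided by the square root of n, so it shrinks as n grows even when the spread of individual values stays the same. A minimal sketch, assuming a fixed SD of 10 (an illustrative value) and a hypothetical helper `standard_error`:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

sd = 10.0  # assumed standard deviation, illustrative only
for n in (25, 100, 400):
    print(n, standard_error(sd, n))
```

Quadrupling the sample size halves the standard error.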

meta analyses – different places, same study – combine data
to make the result more accurate: more sensitivity, lower variance

observational study
researchers observe the subjects involved (asking questions, taking measurements, looking at clinical records), but they do not intervene or manipulate any variables

experimental study
involves active intervention with the subjects

cohort study
A type of epidemiologic study where a group of exposed individuals (individuals who have been exposed to the potential risk factor) and a group of non-exposed individuals are followed OVER TIME to determine the incidence of disease

case control study
RETROSPECTIVE: A study type that uses cases (with the health problem) and compares them with controls (without the health problem) to find out what MAY HAVE caused the problem.

Aetiology
the study of causation, or origination

Two kinds of prospective studies
Longitudinal, Experimental

Experimental studies (e.g., clinical trials, lab experiment)
Studies the possible "cause and effect" relationship between two variables

repeated cross sectional study
CHANGES OVER TIME: Data from 2+ points in time from different samples

cohort
A population group unified by a specific common characteristic, such as age, and subsequently treated as a statistical unit.

4 types of observational studies
Case-series, Cross-sectional, Cohort, Case-control

case-series study
studying several patients, similar but unusual symptoms, ex – Kaposi sarcoma in homosexual males

cross sectional study characteristics
data collected only once from each subject; CANNOT establish the DIRECTION of any CAUSAL relationship; NOT GOOD for rare conditions

COHORT studies characteristics, AKA Follow-up, prospective, longitudinal
IDENTIFY RISK factors, FOLLOW over time – e.g., SMOKING (DOLL & HILL study on smoking doctors)

CASE-control studies characteristics
RETROSPECTIVE, longitudinal – TWO groups…one with, one without condition – but similar characteristics. Then BOTH groups are QUESTIONED about past exposure to possible RISK FACTORS.

Doll and hill
Both a cohort and case-control study on smoking and lung cancer – first a case-control study, then a cohort study

cohort vs case-control study
Case-control PLUS: many potential cases, small sample size OK, cheap, quick results. NEG: choice of controls, case selection, recall bias of subjects, FINDINGS may conflict with other studies. COHORT is more reliable, but not always a practical alternative

Cross sectional study limitations
Provide info on PREVALENCE, not INCIDENCE..no distribution data

Observational study PROS..
suitable for common diseases, prolonged study time, larger number of subjects, less selection bias, subjects usually volunteer, incidence is determined

Clinical trials
Experiments to compare two or more clinical treatments
IDEAL is RANDOMIZED, DOUBLE-BLIND

Four phases of clinical trials
1. Establish safety, dose finding, PK studies
2. Establish biological activity or potential efficacy
3. Randomized comparison of treatment
4. Long-term surveillance in broader population

PK study
pharmacokinetic study

Blinding the pt
to eliminate response or placebo bias

Blinding the investigator
to eliminate treatment bias and researcher expectancy

Code of trial
Entrusting a disinterested third party to obtain the random numbers and decide on the allocation rules

The Gold Standard
double-blind randomized controlled trial

cross over randomized trial
Group 2: receives drug B (or a placebo)

confidence interval
A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability.

Confidence interval: A way of admitting that any measurement from a sample is only ____
an ESTIMATE of the population. Although the estimate given from the sample is LIKELY to be close, the TRUE VALUES for the population may be ABOVE or BELOW the sample values.

The TRUE VALUES for the population in CONFIDENCE INTERVALS may be
ABOVE or BELOW the sample values

The TRUE mean of a confidence interval is most likely to be ____
SOMEWHERE within the specified range – it may be ABOVE or BELOW the sample mean.

TWO parts of a confidence interval
1. Standard error of the mean 2. Standard Z-score

Standard error of the mean
An estimation of QUALITY of the SAMPLE for the estimate. If the mean is 10 and the standard error of the mean is 2, then the true score is likely to fall somewhere between 8 and 12 or 10 +/- 2.

Standard Z-Score
The degree of confidence provided by the interval (Zorro was CONFIDENT that he could make that Z-score sign!!)

Confidence Interval (CI) of the mean calculated by:
Mean plus/minus Z-Score x Standard error
Z-Score for 95% confidence = 1.96
Z-Score for 99% confidence = 2.58

95% CI =
Mean +/- 2 x Standard error

99% CI =
Mean +/- 2.6 x Standard error
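
The calculation "mean plus/minus Z-score x standard error" can be sketched in Python. The function names are assumptions, and the numbers reuse the mean of 10 and standard error of 2 from the standard error card. For small samples a t-value is usually preferred over a Z-score; that refinement is not shown here.

```python
import math
import statistics

def confidence_interval(mean, standard_error, z=1.96):
    """Return (lower, upper) limits: mean -/+ z * standard error."""
    margin = z * standard_error
    return mean - margin, mean + margin

def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

low95, high95 = confidence_interval(10, 2, z=1.96)   # 95% confidence
low99, high99 = confidence_interval(10, 2, z=2.58)   # 99% confidence
print("95% CI:", round(low95, 2), "to", round(high95, 2))
print("99% CI:", round(low99, 2), "to", round(high99, 2))
```

The 99% interval is wider than the 95% interval because more confidence requires a larger Z-score.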

if p<0.05 then:
reject the null hypothesis (reached statistical significance)

if p>0.05 then:
do not reject the null hypothesis (has not reached statistical significance)

p=0.05, then
if the null hypothesis is true, the chance of a type 1 error at this threshold is 5 in 100, or 1/20
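
A small simulation can illustrate the "5 in 100" idea. It assumes a simple two-sided z-test with a known SD of 1, samples of 30, and 2000 repeated studies in which the null hypothesis really is true; all of these settings are illustrative choices.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(0)                          # fixed seed so the run is repeatable
n, trials, alpha = 30, 2000, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true here: population mean 0, SD 1.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = sample_mean / (1 / math.sqrt(n))        # z statistic with known SD = 1
    if abs(z) > z_crit:                         # "significant" by chance alone
        false_positives += 1

rate = false_positives / trials
print("proportion of false positives:", rate)
```

The printed proportion should land close to 0.05, although it varies a little from run to run if the seed is changed.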

The chance of a ____ error cannot be directly estimated from the p-value
type 2

Power
the capacity to detect a difference if there is one

Increasing sample size (n) leads to
Increase in power
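
The link between sample size and power can be sketched for one simple case: a two-sided one-sample z-test with known SD. The function name `z_test_power`, the effect size of 0.5, and the SD of 1 are illustrative assumptions; real trials use power formulas matched to their own design.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd          # true effect in standard-error units
    # Probability that the z statistic lands beyond either critical value.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (10, 40, 160):
    power = z_test_power(effect=0.5, sd=1.0, n=n)
    print(n, "power:", round(power, 3), "type 2 error:", round(1 - power, 3))
```

Power rises with n, and the chance of a type 2 error (1 - power) falls accordingly.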

why use p-value?
provides criterion for making decisions about the null hypothesis

limits to the p-value: the p-value does NOT tell us –
The chance that an individual will benefit, the percentage of pts who will benefit, the degree of benefit expected for a given pt

The most common method to increase the power is by
increasing the sample size