**mode**

The value that occurs most often in a set of data; not affected by extreme values.

**median**

A measure of central tendency for one variable indicating the point or score at which half the cases are higher and half are lower.

**null hypothesis**

Hypothesis that predicts NO relationship between variables. The aim of research is usually to test whether this hypothesis can be rejected.

**type 1 error**

False positive (alpha): rejecting the null hypothesis when it is true.

**type 2 error**

False negative (beta): concluding there is no effect or difference when one exists, i.e. failing to reject the null hypothesis when it is in fact false.

**Categorical variables (qualitative)**

Nominal (blood type: A, B, AB, O), Ordinal (GCS 3-15)

**Numerical variables (Metric, Quantitative)**

Discrete (counts – frequency of taking medicine), Continuous (measures – weight, height)

**Ordinal variables**

NO UNITS – order (or rank) data in terms of degree. They do not establish the numeric difference between data points. They indicate only that one data point is ranked higher or lower than another.

**nominal variables**

NO UNITS – a set of categories that have different names (“having to do with names”). Measurements on a nominal scale label and categorize observations but do not make quantitative distinctions between them. Examples include race, gender, etc. Always discrete.

**discrete variables**

COUNTED UNITS – can be placed in separate “bins” (e.g. number of TVs in a household, years spent in college, number of three-point shots made)

**continuous variables**

MEASURED UNITS – can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

**Characteristics of nominal variables**

No unit of measurement; ARBITRARY ORDERING OF CATEGORIES, e.g. blood type. Often summarized with frequencies, e.g. the number of people with type A, type AB.

**Characteristics of ordinal variables**

E.g., allocating 90 patients to the rows of a table based on their Glasgow Coma Scale (GCS) score. The data have NO UNITS OF MEASUREMENT.

**Other main point about ordinal categorical variables (e.g. GCS)**

The difference between any pair of adjacent scores is NOT NECESSARILY the same as the difference between any other pair of adjacent scores

**Example of continuous metric variables**

weight in kg of 6 individuals

**Example of discrete metric variables**

The number of times that a group of 6 children with asthma used their inhalers in the past 24 hours

Nominal variables have no units of measurement, e.g., kg, cm, inches etc

**Ordering of nominal variables…**

There IS NO ORDERING, as any ordering would be meaningless, e.g. HAIR COLOR.

**Ordinal variables…main point is**

They are in order. They are usually summarized as FREQUENCIES per category, they have NO UNITS OF MEASUREMENT, and they cannot be placed on a number line.

**Unlike Nominal variables, the ordering is**

NOT random; the categories can be ordered in a MEANINGFUL way, e.g. severity of brain injury on the GCS.

**Metric continuous variables have the same**

INTERVALS between adjacent values, e.g. blood pressure, birth weight, transaminase levels, BMI, peak flow, all with UNITS OF MEASUREMENT.

**e.g. of ordinal variables**

degrees of pain (1-10), socioeconomic status

Is disease stage on the ordinal scale? Developmental stage? Yes and yes.

**Is blood group ordinal?**

no, it is NOMINAL

**DISCRETE DATA def.**

Data that can only take certain values: always whole NUMBERS, e.g. never 2.1.

For example: the number of students in a class (you can’t have half a student).

(Opposite of Continuous Data).

**Counted units for**

DISCRETE DATA

**Age**

metric continuous

**Social class**

Ordinal – commonly with frequencies

**Number of children**

Discrete

**Age at birth of first child**

metric continuous

**age at menarche**

metric continuous

**menopausal state**

nominal

**age at menopause**

metric continuous

**lifetime use of oral contraceptives**

Nominal, assuming it is recorded as ever used (yes/no)

**No of years taking oral contraceptives**

Metric continuous…b/c decimals possible

**No of months breastfeeding**

Metric continuous

**Lifetime use of hormone replacement therapy**

Nominal, assuming it is recorded as ever used (yes/no)

**Mean years of hormone replacement therapy**

metric continuous b/c a mean number of years can take decimal values

**Family history of ovarian cancer**

nominal (yes/no category, not a count)

**History of benign breast disease**

nominal (yes/no category, not a count)

**Family history of breast cancer**

nominal (yes/no category, not a count)

**units of alcohol/week (%)**

Ordinal for the ordering, but metric discrete for the units of alcohol

**No of cigarettes/day**

metric discrete b/c it is a count

**Body mass index (kg/m²)**

Metric continuous

**Simple bar chart is appropriate when**

Only one variable is to be shown, e.g. hair color of children receiving malathion in the nit lotion study. Bars have equal widths and equal spaces between them.

**malathion**

Cholinergic agonist, indirect acting

**clustered bar chart**

Presents more than one group; good for comparing relative sizes of the groups WITHIN each category, e.g. sex of children receiving malathion, according to hair color.

**stacked bar chart**

Good for comparing the total number of subjects in each group; not good for comparing category sizes between groups.

**What chart is used for continuous metric data?**

The histogram (but must first group the values)

**Frequency histogram points –**

Frequency plotted on the vertical axis, and the groups (class intervals) of values on the horizontal axis. NO GAPS between the bars, reflecting the continuous nature of the underlying variable.
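The grouping step that comes before drawing a histogram can be sketched roughly as below. The function name `group_into_classes`, the class width, and the birth weights are illustrative assumptions, not values from any study.

```python
from collections import Counter

def group_into_classes(values, start, width):
    """Count how many values fall into each class interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)   # which class interval the value falls in
        lower = start + index * width       # lower limit of that interval
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical birth weights in kg (made up for illustration)
weights = [2.4, 2.9, 3.1, 3.3, 3.4, 3.6, 3.8, 4.1]
WIDTH = 0.5

# Text version of the histogram: one row per class interval, one '#' per observation
for lower, freq in group_into_classes(weights, start=2.0, width=WIDTH).items():
    print(f"{lower:.1f}-{lower + WIDTH:.1f} kg: {'#' * freq}")
```

The intervals are adjacent with no gaps between them, which is why histogram bars touch.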

**Limitation of histograms**

can present only ONE variable at a time

**Box plot**

A display that shows the distribution of values in a data set separated into four equal-sized groups (quartiles). A box plot is constructed from the five-number summary of the data (minimum, lower quartile, median, upper quartile, maximum) and displays the data set along a number line.
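A minimal sketch of the five values a box plot is drawn from, using Python's standard `statistics` module. The helper name `five_number_summary` and the weights are assumptions for illustration; note that different quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(n=4) returns the three cut points that split the data into quarters
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

# Hypothetical weights in kg (made up for illustration)
weights_kg = [52, 58, 61, 64, 70, 75, 90]
print(five_number_summary(weights_kg))
```

The box spans Q1 to Q3, the line inside the box is the median, and the whiskers extend toward the minimum and maximum.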

**The only measure of location that could be used for categorical data**

mode

**Divide the distribution into 50% below and 50% above it**

median

**For an odd number of observations, how to calculate median?**

The median is the value of the kth observation (in sorted order), where k = (n+1)/2. E.g., if the sample size is 5, k = (5+1)/2 = 3, so the median is the value of the 3rd observation.

**for even number of observations, how to calculate median?**

The average of the middle 2 scores, i.e. the midpoint of the (n/2)th and ((n/2)+1)th observations in sorted order. E.g., if the sample size is 6, the median is the average of the (6/2) = 3rd and (6/2)+1 = 4th observations.
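The odd and even rules can be put together in a short sketch. The function name `median` and the sample scores are illustrative assumptions.

```python
def median(values):
    """Median by the positional rules: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)          # the rules apply to the data in sorted order
    n = len(ordered)
    if n % 2 == 1:
        k = (n + 1) // 2              # 1-based position of the middle observation
        return ordered[k - 1]
    lower = ordered[n // 2 - 1]       # the (n/2)th observation
    upper = ordered[n // 2]           # the ((n/2)+1)th observation
    return (lower + upper) / 2

print(median([7, 3, 9, 1, 5]))        # n = 5: the 3rd sorted value
print(median([7, 3, 9, 1, 5, 11]))    # n = 6: average of the 3rd and 4th sorted values
```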

**is the median affected by extreme values or outliers?**

no

**what is the median best for?**

Skewed distributions, but it is not as RELIABLE as the mean, b/c it does not take all the values in the distribution into account.

**Changing the value of a single score may not affect the mode or median, but it will affect the ____**

mean

**The mean is not suitable for ___**

ordinal data

**most preferred measure of central tendency as DESCRIPTION OF DATA and ESTIMATE OF THE PARAMETER**

mean

**In a symmetrical distribution**

the mean = median = mode – no skewness

**NEGATIVELY skewed distribution**

MEAN < MEDIAN

mean < median < mode

**POSITIVELY skewed distribution**

MEAN > MEDIAN

Mode < Median < Mean
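A small sketch of how one extreme high value pulls the mean above the median (positive skew) while the median stays put. The two data sets are made up for illustration.

```python
import statistics

symmetric = [2, 3, 4, 5, 6]
right_skewed = [2, 3, 4, 5, 36]   # same data, but the largest score is now extreme

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    mean = statistics.mean(data)
    med = statistics.median(data)
    print(f"{name}: mean = {mean}, median = {med}")
```

In the symmetric set the mean and median coincide; in the right-skewed set the mean moves up while the median does not change.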

**mean is negatively affected by**

skewed distributions

**The range is**

difference between highest and lowest score

**In any normal curve, a constant proportion of the cases fall within**

1,2, and 3 standard deviations of the mean

**Within 1 sd:**

68%

**Within 2 sd:**

95.4%

**Within 3 sd:**

99.7%
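These proportions can be checked against the standard normal curve with a short sketch using `statistics.NormalDist` from the Python standard library. The helper name `proportion_within` is an assumption.

```python
from statistics import NormalDist

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()                  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {proportion_within(k):.1%}")
```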

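**With increase in sample size, the standard error becomes**

smaller (the standard deviation itself does not shrink systematically as the sample grows; it is the standard error of the mean that gets smaller)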

**the larger the sample, the more ____ the data**

accurate

**meta-analyses – different places, same kind of study – combine data**

to make the overall estimate more accurate: more sensitivity and lower variance

**observational study**

researchers observe the subjects involved (asking questions, taking measurements, looking at clinical records), but they do not intervene or manipulate the data

**experimental study**

involves active intervention with the subjects

**cohort study**

A type of epidemiologic study where a group of exposed individuals (individuals who have been exposed to the potential risk factor) and a group of non-exposed individuals are followed OVER TIME to determine the incidence of disease

**case control study**

RETROSPECTIVE: A study type that uses cases (with the health problem) and compares them with controls (without the health problem) to find out what MAY HAVE caused the problem.

**Aetiology**

the study of causation, or origination

**Two kinds of prospective studies**

Longitudinal, Experimental

**Experimental studies (e.g., clinical trials, lab experiment)**

Studies the possible “cause and effect” relationship between two variables

**repeated cross sectional study**

CHANGES OVER TIME: Data from 2+ points in time from different samples

**cohort**

A population group unified by a specific common characteristic, such as age, and subsequently treated as a statistical unit.

**4 types of observational studies**

Case-series, Cross-sectional, Cohort, Case-control

**case-series study**

studying several patients with similar but unusual symptoms, e.g. Kaposi sarcoma in homosexual males

**cross sectional study characteristics**

collecting data only once from each subject, NOT for DIRECTION of any CAUSAL relationship, NOT GOOD for rare things

**COHORT studies characteristics, AKA Follow-up, prospective, longitudinal**

IDENTIFY RISK factors and FOLLOW subjects over time, e.g. SMOKING (Doll & Hill study of smoking in doctors)

**CASE-control studies characteristics**

RETROSPECTIVE, longitudinal – TWO groups…one with, one without condition – but similar characteristics. Then BOTH groups are QUESTIONED about past exposure to possible RISK FACTORS.

**Doll and Hill**

Both a cohort and case-control study on smoking and lung cancer – first a case-control study, then a cohort study

**cohort vs case-control study**

Case-control PLUS: many potential cases, small sample size is OK, cheap, quick results. Case-control NEG: selection of controls and cases, recall bias of subjects, FINDINGS may conflict with other studies. COHORT is more reliable, but not always a practical alternative.

**Cross sectional study limitations**

Provide information on PREVALENCE, not INCIDENCE; no distribution data

**Observational study PROS..**

suitable for common diseases, prolonged study time, larger number of subjects, less selection bias, subjects usually volunteer, incidence is determined

**Clinical trials**

Experiments to compare two or more clinical treatments

IDEAL is RANDOMIZED, DOUBLE-BLIND

**Four phases of clinical trials**

1. Establish safety, dose finding, PK studies
2. Establish biological activity or potential efficacy
3. Randomized comparison of treatment
4. Long-term surveillance in broader population

**PK study**

pharmacokinetic study

**Blinding the pt**

to eliminate response or placebo bias

**Blinding the investigator**

to eliminate treatment bias and researcher expectancy

**Code of trial**

Entrusting a disinterested third party to obtain the random numbers and decide on the allocation rules

**The Gold Standard**

double-blind randomized controlled trial

**cross over randomized trial**

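Group 1: receives drug A first

Group 2: receives drug B (or a placebo) first

After a washout period, the two groups swap treatments, so each subject acts as their own control.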

**confidence interval**

A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability.

**Confidence interval: A way of admitting that any measurement from a sample is only ____**

an ESTIMATE of the population value. Although the estimate given from the sample is LIKELY to be close, the TRUE VALUES for the population may be ABOVE or BELOW the sample values.

**The TRUE VALUES for the population in CONFIDENCE INTERVALS may be**

ABOVE or BELOW the sample values

**The TRUE mean of a confidence interval is most likely to be ____**

SOMEWHERE within the specified range – ABOVE or BELOW the sample mean.

**TWO parts of a confidence interval**

1. Standard error of the mean 2. Standard Z-score

**Standard error of the mean**

An estimation of QUALITY of the SAMPLE for the estimate. If the mean is 10 and the standard error of the mean is 2, then the true score is likely to fall somewhere between 8 and 12 or 10 +/- 2.
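As a rough sketch, the standard error of the mean is usually computed as the sample standard deviation divided by the square root of the sample size. The function name `standard_error` and the numbers are illustrative assumptions, chosen so the first case reproduces a standard error of 2.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation divided by sqrt(sample size)."""
    return sd / math.sqrt(n)

# Same standard deviation, increasing sample size: the standard error shrinks
for n in (25, 100, 400):
    print(f"sd = 10, n = {n}: standard error = {standard_error(10, n)}")
```

Quadrupling the sample size halves the standard error, which is why larger samples give tighter estimates.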

**Standard Z-Score**

The degree of confidence provided by the interval provided (Zorro was CONFIDENT that he could make that Z- score sign!!)

**Confidence Interval (CI) of the mean calculated by:**

Mean plus/minus Z-Score x Standard error

Z-Score for 95% confidence = 1.96

Z-Score for 99% confidence = 2.58

**95% CI =**

Mean +/- 2 x Standard error (Z-score of 1.96, rounded)

**99% CI =**

Mean +/- 2.6 x Standard error (Z-score of 2.58, rounded)
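The formula can be sketched as below, reusing the mean of 10 and standard error of 2 from the standard error card. The function name `confidence_interval` is an assumption.

```python
def confidence_interval(mean, standard_error, z=1.96):
    """Return (lower, upper) limits: mean +/- z * standard error."""
    margin = z * standard_error
    return mean - margin, mean + margin

low95, high95 = confidence_interval(10, 2)            # z = 1.96 for 95% confidence
low99, high99 = confidence_interval(10, 2, z=2.58)    # z = 2.58 for 99% confidence
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence costs precision.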

**if p<0.05 then:**

reject the null hypothesis (reached statistical significance)

**if p>0.05 then:**

do not reject the null hypothesis (has not reached statistical significance)

**p=0.05, then**

the chance of a type 1 error is 5 in 100, or 1/20

**The chance of a ____ error cannot be directly estimated from the p-value**

type 2

**Power**

the capacity to detect a difference if there is one

**Increasing sample size (n) leads to**

Increase in power

**why use p-value?**

provides criterion for making decisions about the null hypothesis

**limits to the p-value: the p-value does NOT tell us –**

The chance that an individual will benefit, the percentage of pts who will benefit, the degree of benefit expected for a given pt

**The most common method to increase the power is by**

increasing the sample size
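A hedged sketch of why a bigger sample gives more power. It assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification; the function name `approximate_power` and the effect size are illustrative assumptions.

```python
from statistics import NormalDist

def approximate_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known sd) to detect a true mean shift of `effect`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # e.g. 1.96 for alpha = 0.05
    shift = effect * (n ** 0.5) / sd         # true effect measured in standard errors
    # Probability the test statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Same true effect (half a standard deviation), increasing sample size
for n in (10, 32, 100):
    print(f"n = {n}: power = {approximate_power(0.5, 1.0, n):.2f}")
```

When the true effect is zero, the same formula returns alpha, the type 1 error rate.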