Instrumental Variables Explained: The Complete Guide to Solving Endogeneity


Link to slides: https://docs.google.com/presentation/d/1M6Zd3pIc__uOv1w6905GaH0Ogp4LIXec/edit?usp=sharing&ouid=107834574622602070583&rtpof=true&sd=true

If you’ve struggled with endogeneity in your regression analysis, you know the frustration. Your X variable is correlated with the error term, giving you biased, unreliable results. You can’t trust your coefficients. Your conclusions are questionable.

The ideal solution? Run a randomized experiment—like an A/B test—where you randomly assign who gets treatment. Random assignment eliminates endogeneity.

But here’s reality: You can’t always run experiments. Sometimes you’re analyzing historical data. Sometimes experiments are too expensive. Sometimes they’re simply not ethical or practical.

So what do you do?

Enter Instrumental Variables (IV)—the solution to endogeneity when you’re working with observational data. This method uses a “helper” variable (called an instrument, or Z) that affects your X variable but doesn’t directly affect your Y outcome. This helper variable lets you isolate the clean, unbiased relationship between X and Y.

By the end of this guide, you’ll understand exactly what instrumental variables are, how they work, when to use them, and most importantly, how to find good instruments for your own research.


The Problem We’re Solving

Let’s ground this in a concrete example.

Understanding Endogeneity

Imagine you want to know: Does IT training increase employee productivity?

Your variables are straightforward:

  • X = IT training hours (how many hours of training an employee receives)
  • Y = Productivity score

You run a regression and find that employees with more training are more productive. Case closed, right?

Not so fast. You have endogeneity.

Why Endogeneity Occurs

Managers don’t randomly assign training. They select their smart, high-performing employees for training programs. And those smart employees? They’re naturally more productive even without training.

The problem: X (training hours) is correlated with ability, which is in the error term. This is classic endogeneity.

If you run a regular regression, you’ll overestimate how much training actually helps because you’re picking up both:

  1. The true effect of training
  2. The effect of ability (which determines both who gets training AND who is productive)

You can’t separate these two effects with ordinary least squares (OLS).

The Ideal vs. The Practical

Ideal solution: Run a randomized experiment. Randomly assign some employees to receive training, others not. Random assignment ensures training is uncorrelated with ability.

The challenge: You can’t always do this. You’re analyzing historical data. Experiments are expensive. Or you simply don’t have the authority to randomly assign training.

The solution: Instrumental Variables.


What Is an Instrumental Variable?

An instrumental variable (Z) is a “helper” variable that must satisfy two critical requirements.

The Two Requirements

Requirement #1: Relevance

  • The instrument Z must affect your X variable
  • It creates variation in X
  • If people have different values of X, your instrument Z needs to be one of the reasons why

Requirement #2: Exclusion Restriction

  • Z must NOT directly affect Y
  • It can only affect Y indirectly, through X
  • There cannot be a direct pathway from Z to Y

The Pathway Diagram

Think of it like a chain:

Z → X → Y
  • Z affects X ✓ (allowed and required)
  • X affects Y ✓ (the relationship you’re trying to measure)
  • Z → Y ✗ (NOT allowed—no direct effect)

Why this matters: If Z only affects Y through X, then the variation in X that comes from Z is clean—it’s not contaminated by the error term. That’s the variation we can use to get an unbiased estimate.

Making It Concrete

This is still abstract. Let’s see a real example.


A Simple Example: Training & Productivity

Let’s return to our IT training question with a specific instrumental variable.

The Setup

Research Question: Does IT training increase employee productivity?

Variables:

  • X = IT training hours
  • Y = Productivity score
  • Problem = Endogeneity (managers select smart employees for training)

The Instrument: Distance to Training Center

Our instrument Z is: Distance from the employee’s home to the training center.

Why does this work? Let’s check both requirements.

Checking Requirement #1: Relevance

Does distance affect training?

Absolutely! Employees who live 2 miles from the training center are much more likely to attend sessions than employees who live 20 miles away.

Distance creates variation in who gets training because:

  • Closer employees face lower commute costs
  • They can attend more easily
  • They’re more likely to participate consistently

Relevance requirement: ✓ Satisfied

Checking Requirement #2: Exclusion

Does distance affect productivity directly?

Think carefully: Does living far from the training center make you less productive at your actual job?

Not really! Your commute to the training center has nothing to do with how good you are at your job. Distance only affects productivity THROUGH its effect on training attendance.

Potential violation to consider: Maybe people in rural areas (who live far from training centers) have different labor market conditions. That would violate exclusion.

Defense: If we’re only looking at employees within the same city, this concern is minimal. The exclusion assumption seems reasonable.

Exclusion requirement: ✓ Plausibly satisfied

The Pathway

Our complete pathway looks like this:

Distance to Center → Training Hours → Productivity

Distance affects training hours ✓
Training hours affect productivity ✓
Distance does NOT directly affect productivity ✓

This is a valid instrument!


Understanding the Two Requirements in Depth

These requirements are absolutely critical. If your instrument fails either one, everything falls apart.

Requirement #1: Relevance (The Testable One)

What it means: Your instrument Z must actually affect X—not just a little, but strongly and statistically significantly.

How to test it:

Run a regression (called the “first stage”):

X = α + βZ + ε

Then check two things:

  1. Is the coefficient statistically significant? (p-value < 0.05)
  2. Is the F-statistic greater than 10? (This is the critical threshold)

If F > 10: You have a strong instrument ✓ Safe to proceed

If F < 10: You have a weak instrument ✗ Stop immediately

Why this matters: Weak instruments give you biased estimates that are often WORSE than just using regular OLS. This is not optional—F > 10 is a hard requirement.
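To make the first-stage test concrete, here is a minimal numpy sketch. The instrument, the 0.8 coefficient, and the sample size are illustrative assumptions, not numbers from the article; the point is only the mechanics of computing the F-statistic for the single restriction "the coefficient on Z is zero."

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                # instrument Z
x = 0.8 * z + rng.normal(size=n)      # assumed first stage: Z strongly drives X

# First stage: regress X on Z (with an intercept)
Z1 = np.column_stack([np.ones(n), z])
coef = np.linalg.lstsq(Z1, x, rcond=None)[0]
resid = x - Z1 @ coef

# F-statistic for H0: the coefficient on Z is zero (1 restriction, n-2 df)
rss = resid @ resid                     # residual sum of squares
tss = ((x - x.mean()) ** 2).sum()       # total sum of squares
f_stat = (tss - rss) / (rss / (n - 2))

print(f"first-stage F = {f_stat:.1f}")  # comfortably above the F > 10 threshold here
```

With one instrument and one endogenous regressor, this F is just the squared t-statistic on Z, which statistical packages report automatically.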

Requirement #2: Exclusion Restriction (The Challenging One)

What it means: Z must affect Y ONLY through X. It cannot have any direct effect on Y.

The hard truth: You CANNOT statistically test this. There is no test that proves your instrument satisfies exclusion.

What you do instead: Rely on:

  • Theoretical arguments
  • Common sense
  • Logic
  • Domain knowledge
  • Careful reasoning about potential violations

You must think hard about whether there might be any pathway from Z to Y that doesn’t go through X.

This is why finding good instruments is so difficult. You can test relevance easily. But exclusion requires careful thinking and sometimes involves judgment calls.


Testing Relevance: A Real Example

Let’s test whether our distance instrument is relevant using actual data.

Running the First Stage

We run the first stage regression:

Training Hours = 40 - 1.5 × Distance

Interpreting the Results

Coefficient on Distance: -1.5

  • Each additional mile from the training center reduces training by 1.5 hours on average
  • This makes intuitive sense

F-statistic: 45.2

Remember our threshold: F > 10

We got 45.2—far above the threshold!

Conclusion

This is a STRONG instrument. Distance powerfully predicts training hours.

Relevance requirement: ✓ Definitively satisfied

What About Exclusion?

Can we test whether distance affects productivity directly?

No—we cannot statistically test this.

But we can think about it logically:

  • Does living far from the training center make you less productive at your job? Probably not.
  • Could there be violations? Perhaps people in rural areas have different labor markets.
  • Is the assumption reasonable? If we’re studying employees within the same city, yes.

This is the kind of reasoning you must do. There’s no test—just careful thought about potential violations.


Understanding Exclusion Restriction: What’s Allowed vs. What’s Not

Let’s make the exclusion restriction crystal clear with visual examples.

What’s ALLOWED ✓

Indirect pathway: Z → X → Y

In our training example:

  • Distance affects training hours
  • Training hours affect productivity
  • Distance → Training → Productivity

This creates an indirect effect. This is perfectly fine—in fact, it’s exactly what we want!

What’s NOT ALLOWED ✗

Direct pathway: Z → Y (bypassing X)

The instrument cannot have a direct effect on the outcome. It can’t bypass X and influence Y separately.

Concrete Examples of Violations

Let’s think through potential violations for our distance instrument:

Potential Violation #1: Geographic Labor Markets

Maybe people who live far from the training center also live in rural areas with:

  • Different job opportunities
  • Different industries
  • Different labor markets

Then distance is actually a proxy for location, and location directly affects productivity through channels unrelated to training.

This would violate exclusion.

Potential Violation #2: Commute Stress

Maybe employees with long commutes to the training center (if they did attend) are:

  • More stressed
  • More tired
  • Less focused

If commute stress directly reduces productivity, that’s a violation of exclusion.

Playing Devil’s Advocate

When evaluating exclusion, ask yourself:

  • What are ALL the possible ways my instrument might directly affect the outcome?
  • Am I missing any alternative pathways?
  • What could go wrong?

Be your own harshest critic.

The Reality

Perfect instruments are rare. Really rare.

Most instruments have at least some plausible violation of exclusion. The question isn’t whether your instrument is perfect—it’s whether the exclusion assumption is plausible enough that IV gives you better estimates than the alternatives.

This is why instrumental variables is both powerful and challenging.


How IV Works: The Magic Explained

Now let’s understand the actual mechanism that makes instrumental variables work. This is the key insight.

Your X Variable Contains Two Types of Variation

Your X variable (training hours) contains TWO types of variation mixed together:

Type 1: Clean Variation (Good)

  • Changes in X that are NOT correlated with the error term
  • Some employees get more training because they live closer to the training center
  • This is random with respect to ability!
  • Where you live doesn’t determine how smart you are
  • This variation is clean, pure, uncontaminated

Type 2: Contaminated Variation (Bad)

  • Changes in X that ARE correlated with the error term
  • Some employees get more training because they’re ambitious and smart
  • Some get more training because managers selected them
  • This variation is tangled up with ability (which is in the error term)

The Formula

Training Hours = Clean variation (from distance) + Contaminated variation (from ability/selection)

What OLS Does (The Problem)

OLS uses ALL the variation—clean plus contaminated.

It can’t tell them apart!

Your OLS coefficient captures:

  1. The true effect of training
  2. The effect of ability

Result: Biased answer. You get the wrong estimate.

What IV Does (The Solution)

IV uses ONLY the clean variation.

It uses only the variation in training that comes from distance.

Why this works:

  • Distance affects training ✓
  • Distance is NOT correlated with ability ✓
  • So variation in training created by distance is uncontaminated

By using only that clean variation, IV gives you an unbiased estimate of the true causal effect.

The Water Analogy

Think of it like this:

OLS: Drinks from a contaminated water supply (some clean water mixed with polluted water). You get sick. Biased estimates.

IV: Filters the water first. Identifies the clean source (your instrument) and uses ONLY water from that source. You stay healthy. Unbiased estimates.

The Tradeoff

The catch: You end up with less water than you started with.

This is why IV has larger standard errors than OLS. You’re using less variation, so your estimates are less precise.

But that’s okay! It’s better to have a smaller amount of clean water than a larger amount of contaminated water.

Similarly, it’s better to have less precise but UNBIASED estimates than very precise but BIASED estimates.


Two-Stage Least Squares (2SLS): How to Actually Do It

The most common method for implementing IV is called Two-Stage Least Squares (2SLS).

The name tells you exactly what it is: you run two stages of regressions.

Stage One: Predict X Using Z

Run the regression:

X = α + βZ + ε

What this does: Extracts the “clean” part of X that comes from your instrument Z.

You’re predicting X based only on Z, ignoring all other factors.

You get: Predicted values of X (written as X̂, pronounced “X-hat”)

These predicted values represent ONLY the variation in X that came from Z.

Stage One Example

Using our training example:

Training Hours = 40 - 1.5 × Distance

For each employee, calculate predicted training hours based ONLY on how far they live from the training center:

  • Alice lives 2 miles away → Predicted training = 37 hours
  • Bob lives 20 miles away → Predicted training = 10 hours

We’re ignoring the fact that Alice is ambitious or that Bob’s manager selected him. We’re using ONLY the distance-driven variation.

That’s Stage One. We’ve isolated the clean variation.

Stage Two: Use Predicted X̂ to Explain Y

Run the regression:

Y = a + b(X̂) + ε

Notice: You’re NOT using actual training hours. You’re using PREDICTED training hours from Stage One.

This gives you: The IV estimate of how X affects Y.

The coefficient b is now unbiased because it’s using only the clean variation in X.

Stage Two Example

Productivity = 50 + 2 × (Predicted Training)

The coefficient is 2.

Interpretation: For each hour of training (driven by distance), productivity increases by 2 points.

The Magic at Work

You’re using only the variation in X that came from Z, which is NOT correlated with the error term.

That’s why you get an unbiased estimate.

Two stages:

  1. Predict X from Z
  2. Predict Y from predicted X

That’s 2SLS.
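The two stages above can be sketched end-to-end in a few lines of numpy. Everything here is simulated under an assumed data-generating process where the true effect of training is 2.0, so you can watch OLS land above the truth while 2SLS recovers it.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Assumed DGP: the TRUE effect of training on productivity is 2.0
ability = rng.normal(size=n)                  # unobserved; part of the error
distance = rng.uniform(0, 20, size=n)         # instrument
training = 40 - 1.5 * distance + 5 * ability + rng.normal(size=n)
productivity = 50 + 2.0 * training + 8 * ability + rng.normal(size=n)

def coefs(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    X1 = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Naive OLS: training is tangled with ability, so the slope is biased upward
b_ols = coefs(productivity, training)[1]

# Stage 1: predict training from distance only
a1, b1 = coefs(training, distance)
training_hat = a1 + b1 * distance

# Stage 2: regress productivity on PREDICTED training
b_iv = coefs(productivity, training_hat)[1]

print(f"OLS estimate:  {b_ols:.2f}  (biased above 2.0)")
print(f"2SLS estimate: {b_iv:.2f}  (close to the true 2.0)")
```

One caveat: running the two stages by hand gives the right point estimate but the wrong standard errors, because stage two doesn't know X̂ was itself estimated. In real work, use a dedicated IV routine (for example, `IV2SLS` in Python's `linearmodels` package), which corrects them.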


A Numerical Example: Step-by-Step

Let’s walk through a concrete example with actual numbers so you can see exactly how this works.

The Data

Three employees:

| Employee | Distance (miles) | Actual Training (hours) | Productivity | Ability |
|----------|------------------|-------------------------|--------------|---------|
| Alice    | 2                | 40                      | 85           | High    |
| Bob      | 20               | 15                      | 70           | High    |
| Carol    | 10               | 25                      | 75           | Average |

The Problem with OLS

When OLS looks at this data:

  • Alice: 40 hours training, productivity 85
  • Bob: 15 hours training, productivity 70
  • Carol: 25 hours training, productivity 75

OLS thinks: “More training leads to more productivity!”

But it can’t tell WHY Alice is so productive:

  • Is it because of the 40 hours of training?
  • Or is it because she’s naturally smart AND happens to have gotten more training?

This is the endogeneity problem. Training and ability are mixed together.

What IV Does: Stage One

We predict training from distance only:

Training = 40 - 1.5 × Distance

Alice (2 miles):

  • Predicted training = 40 – 1.5(2) = 37 hours
  • Note: Her ACTUAL training was 40 hours, but we’re ignoring that
  • We’re using only the prediction from distance

Bob (20 miles):

  • Predicted training = 40 – 1.5(20) = 10 hours

Carol (10 miles):

  • Predicted training = 40 – 1.5(10) = 25 hours

What we’ve done: Extracted the part of their training that’s due to where they live, completely ignoring ability, manager selection, motivation—all that contaminated variation is gone.
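Stage One for these three employees is simple enough to compute directly; this tiny snippet just evaluates the fitted line from the worked example.

```python
# Stage-one fitted line from the worked example
def predicted_training(distance_miles):
    return 40 - 1.5 * distance_miles

for name, miles in [("Alice", 2), ("Bob", 20), ("Carol", 10)]:
    print(f"{name}: {predicted_training(miles):.0f} predicted hours")
# Alice: 37, Bob: 10, Carol: 25
```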

What IV Does: Stage Two

Now we use ONLY these predicted values in Stage Two.

We’re ignoring:

  • Alice is high-ability
  • Bob was selected by his manager
  • Any other contaminating factors

We’re using only the distance-driven variation in training.

When we run Stage Two, we get the clean, unbiased estimate of how training affects productivity, free from ability bias.

The Key Difference

OLS: Uses everything mixed together → Biased

IV: Carefully separates out just the clean part → Unbiased


A Real IS Example: CRM Software & Sales

Let’s examine another practical Information Systems example.

The Research Question

Does CRM (Customer Relationship Management) software increase sales revenue?

Variables:

  • X = Has CRM software (yes/no)
  • Y = Annual sales revenue

Simple question, right?

The Endogeneity Problem

High-performing sales teams are more likely to adopt CRM first!

Companies that are already doing well invest in better technology. This is reverse causality—one type of endogeneity.

If you just compare:

  • Companies with CRM → Higher sales
  • Companies without CRM → Lower sales

You can’t tell if:

  • CRM causes higher sales (causal effect)
  • Good companies buy CRM (selection effect)

That’s endogeneity.

The Instrument: Promotional Pricing

Our instrument Z: Did the company happen to be shopping for CRM during a promotional pricing period?

CRM vendors occasionally run sales or discounts. The timing of these promotions is somewhat random.

Checking the Requirements

Relevance: Do promotional periods affect CRM adoption?

Yes! Companies are more likely to buy CRM when it’s on sale. They’re price-sensitive. The promotion creates variation in who adopts CRM.

✓ Relevance satisfied

Exclusion: Does the timing of a CRM promotion directly affect sales revenue?

Think carefully:

  • The promotion affects whether you BUY the CRM ✓
  • But the random timing of when a vendor runs a sale?
  • That shouldn’t directly affect your sales except through CRM adoption ✓

✓ Exclusion seems plausible

The Results

| Method                   | Estimated Effect                  |
|--------------------------|-----------------------------------|
| Regular OLS              | CRM increases sales by $250,000   |
| IV (promotional pricing) | CRM increases sales by $120,000   |

OLS was overestimating the true effect by more than a factor of two!

Why the Difference?

OLS picks up:

  1. True CRM effect
  2. The fact that good sales teams buy CRM

The $250,000 includes both effects mixed together.

IV isolates: Only the true CRM effect = $120,000

Still significant! Still worthwhile! But nowhere near what the naive analysis suggested.

Why This Matters

This can completely change your business decisions.

If you’re deciding whether to invest in CRM based on these results, you want the true effect ($120k), not the inflated number ($250k).

IV gives you the real answer.


How to Test Your Instrument

You’ve found a potential instrument. How do you test whether it’s valid?

Test #1: First-Stage F-Test (Tests Relevance)

What it tests: Is your instrument strongly related to X?

How to do it:

  1. Run Stage One of 2SLS (regress X on Z)
  2. Check the F-statistic (reported automatically by statistical software)

The rule:

  • F > 10: Strong instrument ✓ Safe to proceed
  • F < 10: Weak instrument ✗ Stop immediately

This is a hard threshold. If your F-statistic is below 10, your IV results are not trustworthy.

Why it matters: Weak instruments give you biased estimates that are often WORSE than regular OLS.

Test #2: Overidentification Test (Only with Multiple Instruments)

When it applies: You have TWO or more instruments

What it tests: Are your multiple instruments giving consistent results?

The tests:

  • Sargan Test
  • Hansen J Test

How to interpret:

Null hypothesis: Your instruments are valid

  • p-value > 0.05: Instruments pass ✓ They’re consistent
  • p-value < 0.05: At least one instrument is invalid ✗ Problem

Important: You can ONLY run this test with multiple instruments. With just one instrument, this test is unavailable.
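The Sargan version of this test is straightforward to sketch: run 2SLS, then regress the residuals on all the instruments and compute n times the R-squared, which is approximately chi-squared with degrees of freedom equal to (number of instruments minus number of endogenous regressors). The two-instrument setup below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Assumed setup: TWO valid instruments for one endogenous X
u = rng.normal(size=n)                        # structural error
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + 0.5 * z2 + u + rng.normal(size=n)    # endogenous: correlated with u
y = 2.0 * x + u

def fit(y, X):
    """Return (design matrix with intercept, OLS coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return X1, np.linalg.lstsq(X1, y, rcond=None)[0]

# 2SLS using both instruments
Z = np.column_stack([z1, z2])
Z1, g = fit(x, Z)
x_hat = Z1 @ g
_, b = fit(y, x_hat)
resid = y - np.column_stack([np.ones(n), x]) @ b   # residuals use ACTUAL x

# Sargan statistic: n * R^2 from regressing 2SLS residuals on the instruments
R1, d = fit(resid, Z)
r2 = 1 - ((resid - R1 @ d) ** 2).sum() / ((resid - resid.mean()) ** 2).sum()
sargan = n * r2   # ~ chi2(1) here: 2 instruments - 1 endogenous regressor

# The 5% critical value of chi2(1) is about 3.84
print(f"Sargan = {sargan:.2f}; "
      + ("instruments look mutually consistent" if sargan < 3.84
         else "at least one instrument looks invalid"))
```

Remember what this does and does not test: it checks whether the instruments agree with each other, not whether any of them satisfies exclusion in the first place.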

Test #3: Exclusion Restriction

The painful truth: You CANNOT statistically test this.

There is no test that proves your instrument doesn’t directly affect Y.

None. Zero. Doesn’t exist.

What you do instead:

  • Rely on theoretical arguments
  • Use logic and common sense
  • Think through all possible pathways from Z to Y
  • Play devil’s advocate with yourself
  • Ask: What am I missing? What could go wrong?
  • Make a judgment call

Sometimes it’s clear. Sometimes it’s debatable. Sometimes different researchers will disagree.

This is why instrumental variables is both an art and a science.

You need:

  • Technical skills to run regressions and interpret statistics
  • Deep domain knowledge to assess whether exclusion is plausible

Common IV Mistakes to Avoid

Learn from these four most common mistakes so you don’t make them yourself.

Mistake #1: Using a Weak Instrument

The mistake: Your F-statistic is less than 10, but you use IV anyway.

People see F = 5 or 6 and think: “It’s still significant at p < 0.05, so it’s probably okay.”

Wrong! It’s not okay!

Why it’s bad: Weak instruments give you biased estimates. Often WORSE bias than just using regular OLS and ignoring the endogeneity.

The solution:

  • Find a stronger instrument
  • Or stick with OLS and be transparent about the limitations
  • A weak instrument is worse than no instrument

Mistake #2: Violating the Exclusion Restriction

The mistake: Your instrument actually has a direct effect on Y, but you don’t realize it or hope nobody will notice.

Classic bad example: Using CEO age as an instrument for firm innovation

  • Logic: Older CEOs invest less in R&D (might satisfy relevance)
  • Problem: Does CEO age directly affect firm performance? Of course!
    • Through experience
    • Through networks
    • Through risk tolerance
    • Through many channels unrelated to innovation spending

This violates exclusion. Your IV estimates will still be biased.

The solution: Think really hard about alternative pathways from your instrument to your outcome. Be honest with yourself about potential violations.

Mistake #3: Using IV with Small Samples

The mistake: Running IV with insufficient data.

Why it’s bad: IV needs more data than OLS to give you precise estimates.

You’re throwing away variation—using only the variation from your instrument.

With small samples:

  • Standard errors will be huge
  • Confidence intervals will be extremely wide
  • You can’t conclude anything meaningful

The result: Very imprecise estimates. You might get the right answer on average, but you can’t tell because of all the noise.

The solution:

  • Get more data
  • Or use a different method if your sample is small

Mistake #4: Misinterpreting What IV Estimates

The subtle issue: IV doesn’t estimate the average treatment effect for everyone.

It estimates the Local Average Treatment Effect (LATE).

What this means: IV tells you the effect for people whose treatment status was influenced by your instrument.

In our training example:

  • IV tells you the effect of training for people whose attendance was affected by distance
  • It does NOT tell you the effect for:
    • People who would attend no matter what (always-takers)
    • People who would never attend regardless of distance (never-takers)

This is actually fine! But you need to be clear about what population you’re estimating the effect for.

These four mistakes account for probably 90% of problematic IV applications.

Avoid them, and you’re way ahead of most researchers.


When to Use IV: A Decision Guide

So when should you actually use instrumental variables?

✓ Use IV When ALL of These Are True:

1. You have endogeneity

Obviously—if you don’t have endogeneity, you don’t need IV. Just use regular OLS.

2. You can’t run an experiment

If you CAN run a randomized experiment (like an A/B test), do that instead!

Experiments are the gold standard. They completely eliminate endogeneity. IV is for when experiments aren’t possible.

3. You have a valid instrument

Both requirements must be satisfied:

  • Strong relevance (F > 10)
  • Plausible exclusion (passes the logic test)

4. Your sample size is large enough

IV needs more data than OLS to give you precise estimates.

If all four conditions hold: IV is a great choice ✓

✗ Don’t Use IV When:

1. You CAN run a randomized experiment

Seriously—if you can randomize, do that instead. It’s:

  • Simpler
  • More credible
  • Easier to interpret

2. Your instrument is weak

F-statistic < 10? Stop. Find a better instrument or use a different approach.

3. You can’t justify the exclusion restriction

If there are obvious direct pathways from your instrument to the outcome, your IV estimates will be biased.

Don’t pretend the problem doesn’t exist.

4. Simple controls would fix the endogeneity

Sometimes you can just add a few control variables to your regression and the endogeneity goes away.

Don’t overcomplicate things if you don’t need to.

The Quick Decision Tree

Question 1: Do you have endogeneity?

  • No → Use OLS
  • Yes → Continue to Question 2

Question 2: Can you run an experiment?

  • Yes → Do that! (Don’t use IV)
  • No → Continue to Question 3

Question 3: Do you have a valid instrument?

  • No → Try other methods (fixed effects, matching, etc.)
  • Yes → Continue to Question 4

Question 4: Is your F-statistic > 10?

  • No → Don’t use IV
  • Yes → Proceed with IV ✓

This simple flowchart will save you from most IV mistakes.
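The four questions above can be encoded as a small helper function, which makes the logic unambiguous. The function name and return strings are illustrative, not from any library.

```python
def iv_decision(endogeneity: bool, can_experiment: bool,
                valid_instrument: bool, f_statistic: float) -> str:
    """Encode the guide's four-question decision tree (names are illustrative)."""
    if not endogeneity:
        return "use OLS"
    if can_experiment:
        return "run a randomized experiment"
    if not valid_instrument:
        return "try other methods (fixed effects, matching, ...)"
    if f_statistic <= 10:
        return "do not use IV: weak instrument"
    return "proceed with IV"

# The article's distance instrument: endogeneity, no experiment, valid Z, F = 45.2
print(iv_decision(True, False, True, 45.2))   # proceed with IV
```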


IV vs. Other Solutions for Endogeneity

Let’s put instrumental variables in context by comparing it to alternative approaches.

Comparison Table

| Method                    | What It Fixes                    | What It Requires                              | When to Use                                              |
|---------------------------|----------------------------------|-----------------------------------------------|----------------------------------------------------------|
| Instrumental Variables    | All types of endogeneity         | Valid instrument                              | Can't experiment but have a good instrument              |
| Fixed Effects             | Time-invariant omitted variables | Panel data                                    | Have panel data; endogeneity from time-invariant factors |
| Randomized Experiments    | All types of endogeneity         | Ability to randomize                          | Whenever possible (gold standard)                        |
| Difference-in-Differences | Omitted variable bias            | Treatment/control groups + before/after data  | Natural experiment setting                               |
| Matching                  | Selection on observables         | Rich set of observable characteristics        | Endogeneity from measurable confounders                  |

The Key Insight

IV is powerful but demanding.

  • It can solve ANY type of endogeneity (reverse causality, omitted variables, measurement error)
  • But it requires a genuinely good instrument

When to Choose What

If you have panel data: Fixed effects might be easier

If you can run an experiment: Do that—it’s the best option

If you have a natural experiment: Use difference-in-differences

IV is for when:

  • Those other options aren’t available
  • But you DO have a valid instrument

Don’t use IV just because:

  • It sounds fancy
  • A reviewer asked for it
  • You want to look sophisticated

Use it because it’s the right tool for your specific situation.


Strategies for Finding Good Instruments

The most common question: “How do I actually FIND a good instrument?”

Where do these magical Z variables come from?

Strategy #1: Look for Natural Experiments

What they are: Situations where treatment is assigned based on rules or circumstances that are basically random.

Examples:

  • Policy changes affecting some groups but not others
  • Geographic variation (distance, climate, time zones)
  • Random events (lotteries, natural disasters)

Famous example: Using birthday quarter as an instrument for education

Compulsory schooling laws meant people born in certain months could drop out earlier. This created random variation in education.

Strategy #2: Exploit Timing Variation

Look for:

  • When did different people/companies adopt a technology?
  • When were different policies rolled out?
  • Staggered implementation schedules

Example: If a policy was implemented in different states at different times, that timing can be your instrument.

Strategy #3: Use Decision-Maker Characteristics

Consider:

  • Who made the decision?
  • What was their background?
  • Manager background or experience
  • Organizational factors (company size, age, industry)
  • Leadership changes

Example: New CEO from a tech background might affect IT spending. CEO’s background could be your instrument.

Strategy #4: Historical Accidents

Look for:

  • Distance to historical infrastructure
  • Legacy of past policies
  • Random assignment in past programs no longer active

Classic example: Distance to historical railroad lines as an instrument for modern economic development.

The railroad is long gone, but the distance still matters.

The Key Requirements

Find something that:

  1. Creates meaningful variation in your X variable (relevance)
  2. Plausibly has no direct effect on your Y outcome (exclusion)

The Reality

This is a creative process.

You need to know:

  • Your data
  • Your context
  • Your history

The best instruments often come from really deep knowledge of your specific domain.

And honestly? Most of the time, you won’t find a perfect instrument.

That’s okay. The question is whether your instrument is good enough that IV gives you better estimates than the alternatives.


Key Takeaways

Six essential points to remember about instrumental variables:

1. IV Solves Endogeneity by Using Clean Variation

It isolates the part of X that’s not contaminated by the error term.

That’s the whole point.

2. Two Requirements: Relevance & Exclusion

Relevance: Testable with the F-statistic (must be > 10)

Exclusion restriction: NOT testable—you have to think your way through it

3. The Method: Two-Stage Least Squares

Stage 1: Predict X from Z

Stage 2: Use predicted X to predict Y

Two stages, two regressions.

4. Test Relevance with F-Statistic > 10

This is not negotiable.

Below 10 = weak instrument

Weak instruments are worse than useless.

5. Good Instruments Are RARE

Don’t force it.

If you don’t have a plausible instrument:

  • Be honest about it
  • Use a different method
  • Acknowledge the limitations of OLS

It’s better to be transparent about endogeneity than to use a bad instrument and pretend you’ve solved the problem.

6. IV Estimates LATE (Local Average Treatment Effect)

It tells you the effect for people who were influenced by your instrument.

Not necessarily everyone in the population.

This is fine, but be clear about what you’re estimating.


Conclusion: The Power and Challenge of IV

Think of instrumental variables like using a clean water source.

The Analogy

When you have endogeneity:

  • Your X variable is like a contaminated water supply
  • Some water is clean, some is polluted
  • If you drink from it (OLS), you get sick
  • You get biased estimates

IV is like filtering the water:

  • You find a clean source (your instrument)
  • You use ONLY water from that source
  • You filter out all the contaminated variation

The Tradeoff

It’s not perfect.

You end up with less water than you started with—that’s why IV has larger standard errors than OLS.

But the water you DO have is clean and safe.

The Core Principle

It’s better to have:

  • A smaller amount of clean water
  • Than a larger amount of contaminated water

Similarly:

  • Less precise but UNBIASED estimates
  • Are better than very precise but BIASED estimates

The Power of IV

Master instrumental variables, and you unlock:

  • Causal effects even when experiments are impossible
  • Rigorous causal inference with observational data
  • Solutions to endogeneity in real-world settings

Use It Wisely

This is an incredibly powerful tool.

Use it:

  • Wisely
  • Carefully
  • Only when you have a genuinely valid instrument

The Critical Question

Always ask yourself: Is my instrument valid?

Both requirements:

  • Relevance (F > 10) ✓
  • Exclusion (passes logic test) ✓

Don’t skip that step!


Final Thoughts

Instrumental variables represents both:

  • Science: Technical skills to run regressions and interpret statistics
  • Art: Domain knowledge and careful reasoning about exclusion

The best IV applications combine both.

Good luck with your research, and remember: validity first, always.

Quick Recap

The Problem It Solves: In regression, we need our independent variable (X) to be unrelated to the error term. But often X is “endogenous” — it’s correlated with unobserved factors hiding in the error term. This biases our estimates.

The Intuition: Imagine you want to know if education (X) causes higher wages (Y). But ability is unobserved — smart people get more education and earn more, so your estimate is biased. An instrument (Z) is a variable that affects education but has no direct effect on wages except through education. A classic example: distance to the nearest college. People who live closer to colleges get more education, but distance itself doesn’t make you earn more.

How It Works (Two-Stage Least Squares / 2SLS):

  • Stage 1: Regress X on Z (predict education using distance to college). This isolates the “clean” variation in X that comes only from Z.
  • Stage 2: Regress Y on the predicted X from Stage 1. This gives you an unbiased causal estimate.

Key Requirements for a Valid Instrument:

  • Relevance: Z must actually affect X (testable — check the first-stage F-statistic, should be >10)
  • Exclusion Restriction: Z affects Y only through X (not directly). This is assumed and argued theoretically — you can’t test it directly.

In IS Research: Papers use instruments like regulatory shocks, geographic distance, or industry-level averages to address endogeneity in IT investment studies.

