STAT111 2019-02-07
Zhu, Justin

Thu, Feb 7, 2019


The bias alone doesn’t tell you how good your estimator of $\theta$ is

The scaling constant $\alpha$ (as in the variance estimator $S^2_{\alpha}$)

What is the bias? Imagine a million scientists, each with a sample of size $n$, each computing the estimator. How close, on average, is that collection of estimates to the true population value? We can answer this with probability.
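
In symbols, the bias of $\hat{\theta}$ is the difference between that long-run average and the estimand:

$$Bias(\hat{\theta}) = E(\hat{\theta}) - \theta$$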

Standardization – the highest value is scaled to 1.
Normalization – the total area is scaled to 1.

$$\sum_{i=1}^n (Y_i - \mu)^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 + n(\bar{Y} - \mu)^2$$
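
One way to verify this identity is to write $Y_i - \mu = (Y_i - \bar{Y}) + (\bar{Y} - \mu)$ and expand the square; the cross term vanishes because $\sum_{i=1}^n (Y_i - \bar{Y}) = 0$:

$$\sum_{i=1}^n (Y_i - \mu)^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 + 2(\bar{Y} - \mu)\sum_{i=1}^n (Y_i - \bar{Y}) + n(\bar{Y} - \mu)^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2 + n(\bar{Y} - \mu)^2$$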

Bias does not tell you how close $\hat{\theta}$ is to the estimand $\theta$, so we want some guarantee that $\hat{\theta}$ is close to $\theta$.

Count up the data points that fall in the interval around $y$, and then divide by $h$, the width (diameter) of the interval, and by $n$ so the count becomes a proportion.

A large $h$ picks up many $Y_i$ near $y$, while a small $h$ picks up only a few.
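
A minimal sketch of this kind of local-average density estimator, assuming a rectangular window of width $h$ centered at $y$ (the function and variable names are illustrative, not from lecture):

```python
import numpy as np

def density_estimate(samples, y, h):
    """Naive density estimate at y: the count of samples within a window of
    width h centered at y, divided by n*h so the result is a density."""
    samples = np.asarray(samples)
    count = np.sum(np.abs(samples - y) <= h / 2)
    return count / (len(samples) * h)

# Example: estimate the standard normal density at y = 0.5 with two bandwidths.
rng = np.random.default_rng(0)
ys = rng.normal(size=1000)
print(density_estimate(ys, 0.5, h=0.2))  # small h: low bias, high variance
print(density_estimate(ys, 0.5, h=2.0))  # large h: smoother, more bias
```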

The S&P 500 is a market price index covering 500 of the largest US-listed companies.

The index is weighted by market value, so big companies like GE have a large weight. $P_i$ is the level of the index, with dividends reinvested in the S&P 500.

S&P stands for Standard & Poor’s. The “arithmetic return” is $y_i = 100 \left( \frac{P_{i+1} - P_i}{P_i} \right), \; i = 1, 2, \cdots, n$
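
A small sketch of this formula, assuming `prices` holds the index levels $P_1, \ldots, P_{n+1}$ in order (the names are illustrative):

```python
import numpy as np

def arithmetic_returns(prices):
    """Percent arithmetic returns: y_i = 100 * (P_{i+1} - P_i) / P_i."""
    prices = np.asarray(prices, dtype=float)
    return 100 * (prices[1:] - prices[:-1]) / prices[:-1]

# Example with made-up index levels.
print(arithmetic_returns([2700.0, 2727.0, 2713.4]))  # approx [1.0, -0.499]
```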

For large $h$, $\hat{f}_n(y)$ averages over a large number of $Y_i$ around $y$. We plot $\hat{f}_n(y)$ against $y$.

If $h$ is very large, essentially all the data get averaged at every point; if $h$ is small, the plot has more jagged lines. We can also look at the expected value $E[\hat{f}_n(y)]$.

$h$ controls how many data points we are averaging over. The bigger $h$ is, the larger the bias term involving $f''_{Y_1}(y)$ becomes.

Bias is small if h is small and it does not depend on n. Meanwhile, variance is large if h is small and small if n is large.

$\bar{Y}$ is unbiased for the estimand $\theta = E(Y_1)$ so long as the mean exists. $\bar{Y}$ is also unbiased for the estimand $F^{-1}(1/2)$ (the median) if the distribution of $Y_1$ is symmetric, since the mean then equals the median.

Consider the family of estimators $S^2_{\alpha} = \frac{1}{\alpha}\sum_{i=1}^n (Y_i - \bar{Y})^2$ for $\sigma^2$, where the scaling constant $\alpha$ can be, for example, $n-1$, $n$, or $n+1$.

Check: What is the mathematical definition of bias?

What does bias mean to a non-statistician? Why is bias not always bad?

The term “bias” might lead us to think that assessing bias is the best way to assess whether $\hat{\theta}$ is close to $\theta$, but it is not.

A standard way to assess $\hat{\theta}$ is the squared error loss $L(\theta, \hat{\theta}) = (\hat{\theta} - \theta)^2$. Suppose the $Y_i$ are i.i.d. $N(\mu,\sigma^2)$.

The loss function $L(\theta, x) \geq 0$ is convex in $x$, with the property that $L(x, x) = 0$ for all $x$.

What would be the most natural loss function? This depends on the domain science. Absolute error $|\hat{\theta} - \theta|$ is one natural choice.

If $\hat{\theta}$ is off from $\theta$ by one unit, the squared error $(\hat{\theta} - \theta)^2$ is one way to assess it; larger errors are penalized more than proportionally.

The squared error loss is $(\hat{\theta} - \theta)^2$. The MSE is the mean value of $L(\theta, \hat{\theta})$, with $\theta$ treated as a constant.
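
With squared error loss, the MSE decomposes into variance plus squared bias (a standard identity, stated here for reference):

$$MSE(\theta, \hat{\theta}) = E[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + \big(E(\hat{\theta}) - \theta\big)^2$$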

The estimator $S_n^2$ is both a method of moments estimator and an MLE of $\sigma^2$.

For i.i.d. normal data, the variance of the sum of squares $\sum_{i=1}^n (Y_i - \bar{Y})^2$ is $2\sigma^4(n-1)$, since the sum of squares is distributed as $\sigma^2$ times a $\chi^2_{n-1}$ random variable.
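
From this, the MSEs of the scaled estimators follow directly (worked out here; it matches the comparison below). $S^2_{n-1}$ is unbiased, so

$$MSE(\sigma^2, S^2_{n-1}) = Var(S^2_{n-1}) = \frac{2\sigma^4(n-1)}{(n-1)^2} = \frac{2\sigma^4}{n-1}$$

For $S^2_n$, we have $E(S^2_n) = \frac{n-1}{n}\sigma^2$, so the bias is $-\sigma^2/n$ and

$$MSE(\sigma^2, S^2_n) = \frac{2\sigma^4(n-1)}{n^2} + \frac{\sigma^4}{n^2} = \frac{2n-1}{n^2}\sigma^4$$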

If h is small, $MSE$ is dominated by variance. If h is large, the MSE is dominated by bias.

As $n$ gets large, the variance decreases, but the bias does not: for a fixed small $h$, the estimator still only uses the data in a very small set around $y$. Be prepared to calculate the MSE of an estimator, using the concepts learned in Stat 110.

There is a trade-off between bias and variance: it can be worth increasing the variance if it decreases the bias enough (and vice versa). We talked about bias-variance tradeoffs.

Bias/Variance tradeoff

Compare the MSE of the MLE ($S^2_n$) with that of the more common estimator ($S^2_{n-1}$): the more common estimator has the larger MSE, since $\frac{2\sigma^4}{n-1} > \frac{2n - 1}{n^2}\sigma^4$.

I think the MSE for estimating $\sigma^2$ is bigger with $S^2_{n-1}$.

It turns out that: $$MSE(\sigma^2, S^2_{n-1}) > MSE(\sigma^2, S^2_n) > MSE(\sigma^2, S^2_{n+1})$$
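
A quick simulation sketch (assuming i.i.d. normal data; the helper name `mse_of_scaled_variance` is illustrative) that checks this ordering numerically:

```python
import numpy as np

def mse_of_scaled_variance(alpha, n=10, sigma2=1.0, reps=200_000, seed=0):
    """Monte Carlo MSE of S^2_alpha = (1/alpha) * sum_i (Y_i - Ybar)^2 for N(0, sigma2) data."""
    rng = np.random.default_rng(seed)
    y = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
    ss = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return np.mean((ss / alpha - sigma2) ** 2)

n = 10
for alpha in (n - 1, n, n + 1):
    print(alpha, mse_of_scaled_variance(alpha, n=n))
# The printed MSEs should decrease as alpha goes from n-1 to n to n+1,
# roughly 2/9 > 19/100 > 2/11 for n = 10 and sigma^2 = 1.
```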

In practice, $S^2_{n-1}$ is the most common.

Why use $S^2_{n-1}$ despite its worse MSE? First, the MSE has a $\sigma^2$ in it, which we do not know. Second, estimating $\sigma^2$ is usually not the focus: you know your estimator and you really want to figure out what the mean is. Most of the time we estimate the population variance in order to do something else, and when you use this estimator you can get really nice intervals that include the true estimand (confidence intervals), with the focus on the mean of the normal. If an estimator with a poorer MSE gives nicer confidence sets, then we go for it! It’s a helper estimator.

So far we have focused on a single point, say $y = \frac{1}{2}$. To estimate the whole function, we make a global tradeoff by taking the integral over $y$. This gives us the integrated MSE: the integral of the squared bias plus the variance. We then choose the $h$ with the smallest integrated MSE.

The optimal width (diameter) of the interval scales as $h = c_3 n^{-1/5}$.
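
A sketch of where the $n^{-1/5}$ rate comes from, assuming the usual approximations bias $\approx c_1 h^2$ and variance $\approx c_2/(nh)$ for constants $c_1, c_2$ (these constants are not from lecture):

$$MSE(h) \approx c_1^2 h^4 + \frac{c_2}{nh}, \qquad \frac{d}{dh} MSE(h) = 4 c_1^2 h^3 - \frac{c_2}{n h^2} = 0 \;\Rightarrow\; h = \left( \frac{c_2}{4 c_1^2 n} \right)^{1/5} \propto n^{-1/5}$$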

Then our MSE shrinks as $n$ grows. Suppose $n$ is massive: the optimal $h$ gets smaller, and as $h$ gets smaller the bias decreases.

Explain to a non-statistician why and when you might prefer a biased estimator. Explain to a statistician why and when you might prefer a biased estimator.