A detailed look at "estimator", a term that sounds clear until you try to pin it down

|| A value that says "the answer is probably about this"

The quantity obtained by somehow *inferring* a "parameter (a property of the whole population)" from a "sample (part of the data)".

Estimated Value

|| The difference between an estimator and an estimate

The "predicted value $\hat{μ}$" of a "parameter $μ$".

$\begin{array}{llllll} \displaystyle \hat{μ}&=&\displaystyle\frac{5+7+3+3+7+9+2+3+4+6}{10} \end{array}$

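An *estimator* is built from "random variables (values that could come out)",

while an *estimate* is built from "constants (computed from an actual sample)".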

$\begin{array}{llllll} \displaystyle E[\overline{X}]&=&μ \end{array}$

Here the random variable $\overline{X}$ inside the expectation is the estimator, and $E[\overline{X}]$ is its expected value.
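The estimator/estimate distinction can be sketched in code: the estimator is the rule (a function of the sample), and the estimate is the number that rule returns for one observed sample. The names `sample_mean`, `observed`, and `mu_hat` are illustrative assumptions, and the data are the ten values from the example above.

```python
from statistics import fmean

def sample_mean(xs):
    """Estimator: a rule that maps any sample to a number."""
    return fmean(xs)

# The observed sample from the example above.
observed = [5, 7, 3, 3, 7, 9, 2, 3, 4, 6]

# Estimate: the constant obtained by applying the rule to this one sample.
mu_hat = sample_mean(observed)
print(mu_hat)
```

A different sample would give a different estimate, while the estimator (the function) stays the same.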

Error

|| The difference between the true value and the guess

The "gap between the prediction and reality".

$\begin{array}{llllll}\mathrm{Error}&=& \displaystyle μ-θ_{\mathrm{est}} \end{array}$

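$μ$ is the parameter,

and $θ_{\mathrm{est}}$ is the symbol used here for a point estimator.

The estimate is computed from a finite sample and the true value $μ$ is not known exactly,

so the error is usually not exactly $0$.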

Unbiasedness

|| Not being skewed

"The expected value of the guess equals the true value."

$\begin{array}{llllll} \displaystyle E[θ_{\mathrm{est}}]-θ_{\mathrm{true}}&=&0 \end{array}$

When the expected value $E[θ_{\mathrm{est}}]$ of the guess $θ_{\mathrm{est}}$

equals the actual value (the parameter) $θ_{\mathrm{true}}$,

we say the estimator "is unbiased".

Bias

|| Not evenly spread out

The sense that "one particular kind of information is over-represented".

$\begin{array}{llllll} \displaystyle \mathrm{Bias}(θ_{\mathrm{est}})&=&E[θ_{\mathrm{est}}]-θ_{\mathrm{true}} \end{array}$

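When there is bias,

this value $\mathrm{Bias}(θ_{\mathrm{est}})$ is not $0$.

The idea itself is simple.

If the expected value of the guess $θ_{\mathrm{est}}$ differs from the parameter,

$\begin{array}{llllll} \displaystyle E[θ_{\mathrm{est}}]&=&θ_{\mathrm{true}}±α \end{array}$

then something that went into $θ_{\mathrm{est}}$ must be off.

Taking the mean as an example, suppose one observation $x_1$ has a shifted expectation: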

$\begin{array}{llllll} \displaystyle E[x_1]&=&μ±ε \end{array}$

$\begin{array}{llllll} \displaystyle E[\overline{x}]&=&\displaystyle E\left[ \frac{x_1+x_2+x_3+\cdots+x_n}{n} \right] \\ \\ &=&\displaystyle \frac{1}{n}E\left[ \textcolor{pink}{x_1}+x_2+x_3+\cdots+x_n \right] \\ \\ \\ &=&\displaystyle \frac{1}{n}\left(\textcolor{pink}{μ±ε} +μ+μ+\cdots+μ \right) \\ \\ &≠&μ\end{array}$

In that case the sample contains a group $x_1$ that differs substantially from the population.

In other words, it is a "biased sample".

This matches intuition.

When values far from the mean $μ$

pile up in $x_1$,

$\begin{array}{llllll} \displaystyle E[x_1]&≠&μ \end{array}$

the guess ends up differing from the parameter.

As a definition of "bias",

this should not be far from what intuition suggests.
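As a rough numerical check of the biased-sample argument, the sketch below simulates a sample mean in which one observation $x_1$ comes from a shifted distribution. All numbers (`MU`, `SIGMA`, `SHIFT`, `N`, `TRIALS`) are hypothetical choices, and the normal distribution is an assumption made only for illustration. The derivation above suggests a bias of about $ε/n$.

```python
import random
from statistics import fmean

random.seed(0)

MU, SIGMA = 10.0, 1.0   # hypothetical population mean and standard deviation
SHIFT = 5.0             # hypothetical shift of the one "off" observation (the ε above)
N, TRIALS = 5, 20000

def biased_sample_mean():
    # x_1 comes from a shifted distribution; x_2..x_n come from the population.
    xs = [random.gauss(MU + SHIFT, SIGMA)]
    xs += [random.gauss(MU, SIGMA) for _ in range(N - 1)]
    return fmean(xs)

# Average the estimator over many repetitions to approximate E[x_bar] - mu.
estimated_bias = fmean([biased_sample_mean() for _ in range(TRIALS)]) - MU
print(estimated_bias, SHIFT / N)
```

The simulated bias should land close to `SHIFT / N`, that is, the shift diluted by the sample size.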

Consistency

|| Probably getting close to the right answer

"More data makes the guess more accurate."

$\begin{array}{llllll} \displaystyle ∀ε>0&\Bigl( \displaystyle\lim_{n \to \infty}P\Bigl( |θ_{\mathrm{est}}(n)-θ_{\mathrm{true}}|>ε \Bigr)&=&0 \Bigr) \end{array}$

This is fairly intuitive.

"Increase the sample size" and "a value close to the parameter comes out".

$\begin{array}{rlllll} \displaystyle P\Bigl( |\overline{X_n}-μ|≥ε \Bigr)&≤&\displaystyle\frac{σ^2}{nε^2} \\ \\ \\ \displaystyle \lim_{n \to \infty}P \Bigl( |\overline{X_n}-μ|>ε \Bigr)&=&0&&(∀ε>0) \end{array}$

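This is the so-called (weak) law of large numbers; the first line is Chebyshev's inequality, which requires a finite variance $σ^2$.

Consistency

is the concept that expresses this natural intuition.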
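The inequality above can be checked numerically. The following is a minimal simulation sketch, assuming a Uniform(0, 1) population (mean 0.5, variance 1/12); `EPS`, `TRIALS`, and the sample sizes are arbitrary illustrative choices. It compares how often the sample mean misses $μ$ by more than $ε$ with the Chebyshev bound $σ^2/(nε^2)$.

```python
import random
from statistics import fmean

random.seed(0)

MU, VAR = 0.5, 1 / 12      # mean and variance of Uniform(0, 1)
EPS, TRIALS = 0.1, 2000    # hypothetical tolerance and number of repetitions

def miss_frequency(n):
    """Fraction of trials in which |sample mean - MU| > EPS."""
    misses = 0
    for _ in range(TRIALS):
        xbar = fmean([random.random() for _ in range(n)])
        if abs(xbar - MU) > EPS:
            misses += 1
    return misses / TRIALS

sizes = [10, 100, 1000]
freqs = [miss_frequency(n) for n in sizes]
bounds = [VAR / (n * EPS**2) for n in sizes]   # Chebyshev upper bounds
for n, f, b in zip(sizes, freqs, bounds):
    print(n, f, b)
```

The observed frequencies shrink as `n` grows and stay below the bound, which is usually quite loose.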

Efficiency

|| High precision of the guess

"Almost no estimation error."

$\begin{array}{llllll} \displaystyle E\Bigl[ (θ_{\mathrm{est}}-θ_{\mathrm{true}})^2 \Bigr] \end{array}$

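This is defined by requiring that the spread of the estimator around the parameter be small

(a small mean squared error; for an unbiased estimator this is simply its variance).

It is, so to speak,

a criterion for evaluating the outcome of sampling: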

$\begin{array}{llllll} \displaystyle (θ_{\mathrm{est}}-θ_{\mathrm{true}})^2 \end{array}$

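A good method gives a small estimation error.

A poor one gives a large estimation error.

This, too, sounds obvious.

Because one has to find the minimum over "variance" and "bias",

the calculation gets fairly involved.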
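To make "a good method gives a small error" concrete, the sketch below compares two estimators of the center of a population by their simulated mean squared error. The normal population, `N`, and `TRIALS` are assumptions chosen for illustration; for normal data the sample mean is expected to beat the sample median.

```python
import random
from statistics import fmean, median

random.seed(0)

TRUE_MU, SIGMA = 0.0, 1.0   # hypothetical normal population
N, TRIALS = 25, 4000

def mse(estimator):
    """Monte Carlo estimate of E[(theta_est - theta_true)^2]."""
    errors = []
    for _ in range(TRIALS):
        sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
        errors.append((estimator(sample) - TRUE_MU) ** 2)
    return fmean(errors)

mse_mean = mse(fmean)
mse_median = mse(median)
print(mse_mean, mse_median)
```

Under these assumptions the mean is the more efficient of the two; with heavy-tailed data the ranking can flip, which is where robustness comes in.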

Robustness

|| Not being swayed by the surroundings

The property of being "hard to influence".

$\begin{array}{llllll} \displaystyle \mathrm{Max}(X) &\mathrm{Min}(X) \\ \\ \mathrm{Median}(X) &\mathrm{Quantile}_4(X)\\ \\ \mathrm{Mode}(X) \end{array}$

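The "median $\mathrm{Median}$", the "quantiles $\mathrm{Quantile}$" (here quartiles),

and the "mode $\mathrm{Mode}$" are the statistics that typically have this property.

The "maximum $\mathrm{Max}$" and "minimum $\mathrm{Min}$" are listed alongside them, but a single extreme observation changes them completely, so they are not robust to outliers.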

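The idea is to set these apart from statistics such as the mean,

which are strongly affected by the shape of the distribution.

For example, when compiling statistics on annual income,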

$\begin{array}{llllll} \displaystyle \mathrm{Mean}(X)&=&\displaystyle\frac{200+300+250+\cdots+10^5+10^7+\cdots}{n}\times 10^4 \end{array}$

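the mean is strongly pulled by large values,

so it can lead to a result that differs from what you actually wanted.

If what you want from the "mean" here

is the role of "representing middle-class income",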

$\begin{array}{llllll} \displaystyle \mathrm{Mean}(X)&=&552\times 10^4 \\ \\ \mathrm{Median}(X)&=&437\times 10^4 \end{array}$

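then the mean does not

fully play that role.

This is where something like the median comes in.

It is far less affected by the shape of the distribution, in particular by a few extreme values,

so it keeps returning a result that fits its role.

This is the sense in which it "is robust",

and when exceptions (outliers) have a strong influence,

it is something you need to keep in mind.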

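Likewise, the mode keeps its role and meaning

across a wide range of distributions.

The maximum and minimum keep their literal meaning too, but their values are determined entirely by the most extreme observations.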
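A small sketch of the income example: the figures below are made-up values (in units of $10^4$ yen), not real statistics, and the variable names are illustrative. Adding one extremely high earner moves the mean and the maximum a great deal, while the median barely moves.

```python
from statistics import fmean, median

# Hypothetical annual incomes in units of 10^4 yen (not real data).
incomes = [200, 300, 250, 400, 350, 450, 500]
with_outlier = incomes + [100_000]   # one extremely high earner joins

# Print each statistic before and after the outlier is added.
for name, stat in [("mean", fmean), ("median", median), ("max", max)]:
    print(name, stat(incomes), stat(with_outlier))

mean_shift = fmean(with_outlier) - fmean(incomes)
median_shift = median(with_outlier) - median(incomes)
```

Here `median_shift` stays small because the median only depends on the middle of the sorted data.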

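To sum up roughly:

"unbiasedness" means the expected value of the estimator equals the parameter;

"consistency" means the estimator approaches the parameter as the data grow;

"efficiency" means the smaller the estimator's error, the better;

"robustness" means the value keeps a stable meaning.

When, for the way the sample is collected,

"unbiasedness", "consistency", and "efficiency" are all guaranteed,

the estimator yields values very close to the parameter, given enough data.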

Unbiased Estimator

|| An estimator defined through bias

A kind of estimator defined in terms of "bias".

$\begin{array}{llllll} \displaystyle E[θ_{\mathrm{est}}]&=&θ_{\mathrm{true}} \end{array}$

It is just an expected-value calculation.

The equation above says that "there is no bias".

The sample mean and the unbiased estimator of the mean

From a sample obtained by random sampling,

let's consider an unbiased estimator of the mean $E[X]=μ$.

$\begin{array}{llllll} \displaystyle E[\overline{X}]&=&\displaystyle E\left[\frac{X_1+X_2+...+X_n}{n}\right] \\ \\ &=&\displaystyle \frac{E[X_1]+E[X_2]+...+E[X_n]}{n} \\ \\ \\ &=&\displaystyle \frac{μ+μ+...+μ}{n} \\ \\ &=& μ \end{array}$

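When the sample is drawn at random, every $X_i$ has expectation $μ$,

so the expected value of the sample mean $\overline{X}$ is

$\begin{array}{llllll} E[\overline{X}]&=&μ \end{array}$

This holds exactly for every sample size $n$, not just approximately or in the limit.

$\overline{X}$

This statistic $\overline{X}$ is therefore

an unbiased estimator of the mean $μ$.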

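Unbiased variance

The sample mean coincides with the unbiased estimator of the mean,

but the unbiased variance does not coincide with the sample variance.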

$\begin{array}{llllll} \displaystyle s^2&=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}\Bigl( x_i-\overline{x} \Bigr)^2 \\ \\ σ^{2}_{\mathrm{est}}&=&\displaystyle\frac{1}{n-1}\sum_{i=1}^{n}\Bigl( x_i-\overline{x} \Bigr)^2 \end{array}$

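This follows from how the unbiased variance is defined:

the expected value of the sample variance does not equal the population variance, so we require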

$\begin{array}{llllll} \displaystyle E[σ^{2}_{\mathrm{est}}]&=&σ^2 \end{array}$

and the correction needed to make this hold necessarily leads to that form.

Let's work through the calculation.

$\begin{array}{llllll} \displaystyle \displaystyle E[s^2]&=&\displaystyle E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2\right] \\ \\ &=&\displaystyle \frac{1}{n} E \left[\sum_{i=1}^{n}(x_i-\overline{x})^2\right] \end{array}$

This is getting cluttered,

so let's focus only on the quantity inside the expectation $E$.

$\begin{array}{llllll} \displaystyle \sum_{i=1}^{n}(x_i-\overline{x})^2 \end{array}$

As you can probably see, though,

as it stands the population variance $σ^2$ does not appear in the expression.

$\begin{array}{llllll}\displaystyle \sum_{i=1}^{n}(x_i-\overline{x})^2&=& \displaystyle\sum_{i=1}^{n}\Bigl( x_i-\overline{x}+(μ-μ) \Bigr)^2 \\ \\ &=&\displaystyle\sum_{i=1}^{n}\Bigl( (x_i-μ)-(\overline{x}-μ) \Bigr)^2 \end{array}$

That is why the population mean $μ$

has to be worked into the calculation (adding and subtracting it changes nothing).

$\begin{array}{llllll} \displaystyle \sum_{i=1}^{n}\Bigl( x_i-\overline{x} \Bigr)^2&=&\displaystyle\sum_{i=1}^{n}\Bigl( (x_i-μ)-(\overline{x}-μ) \Bigr)^2 \\ \\ &=&\displaystyle \sum_{i=1}^{n}\Bigl( (x_i-μ)^2-2(x_i-μ)(\overline{x}-μ)+(\overline{x}-μ)^2 \Bigr) \\ \\ &=&\displaystyle \sum_{i=1}^{n}(x_i-μ)^2-2\sum_{i=1}^{n}(x_i-μ)(\overline{x}-μ)+\sum_{i=1}^{n}(\overline{x}-μ)^2 \end{array}$

Next, from the definition of the sum,

$\begin{array}{llllll} \displaystyle \sum_{i=1}^{n} 1&=&\displaystyle \overbrace{1+1+1+1\cdots+1+1}^n\\ \\ &=& n \end{array}$

$\begin{array}{llllll} \displaystyle \sum_{i=1}^{n}\Bigl(x_i\Bigr)-nμ&=&\displaystyle n\left( \frac{1}{n}\sum_{i=1}^{n}\Bigl(x_i\Bigr) \right)-nμ \\ \\ &=&n(\overline{x}-μ) \end{array}$

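Because $\sum_{i=1}^{n}(x_i-μ)=n(\overline{x}-μ)$ and $(\overline{x}-μ)$ does not depend on $i$, the cross term equals $n(\overline{x}-μ)^2$, so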

$\begin{array}{llllll} \displaystyle \sum_{i=1}^{n}\Bigl( x_i-\overline{x} \Bigr)^2&=&\displaystyle\sum_{i=1}^{n}(x_i-μ)^2-2n(\overline{x}-μ)(\overline{x}-μ)+\sum_{i=1}^{n}(\overline{x}-μ)^2 \\ \\ &=&\displaystyle\sum_{i=1}^{n}(x_i-μ)^2-2n(\overline{x}-μ)^2+n(\overline{x}-μ)^2 \\ \\ &=&\displaystyle\sum_{i=1}^{n}(x_i-μ)^2-n(\overline{x}-μ)^2 \end{array}$

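which gives this,

and now the expected value can finally be computed.

Returning to the main calculation, and using the linearity of expectation,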

$\begin{array}{llllll} \displaystyle E[X+Y]&=&E[X]+E[Y] \end{array}$

$\begin{array}{llllll} \displaystyle E\left[\sum_{i=1}^{n}Y_i\right]&=&\displaystyle\sum_{i=1}^{n}E[Y_i] \end{array}$

$\begin{array}{llllll} \displaystyle \displaystyle E[s^2]&=&\displaystyle\frac{1}{n}E\left[\sum_{i=1}^{n}(x_i-μ)^2-n(\overline{x}-μ)^2\right] \\ \\ &=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}E\left[(x_i-μ)^2\right]-n\cdot\frac{1}{n}E\left[(\overline{x}-μ)^2\right] \\ \\ &=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}E\left[(x_i-μ)^2\right]-E\left[(\overline{x}-μ)^2\right]\end{array}$

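we get this,

and all that is left is the finishing touch.

From the definition of variance and the variance of the sample mean ($V[\overline{x}]=σ^2/n$ for independent observations),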

$\begin{array}{llllll} \displaystyle E[s^2] &=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}E\left[(x_i-μ)^2\right]-E\left[(\overline{x}-μ)^2\right] \\ \\ &=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}V\left[x_i\right]-V\left[\overline{x}\right] \\ \\ &=&\displaystyle\frac{1}{n}\cdot nσ^2-\frac{σ^2}{n} \\ \\ &=&\displaystyle \frac{n-1}{n}σ^2 \end{array}$

we obtain the expected value of the sample variance.

As is plain to see,

this expected value of the sample variance

differs from the population variance.

$\begin{array}{rlllll} \displaystyle E[s^2]&=&\displaystyle\frac{n-1}{n}σ^2 \\ \\ \displaystyle\frac{n}{n-1}E[s^2]&=&σ^2 \\ \\ \displaystyle E\left[\frac{n}{n-1}s^2\right]&=&σ^2 \end{array}$

Finally, this is what gives us the unbiased variance.

$\begin{array}{llllll} \displaystyle \displaystyle σ^2_{\mathrm{est}}&=&\displaystyle\frac{n}{n-1}s^2 \\ \\ &=&\displaystyle \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2\end{array}$

$\begin{array}{llllll} \displaystyle σ^2_{\mathrm{est}}&=&\displaystyle\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\overline{x})^2 \end{array}$

The calculation is fiddly,

but the conclusion is fairly simple.
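The result $E[s^2]=\frac{n-1}{n}σ^2$ can be verified exactly on a tiny example. The sketch below assumes a hypothetical three-value population sampled uniformly with replacement (so the draws are independent), enumerates every possible sample of size $n$, and averages both variance formulas using exact fractions. The helper name `sample_variance` and its `ddof` argument are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

population = [2, 4, 9]   # hypothetical population, each value equally likely
n = 3                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Every possible sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
expected_s2 = sum(sample_variance(s, 0) for s in samples) / len(samples)
expected_unbiased = sum(sample_variance(s, 1) for s in samples) / len(samples)

print(expected_s2, Fraction(n - 1, n) * sigma2)
print(expected_unbiased, sigma2)
```

Dividing by $n$ underestimates $σ^2$ by the factor $(n-1)/n$ on average, while dividing by $n-1$ matches it.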

Consistent Estimator

|| An estimator that keeps getting closer

"Increasing the sample size brings it closer to the parameter."

$\begin{array}{llllll} \displaystyle \overline{X_n}&=&\displaystyle\frac{x_1+x_2+...+x_n}{n} \end{array}$

$\begin{array}{llllll} \displaystyle \lim_{n\to\infty}\displaystyle\frac{x_1+x_2+...+x_n}{n}&=&μ \end{array}$

This conclusion comes straight from the law of large numbers:

$\begin{array}{llllll} \displaystyle P\Bigl( |\overline{X_n}-μ|≥ε \Bigr)&≤&\displaystyle\frac{σ^2}{nε^2} \end{array}$

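Given that it "converges in probability" (the limit above is meant in that sense),

the fact that the sample mean approaches the population mean

should be intuitively clear.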

Consistency of the sample variance

For the sample variance,

a short rewriting shows that it is consistent as well.

$\begin{array}{llllll} \displaystyle \displaystyle s^2&=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2 \\ \\ &=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}\Bigl( (x_i-μ)-(\overline{x}-μ) \Bigr)^2\end{array}$

With this in hand,

$\begin{array}{llllll} \displaystyle s^2&=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}\Bigl( (x_i-μ)-(\overline{x}-μ) \Bigr)^2 \end{array}$

$\begin{array}{llllll} \displaystyle \lim_{n\to\infty} \overline{x}&=&μ \end{array}$

$\begin{array}{llllll} \displaystyle \lim_{n\to\infty} \left( \frac{1}{n}\sum_{i=1}^{n} \Bigl( (x_i-μ)-(\overline{x}-μ) \Bigr)^2 \right) &=& σ^2 \end{array}$

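The intuition is as follows: expanding the square gives $s^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-μ)^2-(\overline{x}-μ)^2$, where the first term converges to $σ^2$ by the law of large numbers and the second converges to $0$ (both limits in probability).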

Checking the familiar variance formula

Let's confirm a form of the expected-value calculation that comes up often.

$\begin{array}{llllll} \displaystyle V[X]&=&E\left[ (X-μ)^2 \right] \\ \\ &=&E[X^2]-E[X]^2 \\ \\ \\ E[x_i-μ]&=&E[x_i]-μ \\ \\ &=&μ-μ \\ \\ &=&0 \end{array}$

As for rewriting the variance $V[X]$,

since $E[X]=μ$,

$\begin{array}{llllll} \displaystyle V[X]&=&E[(X-μ)^2] \\ \\ &=&E[X^2-2Xμ+μ^2] \\ \\ &=&E[X^2]-2μE[X]+μ^2 \\ \\ \\ &=&E[X^2]-2E[X]E[X]+E[X]^2 \\ \\ &=&E[X^2]-2E[X]^2+E[X]^2 \\ \\ &=& E[X^2]-E[X]^2 \end{array}$

it comes out like this.

This form shows up often enough to be worth remembering.
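As a quick sanity check of $V[X]=E[X^2]-E[X]^2$, the sketch below computes the variance of a fair six-sided die both ways with exact fractions. The die is an assumed example, not something from the text above.

```python
from fractions import Fraction

# A fair six-sided die: each face has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in faces)                 # E[X]
second_moment = sum(p * x * x for x in faces)    # E[X^2]

var_by_definition = sum(p * (x - mean) ** 2 for x in faces)   # E[(X - mu)^2]
var_by_shortcut = second_moment - mean ** 2                   # E[X^2] - E[X]^2
print(var_by_definition, var_by_shortcut)
```

Both routes give the same fraction, as the algebra above says they must.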

Efficient Estimator

|| Small error

"The estimator whose error is smallest."

$\begin{array}{llllll} \displaystyle E\Bigl[(θ_{\mathrm{est}}-θ_{\mathrm{true}})^2\Bigr] \end{array}$

The calculation uses a procedure

that goes by the name "bias-variance decomposition".

Bias-variance decomposition of the squared error

Input data $X=\{x_1,x_2,x_3,...,x_n\}$

Output obtained from this data $y=f(x)+ε$

$\begin{array}{llllll} \displaystyle y&=&f(x)+ε \\ \\ y_i&=&f(x_i)+ε \end{array}$

Suppose $ε$ is "noise" drawn from a distribution

with mean $0$ and variance $σ^2$

(if the variance were $0$ there would be no deviation, and hence no noise). Then we define

$\begin{array}{llllll} \displaystyle \mathrm{Bias}\left( \hat{f}(x) \right)&=&E\Bigl[ \hat{f}(x)\Bigr]-f(x) \\ \\ \mathrm{Var}\left( \hat{f}(x) \right)&=&\displaystyle E\left[ \Bigl(\hat{f}(x)-E\left( \hat{f}(x) \right) \Bigr)^2 \right] \\ \\ \mathrm{Noise}&=&σ^2 \end{array}$

When we want to estimate this $f(x)$

with a function $\hat{f}(x)$ that is as close to it as possible, the expected squared error decomposes as

$\begin{array}{llllll} E\left[ \Bigl( \hat{f}(x)-y \Bigr)^2 \right]&=&\displaystyle \mathrm{Noise}+\Bigl(\mathrm{Bias}\left(\hat{f}(x)\right)\Bigr)^2+\mathrm{Var}\left(\hat{f}(x)\right) \end{array}$

The quantity to minimize when looking for an efficient estimator

takes this form.

The noise cannot be removed,

but when the estimator is unbiased,

$\begin{array}{llllll} E\left[ \Bigl( \hat{f}(x)-y \Bigr)^2 \right]&=&\displaystyle \mathrm{Noise}+\mathrm{Var}\left(\hat{f}(x)\right) \end{array}$

the "bias $\mathrm{Bias}$" term vanishes.

Derivation

It is messy and rather tedious,

so we compute it in several pieces.

$\begin{array}{llllll} &&\displaystyle E\left[ \Bigl( \hat{f}(x)-y \Bigr)^2 \right] \\ \\ &=&\displaystyle E\left[ \Bigl( \hat{f}(x)-\left( f(x)+ε \right) \Bigr)^2 \right] \\ \\ &=&\displaystyle E\left[ \Bigl( \hat{f}(x)-\left( f(x)+ε \right)+\left( E\left[ \hat{f}(x) \right]-E\left[ \hat{f}(x) \right] \right) \Bigr)^2 \right] \end{array}$

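Working backwards from where we want to land,

we would like to split things into "variance" and "bias",

which is the idea behind adding and subtracting $E[\hat{f}(x)]$ like this.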

$\begin{array}{llllll} \displaystyle y-f(x)&=&ε \end{array}$

By the definition of the noise $ε$,

it is assumed to have mean $0$,

so positive and negative deviations cancel on average:

$\begin{array}{llllll} \displaystyle E[ε]&=&0 \end{array}$

And for the function $f(x)$ we are after, which is not random,

the expected value is naturally $E[f(x)]=f(x)$, so

$\begin{array}{llllll} \displaystyle E[y]&=&\displaystyle E\left[ f(x)+ε \right] \\ \\ &=&\displaystyle E\left[f(x) \right] +E[ε] \\ \\ &=&\displaystyle E\left[f(x) \right] \\ \\ &=&f(x) \\ \\ \\ \mathrm{Var}[y]&=&\displaystyle E\left[ (y-E[y])^2 \right] \\ \\ &=&\displaystyle E\left[ \left(y-f(x) \right)^2 \right] \\ \\ &=&\displaystyle E\left[ \left(f(x)+ε-f(x)\right)^2 \right] \\ \\ &=&\displaystyle E\left[ ε^2 \right] \end{array}$

$\begin{array}{llllll} \displaystyle \mathrm{Var}[ x ]&=&E\Bigl[ (x-μ)^2 \Bigr] \\ \\ &=&E\Bigl[ x^2-2μx+μ^2 \Bigr] \\ \\ &=&\displaystyle E[x^2]-2μE[x]+μ^2 \\ \\ \\ &=&\displaystyle E[x^2]-2μμ+μ^2 \\ \\ &=&\displaystyle E[x^2]-μ^2 \\ \\ &=&\displaystyle E[x^2]-\Bigl( E[x] \Bigr)^2 \end{array}$

$\begin{array}{llllll} \displaystyle E[ε^2]&=&\displaystyle \mathrm{Var}[ ε ]+\Bigl( E[ε] \Bigr)^2 \\ \\ &=&\displaystyle \mathrm{Var}[ ε ] \\ \\ &=&\displaystyle σ^2 \end{array}$

From the definitions of the noise and the variance,

we can see which parts can be dropped, shortened, or substituted

to make the calculation easier.

$\displaystyle E\left[ \Bigl( \hat{f}(x)-\left( f(x)+ε \right)+\left( E\left[ \hat{f}(x) \right]-E\left[ \hat{f}(x) \right] \right) \Bigr)^2 \right]$

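But this is a real chore.

Done head-on it is a lot of work,

so we would like to simplify the whole expression somehow.

There is a guide for doing so: regroup the terms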

$\begin{array}{llllll} &&\displaystyle \Bigl( \hat{f}(x)-\left( f(x)+ε \right)+\left( E\left[ \hat{f}(x) \right]-E\left[ \hat{f}(x) \right] \right) \Bigr)^2 \\ \\ &=&\displaystyle \Bigl( \hat{f}(x)-E\left[ \hat{f}(x) \right]+E\left[ \hat{f}(x) \right]- f(x)-ε \Bigr)^2 \end{array}$

like this.

Even so, there are many terms and it is still rather complicated.

$\begin{array}{llllll} \displaystyle A&=&\displaystyle\hat{f}(x)-E\left[ \hat{f}(x)\right] \\ \\ B&=&\displaystyle E\left[ \hat{f}(x) \right]- f(x) \end{array}$

$\begin{array}{llllll} \displaystyle \displaystyle \Bigl( A+B-ε \Bigr)^2 &=&A\Bigl( A+B-ε \Bigr) \\ \\ &&+B\Bigl( A+B-ε \Bigr) \\ \\ &&-ε\Bigl( A+B-ε \Bigr) \\ \\ \\ &=& A^2+AB-Aε \\ \\ &&+BA+B^2-Bε \\ \\ &&-εA-εB+ε^2 \\ \\ \\ &=&A^2+B^2+ε^2 \\ \\ &&+2AB-2ε(A+B) \end{array}$

So, with that in mind,

let's abbreviate the notation for now.

$\begin{array}{llllll} \displaystyle \displaystyle \Bigl( A+B-ε \Bigr)^2 &=&A^2+B^2+ε^2 \\ \\ &&+2AB-2ε(A+B) \end{array}$

That makes it a little easier to read,

but it is still hard going.

So let's start with the parts we already know.

$\begin{array}{llllll} \displaystyle E[ε]&=&0 \end{array}$

$\begin{array}{llllll} \displaystyle A&=&\displaystyle\hat{f}(x)-E\left[ \hat{f}(x)\right] \\ \\ B&=&\displaystyle E\left[ \hat{f}(x) \right]- f(x) \end{array}$

$\begin{array}{llllll} \displaystyle A+B&=&\hat{f}(x) - f(x)\end{array}$

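The "difference between the prediction $\hat{f}(x)$ and the actual $f(x)$" and

the "noise $ε$ in the observed output" are uncorrelated ($ε$ is assumed independent of the data used to build $\hat{f}$), so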

$\begin{array}{llllll} \displaystyle E[-2ε(A+B)] &=&-2E[ε(A+B)] \\ \\ &=&-2E[ε]E[A+B] \\ \\ &=&0 \end{array}$

this part drops out first.

$\begin{array}{llllll} \displaystyle A&=&\displaystyle\hat{f}(x)-E\left[ \hat{f}(x)\right] \end{array}$

Next, for $AB$,

we rewrite the bias $B$, using the fact that the expectation of a constant $θ$ is the constant itself:

$\begin{array}{llllll} \displaystyle E[θ]&=&θ \\ \\ E\left[E[θ] \right]&=&E[θ] \end{array}$

$\begin{array}{llllll} \displaystyle B&=&\displaystyle E\left[ \hat{f}(x) \right]- f(x) \\ \\ &=&\displaystyle E\left[ \hat{f}(x) \right]- E\left[f(x)\right] \\ \\ \\ &=& \displaystyle E\left[ \hat{f}(x) - f(x)\right] \end{array}$

$\begin{array}{llllll} \displaystyle E[B]&=&\displaystyle E\left[ \displaystyle E\left[ \hat{f}(x) - f(x)\right]\right] \\ \\ &=&\displaystyle E\left[ \hat{f}(x) - f(x)\right] \\ \\ &=&B \end{array}$

So $B$ is a constant. Together with this,

$\begin{array}{llllll} \displaystyle A&=&\displaystyle\hat{f}(x)-E\left[ \hat{f}(x)\right] \\ \\ B&=&\displaystyle E\left[ \hat{f}(x) \right]- f(x) \end{array}$

the "bias relative to $f(x)$" and

the "sample-to-sample deviation of $\hat{f}(x)$",

that is, $A$ and $B$, are uncorrelated (since $B$ is a constant, $E[AB]=B\,E[A]$):

$\begin{array}{llllll} \displaystyle E[AB]&=&E[A]E[B] \end{array}$

$\begin{array}{llllll} \displaystyle &&\displaystyle E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-μ)\right] \\ \\ &=&\displaystyle E\left[\frac{1}{n}\Bigl( (x_1-μ)+(x_2-μ)+\cdots+(x_n-μ) \Bigr)\right] \\ \\ \\ &=&\displaystyle E\left[\frac{1}{n}\Bigl( x_1+x_2+\cdots+x_n \Bigr)\right]-\displaystyle\frac{1}{n}\Bigl(nμ\Bigr) \\ \\ &=&E[\overline{x}]-μ \\ \\ &=&μ-μ \\ \\ &=&0 \end{array}$

And the "expected deviation from the mean",

a variance-like quantity without the square, is $0$, so

$\begin{array}{llllll} \displaystyle E[A]&=&\displaystyle E\left[ \hat{f}(x)-E\left[ \hat{f}(x)\right] \right] \\ \\ &=&0 \end{array}$

this part can be dropped as well.

$\begin{array}{llllll} \displaystyle A^2+B^2+ε^2 \end{array}$

That leaves this part,

and all that remains is to tidy up $A^2+B^2$.

$\begin{array}{llllll} \displaystyle A&=&\displaystyle\hat{f}(x)-E\left[ \hat{f}(x)\right] \\ \\ B&=&\displaystyle E\left[ \hat{f}(x) \right]- f(x) \end{array}$

Since $B$ is a constant, the expected value of the squared bias is

$\begin{array}{llllll} \displaystyle E[B^2]&=&B^2 \\ \\ &=&\displaystyle\left(\mathrm{Bias}(\hat{f}(x))\right)^2 \end{array}$

which gives this,

and organizing the remaining terms along the same lines,

$\begin{array}{llllll} \displaystyle E[A^2]&=&\displaystyle \mathrm{Var}\left(\hat{f}(x)\right) \\ \\ E[B^2]&=&\Bigl(\mathrm{Bias}\left(\hat{f}(x)\right)\Bigr)^2 \\ \\ E[ε^2]&=&\mathrm{Noise} \end{array}$

we can write them like this, so

$\begin{array}{llllll} E\left[ \Bigl( \hat{f}(x)-y \Bigr)^2 \right]&=&\displaystyle \mathrm{Noise}+\Bigl(\mathrm{Bias}\left(\hat{f}(x)\right)\Bigr)^2+\mathrm{Var}\left(\hat{f}(x)\right) \end{array}$

before we know it,

we arrive at a neat conclusion.
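The decomposition can be checked exactly on a toy setup. The sketch below is built on assumptions made only for illustration: a single fixed input with true value `F_TRUE`, noise that is $-1$ or $+1$ with equal probability (mean $0$, variance $1$), two training observations, and a deliberately biased estimator `f_hat` that shrinks the training average by `SHRINK`. It enumerates every equally likely outcome and compares both sides with exact fractions.

```python
from fractions import Fraction
from itertools import product

F_TRUE = Fraction(2)       # hypothetical true value f(x) at one fixed x
SHRINK = Fraction(4, 5)    # hypothetical shrinkage factor; makes f_hat biased
NOISE_VALUES = [-1, 1]     # noise is -1 or +1 with equal probability

def f_hat(train_noise):
    """Shrunken average of the noisy training outputs y_i = f(x) + eps_i."""
    ys = [F_TRUE + e for e in train_noise]
    return SHRINK * sum(ys) / len(ys)

def expect(values):
    """Average over equally likely outcomes."""
    values = list(values)
    return sum(values) / len(values)

train_sets = list(product(NOISE_VALUES, repeat=2))   # all training-noise outcomes
preds = [f_hat(t) for t in train_sets]

noise = expect(Fraction(e) ** 2 for e in NOISE_VALUES)   # sigma^2 (E[eps] = 0)
bias = expect(preds) - F_TRUE
variance = expect((p - expect(preds)) ** 2 for p in preds)

# Left-hand side: squared error averaged over every (training set, new noise) pair.
mse = expect((f_hat(t) - (F_TRUE + e)) ** 2
             for t in train_sets for e in NOISE_VALUES)
print(mse, noise + bias**2 + variance)
```

The new observation's noise is enumerated independently of the training noise, which is the independence assumption the cross-term argument relies on.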
