Computing decision tree functions - ptoolis's diary

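Recently I have been reading a paper on decision trees by Professor Friedman of Stanford University. The method, called gradient boosting, derives an optimal decision tree function from the data.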
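One point I found unclear is the improvement function. The formula for the improvement of a given split is I^2 = ((wl*wr)/(wl+wr)) * (ymeanl - ymeanr)^2, where ymeanl and ymeanr are the mean responses in the left and right child nodes of the split.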
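To check my reading of the improvement formula, here is a minimal Python sketch. It assumes that wl and wr are the sums of per-observation weights on each side of the split, and that ymeanl and ymeanr are weighted means (with unit weights they reduce to plain averages). The function names `split_improvement` and `logistic_weights` are my own, not from the paper.

```python
def logistic_weights(probs):
    """Per-observation weights p(1-p), as described in the post."""
    return [p * (1.0 - p) for p in probs]


def split_improvement(y_left, y_right, w_left=None, w_right=None):
    """I^2 = (wl*wr)/(wl+wr) * (ymeanl - ymeanr)^2 for one candidate split."""
    # Assumption: unit weights by default, so wl and wr become node sizes.
    if w_left is None:
        w_left = [1.0] * len(y_left)
    if w_right is None:
        w_right = [1.0] * len(y_right)

    wl = sum(w_left)
    wr = sum(w_right)

    # Assumption: the node means are weighted by the same weights.
    ymeanl = sum(w * y for w, y in zip(w_left, y_left)) / wl
    ymeanr = sum(w * y for w, y in zip(w_right, y_right)) / wr

    return (wl * wr) / (wl + wr) * (ymeanl - ymeanr) ** 2
```

A candidate split is scored by passing the responses on each side, for example `split_improvement([1.0, 1.0], [3.0, 3.0, 3.0, 3.0])`. The split with the largest value would be chosen.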
In the improvement formula, wl and wr are sums of p(1-p). At first it was unclear to me how these weights are derived, and I read them as sums over all nodes of each subtree. Upon reexamination, they are the sums of p(1-p) at the left and right terminal nodes produced by a prospective split.
Another explanation that is hard to follow concerns subsets of the inputs. It describes the partial dependence of a model on a subset of the input vector, i.e. a function of only a subset of the inputs (with the other values assigned constant values). Later, a weighted traversal of the tree is undertaken to demonstrate a partial dependence relationship. In Japanese, "partial dependency" would probably be 部分的な依存.
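To make partial dependence concrete, here is a rough Python sketch of one common way to estimate it. This is my assumption about what the paper intends: the inputs in the chosen subset are fixed, and the model output is averaged over the observed values of the remaining inputs. It is a brute-force estimate, not the weighted tree traversal mentioned above, and the names `partial_dependence`, `model`, and `rows` are hypothetical.

```python
def partial_dependence(model, rows, subset, values):
    """Average model output with the inputs in `subset` fixed to `values`.

    model  : callable taking a list of inputs and returning a number
    rows   : observed input vectors (list of lists)
    subset : indices of the inputs we want the dependence on
    values : the fixed values for those indices, in the same order
    """
    total = 0.0
    for row in rows:
        x = list(row)                      # copy so the data is not modified
        for idx, val in zip(subset, values):
            x[idx] = val                   # overwrite only the chosen inputs
        total += model(x)
    return total / len(rows)
```

Calling this for a grid of `values` traces out the model as a function of the chosen inputs alone.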
Gradient boosting is an algorithm that combines weak learners to build a strong model. The basic update formula is as follows.
Fm(x) = F(m-1)(x) + v*pm*h(x;am)
Here Fm is the model at iteration m, v is a shrinkage parameter, pm is a step-size multiplier, and h is a tree function of the input x with parameters am.
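The update above can be sketched in a few lines of Python. This is only an illustration under assumptions that are mine rather than the paper's: squared-error loss, a one-dimensional input, and a one-split tree (a stump) as h. Under squared error the tree is fit to the residuals, and pm has the closed form sum(r*h)/sum(h*h). The names `fit_stump`, `stump_predict`, and `boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit h(x; am) by least squares; am = (threshold, left_value, right_value)."""
    mean = sum(residuals) / len(residuals)
    # Fallback when no split is possible: a constant predictor.
    best_sse, best_am = float("inf"), (float("inf"), mean, mean)
    for t in sorted(set(xs))[:-1]:         # x <= t goes left, x > t goes right
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best_am = sse, (t, lv, rv)
    return best_am


def stump_predict(x, am):
    t, lv, rv = am
    return lv if x <= t else rv


def boost(xs, ys, n_iter=50, v=0.1):
    """Build Fm(x) = F(m-1)(x) + v*pm*h(x;am) for m = 1..n_iter."""
    f0 = sum(ys) / len(ys)                 # F0: constant initial model
    stages = []                            # one (pm, am) pair per iteration
    preds = [f0] * len(xs)
    for _ in range(n_iter):
        # Residuals are the negative gradient of squared-error loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        am = fit_stump(xs, residuals)
        h = [stump_predict(x, am) for x in xs]
        denom = sum(hi * hi for hi in h)
        # Line search for pm has a closed form under squared error.
        pm = sum(r * hi for r, hi in zip(residuals, h)) / denom if denom > 0 else 0.0
        stages.append((pm, am))
        preds = [p + v * pm * hi for p, hi in zip(preds, h)]

    def model(x):
        return f0 + sum(v * pm * stump_predict(x, am) for pm, am in stages)

    return model
```

With a small v each stage corrects only a fraction of the remaining residual, so more iterations are needed, which is the usual trade-off with shrinkage.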

*1: The weight factor in the improvement formula is (wl*wr)/(wl+wr).