ゲーム理論のアルゴリズム

最近トップコーダーオープン問題に苦しんでいる。深さ優先探索の複雑性がを抜けるためにグリーデイ方法を使ったけど間違いの所が出ました。
An attempt was made for a simple greedy strategy by playing the game in reverse using the max card in either player's deck at each step. While this covers many test cases correctly, it seems to be 1 short of the forward-iterated optimum for some cases. 他人の正解を見て考え直せば動的計画法で解ける。The top solution from user gullesnuffs utilized dynamic programming on the turn number and top card of the deck.
簡単すぎる気がします。
It seems that dynamic programming discounts the composition of the decks and the order of placement or non-placement. The strategy assumes that there is only one optimal sequence of moves leading to a given (top,turn num) pair.
動的計画法の正解はすぐ思い出す技ではありません。
The key insight is that the dynamic programming can consider only the turn number and top of the deck as identifying indices, and visit each state at most d times where d is the length of one player's deck. In each turn, either a placement of any of the cards or a non-placement can occur. The player whose turn it is is based on the index (i%2==0->petr, i%2==1->snuke).
The non-placement is non-intuitive, as state (i+1,j) can be reached either via non-placement at (i,j) or placement at (i,j') where j'