Elimination ordering is often the key determinant of runtime and decomposability in sparse triangular decomposition of steady-state polynomial systems. Existing graph-guided methods rely primarily on graph legality and therefore fail to capture prefix-dependent algebraic risk. We formulate elimination ordering as a risk-aware sequential decision problem, construct state-wise action supervision by evaluating complete candidate orderings under a unified terminal Maple backend, and train a GRPO-style pairwise ranking policy on Proxy States that combine prefix, graph, and algebraic summaries. On a repaired and audited benchmark of 141 systems, 13 of which are reserved exclusively for training-data construction, the held-out comparison shows that the learned policy completes more systems than the baseline method, incurs fewer timeouts, and is the faster method on more of the systems that both methods complete. These results indicate that a risk-aware formulation treats elimination ordering more effectively overall and can produce higher-quality complete orderings under a shared algebraic backend.