The characteristic set method, developed by Joseph Fels Ritt and Wen-Ts\"un Wu, has become an important tool for automatic theorem proving and polynomial system solving and has found wide application in many areas. In this work, we present the first approach to learning the computation of the Ritt-Wu characteristic set of a system of polynomial equations with Transformer models. Unlike existing work on training Transformers to directly compute Gr\"obner bases, which, like characteristic sets, are an important tool for polynomial system solving, we adopt the pre-training and fine-tuning paradigm: we first train Transformer models on basic operations such as polynomial multiplication and pseudo division, and then fine-tune the pre-trained models to enhance the learning of characteristic sets. Our approach follows the same philosophy as recent work on learning to choose the optimal variable ordering for cylindrical algebraic decomposition: both works rest on the view that the object to be learned requires a ``deep'' computation graph, so it is important to learn the basic operations associated with the nodes of that graph. For the learning of polynomial multiplication, we propose a multi-stage learning method. First, a state-of-the-art unsigned integer accumulation model is extended to signed integer accumulation. Second, the learning of polynomial multiplication is decomposed into multiple stages: an intermediate representation of the product is learned first, namely the one obtained by collecting the coefficients of like terms during multiplication; the collected coefficients are then fed into the signed integer accumulation model; finally, the predicted combined coefficients are substituted back into the intermediate representation to obtain the final product.
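The multi-stage decomposition described above can be sketched as follows (an illustrative Python sketch assuming a simple dict-based encoding of univariate polynomials; the paper's actual tokenization and dataset format may differ):

```python
from collections import defaultdict
from itertools import product

def multiply_stages(p, q):
    """p, q: dicts mapping exponent -> coefficient (univariate).
    Returns (intermediate, final): the intermediate representation
    collects the uncombined coefficients of like terms; the final
    product sums each list, which is the signed integer accumulation
    step handled by the dedicated accumulation model."""
    intermediate = defaultdict(list)
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        intermediate[e1 + e2].append(c1 * c2)  # collect like terms
    final = {e: sum(cs) for e, cs in intermediate.items()}
    return dict(intermediate), final

# (x + 2)(x - 3): intermediate {2: [1], 1: [-3, 2], 0: [-6]},
# final x^2 - x - 6, i.e. {2: 1, 1: -1, 0: -6}
inter, prod = multiply_stages({1: 1, 0: 2}, {1: 1, 0: -3})
```

Here the list `[-3, 2]` for the degree-1 term is exactly the kind of signed-coefficient sequence the accumulation model must sum before substitution back into the product.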
Experimental results show that, compared with directly learning the product of two polynomials, the proposed method greatly improves prediction accuracy (from $0\%$ to $99\%$ on one challenging dataset). For the learning of pseudo division, we propose a transfer learning method based on fine-tuning a pre-trained multiplication model, as well as three scratchpad-assisted learning schemes of different granularities. Experimental results show that the fine-tuning method improves the accuracy of pseudo division learning, with the best case increasing from $2.26\%$ to $97.30\%$; moreover, the finer the scratchpad, namely the richer the intermediate-process hints provided during training, the higher the prediction accuracy, and the finest-grained scratchpad achieves accuracies above $96\%$ on both univariate and multivariate pseudo division tasks. On the task of learning characteristic sets, fine-tuning the pseudo division model improves accuracy by $2.9\%$ to $21.9\%$ on difficult datasets compared with direct learning; and, compared with the classical symbolic solver wsolve for characteristic set computation, Transformer-based parallel inference with batch size $\leq 512$ achieves a speedup of 1.27--7.15 on the tested datasets.
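As a concrete reference for the basic operation being learned, pseudo division can be reproduced with SymPy's `pquo`/`prem` (a minimal illustration of the operation itself, not of the paper's models or datasets):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y*x + 1   # dividend
g = y*x + 2          # divisor; leading coefficient y w.r.t. x

# Pseudo division w.r.t. x satisfies
#   lc(g)**delta * f = q*g + r,  delta = deg(f) - deg(g) + 1,
# with deg_x(r) < deg_x(g), so no fractions in y ever appear.
q = sp.pquo(f, g, x)  # pseudo-quotient
r = sp.prem(f, g, x)  # pseudo-remainder

delta = sp.degree(f, x) - sp.degree(g, x) + 1
assert sp.expand(y**delta * f - (q * g + r)) == 0
```

Repeated pseudo division of this kind is the elimination step underlying Ritt-Wu characteristic set computation, which is why it is a natural pre-training target.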