RDA: A Sparse Optimization Method for Deep Neural Networks
*Xiaodong Jia (Peking University)
We propose a new sparse optimization method for neural network models in deep learning. The method obtains sparser solutions than traditional optimization methods such as proximal-SGD while maintaining almost the same accuracy. It is based on regularized dual averaging (RDA), which has been proven effective at obtaining sparse solutions in convex optimization problems but has not previously been applied to deep learning.
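For readers unfamiliar with RDA, the following is a minimal sketch of the classical l1-regularized dual averaging update from convex optimization, shown next to a proximal-SGD step for contrast. It is not the method proposed in this work, whose details are not given in the abstract. The function names, the parameters `lam` (l1 strength) and `gamma` (scaling of the quadratic term), and the choice of auxiliary function h(w) = ½‖w‖² are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft-thresholding: shrink x toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def prox_sgd_step(w, grad, lr, lam):
    """One proximal-SGD step with an l1 penalty.

    The threshold is lr * lam, so it shrinks as the learning rate decays.
    """
    return soft_threshold(w - lr * grad, lr * lam)


def rda_l1_step(g_bar, t, lam, gamma):
    """One l1-RDA step (classical convex formulation, assumed here).

    Solves argmin_w <g_bar, w> + lam * ||w||_1 + (gamma / sqrt(t)) * 0.5 * ||w||^2,
    where g_bar is the running average of all past gradients.
    A coordinate is exactly zero whenever |g_bar_i| <= lam, and this
    threshold does not depend on the step size.
    """
    return -(np.sqrt(t) / gamma) * soft_threshold(g_bar, lam)


def run_rda(grad_fn, dim, steps, lam, gamma):
    """Run l1-RDA from w = 0 using a user-supplied (hypothetical) grad_fn(w)."""
    w = np.zeros(dim)
    g_bar = np.zeros(dim)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        g_bar += (g - g_bar) / t  # incremental average of gradients
        w = rda_l1_step(g_bar, t, lam, gamma)
    return w
```

In this sketch the RDA threshold stays fixed at `lam` and is applied to the averaged gradient, whereas the proximal-SGD threshold `lr * lam` becomes small as the learning rate decays. This is the usual explanation for why RDA tends to yield sparser iterates in the convex setting.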