Content caching is an effective technique for alleviating peak-hour traffic congestion, reducing backhaul pressure, and improving the user-perceived experience in wireless networks. The central design questions of content caching are where, when, and what to cache. The past few years have witnessed extensive progress on these questions from both the information-theoretic perspective and the optimization perspective, under the assumption that content popularity or user preference is known in advance. In this talk, we will investigate these questions from a machine learning perspective in the practical setting where content popularity and user preference are unknown and may change over time. Specifically, we formulate the caching optimization problem in multi-cell wireless networks within a multi-agent reinforcement learning framework. Both multi-agent Q-learning-based and multi-agent multi-armed-bandit-based algorithms will be designed, taking into account the cooperative nature of base stations in wireless networks. Simulation results based on a real-world dataset will demonstrate the advantages of our proposed learning algorithms.
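To make the bandit formulation concrete, the sketch below shows a minimal epsilon-greedy multi-armed bandit for a single cache: each file is an arm, a reward of 1 is earned when a cached file is requested (a cache hit), and the agent learns which files to cache from observed requests. This is an illustrative toy, not the algorithms presented in the talk; the class name, the Zipf-like request model, and all parameter values are invented for this example.

```python
import random

class EpsilonGreedyCache:
    """Toy bandit caching agent (illustrative only).

    Each file is an arm; the reward for a cached file is 1 if it is
    requested in the current slot (a hit) and 0 otherwise.
    """

    def __init__(self, num_files, cache_size, epsilon=0.1, seed=0):
        self.num_files = num_files
        self.cache_size = cache_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * num_files    # how many slots each file was cached
        self.values = [0.0] * num_files  # running mean hit rate per file

    def choose_cache(self):
        # Explore with probability epsilon: cache a random set of files.
        if self.rng.random() < self.epsilon:
            return set(self.rng.sample(range(self.num_files), self.cache_size))
        # Exploit: cache the files with the highest estimated hit rates.
        ranked = sorted(range(self.num_files),
                        key=lambda f: self.values[f], reverse=True)
        return set(ranked[:self.cache_size])

    def update(self, cached, requested):
        # Incremental-mean update of the hit-rate estimate for cached files.
        for f in cached:
            hit = 1.0 if f in requested else 0.0
            self.counts[f] += 1
            self.values[f] += (hit - self.values[f]) / self.counts[f]

# Simulate requests with a skewed, Zipf-like popularity (file 0 most popular).
agent = EpsilonGreedyCache(num_files=20, cache_size=5, epsilon=0.1, seed=42)
req_rng = random.Random(1)
weights = [1.0 / (f + 1) for f in range(20)]
for _ in range(2000):
    cached = agent.choose_cache()
    requested = set(req_rng.choices(range(20), weights=weights, k=10))
    agent.update(cached, requested)

top = sorted(range(20), key=lambda f: agent.values[f], reverse=True)[:5]
print(sorted(top))  # the learned cache should favor the most popular files
```

The talk's setting extends this single-agent picture to multiple cooperating base stations, where each agent's best cache contents depend on what its neighbors cache.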