Abstract: With the spread of intelligent terminals, demand for text topic mining is growing across many domains. Topic modeling is the core of text topic mining. The LDA (latent Dirichlet allocation) generative model is a probabilistic model built on the Bayesian framework, and it addresses the problem of extracting latent topics from text on the basis of semantic association. The key techniques of the text clustering process, including the LDA generative model, data sampling, and model evaluation, are described and analyzed in depth. Topic discovery and clustering experiments were carried out on 2,794 learning journals from a network education platform, and a thesaurus containing 3,800 terms was established. The topic clustering problem was solved in two steps, using the k-means algorithm and the UVM (union vector method) algorithm. In addition, a general procedure for text mining experiments is proposed, and the text distance algorithm used in hierarchical clustering is improved. The experimental results show that the topics on the platform have good overall similarity, but that the concentration of topics makes the content of many journals hard to distinguish, which hampers users in locating topics.