Abstract: Traditional monocular depth estimation faces three problems: relative estimation methods lose scale information, metric estimation methods lack edge precision, and existing depth networks have large parameter counts and high computational costs. To address these problems, a two-stage fusion framework called LacDepth is proposed that combines metric and relative estimation. In the first stage, a deep residual pyramid module adopts a multi-scale Laplacian residual compensation mechanism and improves the geometric fidelity of edge contours through a high-frequency feature enhancement strategy. In the second stage, a lightweight attractor-driven classifier constructs a three-level cascaded depth interval prediction network, establishes a pixel-level probability density function based on the conditional log-binomial distribution, and fine-tunes relative depth values within sub-intervals through differentiable weighting. Experimental results show that LacDepth achieves the best overall performance on the KITTI dataset, with an average relative error of 0.059 and a parameter count of 9.8×10⁶, demonstrating significant advantages in balancing precision and efficiency.
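The second-stage idea of a per-pixel log-binomial distribution over depth intervals, followed by differentiable weighting, can be illustrated with a minimal sketch for a single pixel. This is not the LacDepth implementation: the function names, the success rate `q`, the `temperature` parameter, and the evenly spaced bin centers are all assumptions made for illustration, and the abstract does not specify how the network predicts these quantities.

```python
import math


def log_binomial_probs(q, n_bins, temperature=1.0):
    """Probabilities over n_bins depth intervals for one pixel.

    q is a hypothetical per-pixel success rate in (0, 1); the binomial
    log-probabilities are scaled by a temperature and normalized with a
    softmax. All names here are illustrative assumptions.
    """
    eps = 1e-6
    q = min(max(q, eps), 1.0 - eps)  # keep log() finite
    n = n_bins - 1
    logits = []
    for k in range(n_bins):
        # log C(n, k) via log-gamma, for numerical stability
        log_comb = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1))
        log_p = log_comb + k * math.log(q) + (n - k) * math.log(1.0 - q)
        logits.append(log_p / temperature)
    # softmax over the scaled log-probabilities
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_depth(probs, centers):
    """Differentiable weighting: probability-weighted sum of bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


def linear_centers(d_min, d_max, n_bins):
    """Evenly spaced bin centers (an assumption; real bins may be adaptive)."""
    step = (d_max - d_min) / (n_bins - 1)
    return [d_min + k * step for k in range(n_bins)]


if __name__ == "__main__":
    centers = linear_centers(1.0, 80.0, 64)
    probs = log_binomial_probs(q=0.25, n_bins=64)
    print(expected_depth(probs, centers))
```

With `temperature=1.0` and evenly spaced centers, the weighted sum reduces to `d_min + q * (d_max - d_min)`, because the mean of a binomial over `n` trials is `n * q`. A lower temperature sharpens the distribution around its mode, which is one plausible way to realize fine adjustment within a sub-interval.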