In this study, a learning-based scale estimation technique is proposed to enable quantitative evaluation of inspection regions. The underlying idea is that the surface texture of structures (e.g., bridges or buildings) captured in images carries information about the scale of those images, where scale is expressed in pixels per unit of physical length (e.g., pixels per mm or pixels per inch). This makes it possible to train a regression model that maps surface textures in images to their corresponding scales. A deep convolutional neural network is used to extract scale-related features from texture patches and to estimate their scales. The trained model can then be used to estimate the scale of any image captured from a structure surface with a similar texture. The capability of the proposed technique is demonstrated using data collected from the surface textures of three different structures, on which it achieves an overall average scale estimation error of less than 15%.
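
To make the role of the scale concrete, the following is a minimal sketch of how an estimated scale (in pixels per mm) could turn pixel measurements into physical quantities, and how a relative scale error might be computed. The function names, the example numbers, and the relative-error definition are illustrative assumptions, not taken from the study.

```python
def pixels_to_length(pixel_length: float, scale_px_per_mm: float) -> float:
    """Convert a length measured in pixels to millimetres.

    `scale_px_per_mm` is the image scale in pixels per mm, e.g. the
    output of a scale estimation model (hypothetical here).
    """
    if scale_px_per_mm <= 0:
        raise ValueError("scale must be positive")
    return pixel_length / scale_px_per_mm


def pixels_to_area(pixel_area: float, scale_px_per_mm: float) -> float:
    """Convert an area measured in pixels to square millimetres.

    Area scales with the square of the linear scale.
    """
    if scale_px_per_mm <= 0:
        raise ValueError("scale must be positive")
    return pixel_area / scale_px_per_mm ** 2


def relative_scale_error(estimated: float, true: float) -> float:
    """Relative error |estimated - true| / true.

    Assumed definition; the error metric is not specified in the text.
    """
    if true <= 0:
        raise ValueError("true scale must be positive")
    return abs(estimated - true) / true


# Hypothetical numbers for illustration only.
crack_length_mm = pixels_to_length(240.0, 8.0)    # 240 px at 8 px/mm
region_area_mm2 = pixels_to_area(6400.0, 8.0)     # 6400 px^2 at 8 px/mm
error = relative_scale_error(9.0, 8.0)            # estimate 9 vs. true 8 px/mm
```

Under this assumed definition, a relative error in the scale carries over to length measurements at roughly the same rate and to area measurements at roughly twice that rate, which is why the accuracy of the scale estimate bounds the accuracy of any quantitative evaluation built on it.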