End-to-End Scene Text Spotting at Character Level
Abstract
This work applies the DEtection TRansformer (DETR) object detection framework to spot characters in unconstrained environments (i.e., in the wild). DETR offers a simpler and more robust end-to-end architecture than previous methods.
The proposed framework leverages adaptive feature extraction to focus better on the positions of character regions, together with a bounding box loss function that localizes characters of different scales and aspect ratios more precisely.
To evaluate the proposed architecture, we conduct experiments on ICDAR13, an ICDAR benchmark designed explicitly for character-level text detection.
Experimental results show that the proposed method outperforms state-of-the-art detectors on this benchmark.
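The abstract does not name the bounding box loss it uses. As a rough illustration of why an overlap-based loss can handle boxes of different scales and aspect ratios, the sketch below implements the Generalized IoU (GIoU) loss, the overlap term used in the original DETR. This is an assumption made for illustration only, not necessarily the loss proposed in this work, and the function names `giou` and `giou_loss` are hypothetical. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative stand-in only: the abstract does not specify its loss.
    Both boxes are assumed to have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # Penalize the empty space inside the enclosing box.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss in [0, 2]: 0 for identical boxes, larger for worse matches."""
    return 1.0 - giou(box_a, box_b)


if __name__ == "__main__":
    small_pred, small_gt = (0, 0, 2, 4), (1, 0, 3, 4)
    large_pred, large_gt = (0, 0, 20, 40), (10, 0, 30, 40)
    # The same relative offset gives the same loss at both scales.
    print(giou_loss(small_pred, small_gt), giou_loss(large_pred, large_gt))
```

Because the loss depends only on area ratios, a small character box and a large one with the same relative misalignment are penalized equally, whereas a plain L1 loss on coordinates would penalize the larger box more.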