CM2026:P000088

TransFilm-Former: Spectral-Spatial Prototype Query Transformer for Transparent Film Detection

*Qianlei Wang (Chengdu Institute of Computer Applications)
Xiaolin Qin (Chengdu Institute of Computer Applications)

Automated detection of transparent films is exceptionally challenging due to low optical contrast, extreme light transmission, and severe background artifacts like periodic machine marks. Existing spatial-domain networks and standard query-based transformers struggle to isolate faint transparent regions from homogeneous noise. Furthermore, frequency-domain methods using wavelet transforms inherently sacrifice spatial resolution and lack flexibility against complex global patterns. To address these bottlenecks, we propose TransFilm-Former, a novel Spectral-Spatial Prototype Query Transformer. Our framework features a Dual-Domain Transformer Decoder that enriches query representations. In the frequency domain, the DFT-enhanced Cross-Attention (DCA) module leverages 2D Fast Fourier Transforms for global magnitude modulation, effectively attenuating periodic noise and amplifying target film signals without losing spatial resolution. In the spatial domain, the Prototype-guided Cross-Attention (PCA) module uses adaptive clustering to extract semantic foreground prototypes, drastically reducing spatial redundancy during query interaction. Extensive experiments on challenging film detection datasets demonstrate that TransFilm-Former achieves state-of-the-art performance, accurately localizing transparent films in complex environments.
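The frequency-domain idea named in the abstract, modulating the Fourier magnitude of a feature map while leaving its resolution untouched, can be illustrated with a minimal NumPy sketch. This is not the DCA module itself: the function names `modulate_magnitude` and `peak_suppression_gain` are hypothetical, and the hand-made gain (zeroing the strongest non-DC frequency bins) is an assumption standing in for whatever learned modulation the authors use.

```python
import numpy as np


def modulate_magnitude(feature, gain):
    """Scale the Fourier magnitude of a 2D map by `gain`, keeping the phase.

    Hypothetical helper for illustration; `gain` has the same shape as `feature`.
    """
    spectrum = np.fft.fft2(feature)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    modulated = (magnitude * gain) * np.exp(1j * phase)
    # The inverse transform returns a map with the input's height and width,
    # so no spatial resolution is lost.
    return np.fft.ifft2(modulated).real


def peak_suppression_gain(feature, fraction=0.5):
    """Assumed stand-in for a learned gain: zero the dominant non-DC bins.

    Bins whose magnitude exceeds `fraction` times the largest non-DC
    magnitude are treated as periodic background and set to zero.
    """
    magnitude = np.abs(np.fft.fft2(feature))
    non_dc = magnitude.copy()
    non_dc[0, 0] = 0.0  # never suppress the mean (DC) component
    gain = np.ones_like(magnitude)
    gain[non_dc > fraction * non_dc.max()] = 0.0
    return gain
```

On a synthetic image made of strong vertical stripes plus a faint rectangular patch, calling `modulate_magnitude(image, peak_suppression_gain(image))` removes the stripes and leaves the patch at its original location and size. A fixed threshold like this only works when the periodic pattern dominates the spectrum, which is presumably why a learned modulation is preferable in practice.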

16th Conference on Computer Mathematics of the Chinese Mathematical Society
