
Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images

Jamie Stirling, Noura Al Moubayed and Hubert P. H. Shum
Proceedings of the 2026 International Conference on Pattern Recognition (ICPR), 2026

H5-Index: 68 (according to Google Scholar, 2026)

Abstract

Vector quantization approaches (VQ-VAE, VQ-GAN) learn discrete neural representations of images, but these representations are inherently position-dependent: codes are spatially arranged and contextually entangled, requiring autoregressive or diffusion-based priors to model their dependencies at sample time. In this work, we ask whether positional information is necessary for discrete representations of spatially aligned data. We propose the permutation-invariant vector-quantized autoencoder (PI-VQ), in which latent codes are constrained to carry no positional information. We find that this constraint encourages codes to capture global, semantic features, and enables direct interpolation between images without a learned prior. To address the reduced information capacity of permutation-invariant representations, we introduce matching quantization, a vector quantization algorithm based on optimal bipartite matching that increases effective bottleneck capacity by 3.5X relative to naive nearest-neighbour quantization. The compositional structure of the learned codes further enables interpolation-based sampling, allowing synthesis of novel images in a single forward pass. We evaluate PI-VQ on CelebA, CelebA-HQ and FFHQ, obtaining competitive precision, density and coverage metrics for images synthesised with our approach. We discuss the trade-offs inherent to position-free representations, including separability and interpretability of the latent codes, pointing to numerous directions for future work.
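To illustrate the distinction the abstract draws, here is a minimal sketch (not the paper's implementation; all names and the tiny 1-D setup are hypothetical) contrasting naive nearest-neighbour quantization, where each latent independently picks its closest code and collisions are possible, with an optimal bipartite matching that assigns each latent a distinct codebook entry:

```python
# Hypothetical sketch: nearest-neighbour vs. matching-based quantization.
# Uses brute force over permutations for clarity; a practical system would
# use the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment).
from itertools import permutations

def nearest_neighbour(latents, codebook):
    # Each latent independently selects its closest code; codes may repeat,
    # so the effective number of distinct codes used can shrink.
    return [min(range(len(codebook)), key=lambda k: abs(z - codebook[k]))
            for z in latents]

def matching_quantize(latents, codebook):
    # Optimal bipartite matching: assign each latent a *distinct* code,
    # minimising the total distance (requires len(codebook) >= len(latents)).
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(codebook)), len(latents)):
        cost = sum(abs(z - codebook[k]) for z, k in zip(latents, perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best

latents = [0.1, 0.2, 0.9]
codebook = [0.0, 0.15, 0.5, 1.0]
nn = nearest_neighbour(latents, codebook)  # → [1, 1, 3]: two latents collide
mq = matching_quantize(latents, codebook)  # → [0, 1, 3]: all codes distinct
```

Because matching forbids collisions, the bottleneck can carry more information per set of codes, which is the intuition behind the capacity gain the abstract reports.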


Cite This Research

Plain Text

Jamie Stirling, Noura Al Moubayed and Hubert P. H. Shum, "Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images," in Proceedings of the 2026 International Conference on Pattern Recognition, Lyon, France, 2026.

BibTeX

@inproceedings{stirling26investigating,
 author={Stirling, Jamie and Moubayed, Noura Al and Shum, Hubert P. H.},
 booktitle={Proceedings of the 2026 International Conference on Pattern Recognition},
 title={Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images},
 year={2026},
 address={Lyon, France},
}

RIS

TY  - CONF
AU  - Stirling, Jamie
AU  - Moubayed, Noura Al
AU  - Shum, Hubert P. H.
T2  - Proceedings of the 2026 International Conference on Pattern Recognition
TI  - Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images
PY  - 2026
ER  - 


Similar Research

Shuang Chen, Amir Atapour-Abarghouei and Hubert P. H. Shum, "HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention", IEEE Transactions on Multimedia (TMM), 2024
Jiaxu Liu, Li Li, Hubert P. H. Shum and Toby P. Breckon, "TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with State Space Model", Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2026
Shuang Chen, Amir Atapour-Abarghouei, Haozheng Zhang and Hubert P. H. Shum, "MxT: Mamba x Transformer for Image Inpainting", Proceedings of the 2024 British Machine Vision Conference (BMVC), 2024
Shuang Chen, Haozheng Zhang, Amir Atapour-Abarghouei and Hubert P. H. Shum, "SEM-Net: Efficient Pixel Modelling for Image Inpainting with Spatially Enhanced SSM", Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
