
Controllable Image Generation with Composed Parallel Token Prediction

Jamie Stirling, Noura Al Moubayed, Chris G. Willcocks and Hubert P. H. Shum
Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2026

Abstract

Conditional discrete generative models struggle to faithfully compose multiple input conditions. To address this, we derive a theoretically grounded formulation for composing discrete probabilistic generative processes, with masked generation (absorbing diffusion) as a special case. Our formulation enables precise specification of novel combinations and numbers of input conditions that lie outside the training data, with concept weighting enabling emphasis or negation of individual conditions. In synergy with the richly compositional learned vocabulary of VQ-VAE and VQ-GAN, our method attains a 63.4% relative reduction in error rate compared to the previous state-of-the-art, averaged across three datasets (positional CLEVR, relational CLEVR and FFHQ), simultaneously obtaining an average absolute FID improvement of -9.58. Meanwhile, our method offers a 2.3x to 12x real-time speed-up over comparable methods, and is readily applied to an open pre-trained discrete text-to-image model for fine-grained control of text-to-image generation.
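The abstract does not state the composition rule itself, so the following is only a minimal sketch of what weighted composition of per-token distributions could look like at a single masked position. It assumes a product-of-experts style rule similar to classifier-free guidance, in which each condition shifts the unconditional log-probabilities in proportion to its weight. The function names `log_softmax` and `compose_token_logprobs`, the toy three-token vocabulary, and the rule itself are illustrative assumptions, not the paper's formulation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def compose_token_logprobs(uncond_logits, cond_logits_list, weights):
    """Combine several conditional token distributions for one position.

    Assumed rule (hypothetical, not taken from the paper):
        log p(x | c_1..c_n) = log p(x) + sum_i w_i * (log p(x | c_i) - log p(x)) + const
    A positive weight emphasises a condition; a negative weight negates it.
    """
    base = log_softmax(uncond_logits)
    combined = list(base)
    for logits, w in zip(cond_logits_list, weights):
        cond = log_softmax(logits)
        combined = [c + w * (ci - bi) for c, ci, bi in zip(combined, cond, base)]
    # Renormalise so the result is a valid log-probability vector.
    return log_softmax(combined)


# Toy example: 3-token vocabulary, two conditions.
uncond = [0.0, 0.0, 0.0]
cond_a = [2.0, 0.0, 0.0]   # condition A favours token 0
cond_b = [0.0, 2.0, 0.0]   # condition B favours token 1

emphasise_a_negate_b = compose_token_logprobs(uncond, [cond_a, cond_b], [1.0, -1.0])
probs = [math.exp(lp) for lp in emphasise_a_negate_b]
```

Under this assumed rule, a weight of 1 on a single condition recovers that condition's own distribution, a weight of 0 recovers the unconditional one, and a negative weight pushes probability mass away from the tokens the condition prefers. In a masked-generation setting, the composed distribution would be computed for every masked position in parallel before sampling.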

Downloads

Paper (6.9MB)

Supplementary Material (1.9MB)

arXiv

Cite This Research

Plain Text

Jamie Stirling, Noura Al Moubayed, Chris G. Willcocks and Hubert P. H. Shum, "Controllable Image Generation with Composed Parallel Token Prediction," in Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, pp. 5074-5083, Colorado, USA, IEEE/CVF, 2026.

BibTeX

@inproceedings{stirling26controllable,
author={Stirling, Jamie and Al Moubayed, Noura and Willcocks, Chris G. and Shum, Hubert P. H.},
booktitle={Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop},
title={Controllable Image Generation with Composed Parallel Token Prediction},
year={2026},
pages={5074--5083},
publisher={IEEE/CVF},
location={Colorado, USA},
}

RIS

TY  - CONF
AU  - Stirling, Jamie
AU  - Al Moubayed, Noura
AU  - Willcocks, Chris G.
AU  - Shum, Hubert P. H.
T2  - Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop
TI  - Controllable Image Generation with Composed Parallel Token Prediction
PY  - 2026
SP  - 5074
EP  - 5083
PB  - IEEE/CVF
ER  -

Similar Research

Jamie Stirling, Noura Al Moubayed and Hubert P. H. Shum, "Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images", Proceedings of the 2026 International Conference on Pattern Recognition (ICPR), 2026

Jiaxu Liu, Li Li, Hubert P. H. Shum and Toby P. Breckon, "TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with State Space Model", Proceedings of the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2026

Shuang Chen, Amir Atapour-Abarghouei and Hubert P. H. Shum, "HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention", IEEE Transactions on Multimedia (TMM), 2024

Shuang Chen, Amir Atapour-Abarghouei, Haozheng Zhang and Hubert P. H. Shum, "MxT: Mamba x Transformer for Image Inpainting", Proceedings of the 2024 British Machine Vision Conference (BMVC), 2024


Last updated on 3 July 2026