HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
A high-quality instruction-based image editing dataset, collected using advanced foundation models and accompanied by new evaluation metrics.
HQ-Edit is a dataset for instruction-based image editing, intended to improve the capabilities of image editing models. It comprises approximately 200,000 high-quality edits, curated using advanced foundation models such as GPT-4V and DALL-E 3.
Unlike previous approaches that rely on attribute guidance or human feedback to create datasets, HQ-Edit adopts a scalable data collection pipeline built on foundation models. Diverse examples are first sourced online and then expanded, and are used to create diptych images, each accompanied by detailed text prompts for both the input and the output image.
To ensure the quality and alignment of the paired images, the dataset is post-processed with decomposition, warping, and filtering. In the pipeline, GPT-4 generates diverse examples covering a wide spectrum of human characteristics, objects, backgrounds, and editing attributes. The prompts are then rewritten for clarity and conciseness, and diptych prompts are created to guide image generation.
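The decomposition and filtering steps of the post-processing can be illustrated with a minimal sketch. It assumes that a diptych is a single image with the input on the left half and the output on the right half, and it represents images as nested lists of grayscale values so that it runs without any imaging library. The names `split_diptych`, `similarity`, and `filter_pairs`, the pixel-difference score, and the threshold are all hypothetical and are not taken from the HQ-Edit pipeline. Warping is omitted.

```python
from typing import Callable, List, Tuple

# Stand-in image type (assumption): rows of grayscale pixel values in 0..255.
Image = List[List[int]]


def split_diptych(diptych: Image) -> Tuple[Image, Image]:
    """Decompose a side-by-side diptych into left (input) and right (output) halves."""
    if not diptych or not diptych[0]:
        raise ValueError("empty image")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same length")
    half = width // 2
    # For odd widths the middle column is dropped so both halves match in size.
    left = [row[:half] for row in diptych]
    right = [row[width - half:] for row in diptych]
    return left, right


def similarity(a: Image, b: Image) -> float:
    """Hypothetical score: 1.0 for identical images, 0.0 for maximally different ones."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return 1.0 - total / (255 * count)


def filter_pairs(
    pairs: List[Tuple[Image, Image]],
    score: Callable[[Image, Image], float],
    threshold: float,
) -> List[Tuple[Image, Image]]:
    """Keep only the pairs whose score reaches the threshold."""
    return [pair for pair in pairs if score(pair[0], pair[1]) >= threshold]


if __name__ == "__main__":
    diptych = [[0, 0, 255, 255], [0, 0, 255, 255]]
    left, right = split_diptych(diptych)
    kept = filter_pairs([(left, left), (left, right)], similarity, threshold=0.5)
    print(len(kept))
```

A real pipeline would operate on image files and would likely use a learned alignment or quality score instead of a raw pixel difference; the sketch only shows how splitting and threshold-based filtering fit together.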