this post was submitted on 26 Oct 2023

Futurology

NOT_RICK@lemmy.world 1 points 1 year ago

Here’s the how:

The stages of Woodpecker work together to validate and correct any inconsistencies between image content and generated text. First, it identifies the main objects mentioned in the text. Then, it asks questions about the extracted objects, such as their number and attributes. The framework answers these questions using expert models in a process called visual knowledge validation. Following this, it converts the question-answer pairs into a visual knowledge base consisting of object-level and attribute-level claims about the image. Finally, Woodpecker corrects the hallucinations and adds the corresponding evidence under the guidance of the visual knowledge base.
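
To make the five stages a bit more concrete, here is a rough toy sketch in Python. It is not Woodpecker's actual code: every function name is made up for illustration, the "expert model" is just a dictionary of object counts standing in for a real detector, and only simple count claims like "two dogs" are handled.

```python
import re

# Toy assumption: only small number words are recognised in the text.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
WORD_FOR = {n: w for w, n in NUMBER_WORDS.items()}


def extract_objects(text, vocabulary):
    # Stage 1: pick out the main objects mentioned in the generated text.
    return [obj for obj in vocabulary if re.search(rf"\b{re.escape(obj)}s?\b", text)]


def formulate_questions(objects):
    # Stage 2: ask a question about each extracted object (here: its count).
    return {obj: f"How many {obj}s are in the image?" for obj in objects}


def validate(questions, detections):
    # Stage 3: "visual knowledge validation". A real system would query
    # expert vision models; this stub looks counts up in a dictionary.
    return {obj: detections.get(obj, 0) for obj in questions}


def build_knowledge_base(answers):
    # Stage 4: turn question-answer pairs into claims about the image.
    return {obj: f"Expert model detected {n} x {obj}" for obj, n in answers.items()}


def describe(n, obj):
    # Render a count and object as text, e.g. (1, "dog") -> "one dog".
    word = "no" if n == 0 else WORD_FOR.get(n, str(n))
    return f"{word} {obj}" + ("" if n == 1 else "s")


def correct(text, answers, claims):
    # Stage 5: rewrite counts that contradict the knowledge base and
    # collect the claims that justify each change as evidence.
    evidence = []
    number_alternatives = "|".join(NUMBER_WORDS)
    for obj, n in answers.items():
        pattern = rf"\b({number_alternatives})\s+{re.escape(obj)}s?\b"
        match = re.search(pattern, text)
        if match and NUMBER_WORDS[match.group(1)] != n:
            text = text[:match.start()] + describe(n, obj) + text[match.end():]
            evidence.append(claims[obj])
    return text, evidence


def correct_hallucinations(text, vocabulary, detections):
    # Run the five stages in order.
    objects = extract_objects(text, vocabulary)
    questions = formulate_questions(objects)
    answers = validate(questions, detections)
    claims = build_knowledge_base(answers)
    return correct(text, answers, claims)


if __name__ == "__main__":
    caption = "A photo of two dogs chasing one ball."
    fixed, evidence = correct_hallucinations(
        caption, ["dog", "ball", "cat"], {"dog": 1, "ball": 1}
    )
    print(fixed)
    print(evidence)
```

The point of the sketch is the shape of the pipeline: the correction step never looks at the image directly, it only trusts the claims produced by the validation step, which is why the evidence can be attached to each fix.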