Bounding boxes

### Search before asking

- [x] I have searched the Multimodal Maestro [issues](https://github.com/roboflow/multimodal-maestro/issues) and found no similar feature requests.


### Description

As far as I know, Qwen2.5-VL is the first open-source multimodal model that can output bounding boxes for objects in an image.

For example, from the [spatial understanding cookbook](https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb):

![Image](https://github.com/user-attachments/assets/b09ed9d6-3369-4038-94d0-84a6ad24e329)

It would be great for Maestro to support bounding box output, ideally in a way that other models can use as well.
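
To make the request concrete, here is a minimal sketch of the parsing step such support would need. It assumes the model replies with a JSON list of objects with `bbox_2d` (`[x1, y1, x2, y2]` in pixels) and `label` keys, optionally wrapped in a Markdown code fence, which is my reading of the cookbook's output. `Box` and `parse_boxes` are hypothetical names, not part of Maestro or Qwen2.5-VL.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass

# Built at runtime so the example can mention a Markdown fence safely.
FENCE = "`" * 3


@dataclass(frozen=True)
class Box:
    """Hypothetical container for one detection in pixel coordinates."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(raw: str) -> list[Box]:
    """Parse a model reply into Box objects.

    Assumed reply format (not confirmed by the issue): a JSON list of
    {"bbox_2d": [x1, y1, x2, y2], "label": str}, possibly fenced.
    """
    text = raw.strip()
    # Unwrap an optional code fence, with or without a "json" language tag.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match:
        text = match.group(1)

    boxes = []
    for item in json.loads(text):
        coords = item.get("bbox_2d")
        if not isinstance(coords, list) or len(coords) != 4:
            continue  # skip malformed entries instead of failing the whole reply
        x1, y1, x2, y2 = (float(v) for v in coords)
        # Normalise so (x1, y1) is the top-left corner and (x2, y2) the bottom-right.
        boxes.append(
            Box(str(item.get("label", "")), min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
        )
    return boxes


reply = FENCE + 'json\n[{"bbox_2d": [10, 20, 110, 220], "label": "button"}]\n' + FENCE
print(parse_boxes(reply))
```

Keeping the parsed result in a small model-agnostic type like this is one way other models could plug in later: each model would only need its own parser that returns the same structure.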

### Use case

We would use this for generative process automation in [OpenAdapt](https://github.com/OpenAdaptAI/OpenAdapt).

### Additional

_No response_

### Are you willing to submit a PR?

- [x] Yes I'd like to help by submitting a PR!

Bounding boxes #138
