Default to optimal confidence from model-eval #2206

Open · leeclemnet wants to merge 16 commits into main from feat/model-eval-recommended-defaults
Conversation

leeclemnet (Contributor) commented Apr 7, 2026

What does this PR do?

Model eval calculates F1-optimal confidence thresholds, but they aren't currently used for model inference. This PR, together with https://github.com/roboflow/roboflow/pull/11053, wires those thresholds into inference. This feature applies only to the inference_models pathway; the legacy inference pathways keep their existing default confidence.

The key changes are:

Wire recommendedParameters from model_eval through to inference

  • New RecommendedParameters pydantic model in inference_models/weights_providers/entities.py (confidence, per_class_confidence); parsed from Roboflow API in weights_providers/roboflow.py and threaded through auto_loaders/core.py → initialize_model
  • Auto-loader injects it onto the model instance via hasattr(type(model), "recommended_parameters"), so model classes opt in by declaring a class-level recommended_parameters: Optional[RecommendedParameters] = None
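As a sketch, the new entity and the opt-in injection could look roughly like this (the field names and the hasattr opt-in come from the description above; the constraints, helper name, and docstrings are illustrative assumptions, not the actual code):

```python
from typing import Dict, Optional

from pydantic import BaseModel, Field


class RecommendedParameters(BaseModel):
    # Field names per the PR description; the ge/le constraints are assumed.
    confidence: Optional[float] = Field(default=None, ge=0.0, le=1.0)
    per_class_confidence: Optional[Dict[str, float]] = None


def inject_recommended_parameters(model, params: Optional[RecommendedParameters]) -> None:
    # Models opt in by declaring a class-level `recommended_parameters`
    # attribute; the auto-loader only injects when that declaration exists.
    if hasattr(type(model), "recommended_parameters"):
        model.recommended_parameters = params
```

A model class that never declares the attribute is simply skipped, so the feature is invisible to models that don't opt in.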

New ConfidenceFilter and post_process_with_confidence_filter() wrapper (inference_models/models/base/)

  • Encodes the 4-tier priority chain: explicit user → per-class optimal → global optimal → hardcoded default
  • Wrapper rewrites the confidence kwarg to the filter's floor before calling post_process (so NMS keeps boxes any class might still want), then refines per-class on the way out
  • Added on ObjectDetectionModel, InstanceSegmentationModel, KeypointsDetectionModel, SemanticSegmentationModel, MultiLabelClassificationModel. Single-label classification deliberately opts out (top-1 always wins)
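A minimal sketch of that 4-tier priority chain (the function name and return shape are assumptions; the tier-2 floor is the minimum per-class threshold, so NMS keeps every box that any class might still accept, matching the floor/fallback split in the debug logs below):

```python
from typing import Dict, Optional, Tuple


def resolve_confidence(
    user_confidence: Optional[float],
    per_class_optimal: Optional[Dict[str, float]],
    global_optimal: Optional[float],
    hardcoded_default: float,
) -> Tuple[float, Optional[Dict[str, float]]]:
    """Return (floor_for_post_process, per_class_thresholds_or_None)."""
    # Tier 1: an explicit user value wins outright; no per-class refinement.
    if user_confidence is not None:
        return user_confidence, None
    # Tier 2: per-class optimal thresholds. post_process runs at the lowest
    # per-class threshold (the floor); refinement re-applies each class's
    # own threshold on the way out.
    if per_class_optimal:
        return min(per_class_optimal.values()), dict(per_class_optimal)
    # Tier 3: global optimal threshold for every class.
    if global_optimal is not None:
        return global_optimal, None
    # Tier 4: the model's hardcoded default.
    return hardcoded_default, None
```

In the tier-2 debug log below, floor=0.2000 corresponds to the minimum of the per-class map ('white-queen': 0.2) while each class keeps its own refinement threshold.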

Inference adapters route through the new wrapper (inference/core/models/inference_models_adapters.py)

  • OD/IS/KP/multi-label classification adapters call post_process_with_confidence_filter instead of post_process
  • Multi-label response builder now reads prediction.class_ids directly instead of re-thresholding the full confidence vector — the model's per-class decision survives to the API response
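The multi-label change above means the response builder trusts the model's per-class decision rather than re-applying a single threshold; schematically (names and signature are illustrative, not the adapter's actual code):

```python
from typing import Dict, List


def build_multilabel_response(
    class_names: List[str],
    confidences: List[float],   # full per-class confidence vector
    kept_class_ids: List[int],  # the model's per-class decision (prediction.class_ids)
) -> Dict[str, float]:
    # Read the kept class ids directly instead of re-thresholding the
    # confidence vector, so per-class optimal thresholds survive to the API.
    return {class_names[i]: confidences[i] for i in kept_class_ids}
```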

API request schema (inference/core/entities/requests/inference.py)

  • ObjectDetectionInferenceRequest.confidence default flipped from 0.4 → None so model-eval recommendations can take effect; explicit user values still win. Description updated to document the fallback chain

OLD inference path compatibility (inference/core/models/{object_detection,instance_segmentation,classification}_base.py)

  • Coalesce confidence is None to the existing per-class default at the entry of infer() / make_response(), so the USE_INFERENCE_MODELS=false matrix variant still works after the request default flipped to None
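The legacy-path coalescing amounts to something like this sketch (DEFAULT_CONFIDENCE stands in for whatever default each legacy model class already carries; the helper name is hypothetical):

```python
from typing import Optional

DEFAULT_CONFIDENCE = 0.4  # illustrative; each legacy model keeps its own value


def coalesce_confidence(confidence: Optional[float]) -> float:
    # A request-level None (the new schema default) falls back to the
    # model's existing hardcoded default, preserving legacy behaviour.
    return DEFAULT_CONFIDENCE if confidence is None else confidence
```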

Plumbing

  • BackendType moved from auto_loaders/entities.py → weights_providers/entities.py to break a circular import introduced by the new schema
  • CI workflow integration_tests_workflows_x86.yml overlays a local source build of inference_models (pip install --force-reinstall --no-deps ./inference_models) so adapter changes are actually exercised against pinned PyPI requirements

Dependencies:

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

  • New unit tests: test_confidence_filter.py, test_confidence_filter_attribute.py, test_post_process_filter.py, test_recommended_parameters.py, plus expanded test_roboflow.py and test_core.py coverage

  • Tested the rfdetr OD workflow in staging against a local inference server without https://github.com/roboflow/roboflow/pull/11053 deployed - verified the hard-coded default is still in effect when the API doesn't yet serve recommendedParameters

Debug logging:

ConfidenceFilter: tier 4 (hardcoded default), floor=0.4000, fallback=0.4000, per_class=None

Debug logging:

ConfidenceFilter: tier 2 (per-class), floor=0.2000, fallback=0.3600, per_class={'bishop': 0.5, 'black-bishop': 0.26, 'black-king': 0.75, 'black-knight': 0.22, 'black-pawn': 0.47, 'black-queen': 0.24, 'black-rook': 0.27, 'white-bishop': 0.67, 'white-king': 0.46, 'white-knight': 0.33, 'white-pawn': 0.47, 'white-queen': 0.2, 'white-rook': 0.45}

Debug logging:

ConfidenceFilter: tier 2 (per-class), floor=0.4600, fallback=0.4600, per_class={'Car-rims': 0.46, 'music-note': 0.5}

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch 5 times, most recently from a19cb78 to 5660d57 Compare April 8, 2026 18:37
@leeclemnet leeclemnet changed the title Use recommended optimal confidence from model-eval Default to optimal per-class/global confidence from model-eval Apr 8, 2026
@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch 2 times, most recently from 32fa48d to 473dd08 Compare April 8, 2026 20:07
@leeclemnet leeclemnet changed the title Default to optimal per-class/global confidence from model-eval Default to optimal confidence from model-eval Apr 8, 2026
Comment thread inference_models/inference_models/models/auto_loaders/core.py Outdated
Comment thread inference_models/inference_models/models/auto_loaders/entities.py
@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch 5 times, most recently from 33ea1b0 to 262b5bf Compare April 10, 2026 16:39
@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch from ba6acfd to 65e7aad Compare April 14, 2026 12:36
Comment thread inference_models/inference_models/models/base/confidence_filter.py Outdated
)
)
if confidence_filter.has_per_class_refinement:
results = [
leeclemnet (Contributor Author):

This one needs to stay as a second loop since post_process_predictions_for_precomputed_embeddings is also used by OWLv2

@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch from 581382e to da83b8e Compare April 14, 2026 18:39
leeclemnet (Contributor Author):

Changes:

  • Moved ConfidenceFilter into common post_processing
  • ConfidenceFilter now accepts per-model default confidence and uses it as the fallback if no other confidence is set
  • Optimization: refactored prediction filters and moved them into the existing post-process loops (except Instant because of shared post-processing method with OWLv2) -- now only one loop over detections instead of two
  • Made confidence Optional[float] = None on concrete classes to match bases

call when this is False."""
return self._per_class is not None

def passes(self, class_name: str, confidence: float) -> bool:
PawelPeczek-Roboflow (Collaborator), Apr 15, 2026:

is passes(...) actually used outside?
also - the name is misleading - it looks like it should return False if confidence is below the global floor, regardless of the presence of self._per_class

if not used - I am voting to turn it into a private helper

if global_optimal is not None
else default_confidence
)
self._per_class = dict(per_class)
Collaborator:

is dict(...) needed?

3. Global optimal — single threshold for everything
4. Model's hardcoded default — single threshold for everything

Exposes:
Collaborator:

try to reduce comments that express interface meaning in natural language - the interface should be easily interpretable by looking at the code, or redesigned such that the reader builds good intuitions just looking at the signatures

also handles the no-refinement case (returns all-True)."""
n = len(class_ids)
if not self.has_per_class_refinement:
return torch.ones(n, dtype=torch.bool)
PawelPeczek-Roboflow (Collaborator), Apr 15, 2026:

retracted
still haven't consumed the whole PR, but why not just shortcut with confidences >= torch.full_like(confidences, self._fallback)?
the name of the method requires additional clarification in the comment, increasing the cognitive load

Collaborator:

I would make this a private helper - and maybe it's not worth creating a mask and filtering with it when it will be all True; simply always return the original object?

class_id=detections.class_id[keep],
confidence=detections.confidence[keep],
image_metadata=detections.image_metadata,
bboxes_metadata=detections.bboxes_metadata,
Collaborator:

looks like bboxes_metadata is not filtered?
maybe we should re-use the other method?

bboxes_metadata=bboxes_metadata,
)

def refine_keypoints_and_detections(
Collaborator:

from what I remember, Detections production is optional for keypoints models - we should probably reflect that, or let the user decide to refine separately

)
return refined_keypoints, refined_detections

def refine_multilabel_prediction(
PawelPeczek-Roboflow (Collaborator), Apr 15, 2026:

retracted
use of passes(...) which only works with per-class refinement

image_metadata=prediction.image_metadata,
)

def refine_segmentation_result(
PawelPeczek-Roboflow (Collaborator), Apr 15, 2026:

retracted
this method looks aligned with what I would expect (auto-fallback even if the client does not care about has_per_class_refinement), and at the same time not aligned with the other methods, creating inconsistency

Collaborator:

maybe we should check if class alignment is there and only apply if present?

safe_idx = class_ids_long.clamp(0, max(len(class_names) - 1, 0))
per_detection_thresholds = torch.where(
in_range,
thresholds_per_class[safe_idx] if len(class_names) > 0 else torch.full_like(confidences, self._fallback),
Collaborator:

maybe empty class names deserve a shortcut earlier in the logic?

)
return confidences >= per_detection_thresholds

def per_class_thresholds(self, class_names: List[str]) -> List[float]:
PawelPeczek-Roboflow (Collaborator), Apr 15, 2026:

maybe a tensor should be returned from the function?

Collaborator:

maybe also private helper?

recommended_parameters: Optional[RecommendedParameters],
default_confidence: float,
):
# Tier 1: explicit user value wins outright. No per-class refinement
Collaborator:

this is just sanitisation, optional - maybe a helper function to establish all the values we want to set, and then set the state of the class based on that - this way, when any field needs to be added, it's easier not to get lost in a jungle of if-elif-else

# existing per-image loop when has_per_class_refinement is True.
# ------------------------------------------------------------------

def refine_detections(
Collaborator:

looks like the true public interface is refine_detections, refine_instance_detections, refine_keypoints_and_detections, refine_multilabel_prediction, refine_segmentation_result - I would keep them at the top of the class

return image_bboxes, masks


class ConfidenceFilter:
Collaborator:

looking at the code over and over, and I still have the feeling that this does not fully match the puzzle.

When I see ConfidenceFilter - I believe the responsibility of this class is generating and applying filtering criteria based on confidence. And I believe this gut feeling is fair.

Current implementations are trivial (comparing tensor to constant float value)
Examples:

# rfdetr OD
confidence_mask = predicted_confidence > confidence

# rfdetr IS
confidence_mask = confidence > threshold

# resnet ML
batch_element_confidence >= confidence

# resnet single label: N/A

# YOLO IS
mask = class_conf > conf_thresh

The proposition in this class is to extend the logic into a confidence filter on steroids, which basically

  • requires running the old-style filtering (but based on the floor value dictated by this class)
  • and then running another construction of tensors in the output entities
  • at the same time entangling knowledge about the internals of the data formats output by the model classes at the end of the whole forward pass - which does not need to be known by ConfidenceFilter

we end up with:

  • worst-case-scenario double filtering, which is avoidable (performance loss)
  • unnatural interactions between the model and this class
  • a class interface which is not generic and requires mutation for future variations of output entities

Let's discuss whether you find those observations correct - maybe I lack some visibility, maybe this is for some reason not possible.

@leeclemnet leeclemnet marked this pull request as draft April 15, 2026 15:00
@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch 2 times, most recently from cdbc2bd to e688ae7 Compare April 15, 2026 19:35
@leeclemnet leeclemnet force-pushed the feat/model-eval-recommended-defaults branch from e688ae7 to 341300d Compare April 15, 2026 19:58
@leeclemnet leeclemnet marked this pull request as ready for review April 15, 2026 23:58