
refactor(ocr): reduce OCRDetection bounding box from 4 vertices to 2-point AABB #1126

Closed
msluszniak wants to merge 3 commits into software-mansion:main from msluszniak:claude/great-hamilton-udZyQ


@msluszniak
Member

@msluszniak msluszniak commented May 7, 2026

Description

Reduces the OCRDetection bounding box from a 4-vertex rotated rectangle to a 2-point axis-aligned bounding box (AABB). The detector internally still uses 4 corners for cropping text regions; only the public-facing OCRDetection output is changed — bbox is now [top-left, bottom-right] instead of 4 arbitrary corners. The TypeScript type is narrowed from Point[] to [Point, Point].
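As a rough illustration of what the narrowed type means for consumers, the sketch below models the new tuple shape and derives a width and height from it. `OCRDetectionSketch` and `bboxSize` are hypothetical names used only for this example, and the `text` field is an assumption; the real definitions live in ocr.ts.

```typescript
// Hypothetical shapes for illustration only; not copied from ocr.ts.
type Point = { x: number; y: number };

interface OCRDetectionSketch {
  // Exactly two points: [top-left, bottom-right] of the axis-aligned box.
  bbox: [Point, Point];
  // Assumed field, present here only to make the example realistic.
  text: string;
}

// Derive the size of the box from its two corners.
function bboxSize(detection: OCRDetectionSketch): { width: number; height: number } {
  const [topLeft, bottomRight] = detection.bbox;
  return {
    width: bottomRight.x - topLeft.x,
    height: bottomRight.y - topLeft.y,
  };
}

const sampleDetection: OCRDetectionSketch = {
  bbox: [{ x: 10, y: 20 }, { x: 110, y: 50 }],
  text: "example",
};
const sampleSize = bboxSize(sampleDetection);
```

With a tuple type, the compiler rejects `detection.bbox[2]` at build time, which is the main practical benefit of narrowing from `Point[]`.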

Changes span Types.h, RecognitionHandler.cpp, OCR.cpp, VerticalOCR.cpp, JsiConversions.h, ocr.ts, and the integration tests for both OCR and VerticalOCR.

Introduces a breaking change?

  • Yes
  • No

OCRDetection.bbox is now a 2-element tuple. Any code indexing bbox[2] or bbox[3] must be updated.
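For callers that still need four corners (for example, to draw a polygon), one possible migration is to expand the two AABB points back into a corner list. This is a sketch under assumptions: `aabbToCorners` is a hypothetical helper, not part of the library, and the clockwise corner order starting at top-left is chosen for illustration only.

```typescript
type Point = { x: number; y: number };

// Hypothetical migration helper: expands [top-left, bottom-right] into
// four corners in the order top-left, top-right, bottom-right, bottom-left.
function aabbToCorners(bbox: [Point, Point]): [Point, Point, Point, Point] {
  const [topLeft, bottomRight] = bbox;
  return [
    { x: topLeft.x, y: topLeft.y },
    { x: bottomRight.x, y: topLeft.y },
    { x: bottomRight.x, y: bottomRight.y },
    { x: topLeft.x, y: bottomRight.y },
  ];
}

const expandedCorners = aabbToCorners([{ x: 10, y: 20 }, { x: 110, y: 50 }]);
```

Note that the expanded corners describe the axis-aligned box, not the original rotated rectangle; the rotation information is no longer available in the public output.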

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  1. Build and run the Android integration tests:
    adb push xnnpack_craft_quantized.pte xnnpack_crnn_english.pte we_are_software_mansion.jpg /data/local/tmp/rnexecutorch_tests/
    adb shell /data/local/tmp/rnexecutorch_tests/OCRTest
    adb shell /data/local/tmp/rnexecutorch_tests/VerticalOCRTest
    
  2. Confirm the DetectionsHaveValidBoundingBoxes / DetectionsHaveValidBBoxes tests pass: bbox size == 2 and bbox[0] <= bbox[1] on both axes.
  3. In a React Native app using useOCR, log detections[0].bbox — confirm it is [{x, y}, {x, y}] with exactly 2 elements.
  4. Exercise runOnFrame (VisionCamera) in portrait and landscape — confirm bounding boxes remain correctly oriented after rotation.
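The manual check in step 3 could also be scripted on the JavaScript side. The sketch below mirrors the conditions asserted by the native tests (exactly two points, first point not greater than the second on either axis); `isValidBBox` is a hypothetical helper written for this example.

```typescript
type Point = { x: number; y: number };

// Hypothetical check mirroring the native test assertions:
// exactly two points, and bbox[0] <= bbox[1] on both axes.
function isValidBBox(bbox: readonly Point[]): boolean {
  if (bbox.length !== 2) return false;
  const topLeft = bbox[0];
  const bottomRight = bbox[1];
  if (topLeft === undefined || bottomRight === undefined) return false;
  return topLeft.x <= bottomRight.x && topLeft.y <= bottomRight.y;
}

const validCheck = isValidBBox([{ x: 10, y: 20 }, { x: 110, y: 50 }]);
const swappedCheck = isValidBBox([{ x: 110, y: 50 }, { x: 10, y: 20 }]);
const legacyCheck = isValidBBox([
  { x: 0, y: 0 },
  { x: 1, y: 0 },
  { x: 1, y: 1 },
  { x: 0, y: 1 },
]);
```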

Screenshots

N/A

Related issues

Closes #760

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

DetectorBBox (internal, used for cropping) retains 4 points — only the public OCRDetection output is changed. The generateFromFrame path re-normalises the 2 AABB corners after inverseRotatePoints to guarantee bbox[0] is always top-left.
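To see why the re-normalisation is needed, consider what a rotation does to the two corners: the point that was top-left before the rotation is generally not top-left afterwards. The sketch below uses a hypothetical 90-degree mapping, `rotatePointSketch`, as a stand-in; it is not the actual `inverseRotatePoints` implementation, and `normalizeAABB` is likewise an illustrative name.

```typescript
type Point = { x: number; y: number };

// Hypothetical stand-in for a rotation step: maps (x, y) to (frameHeight - y, x).
// The real inverseRotatePoints may use a different mapping.
function rotatePointSketch(p: Point, frameHeight: number): Point {
  return { x: frameHeight - p.y, y: p.x };
}

// Re-order two arbitrary opposite corners into [top-left, bottom-right].
function normalizeAABB(a: Point, b: Point): [Point, Point] {
  return [
    { x: Math.min(a.x, b.x), y: Math.min(a.y, b.y) },
    { x: Math.max(a.x, b.x), y: Math.max(a.y, b.y) },
  ];
}

// The original top-left (10, 20) lands at (180, 10) and the original
// bottom-right (110, 50) lands at (150, 110), so the first point is no
// longer the smaller one on the x axis.
const rotatedA = rotatePointSketch({ x: 10, y: 20 }, 200);
const rotatedB = rotatePointSketch({ x: 110, y: 50 }, 200);
const normalized = normalizeAABB(rotatedA, rotatedB);
```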

msluszniak and others added 3 commits March 3, 2026 16:42
Consolidate npm update configuration and change schedule to monthly.
Resolves software-mansion#760. The OCR and VerticalOCR pipelines previously exposed all four
rotated-rectangle corners in OCRDetection.bbox. Two points (top-left and
bottom-right of the axis-aligned bounding box) are sufficient for downstream
rendering and are simpler to consume.

Changes:
- Types.h: shrink OCRDetection.bbox from std::array<Point,4> to std::array<Point,2>
- RecognitionHandler.cpp: compute AABB (min/max x,y) over the four detector
  corners instead of forwarding them verbatim
- VerticalOCR.cpp: same AABB reduction in _processSingleTextBox
- OCR.cpp / VerticalOCR.cpp generateFromFrame: re-normalize the two bbox
  corners after inverseRotatePoints to guarantee bbox[0] <= bbox[1]
- JsiConversions.h: serialize 2 points instead of 4 to JavaScript
- OCRTest.cpp / VerticalOCRTest.cpp: assert size==2 and that bbox[1] >= bbox[0]
- ocr.ts: narrow TypeScript type from Point[] to [Point,Point] and update docs
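The AABB reduction described for RecognitionHandler.cpp and VerticalOCR.cpp amounts to taking the minimum and maximum x and y over the four detector corners. The native code is C++; the sketch below shows the same arithmetic in TypeScript for readability, with `aabbFromCorners` as a hypothetical name.

```typescript
type Point = { x: number; y: number };

// Hypothetical helper: reduce four (possibly rotated) corners to the
// axis-aligned box [top-left, bottom-right] that encloses them.
function aabbFromCorners(
  corners: readonly [Point, Point, Point, Point]
): [Point, Point] {
  const xs = corners.map((p) => p.x);
  const ys = corners.map((p) => p.y);
  return [
    { x: Math.min(...xs), y: Math.min(...ys) },
    { x: Math.max(...xs), y: Math.max(...ys) },
  ];
}

// A slightly rotated rectangle; its AABB is larger than the rectangle itself.
const enclosingBox = aabbFromCorners([
  { x: 10, y: 5 },
  { x: 50, y: 0 },
  { x: 55, y: 20 },
  { x: 15, y: 25 },
]);
```

For rotated text the resulting box covers more area than the original rectangle, which is the trade-off accepted in exchange for a simpler output shape.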
@msluszniak msluszniak added the auto-push (PR created via Claude routines or other autonomic flow) label May 7, 2026
@msluszniak msluszniak self-assigned this May 7, 2026
@msluszniak
Member Author

Closing this in favour of #1130

@msluszniak msluszniak closed this May 8, 2026

Labels

auto-push (PR created via Claude routines or other autonomic flow), refactoring

Development

Successfully merging this pull request may close these issues.

Refactor code of OCR and VerticalOCR to utilise only two points in bounding boxes

2 participants