Patent: Image Processing Methodology for Line and Character Location in Images with Curves, Overlap, and Multiple Contours per Character
Line Location
Example: a bank check with a Magnetic Ink Character Recognition (MICR) line. The MICR line is curved, a signature overlaps it, some characters consist of multiple contours, and the check contains lines with different font sizes.
This section describes the algorithm to locate the lines in the image, given these complicating factors. We assume that skew correction has already been performed on the image.
Assign each contour to one of three groups based on the size (or some other metric) of the contour: small, medium, and large. The size can be based on height, width, area, or a combination of these or other contour metrics.
Discard all small contours. These contours are deemed to be noise and do not contribute to any character of any line.
Sort all medium and large contours from left-to-right based on the X coordinate value of the left-most point of the contour.
Initialize LinesArray to an empty array, which will be added to and accessed in the following steps. A line is an array of textual characters. LinesArray is an array of these lines.
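The preparation steps above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the Contour record, the use of bounding-box height as the size metric, and both threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contour:
    """Hypothetical contour record: bounding box only."""
    x: int  # X coordinate of the left-most point
    y: int  # Y coordinate of the top-most point
    w: int  # bounding-box width
    h: int  # bounding-box height


# Assumed thresholds (pixels); in practice these would be configuration settings.
SMALL_MAX_HEIGHT = 5
MEDIUM_MAX_HEIGHT = 40


def classify(contour):
    """Assign a contour to 'small', 'medium', or 'large' by height (one possible metric)."""
    if contour.h <= SMALL_MAX_HEIGHT:
        return "small"
    if contour.h <= MEDIUM_MAX_HEIGHT:
        return "medium"
    return "large"


def prepare(contours):
    """Discard small contours, sort the rest left-to-right, and start an empty LinesArray."""
    kept = [c for c in contours if classify(c) != "small"]
    kept.sort(key=lambda c: c.x)
    medium = [c for c in kept if classify(c) == "medium"]
    large = [c for c in kept if classify(c) == "large"]
    lines_array = []  # each entry will be a list of contours forming one line
    return medium, large, lines_array
```

Keeping the medium and large contours in separate, already sorted lists matches the order in which the later steps consume them.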
Process Medium Contours
Process each medium contour (MC) in order (left-to-right) as follows:
The MCProjectedRectangle represents the estimated allowable location for an adjacent contour to the left of the current contour. Calculate the coordinates of MCProjectedRectangle for MC as follows:
calculate the rectangle around MC;
extend the rectangle above and below by an amount that is less than the minimum distance between two lines but more than the maximum vertical distance between any two contours of the same character (both of these values are configuration settings);
extend the rectangle to the left by more than the maximum distance between any two characters of the same line.
For example, while processing the "3" in the image below, we begin with the red rectangle surrounding the "3" alone, and then (as shown on the second line) we extend the red rectangle up, down, and to the left. The extended rectangle then encloses the final contour of the existing line, indicated by the blue rectangle.
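The rectangle calculation above might look like the following sketch. It assumes image coordinates in which Y grows downward, so "above" means a smaller Y value. The Rect type and all parameter names are hypothetical, and the checks simply restate the constraints given in the text.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical axis-aligned rectangle; Y grows downward as in image coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def projected_rectangle(mc, vertical_pad, left_pad,
                        max_intra_char_gap, min_line_gap, max_char_gap):
    """Extend the rectangle around MC up, down, and to the left.

    vertical_pad must lie strictly between the largest vertical gap inside one
    character and the smallest gap between two lines; left_pad must exceed the
    largest gap between two characters of the same line.
    """
    if not (max_intra_char_gap < vertical_pad < min_line_gap):
        raise ValueError("vertical_pad is outside the allowed range")
    if left_pad <= max_char_gap:
        raise ValueError("left_pad must exceed the maximum character spacing")
    return Rect(
        left=mc.left - left_pad,
        top=mc.top - vertical_pad,
        right=mc.right,              # the right edge is not extended
        bottom=mc.bottom + vertical_pad,
    )
```

For instance, a contour at Rect(100, 50, 112, 70) with a vertical pad of 4 and a left pad of 30 yields Rect(70, 46, 112, 74).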
Try to find a single line in LinesArray by the following method:
find every line whose LastContourOfLine (the contour most recently added to that line) intersects MCProjectedRectangle;
if this is true for multiple lines, select the line whose LastContourOfLine has the largest percentage of its area contained in MCProjectedRectangle; (See attached diagram, line_loc_5.2.2.png, showing an MCProjectedRectangle that overlaps with two lines.)
if MCProjectedRectangle contains the same percentage (e.g. 100%) of LastContourOfLine for multiple lines, select the line whose LastContourOfLine is nearest to MC.
If an existing line was found above, add MC to that line; otherwise, create a NewLine, add MC to the NewLine, and add NewLine to LinesArray.
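The line-selection rules above can be illustrated with the sketch below. It makes several assumptions that the text leaves open: contours are reduced to bounding rectangles, "percentage contained" is measured by rectangle area, and "nearest" is measured as the distance between rectangle centers. All names are hypothetical.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def area(r):
    return (r.right - r.left) * (r.bottom - r.top)


def intersection_area(a, b):
    """Area shared by two rectangles, or 0 if they do not overlap."""
    w = min(a.right, b.right) - max(a.left, b.left)
    h = min(a.bottom, b.bottom) - max(a.top, b.top)
    return w * h if w > 0 and h > 0 else 0


def center_distance(a, b):
    """Assumed nearness metric: distance between rectangle centers."""
    return math.hypot((a.left + a.right) / 2 - (b.left + b.right) / 2,
                      (a.top + a.bottom) / 2 - (b.top + b.bottom) / 2)


def assign_to_line(mc, projected, lines_array):
    """Add mc to the best matching line, or start a new line if none matches."""
    best_line, best_key = None, None
    for line in lines_array:
        last = line[-1]  # LastContourOfLine
        overlap = intersection_area(last, projected)
        if overlap == 0:
            continue  # this line's last contour does not intersect the rectangle
        fraction = overlap / area(last)
        # Prefer the largest contained fraction, then the nearest last contour.
        key = (-fraction, center_distance(last, mc))
        if best_key is None or key < best_key:
            best_line, best_key = line, key
    if best_line is None:
        lines_array.append([mc])
    else:
        best_line.append(mc)
```

Encoding both criteria in one sort key keeps the tie-break explicit: the distance is only consulted when two lines have the same contained fraction.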
Process Large Contours
For each Line in LinesArray:
For each large contour (LC) which intersects Line on the y-axis:
Let LineRect be the smallest rectangle surrounding all contours in Line. A configurable amount of padding may extend LineRect to the left and/or right. (See attached diagram, line_loc_6.1.1.png, showing an example of LineRect.)
Divide LC into 0 or more sub-contours such that each sub-contour is inside LineRect. Add each of these sub-contours to Line. (See attached diagram, line_loc_6.1.2.png, showing an example of where this might occur. Two characters here would have been missed until this step because they are part of an LC.)
Sort the contours in Line from left-to-right.
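The text does not prescribe how a large contour is divided, so the following is only one possible approach. It assumes LC is available as a set of (x, y) pixel coordinates, clips those pixels to LineRect (treated here as a half-open rectangle), and splits what remains into 8-connected pieces. The function name and data representation are hypothetical.

```python
from collections import deque


def split_large_contour(pixels, line_rect):
    """Return the pieces of a large contour that fall inside line_rect.

    pixels: set of (x, y) tuples belonging to the large contour (assumed representation).
    line_rect: (left, top, right, bottom), with right and bottom exclusive.
    The result is a list of pixel sets, sorted left-to-right; it may be empty.
    """
    left, top, right, bottom = line_rect
    inside = {(x, y) for (x, y) in pixels
              if left <= x < right and top <= y < bottom}
    sub_contours = []
    while inside:
        seed = inside.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill over 8-connected neighbours
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in inside:
                        inside.remove(neighbour)
                        component.add(neighbour)
                        queue.append(neighbour)
        sub_contours.append(component)
    sub_contours.sort(key=lambda comp: min(x for x, _ in comp))
    return sub_contours
```

In this sketch, two character strokes joined only by a signature stroke outside LineRect come back as two separate sub-contours, and a large contour that never enters LineRect produces an empty list.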
NOTE: The algorithm above separates contours based upon size into two groups: (1) medium contours (MC) are those which constitute a character by themselves and (2) large contours (LC) are those which result from some type of overlap that causes the joining of multiple characters from the same or different lines into a single contour. More generally, these two types of contours may be divided based on criteria other than size (e.g. color).
Character Location Within Each Line
After contours have been separated into lines, the contours of each Line are grouped into characters using the following algorithm:
1. Let MaxContourArea be the maximum contour area of all contours in Line, and let MinCharArea be a configurable percentage of MaxContourArea. (See attached diagram, char_loc_1.png, showing MaxContourArea.)
2. Let MaxWidth be the maximum width of all contours in Line and let MaxDistanceBetweenContoursOfSameChar be a configurable percentage of MaxWidth.
Note: By computing the values in steps 1 and 2 above dynamically on a per-line basis, we are able to handle lines with varying font sizes.
3. Let ContourBuffer be an empty array of contours. ContourBuffer contains the contour(s) of a single character.
4. For each Contour in Line, proceeding from left to right:
If Area(Contour) < MinCharArea, add Contour to ContourBuffer, then remove from ContourBuffer any contours that are farther away from Contour than MaxDistanceBetweenContoursOfSameChar. If Area(ContourBuffer) >= MinCharArea, the minimum rectangle around the contours in ContourBuffer is the next character, so return this character and clear ContourBuffer.
If Area(Contour) >= MinCharArea, clear ContourBuffer and return the minimum rectangle around Contour as a character.
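Steps 1 through 4 can be sketched as follows. Several details are assumptions made for illustration: contours are represented by bounding rectangles, Area(ContourBuffer) is taken to be the area of the minimum rectangle around the buffered contours, the distance between contours is the horizontal gap between their rectangles, and both percentages default to 50%.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def area(r):
    return (r.right - r.left) * (r.bottom - r.top)


def horizontal_gap(a, b):
    """Assumed distance metric: horizontal gap between two rectangles (0 if they overlap)."""
    return max(0, max(a.left, b.left) - min(a.right, b.right))


def bounding_rect(rects):
    """Minimum rectangle around a non-empty list of rectangles."""
    return Rect(min(r.left for r in rects), min(r.top for r in rects),
                max(r.right for r in rects), max(r.bottom for r in rects))


def group_characters(line, min_area_pct=0.5, max_dist_pct=0.5):
    """Group the contours of one line into character rectangles."""
    if not line:
        return []
    line = sorted(line, key=lambda r: r.left)
    # Steps 1 and 2: thresholds are derived per line, so font size may vary by line.
    min_char_area = min_area_pct * max(area(r) for r in line)
    max_distance = max_dist_pct * max(r.right - r.left for r in line)
    characters = []
    buffer = []  # step 3: contours of the character currently being assembled
    for contour in line:  # step 4
        if area(contour) >= min_char_area:
            buffer = []
            characters.append(contour)
            continue
        buffer.append(contour)
        # Drop buffered contours that are too far from the current one.
        buffer = [r for r in buffer if horizontal_gap(r, contour) <= max_distance]
        box = bounding_rect(buffer)
        if area(box) >= min_char_area:
            characters.append(box)
            buffer = []
    return characters
```

With these assumptions, two narrow bars close together are merged into one character, while an isolated speck between characters is dropped when the next full-size contour clears the buffer.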