Adjusting brightness/contrast is not easy due to the uneven lighting.

First, to avoid color fringes, work on a grayscale version of the image, using either Image>Mode>Grayscale or Color>Desaturate. If you are using Gimp 2.10, you can also set Image>Precision to 32-bit floating point/linear.

Then apply the following technique(*) to even out the lighting:

- duplicate the layer
- on the top layer, apply a Gaussian blur that is sufficient to make the text disappear completely (around 50px on your image)
- set the top layer to Grain extract mode
- create a new layer with the result: Layer>New from visible

In the resulting layer, the background is gray (around 50%), but it is a more uniform gray. You can then use the Levels tool comfortably to optimize the result:

- right handle slightly left of the middle of the big spike (anything to its right becomes completely white)
- left handle to where the histogram seems to cease (anything to its left becomes completely black)
- middle handle adjusted to optimize contrast

The text on the other side of the page shows through and somewhat limits your ability to stretch the contrast. Next time you take these pictures, bring a dark sheet of paper (ideally black) and insert it under the page that you are shooting.

(*) How it works:

- With the Gaussian blur, a pixel value is replaced by the lightness of the area around it (the blur is assumed to be sufficiently wide to make the influence of local details such as text negligible).
- The grain extract, which is basically a subtraction, subtracts the average lightness of the area from the pixels in the initial image:
  - for background pixels, the average value of the background around them is removed, so whatever the initial background lightness, the result is close to zero (actually, 50% gray, since Grain extract adds a bias to the result),
  - for subject pixels (which are normally quite different from the background), the difference is far from 0 and they remain visible.

A simple contrast boost will not work because the light is uneven. The uneven light can to some degree be flattened with high pass filtering. (It's not perfect, because missing local contrast will not be fixed.)

[Image: split view of the high pass filter result]

Applying curves can then increase the contrast. Unfortunately, this also boosts color differences and all the unwanted crap at the edges. But the edges can be painted white or clipped off, and the colors can be desaturated (not done here).

The "Threshold" filter makes everything strictly black and white. I found that a steep curve which retains some gray shades gives better readability.

Desaturating at the beginning is useful in your case, because it prevents any color boosting that the contrast stretch could otherwise cause. (OCR is different: desaturating increases errors.) Colored dirt, for example, can be attacked only if the image is still in color.

This still isn't the best possible apparent contrast. Darker text is available with edge detection. In the next example the image was first desaturated, then filtered with Sobel edge detection; the screenshot shows the Curves tool in use.

[Image: Curves tool applied after Sobel edge detection]

Edge detection unfortunately makes the image negative, but the result can be inverted at the same time as the final contrast is stretched to the limit with the curves. Note: the paper edges stayed quite clean without clipping anything.

Enhancing the letters further would need some pattern-matching filter that knows this is typewritten text and replaces the letters with perfect ones.

I believe at least Google likes to read your documents and do the OCR thing for you. As installable software, there are commercial packages and FreeOCR. I cleaned the edges of the output image and dropped it into FreeOCR. Here's the result.

[FreeOCR output]

All extra spaces are removed. You can see that a full check is a must, but it can still be faster than retyping.

The next question is: what preprocessing is actually needed for successful OCR, given that it is probably not the highest apparent contrast? It became clear that the highest possible apparent contrast causes errors. With no desaturation, many pieces of crap are colored differently from the text, so OCR can reject them more easily in color.
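For readers who want to try the lighting-flattening idea outside Gimp, the blur, Grain extract and Levels steps can be sketched in a few lines of NumPy. This is only a rough illustration, not Gimp's own code. The function names, the synthetic test page and the automatic choice of black and white points are all assumptions made for the example. Grain extract is modelled here as "base minus blurred plus 50% gray" on values in the 0..1 range.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (kernel cut off at 3 sigma)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")

    def blur_1d(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(blur_1d, 1, padded)  # blur along x
    return np.apply_along_axis(blur_1d, 0, rows)    # blur along y


def grain_extract(base, top):
    """Base minus top plus a 50% gray bias, clipped to the 0..1 range."""
    return np.clip(base - top + 0.5, 0.0, 1.0)


def levels(img, black, white, gamma=1.0):
    """Map black..white onto 0..1; gamma stands in for the middle handle."""
    scaled = np.clip((img - black) / (white - black), 0.0, 1.0)
    return scaled ** (1.0 / gamma)


def make_test_page(height=40, width=60):
    """Hypothetical test image: a lighting ramp with sparse dark 'text' pixels."""
    lighting = np.tile(np.linspace(0.3, 0.95, width), (height, 1))
    text = np.zeros((height, width), dtype=bool)
    text[10:30:4, 5:55:3] = True
    page = np.where(text, 0.5 * lighting, lighting)
    return page, text


def flatten_lighting(page, sigma):
    """Blur, grain-extract, then stretch; returns (flattened, stretched)."""
    blurred = gaussian_blur(page, sigma)   # local average lightness
    flat = grain_extract(page, blurred)    # background ends up near 50% gray
    white = np.median(flat)                # assumed: middle of the background spike
    black = flat.min()                     # assumed: where the histogram ceases
    return flat, levels(flat, black, white)


if __name__ == "__main__":
    page, text = make_test_page()
    flat, out = flatten_lighting(page, sigma=5.0)
    print("background spread before:", page[~text].std())
    print("background spread after: ", flat[~text].std())
```

On the synthetic page, the darkest background is darker than the lightest text, so no single threshold separates them before flattening. After the grain extract step the background sits close to 50% gray everywhere. A real photo would need loading through an imaging library and a much wider blur, wide enough to make the text disappear.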