Yes, but then you are painstakingly and inefficiently bitmap editing a rasterized image of the original document.
There are surely much more efficient ways to achieve the intended result; it is just that the Linux software development community still hasn’t addressed an issue that has been solved on other OSes for nearly 30 years.
On a side note, one of my regular work-related issues that requires OCR is that I receive PDF documents containing text, images, and formulae/equations, often formatted in a particular way (more particularly, conformant with, inter alia, WIPO standards for patent documents).
Attempting to do this on a Linux workstation is an exercise in frustration and a complete waste of time.
An XML subset of SGML was tried for a number of years prior to this and, on the whole, failed to gain majority acceptance because users found it too hard to read, understand, or implement with their usual word processor.
The patent profession and processing institutions (patent offices, etc.) are now forcing a move to DOCX-based solutions for the preparation, filing and processing of patent applications. The major issues with this are the disparities in the DOCX format between the various software implementations, including between Microsoft’s own products over the years, and the lack of any cohesive validation system capable of proving that what someone actually typed in a DOCX file and submitted is what was actually received and correctly rendered at the other end. The way the patent offices attempt to get around this problem is by converting the DOCX to PDF using XSLT processing and presenting that PDF to the user as a cross-check before validating the submission.
Unfortunately, whilst the patent offices have imposed this new format, it is up to the users to check, each time they submit a document, that the submitted document as received by the patent office corresponds to what the user intended to submit. Depending on the technical field, this can be a near-impossible task, with some patent applications running to hundreds, if not thousands, of pages (e.g. in the biotech sector). Additionally, the inclusion of images and mathematical formulae requires extra care to ensure that nothing has gone missing or been changed during the conversion.
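To give an idea of what a partial, do-it-yourself check could look like, here is a minimal Python sketch using only the standard library. It relies on the fact that a DOCX file is a ZIP archive whose main body lives in word/document.xml, pulls out the paragraph text, and diffs two versions (e.g. the file you submitted and a copy you got back). The function names docx_paragraphs and text_differences are my own, purely for illustration, and this is not how any patent office validates submissions. It only looks at body text: images, equations, footnotes, headers and the actual rendering are not covered, and those are precisely the risky parts.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml inside a DOCX archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_file):
    """Return the non-empty body paragraphs of a DOCX as plain strings.

    docx_file may be a path or a binary file-like object.
    Hypothetical helper: ignores images, equations, footnotes and headers.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split over one or more w:t run elements.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        text = " ".join(text.split())  # normalise whitespace
        if text:
            paragraphs.append(text)
    return paragraphs


def text_differences(submitted, received):
    """Return only the paragraphs that were removed ('- ') or added ('+ ')."""
    return [
        line
        for line in difflib.ndiff(submitted, received)
        if line.startswith(("- ", "+ "))
    ]
```

Typical use would be text_differences(docx_paragraphs("submitted.docx"), docx_paragraphs("received.docx")), where both file names are placeholders; an empty list means the body text matches paragraph for paragraph, nothing more.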
Some patent offices require that the whole DOCX file be resubmitted when making later amendments to content (corrections, deletions, etc.). Here again, the risk of getting things wrong at one end or the other of the submission/reception process is fairly significant, and can prove fatal to the patentee in the event of litigation. These concerns are unfortunately largely dismissed by the patent offices, which have come to rely on the DOCX format as the “Holy Grail” for reducing their own operating costs, as PDF/TIFF processing (the USPTO would only accept TIFF files for years) was inevitably very labour intensive. In time, the aim is almost certainly to remove all use of the PDF format completely.