After running a PDF-to-Word converter, you will have a Word document that was created from a PDF. On top of the standard formatting you would apply to any Word document before converting it to an ebook, there is extra work to do: you have to read every word in the document to make sure it is correct.
If you have scanned several books to make them available to the public, this level of proofreading may feel like a burden. Remind yourself, though, that this is your book, that you are trying to sell it online, and that people will pay money for it. Readers expect an error-free book, so the proofreading has to be word for word.
Proofreading will help you do the following:
Look for incorrect words
Whether the document came from OCR or from a standard PDF-to-Word conversion, some letters or letter pairs will be mistaken for others that look similar. For example, U and Li can look the same. Once you find such an error, it may be worth running a global search and replace so that the correction is made throughout the document. For instance, you can search for Ught and replace it with Light.
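If you have saved the text out of Word, the same global replacement can be sketched in Python. This is only an illustration: the function name fix_ocr_words and the OCR_FIXES table are made up for this example, and the table holds just the Ught/Light pair mentioned above. It matches whole words only, so a longer word that merely contains the misread letters is left for you to review by hand.

```python
import re

# Hypothetical lookup table: misread word -> intended word.
# Add an entry for each OCR error you find while proofreading.
OCR_FIXES = {"Ught": "Light"}

def fix_ocr_words(text, fixes=OCR_FIXES):
    """Replace each misread word everywhere it appears as a whole word."""
    for wrong, right in fixes.items():
        # \b keeps the match to whole words; re.escape guards against
        # punctuation in the misread form being treated as regex syntax.
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text

print(fix_ocr_words("Ught rain fell. The Ught faded."))
```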
Fix line breaks
PDF-to-Word converters are notorious for not knowing where line breaks belong and for placing them where they are not needed. One way to spot them is to turn on the option that shows invisible characters. Another is to change the font size, so that the text reflows and the stray breaks stand out.
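For text that has been exported as plain text, stray line breaks can also be removed in bulk. The sketch below assumes that real paragraphs are separated by a blank line and that any single line break is unwanted; that assumption will not hold for every document (poetry or address blocks, for example), and the function name join_broken_lines is made up for this illustration.

```python
import re

def join_broken_lines(text):
    """Turn single line breaks into spaces, keeping blank-line paragraph breaks."""
    # A newline with no other newline directly before or after it
    # sits inside a paragraph, so it is replaced with a space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

sample = "The quick\nbrown fox.\n\nNext paragraph."
print(join_broken_lines(sample))
```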
Fix hyphenated words
When a word is hyphenated because it was split across two lines, the PDF-to-Word software cannot tell whether the hyphen belongs there or should be removed, so it may keep it. A word such as insti-tution may then end up on a single line with its hyphen intact, which is probably not what you want.
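One way to catch leftover hyphens automatically is to remove a hyphen only when the joined result is a word you recognize. The sketch below assumes a word list is available. KNOWN_WORDS here is a tiny stand-in invented for the example, and in practice you would load a real dictionary file. Genuine compounds are left alone because their joined form is not in the list.

```python
import re

# Hypothetical word list; replace it with a real dictionary in practice.
KNOWN_WORDS = {"institution", "conversion"}

def join_split_words(text, known_words=KNOWN_WORDS):
    """Remove a hyphen only when the two halves form a known word."""
    def repl(match):
        joined = match.group(1) + match.group(2)
        # Keep the original hyphenated text unless the joined form is known.
        return joined if joined.lower() in known_words else match.group(0)
    return re.sub(r"\b([A-Za-z]+)-([A-Za-z]+)\b", repl, text)

print(join_split_words("an insti-tution with a well-known name"))
```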
Fix multiple spaces
You may find words separated by several spaces throughout the document. To get rid of these, use the find and replace command. Start by finding 20 spaces and replacing them with one space, then 19, then 18, and so on down to two.
OCR often misses italic and bold formatting and mixes up lowercase and uppercase letters, so check for both.
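If the text is available outside Word, a regular expression can collapse every run of spaces in one pass instead of counting down from 20. This is a minimal sketch, and the function name collapse_spaces is made up for the example. It touches ordinary space characters only and leaves tabs and line breaks alone.

```python
import re

def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("too     many   spaces  here"))
```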
Go the nuclear way
If a document is a total mess, you can use the nuclear option to clear it out. It is so called because it is like nuking a city and starting over from scratch. What you are left with is a plain text document containing all the words and no formatting, so you will still have to go through it to correct the incorrect words.
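The usual manual route is to paste the text into a plain text editor, but the same result can be sketched in Python using only the standard library. This assumes the converted file was saved in .docx format, which is a zip archive whose main text lives in word/document.xml. The function name docx_to_plain_text is made up for the example, and the sketch ignores headers, footers, and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements inside a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_to_plain_text(path):
    """Return the document's paragraphs as plain text, dropping all formatting."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Visible text is stored in w:t elements; bold, italic, and font
        # settings live in sibling elements that are simply skipped here.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

Calling docx_to_plain_text("book.docx"), where book.docx is a placeholder file name, returns one string with a blank line between paragraphs, ready to be proofread and reformatted from scratch.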