A septic tank properly installed, unlike in the Baringo case. Source: Flickr, SuSanA Secretariat

We made the Auditor General Reports computer-readable


Most of the audit reports that the Office of the Auditor General (OAG) has uploaded to its website in the last few years cannot be read by your computer without special software. You don’t care? Read here why it matters. And because it matters, we have made the reports computer-readable and uploaded more than 1,850 of them to RoGGKenya.org, so that you can search through them with the help of computer technology.

A Company Name that is not found

Let us pick a random example from the latest uploaded OAG audit report on the Baringo County Executive. A certain company is adversely mentioned in it. It was paid to build a septic tank under a KSh 3 million contract, but had not done so when the OAG auditor visited the site at Mogotio Hospital. The name of the company is “Pearltek Kenya Ltd”.
You can find the report containing this company name on the OAG website through Google. Either use Google Advanced Search and limit the search to oagkenya.go.ke, or type the search term “site:oagkenya.go.ke filetype:pdf "pearltek kenya ltd"” into the Google search box.
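If you want to repeat this kind of search for many company names, the search term can be assembled by a small script. The following is only a minimal sketch in Python: the function names are made up for illustration, and the URL pattern is the ordinary Google search address with the term passed in the `q` parameter.

```python
from urllib.parse import urlencode


def build_site_query(phrase, site="oagkenya.go.ke", filetype="pdf"):
    """Build a search term limited to one website and one file type.

    The phrase is wrapped in double quotes so that it is searched as an
    exact phrase.
    """
    return f'site:{site} filetype:{filetype} "{phrase}"'


def build_search_url(query):
    """Turn a search term into a Google search URL (term goes into 'q')."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_site_query("pearltek kenya ltd")
print(query)
print(build_search_url(query))
```

Opening the printed URL in a browser should show the same results as typing the search term by hand.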
But then, after you have downloaded the file, try to find “Pearltek” within its 97 pages! The PDF reader on your phone or computer will not be able to do it (try “Ctrl+F” or click the magnifying glass). You have to scroll and read with your own eyes. If you were planning follow-up research, this alone may discourage you.

A Computer usually cannot read within Pictures

The cause of this problem is that the OAG has chosen to print the audit reports, then scan them, and save only the scanned images in the PDF file. So what you see is not typed text. It is pictures of text. And PDF readers cannot read pictures. Only OCR software (“optical character recognition” software) can, by transforming the pictures into readable text. In an earlier interview with a RoGGKenya reporter, the Deputy Auditor General cited security reasons for this practice, but did not specify them.
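You can check for yourself whether a PDF contains real text or only pictures of text. The sketch below assumes the third-party Python package pypdf is installed. The helper names and the threshold of 20 characters are our own choices for illustration, not a fixed rule. A page whose text layer comes back empty is very likely a scanned picture.

```python
def extract_page_texts(pdf_path):
    """Return the text layer of every page as a list of strings.

    Needs the third-party package pypdf. It is imported here, inside the
    function, so that the two helpers below also work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_without_text(page_texts, min_chars=20):
    """Page numbers (starting at 1) whose text layer is empty or nearly empty."""
    return [n for n, text in enumerate(page_texts, start=1)
            if len(text.strip()) < min_chars]


def pages_mentioning(page_texts, term):
    """Page numbers (starting at 1) that contain the term, ignoring case."""
    term = term.casefold()
    return [n for n, text in enumerate(page_texts, start=1)
            if term in text.casefold()]


# Made-up page texts for illustration: pages 1 and 3 behave like scans.
sample = [
    "",
    "The contract was awarded to Pearltek Kenya Ltd for a septic tank.",
    "   ",
]
print(pages_without_text(sample))            # [1, 3]
print(pages_mentioning(sample, "pearltek"))  # [2]
```

For a real file you would call `extract_page_texts("report.pdf")` first (the file name is a placeholder) and pass the result to the two helpers. If nearly all pages show up in `pages_without_text`, the file is a scan and needs OCR before it can be searched.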

How we made the Audit Reports Readable – and searchable

We at RoGGKenya have downloaded more than 1,850 Auditor General reports from the oagkenya.go.ke website, run them through OCR software and saved them again as readable PDFs. We have uploaded the results for your convenience. You can now find the same Baringo County report here in an OCR-ed version. And you can retrieve the more than 1,850 other OCR-ed reports and search for terms in them through our own search page.
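For readers who want to try something similar with their own scanned documents, here is a rough sketch of such a batch job. It is not a description of the exact tools we used: it assumes the free OCRmyPDF package (which in turn needs the Tesseract OCR engine), the folder names are placeholders, and the position of the “OCR_by-RoGGKenya” marker at the end of the file name is only an assumption for this example.

```python
from pathlib import Path

# Marker added to the names of OCR-ed files so they are not mistaken for originals.
MARKER = "OCR_by-RoGGKenya"


def ocr_output_path(src, out_dir):
    """Name of the OCR-ed copy: original name plus the marker (assumed position)."""
    src = Path(src)
    return Path(out_dir) / f"{src.stem}_{MARKER}{src.suffix}"


def ocr_folder(in_dir, out_dir, languages=("eng",)):
    """Run OCR on every PDF in in_dir and save searchable copies in out_dir.

    Needs the third-party package ocrmypdf and an installed Tesseract.
    Files that already contain text may be rejected by ocrmypdf; this
    sketch does not handle that case.
    """
    import ocrmypdf

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.pdf")):
        ocrmypdf.ocr(src, ocr_output_path(src, out_dir), language=list(languages))


print(ocr_output_path("downloads/baringo-report.pdf", "ocr").name)
# baringo-report_OCR_by-RoGGKenya.pdf
```

Calling `ocr_folder("downloads", "ocr")` would then process a whole folder of downloaded reports in one go. Expect this to take a long time for hundreds of multi-page files.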

Visit our Search Page!

This is our search page:
https://roggkenya.org/agencies-stakeholders/auditor-general/search-within-auditor-general-reports/

Please note:
  • The file names of the OCR-ed files have been changed and include “OCR_by-RoGGKenya” to indicate that they are not the originals.
  • There may be inaccuracies, odd typos or odd layout caused by the OCR process, which is not 100 percent error-free. The original file remains the one uploaded on the oagkenya.go.ke website. But the original file can also be found on RoGGKenya, with “retrieved_by-RoGGkenya” in its name.
  • Earlier, search engines like Google could not read the scanned text in the pictures either. That made searching for terms and names in the files on Google impossible. This seems to have changed recently: Google now also applies OCR technology. But it remains true that you cannot easily find the terms in a large file once you have retrieved it.
  • In your reporting, you might ask the responsible persons at the OAG why exactly they chose to publish pictures instead of words on their web page. The Controller of Budget, for example, does not hide her words behind pictures, although she has other issues that we pointed out two years ago.