On Other Projects...
|Current usernames on other Wikimedia projects|
|SQL||Simple English Wikipedia|
|SQL||English Wikipedia (Admin)|
|SXT40||English Wikipedia (Unprivileged account for public computers)|
|SXT-404Bot||English Wikipedia (Bot)|
OCR'ing the PNG Scanset
- Scanset can be found at User:Tim Starling/ScanSet PNG demo.
- The OCR software is Tesseract, which is freely available from Google.
- An image editor such as GIMP or Photoshop is very helpful.
- Download the page that you wish to convert.
- Open it in your image editor and split the page in half so that there is only one column to process (Tesseract can only handle one column at a time).
- Save each half as a BMP.
- Run Tesseract on each half (tesseract inputfile.bmp outputfile -l eng); the recognized text is written to outputfile.txt.
- Manually correct mistakes and remove extra linefeeds.
- Merge the corrected columns into a single article.
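
The steps above could be partly automated. The following is only a rough sketch, not a tested workflow: it assumes the Pillow imaging library is installed, that the tesseract command is on the PATH, and that the gap between the two columns sits at the exact horizontal centre of the page (which may not hold for every scan). The function names column_boxes, tesseract_command, join_lines and ocr_page are made up for illustration. Manual correction of OCR mistakes is still needed afterwards.

```python
import re
import subprocess
from pathlib import Path


def column_boxes(width, height):
    """Crop boxes (left, upper, right, lower) for the left and right halves.

    Assumption: the column gutter is at the horizontal midpoint of the page.
    """
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the same command as in the steps: tesseract inputfile.bmp outputfile -l eng."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def join_lines(text):
    """Remove linefeeds inside paragraphs; blank lines are kept as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def ocr_page(page_png, workdir):
    """Split one scanned page into two columns, OCR each, and merge the text."""
    from PIL import Image  # Pillow; imported here so the helpers above work without it

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    page = Image.open(page_png).convert("L")  # greyscale, safe to save as BMP

    columns = []
    for index, box in enumerate(column_boxes(*page.size)):
        bmp_path = workdir / f"column{index}.bmp"
        out_base = workdir / f"column{index}"
        page.crop(box).save(bmp_path)
        subprocess.run(tesseract_command(bmp_path, out_base), check=True)
        # Tesseract appends .txt to the output base name.
        raw = out_base.with_suffix(".txt").read_text(encoding="utf-8")
        columns.append(join_lines(raw))

    # Left column first, then right column, as one article text.
    return "\n\n".join(columns)
```

For example, ocr_page("page.png", "work") would return the merged text of both columns of a hypothetical page.png, ready for proofreading.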