User:Phe/Test6

The lower the rate, the worse the OCR. The OCR rate is computed over all words of length >= 3: it is the cumulated length of those words that are recognized by a standard dictionary, divided by the cumulated length of all of them. Roughly: below 0.70, large portions of the OCR are unreadable; from 0.70 to 0.80, many small portions of the text are unreadable; from 0.80 to 0.90, there are many errors but they are not heavily clustered, and most of the text is readable with some effort; above 0.90, the OCR is good to excellent. A drop from 0.92 to 0.88 implies a much larger number of errors: the exact factor is unknown, but a rate of 0.88 will contain roughly 5 to 10 times more errors than a rate of 0.92. This interpretation of the OCR rate holds only for EB1911.
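The rate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script that produced the figures below: the function name `ocr_rate`, the regular expression used to split the text into words, the lowercasing before lookup, and the representation of the dictionary as a plain set of words are all hypothetical choices.

```python
import re


def ocr_rate(text, dictionary, min_length=3):
    """Return recognized length / total length over words of at least min_length letters.

    Assumptions (not taken from the original tool): a word is a maximal run of
    letters, lookup is case-insensitive, and `dictionary` is a set of
    lowercase words. Returns None when the text has no word long enough.
    """
    # Keep only runs of letters (no digits, no underscore) that are long enough.
    words = [w for w in re.findall(r"[^\W\d_]+", text) if len(w) >= min_length]
    total = sum(len(w) for w in words)
    if total == 0:
        return None
    recognized = sum(len(w) for w in words if w.lower() in dictionary)
    return recognized / total


# Toy dictionary and text; "brovvn" stands in for a misread "brown".
sample_dictionary = {"the", "quick", "brown", "fox"}
print(round(ocr_rate("The quick brovvn fox", sample_dictionary), 3))  # 0.647
```

In the toy text, 11 of the 17 counted letters belong to recognized words, hence 11/17. Weighting by word length rather than counting words means a misread long word lowers the rate more than a misread short one.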

Index OCR rate
Index:EB1911_-_Volume_01.djvu 0.912545166425
Index:EB1911_-_Volume_02.djvu 0.909824360081
Index:EB1911_-_Volume_03.djvu 0.90772059933
Index:EB1911_-_Volume_04.djvu 0.911220770214
Index:EB1911_-_Volume_05.djvu 0.907126908172
Index:EB1911_-_Volume_06.djvu 0.926589085639
Index:EB1911_-_Volume_07.djvu 0.919514228447
Index:EB1911_-_Volume_08.djvu 0.9135650729
Index:EB1911_-_Volume_09.djvu 0.925076966839
Index:EB1911_-_Volume_10.djvu 0.918602714008
Index:EB1911_-_Volume_11.djvu 0.908900218755 [1]
Index:EB1911_-_Volume_12.djvu 0.902881728457
Index:EB1911_-_Volume_13.djvu 0.887533026512
Index:EB1911_-_Volume_14.djvu 0.908943660788
Index:EB1911_-_Volume_15.djvu 0.915601587476
Index:EB1911_-_Volume_16.djvu 0.90801236584
Index:EB1911_-_Volume_17.djvu 0.916525487986
Index:EB1911_-_Volume_18.djvu 0.914158557242
Index:EB1911_-_Volume_19.djvu 0.913233798569
Index:EB1911_-_Volume_20.djvu 0.91178648195
Index:EB1911_-_Volume_21.djvu 0.895079206979
Index:EB1911_-_Volume_22.djvu 0.91882986895
Index:EB1911_-_Volume_23.djvu 0.899058604672
Index:EB1911_-_Volume_24.djvu 0.907582789193
Index:EB1911_-_Volume_25.djvu 0.916146038521
Index:EB1911_-_Volume_26.djvu 0.912637074328
Index:EB1911_-_Volume_27.djvu 0.917310887107
Index:EB1911_-_Volume_28.djvu 0.909883291555
Index:EB1911_-_Volume_29.djvu [2]
  1. No OCR; see the Commons file.
  2. No OCR.