Wikisource:WikiProject DNB/Progress

This project page tracks the general cleanup of the bot-generated postings of DNB volumes in the Page: namespace.

Progress and troubleshooting table

Volume Index % text[1] Best scan[2] Offset[3] Glitches[4] Comment[5]
1 index 100 [1] (poor) 14
2 index 100 [2] (good) 12
3 index 100 [3] (good) n/a Two pages missing after djvu.133.
4 index 100 [4] (good) 4
5 index 100 [5] (good) 8 Listing. Better text needed.
6 index 100 [6] (good) 12 Listing.
7 index 100 [7] (good) 6 First five pages of actual text just missing. Better text needed.
8 index 100 [8] (good) 4
9 index 100 [9] (poor) 6
10 index 100 [10] (poor) 8
11 index 100 [11] (good) 6 Better text needed.
12 index 100 [12] (good) 6 Better text needed.
13 index 100 [13] (good) 6 Better text needed.
14 index 100 [14] (good) 6 Better text needed.
15 index 100 [15] (good) 6
16 index 100 [16] (good) 7
17 index 100 [17] (OK) 6
18 index 100 [18] (poor) 6
19 index 100 [19] (good) 6 Better text needed.
20 index 24 [20] (good) n/a Duplication after this page, up to 27; two pages missing after djvu/216; two pages missing after djvu/321; one missing after djvu/345. Adding templates.
21 index 7 [21] (good) n/a One or two pages are missing after each of djvu/231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417 and 427. This is a corrupt image, apparently mixing two pages.
22 index 11 [22] (good) n/a Several duplicate pairs: djvu/52 and djvu/53 repeat pp. 44 and 45; djvu/58 and 59 duplicate djvu/56 and 57; djvu/62 and 63 duplicate djvu/60 and 61; there is also an unidentified problem within djvu/71-73 (very bad images).
23 index 15 [23] (good) n/a A page is missing after djvu/301 and another after 302; a whole block of pages is missing after 359; duplication around 388; a page is missing after 392. Better text needed.
24 index 99 [24] (good; metadata gives the wrong volume) n/a Two pages are missing after djvu/185. This repeats p. 244 rather than being p. 246; this and this are the wrong way round. Better text needed.
25 index 2 [25] (poor) n/a The page djvu/25 is a duplicate of the one before. A page is omitted after djvu/86. Two pages are omitted after djvu/326. This page's image is corrupt, and the page is not needed for the page sequence.
26 index 6 [26] (poor) n/a P. 15 of text missing after Page:Dictionary of National Biography volume 26.djvu/20. Duplicate pair: djvu/137 and 138 duplicate 135 and 136. A page is omitted after djvu/292.
27 index 3 [27] (poor) n/a A block of duplication from djvu/106 onwards repeats eight pages from djvu/98. Listing.
28 index 4 [28] (good) 6 Two djvus are missing between this and this; also two between this and this; also a problem between 149 and 150.
29 index 100 [29] (good, but breaks off p.279) 6
30 index 99 [30] (poor) n/a Two pages missing after djvu/33.
31 index 99 [31] (poor) 6 Listing.
32 index 98 [32] (good) 4 Text shifted relative to djvus.
33 index 9 [33] (poor) 6
34 index 100 [34] (poor) 6
35 index 99 [35] (poor) 6 Text shifted relative to djvus.
36 index 21 [36] (good) 4 Two pages missing after djvu/255. Page missing after djvu/392.
37 index 100 [37] (good) 14
38 index and pagelist 19 [38] (good) 6 Text layer present; complete pages.
39 index 13 [39] (good) n/a P. 13 missing after this; p. 17 missing after this; p. 23 missing after this; p. 27 missing after this; p. 29 missing after this; p. 31 missing after this; p. 35 after this; p. 37 after this; page missing after djvu/50; also p. 63 missing after this; also p. 402 missing after this. Listing.
40 index 14 [40] (good) n/a Duplicate pair: djvu/102 and 103 duplicate 100 and 101.
41 index 17 [41] (good) n/a Duplicate pair: djvu/94 and 95 duplicate the two previous pages.
42 index 6 [42] (poor) 6
43 index and pagelist 100 [43] (good) 6 Poor scans throughout the work; mark them as problematic. Better text needed. Incomplete listing.
44 index 15 [44] (good) n/a Duplicate page around djvu/116. Page missing after 344, and another after 353. Scans after ~p. 230 are very poor, and text and image are out of alignment. Will need to be replaced.
45 index and pagelist 9 [45] (good) 8 OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
46 index and pagelist 12 [46] (good) 6 Problematic scans identified in the later half of the book; there may be more in the first half. Text layer present, latter pages realigned; complete pages. All bios transferred to Page: and converted to <pages>
47 index and pagelist 100 [47] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
48 index and pagelist 11 [48] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
49 index and pagelist 11 [49] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
50 index and pagelist 12 [50] (good) 12 (new scan 20091125) Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
51 index and pagelist 6 [51] (OK) 8 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
52 index and pagelist 5 [52] (poor) n/a A number of illegible pages identified in the second half, presumably similar in the first half. The available copy is poor; a rescue job was performed. Will need later review, once work is done, to determine further needs. All bios transferred to Page: and converted to <pages>
53 index and pagelist 5 [53] (OK) 8 Duplicated pages and illegible scans. Still to determine whether it is worth replacing with an alternate volume.
54 index and pagelist 7 [54] (good) 7 Some scans may be indistinct.
55 index and pagelist 13 [55] (good) 6 Replace file to fix text misalignment.
56 index and pagelist 8 [56] (good; found to be missing pages 165-172) 6 Generated a new djvu version for Commons that mixes both files: based on the good version, with inserts from the other available source. All bios transferred to Page: and converted to <pages>
57 index and pagelist 15 [57] (good) 6 Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages>. Scan replaced note.
58 index and pagelist 9 [58] (good) 8 pp. 270, 274, 278, 280, 376, 384, 396 mis-scanned. Text layer present. Certain pages imported. All bios transferred to Page: and converted to <pages> Do we need to replace this volume?
59 index and pagelist 2 [59] (good) 6 Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages>
60 index and pagelist 3 [60] (good) n/a Two pages missing after this, now separately imported. Text layer present; missing pp. 18-19 recovered. All bios transferred to Page: and converted to <pages>
61 index and pagelist 100 [61] (good) 6 Templates added, complete pages. All bios transferred to Page: and converted to <pages>
62 index and pagelist 14 [62] (OK) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
63 index and pagelist 100 [63] (good) 24 Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages>


Notes

  1. Percentage of text added. Apart from vol. 2, the text completed generally ranges between 2% and 4%. Text that has been verified and marked is currently negligible. One open point: the project page suggests that ligatures should be added, and the æ ligature is very common in the DNB. Another point to revisit is the use of strike-through text to show a "diff" for a later edition.
  2. The rating is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is only a rule of thumb: in some cases the Toronto scan of a page may be so corrupt that another scan works better.
  3. The offset is the djvu file number minus the printed page number in the volume. It ought to be constant throughout a volume; where it is known not to be, the entry is "n/a" and the next column gives details.
  4. Imperfections in the initial bot-generated postings are noted here.
  5. Points noted include: cases where the bot did not use the best scan ("Better text needed"), and progress in formatting, at minimum, all the author templates.
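As a worked illustration of note 3, here is a minimal Python sketch, assuming the offset means djvu file number minus printed page number (front matter pushes djvu numbers above page numbers). The function names and the `OFFSETS` mapping are hypothetical; the offset values are copied from the table rows for volumes 1 to 4, with `None` standing for "n/a". For volume 1 (offset 14), djvu/20 should therefore hold p. 6.

```python
from typing import Optional

# Hypothetical mapping: offsets copied from the table rows for vols 1-4.
# None means "n/a": the offset is known not to be constant in that volume.
OFFSETS: dict[int, Optional[int]] = {1: 14, 2: 12, 3: None, 4: 4}


def djvu_to_page(volume: int, djvu_number: int) -> Optional[int]:
    """Printed page number expected on djvu file `djvu_number` of `volume`."""
    offset = OFFSETS.get(volume)
    if offset is None:
        return None  # no single offset: see the Glitches column instead
    return djvu_number - offset


def page_to_djvu(volume: int, page_number: int) -> Optional[int]:
    """djvu file number expected to hold printed page `page_number`."""
    offset = OFFSETS.get(volume)
    if offset is None:
        return None
    return page_number + offset


print(djvu_to_page(1, 20))   # 6
print(page_to_djvu(4, 100))  # 104
print(djvu_to_page(3, 50))   # None
```

A volume marked "n/a" returns `None` rather than a guess, since missing or duplicated pages shift the mapping part-way through the file.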