Wikisource:WikiProject DNB/Progress


This project page is a reference for the general cleanup of the bot postings of DNB volumes in the Page namespace.

Progress and troubleshooting table

Volume Index % text[1] Best scan[2] Offset[3] Glitches[4] Comment[5]
1 index 100 [1] (poor) 14
2 index 100 [2] (good) 12
3 index 100 [3] (good) 6 New djvu file, while readable throughout, contains isolated blurred text. Text and images aligned. Done. All text images present.
4 index 100 [4] (good) 4
5 index 100 [5] (good) 8 This 2011 text file includes 1904 Errata corrections, but needs index pages inserted into final six djvu pages of this file. Listing. Replace text. Done. Without terminating indices.
6 index 100 [6] (good) 12 Deleted bot-applied pages; text and image align, so undeletions are possible if the image scan is poor. Better quality version uploaded. Listing.
7 index 100 [7] (good) 6 30 DEC scan requires index pages, currently available on some pages from previous file. Text replaced. Done. Needs index pages.
8 index 100 [8] (good) 4
9 index 100 [9] (poor) 6
10 index 100 [10] (fair) 8 Needs new text. Current text is best available (30 Jan 11). Will keep looking… All 15 problematic pages are identified on the index page. The text has been refreshed, but will require an alternative source to proofread and validate pending locating better source. Text refreshed for all problematic pages.
11 index 100 [11] (good) 6 Text replaced with the good version identified. Most red pages deleted, though there may be some that were not meant to be deleted.
12 index 100 [12] (good) 6 Text images are reasonably good prior to page 368 (with a few that may have blurred sections). All replacement pages >367 are marked and have had text refreshed. Better text needed.
13 index 100 [13] (good) 6 Better text needed. Found. (1-31-11) Replace text. Done.
14 index 100 [14] (good) 6 Better text needed.
15 index 100 [15] (good) 6
16 index 100 [16] (good) 7
17 index 100 [17] (OK) 6
18 index 100 [18] (poor) 6
19 index 100 [19] (good) 6 Done to this recommended volume; keep same pagination.
20 index 24 [20] (good) n/a There is duplication after this to 27; two pages missing after djvu.216; two pages missing after djvu.321; one missing after djvu.345. Better text found (all pages). (1-31-11) Will upload once pages 95-97 are validated with existing images. Adding templates. Replaced djvu file. Done.
21 index 7 [21] (good) 6 One or two pages are missing after each of djvu.231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417, 427. This is a weird image, apparently mixing two pages. Better text found (all pages). (1-31-11) Text replaced. Done.
22 index 11 [22] (good) n/a The following five pages are torn and have missing characters adjacent to the column edge: pp. 51, 88-90, & 143. Text replaced. Done.
23 index 15 [23] (good) in place n/a Better text needed. Done. Awaiting validation.
24 index [24] (good), metadata says wrong volume 14 Done to best quality volume.
25 index 2 [25] (poor) n/a The page djvu/25 is a duplicate of the one before. A page is omitted after djvu/86. Page 104 of the original is missing, with Page:Dictionary of National Biography volume 25.djvu/109 being a repeat of p. 102. Two pages are omitted after djvu/326. This page is corrupt as image and is not needed for the page sequence. OCR is misaligned. Some pages are illegible. V 25 pdf this file has all pages but needs djvu conversion.
26 index 6 [26] (poor) n/a P. 15 of text missing after Page:Dictionary of National Biography volume 26.djvu/20. Duplicate pair: djvu/137 and 138 duplicate 135 and 136. P. 287 of text missing and muddled after djvu/174. A page is omitted after djvu/292. Index page numbers reflect these comments. Needs new text. Complete pdf file uploaded to AI 18 Jan.
27 index 3 [27] (poor) n/a Block of duplication from djvu/106 onwards repeats eight pages from djvu/98. listing
28 index 4 [28] (good) 6 All text images replaced. Text replaced. Done. Index pages awaiting validation.
29 index 100 [29] (good, but breaks off p.279) 6 PDF vol. 29: 452 pages and complete index. Vol. 29 djvu: a few pages missing or with blurred images.
30 index 99 [30] (poor) n/a Two pages missing after djvu/33.
31 index 99 [31] (poor) 6 Listing.
32 index 98 [32] (good) 4 Text shifted relative to djvus. Alt. djvu file:Vol 32 (following pages need replacements: 288, 320, 321, 424, 426, 434, 438).
33 index 9 [33] (poor) 6
34 index 100 [34] (poor) 6
35 index 99 [35] (poor) 6 Poor text, as good as we can get at this point in time.
36 index 21 [36] (good) 4 Two pages missing after djvu/255. Page missing after djvu/392.
37 index 100 [37] (good) 14
38 index and pagelist 19 [38] (good) 6 Text layer present; complete pages.
39 index 13 [39] (good) 6 P.275-6 missing characters where ripped, see this; pp. 301, 369, & 373 text images are crowded on one margin; missing characters. New text needed. Done.
40 index 14 [40] (good) n/a Duplicate pair: djvu/102 and 103 duplicate 100 and 101.
41 index 17 [41] (good) n/a Duplicate pair: djvu/94 and 95 duplicate the two previous pages. Better text found (all pages). (1-31-11) AI vol 41: all pages are present; text near one margin on pages 175-78 will be challenging. Replaced file. Done.
42 index 6 [42] (poor) 6
43 index and pagelist 100 [43] (good) 6 Poor scans throughout the work, mark problematic. Replaced text with new AI source file. Done.
44 index 15 [44] (good) 12 Twelve "workable" problematic pages remain and are identified. Replace text. Done.
45 index and pagelist 9 [45] (good) 8 OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
46 index and pagelist 12 [46] (good) 6 In later half of book identified Problematic scans, may be more in first half. Page 176 is quite blank. Pages 408 and 409 duplicate 406 and 407. Text layer present, realigned latter pages; complete pages. All bios transferred to Page: and converted to <pages>
47 index and pagelist 100 [47] (good) 6 Numerous pages marked as problematic. Candidate for replacement Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
48 index and pagelist 11 [48] (good) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
49 index and pagelist 11 [49] (good) 6 Numerous problematic pages; replace file. Updated source file. Done.
50 index and pagelist 12 [50] (good) 12 (new scan 20091125) Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
51 index and pagelist 6 [51] (OK) 8 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
52 index and pagelist 5 [52] (poor) n/a Numbers of illegible pages identified in 2nd half, presumably similar in first half. Available copy poor, rescue job performed. Will need later review when work done to determine further needs. All bios transferred to Page: and converted to <pages>
53 index and pagelist 5 [53] (OK) 8 duplication of pages, and illegible scans. Will determine whether worth replacing with alternate volume.
54 index and pagelist 7 [54] (good) 7 some scans may be indistinct
55 index and pagelist 13 [55] (good) 6 replace file to fix text misalignment.
56 index and pagelist 8 [56] (good; found to be missing pages 165-172) 6 Generated a new djvu version for Commons that is a mix of both files. Version based on Good version with inserts from other available source. All bios transferred to Page: and converted to <pages>
57 index and pagelist 15 [57] (good) 6 Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages>. Scan replaced note.
58 index and pagelist 9 [58] (good) 8 Better text found (all pages). (1-31-11) Uploaded new volume which was intact. Previously proofread text aligns with recently added text images. Volume replaced with complete version. Done. Completed page alignment.
59 index and pagelist 2 [59] (good) 6 Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages>
60 index and pagelist 3 [60] (good) n/a Two pages missing after this, now separately imported. Pages 88 and 89 duplicate pages 86 and 87; two djvu images that should be there instead are missing. Better text found (all pages). (1-31-11) Text layer present, recovered missing pp. 18-19 All bios transferred to Page: and converted to <pages>
61 index and pagelist 100 [61] (good) 6 Templates added, complete pages. All bios transferred to Page: and converted to <pages>
62 index and pagelist 14 [62] (OK) 6 Text layer present; complete pages. All bios transferred to Page: and converted to <pages>
63 index and pagelist 100 [63] (good) 24 Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages>

Notes

  1. Percentage text added. Apart from vol. 1, text completed generally ranges between 2% and 4%. Text verified and marked is currently negligible. One point is that the project page suggests ligatures should be added; the æ ligature is very common in the DNB. Another point that needs to be visited is the use of strike-through text to show a "diff" for a later edition.
  2. The code here is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is a rule of thumb only: in some cases the Toronto scan for a page may be so corrupt that another scan works better.
  3. The offset is the difference between the djvu file number and the page number in the volume. This ought to be consistent throughout the volume: if it is currently known not to be, the entry is "n/a" and the next column gives details.
  4. The bot-generated initial postings have imperfections, to be noted here.
  5. Points include: the best scan may not have been used by the bot ("better text needed"); progress in formatting at least all the author templates.
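
The offset arithmetic in note 3 can be sketched in a few lines of Python. This is only an illustration, not project tooling: the function names and the OFFSETS dictionary are hypothetical, the sign convention (djvu number = page number + offset) is an assumption based on the front matter preceding page 1, and the three sample offsets are copied from the table above.

```python
from typing import Optional

# Sample offsets copied from the table; None stands for an "n/a" entry.
# (Hypothetical structure for illustration only.)
OFFSETS = {1: 14, 2: 12, 20: None}


def page_to_djvu(page: int, offset: Optional[int]) -> int:
    """Return the djvu file number for a printed page number.

    Assumes offset = djvu number - page number, constant across the volume.
    """
    if offset is None:
        # "n/a" volumes have no consistent offset; see the Glitches column.
        raise ValueError("offset is n/a for this volume")
    return page + offset


def djvu_to_page(djvu: int, offset: Optional[int]) -> int:
    """Return the printed page number for a djvu file number."""
    if offset is None:
        raise ValueError("offset is n/a for this volume")
    return djvu - offset


if __name__ == "__main__":
    # Under the assumed convention, page 1 of volume 1 sits at djvu/15.
    print(page_to_djvu(1, OFFSETS[1]))
```

For a volume marked "n/a", both functions raise an error rather than guess, since missing or duplicated scans shift the offset partway through the file.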