Wikisource:WikiProject DNB/Progress
This project page is for reference on general cleanup of the pagespace bot postings of DNB volumes.
- There is now a separate statistics page.
- Special:Indexes for Dictionary of National Biography gives a general snapshot of text creation and advancement.
- Djvu files for those who really want to know more.
Progress and troubleshooting table
| Volume | Index | % text[1] | Best scan[2] | Offset[3] | Glitches[4] | Comment[5] |
|---|---|---|---|---|---|---|
| 1 | index | 100 | [1] (poor) | 14 | ||
| 2 | index | 100 | [2] (good) | 12 | ||
| 3 | index | 100 | [3] (good) | 6 | New djvu file, while readable throughout, contains isolated blurred text. | Text and images aligned. |
| 4 | index | 100 | [4] (good) | 4 | ||
| 5 | index | 100 | [5] (good) | 8 | This 2011 text file includes 1904 Errata corrections, but needs index pages inserted into final six djvu pages of this file. | Listing. Replace text. |
| 6 | index | 100 | [6] (good) | 12 | Bot-applied pages deleted; text and image align, so undeletion is possible if the image scan is poor. | Better quality version uploaded. Listing. |
| 7 | index | 100 | [7] (good) | 6 | 30 DEC scan requires index pages - currently available on some pages from previous file. | Text replaced. |
| 8 | index | 100 | [8] (good) | 4 | ||
| 9 | index | 100 | [9] (poor) | 6 | ||
| 10 | index | 100 | [10] (fair) | 8 | Needs new text. Current text is best available (30 Jan 11). Will keep looking… All 15 problematic pages are identified on the index page. The text has been refreshed, but proofreading and validation will require an alternative source, pending location of a better one. | Text refreshed for all problematic pages. |
| 11 | index | 100 | [11] (good) | 6 | Text replaced with the good version identified. Most red pages deleted, though may be some that were not meant to be deleted. | |
| 12 | index | 100 | [12] (good) | 6 | Text images are reasonably good prior to page 368 (with a few that may have blurred sections). All replacement pages >367 are marked and have had text refreshed. | Better text needed. |
| 13 | index | 100 | [13] (good) | 6 | Better text needed. Found. (1-31-11) | Replace text. |
| 14 | index | 100 | [14] (good) | 6 | Better text needed. | |
| 15 | index | 100 | [15] (good) | 6 | ||
| 16 | index | 100 | [16] (good) | 7 | ||
| 17 | index | 100 | [17] (OK) | 6 | ||
| 18 | index | 100 | [18] (poor) | 6 | ||
| 19 | index | 100 | [19] (good) | 6 | ||
| 20 | index | 24 | [20] (good) | n/a | There is duplication after this to 27; two pages missing after djvu.216; two pages missing after djvu.321; one missing after djvu.345. Better text found (all pages). (1-31-11) Will upload once pages 95-97 are validated with existing images. | Adding templates. Replaced djvu file. |
| 21 | index | 7 | [21] (good) | 6 | One or two pages are missing after each of djvu.231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417, 427. This is a weird image, apparently mixing two pages. Better text found (all pages). (1-31-11) | Text replaced. |
| 22 | index | 11 | [22] (good) | n/a | The following five pages are torn and have missing characters adjacent to the column edge: pp. 51, 88-90, & 143. | Text replaced. |
| 23 | index | 15 | [23] (good) in place | n/a | Better text needed. | |
| 24 | index | | [24] (good), metadata says wrong volume | 14 | | |
| 25 | index | 2 | [25] (poor) | n/a | The page djvu/25 is a duplicate of the one before. A page is omitted after djvu/86. Page 104 of the original is missing, with Page:Dictionary of National Biography volume 25.djvu/109 being a repeat of p. 102. Two pages are omitted after djvu/326. This page is corrupt as image and is not needed for the page sequence. OCR is misaligned. Some pages are illegible. | V 25 pdf: this file has all pages but needs djvu conversion. |
| 26 | index | 6 | [26] (poor) | n/a | P. 15 of text missing after Page:Dictionary of National Biography volume 26.djvu/20. Duplicate pair: djvu/137 and 138 duplicate 135 and 136. P. 287 of text missing and muddled after djvu/174. A page is omitted after djvu/292. Index page numbers reflect these comments. | Needs new text. Complete pdf file uploaded to AI 18 Jan. |
| 27 | index | 3 | [27] (poor) | n/a | Block of duplication from djvu/106 onwards repeats eight pages from djvu/98. | listing |
| 28 | index | 4 | [28] (good) | 6 | All text images replaced. | Text replaced. |
| 29 | index | 100 | [29] (good, but breaks off p.279) | 6 | PDF vol. 29: 452 pages and complete index. Vol. 29 djvu: a few pages missing or with blurred images. | |
| 30 | index | 99 | [30] (poor) | n/a | Two pages missing after djvu/33. | |
| 31 | index | 99 | [31] (poor) | 6 | Listing. | |
| 32 | index | 98 | [32] (good) | 4 | Text shifted relative to djvus. Alt. djvu file:Vol 32 (following pages need replacements: 288, 320, 321, 424, 426, 434, 438). | |
| 33 | index | 9 | [33] (poor) | 6 | ||
| 34 | index | 100 | [34] (poor) | 6 | ||
| 35 | index | 99 | [35] (poor) | 6 | Poor text, as good as we can get at this point in time. | |
| 36 | index | 21 | [36] (good) | 4 | Two pages missing after djvu/255. Page missing after djvu/392. | |
| 37 | index | 100 | [37] (good) | 14 | ||
| 38 | index and pagelist | 19 | [38] (good) | 6 | Text layer present; complete pages. | |
| 39 | index | 13 | [39] (good) | 6 | P.275-6 missing characters where ripped, see this; pp. 301, 369, & 373 text images are crowded on one margin; missing characters. | New text needed. |
| 40 | index | 14 | [40] (good) | n/a | Duplicate pair: djvu/102 and 103 duplicate 100 and 101. | |
| 41 | index | 17 | [41] (good) | n/a | Duplicate pair: djvu/94 and 95 duplicate the two previous pages. Better text found (all pages). (1-31-11) AI vol 41: all pages are present; text near one margin on pages 175-78 will be challenging. | Replaced File |
| 42 | index | 6 | [42] (poor) | 6 | ||
| 43 | index and pagelist | 100 | [43] (good) | 6 | Poor scans throughout the work, mark problematic. | Replaced text with new AI source file. |
| 44 | index | 15 | [44] (good) | 12 | 12 "workable" problematic pages remain and are identified. | Replace text. |
| 45 | index and pagelist | 9 | [45] (good) | 8 | OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 46 | index and pagelist | 12 | [46] (good) | 6 | Problematic scans identified in the latter half of the book; there may be more in the first half. Page 176 is quite blank. Pages 408 and 409 duplicate 406 and 407. | Text layer present, realigned latter pages; complete pages. All bios transferred to Page: and converted to <pages> |
| 47 | index and pagelist | 100 | [47] (good) | 6 | Numerous pages marked as problematic. Candidate for replacement | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> |
| 48 | index and pagelist | 11 | [48] (good) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 49 | index and pagelist | 11 | [49] (good) | 6 | Numerous problematic pages; replace file. | Updated source file. |
| 50 | index and pagelist | 12 | [50] (good) | 12 | (new scan 20091125) | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> |
| 51 | index and pagelist | 6 | [51] (OK) | 8 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 52 | index and pagelist | 5 | [52] (poor) | n/a | Numbers of illegible pages identified in 2nd half, presumably similar in first half. | Available copy poor, rescue job performed. Will need later review when work done to determine further needs. All bios transferred to Page: and converted to <pages> |
| 53 | index and pagelist | 5 | [53] (OK) | 8 | duplication of pages, and illegible scans. Will determine whether worth replacing with alternate volume. | |
| 54 | index and pagelist | 7 | [54] (good) | 7 | some scans may be indistinct | |
| 55 | index and pagelist | 13 | [55] (good) | 6 | replace file to fix text misalignment. | |
| 56 | index and pagelist | 8 | [56] (good); found to be missing pages 165-172. | 6 | Generated a new djvu version for Commons that is a mix of both files. | Version based on the good version, with inserts from the other available source. All bios transferred to Page: and converted to <pages> |
| 57 | index and pagelist | 15 | [57] (good) | 6 | Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages> | Scan replaced note. |
| 58 | index and pagelist | 9 | [58] (good) | 8 | Better text found (all pages). (1-31-11) Uploaded new volume which was intact. Previously proofread text aligns with recently added text images. | Volume replaced with complete version. |
| 59 | index and pagelist | 2 | [59] (good) | 6 | Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages> | |
| 60 | index and pagelist | 3 | [60] (good) | n/a | Two pages missing after this, now separately imported. Pages 88 and 89 duplicate pages 86 and 87; two djvu images that should be there instead are missing. Better text found (all pages). (1-31-11) | Text layer present, recovered missing pp. 18-19 All bios transferred to Page: and converted to <pages> |
| 61 | index and pagelist | 100 | [61] (good) | 6 | Templates added, complete pages. All bios transferred to Page: and converted to <pages> | |
| 62 | index and pagelist | 14 | [62] (OK) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 63 | index and pagelist | 100 | [63] (good) | 24 | Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages> | |
Notes
- [1] Percentage text added. Apart from vol. 1, text completed generally ranges between 2% and 4%. Text verified and marked is currently negligible. One point is that the project page suggests ligatures should be added; the æ ligature is very common in the DNB. Another point that needs to be revisited is the use of strike-through text to show a "diff" for a later edition.
- [2] The code here is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is a rule of thumb only: in some cases the Toronto scan for a page may be so corrupt that another scan works better.
- [3] The offset is the difference between the djvu file number and the page number in the volume. This ought to be consistent throughout the volume: if it is currently known not to be, the entry is "n/a" and the next column gives details.
- [4] The bot-generated initial postings have imperfections, to be noted here.
- [5] Points include: the best scan may not have been used by the bot ("better text needed"); progress in formatting at least all the author templates.
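The offset note above amounts to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not a project tool: the function names are hypothetical, and it assumes the offset is the djvu file number minus the printed page number, with an "n/a" entry passed as `None`.

```python
from typing import Optional


def djvu_to_page(djvu_number: int, offset: Optional[int]) -> Optional[int]:
    """Map a djvu file number to the printed page number.

    Assumes offset = djvu file number - printed page number.
    Returns None when the table lists the offset as "n/a",
    since no single offset holds across such a volume.
    """
    if offset is None:
        return None
    return djvu_number - offset


def page_to_djvu(page_number: int, offset: Optional[int]) -> Optional[int]:
    """Inverse mapping: printed page number to djvu file number."""
    if offset is None:
        return None
    return page_number + offset


if __name__ == "__main__":
    # Volume 1 is listed with offset 14 in the table.
    print(djvu_to_page(20, 14))  # prints 6
    print(page_to_djvu(1, 14))   # prints 15
```

For volumes marked "n/a" (missing or duplicated scans), the mapping has to be worked out page range by page range from the Glitches column instead.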