Wikisource:WikiProject DNB/Progress
This project page is a reference for general cleanup of the bot postings of DNB volumes in the Page: namespace.
- There is now a separate statistics page.
- Special:Indexes for Dictionary of National Biography gives a general snapshot of text creation and advancement.
- Djvu files for those who really want to know more.
Progress and troubleshooting table
| Volume | Index | % text[1] | Best scan[2] | Offset[3] | Glitches[4] | Comment[5] |
|---|---|---|---|---|---|---|
| 1 | index | 100 | [1] (poor) | 14 | ||
| 2 | index | 100 | [2] (good) | 12 | ||
| 3 | index | 100 | [3] (good) | n/a | Two pages missing after djvu.133. | |
| 4 | index | 100 | [4] (good) | 4 | ||
| 5 | index | 100 | [5] (good) | 8 | Listing. Better text needed. | |
| 6 | index | 100 | [6] (good) | 12 | Listing. | |
| 7 | index | 100 | [7] (good) | 6 | First five pages of actual text just missing. | Better text needed. |
| 8 | index | 100 | [8] (good) | 4 | ||
| 9 | index | 100 | [9] (poor) | 6 | ||
| 10 | index | 100 | [10] (poor) | 8 | ||
| 11 | index | 100 | [11] (good) | 6 | Better text needed. | |
| 12 | index | 100 | [12] (good) | 6 | Better text needed. | |
| 13 | index | 100 | [13] (good) | 6 | Better text needed. | |
| 14 | index | 100 | [14] (good) | 6 | Better text needed. | |
| 15 | index | 100 | [15] (good) | 6 | ||
| 16 | index | 100 | [16] (good) | 7 | ||
| 17 | index | 100 | [17] (OK) | 6 | ||
| 18 | index | 100 | [18] (poor) | 6 | ||
| 19 | index | 100 | [19] (good) | 6 | Better text needed. | |
| 20 | index | 24 | [20] (good) | n/a | There is duplication after this to 27; two pages missing after djvu.216; two pages missing after djvu.321; one missing after djvu.345. | Adding templates. |
| 21 | index | 7 | [21] (good) | n/a | One or two pages are missing after each of djvu.231, 343, 377, 382, 386, 389, 392, 395, 396, 407, 414, 417, 427. This is a weird image, apparently mixing two pages. | |
| 22 | index | 11 | [22] (good) | n/a | Several instances of duplicate pairs: djvu.52 and djvu.53 are duplicates, giving p. 44 and p. 45 again; same business with djvu.58 and 59 duplicating djvu.56 and 57, same with 62 and 63 as duplicates of 60 and 61, and also something obscure within 71-73 (very bad images). | |
| 23 | index | 15 | [23] (good) | n/a | A page missing after djvu/301 and another after 302; whole block of pages missing after 359; duplication around 388; page missing after 392. | Better text needed. |
| 24 | index | 99 | [24] (good), metadata says wrong volume | n/a | Two pages are missing after djvu/185. This repeats p. 244 rather than being p. 246; this and this are the wrong way round. | Better text needed. |
| 25 | index | 2 | [25] (poor) | n/a | The page djvu/25 is a duplicate of the one before. A page is omitted after djvu/86. Two pages are omitted after djvu/326. This page is corrupt as image and is not needed for the page sequence. | |
| 26 | index | 6 | [26] (poor) | n/a | P. 15 of text missing after Page:Dictionary of National Biography volume 26.djvu/20. Duplicate pair: djvu/137 and 138 duplicate 135 and 136. A page is omitted after djvu/292. | |
| 27 | index | 3 | [27] (poor) | n/a | Block of duplication from djvu/106 onwards repeats eight pages from djvu/98. | listing |
| 28 | index | 4 | [28] (good) | 6 | Two djvus are missing between this and this; also two between this and this; also a problem between 149 and 150. | |
| 29 | index | 100 | [29] (good, but breaks off p.279) | 6 | ||
| 30 | index | 99 | [30] (poor) | n/a | Two pages missing after djvu/33. | |
| 31 | index | 99 | [31] (poor) | 6 | Listing. | |
| 32 | index | 98 | [32] (good) | 4 | Text shifted relative to djvus. | |
| 33 | index | 9 | [33] (poor) | 6 | ||
| 34 | index | 100 | [34] (poor) | 6 | ||
| 35 | index | 99 | [35] (poor) | 6 | Text shifted relative to djvus. | |
| 36 | index | 21 | [36] (good) | 4 | Two pages missing after djvu/255. Page missing after djvu/392. | |
| 37 | index | 100 | [37] (good) | 14 | ||
| 38 | index and pagelist | 19 | [38] (good) | 6 | Text layer present; complete pages. | |
| 39 | index | 13 | [39] (good) | n/a | P.13 missing after this; p.17 missing after this; p.23 missing after this; p.27 missing after this; p.29 missing after this; p.31 missing after this; p. 35 after this; p.37 after this; page missing after djvu/50; also p. 63 missing after this; also p. 402 missing after this. | Listing. |
| 40 | index | 14 | [40] (good) | n/a | Duplicate pair: djvu/102 and 103 duplicate 100 and 101. | |
| 41 | index | 17 | [41] (good) | n/a | Duplicate pair: djvu/94 and 95 duplicate the two previous pages. | |
| 42 | index | 6 | [42] (poor) | 6 | ||
| 43 | index and pagelist | 100 | [43] (good) | 6 | Poor scans throughout the work, mark problematic. | Better text needed. Incomplete listing. |
| 44 | index | 15 | [44] (good) | n/a | Duplicate page around djvu/116. Page missing after 344, and another after 353. Scans after ~p.230 are horrid, and text and image are out of alignment. | Will need to be replaced. |
| 45 | index and pagelist | 9 | [45] (good) | 8 | OCR layer is best scan. Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 46 | index and pagelist | 12 | [46] (good) | 6 | Problematic scans identified in the latter half of the book; there may be more in the first half. | Text layer present, realigned latter pages; complete pages. All bios transferred to Page: and converted to <pages> |
| 47 | index and pagelist | 100 | [47] (good) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 48 | index and pagelist | 11 | [48] (good) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 49 | index and pagelist | 11 | [49] (good) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 50 | index and pagelist | 12 | [50] (good) | 12 | (new scan 20091125) | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> |
| 51 | index and pagelist | 6 | [51] (OK) | 8 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 52 | index and pagelist | 5 | [52] (poor) | n/a | Numbers of illegible pages identified in 2nd half, presumably similar in first half. | Available copy poor, rescue job performed. Will need later review when work done to determine further needs. All bios transferred to Page: and converted to <pages> |
| 53 | index and pagelist | 5 | [53] (OK) | 8 | duplication of pages, and illegible scans. Will determine whether worth replacing with alternate volume. | |
| 54 | index and pagelist | 7 | [54] (good) | 7 | some scans may be indistinct | |
| 55 | index and pagelist | 13 | [55] (good) | 6 | replace file to fix text misalignment. | |
| 56 | index and pagelist | 8 | [56] (good) found to be missing pages 165-172. | 6 | Generated a new djvu version for Commons that is a mix of both files. | Version based on Good version with inserts from other available source. All bios transferred to Page: and converted to <pages> |
| 57 | index and pagelist | 15 | [57] (good) | 6 | Existing text pages may need to be replaced (<20091107). All bios transferred to Page: and converted to <pages> | Scan replaced note. |
| 58 | index and pagelist | 9 | [58] (good) | 8 | pp. 270, 274, 278, 280, 376, 384, 396 mis-scanned. | Text layer present. Certain pages imported. All bios transferred to Page: and converted to <pages> Do we need to replace this volume? |
| 59 | index and pagelist | 2 | [59] (good) | 6 | Text layer present; complete pages. Listing. Text layer is not best scan. All bios transferred to Page: and converted to <pages> | |
| 60 | index and pagelist | 3 | [60] (good) | n/a | Two pages missing after this, now separately imported. | Text layer present, recovered missing pp. 18-19 All bios transferred to Page: and converted to <pages> |
| 61 | index and pagelist | 100 | [61] (good) | 6 | Templates added, complete pages. All bios transferred to Page: and converted to <pages> | |
| 62 | index and pagelist | 14 | [62] (OK) | 6 | Text layer present; complete pages. All bios transferred to Page: and converted to <pages> | |
| 63 | index and pagelist | 100 | [63] (good) | 24 | Better text needed; complete pages, though p. xviii needs rescan. All bios transferred to Page: and converted to <pages> | |
Notes
- ↑ Percentage text added. Apart from vol. 2, text completed generally ranges between 2% and 4%. Text verified and marked is currently negligible. One point is that the project page suggests ligatures should be added; the æ ligature is very common in the DNB. Another point that needs to be visited is the use of strike-through text to show a "diff" for a later edition.
- ↑ The code here is "good" for the Toronto scans, "poor" for the Google scans, and "OK" for the Hyderabad scans, which are of intermediate quality. This is only a rule of thumb: in some cases the Toronto scan for a page may be so corrupt that another scan works better.
- ↑ The offset is the djvu file number minus the page number in the volume. This ought to be consistent throughout the volume: if it is currently known not to be, the entry is "n/a" and the next column gives details.
- ↑ The bot-generated initial postings have imperfections, to be noted here.
- ↑ Points include: the best scan may not have been used by the bot ("better text needed"); progress in formatting at least all the author templates.
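
The offset note above can be illustrated with a short sketch. It assumes the offset is the djvu file number minus the printed page number, which matches the positive values in the table. The function names and the sample (djvu, page) pairs are hypothetical and chosen only for illustration; they are not part of any Wikisource tool.

```python
from typing import Iterable, Optional, Tuple


def djvu_number(page: int, offset: int) -> int:
    """Return the djvu file number holding a printed page, given a uniform offset."""
    return page + offset


def page_number(djvu: int, offset: int) -> int:
    """Return the printed page number shown on a djvu file, given a uniform offset."""
    return djvu - offset


def volume_offset(samples: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the single offset shared by all (djvu, page) samples,
    or None (the table's "n/a") when the samples disagree."""
    offsets = {djvu - page for djvu, page in samples}
    return offsets.pop() if len(offsets) == 1 else None


# Volume 1 is listed with offset 14, so printed p. 1 would sit at djvu 15.
print(djvu_number(1, 14))

# Hypothetical samples: a page missing from the scan shifts the offset
# partway through the volume, so no single value applies.
print(volume_offset([(20, 14), (21, 16)]))
```

A volume with missing or duplicated scans gives different offsets before and after the glitch, which is why such rows carry "n/a" and list the affected djvu numbers in the Glitches column instead.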