From Wikisource
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 308 active users here.



This section can be used by any person to communicate Wikisource-related and relevant information; it is not restricted. Generally announcements won't have discussion, or it will be minimal, so if a discussion is warranted, add another section to Other with a link in the announcement to that section.

250,000 Validated Pages

We reached 250,000 validated pages on Monday 4 April, with this edit [1] by Akme. Beeswaxcandle (talk) 07:00, 5 April 2016 (UTC)

Hurrah! That's great. :) Now, on to the next ¼ million... — Sam Wilson ( TalkContribs ) … 00:21, 6 April 2016 (UTC)

2000 validated indices

Sometime in the past week we passed 2000 validated works (currently 2008), with a further 1,139 being proofread. Congrats to all. — billinghurst sDrewth 13:40, 9 May 2016 (UTC)

It looks like it was Index:Rebecca of Sunnybrook Farm (1903).djvu on 2 May. Beeswaxcandle (talk) 09:35, 10 May 2016 (UTC)
Interestingly two and a half months later we have 2,085 validated works and 1,135 proofread only works. 80 works validated in 2½ months is impressive, and ever so slightly we reduce the proofread only. — billinghurst sDrewth 16:06, 23 July 2016 (UTC)


Importing pre-1923 works from Canadian Wikilivres

Canadian Wikilivres:Category:1922 has so many pre-1923 works. As Special:Import does not allow uploading XML data, should we copy and paste the texts and the history from Canadian Wikilivres, or should we allow importing from Canadian Wikilivres? If American copyright term stays the same, many works published in 1923 will enter the public domain in 2019 in the USA, so we should prepare a system to move works from Canadian Wikilivres. Thanks.--Jusjih (talk) 19:17, 18 June 2016 (UTC)

Wait. As an Old Wikisource importer, oldwikisource:Special:Import allows me to upload XML data there. I just indirectly brought A Woman's History, but uploading XML data here requires becoming an importer. Where should any trusted users apply?--Jusjih (talk) 19:33, 18 June 2016 (UTC)
Are they scan-backed? Beeswaxcandle (talk) 06:44, 19 June 2016 (UTC)
Not always. German Wikisource requires scans with texts, but we and Canadian Wikilivres do not. If anything has a questionable source, we may consider deleting it for non-copyright reasons.--Jusjih (talk) 21:09, 20 June 2016 (UTC)
If we want the extended import facility we can reach a consensus to identify and appoint suitable people to be temporarily granted the right via a request to m:SRP. It shouldn't be overly problematic. We could also explore whether we can have a transwiki import via Special:Import, we can do it to internal wikis, and it may be possible with external wikis. That I don't know. — billinghurst sDrewth 06:06, 22 July 2016 (UTC)

Make the OCR gadget a default gadget

The OCR gadget is enormously useful and I didn't even know it existed until recently. It seems like it would be a good idea to enable this gadget by default so that more people will discover it. Any thoughts? Kaldari (talk) 00:05, 14 July 2016 (UTC)

Nice to hear from you every year or two. I'm not very technical, and I have no idea what that javascript does. I think you might want to take this proposal to MediaWiki for vetting by techies before bringing it here. Outlier59 (talk) 00:24, 14 July 2016 (UTC)
We had it as default for a while, then we turned it off, and I forget why exactly. I know that we backed off numbers of our gadgets for being the default. Many don't require it as a default, though maybe we can better mention it on our information to users. — billinghurst sDrewth 06:39, 14 July 2016 (UTC)
The reason the OCR gadget is not on by default is that it should only be used when there is no text-layer on an imported .djvu or .pdf file. It doesn't do as good a job as that done by the OCR at Internet Archive. Occasionally although there is a text-layer in a file it doesn't show up in the Page: namespace. If this happens to you, drop a note on this page and someone will have a look and in the majority of cases fix the file. Beeswaxcandle (talk) 06:52, 14 July 2016 (UTC)
Yeah, this seems sensible. I think I've only ever used the gadget about twice. Although, I've yet to delve into that part of things since the IA stopped making djvus; has that changed anything? — Sam Wilson ( TalkContribs ) … 07:58, 14 July 2016 (UTC)
I actually had a mild panic attack when I came back from a wiki break and couldn't find the button. It doesn't help that the default header text references the use of the button, but then the button is nowhere to be found. It doesn't help that the header when transcribing says "If no text layer is automatically made available, click the {OCR image} button on the toolbar to generate one." The word "button" is actually a link to your preferences, but that's not really obvious. At minimum, can we get that header text amended? Maybe to say something like, "Don't see the button? Update your preferences!" ...which has the added benefit of giving more visibility to the available gadgets. Mukkakukaku (talk) 17:15, 26 July 2016 (UTC)

Making <pages> more flexible?

ShakespeareFan00 recently approached me with a multi-page-table-transclusion issue which, upon further research, has led me to realise the current implementation of the <pages> tag has a number of built-in (or hard-coded if you prefer) aspects which are not always desirable. In particular there are two issues which concern me:

  1. the output of <pages> is always enclosed in <div>–</div>, even when such enclosure (as in the middle of a continuing table) is simply wrong. Fortunately in most cases the mediawiki parser subsequently corrects the "error" and the end-result still (mostly—but not always) "works."
  2. necessary embedded metadata used to later produce page number links back to the originating Page: name space is always passed through MediaWiki:Proofreadpage_pagenum_template (nice and controllable at least by administrators) but then the results of that pass are then further enclosed in <span>–</span> which is immutable and only works properly in a textual context. It certainly fails in tables—and it is one of the root causes for some of the stranger recommendations to use {{nop}} to join table pages.

I have certainly not started writing code for any of this but do not anticipate it will be all that difficult to modify the ProofreadPage extension to continue to behave identically as before, yet accept two–or more–new parameters, tentatively titled:

defaults to Proofreadpage_pagenum_template and controls the MediaWiki: name space (thus automatically fully protected) template-like item used to process page number metadata.
defaults to span and controls the HTML element(s) used to wrap the above output. Perhaps an opening and closing pair expressed in JSON might be better?
defaults to div and controls the HTML element(s) used to wrap the entire <pages> output. I envisage none might be a useful alternate option to deliver raw output where desirable (e.g. for tables.)

Most (perhaps all) changes required—if this proposal be accepted—would affect but a single function, render, of class PagesTagParser in ProofreadPage/includes/PagesTagParser.php.

To set expectation levels, creating and applying these changes would result in <pages> becoming usable for one "purpose" per invocation. It would still not be possible to arbitrarily mix (for example) textual and tabular-extraction transclusions on a single tag.

Any thoughts/objections? AuFCL (talk) 04:42, 18 August 2016 (UTC)
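
As a rough illustration of the proposal above, the wrapping behaviour could be sketched as follows. This is only a sketch in Python, not the ProofreadPage code (which is PHP), and every name in it (wrap, render_pages, pagenum_wrapper, output_wrapper) is hypothetical. It assumes the defaults stay as span and div, and that none means raw output.

```python
from typing import List, Optional, Tuple


def wrap(content: str, element: Optional[str]) -> str:
    """Wrap content in an HTML element; return it unchanged for None or 'none'."""
    if element is None or element == "none":
        return content
    return f"<{element}>{content}</{element}>"


def render_pages(
    pages: List[Tuple[str, str]],
    pagenum_wrapper: Optional[str] = "span",  # hypothetical parameter name
    output_wrapper: Optional[str] = "div",    # hypothetical parameter name
) -> str:
    """Join (page number markup, page text) pairs the way the proposal describes.

    With the defaults, the result matches the current hard-coded behaviour:
    each page marker in a span, and the whole output in a div.
    """
    parts = []
    for marker, text in pages:
        parts.append(wrap(marker, pagenum_wrapper) + text)
    return wrap("".join(parts), output_wrapper)
```

Called with the defaults this reproduces the existing enclosure; called with both wrappers set to "none" it returns the page text and markers with nothing around them, which is the case wanted for continuing tables.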

I can think of a 'gotcha' already. On some multi-page tables you have the situation where the first part of the table (split up due to transclusion limits) falls after 'text' content, and where you would need a normal starting pages enclosure, but not necessarily a standard end one. Conversely, you may also have a page where the end of the table precedes normal text. Here you may need a normal end enclosure, but not necessarily a starting one. ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
I disagree. You can always avoid this situation by introducing additional section markers and transcluding only like-with-like content with a single <pages> invocation. In this case it is nearly certain an intervening structural element (e.g. table closure followed by ordered list commencement) would be required before commencing another <pages> extraction. I hope you would never expect the system to handle ramming a block structure into a non-block context without consequent loss of fidelity? You cannot do that in HTML, let alone the wiki sub-set. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Noted, this would be a documentation issue, rather than a technical one then. ShakespeareFan00 (talk) 21:33, 18 August 2016 (UTC)
There's also the issue of "ribbons": table rows which are relevant in the Page: namespace, but which would not be when individual sections are pulled together to construct a Main namespace page from parts of the table (such as a row with Chapter and Page headings in small type in TOC pages). ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
As discussed elsewhere such things are never included and may be considered to be logically enclosed in implicit <noinclude> blocks. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Being able to tweak the enclosures might also be useful in respect of pulling parts of multi-page lists. (The currently stalled transcription of Ruffhead's statutes uses multi-page lists.) I appreciate the above relates to tables, but it's another use case that might need to be considered. (I'm not entirely happy with that work's approach though.) ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
See above. Don't mix unlike content blocks in a single operation. Worst case(s) I currently envisage:
  1. text: already handled by the current hard-coded implementation (<span> wrapping works for possibly 99.99% of cases including all inline, and fully-complete—opened and closed—block structures)
  2. table extractions: partially handled by the current implementation (<span> works well inside table cells; but fails spectacularly when inserted between </tr><tr>. The roulette wheel of wiki. Who wants to bet on black?)
  3. ordered lists extractions: the current implementation does not work at all here. Only complete lists may be transcluded without resort being made to multiple counter restarts.
  4. definition lists: probably doesn't work. Fortunately not used much outside of discussion pages. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Comment: Would it be feasible to have an opening/closing pages enclosure option? This follows on from my thoughts earlier. Having a nostart and noend might resolve one issue. ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
And finally, it would be nice with the changes to de-overload {{nop}}, so that it doesn't need to be used both for a paragraph break AND as a de-facto "following input is at the start of a new line so the parser understands it correctly" marker, which is what caused this discussion in the first place.

ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)

I think that this discussion belongs at mul:Wikisource talk:ProofreadPage as you are talking about something that affects all the Wikisources as it is universal code. Some of the issues that you discuss about page numbering are somewhat due to our implementation of that feature, so we would need to ensure that what is done is not impacting other WSes negatively.

When the discussion is moved there we can use the mediawiki message delivery system to put a message on all the WS Scriptoriums. — billinghurst sDrewth 13:42, 18 August 2016 (UTC)

Agreed. However let us try to thrash out the major issues before parading what might turn out to be a bad idea more widely? I do not pretend this suggestion is a panacea, nor do I want people to think it is a substitute of machine over natural intelligence. The current implementation of <pages> works well enough for maybe as many as 9,999 out of 10,000 cases. I am simply proposing an additional flexibility which will permit it to work for a few more, currently pathological, cases—so in this sense it is an improvement on an imperfect case; but with no pretence of perfection itself. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Okay with me, I just wanted to ensure that we set appropriate expectations at the outset. — billinghurst sDrewth 22:39, 18 August 2016 (UTC)
Is there a way to move an entire discussion thread cross wiki? (I've got no objections to an admin doing so). :) ShakespeareFan00 (talk) 13:46, 18 August 2016 (UTC)
COPY ... PASTE ... COMMENT ... wikilink with [[Special:Permalink/nnnnnn]]. I would wrap it in {{cquote}} if they have it and what you are transferring is small. Otherwise it is a summation of the preliminaries, and the outcomes, and a permalink as background reading of the detail. — billinghurst sDrewth 22:39, 18 August 2016 (UTC)

Additional Issue 1: Headers & footers

If there's going to be an update made to proofread page, can I make a request that a consideration is made concerning the possibility of reviewing how header and footer content is potentially handled as well?

Can someone explain how headers/footers are currently handled internally? I was wanting to suggest that there should be a way of separating PAGE-based includeonlys and noincludes from the conceptually different headers and footers. (An aside here is that under certain circumstances line feeds, nops etc. have to be inserted in the page content or footer to get the parser to see it as separated from the header or page content respectively. Table handling is one such area where this is most noticeable. I've also sometimes, when using some templates, had to force {{-}} clearance in a footer, even though the template was in the PAGE content. This is probably a much bigger issue to solve though :( ) ShakespeareFan00 (talk) 10:26, 18 August 2016 (UTC)

Umm just add a clear line to first part of the footer, no need for special code. The first component of footer needs to run into the preceding paragraph for some situations. — billinghurst sDrewth 13:46, 18 August 2016 (UTC)
<pages> always strips off the Page: space header and footer and throws them away (recall the "<noinclude>" hints which appear on all header/footer edit boxes?) so I don't really see this as an issue. And I certainly don't envisage altering this behaviour.
Billinghurst is quite right above. My only misgivings are that unprotected new lines are both not very visible, and prone to automated processes or "cleanup"/"tidy" script removals resulting in unnecessary angst. (Also don't forget the precedent set by Visual Editor always axing an unaccompanied </div> in a footer. At least you can say this for {{nop}}: you can see it!)
An unappreciated aspect of the wiki-syntax is that people forget that at a technical level the table closure symbol recognised by the parser is not really |} at all but in fact (new-line)|}! Similarly |+, |- etc. AuFCL (talk) 20:32, 18 August 2016 (UTC)
@AuFCL:, see the note about Tablecrux I left at your talk page :) ShakespeareFan00 (talk) 20:38, 18 August 2016 (UTC)
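
The point above about table markup only counting at the start of a line can be illustrated with a small sketch. It is a deliberately simplified stand-in for the real parser, written in Python with hypothetical names (TABLE_CLOSE, closes_table), and it assumes only the rule stated in the discussion: the closing token is recognised when |} begins a line.

```python
import re

# Assumption: a simplified model of the rule described above, not the
# MediaWiki parser itself. The table-close token only counts when "|}"
# sits at the very start of a line.
TABLE_CLOSE = re.compile(r"^\|\}", re.MULTILINE)


def closes_table(wikitext: str) -> bool:
    """Return True if the text contains a line-initial table closure."""
    return TABLE_CLOSE.search(wikitext) is not None


# A footer holding "|}" joined straight onto the page body, with no
# intervening new line, leaves the table open under this rule.
body = "{|\n| last cell on the page"
print(closes_table(body + "|}"))    # footer glued to the last cell
print(closes_table(body + "\n|}"))  # footer starts on its own line
```

This is why an invisible new line at the start of a footer matters for tables, and why losing it to an automated cleanup changes the rendered result.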

Bot approval requests


Preferably, we ask your HELP questions at Wikisource:Scriptorium/Help.

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Index:Index:Ministry to US Catholic LGBTQ Youth - A Call for More Openness and Affirmation.pdf

This should be at Index:Ministry to US Catholic LGBTQ Youth - A Call for More Openness and Affirmation.pdf ? ShakespeareFan00 (talk) 08:33, 20 August 2016 (UTC)

Sorted. Beeswaxcandle (talk) 09:54, 20 August 2016 (UTC)

Other discussions

What free software and web resources do you use when contributing to Wikisource?

I was considering getting back into Wikisource and I was wondering if you guys used any free software or web resources in the course of your work that you would recommend to others. Abyssal (talk) 16:50, 8 July 2016 (UTC)

This is what I use:
  • Firefox—a browser is all you really need to get started.
  • GIMP for extracting images from scans
  • Inkscape for extracting images from PDF files
  • AutoHotKey for works that contain many repetitions of a tedious string, and also for simple insertion of special characters
  • website for Greek characters, though I use AutoHotKey for that now too
  • and websites for searching for Unicode characters
  • Notepad++ for regex find-and-replace - also website for learning and testing regex
  • DjVuLibre for manipulating and viewing DjVu files
  • PDFsam - just started using this to manipulate PDF files, good for getting rid of the pesky Google scan page at the beginning of all their scans
  • AutoWikiBrowser for tedious bulk edits
Beleg Tâl (talk) 18:05, 8 July 2016 (UTC)
Thanks, those are really helpful! Do you know of any free offline OCRs that are any good? Abyssal (talk) 19:06, 8 July 2016 (UTC)
I use IrfanView (a free download for Windows) for image manipulation and conversion. For OCR I use TextGrabber on an iPad. Beeswaxcandle (talk) 23:11, 8 July 2016 (UTC)
For ocr-ed djvu, DjVuToy can be used. However, this application uses Microsoft Office Document Imaging (MODI) for doing ocr. If you don't have it, it can be added as described here. Hrishikes (talk) 02:22, 9 July 2016 (UTC)
Hey @Beleg Tâl:, when you say that Inkscape can be used to extract images from PDFs, can it be used to extract all the pages at once? I can only get it to open one page at a time. Abyssal (talk) 15:18, 11 July 2016 (UTC)
As far as I know it can only do one page at a time. —Beleg Tâl (talk) 15:21, 11 July 2016 (UTC)
@Abyssal: I use PDF24Creator for extracting all pages at a time. Hrishikes (talk) 04:32, 12 July 2016 (UTC)
@Hrishikes: Thanks. I've downloaded PDF24Creator. Abyssal (talk) 12:47, 14 July 2016 (UTC)
This is a nice topic because there are so many helpful programs out there that we may not be aware of. I've been creating some decent quality OCRed djvus myself lately since stopped creating them. For example this file was turned into this Index:Evolution and Natural Selection in the Light of the New Church.djvu OCRed djvu file with the programs below. It's not perfect but it's decent enough. Maybe we should create a help page where all these programs are mentioned and also some tips on how to use them.
  • downthemall which is a firefox addon download manager. This is useful when scans are available, but each page is an individual file. You can use batch descriptors to easily add all the files you want to download to the manager, and there is also a renaming mask which allows you to rename each file as you like before being saved.
  • scantailor This is an excellent program for processing scanned images. It allows for separating double page scans, cropping etc and is mostly automated.
  • gscan2pdf. This is Linux only, this is a most excellent program which was hard for me to find. It is a gui and can open most image formats including pdf and djvu. It can ocr the files and then it can bind the files and save them as pdf or djvu. Very excellent which has allowed me to create many djvus. Jpez (talk) 07:05, 10 July 2016 (UTC)
  • BUB: Book Uploader Bot. This tool supposedly will automatically upload books from various sites to You can then use ia-upload to download the book from to wikicommons. It works well for me for Google books; it didn't work at all for me for HathiTrust though. I haven't tried any of the other sites mentioned. I like it for Google books because it gets rid of the Google notice and adds all relevant data (author, publisher etc.) automatically. By the way, has anyone by chance been successful transferring a book from HathiTrust with this tool? If it could work with HathiTrust it would be a great help. Jpez (talk) 09:33, 31 July 2016 (UTC)

Seeing display difference between logged-in and not

Looking at The Production of Security while logged in gives a significant difference for me with not logged in (checked in Firefox and Chrome). I cannot see that I have anything in particular in my local common.css file or my global.css file that would have an impact. Do others see a difference, or is it just me? Thx. — billinghurst sDrewth 06:39, 17 July 2016 (UTC)

Two differences for me: when logged out, the "Upload file" option disappears from Tools section; and the "Create a book" option appears at the top of "Download/print" section (it is the last item when logged in). Hrishikes (talk) 09:45, 17 July 2016 (UTC)
The title page was {{center|{{xxxx-larger block|{{xx-larger block|The Production<br><b>{{larger|of Security}}</p>}}}}}} [[category:not transcluded]]. Very big on top of very big, with some odd html tagging. I tried cleaning it up. See how it looks now. Outlier59 (talk) 12:09, 17 July 2016 (UTC)
Sorry, that page wasn't transcluded (duh!). More very big on top of very big at Page:The Production of Security.pdf/4, so I modified it. Probably too big for browser or skin to handle correctly. Feel free to revert. Outlier59 (talk) 12:36, 17 July 2016 (UTC)

Hmm, I should have been more specific. On the Page:The Production of Security.pdf/4, I am seeing a large gap in the first para when logged out, and a tighter para spacing when logged in (irrespective of the changes prior or now). Noting that the addition of nested larger will just increase the %size of text and it shouldn't be problematic. It seems to be something with the line height / line spacing. — billinghurst sDrewth 14:41, 17 July 2016 (UTC)

For me, there is no difference, in Chrome. Hrishikes (talk) 17:25, 17 July 2016 (UTC)
I just tried changing my skin to monobook, and I'm seeing that tight spacing that you're seeing, Billinghurst. The default is vector. So it looks like it's a difference between the skins. Outlier59 (talk) 21:02, 17 July 2016 (UTC)
In Firefox, logged in, the main namespace font looks like Times Roman (serif) and the same text in the page namespace is Arial (sans serif). I am just saying. — Ineuw talk 07:54, 18 July 2016 (UTC)
Buried within mediawiki:Vector.css is the fragment:
{
	position: relative;
	font-size: 0.8750em;
	font-size: calc(1.00em * 0.8750);
	line-height: 1.6;
	z-index: 0;
}
Note the highlighted line, line-height: 1.6;. Monobook hold-out users do not get this line, as may be demonstrated by enclosing the transcluding line in <div style="line-height:normal;"> as I have done. Either this is your solution or recreate and repopulate mediawiki:Monobook.css with suitable content. Modern or Cologne Blue users will be experiencing similar issues. AuFCL (talk) 09:52, 18 July 2016 (UTC)

'Image missing' template floating to the top on transcribed pages?

I added the {{image missing}} template to two pages that I was proofreading. Then I checked in the regular namespace where this particular work is already transcribed, to see if it looked all right, and I found that the template had "floated" to the top of the page.

The work is here: Thiphanate-Methyl in Air (5606).

The template is present on these pages: 5, 6

This is what I see:


It doesn't appear related to browser or logged-in status. I'm logged in on Firefox and I see it, and logged out in Chrome and I still see it. And it's 99% probably not a cookie thing because this is a new machine and this is the first time I've logged into wikisource since I got it.

Has something changed with this template, or is this some weird new "expected behavior"? - Mukkakukaku (talk) 18:23, 18 July 2016 (UTC)

@Mukkakukaku: It is a known behaviour that has occurred with changes in the css/framing/extension (which I do not know), and it is a little unusual, but that is how it currently is working and it relates to Module:Message box (which it is out of my knowledge base). Once the images are added they will appear in line properly and the matter should resolve. — billinghurst sDrewth 02:24, 19 July 2016 (UTC)
Actually I don't think it has to do with the Lua module itself. I read through the code and it's pretty straightforward. I did notice that on a slow connection, the box actually renders in the correct place, and only after that does it move to the top of the page -- which makes me think that there's some JavaScript or something running later in the execution chain that is moving all elements of a particular "type" to the top.
But anyway, it's good to know that this is a known issue. It's just very confusing because the text on the template says that there's an image missing "at this place in the text" and yet the boxes are all hanging around at the top of the page above the header, and there's no indicator of anything "missing" at the given place in the text. -Mukkakukaku (talk) 13:13, 19 July 2016 (UTC)
Yes, and I believe it is also due to how our headers and footers work, and the containers that are set (/me waves hands to explain all that css gobbledygook black magic.) [Things were somewhat easier to decipher in some regards when all our css was all in the one file.] — billinghurst sDrewth 13:37, 19 July 2016 (UTC)
@Billinghurst, @Mukkakukaku: see: line 111 on MediaWiki:Common.js, Zdzislaw (talk) 21:57, 19 July 2016 (UTC)
Yep, that's it! Good eye, @Zdzislaw:. It's grabbing all the amboxes and shoving them up into the page header. The only reason I can think of that being a good idea is if we only had page-level amboxes and not section-level (or in-the-middle-of-the-text-level) amboxes. Nixing that line should fix the issue, but it will cause all amboxes to remain where they were initially placed. Depending on what the original intention of the coder was, a more nuanced solution may be needed (like maybe not using ambox? The comment says "envelope hatNotes & similar into main navigation header container". Only ambox is affected by this particular line of code.) Mukkakukaku (talk) 23:06, 19 July 2016 (UTC)
@George Orwell III: who was doing the heavy lifting here. I can guess that some of this is around things like {{no source}}, {{copyvio}}, {{sdelete}}, etc., however, that is only a guess and the reasoning was not discussed with the community. There is clear value on this being a community consensus. — billinghurst sDrewth 23:45, 19 July 2016 (UTC)

Tech News: 2016-29

20:18, 18 July 2016 (UTC)

The Last of the Tasmanians needs a map — Hathi Trust?

Hi. At Page:Last of the tasmanians.djvu/21 we appear to have a folded map (by design or by accident is unknown) which has not been properly scanned. I have checked the only other full version available at Google, and the map page is not shown at all. I am wondering whether someone with reasonable access to the Hathi Trust collection can check their collection to 1) see if the map is consistent in the same edition, and 2) can we get a version of the map separately to replace the rubbish version that we have. The page in our scan seems to follow directly after the list of illustrations. Thanks. — billinghurst sDrewth 02:18, 19 July 2016 (UTC)

@Billinghurst: Full map is available here. But the map is not mentioned in the list of illustrations. It is, however, a part of Daily life and origin of the Tasmanians by the same author and mentioned in the list of illustrations of that work. HathiTrust maps (when available) in all accessible versions of both books are defective. Hrishikes (talk) 04:36, 19 July 2016 (UTC)
Done. Image converted, reproportioned and uploaded as a png. Thanks for the pointer. — billinghurst sDrewth 21:19, 19 July 2016 (UTC)

Which tags are excluded in a page footer?

To resolve a page spanning text break in the main namespace, I tried to enclose text using <div></div>, placing the closing tag in the footer, but it was deleted when I previewed the page. I managed to resolve the problem using <span> in the same manner, but was wondering what other codes are not valid to be placed in the footer. This is the page where I used it with success. — Ineuw talk 17:16, 21 July 2016 (UTC)

It isn't just happening to you. I believe this is new behaviour and (thus far) only seems to affect </div> (my money is on this being some half-baked attempt to address phab:T138604 and no doubt over-enthusiastic system "trimming" is responsible. I forget the reference (related to phab:T133294?) but recently the internal database storage format (content model?) for Page: changed and this too could pertain.) My advice at this stage is to continue as if the stripped </div> is still present. AuFCL (talk) 22:09, 21 July 2016 (UTC)
Thanks @AuFCL:. You are right as usual. First off, yesterday's attempt to use <span> failed. Now, I did what you recommended and it is fine (especially in the main namespace, where it counts). Furthermore, the strange thing is that in the following page's header, it retained the opening <div style="line-height:130%;">. What a goulash. Knowing how much they love me at phabricator, should I file a bug report? — Ineuw talk 03:58, 22 July 2016 (UTC)
W.r.t. last, don't bother. This stuff is being driven purely by Visual/Editor considerations and any hint that (horrors!) unbalanced HTML should be catered for in any way at all will be met with universal disparagement. You have put up with me ranting on this topic before so I shall not rehash that bit. So the header is kept; what: you expected consistency or something? How unreasonable of you! AuFCL (talk) 04:06, 22 July 2016 (UTC)
I would suggest that template use may be more resilient to the new trimming. So you can always use {{div end}} or any of the other 'block' variations in footers to close an open div. I think one of the things that we are now also seeing with the html5 updates is that these raw tags that have been used out of scope are somewhat problematic to clean up manually, whereas where they are templated they do become easier to amend globally. — billinghurst sDrewth 06:00, 22 July 2016 (UTC)


Does anyone know why Index:A-Kentucky-Woman-Dec1892-NationalBulletin.jpg isn't showing a pagelist? --EncycloPetey (talk) 19:10, 21 July 2016 (UTC)

My fix attempt was not successful. --EncycloPetey (talk) 19:15, 21 July 2016 (UTC)

I changed [[File: to [[:File: to prevent it from displaying the images; it appears to work now. —Beleg Tâl (talk) 20:27, 21 July 2016 (UTC)
Neither name space references nor validation status correct. Fixed both. AuFCL (talk) 21:26, 21 July 2016 (UTC)
@EncycloPetey: I also rejoined pages at Why Democratic Women Want the Ballot. If the intent was to present the final transclusion as explicitly-separated pages this may not be what you wanted. AuFCL (talk) 22:30, 21 July 2016 (UTC)
My primary concern was that the individual proofread pages were not linked in the Index, and I couldn't think how to fix that. Thank you for correcting this. --EncycloPetey (talk) 22:35, 21 July 2016 (UTC)
Is the information at mul:Wikisource:ProofreadPage insufficient? — billinghurst sDrewth 23:59, 21 July 2016 (UTC)
Yes. --EncycloPetey (talk) 00:27, 22 July 2016 (UTC)
Perhaps selected aspects of Help:Index_pages#Using_individual_image_files ought to be combined with/linked to the help page Billinghurst references above? AuFCL (talk) 04:25, 22 July 2016 (UTC)
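
The [[File: to [[:File: change mentioned above is easy to apply by hand, but for an index with many image pages it could be scripted. The following is a minimal sketch in Python; the function name is hypothetical and it assumes the links are written with the exact prefix [[File: as in this discussion.

```python
import re

# Matches only embedding links; "[[:File:" already has the leading colon
# and is therefore left alone.
EMBEDDED_FILE = re.compile(r"\[\[File:")


def link_instead_of_embed(wikitext: str) -> str:
    """Turn [[File:...]] image embeds into plain [[:File:...]] links."""
    return EMBEDDED_FILE.sub("[[:File:", wikitext)


print(link_instead_of_embed("[[File:Example page 1.jpg|1]] [[:File:Example page 2.jpg|2]]"))
```

Running it a second time on its own output changes nothing, so it is safe to apply to a page list that has already been partly fixed.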

Index:Life and Select Literary Remains of Sam Houston of Texas.djvu

Index:Life and Select Literary Remains of Sam Houston of Texas.djvu is uploaded on Commons, and I created the Index here. My experience of this type is scant, and help pages aren't that helpful if the editor is unfamiliar with any terminology. I checked progress as Text Layer Requested, as a guess of what it needs. I have two issues with this:

  • There is no side-by-side, just the text on the left-hand side.
  • What is on the left-hand side has no formatting at all.

Please advise on how I might have erred, and how do I correct this. Exactly what is the index status supposed to be after I first create an index that already has the file on Commons? Maile66 (talk) 16:58, 22 July 2016 (UTC)

The "scan resolution in edit mode" was set to 0. I removed it, so now the page scans show up. I have no idea what that property is for, but I've noticed that whenever I try to use it, this happens. -Mukkakukaku (talk) 17:44, 22 July 2016 (UTC)
Thank you. That has resolved the issue. Maile66 (talk) 18:12, 22 July 2016 (UTC)
Normally there is no need to use the scan resolution so leave it empty so that the default width is used. On very high resolution scans there are some times we need to force it lower. — billinghurst sDrewth 00:05, 23 July 2016 (UTC)
  • Page headers on this. I'm thinking consistency is what is wanted. I started off with "larger" in the page headers. Then I noticed another editor who is gradually editing some pages without the "larger" in there. I removed that element from the previous pages I edited. Please let me know what the criteria are before I edit any other pages. Maile66 (talk) 23:55, 23 July 2016 (UTC)
On the talk page for the book, list formatting guidelines for the book. Show a link to a sample page (or two) for formatting. While you're in the process of working out the formatting, you might want to put an "under construction" tag on the talk page, to discourage others from wanton editing.
If someone comes along and edits without paying attention to your formatting guidelines or in-process notice, ask them to hold off on editing, because you're actively working on guidelines. If they mess up the text (as in diff -- initial M changed to L), either "undo" the edit or fix it. I suggest "undo" in that case, because the editor ADDED an error during "validation." Some editors like to jump in while other editors are working on a book. Sometimes this is helpful, sometimes not. If it's not helpful, tell the editor that it's not helpful. If it's helpful, send a thanks.
If you're not sure whether the header should be "larger" or something else, ping a couple of other users and ask them what they think. If you pinged me and asked me what I thought, I would suggest xx-larger with centered italics -- which you can set up on the Index file header. Is it REALLY xx-larger with centered italics? Beats me. But it's probably close enough. It's your call. So make the call and move on.
Try to keep discussions about this book on Index talk:Life and Select Literary Remains of Sam Houston of Texas.djvu.
Above all else, ENJOY what you're doing here! Outlier59 (talk) 02:15, 24 July 2016 (UTC)

Missing pages[edit]

  • Per this talk thread, I found pages missing from the djvu file as I was working on the Contents page. I checked the upload at Commons, and the pages are missing from there. There could possibly be more, but I'm not going to edit this Wikisource file anymore. That was the only djvu on this book that I found to upload. I believe this version has all the pages, because I've used it as a resource at Wikipedia. I only remember doing one or two other djvu uploads, and they were short and problem free, so I'm a bit inexperienced in general on uploads. Can anyone help or advise? Maile66 (talk) 21:47, 25 July 2016 (UTC)
You can directly upload lifeselectlitera00cran from the Internet Archive to Commons using IA-upload. There is a djvu file. Upload to Commons with the name "Life and Select Literary Remains of Sam Houston of Texas (1884)", then create a new Index file in Wikisource. You can probably quickly copy most of what you've already done to the new Index file, then you should be good to go. That's my thought, anyhow. :)
Hi @Outlier59:, that looks intriguing, but the link is dead. Would be nice to upload directly without having to download first, I'd like to know about that if there's an option! -Pete (talk) 01:40, 26 July 2016 (UTC)
I swear half the time I can't figure out this link formatting stuff. Truly aggravating. Anyhow, the Index file is now at Index:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu, if anyone wants it. Outlier59 (talk) 01:52, 26 July 2016 (UTC)
Sorry @Outlier59:, I should have specified -- I meant the toollabs link, not the text itself. -Pete (talk) 01:58, 26 July 2016 (UTC)
Not a problem, but please do me and Maile66 a favor and correct the link formatting in this discussion for toollabs. Outlier59 (talk) 02:41, 26 July 2016 (UTC)
Hmm, not sure what you're saying. Does the IA-upload link you placed above work for you? It doesn't for me. I get a page on Commons that says: "Bad title

The requested page title was invalid, empty, or an incorrectly linked inter-language or inter-wiki title. It may contain one or more characters that cannot be used in titles." -Pete (talk) 03:02, 26 July 2016 (UTC)

Maybe try toollabs:ia-upload instead? I can only assume the Commons indirection was a mistake. AuFCL (talk) 03:29, 26 July 2016 (UTC)

@Maile66:, sorry to take us off track. In case this didn't get suggested before, on Commons, I would advise uploading a good scan to the same name as the existing one. That is, when looking at the existing file's page on Commons, look for the "Upload a new version" link.

(On my other subtopic: @AuFCL: yes, the ia-upload link above works for me...oddly, I had trouble finding it when simply searching the tool labs myself before I asked. But it was probably an error on my end somehow.) -Pete (talk) 15:42, 27 July 2016 (UTC)

@Peteforsyth: Everything is cool right now. Outlier59 already uploaded a good file on Commons, and Indexed it at Index:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu, as he indicated above. It's a much more accurate version than what I had uploaded. I'm in the process of making sure all the pages are there. Then I'll ask for a Commons delete on the incomplete file I originally loaded. Should I then have the old Index deleted, or just leave it for a record? I'm concerned someone might waste time trying to proof the old one. Maile66 (talk)

First contribution to Wikisource[edit]

Hi everyone!

I'm making my first contribution to Wikisource and I would like to know if someone could check what I have done and give me some advice to improve future transcriptions. The work is Sixteen years of an artist's life in Morocco, Spain and the Canary Islands. I am asking for advice now because I have got as far as the first chapter.

Thanks in advance. I await your answers!

Regards, Ivanhercaz | Talk 02:09, 23 July 2016 (UTC)

I've replied on your talk page with some comments. Best, Mukkakukaku (talk) 12:50, 23 July 2016 (UTC)
Small formatting issue: Table of contents pages such as Page:Sixteen years of an artist's life in Morocco, Spain and the Canary Islands.djvu/9 and Page:Sixteen years of an artist's life in Morocco, Spain and the Canary Islands.djvu/10 won't connect properly since the content in them is inside of {{small}} and {{hi}}. Are these templates even necessary? —Justin (koavf)TCM 14:24, 26 July 2016 (UTC)
The contents can be made to look exactly like the original with {{dtpl}} if anyone's interested. Jpez (talk) 09:19, 31 July 2016 (UTC)

Tech News: 2016-30[edit]

19:54, 25 July 2016 (UTC)

pline woes[edit]

Judging by the troublesome results I'm getting on Page:Acharnians and two other plays (1909).djvu/171, I gather that setting a text-indent with div affects the location of the float:right in {{pline}}.

Is there a simple fix I'm not seeing? Is there a fix that can be put in place using {{pline}}, or at least without re-editing the entire work to use sidenotes?

I wouldn't have expected text-indent to alter the right margin, so perhaps something else is going on? --EncycloPetey (talk) 17:55, 26 July 2016 (UTC)

It looks like text-indent just shifts everything on that line. Maybe a style parameter could be added to {{pline}} and then an opposite text-indent could be used? Or maybe wrap the {{pline}} with <span style="text-indent:4em;"></span>? —Beleg Tâl (talk) 18:17, 26 July 2016 (UTC)
That's not what's happening here. The text-indent shifts only the first line of a paragraph, and these uses of {{pline}} aren't on the first line of the paragraph, unless the software is treating them like independent paragraphs. What's frustrating me most is that I've never seen {{pline}} do this before, and I've been doing similar formatting on many poetically formatted works for a while now. Notice that all the text on the page has text-indent, just to different values. --EncycloPetey (talk) 18:34, 26 July 2016 (UTC)
Wrapping the pline with a <span style="text-indent:0;">{{pline|400|r}}</span> does the trick. The problem is that the floated line number is inside the div with the text-indent applied to it, so it implicitly inherits that property (the "cascading" part of the "cascading style sheets.") It may be something we want to apply to the template in general instead of hacking on like I just did with line 400 as an example on page 17. --Mukkakukaku (talk) 18:55, 26 July 2016 (UTC)
Done. Thanks. I've adjusted the template accordingly, but only for right-floating line numbers. I'm hesitant to adjust anything for the left-aligned ones unless we have test cases to look at. --EncycloPetey (talk) 19:17, 26 July 2016 (UTC)
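Note on the explanation above: the behaviour Mukkakukaku describes comes from text-indent being an inherited CSS property, so a floated element with no value of its own picks up the value of its nearest ancestor that sets one. The following is only a toy model of that lookup, not how a browser or {{pline}} is actually implemented; the Element class, the computed function and the 4em value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified assumption: only text-indent is treated as inherited here.
INHERITED_PROPERTIES = {"text-indent"}


@dataclass
class Element:
    """Hypothetical stand-in for an HTML element with inline styles."""
    name: str
    style: dict = field(default_factory=dict)
    parent: Optional["Element"] = None


def computed(element, prop, initial="0"):
    """Return the element's own value for prop, or, for inherited
    properties, the value from the nearest ancestor that sets it."""
    node = element
    while node is not None:
        if prop in node.style:
            return node.style[prop]
        if prop not in INHERITED_PROPERTIES:
            break  # non-inherited properties are not looked up on ancestors
        node = node.parent
    return initial


# Before the fix: the floated line number sits directly inside the indented div.
div = Element("div", {"text-indent": "4em"})
line_number = Element("line number", {"float": "right"}, parent=div)

# After the fix: a span between the div and the float resets text-indent.
span = Element("span", {"text-indent": "0"}, parent=div)
fixed_line_number = Element("line number", {"float": "right"}, parent=span)

print(computed(line_number, "text-indent"))
print(computed(fixed_line_number, "text-indent"))
```

In this model the unwrapped line number resolves to the div's indent, while the wrapped one resolves to the span's reset value, which is the effect the span wrapper relies on.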

What is this tooltip trying to say?[edit]

So if you look at a transcribed work in the main namespace, at the top left is a bar with a series of colors which correspond to the statuses of the transcluded pages. Today I hovered my mouse over those colors for the first time and saw that I got a rather interesting tooltip:


Are those supposed to be percentages? If so, maybe they should have a '%' sign on them (and maybe fewer decimals?)

That's my best guess as to what they refer to, especially since the work in question only has 11 pages. Anything else is beyond me.

(For what it's worth, my mouse, which doesn't show up in the screenshot, is hovering over the strip of green and yellow. I'm using Firefox, but it should be reproducible in all browsers since it's a generated value and not styling related.) Mukkakukaku (talk) 19:45, 26 July 2016 (UTC)

Good catch! I don't think I've ever hovered over that thing. But yep, it certainly looks a bit odd. I guess file a bug with the proofreadpage extension? Sam Wilson 09:49, 27 July 2016 (UTC)
I'm also using Firefox, but I get no text at all when I hover over one of the colored bars in the Main namespace. I've tried a couple of different works, and I get no popup messages. --EncycloPetey (talk) 18:58, 27 July 2016 (UTC)
... weird. I have the tooltip in both Chrome and Firefox. Looking at the source, it's created using an HTML table, with the title attribute set (which renders the tooltip on hover.) It should be standard behavior across browsers, unless for whatever reason the server isn't giving you a page that has the title attribute set on that particular element. I see it both logged in and not. No idea why you don't see it. Mukkakukaku (talk) 20:08, 27 July 2016 (UTC)
I have it. So here's an additional issue. I went to a couple of other random Aircraft Accident Report pages. In all cases, the sum of the pages that were validated, proofread only, and not proofread was always 100 pages. In the two other cases I checked, one was 100/0/0 while the other was 0/100/0. In this case, we currently have fractions that are the repeating decimal versions of 81 9/11 and 18 2/11, which also add to 100, as far as it goes. (The picture above shows a different breakdown, but they still add to 100.) StevenJ81 (talk) 21:01, 27 July 2016 (UTC)
Quick note for StevenJ81 (probably of no interest to anybody else): It is no surprise these factors all add to 100% as that is precisely the way they are calculated in the first place. In essence the function prepareArticle within ProofreadPage.body.php gathers up the counts of pages in each category of proofreading state, and then calculates the percentage proportion of each divided by the total. So all you are really doing is (partially) reversing the calculation PHP has done earlier. AuFCL (talk) 21:33, 27 July 2016 (UTC)
Actually the numbers for Aircraft Accident Report: Eastern Air Lines Flight 663 add up to ~81.4 since I think it's ignoring the "problematic" pages. --Mukkakukaku (talk) 22:01, 27 July 2016 (UTC)
Well spotted. The raw "Problematic" figure is 18.518518518519% which pretty much rounds the total out to 100 again. I just noticed that there is a bug report already outstanding for this portion of the issue. AuFCL (talk) 23:09, 27 July 2016 (UTC)
@AuFCL: I figured that. But then the tooltip should have percent signs available, too. On its surface, the tooltip seems to be suggesting that there are 100 pages, and that x, y and z are the numbers of those pages in each category. StevenJ81 (talk) 17:01, 28 July 2016 (UTC)
I believe I have now figured out the history of this. An earlier bug report trying to improve/standardise the HTML inadvertently introduced this issue (phab:T76284—be warned you will need to read between the lines, as the topic meanders somewhat) yet kept on using the language-specific strings now appropriate only to Special:IndexPages (see there for the intended usage.) Later on the language/translation people introduced the tooltip—clearly unaware that the semantics of the reported numbers had altered—presumably since their efforts had commenced.
Mainly driven by the motivation of "making the simplest possible change", I have proposed at phab:T119740 (mentioned above) that the raw page counts be substituted for the percentage values, and the current tooltip layout be maintained. Kindly comment over there if you do not like this suggestion. AuFCL (talk) 05:58, 29 July 2016 (UTC)
@EncycloPetey, @Mukkakukaku, @Samwilson, @StevenJ81: (with apologies to anyone I missed) The new version (1.28.0-wmf.13) of mediawiki rolled into wikisource today, with the tooltip changing to display absolute page counts rather than percentages. This works for me but more importantly are any of you still witnessing anomalies? AuFCL (talk) 23:31, 3 August 2016 (UTC)
No change for me: I'm still not seeing any percentages or other values when I hover over the bar, just as before. --EncycloPetey (talk) 00:16, 4 August 2016 (UTC)
@EncycloPetey: At least it isn't doing anything unusual or bad from your perspective? As a complete aside I did a little searching (you use Firefox if I recall?) and uncovered a setting under about:config— which appears to control this exact behaviour. I happen to have it currently set to "true" and hover displays are enabled (I am assuming it is set to "false" on your set-up.) I am definitely not recommending you change it; merely advancing this as a possibility for the differing behaviour. AuFCL (talk) 00:39, 4 August 2016 (UTC)
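Note on the arithmetic discussed in this thread: AuFCL describes the tooltip values as each state's page count divided by the total, expressed as a percentage. The sketch below only illustrates that calculation; it is not the ProofreadPage code, the function name state_percentages is hypothetical, and the 9/2/0 split of an 11-page work is an assumption chosen to match the fractions StevenJ81 reported.

```python
from fractions import Fraction


def state_percentages(counts):
    """Map each proofreading state to its share of all pages,
    as an exact percentage (a Fraction rather than a float)."""
    total = sum(counts.values())
    if total == 0:
        return {state: Fraction(0) for state in counts}
    return {state: Fraction(100 * n, total) for state, n in counts.items()}


# Assumed split for an 11-page work: 9 pages in one state, 2 in another.
counts = {"validated": 9, "proofread": 2, "not proofread": 0}
shares = state_percentages(counts)

for state, share in shares.items():
    # float() gives the long repeating decimal of the kind seen in the tooltip
    print(state, float(share))

# The shares always sum to 100 because each is count / total * 100.
print(sum(shares.values()))
```

Under these assumptions the two non-zero shares are exactly 900/11 and 200/11, that is 81 9/11 and 18 2/11. A figure such as 18.518518518519 would likewise be consistent with, for example, 5 pages out of 27, although the thread does not state the actual counts.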

Two index pages for 'N Rays'?[edit]

Hello all. Does anyone know what the situation is with "N" Rays? The top level page includes from Index:N rays - Garcin.djvu but the three existing subpages include from Index:"N" Rays (Garcin).djvu. I suspect the first of these Index pages can be deleted, as there is more progress made on the second of them (and I'm assuming they're the same thing; they look like they are). Thought I'd raise it here in case anyone knows what's up. :-) Thanks! Sam Wilson 09:28, 27 July 2016 (UTC)

@Samwilson: I would be checking for differences in editions either by years, or place of publication and therefore potential spelling differences. To me there is evidence of duplication through lack of identification of an existing version as you indicated. — billinghurst sDrewth 14:48, 28 July 2016 (UTC)
Thanks @Billinghurst; I shall do some more thorough checking and either make a new edition, or raise one of the Indexes for deletion. Sam Wilson 01:12, 29 July 2016 (UTC)
I've deleted the lesser-quality of the two (after discussion), and fixed up the mainspace page to point to that. Sam Wilson 00:45, 4 August 2016 (UTC)

Page numbering[edit]

Can we please have a page that gives ONE agreed standard for page numbering in Index pagelists? I've had concerns expressed on my user talk page, that a pagelist I provided in good faith wasn't the best example for new contributors. Across wikisource page numbering seems to vary depending on the contributors (and scan uploaders). There needs to be ONE standard that is enforced universally. ShakespeareFan00 (talk) 10:40, 28 July 2016 (UTC)

I checked Help:Page numbers and although it mentions using a hyphen for blanks it gives vague advice about how to handle pages (such as titles, front and rear matter adverts etc.) which aren't in a "known" sequence. The advice on what gets labelled with what should be considerably expanded. (Also based on past experience a note about avoiding the use of period i.e. "." in page list items should be added.) ShakespeareFan00 (talk) 10:45, 28 July 2016 (UTC)
I thought the help page was pretty unambiguous: if the page has no numbering or other sequence indicator, "-" is usually used. That pretty much leaves it up to the editor's discretion. My rule of thumb generally is not to change the convention already being used, and to use a consistent numbering/labelling scheme when I'm the one setting the standard.
That being said, I'm sure we can propose some changes to the help page for clarity, along with some tips for the quirks of the pagelist widget. (Eg. Multiple words, symbols, etc.) -- Mukkakukaku (talk) 11:20, 28 July 2016 (UTC)
Here's an example of how I think we should label pages Index:The Story of the Iliad.djvu. For pages without numbering I use an en dash, I used to label Title, Contents etc. but I prefer to not anymore since I believe the page's number should be displayed instead. I think labeling contents, title etc can be helpful for transcribing purposes, but once finished I think it's best we use the actual page numbers, and blanks marked as blanks like the example I provided. I agree with ShakespeareFan, we should have one clear way of doing this so all works can be consistent with each other. I don't think this would be too difficult to do. Jpez (talk) 11:46, 28 July 2016 (UTC)
Page numbers wherever possible, as they reflect the work, and enable specific linking to anchored pages, and to the style that the work has used.

I use endashes as they are more clickable than (little) hyphens. I dislike the use of the labels [Image], etc. as I find them stating the bleeding obvious, and ultimately useless for anchors, and I will endash them. If others choose to use them, so be it, however when they become essays in themselves rather than neat labels, ugh! To the over-enthusiasm of some contributors to do index pages on works that others have uploaded, well, that is a conversation that has been had previously. We should all have awareness about boot-stomping through work that another person has initiated.

Re the desire for a binding rule for <pagelists>, no! We have always tried to express guidance and allow some reasoned variation. We should be specific about the purpose of page numbering and the outcomes that are achieved by properly doing this task and the navigation and referencing that it allows. — billinghurst sDrewth 14:42, 28 July 2016 (UTC)

I agree with this: guidance is better than binding rules. I, for example, will always use hyphens instead of dashes for empty pages, as they are easy to type and there is limited benefit in making an empty page link more clickable. I will also always label image plates with "Image" or "Img" or "Plate" to make it clear that they have no page number but also are not blank, and I dislike the use of dashes for this purpose. To each their own! So long as it works and is used in a way that corresponds with the purpose of the tool, there should be no problem nor any need for a one-method-fits-all approach. —Beleg Tâl (talk) 14:53, 28 July 2016 (UTC)
Although I have a set of preferences for numbering pages in an Index, other editors here do not agree with all of them. And in some cases, I have found that my own preferences are not necessarily the best solution for a particular work. There are myriad ways that publishers number pages, format content, and include inserted materials. Thus, we can offer guidance towards "best practices", but a "mandated standard" is less than desirable. --EncycloPetey (talk) 15:34, 28 July 2016 (UTC)
Further to the above, there are some general recommendations on Help:Index pages#Parameters in the section on the Pagelist tag. If it is felt that these should be linked to from Help:Page numbers, then do so. I find that having everything to do with filling out the Index: parameters together is useful for pointing new wikisourcerors to, rather than sending them to multiple help pages. Beeswaxcandle (talk) 20:01, 28 July 2016 (UTC)
feel free to renumber my indexes. i’m just happy when the page numbers are close. when the work is done the reader will not notice. Slowking4RAN's revenge 03:34, 29 July 2016 (UTC)

It might help to consolidate/merge Help:Beginner's guide to Index: files, Help:Index pages, and Help:Page numbers. Newbies can get lost in this maze. Outlier59 (talk) 01:25, 31 July 2016 (UTC)

The Beginner's Guide exists specifically to be a trim quick-and-dirty explanation for beginners. Merging everything with that would defeat that function. The other two items are seeking to accomplish different things, and I'm skeptical of merging the two for that reason. --EncycloPetey (talk) 00:18, 4 August 2016 (UTC)


Uploaded as a test page? It seems to be non-english in any event. ShakespeareFan00 (talk) 21:07, 29 July 2016 (UTC)

It appears to be a page of a Spanish translation (1956?) of Nicola Abbagnano (1901-1990)'s Storia della filosofia (1946–1950). I expect it's copyright encumbered. Prosody (talk) 00:19, 30 July 2016 (UTC)

Two new developments[edit]

In Bengali Wikisource, there are two new developments. One, in the main space, there is a bidirectional arrow beside every interwiki link on the left panel, if you click on the arrow, side-by-side view of the work in both wikisources can be seen. Two, a book image is visible beside the up-arrow (index file link) at the top of every transcluded page in the page namespace, clicking which takes one to the transcluded page of the main space. I think these two would be good for English Wikisource too. Hrishikes (talk) 08:07, 30 July 2016 (UTC)

The bidirectional arrow is part of enWS and has been for years, it is mw:Extension:DoubleWiki. — billinghurst sDrewth 12:14, 30 July 2016 (UTC)
Please see Gitanjali and its Bengali version, bn:গীতাঞ্জলি. The arrow is visible in bn WS and not en WS. At least it is that way for me. The arrow is also visible in French version of the work, but not in Telugu and Chinese. Hrishikes (talk) 12:27, 30 July 2016 (UTC)
Those links are only temporary. The cross-Wikisource links only exist because the Wikisource links at Wikidata have been incorrectly placed all on a single data item for the original work. Translations have different publication data, and are in a different language, and are therefore supposed to have separate data items from the original work. So the reason these links are rare at English Wikisource is that we have added the data items at Wikidata correctly. Once the links at Wikidata have been "corrected", you will no longer be able to get side-by-side texts. This is an unfortunate, but necessary, issue of using Wikidata to support interwiki links for Wikisource. --EncycloPetey (talk) 17:42, 30 July 2016 (UTC)
I had mentioned two developments, but the discussion is focussing on one of them only. Anyway, I don't think datalinking is the explanation here. If the Bengali and English version are shown side-by-side at Bengali Wikisource but the same two items are not shown that way at en WS, that means it cannot be explained by Wikidata only. Because of course all the five versions are linked, that's why side-by-side view is possible in bn and fr WS, but the same is not possible at en, te and zh. That means the facility is not enabled at those three wikisources. For the second development, see this page, I have chosen an image page, so that knowledge of Bengali script is not necessary. I don't know the phabricator link for this, but it seems useful. Hrishikes (talk) 18:37, 30 July 2016 (UTC)
I didn't say that datalinking was the explanation for the current absence of arrows at some Wikisource projects. What I pointed out was that, if the data items are set correctly at Wikidata, then there won't be any side-by-side links from any Wikisource because none of them will be wikilinked. So the availability of side-by-side pages from those projects that have it is only temporary. --EncycloPetey (talk) 19:00, 30 July 2016 (UTC)
For this tool, only the interwiki menu at the left pane matters. It can be arranged either through Wikidata or locally. See mul:कपालकुण्डला, both Bengali and English versions can be seen side-by-side, without it being linked at Wikidata. Hrishikes (talk) 04:20, 31 July 2016 (UTC)
While this is true, it is terribly inconvenient. It requires the sites to link works manually, and to do this individually for every single mainspace page (chapter) in the work, assuming of course that the two works are transcluded in the same format on both language sites. We really need a cleaner way to handle this. --EncycloPetey (talk) 04:57, 31 July 2016 (UTC)
It would seem that the doublewiki extension is functioning only when the links are provided by Wikidata, and when they are direct interwikis they fail. This is in contravention of the requirements for WS interwiki. A bug should be raised in phabricator: to get that reversed. At Wikidata, note that editions should not be directly linked on the book page; each edition should be linked from its own edition page. Wikidata has pretty well given up on us and linking, their design hasn't worked for our interwikis. — billinghurst sDrewth 05:16, 31 July 2016 (UTC)
This is not so. In wikisources where it is enabled, the tool works whether the items are linked directly or through Wikidata. See examples above. Hrishikes (talk) 05:41, 31 July 2016 (UTC)
Interesting. Anyway, it is enabled here and has been for years, you can see it in Special:Version. Maybe someone has jiggered it in our css, as it used to work fine. — billinghurst sDrewth 12:55, 31 July 2016 (UTC)
Inclusion in the version page does not indicate enabling. The url shortener is also mentioned there. Is short url present for any page here? Hrishikes (talk) 17:41, 31 July 2016 (UTC)
The addition of a link back to the NS0 work page is very useful. It'd be nice to have here. How is the link destination being determined? Does the script look at the link in the #ws-title element on the Index page, and use the link found there? Sam Wilson 03:30, 31 July 2016 (UTC)
I am not aware how it is arranged. You may ask Bodhisattwa, you recently met him in Italy, I think. However, if there is a sectioning in the page namespace and both sections have been transcluded, then two book icons are displayed at the top, like here. Hrishikes (talk) 04:04, 31 July 2016 (UTC)
Ah, so it is perhaps everything from 'what links here' from NS0 (e.g. from your example). That's a good idea! Sam Wilson 06:47, 31 July 2016 (UTC)
There is also another. See the bnWs Main Page, all items in the New Texts section have epub/mobi/pdf download options alongside them. Good things from other wikisources are being deployed there. Hrishikes (talk) 07:05, 31 July 2016 (UTC)
Please don't experiment on the main page. If you want to play, please do it in a sandbox. FWIW that version was very much OUCH. Way too busy. I am also unsure why you would need to offer to download all those works from the main page, it would very much detract from the traffic that you would get into your site, and instead you would just rack up numbers on WSEXPORT and nothing for your wiki. — billinghurst sDrewth 12:57, 31 July 2016 (UTC)
Nonetheless it was an interesting demonstration—however briefly presented. Looked very professional under Vector. Why OUCH? What did it do to Monobook, and could that be fixed as well? AuFCL (talk) 17:17, 31 July 2016 (UTC)
Professional? It looked like an overcrowded mess. --EncycloPetey (talk) 17:46, 31 July 2016 (UTC)
We have sandboxes for the purpose of demonstrating changes, and structural changes to the main page should always be tested and discussed prior to implementation. For those who want to play please use the existing Main page/sandbox, and feel free to build a sandbox and testcases for Template:New texts, and if you don't know how, please ask for assistance. The trialled version of the template, for anyone who wants to view it, is this revision. — billinghurst sDrewth 23:13, 31 July 2016 (UTC)
@Samwilson, @Hrishikes: there is a js script mul:MediaWiki:TranscludedIn.js that is used to show tabs, in namespace Page, pointing to the texts that transclude that page in NS0. Zdzislaw (talk) 20:28, 31 July 2016 (UTC)

Is there a point to WikiProjects?[edit]

I was thinking about creating a WikiProject, and then I thought: with so few consistently active contributors (myself included among the not-so-consistent), are WikiProjects even worth the effort?

(I was thinking about making a WikiProject about aviation, since I've been doing a lot of work on accident reports (CAB, NTSB, etc.). But I don't want to bother if the community doesn't think it's worth the effort to make the project. I'll happily toodle along and do it piecemeal.)

-- Mukkakukaku (talk) 17:19, 1 August 2016 (UTC)

WikiProjects would be much more effective here if we could coordinate activities with members of some of the very active WikiProjects on Wikipedia (such as w:Wikipedia:WikiProject Aviation). We could certainly pull some of their numbers over for a bit for help in preparing source documents relevant to the field. BD2412 T 18:08, 1 August 2016 (UTC)
True, this is a possibility. But it also has the potential for attracting lots of one-time contributors rather than consistent users. I could easily see posting a "hey come help us work on {topic} over at wikisource" sort of message on a project talk. I would then expect to see a brief spike of interest for up to a week, and then it would taper off. If we were lucky, we'd end up with one or two contributors who might hang around in the long run.
(Ironically enough, I came here the opposite way, as a part of w:Wikipedia:WikiProject Aviation's accidents task force and decided that this would be a great place to centrally archive accident reports for use in writing articles.)
That being said, I'd be happy to create an aviation project and then try recruiting on the English Wikipedia, as an experiment of source. I really just didn't want to create the WikiProject and have it turn into yet another bit of disused meta content. --Mukkakukaku (talk) 18:57, 1 August 2016 (UTC)
The majority of our WikiProjects have been focused on a single multi-volume work, so that consistency can be co-ordinated across all the volumes. The most successful one has been the DNB. Consistency of look isn't so important for a topic, so I wonder if topic based work would be better co-ordinated through a portal, rather than a wikiproject. Beeswaxcandle (talk) 08:53, 2 August 2016 (UTC)
As it says at Wikisource:WikiProject: "A WikiProject is a collection of pages devoted to co-ordinating long-term tasks on Wikisource." I think WikiProjects are a great tool for large works, or for long term projects, for example to transcribe many works on a single subject and have all indexes gathered in one place. I'm thinking this could take many many years to complete in some cases, if ever. Through the years the wikiproject will be there to have all the needed information gathered in one place for future and current users. Jpez (talk) 09:33, 2 August 2016 (UTC)
Is it generally accepted that Portals can sometimes have lists of works that are in progress? I've sometimes found that to be helpful, when trying to figure out what's missing and what needs to be worked on. An extreme example that I recently created would be Portal:Western_Australia#Works_relating_to_Noongar_language. It's a sort of portal and wikiproject in one. Sam Wilson 09:58, 2 August 2016 (UTC)
What I'm looking to set up is a collaborative space to coordinate work on aviation topics. These topics would include accident reports (which I've been working on so far), court cases/lawsuits, government regulations, aircraft certifications, research, news articles, and so forth. This collaborative space would allow people interested in aviation to identify, locate, and collaborate on documents of interest, or critically important documents that are missing from the archive.
The reason that I think a portal is not the right answer, is that portals are currently being organized via the Library of Congress classification system, and there's nowhere within that system for a single aviation topic. There's "naval aviation" under the naval science category, "aeronautics" and "astronautics" under technology, "airlines" and "air transportation" are under "social sciences", government laws and regulations are under their respective categories, the investigative/administrative agencies (FAA, NTSB, CAB, AAIB, ICAO, etc.) are under politics, the accidents themselves are probably under history -- or maybe under the agency that investigated them putting them in the politics hierarchy. (That, and portals are really underutilized and not very visible.)
This brought wikiprojects to mind, since this is what we traditionally used for this purpose on the English-language Wikipedia. But if that's not what they're used for here, or not how we'd like them to be used here, then I'm open to suggestions. I just don't think portals are the way to go in this situation, especially given how they're used today. Mukkakukaku (talk) 20:17, 2 August 2016 (UTC)

Tech News: 2016-31[edit]

21:48, 1 August 2016 (UTC)

Over 100 discussions in this list[edit]

If anyone else is having trouble working through this unwieldy 100+ list of topics, please discuss at Wikisource talk:Scriptorium. -- Outlier59 (talk) 01:18, 2 August 2016 (UTC)

Encountering delays from Wikisource when trying to edit[edit]

Anyone else seeing Wikisource response delays when trying to edit? I've been hitting the "submit" button multiple times to get a response for the past couple of days. Outlier59 (talk) 23:38, 3 August 2016 (UTC)

I haven't had any such problems, nor have I noticed anything unusual. --EncycloPetey (talk) 00:19, 4 August 2016 (UTC)
Apparently it was summer thunderstorms in my geographical area that messed up the ISP. Storms expected throughout August. Sorry to bother you about it. -- Outlier59 (talk) 01:06, 13 August 2016 (UTC)


This is nearly complete. If someone's prepared to do the last 3 pages of the catalogue at the back, it can be marked for validation. ShakespeareFan00 (talk) 18:45, 6 August 2016 (UTC)


Reading ShakespeareFan00's comment, I have a question. Help:Index pages#Parameters says that for done/validated status, "completion of any advertisement pages is optional"; for proofread/to be validated, it says "all text pages have been proofread at least once". If as SF00 implies, advertising pages must be proofread for the work to be proofread, this is inconsistent with validation and seems silly to me. Can someone clear up the confusion? BethNaught (talk) 18:53, 6 August 2016 (UTC)

If the adverts are optional, then this one can be progressed. :) Nice to get the catalogue as well, though. ShakespeareFan00 (talk) 19:19, 6 August 2016 (UTC)
Adverts have always been optional here. They are not a part of the work and are not required to be completed before marking an Index: "To be Validated" or "Done". Beeswaxcandle (talk) 23:42, 6 August 2016 (UTC)
And I have just completed the 'advert' entries, so you have a complete catalogue for Methuen c. 1907 (which, by virtue of its authors being unknown staff of the publishers, is also PD). Not that Wikisource collects publisher ephemera. ;) ShakespeareFan00 (talk)
Actually, it's not complete from a Wikisource perspective. There are no author or work links and the layout is unfriendly. Beeswaxcandle (talk) 23:52, 6 August 2016 (UTC)
@BethNaught: The overarching status of not proofread > proofread > validated relates to the work proper, "the author's work" not the edition, which is publisher's work. The argument will be buried in the archives, and I think about 2009. To keep track of works that have advertising and to know the status it generally falls to Category:not transcluded, Category:advertising not transcluded, and Category:fully transcluded. The front and end matter of editions are of historical and informational interest, so we have an interest in having them transcribed, and the better their page status, the more useful, and the ability to transclude. So SF00 misspoke with their statement; not a biggie in the scheme of things. — billinghurst sDrewth 02:05, 7 August 2016 (UTC)
You mean author/work links from the catalogue section? OK I'll concede on that. Also the layout used in that section (i.e plainlist) was the best I could think of. What would you suggest would be a better layout for it? ShakespeareFan00 (talk) 08:47, 7 August 2016 (UTC)

No transclusion[edit]

Index:Herodotus and the Empires of the East.djvu Added in the TOC but seeing no transclusion at all. Suggestions? ShakespeareFan00 (talk) 09:13, 7 August 2016 (UTC)

I used what the help said to use in respect of the pages tag. ShakespeareFan00 (talk) 09:13, 7 August 2016 (UTC)
Use direct transclusion from Page: space. As far as I know <pages> never works inside an Index: page—even if it did it would be re-entrantly building the Index: from itself, and mediawiki has never been too hot on recursion. AuFCL (talk) 10:14, 7 August 2016 (UTC)
Yup. Just re-checked: I thought I remembered seeing this somewhere. The ProofreadPage extension explicitly kills parser recognition of the <pages> tag inside Index: and Page: name spaces. If you really want the reference they are lines 42–48 of .../ProofreadPage/includes/Parser/PagesTagParser.php:
42 		// abort if the tag is on an index or a page page
43 		if (
44 			$pageTitle->inNamespace( $this->context->getIndexNamespaceId() ) ||
45 			$pageTitle->inNamespace( $this->context->getPageNamespaceId() )
46 		) {
47 			return '';
48 		}
AuFCL (talk) 10:25, 7 August 2016 (UTC)
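
As a rough illustration of the guard quoted above, here is a small Python model of the same check. It is not the extension's code: the function name render_pages_tag is hypothetical, and the namespace numbers 104 (Page:) and 106 (Index:) are assumed to be the ones configured on English Wikisource; other wikis may use different ids.

```python
# Hypothetical model of the quoted ProofreadPage guard; not the real API.
PAGE_NS = 104   # assumed id of the Page: namespace
INDEX_NS = 106  # assumed id of the Index: namespace


def render_pages_tag(namespace_id: int, rendered: str) -> str:
    """Return the tag's output, or '' when the tag sits on an Index: or Page: page."""
    # Same idea as the quoted check: abort if the tag is on an index or a page page.
    if namespace_id in (INDEX_NS, PAGE_NS):
        return ""
    return rendered


if __name__ == "__main__":
    html = "<div>transcluded text</div>"
    print(repr(render_pages_tag(0, html)))         # main namespace: output kept
    print(repr(render_pages_tag(INDEX_NS, html)))  # Index: page: empty string
```

This would be why a <pages> tag added to an Index: page shows nothing at all, rather than an error.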

Alert: Minor change made to "Mediawiki:PageNumbers.js"[edit]

Hi to all. A display issue with the left-hand-side page numbers was recently identified (by AuFCL): a question mark (?) in a work's title was corrupting the link from the main namespace to the Page: namespace. Hesperian has made a code change that seems to resolve this issue. We ask that all users report any problems that they have with page number links, and ask that a little testing be done where the Index:/Page: pages of a work have characters that are not alphanumeric. Thanks. — billinghurst sDrewth 04:14, 8 August 2016 (UTC)
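
The actual change to PageNumbers.js is not described here, so the following is only a sketch of the general class of problem, written in Python rather than JavaScript. A raw "?" in a URL path starts the query string, so a title containing one has to be percent-encoded before it is placed in a link. The helper name page_link and the sample title are made up for illustration.

```python
from urllib.parse import quote


def page_link(index_title: str, page_number: int) -> str:
    """Build a /wiki/ path to a Page: subpage, percent-encoding the title."""
    # Spaces become underscores, as in MediaWiki page URLs.
    title = f"Page:{index_title}/{page_number}".replace(" ", "_")
    # Keep ':' and '/' readable; '?' is encoded so it cannot start a query string.
    return "/wiki/" + quote(title, safe=":/")


if __name__ == "__main__":
    # Hypothetical title containing a question mark.
    print(page_link("Who is Jesus?.djvu", 5))
```

Titles with other non-alphanumeric characters (for example "&" or "#") are worth testing for the same reason.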

Index:Faoistin naoṁ-Ṗadraig (1906).djvu[edit]

Querying if given the multiple languages this might be better suited to mulWS than here? ShakespeareFan00 (talk) 10:06, 8 August 2016 (UTC)

Symbol support vote.svg Support Outlier59 (talk) 00:35, 10 August 2016 (UTC)
Symbol oppose vote.svg Oppose There's English there that's worth hosting on its own here. Why wouldn't we want a copy of the Confession of Saint Patrick here?--Prosfilaes (talk) 08:40, 18 August 2016 (UTC)
Symbol oppose vote.svg Oppose also. The three languages are easily separated and hosted on each own WS. —Beleg Tâl (talk) 13:50, 18 August 2016 (UTC)

Tech News: 2016-32[edit]

15:40, 8 August 2016 (UTC)

Should existing text-only (no scan) works be retained in addition to scan-backed works?[edit]

I've run into this situation a number of times and am curious as to what the consensus is. The situation is as follows:

  1. There exists on enWS, in the main namespace, the text of a work, added in the early(-ier) days of the site (2008-ish). The talk page reveals that the source of the text is Project Gutenberg or similar.
  2. There is no indication, either at Project Gutenberg or within the work, of the publisher, year, or so on.
  3. A scan is located of the same general work. It is not the exact same text as the already existent work, so it is not a candidate for Match & Split.

In this scenario, should the newly transcribed work replace the Project Gutenberg/no-backing-scan version, or be retained in addition to the other, with disambiguation provided via the {{versions}} template?

In the case of an exact version match, with identical text, I have in the past gone the Match-and-Split route. (Eg. with King Solomon's Mines.) But in this case I'm looking at now, though the other version currently on enWS purports to be from the same year as the scan I'm working on, I've found a number of textual and typographical differences in the texts. (Which is to say, though the version on enWS purports to be from 1905, my actual 1905 scan is different, so I find myself doubting the "provenance" of the existing version.)

--Mukkakukaku (talk) 18:36, 8 August 2016 (UTC)

  • @Mukkakukaku: I am not a regular on Wikisource so I do not know practice here. Based on my other wiki-experience, I would say to mark the text-only version as deprecated and let the scan-backed work take its place as the authoritative version. Since there is no space limit in Wikimedia projects, there is no reason to delete the text-only one until and unless there is some supporting evidence that it is an actual bad copy, and not a correct copy of some alternate version. I would not oppose the text-only version be deleted for a reason, but if no one offers a reason to delete it then keep it in limbo as an alternate version that is named in a way that makes it unlikely to be found. Categorize it as an alternate version, and link to it somehow in the documentation notes of the alternative version. The mistake to avoid is losing a transcription of an alternate version, until and unless someone makes an argument with evidence that it was totally a mistake. There are no space limits in wiki projects. Blue Rasberry (talk) 20:56, 8 August 2016 (UTC)

Hi Mukkakukaku,
  • If the text-only version is definitely the same text as the scan-backed version, then replace the text-only version with the scan-backed version.
  • If the text-only version is definitely a different text from the scan-backed version, then keep the text-only version in addition to the scan-backed version.
  • If you cannot figure out whether the text-only version differs from the scan-backed version, because no source has been given and there's no apparent textual evidence, then we replace the text-only version with the scan-backed version, on the grounds that there's no point keeping two copies of something if we can't even tell if they are the same or not.
Hesperian 00:51, 9 August 2016 (UTC)
To note that if it is a Gutenberg text, often the earlier versions were not specific on the edition, or at least it was not outwardly identified. I believe that @Prosfilaes: has access to backroom data and can identify some editions with that data. — billinghurst sDrewth 02:35, 9 August 2016 (UTC)
I think Hesperian's guidance is best. Blue Rasberry (talk) 14:24, 9 August 2016 (UTC)

book upload - public domain reprint - redacting newer content[edit]

When public domain and copyrighted text are mixed in a book which will be uploaded to Commons, how should copyrighted text be addressed? Is blacking it out a recommended option?

(This is a cross-post from Commons:Commons:Village_pump/Copyright#book_upload_-_public_domain_reprint_-_redacting_newer_content)

My question is actually for a case for Hindi Wikisource but I speak English and wanted to post here.

There is a famous poetry text from India from about year 1300. The text is in the public domain. On paper it might be 100 pages. It has been reprinted numerous times. There is a reprint from 2015 of the original text without translation. The reprint begins with copyright information and a preface, as is common with reprints of old texts. Following that, the old public domain text is printed verbatim, except that there are footnotes explaining the meaning of the many archaic terms. The book ends with indexing.

It is proposed that the public domain old text be uploaded to Commons then imported to Wikisource. The challenge to address is the procedure for removing copyrighted additions of text to the newly reprinted material. Is anyone aware of any example of a precedent for this?

One way to do this might be to scan the entire book, including both copyrighted and public domain parts. At this point, all contemporary additions to the original text might be blacked out, so that there is accounting for all pages in the reprint but the copyrighted parts are obviously removed. Another option could be to avoid putting the scanned pages in Commons at all, and instead, do the transcription of text from the book to a new digital file which serves as the master in Commons and Wikisource. So far as I understand, Commons and Wikisource prefer to have source documents whenever possible, including scans of documents which are transcribed for use in Wikisource.

Can anyone say whether they have or have not seen any such instance of this sort of book in Commons? Does anyone have an idea or preference for how uploading a book of this sort should be done? Is anyone aware of any book in Wikisource with redacted copyrighted portions? Blue Rasberry (talk) 20:50, 8 August 2016 (UTC)

Uh... You should type it into a digital file and only put the public domain contents there. Wetitpig0 (talk) 10:50, 12 August 2016 (UTC)


Whatamidoing (WMF) (talk) 18:03, 9 August 2016 (UTC)

Copyright problems which may arise from secondary sources.[edit]

Sometimes, the source text of Wikisource is from a secondary source (i.e. a source from other sources)

If the secondary source from which the text in Wikisource is copied infringes the copyright of other sources, and falsely declares the work to be free work, then will we be prosecuted for infringing copyright?

Also, if the copyright status of the source of the secondary source is unclear, while the secondary source declares it to be free work, should we copy it?


Wetitpig0 (talk) 10:44, 12 August 2016 (UTC)

If there's a work that we want to host, and it is clearly a free work, then we can host it here regardless of where you copy it from, though the usual method of transcluding scans of free books is hugely preferred to copy-pasting from other websites that don't list their sources. If you are unsure of the copyright status, post it at WS:Copyright discussions and the community can evaluate it on a case-by-case basis. —Beleg Tâl (talk) 13:04, 12 August 2016 (UTC)
i have never seen someone sued under this fact pattern, but i have seen DMCA takedowns of FoP German sculptures, so i would say that is a higher probability. if the secondary source says something, you should be able to make a determination at LOC copyright for orphan works, or translated works. we have outsourced the copyright paranoia to commons. Slowking4RAN's revenge 20:33, 13 August 2016 (UTC)
@Slowking4:, perhaps we should "outsource..." as you suggest, but we haven't done so. I'll leave your characterization aside, because I get what you're saying. If we want to address this issue, we should simply establish a clear policy on what kinds of things we permit under fair use, and what we don't. There is a clear framework for doing so: meta:Licensing_policy_FAQ_draft#Unfree_content_not_under_an_.27exemption_doctrine_policy.27
As it is, though, we host copyrighted material on English Wikisource, in clear violation of Wikimedia Foundation policy. -Pete (talk) 20:57, 13 August 2016 (UTC)
The onus is on us to be sure that everything we host at enWS is copyright-free—whether that be because the copyright has expired or because the work was published without copyright. We should not depend on other (non-MediaWiki) sites to have done the work for us. They get it wrong often enough for us to be cautious. @Peteforsyth:, please either point us to the copyrighted material we are hosting or post at WS:COPYVIO so that we can deal with it. Anything that is in clear violation will be removed immediately. If it is unclear we can look at it and discuss. Beeswaxcandle (talk) 00:12, 14 August 2016 (UTC)
@Beeswaxcandle:, my mistake -- and I regret making a claim that's both inaccurate and provocative. I misremembered the outcome of some books I have transcribed, including one with a few pages like this one: Page:A Basic Guide to Open Educational Resources.pdf/78 I am of the opinion that we should fully host works like this, which are freely licensed on the whole, but which contain a few non-free images (ironically, in this case, to illustrate the difference between free licensing and copyright in an effort to advocate free licensing). But that would require establishing an Exemption Doctrine Policy for English Wikisource, as linked above. -Pete (talk) 00:44, 14 August 2016 (UTC)
you are changing the subject. the new editor, was struggling with the mein kampf translation copyright, but your case of commons image deletions, proves my point. if you want to make a WS edp, go for it. Slowking4RAN's revenge 11:44, 14 August 2016 (UTC)
Sure. I misunderstood one thing, and misremembered another. I will gladly concede that I have added nothing of value to this particular thread! -Pete (talk) 01:32, 15 August 2016 (UTC)
oh no, if we get fair use on WS that will be valuable. the risk of suits is so small only a commons admin could calculate it, and the usefulness of images outweighs the downstream reuse restrictions. (and Gutenberg australia has jumped the gun on the english translation of mein kampf, when the translator died in 1946, see you on January 1). Slowking4RAN's revenge 02:33, 15 August 2016 (UTC)
For what it's worth, here's a handy link to my previous suggestion of an EDP: Wikisource:Scriptorium/Archives/2014-05#Dealing_with_non-free_images_in_transcriptions_of_freely_licensed_works -Pete (talk) 02:58, 15 August 2016 (UTC)
and i still support as much fair use as the community will allow. but i didn’t see much interest. Slowking4RAN's revenge 03:02, 18 August 2016 (UTC)
The translator died in 1946 which is before 1955 (cf. w:Copyright law of Australia), so what's the problem?--Prosfilaes (talk) 07:53, 15 August 2016 (UTC)
well the translator is Irish and work published in London in 1939, did they reprint it in Australia? maybe we should upload in commons under the 50 rubric and see what happens? Slowking4RAN's revenge 02:59, 18 August 2016 (UTC)
It doesn't matter whether they reprinted it in Australia; Australians are bound by the laws of Australia. The Wikimedia Foundation is bound by the law of the US, and Commons is bound by the additional rules the community created. Gutenberg Australia has every right to upload it to their servers, even when it's not legal for the WMF to host it on theirs.--Prosfilaes (talk) 03:17, 18 August 2016 (UTC)
i stand corrected about the 50, i’m confused about the not being retroactive. however for the US it is 70 so see you in January. Slowking4RAN's revenge 00:09, 19 August 2016 (UTC)
In the US, it's complicated, but for works published 1923-1978 (copyrighted&renewed or URAA-restored) it's 95 years from publication. So works published in 1939 will be out of copyright in the US in 2035.--Prosfilaes (talk) 20:21, 19 August 2016 (UTC)
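
The arithmetic behind that date can be checked with a short Python sketch. It assumes the 95-year publication-based term mentioned above, and that protection runs to the end of the calendar year so that a work becomes free on 1 January of the following year. The function name is illustrative only, and the rule applies only to works in the 1923-1978 group described above.

```python
def us_public_domain_year(publication_year: int, term_years: int = 95) -> int:
    """First calendar year in which the work is out of US copyright."""
    # Protection lasts through 31 December of (publication year + term),
    # so the work is free from 1 January of the year after that.
    return publication_year + term_years + 1


if __name__ == "__main__":
    print(us_public_domain_year(1939))  # the 1939 publication discussed above
    print(us_public_domain_year(1923))  # the earliest works still under this term
```

For a 1939 publication this gives 2035, and for 1923 it gives 2019.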
if an australian wants to upload a local copy here, what do you say? sorry go to Gutenberg? ready for an EDP yet? Slowking4RAN's revenge 22:42, 19 August 2016 (UTC)
Yes, go host it on an Australian site or Wikilivres. We don't have an option; the WMF is bound by US law. An EDP is irrelevant; at best it could let us host government reports and other PD material that makes use of fair use of still copyrighted material, not straight out copy copyrighted works.--Prosfilaes (talk) 09:30, 20 August 2016 (UTC)
i would go for a "fair use" of the lesser term, but i take it there is no consensus. derivative first use by governments, does not seem much of a distinction to me. the library of congress has many copyrighted works online. Slowking4RAN's revenge 16:07, 20 August 2016 (UTC)
That's not fair use; that's just illegal. What the LoC can and does do may not have much relation to what we can do. But there are things like NTSB accident reports wherein the copyrighted material used is likely fair use in context.--Prosfilaes (talk) 09:49, 21 August 2016 (UTC)
no - scholarly use, with a fair use downstream restriction is legal. we stand like the LOC (or internet archive) as a library for scholarly use. your fair use minimization has little basis in law. it is an ideology, apart from the clear law of the 4 factor test. we could very easily put a NC on a work, but we choose not to do so for ideological reasons.
i take it you would restore the images here [34] care to finalize an EDP. Slowking4RAN's revenge 12:31, 21 August 2016 (UTC)
What's your source for "scholarly use, with a fair use downstream restriction"? The 4 factor test is pretty clear here; this is a non-transformative use that copies the entire work and replaces the original work in commercial use. We lose pretty solidly on all points. This opinion on a lawsuit versus HathiTrust is quite lengthy, going into details like who had access to the servers and how the backup tapes were encrypted, if all HathiTrust needed to say was "we're a library".--Prosfilaes (talk) 11:40, 22 August 2016 (UTC)
when we upload as fair use, that is a NC restricting commercial reuse downstream. we can limit downstream use if we choose. do you really want to side with author’s guild? my impression is Hathi Trust won in the end. w:Authors Guild, Inc. v. HathiTrust the court found their use to be fair use even if full copy and non-transformative. the fact that there is already a PD australia version online, means the commercial harm is small for a book that has an e-book for sale for $.99.[35] btw, there are multiple copies of both translations at internet archive, which have not been taken down. in effect we are only cleaning up the OCR of the copies there. in effect the internet makes a "PD of the lesser term": electronic works anywhere are available everywhere for pennies. the main reason for us to transcribe is to make available for wikipedia zero.Slowking4RAN's revenge 23:02, 22 August 2016 (UTC)
We, as in the volunteers at Wikisource, can not limit downstream use if we choose. Wikimedia does not permit NC restrictions on material without narrowly carved fair use exceptions.
I linked the opinion in Authors Guild v. HathiTrust; the court found their use to be fair use because it was transformative. The Internet frequently makes electronic copies of works available everywhere before they're officially released; that's not an argument for anything. If you want to clean up the OCR of copies available under Canadian law (life+50), Wikilivres is a perfectly legal option.--Prosfilaes (talk) 17:36, 23 August 2016 (UTC)
we volunteers can certainly adopt an EDP that allows fair use and CC-by-NC. i have 30 times your edits here - where are the 29 other people you will recruit, when i decide to take a vacation over at  ?? Slowking4RAN's revenge 02:14, 27 August 2016 (UTC)

IA upload tool is making djvu[edit]

Internet Archive has stopped making djvu. So IA upload tool is now converting the pdf file to djvu while transferring it to Commons. This is early stage yet, currently some djvu files are coming out as blank files. Bugs have been filed with Phabricator and Github, and hopefully the matter will be resolved soon. This is for general information. Hrishikes (talk) 11:32, 15 August 2016 (UTC)

Very big thanks for such good news and to all the contributors who build such tools! --Zyephyrus (talk) 01:29, 16 August 2016 (UTC)
Yes, thanks for the info. I have wondered about the pros and cons of PDF vs. DjVu for some time; it's my understanding that DjVu is better at compression, and is an open format (though there are some patents involved). With that in mind, I'm surprised and sad to learn that Internet Archive has stopped supporting the format. I found the announcement with their reasoning. I wonder if the issues they were having are related to the ones WM is now experiencing. Do you have links for the Phabricator or Github bugs? -Pete (talk) 02:38, 16 August 2016 (UTC) and Hrishikes (talk) 05:31, 16 August 2016 (UTC)
Don't forget that in the PDF -> DjVu conversion, some scans will need to be 'flattened' first, owing to more recent PDF format versions allowing for layers. In some scans which are ex-Google, there's been some clever stuff with portions of the scans being split up onto different layers (presumably as a copyright trap, even on nominally PD works (sigh)). I've encountered this with some IA DjVus which seem to have been downconverted from Google Books PDFs.

Another issue I've found with some PDFs is misaligned scan/content box outlines, meaning that a straightforward page extraction clips pages in the wrong place.

These are things to bear in mind when writing the converter. ShakespeareFan00 (talk) 10:10, 16 August 2016 (UTC)

is it user:nemo bis and user:tpt who did it? we should really send them an honorarium. or wikisource t-shirt. Slowking4RAN's revenge 00:18, 19 August 2016 (UTC)

Tech News: 2016-33[edit]

19:37, 15 August 2016 (UTC)

Braces formatting help, please[edit]

Please see the top area needing braces Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/567. I cannot find any help pages or examples to tell me how to get the braces there, while also getting No. 1982 to the right of the braces. There is a similar, but somewhat different, situation on Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/680. Can anyone guide me, please? Maile66 (talk) 21:39, 15 August 2016 (UTC)

I'm not sure why the whole thing is wrapped in a blockquote. But Help:Templates talks about how to use the {{brace}} template -- did you look at that? Mukkakukaku (talk) 22:28, 15 August 2016 (UTC)
Well, it's wrapped up in a blockquote because it's a quote within a speech Houston was giving. However, I started with the Help page you have linked, because there are perhaps 20-30 pages in this book that I successfully coded. It's just this particular type, where there is something centered to the right of the braces, that has thrown me. I've actually been looking at it since yesterday, and I still can't figure out how to arrive at what needs to be done. The other one I linked has two braced texts side by side, one on the left and one on the right. I can't understand how to do that, either. Maile66 (talk) 22:38, 15 August 2016 (UTC)
Looks like User:AuFCL offered a solution for Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/567 on the page. Outlier59 (talk) 23:05, 15 August 2016 (UTC)
I did the two pages in entirely different styles. Choose whichever (or neither as you see fit!) you like best. AuFCL (talk) 23:08, 15 August 2016 (UTC)
User:AuFCL Wow! You were fast! Thank you so much. Maile66 (talk) 23:26, 15 August 2016 (UTC)

Disambiguation and Wikidata items — a conundrum with no perfect solution[edit]

Something that the community may wish to consider.

An item at Wikidata has one link per wiki to an item, and this restriction applies to items that are disambiguation pages. As such when we link a disambiguation page to a Wikidata item, we are having to make a determination whether this will be a main namespace disambig, an author namespace disambig, or another disambig, maybe in Portal: ns.

Example. We have Author:William Hutton and William Hutton both of which are disambiguation pages, and at Wikidata the disambiguation item is d:Q16213077. At the moment we link to the main ns page.

Which of these do we think is the preferred page to link? I prefer that we linked the Author: ns (disambiguation) pages, and main reason is that we are linking people to people (or at least people analogues), whereas the main ns. is a work about a person, rather than the person themself. Plus the main ns page can be a title about a fiction work, and then we have difficulties of separating fact and fiction.

Of course, we could challenge the concept of disambiguation pages in separate namespaces which has some merits, though would also challenge the concepts of namespace separations. Anyway, thoughts would be appreciated. — billinghurst sDrewth 07:10, 18 August 2016 (UTC)

I'm inclined to think this issue highlights a design flaw on our own part. A disambiguation page should disambiguate all of the relevant meanings of a search term, and a search term should have not more than one disambiguation page. We shouldn't have separate disambiguation pages for works, authors... portals?... etc. Pages like Author:William Hutton shouldn't exist. Disambiguation pages should exist only in the main namespace, and all disambiguation pages in other namespaces should be merged/moved to main. Hesperian 07:49, 18 August 2016 (UTC)
@Library Guy, @Charles Matthews: worthwhile waving this under your noses as you have experience in the place and people space. — billinghurst sDrewth 10:02, 18 August 2016 (UTC)
I generally agree with User:Hesperian. Well, qualifying that, I would use a cross-namespace redirect of Author:William Hutton to William Hutton, rather than deleting it. Charles Matthews (talk) 10:13, 18 August 2016 (UTC)
... which would require a change of policy as these currently fall into speedy deletion space. That said that we have been leaving such for a period before having them as dated soft redirects. — billinghurst sDrewth 13:34, 18 August 2016 (UTC)
I agree with User:Hesperian. I would rather not have the cross-namespace redirect, and author pages should correspond to actual authors, or redirects to such. I have thought it useful to have redirects from variations on an author's name to that author's page, but sometimes the name used on a work may be used by more than one of our authors. When the latter is the case, then the redirect should no longer be used, and a direct link from the work to the author page should be used, preserving the name used on the work as a label for the link. Library Guy (talk) 16:22, 18 August 2016 (UTC)
I disagree with the above. I think both disambiguation pages are appropriate as they serve different purposes. Different authors are disambiguated in author space; articles about those authors (or encyclopedia articles) with the same/similar titles should be disambiguated in mainspace. So both disambiguation pages should (appropriately) exist, as they serve similar yet different purposes. And as billinghurst originally suggested, a people-to-authorspace redirect is appropriate.
Unless there's some nuance to this discussion that I'm missing. --Mukkakukaku (talk) 17:40, 18 August 2016 (UTC)

Encoding problem in archive[edit]

Hi enWS, when I dug into the archive I saw that in this discussion the original wikilink de:Seite:Ludwig Bechstein - Thüringer Sagenbuch - Erster Band.pdf/19 is broken due to the broken German umlaut. I guess that this is a general problem. Regards, ---Aschroet (talk) 08:09, 18 August 2016 (UTC)
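
How the archived link got broken is not stated. One common cause of damaged umlauts, shown here purely as an assumption, is UTF-8 text being decoded as Latin-1 somewhere along the way. The Python sketch below reproduces that failure on part of the title and reverses it; the function names are illustrative.

```python
def garble(text: str) -> str:
    """Simulate UTF-8 bytes being wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the wrong decoding; only valid if exactly that mistake happened once."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    title = "Thüringer Sagenbuch"
    broken = garble(title)
    print(broken)          # the umlaut turns into two stray characters
    print(repair(broken))  # round-trips back to the original title
```

If the archive pages were damaged in this way, the same round trip could in principle be applied by a bot, but that would need checking against the actual broken text first.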

Classics Illustrated[edit]

Did any of these get renewed? And if not does have any scans? ShakespeareFan00 (talk) 19:47, 19 August 2016 (UTC)

Gilberton Company, seems to be renewed [46] Frawley has 3 renewed [47]. Elliot Publishing Co. will have to look at print copy. Slowking4RAN's revenge 02:06, 27 August 2016 (UTC)

Proposal: replace {{translation redirect}} with substituted {{dated soft redirect}}[edit]

We have had the translation redirect template in place for a couple of years, and I still don't find it a very functional template. It currently sits as a permanent cross-namespace redirect that just says "moved" and has a generic link. I propose that we do as we do elsewhere with moves and replace these with a substituted dated soft redirect. This will allow the TalBot process to go about its clean-up business and remove the links. It also gives a clearer link to what the page is, and that is far better in the Google cache than the current config. — billinghurst sDrewth 11:09, 20 August 2016 (UTC)

Image required for On the Vatican Library of Sixtus IV

Hi. Just checking my previous work as I continue on my wikidata data completion trek. I have a poor scan on a page at Page:On the Vatican Library of Sixtus IV.djvu/59 and I am wondering whether anyone can see an available scan at or at HathiTrust for that page. Thanks to anyone who looks.

As a note while I have attention, do remember that for proofread or validated texts you can add that transcription status to the link by clicking on the grey keyhole-like image when editing the link and choosing the corresponding status (not proofread/proofread/validated); it is still manual at this time. <shrug> — billinghurst sDrewth 00:21, 22 August 2016 (UTC)

Could not find a full-page image online. I put in an ILL request for a scan with my local library; hopefully a copy is available somewhere. It may take some time. Londonjackbooks (talk) 01:02, 22 August 2016 (UTC)
I have absolutely no idea where they originally found it but I am pretty sure this is a copy of the missing image c/o Project Gutenberg. Technically this came out of another John Willis Clark work: it is "Figure 98." from "The Care of Books" (project 26378).
On further examination it looks like this might be the "rawest" page scan (in colour no less!) AuFCL (talk) 07:14, 22 August 2016 (UTC)
As an aside there is something screwy revealed by Author:John Willis Clark. According to IA "The Care of Books" was published in 1909 but the title page reads 1901 and the end of the Preface is dated September 23rd, 1901. Manuscript on a dusty shelf a long time? (Maybe the floor plan of Cortile del Pappagallo is more original there than in "On the Vatican Library of Sixtus IV" after all?) AuFCL (talk) 08:37, 22 August 2016 (UTC)
Thanks for the image links. The "Care of Books" may have just had later editions or republications, with 1909 being the edition scanned. Or equally possible is that IA has made a mistake with their dates. — billinghurst sDrewth 10:26, 22 August 2016 (UTC)
LOC says 1901 edition also reprinted in 1973 by folcroft [48]. IA metadata use with caution. multiple editions sloshing around. here’s an OCLC master electronic copy link [49] Slowking4RAN's revenge 22:33, 22 August 2016 (UTC)
I never trust IA metadata. I was looking at a volume yesterday for which they had the wrong title, wrong author, wrong date, wrong publisher, etc. The fact had been commented on by a reviewer, but there doesn't seem to be a simple means for correcting errors like we can do here. --EncycloPetey (talk) 20:48, 23 August 2016 (UTC)

Tech News: 2016-34

21:17, 22 August 2016 (UTC)

Proofread of the month

I've returned after a long hiatus and was looking at the proofread of the month to ease back into things, and I noticed the project page still lists the project for July. I'm not sure how to update this or what the project is for August. Can anyone help? Thank you :) Marjoleinkl (talk) 11:31, 23 August 2016 (UTC)

That is because Index:The Fauna of British India, including Ceylon and Burma (Birds Vol 1).djvu, which was selected for July, was unfinished and a new selection was made for August, Index:The Mythology of the Aryan Nations.djvu. For the future, we should be careful (as a site) to not have two works active for POTM again. - Tannertsf (talk) 14:03, 23 August 2016 (UTC)
Thanks for the update, I’ll see what I can do to help :) Marjoleinkl (talk) 14:06, 23 August 2016 (UTC)
We were actually quite successful in the first half of the year in knocking out two works per month for POTM. It just happens that The Fauna of British India has more complicated formatting and image issues than we have normally been addressing. BD2412 T 14:28, 23 August 2016 (UTC)
Yeah I guess we were. I hate to see works done super fast though because what if someone wanted to do the book themselves? - Tannertsf (talk) 14:39, 23 August 2016 (UTC)
The goal of the project is to get works up and running. The only way we can do that, with the volume of works to consider, is to do it as fast as we can. BD2412 T 15:33, 23 August 2016 (UTC)
@Tannertsf: We select a project for the month. When the month ends, we move on to the next month's PotM, even if the previous month's work is incomplete. There have been one or two works that were close enough to completion that we left them in place for a few days, but we don't linger over unfinished works. If we did that, we'd still be working in the Flora of Antarctica from last year. --EncycloPetey (talk) 20:33, 23 August 2016 (UTC)
Ok, thank you for explaining. - Tannertsf (talk) 20:38, 23 August 2016 (UTC)
All that said, we really need to get on the ball to progress on the current project. BD2412 T 13:59, 25 August 2016 (UTC)
I tried having a look at Index:The Fauna of British India, including Ceylon and Burma (Birds Vol 1).djvu but it’s quite a complex work for someone just getting back into things. I’ll have a look at the selection for this month Marjoleinkl (talk) 06:30, 26 August 2016 (UTC)

@Marjoleinkl: welcome back. Do see WT:Proofread of the Month for what is planned for PotM, and please do contribute a point of view for future works. We put forward proposed themes for the months a year ahead, and we take suggestions for each month consistently. We do like fresh faces contributing there, as fresh faces bring fresh opinions and participation. PotM is important for us to bring in and bring back casual and occasional users, not just those of us 'rusted on'. — billinghurst sDrewth 23:55, 23 August 2016 (UTC)

Non-free images in otherwise copyright-free material

So there was a brief discussion here about this issue, but it didn't appear to get a consensus: Wikisource:Scriptorium/Archives/2014-05#Dealing with non-free images in transcriptions of freely licensed works.

Effectively I have the following scenario: Australian aviation accident reports are released under the Creative Commons Attribution 3.0 Australia Licence, with the exception that the following are not released under that license: "the Coat of Arms, the ATSB logo, photos and graphics in which a third party has copyright". The copyright statement(s) in full can be found here on the government website.

So, I figure I have the following options:

  1. Upload the entire PDF as-is to Commons, and...
      • add a note in the Commons description stating which images are not free of copyright
      • on the Index talk page, add a note about not transcribing the non-free images
      • in the text itself, put some sort of placeholder image to indicate a non-free image (so that the reader knows to go look at the original source for the image; important if the text is referencing a photo or something)
  2. Self-censor the PDF prior to uploading, removing any logos and non-free photos and replacing them with either a placeholder image or blank page as needed.
  3. ... other option(s)?

It would be nice if we had a concrete policy in place for a situation like this. Or if we had one, if the six pages about copyright that we have could be updated to call it out. --Mukkakukaku (talk) 13:26, 23 August 2016 (UTC)

Commons is Commons; you really should discuss their policies there. However, they don't have fair use, so you'll most likely have to delete the copyrighted images from the PDF.
We do have a policy, in that we have no fair-use policy, so by WMF rules, we can't host works that aren't 100% PD. In some cases we may be able to talk about de minimis, but in the general case, the photos from the aviation reports have to be deleted. I've looked at US aviation reports and have been discouraged by the same problem, that many of them would need a number of important maps and photos deleted.
I will say the Coat of Arms and ATSB logo are likely to pass de minimis for Commons, and aren't that important for reproducing here. Commons actually hosts a copy of the Coat of Arms that will probably do for replacing the CoA on Wikisource.--Prosfilaes (talk) 17:55, 23 August 2016 (UTC)
OK, I asked at Commons and got a response indicating that Commons can only host it if the non-free images are removed. Does anyone know of any software for modifying PDFs that will allow me to do this? Eg some sort of PDF editing software that allows actual editing and not just wholesale removal of pages? --Mukkakukaku (talk) 21:22, 23 August 2016 (UTC)
let’s adopt an EDP that allows fair use images here and put a CC-by-NC on them. then upload the work here. see also the work mentioned above Wikisource:Scriptorium/Archives/2014-05#Dealing_with_non-free_images_in_transcriptions_of_freely_licensed_works; see also m:Licensing_policy_FAQ_draft#Unfree_content_not_under_an_.27exemption_doctrine_policy.27 Slowking4RAN's revenge 01:46, 27 August 2016 (UTC)

Project Scanning books

Hi, I am thinking to make a grant application (either to the WMF or Wikimedia France, or both) to scan books not available online. Until now it is only for books in French. Do you think making a multilingual grant request would be useful? Would you like to have some books scanned? If yes, could you make a list? Feel free to ask other Wikisources. Regards, Yann (talk) 19:09, 23 August 2016 (UTC)

What I would really like is to have access to a good scanner and training on scanning and creating scan files. I have a computer, and several books that really need to be scanned (and aren't in Hathi or IA). Perhaps a collection of tutorial videos, or local events that teach scanning could be part of the grant application? --EncycloPetey (talk) 20:29, 23 August 2016 (UTC)
Wikimedia France has a book scanner in Paris. That's one of the possibilities to scan books. Another one would be to ask GLAMs to do it against some money. The French National Library does it for 45 € per book (cost to be confirmed). Regards, Yann (talk) 16:18, 24 August 2016 (UTC)
you should try the rapid grants to see if they will fund book scanning. m:Grants:Project/Rapid
typically, the research library should have a book scanner setup. the benefit being that it automatically crops and delivers a pdf to your usb drive to take away. there are do-it-yourself rigs which we had a demo of at wikisource conference. also flatbed scanners with a usb are cheap for images, but they are one page pdf at a time. Slowking4RAN's revenge 15:14, 25 August 2016 (UTC)
Not an immediate concern for English Wikisource, but presumably French Wikisource has Dumas and Jules Verne originals? ShakespeareFan00 (talk) 17:02, 25 August 2016 (UTC)
And something that's a more pressing concern is Volume 8 of a specific edition of the New International Encyclopedia to replace/substitute for a volume which appears to be damaged within the Internet Archive's set, Index:The New International Encyclopædia 1st ed. v. 07.djvu and Index:The New International Encyclopædia 1st ed. v. 09.djvu being the volumes either side of the 'missing' one. :) ShakespeareFan00 (talk) 17:06, 25 August 2016 (UTC)

Although not books as such, I was going to suggest consideration be given to the scanning of other "text" resources such as (not exhaustive):
  • Instruction manuals. (which can be a single sheet/booklet).
  • guide booklets.
  • pamphlets. (In looking at some material in a museum, the amount of printed ephemera the UK government generated, even prior to 1965, was quite surprising!)
  • (old) examination papers. (And the printed answers if they existed.)
  • auction catalogues etc...
  • small-ads. ( Whilst the layout on Wikisource is not ideal, small ads are a gold mine for social historians.) :)

Scanning printed ephemera may well have to be done through GLAM though, as the most likely sources of these are archives and record offices, but these are historical materials often overlooked. ShakespeareFan00 (talk) 17:20, 25 August 2016 (UTC)

In short, getting a grant for a multi-lingual 'Scanning Fund' would be useful (provided the WMF is prepared to work with other organisations like Internet Archive and GLAM partners). ShakespeareFan00 (talk) 17:25, 25 August 2016 (UTC)
On the topic of ephemera: there exists commons:Template:Inscription for transcribing small single-image items with only a small amount of text. I sometimes think it's better to bring things over here to Wikisource anyway (for discoverability, categorisation, and general completeness) but there's certainly a line somewhere between what should be on Wikisource and what left only on Commons. Sam Wilson 23:38, 25 August 2016 (UTC)
  • Google Books has scanned the second edition of John Ogilby's translation of The Works of Publius Vergilius Maro, but not his Homer: His Iliads Translated. (Ditto for Homer: His Odysses Translated and The Fables of Æsop Paraphrased in Verse by the same author.) Problem: Homer: His Iliads Translated is a very old and rare book. Maybe you can find it in one or two libraries (University of Toronto, or Rochester) but they might not give you permission (or have the means) to scan it. Anyway, it would certainly be at the top of my wish list. ~ DanielTom (talk) 21:09, 25 August 2016 (UTC)
  • There are microfilm and microform copies of Ogilby's Iliad and probably his Odyssey. Those are on the edge of something someone really doesn't want us to copy but can't stop us (at least in the US.) His Æsop was reprinted by the Augustan Reprint Society, and until the mid-1980s, none of their works I saw had copyright notices, so you can probably get a copy of that and reprint the whole thing, modern introduction and all.--Prosfilaes (talk) 21:54, 26 August 2016 (UTC)