Wikisource:Scriptorium/Archives/2016-09

Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Announcements

250,000 Validated Pages

We reached 250,000 validated pages on Monday 4 April, with this edit [1] by Akme. Beeswaxcandle (talk) 07:00, 5 April 2016 (UTC)

Hurrah! That's great. :) Now, on to the next ¼ million... — Sam Wilson ( TalkContribs ) … 00:21, 6 April 2016 (UTC)

2000 validated indices

Sometime in the past week we have passed 2000 validated works (currently 2008), with a further 1,139 being proofread. Congrats to all. — billinghurst sDrewth 13:40, 9 May 2016 (UTC)

It looks like it was Index:Rebecca of Sunnybrook Farm (1903).djvu on 2 May. Beeswaxcandle (talk) 09:35, 10 May 2016 (UTC)
Interestingly two and a half months later we have 2,085 validated works and 1,135 proofread only works. 80 works validated in 2½ months is impressive, and ever so slightly we reduce the proofread only. — billinghurst sDrewth 16:06, 23 July 2016 (UTC)


Coming event: Wikisource OCR demo

The OCR tool for Indic language Wikisources will be demonstrated by Kaldari at the next MediaWiki CREDIT showcase on Wednesday, 7 September 2016 at 18:00 UTC. The session will be broadcast on YouTube (link TBA) and live discussion will take place on Etherpad (side-channel discussion will be in #wikimedia-office on IRC).

Proposals

Importing pre-1923 works from Canadian Wikilivres

Canadian Wikilivres:Category:1922 has many pre-1923 works. As Special:Import does not allow uploading XML data, should we copy and paste the texts and the history from Canadian Wikilivres, or should we allow importing from Canadian Wikilivres? If the American copyright term stays the same, many works published in 1923 will enter the public domain in 2019 in the USA, so we should prepare a system to move works from Canadian Wikilivres. Thanks.--Jusjih (talk) 19:17, 18 June 2016 (UTC)

Wait. As an Old Wikisource importer, oldwikisource:Special:Import allows me to upload XML data there. I just indirectly brought A Woman's History, but uploading XML data here requires becoming an importer. Where should any trusted users apply?--Jusjih (talk) 19:33, 18 June 2016 (UTC)
Are they scan-backed? Beeswaxcandle (talk) 06:44, 19 June 2016 (UTC)
Not always. German Wikisource requires scans with texts, but we and Canadian Wikilivres do not. If anything has a questionable source, we may consider deleting it for non-copyright reasons.--Jusjih (talk) 21:09, 20 June 2016 (UTC)
If we want the extended import facility we can reach a consensus to identify and appoint suitable people to be temporarily granted the right via a request to m:SRP. It shouldn't be overly problematic. We could also explore whether we can have a transwiki import via Special:Import, we can do it to internal wikis, and it may be possible with external wikis. That I don't know. — billinghurst sDrewth 06:06, 22 July 2016 (UTC)

Make the OCR gadget a default gadget

The OCR gadget is enormously useful and I didn't even know it existed until recently. It seems like it would be a good idea to enable this gadget by default so that more people will discover it. Any thoughts? Kaldari (talk) 00:05, 14 July 2016 (UTC)

Nice to hear from you every year or two. I'm not very technical, and I have no idea what that javascript does. I think you might want to take this proposal to MediaWiki for vetting by techies before bringing it here. Outlier59 (talk) 00:24, 14 July 2016 (UTC)
We had it as default for a while, then we turned it off, and I forget why exactly. I know that we backed off numbers of our gadgets for being the default. Many don't require it as a default, though maybe we can better mention it on our information to users. — billinghurst sDrewth 06:39, 14 July 2016 (UTC)
The reason the OCR gadget is not on by default is that it should only be used when there is no text-layer on an imported .djvu or .pdf file. It doesn't do as good a job as that done by the OCR at Internet Archive. Occasionally although there is a text-layer in a file it doesn't show up in the Page: namespace. If this happens to you, drop a note on this page and someone will have a look and in the majority of cases fix the file. Beeswaxcandle (talk) 06:52, 14 July 2016 (UTC)
Yeah, this seems sensible. I think I've only ever used the gadget about twice. Although, I've yet to delve into that part of things since the IA stopped making djvus; has that changed anything? — Sam Wilson ( TalkContribs ) … 07:58, 14 July 2016 (UTC)
I actually had a mild panic attack when I came back from a wiki break and couldn't find the button. It doesn't help that the default header text shown when transcribing references the button, but the button is nowhere to be found. The header says "If no text layer is automatically made available, click the {OCR image} button on the toolbar to generate one." The word "button" is actually a link to your preferences, but that's not really obvious. At minimum, can we get that header text amended? Maybe to say something like, "Don't see the button? Update your preferences!" ...which has the added benefit of giving more visibility to the available gadgets. Mukkakukaku (talk) 17:15, 26 July 2016 (UTC)


Making <pages> more flexible?

ShakespeareFan00 recently approached me with a multi-page-table-transclusion issue which, upon further research, has led me to realise the current implementation of the <pages> tag has a number of built-in (or hard-coded if you prefer) aspects which are not always desirable. In particular there are two issues which concern me:

  1. the output of <pages> is always enclosed in <div>–</div>, even when such enclosure (as in the middle of a continuing table) is simply wrong. Fortunately in most cases the mediawiki parser subsequently corrects the "error" and the end-result still (mostly—but not always) "works."
  2. necessary embedded metadata used to later produce page number links back to the originating Page: name space is always passed through MediaWiki:Proofreadpage_pagenum_template (nice and controllable at least by administrators) but then the results of that pass are then further enclosed in <span>–</span> which is immutable and only works properly in a textual context. It certainly fails in tables—and it is one of the root causes for some of the stranger recommendations to use {{nop}} to join table pages.

I have certainly not started writing code for any of this but do not anticipate it will be all that difficult to modify the ProofreadPage extension to continue to behave identically as before, yet accept two–or more–new parameters, tentatively titled:

pagenumbertemplate
defaults to Proofreadpage_pagenum_template and controls the MediaWiki: name space (thus automatically fully protected) template-like item used to process page number metadata.
pagenumberenclosure
defaults to span and controls the HTML element(s) used to wrap the above output. Perhaps an opening and closing pair expressed in JSON might be better?
pagesenclosure
defaults to div and controls the HTML element(s) used to wrap the entire <pages> output. I envisage none might be a useful alternate option to deliver raw output where desirable (e.g. for tables.)

Most (perhaps all) changes required—if this proposal be accepted—would affect but a single function, render, of class PagesTagParser in ProofreadPage/includes/PagesTagParser.php.

To set expectation levels, creating and applying these changes would result in <pages> becoming usable for one "purpose" per invocation. It would still not be possible to arbitrarily mix (for example) textual and tabular-extraction transclusions on a single tag.
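As a rough illustration of the proposed parameters (not actual ProofreadPage code, which is PHP), the wrapping behaviour could be sketched in Python as below. The function names wrap_pages_output and wrap_page_number are hypothetical; only the parameter names pagesenclosure and pagenumberenclosure, their defaults, and the tentative none option come from the proposal above.

```python
def wrap_pages_output(body, pagesenclosure="div"):
    """Wrap the whole transcluded output in one element.

    Hypothetical sketch: "none" returns the text untouched, which is
    what a table continuation would need.
    """
    if pagesenclosure == "none":
        return body
    return f"<{pagesenclosure}>{body}</{pagesenclosure}>"


def wrap_page_number(marker, pagenumberenclosure="span"):
    """Wrap one page-number marker in the chosen element (default span)."""
    return f"<{pagenumberenclosure}>{marker}</{pagenumberenclosure}>"


# Default behaviour is unchanged: text is still enclosed in a div.
print(wrap_pages_output("Some prose."))
# A table fragment can be delivered raw, so no div lands between rows.
print(wrap_pages_output("|-\n| cell", pagesenclosure="none"))
```

Because each call takes a single enclosure setting, the sketch also shows why one invocation serves one "purpose": textual and tabular content would need separate calls.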

Any thoughts/objections? AuFCL (talk) 04:42, 18 August 2016 (UTC)

I can think of a 'gotcha' already. On some multi-page tables you have the situation where the first part of the table (that's split up due to transclusion limits) falls after 'text' content, and where you would need a normal starting enclosure, but not necessarily a standard end one. Conversely you may also have a page where the end of the table precedes normal text. Here you may need a normal end enclosure, but not necessarily a starting one. ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
I disagree. You can always avoid this situation by introducing additional section markers and transcluding only like-with-like content with a single <pages> invocation. In this case it is nearly certain an intervening structural element (e.g. table closure followed by ordered list commencement) would be required before commencing another <pages> extraction. I hope you would never expect the system to handle ramming a block structure into a non-block context without consequent loss of fidelity? You cannot do that in HTML, let alone the wiki sub-set. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Noted, this would be a documentation issue, rather than a technical one then. ShakespeareFan00 (talk) 21:33, 18 August 2016 (UTC)
There's also the issue of "ribbons": table rows which are relevant in Page: namespace, but which would not be when individual sections are pulled together to construct a Main namespace page from parts of the table (such as a row with Chapter, Page headings in small type in TOC pages). ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
As discussed elsewhere such things are never included and may be considered to be logically enclosed in implicit <noinclude> blocks. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Being able to tweak the enclosures might also be useful in respect of pulling parts of multi-page lists (the currently stalled transcription of Ruffhead's statutes uses multi-page lists). I appreciate the above related to tables, but it's another use case that might need to be considered. (I'm not entirely happy with that work's approach though.) ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
See above. Don't mix unlike content blocks in a single operation. Worst case(s) I currently envisage:
  1. text: already handled by the current hard-coded implementation (<span> wrapping works for possibly 99.99% of cases including all inline, and fully-complete—opened and closed—block structures)
  2. table extractions: partially handled by the current implementation (<span> works well inside table cells; but fails spectacularly when inserted between </tr><tr>. The roulette wheel of wiki. Who wants to bet on black?)
  3. ordered lists extractions: the current implementation does not work at all here. Only complete lists may be transcluded without resort being made to multiple counter restarts.
  4. definition lists: probably doesn't work. Fortunately not used much outside of discussion pages. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Comment : Would it be feasible to have an opening/closing pagesenclosure option? This follows on from my thoughts earlier. Having a nostart and noend might resolve one issue. ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)
And finally, it would be nice with the changes to de-overload {{nop}}, so that it doesn't need to be used both for a paragraph break AND as a de facto "following input is at the start of a new line so the parser understands it correctly" marker, which is what caused this discussion to happen.

ShakespeareFan00 (talk) 13:05, 18 August 2016 (UTC)

I think that this discussion belongs at mul:Wikisource talk:ProofreadPage as you are talking about something that affects all the Wikisources as it is universal code. Some of the issues that you discuss about page numbering are somewhat due to our implementation of that feature, so we would need to ensure that what is done is not impacting other WSes negatively.

When the discussion is moved there we can use the mediawiki message delivery system to put a message on all the WS Scriptoriums. — billinghurst sDrewth 13:42, 18 August 2016 (UTC)

Agreed. However let us try to thrash out the major issues before parading what might turn out to be a bad idea more widely? I do not pretend this suggestion is a panacea, nor do I want people to think it is a substitute of machine over natural intelligence. The current implementation of <pages> works well enough for maybe as many as 9,999 out of 10,000 cases. I am simply proposing an additional flexibility which will permit it to work for a few more, currently pathological, cases—so in this sense it is an improvement on an imperfect case; but with no pretence of perfection itself. AuFCL (talk) 21:04, 18 August 2016 (UTC)
Okay with me, I just wanted to ensure that we set appropriate expectations at the outset. — billinghurst sDrewth 22:39, 18 August 2016 (UTC)
Is there a way to move an entire discussion thread cross wiki? (I've got no objections to an admin doing so). :) ShakespeareFan00 (talk) 13:46, 18 August 2016 (UTC)
COPY ... PASTE ... COMMENT ... wikilink with [[Special:Permalink/nnnnnn]]. I would wrap it in {{cquote}} if they have it and what you are transferring is small. Otherwise it is a summation of the preliminaries, and the outcomes, and a permalink as background reading of the detail. — billinghurst sDrewth 22:39, 18 August 2016 (UTC)

Additional Issue 1: Headers & footers

If there's going to be an update made to proofread page, can I make a request that a consideration is made concerning the possibility of reviewing how header and footer content is potentially handled as well?

Can someone explain how headers/footers are currently handled internally? I was wanting to suggest that there should be a way of separating PAGE-based includeonlys and noincludes from the conceptually different headers and footers. (An aside here is that under certain circumstances line feeds, nops etc. have to be inserted in the page content or footer to get the parser to see it as separated from the header or page content respectively. Table handling is one such area where this is most noticeable. I've also sometimes, when using some templates, had to force {{-}} clearance in a footer, even though the template was in the PAGE content. This is probably a much bigger issue to solve though :( ) ShakespeareFan00 (talk) 10:26, 18 August 2016 (UTC)

Umm just add a clear line to first part of the footer, no need for special code. The first component of footer needs to run into the preceding paragraph for some situations. — billinghurst sDrewth 13:46, 18 August 2016 (UTC)
<pages> always strips off the Page: space header and footer and throws them away (recall the "<noinclude>" hints which appear on all header/footer edit boxes?) so I don't really see this as an issue. And I certainly don't envisage altering this behaviour.
Billinghurst is quite right above. My only misgivings are that unprotected new lines are both not very visible, and prone to automated processes or "cleanup"/"tidy" script removals resulting in unnecessary angst. (Also don't forget the precedent set by Visual Editor always axing an unaccompanied </div> in a footer. At least you can say this for {{nop}}: you can see it!)
An unappreciated aspect of the wiki-syntax is that people forget that at a technical level the table closure symbol recognised by the parser is not really |} at all but in fact (new-line)|}! Similarly |+, |- etc. AuFCL (talk) 20:32, 18 August 2016 (UTC)
@AuFCL:, see the note about Tablecrux I left at your talk page :) ShakespeareFan00 (talk) 20:38, 18 August 2016 (UTC)
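The point above about the table closure needing its own line can be illustrated with a simplified model in Python. This is only a sketch: the regular expression and the closes_table helper are assumptions made for illustration, not the MediaWiki parser's real logic.

```python
import re

# Simplified assumption: "|}" only counts as a table closure when it
# begins a line (optionally after spaces or tabs).
TABLE_CLOSE = re.compile(r"^[ \t]*\|\}", re.MULTILINE)


def closes_table(wikitext):
    """Return True if the text contains a line that starts with "|}"."""
    return bool(TABLE_CLOSE.search(wikitext))


# Page body ends with a cell and the footer starts with "|}".
joined_without_newline = "| last cell" + "|}"
joined_with_newline = "| last cell" + "\n" + "|}"

print(closes_table(joined_without_newline))  # closure is not recognised
print(closes_table(joined_with_newline))     # closure is recognised
```

This is why a stripped or "tidied" new line between the page body and the footer can silently break a table, and why a visible marker such as {{nop}} is sometimes preferred.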

Bot approval requests

Repairs (and moves)

This should be at Index:Ministry to US Catholic LGBTQ Youth - A Call for More Openness and Affirmation.pdf ? ShakespeareFan00 (talk) 08:33, 20 August 2016 (UTC)

sorted Beeswaxcandle (talk) 09:54, 20 August 2016 (UTC)

Other discussions

What free software and web resources do you use when contributing to Wikisource?

I was considering getting back into Wikisource and I was wondering if you guys used any free software or web resources in the course of your work that you would recommend to others. Abyssal (talk) 16:50, 8 July 2016 (UTC)

This is what I use:
  • Firefox—a browser is all you really need to get started.
  • GIMP for extracting images from scans
  • Inkscape for extracting images from PDF files
  • AutoHotKey for works that contain many repetitions of a tedious string, and also for simple insertion of special characters
  • typegreek.com website for Greek characters, though I use AutoHotKey for that now too
  • shapecatcher.com and fileformat.info websites for searching for Unicode characters
  • Notepad++ for regex find-and-replace - also regexr.com website for learning and testing regex
  • DjVuLibre for manipulating and viewing DjVu files
  • PDFsam - just started using this to manipulate PDF files, good for getting rid of the pesky Google scan page at the beginning of all their scans
  • AutoWikiBrowser for tedious bulk edits
—Beleg Tâl (talk) 18:05, 8 July 2016 (UTC)
Thanks, those are really helpful! Do you know of any free offline OCRs that are any good? Abyssal (talk) 19:06, 8 July 2016 (UTC)
I use IrfanView (a free download for Windows) for image manipulation and conversion. For OCR I use TextGrabber on an iPad. Beeswaxcandle (talk) 23:11, 8 July 2016 (UTC)
For ocr-ed djvu, DjVuToy can be used. However, this application uses Microsoft Office Document Imaging (MODI) for doing ocr. If you don't have it, it can be added as described here. Hrishikes (talk) 02:22, 9 July 2016 (UTC)
Hey @Beleg Tâl:, when you say that Inkscape can be used to extract images from PDFs, can it be used to extract all the pages at once? I can only get it to open one page at a time. Abyssal (talk) 15:18, 11 July 2016 (UTC)
As far as I know it can only do one page at a time. —Beleg Tâl (talk) 15:21, 11 July 2016 (UTC)
@Abyssal: I use PDF24Creator for extracting all pages at a time. Hrishikes (talk) 04:32, 12 July 2016 (UTC)
@Hrishikes: Thanks. I've downloaded PDF24Creator. Abyssal (talk) 12:47, 14 July 2016 (UTC)
This is a nice topic because there are so many helpful programs out there that we may not be aware of. I've been creating some decent quality OCRed djvus myself lately since archive.org stopped creating them. For example this file was turned into this OCRed djvu file, Index:Evolution and Natural Selection in the Light of the New Church.djvu, with the programs below. It's not perfect but it's decent enough. Maybe we should create a help page where all these programs are mentioned and also some tips on how to use them.
  • downthemall which is a firefox addon download manager. This is useful when scans are available, but each page is an individual file. You can use batch descriptors to easily add all the files you want to download to the manager, and there is also a renaming mask which allows you to rename each file as you like before being saved.
  • scantailor This is an excellent program for processing scanned images. It allows for separating double page scans, cropping etc and is mostly automated.
  • gscan2pdf. This is Linux only, this is a most excellent program which was hard for me to find. It is a gui and can open most image formats including pdf and djvu. It can ocr the files and then it can bind the files and save them as pdf or djvu. Very excellent which has allowed me to create many djvus. Jpez (talk) 07:05, 10 July 2016 (UTC)
  • BUB :Book Uploader Bot This tool supposedly will automatically upload books from various sites to archive.org. You can then use ia-upload to download the book from archive.org to wikicommons. It works well for me for Google books, it didn't work at all for me for hathitrust though. I haven't tried any of the other sites mentioned. I like it for google books because it gets rid of the Google notice and adds all relevant data (author, publisher) etc automatically. By the way, has anyone by chance been successful transferring a book from Hathitrust with this tool? If it could work with Hathitrust it would be a great help. Jpez (talk) 09:33, 31 July 2016 (UTC)

Seeing display difference between logged-in and not

Looking at The Production of Security while logged in gives a significant difference for me compared with not logged in (checked in Firefox and Chrome). I cannot see that I have anything in particular in my local common.css file or my global.css file that would have an impact. Do others see a difference, or is it just me? Thx. — billinghurst sDrewth 06:39, 17 July 2016 (UTC)

Two differences for me: when logged out, the "Upload file" option disappears from Tools section; and the "Create a book" option appears at the top of "Download/print" section (it is the last item when logged in). Hrishikes (talk) 09:45, 17 July 2016 (UTC)
The title page was {{center|{{xxxx-larger block|{{xx-larger block|The Production<br><b>{{larger|of Security}}</p>}}}}}} [[category:not transcluded]]. Very big on top of very big, with some odd html tagging. I tried cleaning it up. See how it looks now. Outlier59 (talk) 12:09, 17 July 2016 (UTC)
Sorry, that page wasn't transcluded (duh!). More very big on top of very big at Page:The Production of Security.pdf/4, so I modified it. Probably too big for browser or skin to handle correctly. Feel free to revert. Outlier59 (talk) 12:36, 17 July 2016 (UTC)

Hmm, I should have been more specific. On the Page:The Production of Security.pdf/4, I am seeing a large gap in the first para when logged out, and a tighter para spacing when logged in (irrespective of the changes prior or now). Noting that the additions of nested larger will just increase the %size of text and it shouldn't be problematic. It seems to be something with the line height / line spacing. — billinghurst sDrewth 14:41, 17 July 2016 (UTC)

For me, there is no difference, in Chrome. Hrishikes (talk) 17:25, 17 July 2016 (UTC)
I just tried changing my skin to monobook, and I'm seeing that tight spacing that you're seeing, Billinghurst. The default is vector. So it looks like it's a difference between the skins. Outlier59 (talk) 21:02, 17 July 2016 (UTC)
In Firefox, logged in, the main namespace font looks like Times Roman (serif) and the same text in the page namespace is Arial (sans serif). I am just saying. — Ineuw talk 07:54, 18 July 2016 (UTC)
┌──────┘
Buried within mediawiki:Vector.css is the fragment:
#bodyContent.mw-body-content {
	position: relative;
	font-size: 0.8750em;
	font-size: calc(1.00em * 0.8750);
	line-height: 1.6;
	z-index: 0;
}
Note the high-lit line line-height: 1.6;. Monobook hold-out users do not get this line as may be demonstrated by enclosing the transcluding line in <div style="line-height:normal;"> as I have done. Either this is your solution or recreate and repopulate mediawiki:Monobook.css with suitable content. Modern or Cologne Blue users will be experiencing similar issues. AuFCL (talk) 09:52, 18 July 2016 (UTC)

'Image missing' template floating to the top on transcribed pages?

I added the {{image missing}} template to two pages that I was proofreading. Then I checked in the regular namespace where this particular work is already transcribed, to see if it looked all right, and I found that the template had "floated" to the top of the page.

The work is here: Thiphanate-Methyl in Air (5606).

The template is present on these pages: 5, 6

This is what I see:

It doesn't appear related to browser or logged-in status. I'm logged in on Firefox and I see it, and logged out in Chrome and I still see it. And it's 99% probably not a cookie thing because this is a new machine and this is the first time I've logged into wikisource since I got it.

Has something changed with this template, or is this some weird new "expected behavior"? - Mukkakukaku (talk) 18:23, 18 July 2016 (UTC)

@Mukkakukaku: It is a known behaviour that has occurred with changes in the css/framing/extension (which I do not know), and it is a little unusual, but that is how it currently is working and it relates to Module:Message box (which it is out of my knowledge base). Once the images are added they will appear in line properly and the matter should resolve. — billinghurst sDrewth 02:24, 19 July 2016 (UTC)
Actually I don't think it has to do with the Lua module itself. I read through the code and it's pretty straightforward. I did notice that on a slow connection, the box actually renders in the correct place, and only after that does it move to the top of the page -- which makes me think that there's some JavaScript or something running later in the execution chain that is moving all elements of a particular "type" to the top.
But anyway, it's good to know that this is a known issue. It's just very confusing because the text on the template says that there's an image missing "at this place in the text" and yet the boxes are all hanging around at the top of the page above the header, and there's no indicator of anything "missing" at the given place in the text. -Mukkakukaku (talk) 13:13, 19 July 2016 (UTC)
Yes, and I believe it is also due to how our headers and footers work, and the containers that are set (/me waves hands to explain all that css gobbledygook black magic.) [Things were somewhat easier to decipher in some regards when all our css was all in the one file.] — billinghurst sDrewth 13:37, 19 July 2016 (UTC)
@Billinghurst, @Mukkakukaku: see: line 111 on MediaWiki:Common.js, Zdzislaw (talk) 21:57, 19 July 2016 (UTC)
Yep, that's it! Good eye, @Zdzislaw:. It's grabbing all the amboxes and shoving them up into the page header. The only reason I can think of that being a good idea is if we only had page-level amboxes and not section-level (or in-the-middle-of-the-text-level) amboxes. Nixing that line should fix the issue, but it will cause all amboxes to remain where they were initially placed. Depending on what the original intention of the coder was, a more nuanced solution may be needed (like maybe not using ambox? The comment says "envelope hatNotes & similar into main navigation header container". Only ambox is affected by this particular line of code.) Mukkakukaku (talk) 23:06, 19 July 2016 (UTC)
@George Orwell III: who was doing the heavy lifting here. I can guess that some of this is around things like {{no source}}, {{copyvio}}, {{sdelete}}, etc., however, that is only a guess and the reasoning was not discussed with the community. There is clear value on this being a community consensus. — billinghurst sDrewth 23:45, 19 July 2016 (UTC)

20:18, 18 July 2016 (UTC)

The Last of the Tasmanians needs a map — Hathi Trust?

Hi. At Page:Last of the tasmanians.djvu/21 we appear to have a folded map (by design or by accident is unknown) which has not been properly scanned. I have checked the only other full version available at Google, and the map page is not shown at all. I am wondering whether someone with reasonable access to the Hathi Trust collection can check their collection to 1) see if the map is consistent in the same edition, and 2) can we get a version of the map separately to replace the rubbish version that we have. The page in our scan seems to follow directly after the list of illustrations. Thanks. — billinghurst sDrewth 02:18, 19 July 2016 (UTC)

@Billinghurst: Full map is available here. But the map is not mentioned in the list of illustrations. It is, however, a part of Daily life and origin of the Tasmanians by the same author and mentioned in the list of illustrations of that work. HathiTrust maps (when available) in all accessible versions of both books are defective. Hrishikes (talk) 04:36, 19 July 2016 (UTC)
Done Image converted, reproportioned and uploaded as a png. Thanks for the pointer. — billinghurst sDrewth 21:19, 19 July 2016 (UTC)

Which tags are excluded in a page footer?

To resolve a page spanning text break in the main namespace, I tried to enclose text using <div></div>, placing the closing tag in the footer, but it was deleted when I previewed the page. I managed to resolve the problem using <span> in the same manner, but was wondering what other codes are not valid to be placed in the footer. This is the page where I used it with success. — Ineuw talk 17:16, 21 July 2016 (UTC)

It isn't just happening to you. I believe this is new behaviour and (thus far) only seems to affect </div> (my money is on this being some half-baked attempt to address phab:T138604 and no doubt over-enthusiastic system "trimming" is responsible. I forget the reference (related to phab:T133294?) but recently the internal database storage format (content model?) for Page: changed and this too could pertain.) My advice at this stage is to continue as if the stripped </div> is still present. AuFCL (talk) 22:09, 21 July 2016 (UTC)
Thanks @AuFCL:. You are right as usual. First off, yesterday's attempt to use <span> failed. Now, I did what you recommended and it is fine (especially in the main namespace, where it counts). Furthermore, the strange thing is that in the following page's header, it retained the opening <div style="line-height:130%;">. What a goulash. Knowing how much they love me at phabricator, should I file a bug report?  — Ineuw talk 03:58, 22 July 2016 (UTC)
W.r.t. last, don't bother. This stuff is being driven purely by Visual/Editor considerations and any hint that (horrors!) unbalanced HTML should be catered for in any way at all will be met with universal disparagement. You have put up with me ranting on this topic before so I shall not rehash that bit. So the header is kept; what: you expected consistency or something? How unreasonable of you! AuFCL (talk) 04:06, 22 July 2016 (UTC)
I would suggest that the use of templates may be more resilient to the new trimming. So you can always use {{div end}} or any of the other 'block' variations in footers to close an open div. I think one of the things that we are now also seeing with the html5 updates is that these raw tags that have been used out of scope are somewhat problematic to clean up manually, whereas where they are templated they do become easier to amend globally. — billinghurst sDrewth 06:00, 22 July 2016 (UTC)

Index:A-Kentucky-Woman-Dec1892-NationalBulletin.jpg

Does anyone know why Index:A-Kentucky-Woman-Dec1892-NationalBulletin.jpg isn't showing a pagelist? --EncycloPetey (talk) 19:10, 21 July 2016 (UTC)

My fix attempt was not successful. --EncycloPetey (talk) 19:15, 21 July 2016 (UTC)

I changed [[File: to [[:File: to prevent it from displaying the images; it appears to work now. —Beleg Tâl (talk) 20:27, 21 July 2016 (UTC)
Neither name space references nor validation status correct. Fixed both. AuFCL (talk) 21:26, 21 July 2016 (UTC)
@EncycloPetey: I also rejoined pages at Why Democratic Women Want the Ballot. If the intent was to present the final transclusion as explicitly-separated pages this may not be what you wanted. AuFCL (talk) 22:30, 21 July 2016 (UTC)
My primary concern was that the individual proofread pages were not linked in the Index, and I couldn't think how to fix that. Thank you for correcting this. --EncycloPetey (talk) 22:35, 21 July 2016 (UTC)
Is the information at mul:Wikisource:ProofreadPage insufficient? — billinghurst sDrewth 23:59, 21 July 2016 (UTC)
Yes. --EncycloPetey (talk) 00:27, 22 July 2016 (UTC)
Perhaps selected aspects of Help:Index_pages#Using_individual_image_files ought to be combined with/linked to the help page Billinghurst references above? AuFCL (talk) 04:25, 22 July 2016 (UTC)

Index:Life and Select Literary Remains of Sam Houston of Texas.djvu

Index:Life and Select Literary Remains of Sam Houston of Texas.djvu is uploaded on Commons, and I created the Index here. My experience of this type is scant, and help pages aren't that helpful if the editor is unfamiliar with any terminology. I checked progress as Text Layer Requested, as a guess of what it needs. I have two issues with this:

  • There is no side-by-side, just the text on the left-hand side.
  • What is on the left-hand side has no formatting at all.

Please advise on how I might have erred, and how do I correct this. Exactly what is the index status supposed to be after I first create an index that already has the file on Commons? Maile66 (talk) 16:58, 22 July 2016 (UTC)

The "scan resolution in edit mode" was set to 0. I removed it, so now the page scans show up. I have no idea what that property is for, but I've noticed that whenever I try to use it, this happens. -Mukkakukaku (talk) 17:44, 22 July 2016 (UTC)
Thank you. That has resolved the issue. Maile66 (talk) 18:12, 22 July 2016 (UTC)
Normally there is no need to use the scan resolution, so leave it empty so that the default width is used. On very high resolution scans we sometimes need to force it lower. — billinghurst sDrewth 00:05, 23 July 2016 (UTC)
  • Page headers on this. I'm thinking consistency is what is wanted. I started off with "larger" in the page headers. Then I noticed another editor who is gradually editing some pages without the "larger" in there. I removed that element from the previous pages I edited. Please let me know what the criteria are before I edit any other pages. Maile66 (talk) 23:55, 23 July 2016 (UTC)
On the talk page for the book, list formatting guidelines for the book. Show a link to a sample page (or two) for formatting. While you're in the process of working out the formatting, you might want to put an "under construction" tag on the talk page, to discourage others from wanton editing.
If someone comes along and edits without paying attention to your formatting guidelines or in-process notice, ask them to hold off on editing, because you're actively working on guidelines. If they mess up the text (as in diff -- initial M changed to L), either "undo" the edit or fix it. I suggest "undo" in that case, because the editor ADDED an error during "validation." Some editors like to jump in while other editors are working on a book. Sometimes this is helpful, sometimes not. If it's not helpful, tell the editor that it's not helpful. If it's helpful, send a thanks.
If you're not sure whether the header should be "larger" or something else, ping a couple of other users and ask them what they think. If you pinged me and asked me what I thought, I would suggest xx-larger with centered italics -- which you can set up on the Index file header. Is it REALLY xx-larger with centered italics? Beats me. But it's probably close enough. It's your call. So make the call and move on.
Try to keep discussions about this book on Index talk:Life and Select Literary Remains of Sam Houston of Texas.djvu.
Above all else, ENJOY what you're doing here! Outlier59 (talk) 02:15, 24 July 2016 (UTC)

Missing pages

  • Per this talk thread, I found pages missing from the djvu file as I was working on the Contents page. I checked the upload at Commons, and the pages are missing from there. There could possibly be more, but I'm not going to edit this Wikisource file anymore. That was the only djvu on this book that I found to upload. I believe this version has all the pages, because I've used it as a resource at Wikipedia. I only remember doing one or two other djvu uploads, and they were short and problem free, so I'm a bit inexperienced in general on uploads. Can anyone help or advise? Maile66 (talk) 21:47, 25 July 2016 (UTC)
You can directly upload lifeselectlitera00cran from the Internet Archive to Commons using IA-upload. There is a djvu file. Upload to Commons with the name "Life and Select Literary Remains of Sam Houston of Texas (1884)", then create a new Index file in Wikisource. You can probably quickly copy most of what you've already done to the new Index file, then you should be good to go. That's my thought, anyhow. :)
Hi @Outlier59:, that looks intriguing, but the link is dead. Would be nice to upload directly without having to download first, I'd like to know about that if there's an option! -Pete (talk) 01:40, 26 July 2016 (UTC)
I swear half the time I can't figure out this link formatting stuff. Truly aggravating. Anyhow, the Index file is now at Index:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu, if anyone wants it. Outlier59 (talk) 01:52, 26 July 2016 (UTC)
Sorry @Outlier59:, I should have specified -- I meant the toollabs link, not the text itself. -Pete (talk) 01:58, 26 July 2016 (UTC)
Not a problem, but please do me and Maile66 a favor and correct the link formatting in this discussion for toollabs. Outlier59 (talk) 02:41, 26 July 2016 (UTC)
Hmm, not sure what you're saying. Does the IA-upload link you placed above work for you? It doesn't for me. I get a page on Commons that says: "Bad title

The requested page title was invalid, empty, or an incorrectly linked inter-language or inter-wiki title. It may contain one or more characters that cannot be used in titles." -Pete (talk) 03:02, 26 July 2016 (UTC)

Maybe try toollabs:ia-upload instead? I can only assume the Commons indirection was a mistake. AuFCL (talk) 03:29, 26 July 2016 (UTC)

@Maile66:, sorry to take us off track. In case this didn't get suggested before, on Commons, I would advise uploading a good scan to the same name as the existing one. That is, when looking at the existing file's page on Commons, look for the "Upload a new version" link.

(On my other subtopic: @AuFCL: yes, the ia-upload link above works for me...oddly, I had trouble finding it when simply searching the tool labs myself before I asked. But it was probably an error on my end somehow.) -Pete (talk) 15:42, 27 July 2016 (UTC)

@Peteforsyth: Everything is cool right now. Outlier59 already uploaded a good file on Commons, and Indexed it at Index:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu, as he indicated above. It's a much more accurate version than what I had uploaded. I'm in the process of making sure all the pages are there. Then I'll ask for a Commons delete on the incomplete file I originally loaded. Should I then have the old Index deleted, or just leave it for a record? I'm concerned someone might waste time trying to proof the old one. Maile66 (talk)

First contribution to Wikisource

Hi everyone!

I'm making my first contribution to Wikisource and I would like to know if someone could check what I have done and give me some advice to improve future transcriptions. The work is Sixteen years of an artist's life in Morocco, Spain and the Canary Islands. I am asking for advice now because I have completed up to the first chapter.

Thanks in advance. I await your answers!

Regards, Ivanhercaz | Talk 02:09, 23 July 2016 (UTC)

I've replied on your talk page with some comments. Best, Mukkakukaku (talk) 12:50, 23 July 2016 (UTC)
Small formatting issue: Table of contents pages such as Page:Sixteen years of an artist's life in Morocco, Spain and the Canary Islands.djvu/9 and Page:Sixteen years of an artist's life in Morocco, Spain and the Canary Islands.djvu/10 won't connect properly since the content in them is inside of {{small}} and {{hi}}. Are these templates even necessary? —Justin (koavf)❤T☮C☺M☯ 14:24, 26 July 2016 (UTC)
The contents can be made to look exactly like the original with {{dtpl}} if anyone's interested. Jpez (talk) 09:19, 31 July 2016 (UTC)

pline woes

Judging by the troublesome results I'm getting on Page:Acharnians and two other plays (1909).djvu/171, I gather that setting a text-indent with div affects the location of the float:right in {{pline}}.

Is there a simple fix I'm not seeing? Is there a fix that can be put in place using {{pline}}, or at least without re-editing the entire work to use sidenotes?

I wouldn't have expected text-indent to alter the right margin, so perhaps something else is going on? --EncycloPetey (talk) 17:55, 26 July 2016 (UTC)

It looks like text-indent just shifts everything on that line. Maybe a style parameter could be added to {{pline}} and then an opposite text-indent could be used? Or maybe wrap the {{pline}} with <span style="text-indent:4em;"></span>? —Beleg Tâl (talk) 18:17, 26 July 2016 (UTC)
That's not what's happening here. The text-indent shifts only the first line of a paragraph, and these uses of {{pline}} aren't on the first line of the paragraph, unless the software is treating them like independent paragraphs. What's frustrating me most is that I've never seen {{pline}} do this before, and I've been doing similar formatting on many poetically formatted works for a while now. Notice that all the text on the page has text-indent, just to different values. --EncycloPetey (talk) 18:34, 26 July 2016 (UTC)
Wrapping the pline with a <span style="text-indent:0;">{{pline|400|r}}</span> does the trick. The problem is that the floated line number is inside the div with the text-indent applied to it, so it implicitly inherits that property (the "cascading" part of the "cascading style sheets.") It may be something we want to apply to the template in general instead of hacking on like I just did with line 400 as an example on page 17. --Mukkakukaku (talk) 18:55, 26 July 2016 (UTC)
Done Thanks. I've adjusted the template accordingly, but only for right-floating line numbers. I'm hesitant to adjust anything for the left-aligned ones unless we have test cases to look at. --EncycloPetey (talk) 19:17, 26 July 2016 (UTC)
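
The inheritance behaviour described in this thread can be modelled with a short sketch. This is only an illustration in Python, not how a browser or {{pline}} is implemented: the `Box` class, the `INHERITED` set and the box names are hypothetical, the 4em value is arbitrary, and the only CSS fact relied on is that text-indent is an inherited property.

```python
# Toy model of CSS inheritance. Box and INHERITED are hypothetical names
# used for illustration only; this is not browser or template code.
INHERITED = {"text-indent"}  # text-indent is an inherited CSS property


class Box:
    def __init__(self, name, parent=None, style=None):
        self.name = name
        self.parent = parent
        self.style = style or {}

    def computed(self, prop):
        """Return the box's own value if set; otherwise, for inherited
        properties, the nearest ancestor's value; otherwise None."""
        if prop in self.style:
            return self.style[prop]
        if prop in INHERITED and self.parent is not None:
            return self.parent.computed(prop)
        return None


# A div with a text-indent (illustrative value) containing the floated number.
div = Box("div", style={"text-indent": "4em"})
line_number = Box("floated line number", parent=div)

# The same structure with a wrapper span that resets text-indent to 0.
span = Box("span", parent=div, style={"text-indent": "0"})
fixed_line_number = Box("floated line number", parent=span)
```

In this model the unwrapped line number picks up the div's indent, while the wrapped one sees the reset value from the span, which matches the fix reported above.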

Uploaded as a test page? It seems to be non-English in any event. ShakespeareFan00 (talk) 21:07, 29 July 2016 (UTC)

It appears to be a page of a Spanish translation (1956?) of Nicola Abbagnano (1901-1990)'s Storia della filosofia (1946–1950). I expect it's copyright encumbered. Prosody (talk) 00:19, 30 July 2016 (UTC)

Two new developments

In Bengali Wikisource, there are two new developments. One, in the main space, there is a bidirectional arrow beside every interwiki link on the left panel, if you click on the arrow, side-by-side view of the work in both wikisources can be seen. Two, a book image is visible beside the up-arrow (index file link) at the top of every transcluded page in the page namespace, clicking which takes one to the transcluded page of the main space. I think these two would be good for English Wikisource too. Hrishikes (talk) 08:07, 30 July 2016 (UTC)

The bidirectional arrow is part of enWS and has been for years; it is mw:Extension:DoubleWiki — billinghurst sDrewth 12:14, 30 July 2016 (UTC)
Please see Gitanjali and its Bengali version, bn:গীতাঞ্জলি. The arrow is visible in bn WS and not en WS. At least it is that way for me. The arrow is also visible in French version of the work, but not in Telugu and Chinese. Hrishikes (talk) 12:27, 30 July 2016 (UTC)
Those links are only temporary. The cross-Wikisource links only exist because the Wikisource links at Wikidata have been incorrectly placed all on a single data item for the original work. Translations have different publication data, and are in a different language, and are therefore supposed to have separate data items from the original work. So the reason these links are rare at English Wikisource is that we have added the data items at Wikidata correctly. Once the links at Wikidata have been "corrected", you will no longer be able to get side-by-side texts. This is an unfortunate, but necessary, issue of using Wikidata to support interwiki links for Wikisource. --EncycloPetey (talk) 17:42, 30 July 2016 (UTC)
I had mentioned two developments, but the discussion is focussing on one of them only. Anyway, I don't think datalinking is the explanation here. If the Bengali and English version are shown side-by-side at Bengali Wikisource but the same two items are not shown that way at en WS, that means it cannot be explained by Wikidata only. Because of course all the five versions are linked, that's why side-by-side view is possible in bn and fr WS, but the same is not possible at en, te and zh. That means the facility is not enabled at those three wikisources. For the second development, see this page, I have chosen an image page, so that knowledge of Bengali script is not necessary. I don't know the phabricator link for this, but it seems useful. Hrishikes (talk) 18:37, 30 July 2016 (UTC)
I didn't say that datalinking was the explanation for the current absence of arrows at some Wikisource projects. What I pointed out was that, if the data items are set correctly at Wikidata, then there won't be any side-by-side links from any Wikisource because none of them will be wikilinked. So the availability of side-by-side pages from those projects that have it is only temporary. --EncycloPetey (talk) 19:00, 30 July 2016 (UTC)
For this tool, only the interwiki menu at the left pane matters. It can be arranged either through Wikidata or locally. See mul:कपालकुण्डला, both Bengali and English versions can be seen side-by-side, without it being linked at Wikidata. Hrishikes (talk) 04:20, 31 July 2016 (UTC)
While this is true, it is terribly inconvenient. It requires the sites to link works manually, and to do this individually for every single mainspace page (chapter) in the work, assuming of course that the two works are transcluded in the same format on both language sites. We really need a cleaner way to handle this. --EncycloPetey (talk) 04:57, 31 July 2016 (UTC)
It would seem that the doublewiki extension is functioning only when the links are provided by Wikidata, and when they are direct interwikis they fail. This is in contravention of the requirements for WS interwiki. A bug should be raised in phabricator: to get that reversed. At Wikidata, note that editions should not be directly linked on the book page; each edition should be linked from its own edition page. Wikidata has pretty well given up on us and linking; their design hasn't worked for our interwikis. — billinghurst sDrewth 05:16, 31 July 2016 (UTC)
This is not so. In wikisources where it is enabled, the tool works whether the items are linked directly or through Wikidata. See examples above. Hrishikes (talk) 05:41, 31 July 2016 (UTC)
Interesting. Anyway, it is enabled here and has been for years; you can see it in Special:Version. Maybe someone has jiggered it in our CSS, as it used to work fine. — billinghurst sDrewth 12:55, 31 July 2016 (UTC)
Inclusion in the version page does not indicate enabling. The url shortener is also mentioned there. Is short url present for any page here? Hrishikes (talk) 17:41, 31 July 2016 (UTC)
The addition of a link back to the NS0 work page is very useful. It'd be nice to have here. How is the link destination being determined? Does the script look at the link in the #ws-title element on the Index page, and use the link found there? Sam Wilson 03:30, 31 July 2016 (UTC)
I am not aware how it is arranged. You may ask Bodhisattwa, you recently met him in Italy, I think. However, if there is a sectioning in the page namespace and both sections have been transcluded, then two book icons are displayed at the top, like here. Hrishikes (talk) 04:04, 31 July 2016 (UTC)
Ah, so it is perhaps everything from 'what links here' from NS0 (e.g. from your example). That's a good idea! Sam Wilson 06:47, 31 July 2016 (UTC)
There is also another. See the bnWs Main Page, all items in the New Texts section have epub/mobi/pdf download options alongside them. Good things from other wikisources are being deployed there. Hrishikes (talk) 07:05, 31 July 2016 (UTC)
Please don't experiment on the main page. If you want to play, please do it in a sandbox. FWIW that version was very much OUCH. Way too busy. I am also unsure why you would need to offer to download all those works from the main page, it would very much detract from the traffic that you would get into your site, and instead you would just rack up numbers on WSEXPORT and nothing for your wiki. — billinghurst sDrewth 12:57, 31 July 2016 (UTC)
Nonetheless it was an interesting demonstration—however briefly presented. Looked very professional under Vector. Why OUCH? What did it do to Monobook, and could that be fixed as well? AuFCL (talk) 17:17, 31 July 2016 (UTC)
Professional? It looked like an overcrowded mess. --EncycloPetey (talk) 17:46, 31 July 2016 (UTC)
We have sandboxes for the purpose of demonstrating changes, and structural changes to the main page should always be tested and discussed prior to implementation. For those who want to play, please use the existing Main page/sandbox, and feel free to build a sandbox and testcases for Template:New texts, and if you don't know how, please ask for assistance. The trialled version of the template, if anyone wants to view it, is this revision. — billinghurst sDrewth 23:13, 31 July 2016 (UTC)
@Samwilson, @Hrishikes: there is a js script mul:MediaWiki:TranscludedIn.js that is used to show tabs, in namespace Page, pointing to the texts that transclude that page in NS0. Zdzislaw (talk) 20:28, 31 July 2016 (UTC)
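
For anyone curious how such a lookup might be done, here is a minimal sketch in Python that builds a MediaWiki API request for the mainspace pages that transclude a given Page: page, using the API's `prop=transcludedin` module. Whether mul:MediaWiki:TranscludedIn.js works this way is not stated in this thread, so treat this as an assumption; the endpoint, the function name `transcluded_in_url` and the example title are illustrative only.

```python
from urllib.parse import urlencode

# Assumed endpoint for English Wikisource; other wikis use their own host.
API = "https://en.wikisource.org/w/api.php"


def transcluded_in_url(page_title, api=API):
    """Build a query URL listing NS0 pages that transclude page_title.

    Hypothetical helper for illustration; it only constructs the URL
    and performs no network request.
    """
    params = {
        "action": "query",
        "prop": "transcludedin",   # pages that transclude the given title
        "titles": page_title,
        "tinamespace": 0,          # restrict results to the main namespace
        "tilimit": "max",
        "format": "json",
    }
    return api + "?" + urlencode(params)


# Example title is made up for illustration.
example_url = transcluded_in_url("Page:Example.djvu/5")
```

Fetching `example_url` and reading the `transcludedin` list in the JSON response would give the candidate link targets for the tabs or book icons described above.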

What is this tooltip trying to say?

So if you look at a transcribed work in the main namespace, at the top left is a bar with a series of colors which correspond to the statuses of the transcluded pages. Today I hovered my mouse over those colors for the first time and saw that I got a rather interesting tooltip:

Are those supposed to be percentages? If so, maybe they should have a '%' sign on them (and maybe fewer decimals?)

That's my best guess as to what they refer to, especially since the work in question only has 11 pages. Anything else is beyond me.

(For what it's worth, my mouse, which doesn't show up in the screenshot, is hovering over the strip of green and yellow. I'm using Firefox, but it should be reproducible in all browsers since it's a generated value and not styling related.) Mukkakukaku (talk) 19:45, 26 July 2016 (UTC)

Good catch! I don't think I've ever hovered over that thing. But yep, it certainly looks a bit odd. I guess file a bug with the proofreadpage extension? Sam Wilson 09:49, 27 July 2016 (UTC)
I'm also using Firefox, but I get no text at all when I hover over one of the colored bars in the Main namespace. I've tried a couple of different works, and I get no popup messages. --EncycloPetey (talk) 18:58, 27 July 2016 (UTC)
... weird. I have the tooltip in both Chrome and Firefox. Looking at the source, it's created using an HTML table, with the title attribute set (which renders the tooltip on hover.) It should be standard behavior across browsers, unless for whatever reason the server isn't giving you a page that has the title attribute set on that particular element. I see it both logged in and not. No idea why you don't see it. Mukkakukaku (talk) 20:08, 27 July 2016 (UTC)
I have it. So here's an additional issue. I went to a couple of other random Aircraft Accident Report pages. In all cases, the sum of the pages that were validated, proofread only, and not proofread was always 100 pages. In the two other cases I checked, one was 100/0/0 while the other was 0/100/0. In this case, we currently have fractions that are the repeating decimal versions of 81 9/11 and 18 2/11, which also add to 100, as far as it goes. (The picture above shows a different breakdown, but they still add to 100.) StevenJ81 (talk) 21:01, 27 July 2016 (UTC)
Quick note for StevenJ81 (probably of no interest to anybody else): It is no surprise these factors all add to 100% as that is precisely the way they are calculated in the first place. In essence the function prepareArticle within ProofreadPage.body.php gathers up the counts of pages in each category of proofreading state, and then calculates the percentage proportion of each divided by the total. So all you are really doing is (partially) reversing the calculation PHP has done earlier. AuFCL (talk) 21:33, 27 July 2016 (UTC)
Actually the numbers for Aircraft Accident Report: Eastern Air Lines Flight 663 add up to ~81.4 since I think it's ignoring the "problematic" pages. --Mukkakukaku (talk) 22:01, 27 July 2016 (UTC)
Well spotted. The raw "Problematic" figure is 18.518518518519% which pretty much rounds the total out to 100 again. I just noticed that there is a bug report already outstanding for this portion of the issue. AuFCL (talk) 23:09, 27 July 2016 (UTC)
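
The arithmetic in this thread can be checked with a small sketch. This is not the ProofreadPage PHP code, only a Python illustration of the calculation as described (count per status divided by the total, times 100). The function name `status_percentages` and the status labels are hypothetical, and the page counts (9 and 2 of 11; 5 of 27) are inferred from the fractions quoted above rather than taken from the works themselves.

```python
from fractions import Fraction


def status_percentages(counts):
    """Return each status's share of the total as an exact percentage.

    counts maps a status label to a number of pages. Hypothetical helper
    for illustration; not the extension's actual implementation.
    """
    total = sum(counts.values())
    if total == 0:
        return {status: Fraction(0) for status in counts}
    return {status: Fraction(100 * n, total) for status, n in counts.items()}


# Assumed 11-page work: 9 pages in one state, 2 in another.
eleven = status_percentages({"validated": 9, "proofread": 2, "not proofread": 0})

# Assumed 27-page work where 5 pages are marked problematic.
twenty_seven = status_percentages({"other": 22, "problematic": 5})
```

Under these assumptions `eleven["validated"]` is exactly 900/11 (that is, 81 9/11) and `float(twenty_seven["problematic"])` is about 18.5185, so leaving the problematic share out of the tooltip would leave roughly 81.48, in line with the "~81.4" noted above.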
@AuFCL: I figured that. But then the tooltip should have percent signs available, too. On its surface, the tooltip seems to be suggesting that there are 100 pages, and that x, y and z are the numbers of those pages in each category. StevenJ81 (talk) 17:01, 28 July 2016 (UTC)
I believe I have now figured out the history of this. An earlier bug report trying to improve/standardise the HTML inadvertently introduced this issue (phab:T76284—be warned you will need to read between the lines, as the topic meanders somewhat) yet kept on using the language-specific strings now appropriate only to Special:IndexPages (see there for the intended usage.) Later on the language/translation people introduced the tooltip—clearly unaware that the semantics of the reported numbers had altered—presumably since their efforts had commenced.
Mainly driven by the motivation of "making the simplest possible change", I have proposed at phab:T119740 (mentioned above) that the raw page counts be substituted for the percentage values, and the current tooltip layout be maintained. Kindly comment over there if you do not like this suggestion. AuFCL (talk) 05:58, 29 July 2016 (UTC)
@EncycloPetey, @Mukkakukaku, @Samwilson, @StevenJ81: (with apologies to anyone I missed) The new version (1.28.0-wmf.13) of mediawiki rolled into wikisource today, with the tooltip changing to display absolute page counts rather than percentages. This works for me but more importantly are any of you still witnessing anomalies? AuFCL (talk) 23:31, 3 August 2016 (UTC)
No change for me: I'm still not seeing any percentages or other values when I hover over the bar, just as before. --EncycloPetey (talk) 00:16, 4 August 2016 (UTC)
@EncycloPetey: At least it isn't doing anything unusual or bad from your perspective? As a complete aside I did a little searching (you use Firefox if I recall?) and uncovered a setting under about:config—browser.chrome.toolbar_tips which appears to control this exact behaviour. I happen to have it currently set to "true" and hover displays are enabled (I am assuming it is set to "false" on your set-up.) I am definitely not recommending you change it; merely advancing this as a possibility for the differing behaviour. AuFCL (talk) 00:39, 4 August 2016 (UTC)

Two index pages for 'N Rays'?

Hello all. Does anyone know what the situation is with "N" Rays? The top level page includes from Index:N rays - Garcin.djvu but the three existing subpages include from Index:"N" Rays (Garcin).djvu. I suspect the first of these Index pages can be deleted, as there is more progress made on the second of them (and I'm assuming they're the same thing; they look like they are). Thought I'd raise it here in case anyone knows what's up. :-) Thanks! Sam Wilson 09:28, 27 July 2016 (UTC)

@Samwilson: I would be checking for differences in editions either by years, or place of publication and therefore potential spelling differences. To me there is evidence of duplication through lack of identification of an existing version as you indicated. — billinghurst sDrewth 14:48, 28 July 2016 (UTC)
Thanks @Billinghurst; I shall do some more thorough checking and either make a new edition, or raise one of the Indexes for deletion. Sam Wilson 01:12, 29 July 2016 (UTC)
I've deleted the lesser-quality of the two (after discussion), and fixed up the mainspace page to point to that. Sam Wilson 00:45, 4 August 2016 (UTC)

Page numbering

Can we please have a page that gives ONE agreed standard for page numbering in Index pagelists? I've had concerns expressed on my user talk page, that a pagelist I provided in good faith wasn't the best example for new contributors. Across wikisource page numbering seems to vary depending on the contributors (and scan uploaders). There needs to be ONE standard that is enforced universally. ShakespeareFan00 (talk) 10:40, 28 July 2016 (UTC)

I checked Help:Page numbers and although it mentions using a hyphen for blanks it gives vague advice about how to handle pages (such as titles, front and rear matter, adverts etc.) which aren't in a "known" sequence. The advice on what gets labelled with what should be considerably expanded. (Also, based on past experience, a note about avoiding the use of a period, i.e. ".", in page list items should be added.) ShakespeareFan00 (talk) 10:45, 28 July 2016 (UTC)
I thought the help page was pretty unambiguous: if the page has no numbering or other sequence indicator, "-" is usually used. That pretty much leaves it up to the editor's discretion. My rule of thumb generally is not to change the convention already being used, and to use a consistent numbering/labelling scheme when I'm the one setting the standard.
That being said, I'm sure we can propose some changes to the help page for clarity, along with some tips for the quirks of the pagelist widget. (Eg. Multiple words, symbols, etc.) -- Mukkakukaku (talk) 11:20, 28 July 2016 (UTC)
Here's an example of how I think we should label pages Index:The Story of the Iliad.djvu. For pages without numbering I use an en dash, I used to label Title, Contents etc. but I prefer to not anymore since I believe the page's number should be displayed instead. I think labeling contents, title etc can be helpful for transcribing purposes, but once finished I think it's best we use the actual page numbers, and blanks marked as blanks like the example I provided. I agree with ShakespeareFan, we should have one clear way of doing this so all works can be consistent with each other. I don't think this would be too difficult to do. Jpez (talk) 11:46, 28 July 2016 (UTC)
Page numbers wherever possible, as they reflect the work, and enable specific linking to anchored pages, and to the style that the work has used.

I use endashes as they are more clickable than (little) hyphens. I dislike the use of the labels [Image], etc. as I find them stating the bleeding obvious, and ultimately useless for anchors, and I will endash them. If others choose to use them, so be it, however when they become essays in themselves rather than neat labels, ugh! To the over-enthusiasm of some contributors to do index pages on works that others have uploaded, well, that is a conversation that has been had previously. We should all have awareness about boot-stomping through work that another person has initiated.

Re the desire for a binding rule for <pagelists>, no! We have always tried to express guidance and allow some reasoned variation. We should be specific about the purpose of page numbering and the outcomes that are achieved by properly doing this task and the navigation and referencing that it allows. — billinghurst sDrewth 14:42, 28 July 2016 (UTC)

I agree with this: guidance is better than binding rules. I, for example, will always use hyphens instead of dashes for empty pages, as they are easy to type and there is limited benefit in making an empty page link more clickable. I will also always label image plates with "Image" or "Img" or "Plate" to make it clear that they have no page number but also are not blank, and I dislike the use of dashes for this purpose. To each their own! So long as it works and is used in a way that corresponds with the purpose of the tool, there should be no problem nor any need for a one-method-fits-all approach. —Beleg Tâl (talk) 14:53, 28 July 2016 (UTC)
Although I have a set of preferences for numbering pages in an Index, other editors here do not agree with all of them. And in some cases, I have found that my own preferences are not necessarily the best solution for a particular work. There are myriad ways that publishers number pages, format content, and include inserted materials. Thus, we can offer guidance towards "best practices", but a "mandated standard" is less than desirable. --EncycloPetey (talk) 15:34, 28 July 2016 (UTC)
Further to the above, there are some general recommendations on Help:Index pages#Parameters in the section on the Pagelist tag. If it is felt that these should be linked to from Help:Page numbers, then do so. I find that having everything to do with filling out the Index: parameters together is useful for pointing new wikisourcerors to, rather than sending them to multiple help pages. Beeswaxcandle (talk) 20:01, 28 July 2016 (UTC)
feel free to renumber my indexes. i’m just happy when the page numbers are close. when the work is done the reader will not notice. Slowking4RAN's revenge 03:34, 29 July 2016 (UTC)

It might help to consolidate/merge Help:Beginner's guide to Index: files, Help:Index pages, and Help:Page numbers. Newbies can get lost in this maze. Outlier59 (talk) 01:25, 31 July 2016 (UTC)

The Beginner's Guide exists specifically to be a trim quick-and-dirty explanation for beginners. Merging everything with that would defeat that function. The other two items are seeking to accomplish different things, and I'm skeptical of merging the two for that reason. --EncycloPetey (talk) 00:18, 4 August 2016 (UTC)

Is there a point to WikiProjects?

I was thinking about creating a WikiProject, and then I thought: with so few consistently active contributors (myself included among the not-so-consistent), are WikiProjects even worth the effort?

(I was thinking about making a WikiProject about aviation, since I've been doing a lot of work on accident reports (CAB, NTSB, etc.). But I don't want to bother if the community doesn't think it's worth the effort to make the project. I'll happily toodle along and do it piecemeal.)

-- Mukkakukaku (talk) 17:19, 1 August 2016 (UTC)

WikiProjects would be much more effective here if we could coordinate activities with members of some of the very active WikiProjects on Wikipedia (such as w:Wikipedia:WikiProject Aviation). We could certainly pull some of their numbers over for a bit for help in preparing source documents relevant to the field. BD2412 T 18:08, 1 August 2016 (UTC)
True, this is a possibility. But it also has the potential for attracting lots of one-time contributors rather than consistent users. I could easily see posting a "hey come help us work on {topic} over at wikisource" sort of message on a project talk. I would then expect to see a brief spike of interest for up to a week, and then it would taper off. If we were lucky, we'd end up with one or two contributors who might hang around in the long run.
(Ironically enough, I came here the opposite way, as a part of w:Wikipedia:WikiProject Aviation's accidents task force and decided that this would be a great place to centrally archive accident reports for use in writing articles.)
That being said, I'd be happy to create an aviation project and then try recruiting on the English Wikipedia, as an experiment of sorts. I really just didn't want to create the WikiProject and have it turn into yet another bit of disused meta content. --Mukkakukaku (talk) 18:57, 1 August 2016 (UTC)
The majority of our WikiProjects have been focused on a single multi-volume work, so that consistency can be co-ordinated across all the volumes. The most successful one has been the DNB. Consistency of look isn't so important for a topic, so I wonder if topic based work would be better co-ordinated through a portal, rather than a wikiproject. Beeswaxcandle (talk) 08:53, 2 August 2016 (UTC)
As it says at Wikisource:WikiProject: "A WikiProject is a collection of pages devoted to co-ordinating long-term tasks on Wikisource." I think WikiProjects are a great tool for large works, or for long-term projects, for example to transcribe many works on a single subject and have all indexes gathered in one place. I'm thinking this could take many, many years to complete in some cases, if ever. Through the years the WikiProject will be there to have all the needed information gathered in one place for future and current users. Jpez (talk) 09:33, 2 August 2016 (UTC)
Is it generally accepted that Portals can sometimes have lists of works that are in progress? I've sometimes found that to be helpful, when trying to figure out what's missing and what needs to be worked on. An extreme example that I recently created would be Portal:Western_Australia#Works_relating_to_Noongar_language. It's a sort of portal and wikiproject in one. Sam Wilson 09:58, 2 August 2016 (UTC)
What I'm looking to set up is a collaborative space to coordinate work on aviation topics. These topics would include accident reports (which I've been working on so far), court cases/lawsuits, government regulations, aircraft certifications, research, news articles, and so forth. This collaborative space would allow people interested in aviation to identify, locate, and collaborate on documents of interest, or critically important documents that are missing from the archive.
The reason that I think a portal is not the right answer, is that portals are currently being organized via the Library of Congress classification system, and there's nowhere within that system for a single aviation topic. There's "naval aviation" under the naval science category, "aeronautics" and "astronautics" under technology, "airlines" and "air transportation" are under "social sciences", government laws and regulations are under their respective categories, the investigative/administrative agencies (FAA, NTSB, CAB, AAIB, ICAO, etc.) are under politics, the accidents themselves are probably under history -- or maybe under the agency that investigated them putting them in the politics hierarchy. (That, and portals are really underutilized and not very visible.)
This brought wikiprojects to mind, since this is what we traditionally used for this purpose in the English-language Wikipedia. But if that's not what they're used for here, or not how we'd like them to be used here, then I'm open to suggestions. I just don't think portals are the way to go in this situation, especially given how they're used today. Mukkakukaku (talk) 20:17, 2 August 2016 (UTC)

21:48, 1 August 2016 (UTC)

Over 100 discussions in this list

If anyone else is having trouble working through this unwieldy 100+ list of topics, please discuss at Wikisource talk:Scriptorium. -- Outlier59 (talk) 01:18, 2 August 2016 (UTC)

Encountering delays from Wikisource when trying to edit

Anyone else seeing Wikisource response delays when trying to edit? I've been hitting the "submit" button multiple times to get a response for the past couple of days. Outlier59 (talk) 23:38, 3 August 2016 (UTC)

I haven't had any such problems, nor have I noticed anything unusual. --EncycloPetey (talk) 00:19, 4 August 2016 (UTC)
Apparently it was summer thunderstorms in my geographical area that messed up the ISP. Storms expected throughout August. Sorry to bother you about it. -- Outlier59 (talk) 01:06, 13 August 2016 (UTC)

This is nearly complete. If someone's prepared to do the last 3 pages of the catalogue at the back, it can be marked for validation. ShakespeareFan00 (talk) 18:45, 6 August 2016 (UTC)

Clarification

Reading ShakespeareFan00's comment, I have a question. Help:Index pages#Parameters says that for done/validated status, "completion of any advertisement pages is optional"; for proofread/to be validated, it says "all text pages have been proofread at least once". If as SF00 implies, advertising pages must be proofread for the work to be proofread, this is inconsistent with validation and seems silly to me. Can someone clear up the confusion? BethNaught (talk) 18:53, 6 August 2016 (UTC)

If the adverts are optional, then this one can be progressed. :) Nice to get the catalogue as well, though. ShakespeareFan00 (talk) 19:19, 6 August 2016 (UTC)
Adverts have always been optional here. They are not a part of the work and are not required to be completed before marking an Index: "To be Validated" or "Done". Beeswaxcandle (talk) 23:42, 6 August 2016 (UTC)
And I have just completed the 'advert' entries, so you have a complete catalogue for Methuen c. 1907 (which, by virtue of its authors being unknown staff of the publishers, is also PD). Not that Wikisource collects publisher ephemera. ;) ShakespeareFan00 (talk)
Actually, it's not complete from a Wikisource perspective. There are no author or work links and the layout is unfriendly. Beeswaxcandle (talk) 23:52, 6 August 2016 (UTC)
@BethNaught: The overarching status of not proofread > proofread > validated relates to the work proper, "the author's work", not the edition, which is the publisher's work. The argument will be buried in the archives, I think from about 2009. To keep track of works that have advertising and to know their status, it generally falls to Category:not transcluded, Category:advertising not transcluded, and Category:fully transcluded. The front and end matter of editions are of historical and informational interest, so we have an interest in having them transcribed; the better their page status, the more useful they are, including the ability to transclude them. So SF00 misspoke with their statement; not a biggie in the scheme of things. — billinghurst sDrewth 02:05, 7 August 2016 (UTC)
You mean author/work links from the catalogue section? OK I'll concede on that. Also the layout used in that section (i.e plainlist) was the best I could think of. What would you suggest would be a better layout for it? ShakespeareFan00 (talk) 08:47, 7 August 2016 (UTC)

No transclusion

Index:Herodotus and the Empires of the East.djvu Added in the TOC but seeing no transclusion at all. Suggestions? ShakespeareFan00 (talk) 09:13, 7 August 2016 (UTC)

I used what the help said to use in respect of the pages tag. ShakespeareFan00 (talk) 09:13, 7 August 2016 (UTC)
Use direct transclusion from Page: space. As far as I know <pages> never works inside an Index: page. Even if it did, it would be re-entrantly building the Index: from itself, and MediaWiki has never been too hot on recursion. AuFCL (talk) 10:14, 7 August 2016 (UTC)
Yup. Just re-checked: I thought I remembered seeing this somewhere. The ProofreadPage extension explicitly kills parser recognition of the <pages> tag inside the Index: and Page: namespaces. If you really want the reference, it is lines 42–48 of .../ProofreadPage/includes/Parser/PagesTagParser.php:
		// abort if the tag is on an index or a page page
		if (
			$pageTitle->inNamespace( $this->context->getIndexNamespaceId() ) ||
			$pageTitle->inNamespace( $this->context->getPageNamespaceId() )
		) {
			return '';
		}
AuFCL (talk) 10:25, 7 August 2016 (UTC)
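
For readers who do not read PHP, here is a minimal Python sketch of the same guard. It is an illustration only, not the extension's code: the function names and the string-based namespace check are hypothetical (the quoted PHP compares namespace IDs taken from its context object).

```python
# Hypothetical illustration of the guard quoted above; not ProofreadPage code.
# Namespaces are matched by name prefix here, which is a simplification.
BLOCKED_NAMESPACES = {"Index", "Page"}


def namespace_of(title: str) -> str:
    """Return the text before the first colon, or '' if the title has no colon."""
    prefix, sep, _ = title.partition(":")
    return prefix if sep else ""


def render_pages_tag(host_title: str, rendered: str) -> str:
    """Return the rendered output, or '' when the host page is an Index: or Page: page."""
    if namespace_of(host_title) in BLOCKED_NAMESPACES:
        return ""  # mirrors the early `return ''` in the quoted PHP
    return rendered
```

Under these assumptions, calling `render_pages_tag("Index:Herodotus and the Empires of the East.djvu", "...")` yields an empty string, which matches the "no transclusion at all" symptom reported above.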

Alert: Minor change made to "Mediawiki:PageNumbers.js"

Hi to all. A display issue with the left-hand-side page numbers was recently identified by AuFCL, where a question mark (?) in a title was corrupting the link from the main namespace to the Page: namespace. Hesperian has made a code change that seems to resolve this issue. We ask that all users report any problems that they have with page number links, and ask that a little testing be done where the Index:/Page: pages of a work have characters that are not alphanumeric. Thanks. — billinghurst sDrewth 04:14, 8 August 2016 (UTC)
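
As a rough illustration of why a question mark can break such a link (an assumption about the general failure mode, not a description of the actual change to PageNumbers.js): an unescaped "?" in a URL starts the query string, so everything after it drops out of the path. The title and the `page_link` helper below are hypothetical.

```python
from urllib.parse import quote, urlsplit


def naive_link(title: str) -> str:
    """Build a link without escaping; a '?' in the title will start a query string."""
    return "/wiki/" + title.replace(" ", "_")


def page_link(title: str) -> str:
    """Build a link with the title percent-encoded, keeping '/' and ':' readable."""
    return "/wiki/" + quote(title.replace(" ", "_"), safe="/:")


# Hypothetical title, chosen only because it contains a question mark.
title = "Page:Who Goes There?.djvu/12"

print(urlsplit(naive_link(title)).path)  # the path stops at the '?'
print(urlsplit(page_link(title)).path)   # the full title survives as %3F
```

Testing with titles that contain other reserved characters, such as "#" or "&", follows the same pattern.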

Querying whether, given the multiple languages, this might be better suited to mulWS than here? ShakespeareFan00 (talk) 10:06, 8 August 2016 (UTC)

 Support Outlier59 (talk) 00:35, 10 August 2016 (UTC)
 Oppose There's English there that's worth hosting on its own here. Why wouldn't we want a copy of the Confession of Saint Patrick here?--Prosfilaes (talk) 08:40, 18 August 2016 (UTC)
 Oppose also. The three languages are easily separated and hosted each on its own WS. —Beleg Tâl (talk) 13:50, 18 August 2016 (UTC)

15:40, 8 August 2016 (UTC)

Should existing text-only (no scan) works be retained in addition to scan-backed works?

I've run into this situation a number of times and am curious as to what the consensus is. The situation is as follows:

  1. There exists on enWS, in the main namespace, the text of a work, added in the early(-ier) days of the site (2008-ish). The talk page reveals that the source of the text is Project Gutenberg or similar.
  2. There is no indication, either within the Project Gutenberg text or the work, of the publisher, year, or so on.
  3. A scan is located of the same general work. It is not the exact same text as the already existing work, so it is not a candidate for Match & Split.

In this scenario, should the newly transcribed work replace the Project Gutenberg (no backing scan) version, or should both be retained, with disambiguation provided via the {{versions}} template?

In the case of an exact version match, with identical text, I have in the past gone the Match-and-Split route (e.g. with King Solomon's Mines). But in the case I'm looking at now, though the other version currently on enWS purports to be from the same year as the scan I'm working on, I've found a number of textual and typographical differences between the texts. (Which is to say, though the version on enWS purports to be from 1905, my actual 1905 scan is different, so I find myself doubting the "provenance" of the existing version.)

--Mukkakukaku (talk) 18:36, 8 August 2016 (UTC)

  • @Mukkakukaku: I am not a regular on Wikisource so I do not know the practice here. Based on my other wiki experience, I would say to mark the text-only version as deprecated and let the scan-backed work take its place as the authoritative version. Since there is no space limit in Wikimedia projects, there is no reason to delete the text-only one until and unless there is some supporting evidence that it is an actual bad copy, and not a correct copy of some alternate version. I would not oppose the text-only version being deleted for a reason, but if no one offers a reason to delete it then keep it in limbo as an alternate version that is named in a way that makes it unlikely to be found. Categorize it as an alternate version, and link to it somehow in the documentation notes of the scan-backed version. The mistake to avoid is losing a transcription of an alternate version, until and unless someone makes an argument with evidence that it was totally a mistake. There are no space limits in wiki projects. Blue Rasberry (talk) 20:56, 8 August 2016 (UTC)


Hi Mukkakukaku,
  • If the text-only version is definitely the same text as the scan-backed version, then replace the text-only version with the scan-backed version.
  • If the text-only version is definitely a different text from the scan-backed version, then keep the text-only version in addition to the scan-backed version.
  • If you cannot figure out whether the text-only version differs from the scan-backed version, because no source has been given and there's no apparent textual evidence, then we replace the text-only version with the scan-backed version, on the grounds that there's no point keeping two copies of something if we can't even tell if they are the same or not.
Hesperian 00:51, 9 August 2016 (UTC)
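
Hesperian's three cases above can be restated as a small decision function. This is only a sketch of the rule as written in that comment; the enum and function names are hypothetical, and it does not capture the other views in this thread.

```python
from enum import Enum


class Comparison(Enum):
    """How the text-only version relates to the scan-backed version."""
    SAME = "definitely the same text"
    DIFFERENT = "definitely a different text"
    UNKNOWN = "cannot be determined"


def action_for(comparison: Comparison) -> str:
    """Return the suggested action for the existing text-only version."""
    if comparison is Comparison.DIFFERENT:
        return "keep both versions"
    # SAME and UNKNOWN both lead to replacement under the rule above.
    return "replace with the scan-backed version"
```

Note that only a confirmed difference keeps the text-only version; an unknown relationship is treated the same as an exact match.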
To note that if it is a Gutenberg text, often the earlier versions were not specific about the edition, or at least the edition was not outwardly identified. I believe that @Prosfilaes: has access to backroom data and can identify some editions with that data. — billinghurst sDrewth 02:35, 9 August 2016 (UTC)
I think Hesperian's guidance is best. Blue Rasberry (talk) 14:24, 9 August 2016 (UTC)

book upload - public domain reprint - redacting newer content

When public domain and copyrighted text are mixed in a book which will be uploaded to Commons, how should copyrighted text be addressed? Is blacking it out a recommended option?

(This is a cross-post from Commons:Commons:Village_pump/Copyright#book_upload_-_public_domain_reprint_-_redacting_newer_content)

My question is actually for a case for Hindi Wikisource but I speak English and wanted to post here.

There is a famous poetry text from India from about year 1300. The text is in the public domain. On paper it might be 100 pages. It has been reprinted numerous times. There is a reprint from 2015 of the original text without translation. The reprint begins with copyright information and a preface, as is common with reprints of old texts. Following that, the old public domain text is printed verbatim, except that there are footnotes explaining the meaning of the many archaic terms. The book ends with indexing.

It is proposed that the public domain old text be uploaded to Commons then imported to Wikisource. The challenge to address is the procedure for removing copyrighted additions of text to the newly reprinted material. Is anyone aware of any example of a precedent for this?

One way to do this might be to scan the entire book, including both copyrighted and public domain parts. At this point, all contemporary additions to the original text might be blacked out, so that there is accounting for all pages in the reprint but the copyrighted parts are obviously removed. Another option could be to avoid putting the scanned pages in Commons at all, and instead, do the transcription of text from the book to a new digital file which serves as the master in Commons and Wikisource. So far as I understand, Commons and Wikisource prefer to have source documents whenever possible, including scans of documents which are transcribed for use in Wikisource.

Can anyone say whether they have or have not seen any such instance of this sort of book in Commons? Does anyone have an idea or preference for how uploading a book of this sort should be done? Is anyone aware of any book in Wikisource with redacted copyrighted portions? Blue Rasberry (talk) 20:50, 8 August 2016 (UTC)

Uh... You should type it into a digital file and only put the public domain contents there. Wetitpig0 (talk) 10:50, 12 August 2016 (UTC)

Save/Publish

Whatamidoing (WMF) (talk) 18:03, 9 August 2016 (UTC)

IA upload tool is making djvu

Internet Archive has stopped making DjVu files, so the IA upload tool is now converting the PDF file to DjVu while transferring it to Commons. This is still at an early stage; currently some DjVu files are coming out blank. Bugs have been filed on Phabricator and GitHub, and hopefully the matter will be resolved soon. This is for general information. Hrishikes (talk) 11:32, 15 August 2016 (UTC)

Very big thanks for such good news and to all the contributors who build such tools! --Zyephyrus (talk) 01:29, 16 August 2016 (UTC)
Yes, thanks for the info. I have wondered about the pros and cons of PDF vs. DjVu for some time; it's my understanding that DjVu is better at compression, and is an open format (though there are some patents involved). With that in mind, I'm surprised and sad to learn that Internet Archive has stopped supporting the format. I found the announcement with their reasoning. I wonder if the issues they were having are related to the ones WM is now experiencing. Do you have links for the Phabricator or Github bugs? -Pete (talk) 02:38, 16 August 2016 (UTC)
https://phabricator.wikimedia.org/T142939 and https://github.com/Tpt/ia-upload/issues Hrishikes (talk) 05:31, 16 August 2016 (UTC)
Don't forget that in the PDF -> DjVu conversion, some scans will need to be 'flattened' first, owing to more recent PDF format versions allowing for layers. In some scans which are ex-Google, there's been some clever stuff with portions of the scans being split up onto different layers (presumably as a copyright trap, even on nominally PD works (sigh)). I've encountered this with some IA DjVus which seem to have been downconverted from Google Books PDFs.

Another issue I've found with some PDFs is misaligned scan/content box outlines, meaning that a straightforward page extraction clips pages in the wrong place.

These are things to bear in mind when writing the converter. ShakespeareFan00 (talk) 10:10, 16 August 2016 (UTC)

is it user:nemo bis and user:tpt who did it? we should really send them an honorarium. or wikisource t-shirt. Slowking4RAN's revenge 00:18, 19 August 2016 (UTC)

19:37, 15 August 2016 (UTC)

Braces formatting help, please

Please see the top area needing braces Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/567. I cannot find any help pages or examples to tell me how to get the braces there, while also getting No. 1982 to the right of the braces. There is a similar, but somewhat different, situation on Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/680. Can anyone guide me, please? Maile66 (talk) 21:39, 15 August 2016 (UTC)

I'm not sure why the whole thing is wrapped in a blockquote. But Help:Templates talks about how to use the {{brace}} template -- did you look at that? Mukkakukaku (talk) 22:28, 15 August 2016 (UTC)
Well, it's wrapped up in a blockquote because it's a quote within a speech Houston was giving. However, I started with the Help page you have linked, because there are perhaps 20-30 pages in this book that I successfully coded. It's just this particular type, where there is something centered to the right of the braces, that has thrown me. I've actually been looking at it since yesterday, and I still can't figure out how to arrive at what needs to be done. The other one I linked has two braced texts side by side, one on the left and one on the right. I can't understand how to do that, either. Maile66 (talk) 22:38, 15 August 2016 (UTC)
Looks like User:AuFCL offered a solution for Page:Life and Select Literary Remains of Sam Houston of Texas (1884).djvu/567 on the page. Outlier59 (talk) 23:05, 15 August 2016 (UTC)
I did the two pages in entirely different styles. Choose whichever (or neither as you see fit!) you like best. AuFCL (talk) 23:08, 15 August 2016 (UTC)
User:AuFCL Wow! You were fast! Thank you so much. Maile66 (talk) 23:26, 15 August 2016 (UTC)

Encoding problem in archive

Hi enWS, when I dug into the archive I saw that in this discussion the original wikilink de:Seite:Ludwig Bechstein - Thüringer Sagenbuch - Erster Band.pdf/19 is broken due to the broken German umlaut. I guess that this is a general problem. Regards, ---Aschroet (talk) 08:09, 18 August 2016 (UTC)
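
The report does not say how the umlaut was broken. One common way such breakage arises, assumed here purely for illustration, is UTF-8 text being re-read as Latin-1 at some point. The helper names are hypothetical.

```python
def mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (an assumed failure mode)."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Reverse the misreading, provided the damage was exactly one such round trip."""
    return text.encode("latin-1").decode("utf-8")


broken = mojibake("Thüringer Sagenbuch")
print(broken)          # the ü shows up as two Latin-1 characters
print(repair(broken))  # round-trips back to the original title
```

If the archived link was damaged in some other way (for example, a stripped or replaced character), this round trip would not recover it.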

Proposal: replace {{translation redirect}} with substituted {{dated soft redirect}}

We have had the translation redirect template in place for a couple of years, and I still don't find it a very functional template. It currently sits as a permanent cross-namespace redirect that just says "moved" and has a generic link. I propose that we do as we do elsewhere with moves and replace these with a substituted dated soft redirect. This will allow the TalBot process to go about its clean-up business and remove the links. It also gives a clearer link to the target page, which is far better in Google's cache than the current config. — billinghurst sDrewth 11:09, 20 August 2016 (UTC)

21:17, 22 August 2016 (UTC)

Announce: New functionality: cross-wiki search results

The Discovery Search Team wants to enable search results on Wikipedia that will include articles gathered across all sister wiki projects – within the same language – but we need your feedback.

Please read the specifics of how this new functionality might work and add comments, concerns, or alternative ideas for design options on the talk pages.

See an image that shows one of many example display options that have been mocked up, after considering what other wiki communities have done.

Thank you for your time and cheers from the Search Team!

Deborah Tankersley, Wikitech-ambassadors mailing list

Help