Wikisource:Scriptorium

Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 299 active users here.

Announcements

Initiative for coordinating documentation, practices and solutions of misc. Wikisource versions

Hello, everybody is invited to participate in the project of coordinating documentation, practices and solutions of misc. Wikisource versions. Although the page itself is only in English so far, all comments in any language are welcome on the talk page and will be integrated into the page thereafter. It will also be internationalized once enough attention has been dedicated to it. Thank you in advance for all your contributions, and please be bold and spread the word everywhere you feel appropriate. Cheers, Psychoslave (talk) 03:20, 14 August 2018 (UTC)

Proposals

Bot approval requests

Approval request for new bot task

I'm requesting approval for UnderlyingBot (talk · contribs), operated by me. The bot would execute two tasks, both exclusively relating to An Etymological Dictionary of the German Language:

  1. divide the OCR text into sections (as in this diff) and
  2. create new subpages based on those sections (like this one).--Underlying lk (talk) 07:39, 26 June 2018 (UTC)
  • Comment Please refer to WS:BOTS to see what items are needed in a request. --EncycloPetey (talk) 19:50, 26 June 2018 (UTC)
    • purpose: fixing common errors in the OCR import of the work An Etymological Dictionary of the German Language, such as Tentonic -> Teutonic, and more generally making it easier for humans to proofread;
    • scope: pages about the Etymological Dictionary exclusively;
    • programming language or tools: pywikibot, mostly the replace.py script;
    • degree of human interaction involved: both automated and semi-automated.--Underlying lk (talk) 19:24, 1 July 2018 (UTC)
  • Comment
Point 1: I am fine with it (with the comment that it would be good to include as many replacements as possible in a single edit). BTW, you might find pagefromfile.py quite useful as well.
Point 2: technically you should do it once pages are Proofread unless there is a special reason for that.— Mpaa (talk) 19:34, 1 July 2018 (UTC)
About point 2, the subpages are already there by now. The bot will be responsible for applying automated fixes to the OCR text.--Underlying lk (talk) 03:14, 3 July 2018 (UTC)
  • Comment I would like to see you document in your bot user space the replacements that you are undertaking. Having such available is always useful for next time around. — billinghurst sDrewth 04:02, 3 July 2018 (UTC)
I made a list of the regex replacements here, and will keep it updated when it changes.--Underlying lk (talk) 04:45, 3 July 2018 (UTC)
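
For readers unfamiliar with this kind of fix, here is a minimal sketch of a regex replacement table applied in a single pass, so that all fixes for a page land in one edit as suggested above. It uses only Python's standard `re` module, not the bot's actual pywikibot configuration; the table name `OCR_FIXES` and the function `apply_ocr_fixes` are hypothetical, and only the Tentonic -> Teutonic pair comes from this discussion.

```python
import re

# Hypothetical replacement table. Only "Tentonic" -> "Teutonic" is taken
# from the discussion; further pairs would be appended to this list.
OCR_FIXES = [
    (re.compile(r"\bTentonic\b"), "Teutonic"),
]


def apply_ocr_fixes(text, fixes=OCR_FIXES):
    """Apply every (pattern, replacement) pair in order.

    Returns the corrected text and the total number of substitutions,
    so a caller can skip saving pages where nothing changed.
    """
    total = 0
    for pattern, replacement in fixes:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Returning the substitution count alongside the text lets a semi-automated run report how many fixes a page received before a human confirms the save.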

If there are no objections, I will set the bot flag.— Mpaa (talk) 17:09, 7 July 2018 (UTC)

Flag set.— Mpaa (talk) 21:42, 9 July 2018 (UTC)
This section is resolved and can be archived. If you disagree, replace this template with your comment. — Mpaa (talk) 13:26, 29 July 2018 (UTC)

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Other discussions

section problems

i have been going through old encyclopedias adding sections, i.e. [1]. however, it appears the wikieditor is converting the section tag to "##" and then breaking them, [2] making subsequent edits much harder. this is a recent problem. is there a work around? Slowking4SvG's revenge 22:05, 8 July 2018 (UTC)

I've always used the double-hashtag notation, as I find it easier to use. I believe there is (or was) some setting in the Preferences to adjust how sections are treated when editing, but cannot find it at the moment. --EncycloPetey (talk) 22:13, 8 July 2018 (UTC)
it is wrapping the initial ## in nowiki, [3] forcing section tags to save sections ; turned off "easy LST" gadget, but problem remains for me. Slowking4SvG's revenge 22:06, 10 July 2018 (UTC)
It worked fine when I edited just now, but I think the problem may have been that you had two separate "s7" sections (one of them I changed to s6). The duplicate section name may have been causing the problem. --EncycloPetey (talk) 22:41, 10 July 2018 (UTC)
no, it is a wikitext editor setting that is so opaque, i cannot undo - i.e. [4] i guess i will stop saving any page with a section tag. Slowking4SvG's revenge 23:56, 10 July 2018 (UTC)
@Slowking4: Check your "Gadget" Preferences settings. I have checked "Easy LST: Enable the easy section labeling syntax in the Page: namespace" on mine under Page namespace options. Do you have this activated? It could be the issue. This is the setting I couldn't locate before. --EncycloPetey (talk) 00:31, 11 July 2018 (UTC)
no dice. all editing gadgets off = same problem. the wikitext editor is highlighting the initial ## in blue and preventing saving as a section tag. it also loads previous section tags as ## and then breaks them. this is in firefox and chrome. Slowking4SvG's revenge 00:45, 11 July 2018 (UTC)
I'm also using Firefox, so that at least doesn't seem to be the issue (at least not on its own). Are you using AWB or some other editing tool? I tend to edit manually and don't get highlighting. If it's an editing tool issue, I won't likely be able to help with solutions. --EncycloPetey (talk) 00:58, 11 July 2018 (UTC)

Comment For your two identified links, when I view them in standard editing mode I am not seeing any issues with the pages, i.e. no breakages. Can you please edit some more pages and save so we can have some actual diffs? I am wondering whether it is just your editor rendering things incorrectly, rather than the output generated, though I don't have enough data to form a credible opinion. — billinghurst sDrewth 01:15, 11 July 2018 (UTC)

here is a snip of edit window Page:Appletons' Cyclopædia of American Biography (1900, volume 7).djvu/44
Wikisource section failure in wikitext.png
and as saved Page:Appletons' Cyclopædia of American Biography (1900, volume 7).djvu/47
Wikisource section failure in saved.png
i expect it is an obscure wikitext setting. Slowking4SvG's revenge 17:10, 11 July 2018 (UTC)
When I was fixing up lint errors in some of my old sectioned works I found that using the page migration tool by itself worked fine, but if I clicked preview it messed it up.... MarkLSteadman (talk) 18:42, 11 July 2018 (UTC)
ok, i found it - in the edit toolbar. there is a "syntax highlighting" button that toggles on and off. i do not recall turning that on. and in addition to highlighting it adds some nowiki code. do not recall it working that way before. sorry about the drama, very obscure. Slowking4SvG's revenge 01:06, 14 July 2018 (UTC)
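
Since a duplicate section name was one of the suspected causes in this thread, a quick check for repeated labels can rule it out before saving. The sketch below is an illustration only: it assumes sections are written either as `<section begin="label" />` tags or in the `## label ##` shorthand discussed here, and the function name `duplicate_section_labels` is hypothetical.

```python
import re
from collections import Counter

# Matches <section begin="s7" /> (quotes optional) and captures the label.
SECTION_TAG = re.compile(
    r'<section\s+begin\s*=\s*"?([^"/>]+?)"?\s*/>', re.IGNORECASE
)
# Matches the shorthand on a line of its own, e.g. "## s7 ##".
EASY_LST = re.compile(r"^##\s*(.+?)\s*##\s*$", re.MULTILINE)


def duplicate_section_labels(wikitext):
    """Return the sorted list of section labels that open more than once."""
    labels = SECTION_TAG.findall(wikitext) + EASY_LST.findall(wikitext)
    counts = Counter(label.strip() for label in labels)
    return sorted(label for label, n in counts.items() if n > 1)
```

Only opening markers are counted, so a normal `begin`/`end` pair for the same label is not reported as a duplicate.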

"Technical administrators" are coming: we need a plan

For those of you who are not aware of the ongoing consultation at Meta, a change is coming which will affect who can edit CSS and JS pages in the MediaWiki: namespace. A new group called "technical administrators" will be created and only they will be able to do this. Important quotes from the consultation page:

"By default, bureaucrats and stewards will be able to grant group membership to users, the same way it works with admins. How to appoint new technical administrators will be left at the discretion of each local community (or the global Wikimedia community if that community creates a global policy through the usual means)."

"A new set of permissions (editsitecss and editsitejs) is being introduced to MediaWiki; to edit a .css or .js page in the MediaWiki namespace, both the old editinterface permission and the corresponding editsiteXXX permission will be needed. Admins and other user groups who currently have editinterface will receive the new rights for a short migration period (so that the transition can happen without any disruption) but eventually won't have them, and the software will enforce that no other groups than technical admins can have it."

"After this consultation ends, there will be a migration period (probably two weeks) in which the technical administrator user group will exist but normal administrators will still be able to edit CSS/JS. Please make sure your community is aware of this so they can add people to the technical administrator group during that time, and have a process for deciding who gets added. (What that migration process should be is left to each local community; it could be as simple as adding every administrator who asks for it.)"

"Also, please make sure your wiki has some documentation and election process for the new group past the migration period. Again, this could be as simple as asking newly elected administrators whether they also want to be technical administrators and whether they are familiar with Javascript and basic security practices. In any case, it is recommended to make the bar for technical admins at least as high as for admins (in terms of trust and user behavior), and maybe even higher (see the user group page for more advice)."

The consultation runs to 23 July and the migration period after that is expected to be 2 weeks. That gives us one month to decide a policy governing this group and implement it. I was thinking it would be helpful if people could express their initial ideas first, and then we can crystallise those comments into a formal policy proposal. BethNaught (talk) 15:19, 9 July 2018 (UTC)
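
The permission rule quoted above can be read as a simple conjunction. The following is a minimal sketch of that reading, not MediaWiki's actual implementation: the right names `editinterface`, `editsitecss` and `editsitejs` come from the quoted text, while the function name and the assumption that other MediaWiki: pages keep needing only `editinterface` are mine.

```python
def can_edit_interface_page(title, rights):
    """Return True if a user holding `rights` may edit the MediaWiki: page `title`.

    Sketch of the quoted rule: .css and .js pages need editinterface AND the
    matching editsite right. Assumption: every other interface page still
    needs only editinterface.
    """
    if "editinterface" not in rights:
        return False
    if title.endswith(".css"):
        return "editsitecss" in rights
    if title.endswith(".js"):
        return "editsitejs" in rights
    return True
```

Under this reading, an ordinary admin who keeps `editinterface` after the migration period can still edit interface messages but not `MediaWiki:Common.js`.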

  • My opinion: we should add "technical administrator" to the Restricted access policy separately from administrators. Access and revocations should be discussed in the same way as for other restricted access roles. Users should be able to make "rolled-up" applications, i.e. request adminship and techadminship at the same time. Likewise confirmations should occur together (as for bureaucrats, for example). At any rate, we should use this as a starting point for simplicity, and re-evaluate later on if desired. As for current admins, any active admin who has a record of editing site JS or CSS without problems should be grandfathered in without a discussion (although still subject to confirmation). BethNaught (talk) 15:31, 9 July 2018 (UTC)

We have tried to keep things simple and non-hierarchical, where adminship is access to tools, not an elevated personal status. Though, being aware of the issues that occurred and that have driven this technical change, the community should be more restrictive of that right. So, my thoughts are

technical
  1. rights in independent groups (eg. admin and techadmin)
  2. techadmin group can be assigned by local bureaucrat
    • noting that 'crats can now assign people to groups with an automatic expiry, so temporary assignations are easy and possible
  3. ability for local 'crats to remove techrights group directly
    • whereas we continue the existing practice of the standard removal of admin group to be on recommendation of 'crats to stewards
  4. specify the name that we wish to have in place for this group (per project page)
procedural
  1. yes, address function of both groups at Wikisource:adminship in different sections
  2. applications for adminship/techadminship would state the right(s) requested on application to, and consensus of, the community
  3. existing admins should identify if they require techrights, with justification, and be grandfathered
    • run process at Wikisource:Administrator (we can use global message bot to ping all admins on their talk pages), and no response means standard admin rights, 'crats retroactively able to assign
  4. enWS qualifications for assignation of the techadmin rights can be determined at our rate independent of above process as this is not a particularly active zone (bar to access should not be extreme IMNSHO)

billinghurst sDrewth 23:21, 9 July 2018 (UTC)

  • Comment it is the technical that needs to be resolved first, and we need the administrative to be in place prior to the switch being flicked after two weeks. — billinghurst sDrewth 06:09, 12 July 2018 (UTC)

Index view of page status: diff between logged in and logged out

For me, if I view the page Index:The empire and the century.djvu logged in I am seeing the page status with appropriate proofreading colours. Whereas if I am logged out, I see no page status colouring. Is that what others are seeing? — billinghurst sDrewth 23:00, 10 July 2018 (UTC)

I saw no colours either way on desktop. However in mobile mode I see them both logged in and logged out. BethNaught (talk) 23:04, 10 July 2018 (UTC)
Okay, thanks. I can confirm that for mobile. — billinghurst sDrewth 00:03, 11 July 2018 (UTC)
User:Zdzislaw and I created a workaround for this problem: some JavaScript code that purges the index page when the problem is noticed. Users who are troubled by this problem may copy the workaround from my common.js to their own common.js or import it from sourceswiki where I make it available as a. This workaround should be removed once T199288 is fixed. Ankry (talk) 06:58, 26 July 2018 (UTC)

March of the Volunteers

  • Isn't this in the wrong language?
  • The lyrics are still copyrighted until January 2019 in China and Taiwan, and probably until January 2036 in the United States. Should this be deleted?

Jc86035 (talk) 08:17, 11 July 2018 (UTC)

the proper venue for that discussion is here c:Commons:Deletion requests/Files in Category:March of the Volunteers. Slowking4SvG's revenge 11:54, 11 July 2018 (UTC)

Book only available on Google Books

This pamphlet is relevant to my work and I'd like to set in motion a transcription project, but I find it on Google Books and nowhere else. I could ask the British Library for the PDF, but I hope that someone may have the ability to import the book from Google Books- I seem to remember there being a tool to do this, although I don't see it in the Help pages. Thanks in advance for any help, MartinPoulter (talk) 20:23, 11 July 2018 (UTC)

You may be thinking of the Book Uploader Bot. --EncycloPetey (talk) 20:43, 11 July 2018 (UTC)
If you would be happy with the PDF, you can just download it from GBooks itself, from the cog menu. BethNaught (talk) 21:16, 11 July 2018 (UTC)
@MartinPoulter: c:File:Description and Use of a New Celestial Planisphere.pdf -- Hrishikes (talk) 10:00, 12 July 2018 (UTC)

Converted Template:custom rule

I have converted Template:custom rule from a table-based template to a div-based template as it wasn't centering in mobile view. I built test cases and checked a range of uses and it looks fine. If anyone finds examples of it breaking then please let me know on the template's talk page adding me as a ping. Thanks. — billinghurst sDrewth 22:45, 11 July 2018 (UTC)

image centering in mobile fixed too

On the same note, via a phabricator ticket, Londonjackbooks and I have had the universal mobile skin fixed so it centres images where wikicoded to be centred; it had been left-aligning them. Another little tick feels good. — billinghurst sDrewth 22:48, 11 July 2018 (UTC)

Good news! Mobile view looked very ugly before, I'm glad it's sorted! Jpez (talk) 13:01, 12 July 2018 (UTC)
@Jpez: nobody had said anything so it has been unaddressed because of that, and for how long??? Can I encourage people to identify the ugly bits and put them before the community for our attention. — billinghurst sDrewth 04:41, 13 July 2018 (UTC)

Consultation on the creation of a separate user group for editing sitewide CSS/JS

See #"Technical administrators" are coming: we need a plan above. BethNaught (talk) 11:13, 12 July 2018 (UTC)

What questions concerning the strategy process do you have?

Hi!

I'm Tar Lócesilion, a Polish Wikipedia admin and a member of Wikimedia Polska. Last year, I worked for Wikimedia Foundation as a liaison between communities and the Movement Strategy core team. My task was to ensure that all online communities were aware of the movement-wide strategy discussion. This year, my task is similar. Phase II of the strategy process was launched in April. Currently, future Working Groups members are being selected, and related pages on Meta-Wiki are being designed.

I’d like to learn which questions concerning the strategy process you would like to see answered on the FAQ page. Please answer here, on my talk page, or on a dedicated talk page on Meta-Wiki. Thanks!

If you have any questions or concerns, please, do ask!

Thanks, SGrabarczuk (WMF) (talk) 18:29, 14 July 2018 (UTC)

author link

When is using {{al}} problematic, and when is it not? Or is it just a matter of personal style preference? In the past, maybe some years ago, I had used the template, but had it changed by someone to [[Author:]] (I have since forgotten who, maybe @Billinghurst:?) I meant to ask why, but never did. Thanks, Londonjackbooks (talk) 14:04, 15 July 2018 (UTC)

The name {{al}} is not clear, which is one issue, and next to zero people use the full {{author link}} version: the count looks to be 3,556 for the former and 51 for the latter. So it lacks some clarity for namespace work, and it will fail when used for non-personal authors whose pages are in the portal namespace.

If I replaced it, it was probably due to the template not being readily bot'able compared to standard wikilinking to the author namespace, and it was easier to make a change that way for what I was doing. Not certain that I particularly recall doing it, though often it is easier to do a preliminary substitution on something, then linearly apply a regex to a standard replacement than code a more complex regex to take in all the permutations and combinations. [Well for my grade of regex'ing]. Similarly if we are doing a deep search for author links, something like Special:Search/insource:"Author:William Shakespeare" is far better than trying to faff around with trying something like Special:Search/insource:"{{al|William Shakespeare}}" and then the other form of the template; with all the implicit search weaknesses. One area where having one form would just be easier, though not one where it becomes impossible to work, just a rise in the level of difficulty. Waving a magic wand, and I would subst: the lot of them every so often, though not to the point that I am going to have the discussion/argument/... — billinghurst sDrewth 04:21, 16 July 2018 (UTC)
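
The "preliminary substitution" described above could look roughly like the following sketch, which rewrites template calls into plain author-namespace wikilinks so that one search or regex form covers everything. It assumes that `{{al|Name}}` links to `Author:Name` and that an optional second parameter is the display text; both the pattern and the function name `expand_author_links` are illustrative, not the regex actually used.

```python
import re

# {{al|Name}} or {{al|Name|Display}}; the long form {{author link|...}} too.
AUTHOR_LINK = re.compile(
    r"\{\{\s*(?:al|author link)\s*\|\s*([^|{}]+?)\s*(?:\|\s*([^|{}]+?)\s*)?\}\}",
    re.IGNORECASE,
)


def expand_author_links(wikitext):
    """Replace author-link templates with [[Author:Name|Display]] wikilinks."""

    def to_wikilink(match):
        name, display = match.group(1), match.group(2)
        # Fall back to the author's name when no display text is given.
        return f"[[Author:{name}|{display or name}]]"

    return AUTHOR_LINK.sub(to_wikilink, wikitext)
```

After such a pass, a single search like insource:"Author:William Shakespeare" finds every link. Note that this sketch would produce wrong links for non-personal authors whose pages live in the portal namespace, the failure case mentioned earlier in this thread.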

Thanks for the explanation :) Londonjackbooks (talk) 04:48, 16 July 2018 (UTC)

Saxon character set?

Anyone aware of a Saxon character set? At Page:Surrey Archaeological Collections Volume 1.djvu/312 I have characters which need setting into the unusual. — billinghurst sDrewth 08:18, 16 July 2018 (UTC)

w:Insular script -- Hrishikes (talk) 08:54, 16 July 2018 (UTC)
Thanks. Which then leads to MUFI character recommendation v. 4.0 (MUFI = medieval unicode font initiative) which indicates that they are not current characters within Unicode. Anyone have thoughts on how they would like to see them transcribed? Just standard, or do we want to try to dig through Template:ULS -> mw:Universal Language Selector? — billinghurst sDrewth 10:53, 16 July 2018 (UTC)
ULS is already implemented via {{insular}}. —Beleg Tâl (talk) 12:33, 16 July 2018 (UTC)

Tech News: 2018-29

16:01, 16 July 2018 (UTC)

ia upload generated scans seem always to be off

When the OCR is not available to use, getting the upload OCR'd via the IA-upload tool seems always to generate OCR which is off by one at the beginning and is probably missing pages as the scan progresses.

I am not sure if it is something I am doing or not doing.... If it was my software, I would be suspicious of the part that drops the first page except that does not seem to be the direction that the scans are off.

How do I progress from here?--RaboKarbakian (talk) 21:52, 18 July 2018 (UTC)

I think that under some conditions IA-upload tool fails. See https://phabricator.wikimedia.org/T194861. The djvu file needs to be manually fixed.— Mpaa (talk) 07:23, 20 July 2018 (UTC)

Ebook-only tag?

The ws-noexport tag works well enough to exclude parts of a page from the ebook (and print) version. I'm looking for something to achieve the opposite, so that some content will only show up in the epub/pdf (this would be helpful to link pages within the ebook, among other things). {{Only in print}} promises to achieve this, but it doesn't seem to be working.--Underlying lk (talk) 21:55, 18 July 2018 (UTC)

Translation published before the original work

May I ask somebody who is familiar with US copyright to look at this case? Is this book OK for Wikisource? Ankry (talk) 14:18, 20 July 2018 (UTC)

Based on that discussion, it should be ok under {{PD-US-no-notice}}. —Beleg Tâl (talk) 12:43, 21 July 2018 (UTC)
I didn't see that we established that it was first published in the US, and thus not restored by the URAA.--Prosfilaes (talk) 03:00, 27 July 2018 (UTC)

Anyone else having personal common.js issues?

I am having failure in the operation of my scripts from my common.js page. They worked a couple of days ago, and fail today, and it looks as though they are just not seen, as I can move them to my global.js page and they work fine. Is anyone else having issues? — billinghurst sDrewth 14:38, 21 July 2018 (UTC)

Cleanup scripts or other? I still have not eliminated redundancies from transferring my data to global—tried my scripts from here and they seem to work ok. All my customized edit buttons work ok as well. Londonjackbooks (talk) 14:46, 21 July 2018 (UTC)
Comment Not sure if this is related, but yesterday I would occasionally have the software render labels in Polish instead of English, e.g. on Index pages. The effect was not limited to Wikisource, but on Wikisource a hard purge solved the problem each time it occurred. --EncycloPetey (talk) 15:52, 21 July 2018 (UTC)
Thanks, sounds like I have a personal issue, though cannot say why. Reverting to a point in time hasn't helped. I must be missing something obvious. — billinghurst sDrewth 01:45, 22 July 2018 (UTC)

H. M. Elliot

I think it will be beneficial if H. M. Elliot's work "The History of India as Told by Its Own Historians" is covered and made available here. It is one of the most authoritative, even if quite biased, sources for the history of India from Muslim rule to the early colonisation period. The authors are long dead and the publisher no longer exists. It was first published in the UK. Any advice on how to go about it will be helpful. MonsterHunter32 (talk) 16:00, 22 July 2018 (UTC)

See Help:Beginner's_guide_to_adding_texts.— Mpaa (talk) 17:22, 22 July 2018 (UTC)
here is a search of internet archive [9] appears to be 3 volume work --Slowking4SvG's revenge 00:34, 23 July 2018 (UTC)
This is an 8-volume work: Links at Author:Henry Miers Elliot. Hrishikes (talk) 01:26, 23 July 2018 (UTC)

Tech News: 2018-30

09:44, 24 July 2018 (UTC)

No TOC in Mobile View

(moving from user talk page) Greetings! The page Translation:Likutei_Moharan when viewed in Mobile, does not show the TOC. The code has the {{toc|limit=3}} code and the TOC works great in Desktop, but I don't see it in Mobile. Any idea? Thanks, Nissimnanach (talk) 02:14, 25 July 2018 (UTC)Nissimnanach

Bump. Also this problem seems general and not dependent on the toc limit -- that is, this page too in Mobile did not show toc for me. Nissimnanach (talk) 22:03, 25 July 2018 (UTC)Nissimmananch
Looks like this is a feature of the 'Minerva' skin used by the mobile site (eg. en.m.wikisource.org), which does not display the built-in Table of Contents. See, for example: https://en.wikisource.org/w/index.php?title=Translation:Likutei_Moharan&useskin=minerva. This is also the same behavior found on the other wiki sites (eg. wikipedia itself), which use the same or substantially similar theme. --Mukkakukaku (talk) 06:40, 3 August 2018 (UTC)
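
To check how any page renders under the mobile skin from a desktop browser, the `useskin` query parameter shown in the URL above can be added to an `index.php` link. A small sketch using only Python's standard library follows; the helper name `skin_preview_url` and its default arguments are hypothetical.

```python
from urllib.parse import urlencode


def skin_preview_url(title, skin="minerva", host="en.wikisource.org"):
    """Build an index.php URL that renders `title` with the given skin."""
    # Keep ":" unescaped so namespace prefixes stay readable in the URL.
    query = urlencode({"title": title, "useskin": skin}, safe=":")
    return f"https://{host}/w/index.php?{query}"
```

Calling `skin_preview_url("Translation:Likutei_Moharan")` reproduces the link given in the comment above.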
Thanks but so is that legit? Wouldn't you like to have the TOC on mobile devices, especially when a phone is say 3x5" compared to a 17" PC screen! Am I in the right place here to complain about this, or should this discussion be moved to a Minerva developers'/community page, and where would that be? Nissimnanach (talk) 14:35, 5 August 2018 (UTC)Nissimnanach
If you browse Wikipedia in mobile, you'll see that instead of having a TOC they decided to make all of the sections 'collapsible', and have all of the sections be closed. So when you go to a longish article, what you'll see is the introductory paragraph(s), followed by all of the section headers in the article. It works well enough in Wikipedia for their use case (articles). Not so much here.
This is rather an intrinsic part of the Minerva style design. I'm not entirely sure where the appropriate venue is to complain about it. That being said, you may want to consider using a template like {{Auxiliary Table of Contents}} and moving each chapter to a sub-page of the work. --Mukkakukaku (talk) 16:02, 5 August 2018 (UTC)
Thanks. I've divided the page into three major parts, and added that TOC template to the main page. Each of those three parts has its own sections and toc. How can I get those TOC's transcluded into the main page TOC? I've tried

{{CompactTOCalpha-fromsubpage}}

which didn't work (it's on the page code currently) and I don't want the alpha so it seems there needs to be a TOC-fromsubpages|limit=# template.. Nissimnanach (talk) 21:34, 5 August 2018 (UTC)Nissimnanach

For last question, moved to new topic below, how to transclude TOC from subpages Nissimnanach (talk) 21:04, 8 August 2018 (UTC)Nissimnanach

Google OCR

Following up a discussion from WS:Scriptorium/Help#OCR text layer is off for ~second half of book.

Thanks to @Hrishikes: suggestion, I've installed the Google OCR. When it works, it seems to usually do a very good job. However, two things:

  • Often, it seems to have no effect whatsoever. Is this a known bug? Anything I can do?
  • Occasionally, the original OCR seems to offer better results. I'll try to document an example, if folks think that would be useful.

@Billinghurst: Is the proposal to simply enable a gadget, which would place the Google OCR button alongside the existing one (as the instructions Hrishikes gave me do)? Or is it to permanently replace the existing one? If the latter, my questions above seem more pressing. -Pete (talk) 18:01, 26 July 2018 (UTC)

@Peteforsyth: I have no specific plan, I simply gadgetised Google OCR to make it more readily available. It is up to the community to determine what happens in the gadget space. [I am aware that Mpaa has done some magic to leverage Google OCR with pywikibot, which is aligned with the tool and any of its strengths and weaknesses, though not directly with the gadget.] — billinghurst sDrewth 21:31, 26 July 2018 (UTC)
I would keep both buttons. So one can easily compare which is better on a case by case basis. I see no harm in it.— Mpaa (talk) 07:58, 27 July 2018 (UTC)
@Peteforsyth: Installing Google OCR in common.js of the user and choosing the gadget should have the same effect; it should not replace the existing Tesseract OCR, except by specific inactivation in Mediawiki:common.js. Now about the quality. It would be helpful if you could provide a specific example where Tesseract output is better than Google output. I have used both, but have yet to see such a phenomenon, at least I don't remember. Anyway, various ocr applications give different outputs. Most books here are from IA and have pre-existing ABBYY ocr. The ocr buttons are for Tesseract ocr and Google Cloud Vision ocr. Google Cloud Vision ocr recognises the language of the Wikisource domain and gives output accordingly. For example, it gives excellent Bengali ocr in Bengali Wikisource, but does not recognise Bengali script in English Wikisource. Sometimes it is overloaded (= Google ocr is external, it fetches the ocr from Google; not internal like Tesseract), and doesn't give any ocr. But you will get the ocr if you try after some time. For doing ocr of the whole book semi-automatically, you can use Google Drive ocr, by using the OCR4wikisource script. Google Drive ocr can also be used by direct uploading of the pages as images or short pdfs to Google Drive and then opening with Google Docs. The output is slightly different between these two types of Google ocr. I have also used Microsoft Office Document Imaging for creating ocr layer for djvu files created offline. All these ocrs have different outputs, but generally, ABBYY and Google are best, IMHO. Hrishikes (talk) 10:26, 27 July 2018 (UTC)
@Hrishikes: I've been on the lookout for places where Google OCR underperforms. I agree that it's generally very good, but I know you're interested in the anomalies, so here are a couple examples. On this page of poetry, it seemed to get very confused by indented lines, and put them in odd places in the text. On this page, it somehow missed the word "still," and instead put simply the letter "s". (The first change visible in the diff.) I've encountered both these errors on various pages, these are fairly representative examples. FWIW. -Pete (talk) 21:57, 7 August 2018 (UTC)
@Peteforsyth: Yes, I am aware of such issues. When I encounter these (see this page, for example, in ocr mode), I just correct them and move on. Hrishikes (talk) 03:22, 8 August 2018 (UTC)

Thanks all for all the clarifications. @Billinghurst: I did think you were discussing something more ambitious than you've described; making a gadget out of a useful user script is of course an uncontroversial benefit.

@Hrishikes: Very helpful info. On reflection, I'm not sure I want to stand behind my earlier statement that Google OCR is sometimes worse; I don't remember very clearly, but it may be that I'm thinking of cases where both OCR buttons yield less-than-satisfactory results, with different kinds of problems. Here's a page where that's the case. (Note, perhaps its yellow background has something to do with it.) -Pete (talk) 07:07, 30 July 2018 (UTC)

@Peteforsyth: Uploading the page image to Google Drive gets better OCR, I have checked. I did not put the output in the page as you have already edited it. Hrishikes (talk) 07:36, 30 July 2018 (UTC)
@Hrishikes: Interesting. Please feel free to do so if it's still handy; I only saved the text layer with very minimal edits, I haven't done anything substantial there yet. -Pete (talk) 07:45, 30 July 2018 (UTC)
@Peteforsyth: Please check. Hrishikes (talk) 07:53, 30 July 2018 (UTC)
Yes...vastly better, thank you! -Pete (talk) 08:02, 30 July 2018 (UTC)

Boys' Life

I'm new here. If this magazine is still being published today under the same name, are old issues of it (from 1923 and earlier) in the public domain? I'd like to make old issues of the magazine freely accessible here, as I find them very entertaining historical reads. Learning about young people from the 1910s-20s and the Scouting movement back then is my intention with this pursuit. PseudoSkull (talk) 19:31, 26 July 2018 (UTC)

Under US law, anything published before 1923 is in the public domain, including old issues of magazines. The license may not extend to other countries, depending on the author's date of death, but for en.WS, we concern ourselves with US copyright status. --EncycloPetey (talk) 19:34, 26 July 2018 (UTC)
@PseudoSkull: And just in case I'm talking down to you, please let me apologize beforehand: the fact that it's published today may mean that someone has a trademark on the name "Boys' Life" and certain trade dress associated with it, but that trademark cannot serve as a perpetual copyright on previous work. That may be where a confusion is setting in: whether or not something is continuously published today will not affect the copyright status of that old material but may make a big difference in trademark issues. —Justin (koavf)TCM 19:38, 26 July 2018 (UTC)
First copyright renewals for periodicals says "Boys' Life: issues renewed from July 1934 (v. 24 no. 7); see 1962 Jan-Jun; contributions renewed from Oct. 1923; see Jan-Jun 1951". We'd have to be careful, but we could go about a decade beyond 1923. I don't know where he's seeing Oct. 1923; the first renewed contribution I see is in 1927. 1923 works will also be clearly public domain in January.--Prosfilaes (talk) 03:08, 27 July 2018 (UTC)
don’t see much scanned at Internet Archive. [13] one standout issue 1924-01 [14] ; they appear at google books [15] but not scanned? i see they have an archive site with scans. https://boyslife.org/wayback/
maybe a trip to the library is in order, in addition to Dance Magazine. Slowking4SvG's revenge 19:43, 27 July 2018 (UTC)
@Slowking4: Don't worry; all volumes are on Google Books. I'm doing the monking, since it actually helps with my reading, if you guys don't mind me doing that. See my contributions in my user space. PseudoSkull (talk) 02:08, 28 July 2018 (UTC)
i would suggest uploading them to internet archive, and then to commons using the IAuploader (then we can have side by side transcription). see also Help:Beginner's guide to adding texts. Slowking4SvG's revenge 02:12, 28 July 2018 (UTC)
@PseudoSkull: Without side by side scans, then the works will never be validated, and will not feature at our site. To get quality-rated works, we do recommend scans. — billinghurst sDrewth 05:22, 29 July 2018 (UTC)
Well, I finished what I guess you would call the first stage of rewriting the story "The Lost Express". I feel good that I accomplished that. I understand I'm probably gonna need to do a lot of extra things to these before they can be officially published here, including at least one personal proofread and probably a peer proofread as well, but I will work on that later. I look to do this whole reading-writing thing more often now, so that's where my contribution to Wikisource comes in. By the way, unrelated, but you guys should probably give that story a read. It was full of great creativity and suspense and was a very enjoyable reading experience for me, so just throwing it out there in case you want to enjoy it yourselves. PseudoSkull (talk) 06:43, 29 July 2018 (UTC)
@PseudoSkull: see comments above about side by side scans. You will soon find out that you will have to redo a lot of work if you do not start with the right step.— Mpaa (talk) 10:06, 29 July 2018 (UTC)
So you mean you want me to give it a second scan? (as in, buy this and scan it)? I mean surely the book is legitimate in the first place, I won't have to redo any steps due to just the nonexistence of 2 scans right? PseudoSkull (talk) 14:54, 29 July 2018 (UTC)
You were given advice about the process above. See Help:Beginner's guide to adding texts, step 1. Get a file with scans where you (legitimately) find it.— Mpaa (talk) 16:17, 29 July 2018 (UTC)
cut and paste text layer is so 2011, and gutenberg-like. we have been going through and redoing the old ones, i.e. [16] by doing the setup work, we can host complete issues in a e-reader / phone friendly format, with a good provenance to the scan.
do not need to buy issues. the old scans are at their archive site and google books.[17] (and we normally book scan at library) please experiment with uploading complete issues
ok did first one https://archive.org/details/2ThyM8T1J4C, bouncing at commons for the moment.[18] try some other issues - i can help if you have questions. Slowking4SvG's revenge 17:23, 29 July 2018 (UTC)
File:Boys'_Life_Mar_1,_1911.djvu.— Mpaa (talk) 21:00, 10 August 2018 (UTC)
thanks Index:Boys' Life Mar 1, 1911.djvu , we could use a commons template:magazine for the issn number, like book Slowking4SvG's revenge 16:49, 14 August 2018 (UTC)

Original contributors' works

As a public domain and free-knowledge enthusiast, I have been thinking of a place to publish some of the things I wrote as a kid for the whole world to see...to use for whatever purpose they want to. No matter how juvenile or immature these works may have been, I somehow feel it is my right to make them freely accessible.

Also, I am in the process of writing a novel right now, and want to make it publicly available when it's done, somewhere.

Your inclusion criteria mentions "added value", a concept which I'm afraid I don't understand. What "value" are you talking about? Does community consensus alone verify that a work is important enough to be included here, or does some technical criterion, such as the work actually having been officially published somewhere?

I understand that my (or most other users') writing probably doesn't belong in the mainspace, but I just want to clearly understand what exactly draws the line and what doesn't. Also, if I copy my free content into subpages of my user space, is that going to get deleted as clutter? If not, unrelated to Wikisource, what is the best place to publish original content to the public domain, which is easily searchable, findable, and navigable? Probably FAQ types of questions, but please understand I'm new. PseudoSkull (talk) 20:05, 26 July 2018 (UTC)

The "added value" clause is interpreted in a number of ways. For instance, we invoke it for unpublished journals or letters of famous individuals. Its spirit is to allow for items that are likely to be of significant interest to our users in some way, or which have some academic or historical value by virtue of their provenance. Original contributions, however, tend to be frowned upon. We don't bar them entirely, but we usually want the works to have undergone some sort of vetting process, such as a thesis which has had academic review, or works whose author has gone on to achieve fame or notoriety. We do also have some efforts here to provide Wikisource-created original translations of published works, but that is a different animal with its own set of guidelines.
I'm not sure what sorts of places would welcome original self-publication, with a wide audience, or without assessing fees to publish. That's a question better answered by someone else. --EncycloPetey (talk) 20:26, 26 July 2018 (UTC)
I'm not sure who you are but it's possible that you are a notable person whose unpublished work could be reprinted here. (e.g. scientific work not published). I personally would have no problem with you having personal writings in the username space assuming 1.) you didn't have some wildly excessive amount, 2.) they don't otherwise infringe on copyright, and 3.) they weren't obscene. But really this isn't the platform for first publication except in some very narrow circumstances. —Justin (koavf)TCM 20:28, 26 July 2018 (UTC)
To clarify, I am not a notable individual by anyone's standards, and my works are not scientific or academically notable (if academic at all). I only want to publish some things I wrote when I was younger because I want to allow people to access and use them freely. It is for no purpose of personal pride in these works that I want to make them accessible (in fact, many I today think are ridiculous), but only for the mere reason of making them accessible.
I was wondering if the userspace here is a good place to do this, and to link to on other wikis. Sounds like it is; by analogy, entry-looking neologisms are allowed in reasonable amounts on Wiktionary userspace, articles on non-notable topics allowed in the Wikipedia userspace, so it makes sense to do the same here. There is no overly excessive amount, probably less than 15, and even that's a stretch. PseudoSkull (talk) 21:34, 26 July 2018 (UTC)
We have given lots of latitude in user namespace. Be reasonable, be practical, don't stretch friendships, don't break copyright. This is covered at WS:WWI#Original contributions. billinghurst sDrewth 21:42, 26 July 2018 (UTC)
(ec) @PseudoSkull: I think that you can say we apply a "notability" standard. Generally we are reproducing work that has passed a review process, be it fiction or non-fiction, or documents of historical significance/notability (i.e. stood the test of time), so the best guidance is WS:WWI. Within that there will be argument, and I would say that where it is a Wikimedia-related publication then we have possibly been more accommodating, e.g. conference presentations. Maybe also peek at Wikisource:For Wikipedians for some flavour. — billinghurst sDrewth 21:40, 26 July 2018 (UTC)

I would also recommend posting to https://everything2.com/ Justin (koavf)TCM 22:02, 26 July 2018 (UTC)

Size of editing window

Do I remember that one used to be able to set length/width size limits in Preferences for one's editing window? Has this option gone away, is it somewhere other than preferences, or am I not remembering correctly? Thanks, Londonjackbooks (talk) 08:53, 29 July 2018 (UTC)

Your memory serves you right, but the feature to set the number of rows for editing was removed a long time ago. I set my windows sizes, text width, and font size in my common.css and common.js. Please feel free to copy. Ineuw 10:17, 29 July 2018 (UTC)
(ec) Yes, it used to be on the editing tab. I did a quick search on w:MediaWiki 1.30, mw:MediaWiki 1.31 and mw:MediaWiki 1.32 for window size though didn't see anything obvious. You may wish to try alternate searches to see where it disappeared. — billinghurst sDrewth 10:25, 29 July 2018 (UTC)
If I can set dimensions myself using css or js pages, that would be ideal (but @Ineuw:, I am not sure which entry at your pages is what I am supposed to copy. It's not evident to me). The width of my window is fine, but I would like to play with the length (it needs to be shorter—that is, fewer lines—than it currently is). @Billinghurst: Wondering how would my searching for the case of the disappearing feature help me accomplish what I am looking to do? Thanks both! Londonjackbooks (talk) 20:11, 29 July 2018 (UTC)
@Londonjackbooks: Waited for you to let me know what you needed. I can copy and install it for you. Are you editing side by side where text edit is in a column on the left, or over and under where the text edit is in the lower window? — Ineuw talk 21:20, 29 July 2018 (UTC)
@Ineuw: I edit using side-by-side. Thanks, Londonjackbooks (talk) 04:27, 30 July 2018 (UTC)

New user group for editing sitewide CSS/JS

Tech News: 2018-31

14:05, 30 July 2018 (UTC)

Help for identifying an English poem, please

Hello,

I contribute on fr.wikisource, and I found this poem, handwritten on a book of French poetry by Alfred de Musset. It is NOT Musset's.

After a quick search, I found that this poem was published in the Atlantic Monthly, #130, July 1922, maybe page 789[1]. But I cannot access this scan from France.

Could someone please help me identify the author of this poem? Thanks for your help. --Hsarrazin (talk) 18:22, 1 August 2018 (UTC)

  1. https://books.google.fr/books?id=okcwAQAAMAAJ
@Hsarrazin: The author of the poem "The Name" is Jean Kenyon Mackenzie. On previous pages of the Alfred de Musset book you linked are the previous two poems appearing in that Atlantic Monthly. Despite Google Books not allowing views, Hathitrust has it in full view https://babel.hathitrust.org/cgi/pt?id=uc1.31970021659070;view=1up;seq=799. -Einstein95 (talk) 03:09, 3 August 2018 (UTC)
I had posted the following reply yesterday, but a bot "archived" it away.
The poem appears on page 789, as you cited it, and is titled "The Name". It is the third of five poems on pages 788-790, all by Jean Kenyon Mackenzie d:Q50661279 --EncycloPetey (talk) 03:33, 3 August 2018 (UTC)
Thanks, EncycloPetey. I do not think of Hathi Trust generally, because a lot of books cannot be seen from France. But this one is fine. Thank you… :) --Hsarrazin (talk) 10:24, 4 August 2018 (UTC)
I wonder whether we'll edit them on frws, because they are in English. Anyway, I will put them on wikidata with link :)

{{Upright}} vs. {{Normal}}

Just came across these two templates that not only do the same thing, but also consist of the same code. It appears by a large margin that {{normal}} (and its redirect {{n}}) is the vastly preferable one of the two. -Einstein95 (talk) 03:23, 3 August 2018 (UTC)

Because they're identical, I've taken the liberty of making {{upright}} a redirect. —Beleg Tâl (talk) 00:06, 4 August 2018 (UTC)
@Beleg Tâl: You might recall now {{upright/doc}} is orphaned and clean that up too one day? As of a few minutes ago (c.f. Special:WhatLinksHere/Template:Upright/doc) nothing at all references this page. (O.K. now this note does!) 114.73.42.69 02:08, 4 August 2018 (UTC)
Done and thanks —Beleg Tâl (talk) 02:10, 4 August 2018 (UTC)

Something is wrong with the POTM

The page images for this month's POTM, Index:The Sikhs (Gordon).djvu, are three pages removed from the pages from which the preview text is coming. Can anyone fix this? BD2412 T 17:17, 3 August 2018 (UTC)

Seems like a job for @Mpaa:. For what it's worth, I'm willing to learn to do this, if it's something you can guide me through...seems like we're always relying on you to deal with stuff like this, and I'm not sure that's entirely fair. But it's much appreciated :) -Pete (talk) 18:27, 3 August 2018 (UTC)
Done.
No black magic, just patience. I use the djvulibre set of tools to extract the text layer (usually djvused), and a Python script to realign page refs vs. page text. Sometimes djvused crashes if the text layer is malformed. In that case I use djvutoxml to generate an xml file and compare it with the original on IA for the pages that generate the crash. Sometimes this also crashes ... and then, on rare occasions, I need to insert some debug prints and recompile djvulibre tools.
Beware that IAupload renames the djvu internal files, so it is a bit tricky to compare the original xml on IA with the one generated by djvutoxml.
I think there is something fishy somewhere in IAupload tool, which is triggered under some conditions. I have some ideas and suggested an improvement on a ticket in PHAB but it got no attention yet (I would be glad to help but I do not have access to the tool).— Mpaa (talk) 19:25, 3 August 2018 (UTC)
and need to sort pagination for illustrations. i would expect POTM to be sorted before jumping the queue. please. Slowking4SvG's revenge 23:54, 3 August 2018 (UTC)
I notice the IA upload tool gives an option to use the original text layer or create a new one from the archive's jp2 or pdf file, and I am curious which was used in the example. My concern is this: Does the tool, using the 'original' setting, produce a File that may differ from a manual upload? CYGNIS INSIGNIS 04:38, 4 August 2018 (UTC)
Because IA didn't have a djvu file, I set the tool to create one from the jp2 files. The tool then produced the faulty offset version. I had completely forgotten that it does this sometimes and so didn't think to check it before going to bed. The tool does this strange thing whereby it inserts some page images with no text into the djvu, but not into the text layer. This then results in the offset occurring. Beeswaxcandle (talk) 05:07, 4 August 2018 (UTC)
Thanks for clarifying the background. I had been using the tool for multiple volumes, with high confidence in the files selected by BHL from IA, and didn't want to create a problem where none existed. BTW, the library that generated the general's text did provide _djvu.txt and _djvu.xml files separately, which the bot wouldn't recognise, I assume that would be another way to create the file if it comes up again. CYGNIS INSIGNIS 06:55, 4 August 2018 (UTC)
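The offset described in this thread (page images without text added to the file but not to the text layer) can be illustrated with a minimal Python sketch. It is not the script Mpaa uses; it assumes the per-page texts have already been extracted into a list (for example from djvused output) and that the positions of the text-less inserted images are known. The function name and its parameters are hypothetical.

```python
def realign_text_layer(page_texts, blank_positions, total_pages):
    """Rebuild a per-page text list that lines up with the page images.

    page_texts      -- texts in their original order (assumed already extracted)
    blank_positions -- 0-based indexes of images that carry no text (assumption)
    total_pages     -- number of page images in the file
    """
    blanks = set(blank_positions)
    remaining = iter(page_texts)
    aligned = []
    for index in range(total_pages):
        if index in blanks:
            # An inserted image with no text: keep an empty slot so that
            # every later page stays matched to its own image.
            aligned.append("")
        else:
            # Take the next available text, or an empty string if we run out.
            aligned.append(next(remaining, ""))
    return aligned


if __name__ == "__main__":
    print(realign_text_layer(["title", "preface", "chapter 1"], [0, 2], 5))
```

Writing the realigned texts back into the djvu file is a separate step and is not shown here.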

Adapting Template:pd/1996 or a new template

As per previous conversation started by Prosfilaes, from next year US-published works published in 1923 will be out of copyright, and progressively year by year others will follow. We need to start working on whether we will adapt Template:pd/1996 to have wording that says that the work is out of copyright, and reconfigure that template to set triggers. Or whether we are going to implement a new template for post 1922 works. (Full coverage at copyright tags.) — billinghurst sDrewth 09:34, 5 August 2018 (UTC)

Don't the 1996-series of templates primarily apply to works published outside the US? For works inside the US, we've been using the 1923-series of templates, and I would assume that it's the 1923-series that would need to be adapted to accommodate US-published works from 1923. It would be odd to have "published before 1923" be a reason a work is in PD, if works published before 1924 are the actual set of works in PD. --EncycloPetey (talk) 15:55, 5 August 2018 (UTC)

You are correct that the pd/1996 has been non-US first publications to this point, and there would be complications in updating the template. Template:pd/1923 is set, and incrementing Template:PD/19xx is possible, though becomes a lot of templates. It is why I brought up the issue as we have to get the wording right, and look to the easiest means to progress through the years. As 1977 is the next US copyright milestone, maybe it is something like pd/1978 with both a year of birth AND year of publishing as parameters, where year of publishing flicks between copyright and not copyright.

billinghurst sDrewth 22:46, 5 August 2018 (UTC)
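The year-based switch suggested above, where the year of publication flips a work between "in copyright" and "out of copyright", can be sketched in Python. This is only an illustration of the arithmetic, not template code. It assumes the full 95-year US term applies (so a 1923 work leaves copyright on 1 January 2019) and ignores works that entered the public domain earlier through non-renewal or missing notice; the function names are hypothetical.

```python
US_TERM_YEARS = 95  # assumed full term for US works published 1923 to 1977


def first_public_domain_year(publication_year):
    """Year in which the work is public domain from 1 January onward."""
    # Protection runs to the end of the 95th calendar year after publication.
    return publication_year + US_TERM_YEARS + 1


def is_out_of_copyright(publication_year, current_year):
    """True if the term has expired by current_year (under the assumption above)."""
    return current_year >= first_public_domain_year(publication_year)


if __name__ == "__main__":
    for year in (1923, 1924, 1925):
        print(year, first_public_domain_year(year))
```

A template could apply the same comparison to its publication-year parameter, so that no new template is needed each January.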

Tech News: 2018-32

19:39, 6 August 2018 (UTC)

Release of AkBot block

I would like the block on my bot's account AkBot to be released. I do not intend any edits in this wiki using the bot account, so the bot flag is not needed. However, I would like to receive an IRC cloak for the bot account, which is not possible for blocked accounts. @Billinghurst, @EncycloPetey: you may wish to comment as you were involved in the block a few years ago. Ankry (talk) 08:17, 7 August 2018 (UTC)

Done. Please do be sure to meet the requirements of running a bot before doing so again. --EncycloPetey (talk) 14:32, 7 August 2018 (UTC)

Portal:American literature

Today I discovered that we had never started a Portal:American literature, so I started one.

The new portal needs a lot of work, and anyone who wishes to help may find Library of Congress Classification/Class P#Subclass PS useful. --EncycloPetey (talk) 16:24, 7 August 2018 (UTC)

How to Transclude TOC from Subpages?

The page Translation:Likutei_Moharan has its content split into three subpages.

Each subpage has its own Level Two sections and TOC. How can I get those TOC's transcluded into the main page TOC?
Because the work is in progress and some sections are not yet done, a static TOC on the main page is not desired; a transcluded TOC is wanted.

I've tried {{CompactTOCalpha-fromsubpage}} which didn't work (it's on the page code currently), and I don't want the alphabetical style.
I don't see a sort of "TOC-fromsubpages|limit=#" template.

How can this be done?

Nissimnanach (talk) 21:15, 8 August 2018 (UTC)Nissimnanach

where is the match and split? where is the commons source? you could transclude from base page if you had it. for example, you have a title page for this work only File:Likutey Moharan.gif -- Slowking4SvG's revenge 10:53, 14 August 2018 (UTC)
@Nissimnanach: I don't think this is actually possible. The TOC on each subpage is automatically generated from the section headers on that page, so it would only appear on the main page if you were to transclude the section headers manually. The list of subpages should be sufficient. —Beleg Tâl (talk) 12:34, 14 August 2018 (UTC)
We probably could do it with section tags, or with <includeonly> tags, my question would be why? We are better off to just manually apply it rather than force the system to perform black magic through complex page calls. You have the ToC on the subpage, just copy, paste and reformat. — billinghurst sDrewth 23:20, 14 August 2018 (UTC)
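For the "copy, paste and reformat" route, a small offline script could regenerate the main-page list whenever the subpages change. The Python sketch below is an assumption-laden illustration, not an existing Wikisource tool: it takes subpage wikitext already in hand (fetching it is not shown), collects only the level-two headings, and emits a wikitext bullet list of section links. The subpage title in the usage example is a placeholder.

```python
import re

# Matches "== Heading ==" but not "=== Sub-heading ===".
HEADING_RE = re.compile(r"^==(?!=)\s*(.+?)\s*==(?!=)\s*$", re.MULTILINE)


def level_two_headings(wikitext):
    """Return the level-two heading titles found in a page's wikitext."""
    return HEADING_RE.findall(wikitext)


def build_toc(subpages):
    """Build a bullet list of section links from {subpage title: wikitext}."""
    lines = []
    for title, wikitext in subpages.items():
        for heading in level_two_headings(wikitext):
            lines.append(f"* [[{title}#{heading}|{heading}]]")
    return "\n".join(lines)


if __name__ == "__main__":
    # "Work/Part 1" is a placeholder title, not a real page name.
    sample = {"Work/Part 1": "== First ==\ntext\n=== Detail ===\n== Second ==\n"}
    print(build_toc(sample))
```

The output is static wikitext, so it would need to be re-run and pasted again as new sections are completed.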

Glasgow Advertiser - 13 August 1792 - Richard Arkwright

I'm still learning. How could Glasgow Advertiser - 13 August 1792 - Richard Arkwright be improved? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:20, 8 August 2018 (UTC)

On en.Wikisource, we don't have categories by author or people. Also, it isn't necessary to categorize by work, if the work has a base page built into the name: e.g. Blackwood's Magazine/On the Cockney School of Poetry VI.
Also missing a license; now added. --EncycloPetey (talk) 22:18, 8 August 2018 (UTC)
@EncycloPetey: Thank you. I've moved it to Glasgow Advertiser/1792-08-13 - Richard Arkwright and created a parent page. However, without a category, how can we associate the text with (other Wikimedia content about) Richard Arkwright - not least the related Wikidata item, d:Q294153? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:09, 9 August 2018 (UTC)
(1) If the person in question is a published author, then we create an Author page in the Author namespace, and list the item under "Works about Arkwright". (2) Otherwise, you can create a data item for the obituary, and link the obituary's data item to the subject's data item using the Wikidata property "described by source". (3) There is also a {{wikisource-inline}} template on Wikipedia that allows for the listing of individual Wikisource works in the References or External links section. --EncycloPetey (talk) 14:36, 9 August 2018 (UTC)

ſ

In English texts, like 'Description and Use of a New Celestial Planisphere', should we change "ſ" to "s", or transcribe ("tranſcribe"?) the character as written? Substitution certainly aids readability for a modern audience. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:06, 9 August 2018 (UTC)

You will find that opinions on this vary among contributors. I would recommend using the {{ls}} template: this renders as ſ in page space and s in mainspace, but there is a script to let users choose how they see the template. BethNaught (talk) 13:35, 9 August 2018 (UTC)
While there is opinion, the community's consensus is that the output in the main namespace should typically be "s", unless the work requires the production of it; covered in Wikisource:Style guide/Orthography. BethNaught mentions an option for Page: ns production, with options. — billinghurst sDrewth 13:53, 9 August 2018 (UTC)
@Pigsonthewing: What BethNaught and billinghurst said. And there are charset issues with using the raw long s character directly, so if included it should be by way of the template. However, to give you an answer that I think addresses your question directly, nobody is going to complain if you modernise this character to "s". --Xover (talk) 14:43, 9 August 2018 (UTC)
The underlying thought is that preserving archaic orthography typically matters only when the text is highly significant in some way, either historically or academically, such as the original handwritten text of the US Constitution, or the First Folio of Shakespeare. For most works, the shape of an "s" in the text is irrelevant. --EncycloPetey (talk) 15:08, 9 August 2018 (UTC)
It's marginally orthography; it's as much typography. When G.W. revised English orthography in Magazine, he left the long-s as is. When the long-s was dropped, nothing else in English changed. It's more a part of certain font styles than actual orthography.--Prosfilaes (talk) 00:45, 10 August 2018 (UTC)
Handwritten text is not a matter of typography; it does not involve the setting of type. Please read again what I wrote, which includes a well-known handwritten example. --EncycloPetey (talk) 15:16, 10 August 2018 (UTC)
Handwritten text is subject to variations, positional and otherwise, usually far more than typeset material. It's not typography--I don't know of a neat word for it--but it's a matter of handwriting style as much as a matter of orthography.--Prosfilaes (talk) 19:16, 10 August 2018 (UTC)
The "neat" word is orthography = "writing correctly". It is only the more modern usage of the word that tends to limit its definition to spelling; the broader sense of the word includes handwritten forms. Some people segregate handwritten forms to graphology, but that split is not universal. --EncycloPetey (talk) 00:35, 11 August 2018 (UTC)
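For anyone preparing text offline before pasting it in, the two options mentioned earlier in this thread (modernising to "s", or using the {{ls}} template) amount to a simple substitution. The Python sketch below is only an illustration; the function names are hypothetical, and it treats every ſ the same way without checking context.

```python
LONG_S = "\u017f"  # the long s character, ſ


def modernise_long_s(text):
    """Replace every long s with a modern 's'."""
    return text.replace(LONG_S, "s")


def template_long_s(text):
    """Replace every raw long s with the {{ls}} template call."""
    return text.replace(LONG_S, "{{ls}}")


if __name__ == "__main__":
    sample = "tran\u017fcribe"
    print(modernise_long_s(sample))
    print(template_long_s(sample))
```

The template form keeps the choice of display with the reader, which is the behaviour BethNaught describes.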

Typo on the front page

"Observations upon the Dublin Bills of Mortality (1863)". Date of text is 1683. Celuici (talk) 20:19, 10 August 2018 (UTC)

Fixed, thanks.— Mpaa (talk) 20:41, 10 August 2018 (UTC)
My mistake! Sorry! --Dick Bos (talk) 16:42, 12 August 2018 (UTC)

Google notice

Do we still consider the Google notice at the beginning of a scan from that source as something that must be fixed prior to proofreading? Or can we just mark those "no content" and ignore such pages?

I'm working my way through the category of indices to check and there are a lot whose only problem is the presence of that Google scan notice. I'd rather not have to mark them as 'to be fixed' since the backlog in that category is insane.

Thanks. Mukkakukaku (talk) 20:12, 11 August 2018 (UTC)

The main problem with it is that it pushes all the ensuing pages out by one. This means that left and right pages show up incorrectly as right and left. This is why the iaupload tool now offers the option of deleting the first page. There is also the minor issue of some Commons admins deeming that page to be copyright and therefore the whole book is not permitted to be on Commons. So, yes, they should be fixed (unfortunately). Beeswaxcandle (talk) 22:59, 11 August 2018 (UTC)
Commons does not necessarily delete such files. Please see c:Commons:Deletion requests/Google books with Google first page -- Hrishikes (talk) 02:56, 13 August 2018 (UTC)
That deletion discussion is weird. The conclusion is "delete" but the rationale provided is "Public Domain no matter what Google claims. Faithful repro doesn't create new copyright. Just a generic page added." ... Either way. I'm marking those files as needing fixing. Mukkakukaku (talk) 02:59, 13 August 2018 (UTC)
As I understand previous thought at Commons on the matter, it is the notice from Google that is under copyright and can't be reproduced. So it's the notice itself that is the problem—not the claim made by Google, which is deemed to have no validity—but the notice page stating the claim. --EncycloPetey (talk) 03:14, 13 August 2018 (UTC)
What about the "new" form of the Google notice which doesn't make a copyright claim? Like the one on the first page of this file? I've been treating all forms of Google notice the same, but it only occurred to me that these new ones that don't make a claim might be ok. (Though I personally am on the side of cleaning up the scan because it's just cruft.) Mukkakukaku (talk) 03:52, 13 August 2018 (UTC)
My opinion is that it should go away. The Google logo is potentially an issue, but even if it isn't, the left/right and odd/even pagination will be off for the entire work because of that extra Google page. --EncycloPetey (talk) 15:19, 13 August 2018 (UTC)

Permissions for MpaaBot

Hi. I was investigating the possibility of changing Proofread status via pywikibot and I had problems using MpaaBot instead of Mpaa (see https://phabricator.wikimedia.org/T173385).
Digging into it in the ProofreadPage extension, I think it narrows down to MpaaBot not having the 'pagequality' rights.
My understanding is that such right comes automatically for all users, see https://en.wikisource.org/wiki/Special:ListGroupRights.
If so, how is it that MpaaBot does not have it?

I get the following via pywikibot.

Mpaa
"groups":["abusefilter","bureaucrat","sysop","*","user","autoconfirmed"]
"rights":["abusefilter-modify","noratelimit","oathauth-enable","override-antispoof","suppressredirect","deleterevision","deletelogentry",
"editcontentmodel","edituserjs","editusercss","block","createaccount","delete","deletedhistory","deletedtext","undelete","editinterface",
"editsitejson","edituserjson","import","move","move-subpages","move-rootuserpages","move-categorypages","patrol","autopatrol","protect",
"editprotected","rollback","upload","reupload","reupload-shared","unwatchedpages","autoconfirmed","editsemiprotected","ipblock-exempt","blockemail","markbotedits","apihighlimits","browsearchive","movefile","unblockself","mergehistory","managechangetags","deletechangetags",
"editsitecss","editsitejs","tboverride","titleblacklistlog","transcode-reset","transcode-status","globalblock-whitelist","nuke","skipcaptcha",
"abusefilter-log-detail","massmessage","read","edit","createpage","createtalk","writeapi","viewmywatchlist","editmywatchlist","viewmyprivateinfo",
"editmyprivateinfo","editmyoptions","centralauth-merge","abusefilter-view","abusefilter-log","vipsscaler-test","reupload-own",
"minoredit","editmyusercss","editmyuserjson","editmyuserjs","purge","sendemail","applychangetags","changetags","pagequality","spamblacklistlog",
"mwoauthmanagemygrants","collectionsaveasuserpage","collectionsaveascommunitypage"]
MpaaBot
"groups":["bot","*","user","autoconfirmed"],
"rights":["noratelimit","bot","autoconfirmed","editsemiprotected","nominornewtalk","autopatrol","suppressredirect","apihighlimits","writeapi","skipcaptcha",
"read","edit","createpage","createtalk","abusefilter-view","abusefilter-log","reupload-own","move-rootuserpages",
"move-categorypages","minoredit","purge","applychangetags","changetags","reupload","upload","move"]}}}
It comes automatically for users. It does not come automatically for bots. If you want to add that feature to your bot, we'd need to have a request with rationale. We've usually considered changing the proofread status to be a highly significant change, implying a user proofreading against a scanned text. I don't see why a bot would need to mark proofread status. --EncycloPetey (talk) 02:47, 13 August 2018 (UTC)
Proofreading against a scan can be done manually offline. The result can be uploaded in bulk via API instead of the standard page-by-page web interface. The bot is just like a CLI tool to post content.— Mpaa (talk) 08:37, 13 August 2018 (UTC)
Mpaa, can PageQuality be applied through the API? Wondering whether this may be a front end issue where PQ needs to go via the interface, or an equivalent of the interface. There are a number of restrictions around PQ, so I am guessing that it is something bundled up with those rules. I can test with sDrewthbot at some point to see if it can force things with that. — billinghurst sDrewth 00:19, 14 August 2018 (UTC)
Yes, it can (with the same rules as standard interface). But not for Bot accounts because they miss the 'pagequality' right. BTW, I think it is something good to have, not a negative thing.— Mpaa (talk) 07:01, 14 August 2018 (UTC)
One more piece of information. The difference is not in being a standard account or a bot account, but in how the account is logged in. If the account is logged in with OAuth, the set of allowed rights is limited. I wonder where to ask to add "pagequality" as a right for OAuth authentication.— Mpaa (talk) 09:05, 14 August 2018 (UTC)
@MarcoAurelio: You are more around OAuth than others and have a modicum of Wikisource interest, are you able to advise here with regard to a right that comes from mw:Extension:ProofreadPage? — billinghurst sDrewth 23:14, 14 August 2018 (UTC)
See also https://phabricator.wikimedia.org/T201904 .— Mpaa (talk) 15:32, 15 August 2018 (UTC)
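Spotting the missing right in long listings like the ones above is easier with a set difference. The Python sketch below is a hedged illustration: the two lists are short excerpts copied from the output quoted in this thread, not the full lists, and the helper name is hypothetical. It does not call pywikibot or the API.

```python
def missing_rights(reference_rights, candidate_rights):
    """Return the rights present in the reference list but absent from the candidate."""
    return sorted(set(reference_rights) - set(candidate_rights))


# Excerpts only, taken from the listings quoted above.
mpaa_rights = ["read", "edit", "writeapi", "purge", "pagequality"]
mpaabot_rights = ["read", "edit", "writeapi", "purge", "bot"]

if __name__ == "__main__":
    print(missing_rights(mpaa_rights, mpaabot_rights))  # prints ['pagequality']
```

Run against the full listings, the same comparison would also show the sysop-only rights, so the result needs reading with the account's groups in mind.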

Inline HTML broken

Page:The Columbia River - Its History, Its Myths, Its Scenery Its Commerce.djvu/151

Why? ShakespeareFan00 (talk) 08:08, 13 August 2018 (UTC)

Because the text was copy-pasted from Gutenberg, then matched-and-split, instead of being proofread from the source scan. --15:17, 13 August 2018 (UTC)
And also because the correct syntax for images is [[File:PictureName.jpg]] and the correct syntax for <br> does not have a space after < —Beleg Tâl (talk) 15:24, 13 August 2018 (UTC)
ok. i tend to replace missing image with raw image, to encourage crops. and then use crop tool and FI template. did your one example. the bot of page creations are nice, but it needs improvement / copyediting for hard things like image formatting. Slowking4SvG's revenge 03:42, 14 August 2018 (UTC)
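The stray space after "<" that Beleg Tâl mentions is the kind of thing a clean-up pass could catch across many pages. The Python sketch below is an assumption, not an existing bot task: it only repairs "< br" into "<br" and leaves image syntax and everything else untouched. The function name is hypothetical.

```python
import re

# A "<", then one or more spaces, then "br" as a whole word, in any letter case.
BROKEN_BR_RE = re.compile(r"<\s+br\b", re.IGNORECASE)


def fix_broken_br(wikitext):
    """Remove the space between '<' and 'br' so the tag is recognised."""
    return BROKEN_BR_RE.sub("<br", wikitext)


if __name__ == "__main__":
    print(fix_broken_br("first line< br>second line"))
```

Note that the replacement writes the tag in lower case, so "< BR>" would become "<br>".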

Tech News: 2018-33

17:53, 13 August 2018 (UTC)

Scans with missing pages

Inside of Category:Index - File to fix is a sub-category Category:Scans with missing pages. Is it possible to sort Index pages in there? (A template would be ideal if nothing else.) It appears to be unused, but it would be quite useful to be able to categorize pages in there since they're a distinct kind of fixing (as opposed to misaligned text layers, duplicate pages, page re-orderings, and google scan page removals). Mukkakukaku (talk) 05:04, 16 August 2018 (UTC)

It is filled from Template:Missing pages, which is designated for use on index pages. — billinghurst sDrewth 06:24, 16 August 2018 (UTC)
I don't think that template is working, then. I added it to Index:15 decisive battles of the world Vol 1 (London).djvu but the category remains empty. :( Mukkakukaku (talk) 23:52, 16 August 2018 (UTC)
Never mind, the template wasn't detecting the namespace correctly. I took care of it (though there might've been a better solution; I'm only like 60% on this template syntax stuff). Mukkakukaku (talk)

OK that being said, should we be proofreading the files with missing pages? The template indicates we should be -- "Placeholders have been inserted, so proofreading can be done without needing to move content later, if the missing pages are found." -- but I was under the impression that these files should be marked as needing fixing, and only returned to the proofreading pool once the missing pages were located and surgery performed on the file. There's a number of older works apparently utilizing the template in the "placeholders inserted, go ahead and proofread" workflow, while the newer ones I've begun tagging are marked for fixing. --Mukkakukaku (talk) 03:54, 17 August 2018 (UTC)

To me it is all about whether we can fix the work or not. If we can fix the work, then we should mark it for repair, or replace it.

That said, I have proofread works where they are missing maps through scans, or missing a page or two, and I have proofread as the remainder of the work was of sufficient novelty or interest, and there was no alternative version available. Sure, it won't be a work capable of gaining our quality stamp; however, I would rather have it short a page or two than not at all, and maybe we will get the missing pages in time. — billinghurst sDrewth 04:01, 17 August 2018 (UTC)

Agree with this. Example, Index:FirstSeriesOfHymns.djvu is missing one page, but the rest of the work is worth hosting regardless, and that one page can be easily repaired if a scan of that page is located. —Beleg Tâl (talk) 13:05, 17 August 2018 (UTC)
Seems like we might want to formalize a policy. What if they're missing illustrations? Index, table of contents, actual pages of a novel, appendices, etc. I'm sure it's a judgement call at times, but it seems to me that actual content pages are more "valuable" than non-content pages. (PS: I fixed your link to First Series of Hymns.)
Counter example: I recently deleted a copy of a Beatrix Potter book that was missing some illustrations. For those unfamiliar, half of the charm is the fact that they're classic hand-illustrated children's works, with very distinctive and recognizable illustrations of the animal characters. The works are only about 30 pages long, each, alternating text and illustrations. What if we were missing a page or two of illustrations? That's almost 10% of the work right there; even if it's not the text of the work, the illustrations are half the reason anyone reads those books. (Actually I don't even recall the plots of most of them, just the illustrations.)
Counter example 2: Index:CAB Accident Report, United Airlines Flight 16.pdf. I personally wouldn't want this official government report proofread until someone tracks down the missing page 17 because I think the content is material to the integrity of the overall work, even when only a single page is missing.
Imagine reading something, a novel, only to get to a certain point and to find that the content is just not there because the scan was faulty and we proofread it anyway -- I personally wouldn't want that experience and would be extremely disappointed. Since it doesn't seem anyone is really working the backlog in the files for fixing (there's some easy stuff in here I marked for fixing five years ago and nobody's touched since), I think this isn't an altogether safe workflow to rely on. The First Series of Hymns isn't even categorized into the Indices with Missing Pages category, so it's invisible. Easily fixed by adding the template, true. But the point is that this isn't on anyone's radar for addressing. --Mukkakukaku (talk) 18:24, 17 August 2018 (UTC)
It is always a qualitative call, and we often make those calls. We have WS:PD for that reason. Trying to write a specific "policy" statement is too tricky. If we were to say anything it should be a qualitative statement within WS:WWI. The best we can do is talk about the deletion determinations that we have made based on quality. — billinghurst sDrewth 09:09, 18 August 2018 (UTC)

Unable to change status of What Can I Do^ - NARA - 534471.jpg

Yesterday I had a go at proofing Page:What Can I Do^ - NARA - 534471.jpg; however, it doesn't appear to be associated with an index, so I am unable to change the page's status from "To be Proofread" to "To be Validated."

Is this a problem with the file or as expected for a single page? Is it possible to change the page's status? Sp1nd01 (talk) 08:35, 16 August 2018 (UTC)

@Sp1nd01: We have to force an index page, which I have done at Index:What Can I Do^ - NARA - 534471.jpg. As we have to code the page manually rather than use <pagelist />, the Index: name is flexible, though here it is just easy to match it. Transclusion is a little different in the code and is explained at mul:Wikisource:ProofreadPage, so if you have an issue getting it going, then please get back to us. — billinghurst sDrewth 04:12, 17 August 2018 (UTC)
Thank you for the help. I have created the pagelist and transcluded it. I think it's now fully complete. unsigned comment by Sp1nd01 (talk) .

Visual Editor

There is most likely an issue with Visual Editor (see Wikisource:Scriptorium/Help#Wuthering_Heights discussion and related bug). I think we should consider not using it for now, as (some) pages will need rework. In pl.ws they decided to block such breaking VE edits using AbuseFilter (see my talk page discussion).— Mpaa (talk) 15:32, 18 August 2018 (UTC)

Is it the VE itself? I see that this page has a double "to be proofread" header, and I see this quite a lot on pages. I assume you weren't using the VE to create these pages? So there may be an additional issue causing the duplication or complete removal of the page information about page status. --EncycloPetey (talk) 18:22, 18 August 2018 (UTC)
I am not sure, but I think so; the pages in question are tagged with 'visualeditor'. For the one you mention, I submitted a bug: https://phabricator.wikimedia.org/T202200 .— Mpaa (talk) 18:37, 18 August 2018 (UTC)

TemplateStyles and book formatting

TemplateStyles are enabled now. Can we consider using this to make CSS for specific books and implement semantic HTML? For example, using real h2 tags instead of {{center}} and {{smaller}}. Suzukaze-c (talk) 03:17, 19 August 2018 (UTC)

That would be an odd choice of tags. h2 is for second-level headings, while all {{center}} and {{smaller}} do is either center the content or change the font size and make no implications as to page hierarchy. If anything they'd be replaceable with CSS text-align:center or margin:0 auto, and font-size:0.9em. Mukkakukaku (talk) 04:51, 19 August 2018 (UTC)
I am talking about emulation of header tags using these formatting templates, such as at Page:TRC Canada Interim Report.pdf/10. Suzukaze-c (talk) 06:51, 19 August 2018 (UTC)
The issue is that we have no standard H2, so how can we push that universally? We should definitely be looking to utilise TemplateStyles; however, I don't think that we are looking to override existing default styles. — billinghurst sDrewth 13:59, 19 August 2018 (UTC)