Wikisource:Scriptorium/Archives/2014-12

Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Announcements

Proposals

Standard handling of Errata

Some time ago a question was posted in the Help section of Scriptorium as to methods of handling published Errata. Before I go overboard (as if I haven't already!) has anyone any objections to my creation and usage of this template {{errata}}? Right at this instant there is but one page Page:Investigationofl00boolrich.djvu/73 (and yes, it is a nightmare of formatting, but that is not the issue at present) making use of the new template, so there ought to be fairly minimal impact to removing it if any serious objections come up. AuFCL (talk) 11:39, 19 October 2014 (UTC)

Wikisource:WikiProject DNB handled it with a footnote, see also template:DNB errata; i.e. Abbott, Edwin (DNB00). Slowking4Farmbrough's revenge 23:30, 2 November 2014 (UTC)

Utilising global spam filters

I put a suggestion to administrators at and about utilising global spam filters. These filters are a replication of what is used locally. If general users have a comment, or a question, it is probably appropriate to add them at the noticeboard. — billinghurst sDrewth 06:09, 15 November 2014 (UTC)

BOT approval requests

Following on from the discussion Wikisource:Scriptorium/Archives/2014-09#Automated import of openly licensed scholarly articles the WPOA team involved has utilised a bot to undertake some data imports into the WS: namespace to trial their processes. They have hunted me down at Wikimania2014 to seek feedback on the next steps that they need to undertake to meet our requirements. We have made some updates to their processes, and are ready for more testing over the next few days so we can deal with this in a sprint format. Letting the community know of the things taking place. They have some great technical people who can really build tools, and some of the tools that we are talking about have good prospects for helping with other data population to and from wikidata. — billinghurst sDrewth 20:07, 7 August 2014 (UTC)

I am not against this bot but I am a bit disappointed that I got no feedback on an issue I noticed some months ago, see Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central#New_Family_of_Bluish_Pyranoanthocyanins. I would have expected at least an acknowledgment, if they want to run a bot here.--Mpaa (talk) 18:02, 31 August 2014 (UTC)
@Maximilianklein, @Mattsenate: — billinghurst sDrewth 02:21, 1 September 2014 (UTC)
I think this bot has some issues. I sampled a few random pages. This is one of them Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Capturing_Natural-Colour_3D_Models_of_Insects_for_Species_Discovery_and_Diagnostics. Note also the empty template at the bottom. I hope we will get an answer.--Mpaa (talk) 18:26, 4 September 2014 (UTC)
Sampled again: this is from Sep, 9: Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/The_Invisible_Prevalence_of_Citizen_Science_in_Global_Research_Migratory_Birds_and_Climate_Change or Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/Global_Diversity_of_Sponges_(Porifera).
Left a warning to @Maximilianklein:.--Mpaa (talk) 18:58, 9 September 2014 (UTC)
BTW, still disappointed by lack of attention from a bot eager to run ...--Mpaa (talk) 19:05, 9 September 2014 (UTC)
Thanks for your comments, Mpaa, and sorry for the lack of feedback on our part - I hadn't noticed your messages before. Since User:Maximilianklein, User:Mattsenate and User:Klortho mostly work on the code base, criticisms and suggestions more closely related to the Wikisource end are best directed to me - I am here on an almost daily basis now, triggering the bot, checking the imports and keeping things in sync with our software development.
Yes, the bot has some issues, and we are working on them at several levels:
  1. the conversion from the XML at PubMed Central into MediaWiki-importable XML,
  2. customizations for Wikisource (example),
  3. the upload procedure,
  4. integration with Commons, Wikidata and Wikipedia.
Many of the issues only became apparent once we actually started to import articles here - we have two years of experience with importing files into Commons, but doing full articles is more complex by nature. Furthermore, there are inconsistencies in the XML that publishers deliver to PubMed Central, and while we had mapped them out in some detail for multimedia imports (see talk), it turned out that these inconsistencies affect full text imports even more than expected, and thus much of our coding is actually focused on building workarounds for these issues, while we continue to fix actual bugs in our system and add new features. We also engage in a working group that tries to address these XML inconsistencies at their origin, i.e. with the publishers. This will hopefully make automated imports more straightforward in the future. Finally, continued integration of Wikimedia projects with Wikidata also affects our workflows, and many details (e.g. whether and how the metadata for the imported journal articles and their authors should all go onto Wikidata) are not clear at the moment.
For all this, it is vital that we can do test imports in order to fix, refine or otherwise improve our workflows. For the moment, these imports will always go to subpages of our WikiProject, and we will move things over to the main namespace only manually, for articles that have been fixed as far as we can see.
Thoughts on any of this shall always be welcome as we move forward. -- Daniel Mietchen (talk) 06:57, 10 September 2014 (UTC)
@Mpaa:, I am still working on the bot; as @Daniel Mietchen: said, it is an iterative process that needs to have periodic, still imperfect, test imports. I am also happy to address issues you find with the bot if they are constructive and specific. You can put them on my talk page or directly on the GitHub issue tracker. Thanks for giving the content a critical eye, we need it. Maximilianklein (talk) 21:27, 11 September 2014 (UTC)
Welcome.--Mpaa (talk) 00:22, 12 September 2014 (UTC)

Help

Repairs (and moves)

Other discussions

HHVM revealing new weirdness

Is anybody else forced into using the simple labeled section editing ( ## John Smith ## ) mode instead of the "old" style <section begin="John Smith" /> labeling - no matter the gadget setting to disable/restore it is set to - when HHVM (HipHop Virtual Machine) is enabled or is it 'just me' again? -- George Orwell III (talk) 06:12, 19 October 2014 (UTC)

You haven't indicated the type of issue that you are seeing. I can add section labels (in Page: ns), and I can view existing pages (main ns) with section labels. I am using HHVM and have seen no quirkiness. — billinghurst sDrewth 06:41, 19 October 2014 (UTC)
The issue is that unless I disable HHVM, every edit/creation needing a begin or ending section tag is converted/saved as depicted in the first gray box. I've never used this format (with the # number signs), nor was it ever "enabled" here - manually or otherwise - and if it ever was somehow enabled, I could "switch back" by using the gadget enabling the "old style" labeling (of course it's confusing; it is another ThomasV piece of garbage left over from days long gone). Proper <section begin & <section end tags are actually the "default" or the "norm" loaded via core extension(s). That was overridden site wide in Base.js long ago, only to be disabled site wide again via a default selected gadget.

I'm not saying it doesn't work - I'm saying it's never been my preference and I'd never adopt using that resource wasting, load corrupting method of labeling on principle alone. In short, turning off HHVM restores my years old preference when it comes to section tags. -- George Orwell III (talk) 07:03, 19 October 2014 (UTC)

Unless you have since "fixed" something I am not seeing these issues. (I currently have both HHVM enabled and "Use the old syntax in the Page namespace" selected. Old <section> syntax seems to work and is not apparently "rendered" into #### upon opening an edit session. In fact I deliberately took no steps to protect that #### above if that may serve as a test-case.) AuFCL (talk) 07:29, 19 October 2014 (UTC)
I've never set any preferences or disabled anything, but I have always had my section labeling converted to hashtags when I insert them. --EncycloPetey (talk) 15:16, 19 October 2014 (UTC)
The 'enable-just-to-disable' hash tag nonsense was put in place probably before you got here so that is why you believe it is normal behavior. The particulars can be found in MediaWiki:Base.js which is basically forced to load before anything else does in our MediaWiki:Common.js file. Any doubts? Let's stop forcing Base.js altogether and see what happens then shall we? -- George Orwell III (talk) 15:27, 19 October 2014 (UTC)
i was having section problems in IE (before HHVM) forcing me to use <section begin="John Smith" />. is this a pseudo-improvement? Slowking4Farmbrough's revenge 17:19, 19 October 2014 (UTC)
I had LST issues the first or second day HHVM was rolled out too but that issue went away by itself. That was nothing like what started [for me] yesterday however.

I just want to be clear - I'm sure HHVM is merely revealing never before seen issues that were always lurking about rather than being the outright or primary cause of them. Obviously it will become the standard once the bugs are all stomped out, so there is no reason to "fear" using it while it's still in beta. I've retitled this section so it hopefully doesn't come off as "accusatory" as it initially may have. -- George Orwell III (talk) 17:46, 19 October 2014 (UTC)

As for using <section begin="John Smith" /> - that goes back to my original point - regardless of having/using the hash tag approach to section labeling or not, the LST and [to an extent] the Proofread page extensions use & recognize section tags by design & nothing else. Period.

Hiding/converting/applying section labeling using hash tag syntax is a cosmetic or maybe comforting thing for editors I guess but a big honking waste of resources imho when it comes to the extensions themselves -- George Orwell III (talk) 18:06, 19 October 2014 (UTC)

The hashtags were put in to make it easier for people who didn't understand the begin and end section tags, which do have their own quirkiness, and were causing some issues for some users. For some they will be easier as they are just required at the start of a section and automatically close a section at the termination of a page, unless a hashtag terminator is used. So while I prefer the old style, the alternate style was introduced for good reason as a display case, so condemning it is a little harsh when you are a vastly experienced user. People are given choice, and that is a good thing. — billinghurst sDrewth 02:09, 20 October 2014 (UTC)
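
For readers unfamiliar with the two notations discussed above, the relationship between them can be sketched roughly as follows. This is only an illustration in Python, not the Base.js or gadget code. It assumes that a label is a line of the form `## name ##`, that a new label or the end of the page closes the open section, and that a bare `####` line acts as the terminator. The function name `hashes_to_section_tags` is hypothetical.

```python
import re

# Assumption: a label is a whole line of the form "## name ##".
# A bare "####" (empty name) is treated here as the terminator.
HASH_LABEL = re.compile(r"^##\s*(.*?)\s*##\s*$")


def hashes_to_section_tags(wikitext: str) -> str:
    """Expand '## name ##' labels into <section begin/end> tag pairs."""
    out = []
    current = None  # name of the section that is currently open, if any
    for line in wikitext.split("\n"):
        match = HASH_LABEL.match(line)
        if match is None:
            out.append(line)  # ordinary text passes through unchanged
            continue
        if current is not None:
            # A new label or a terminator closes the open section.
            out.append(f'<section end="{current}" />')
        name = match.group(1)
        if name:
            out.append(f'<section begin="{name}" />')
            current = name
        else:
            current = None  # terminator: nothing new is opened
    if current is not None:
        # The end of the page closes whatever is still open.
        out.append(f'<section end="{current}" />')
    return "\n".join(out)
```

Under these assumptions, two labels on one page yield two begin/end pairs, which is the form the extensions themselves recognise.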

┌───────────────────────┘
Nobody condemned the premise for using hash tags to make life easier for newbies - only in the manner it was implemented...

  • tags from extension is default -> automated conversion to hashes forced by Base.js site-wide -> return to default tags via gadget by per user preference
vs.
  • tags from extension is default -> automated conversion to hashes enabled universally for all via gadget initially -> restoration of default tags possible via disabling gadget by per user preference

The difference between the two instances is that only the latter fulfills the premise without penalizing the "resources" of those who prefer the extension's default(s) in the process. The attempt to paint this condition as somehow a net increase in choice leading to an overall gain in benefit falls flat [again] if one examines the nuances of the facts at hand.

The Gadget interface was in place at the time and could have just as easily served as the vehicle for [java]scripting hash tags into a "reality" as forcing it via Base.js did - the promised uptick in traffic by a quick acceptance of hash-conversion prevailed over common sense and preferred practices is all. Too bad that promise played out like the search for WMD's did in Iraq.

This isn't the first time 'administration thru appeasement' has turned out to under-serve the community-at-large's interests in the long run but I consider it one of the most egregious examples of it to date. And fwiw, that was the only condemnation made in my previous comments, Neville. -- George Orwell III (talk) 02:13, 21 October 2014 (UTC)

In my opinion (although no one asked for it), replacing section tags with hash tags was/is a most idiotic idea. When I began using them, I was also a newbie and I was not confused by section tags, but certainly was by the hashtags. GO3, is Neville a reference to Papa (Joseph) Chamberlain's infamous boy? — Ineuw talk 02:40, 21 October 2014 (UTC)
well, ok, could we get a better solution, say button on edit toolbar? a section button with a pop-up fill-in section name would be really useful. Slowking4Farmbrough's revenge 23:06, 2 November 2014 (UTC)

Is there a way to view short pages needing reviewed?

I was looking through some of the stuff here and I noticed from the community portal that there are a lot of pages that need to be proofread. Some are very long though and have foreign language characters that need to be converted. I wanted to ask if there was a way to see the shortest unreviewed pages? A lot of headway could be made fairly quickly by doing those small pages first. Reguyla (talk) 17:46, 2 November 2014 (UTC)

Oh and I found several pages like Page:The Oxford book of Italian verse.djvu/99. With both the needs proofread and does not need proofread note. Reguyla (talk) 18:35, 2 November 2014 (UTC)
The Oxford Book of Italian Verse is mostly in Italian. Since this is the English Wikisource, and since we do not host Italian text, only the English portion (prefatory material) of that work is hosted here. The rest is available at the Italian Wikisource. --EncycloPetey (talk) 20:57, 2 November 2014 (UTC)
"Short pages" in the WP sense are less relevant here, especially in the main namespace. As we transclude transcribed pages from the Page: ns to main ns, all main ns pages are small, and one that transcludes 1000 pages would be little different in size from one that transcludes one page (we are talking a difference of 3 characters: … from=1 to=1 … compared with … from=1 to=1000 …). As a page of paper can only hold so much text, our page sizes in the Page: ns are pretty similar, so it comes down to the work, its reproduction, and whether it is of interest. [Personally I dislike novels, as it is all the speech interaction and short paragraphs, which drive me to distraction.] What we have though, depending on your interests, are:
  • small works that have been proofread and need validation, findable at Wikisource:Proofread of the Month/little works (manual list).
  • there is Category:Index Proofread which shows all works that have been proofread once, and are in need of validation
  • there is Special:IndexPages which will show a variety of works in their non-proofread/proofread/validated stages and their overarching process towards completion
  • Wikisource:Proofread of the Month which will show our current team work, and for November (each year) the focus is on validating works that have been taken to the proofread stage.
  • or sometimes I just look at Special:RecentChanges and see someone proofreading a work, and if it is vaguely of interest, I follow along behind them. (I believe that people appreciate someone validating their work as they do it, and to know that it is progressing.)
We have lots of work out there, so depending on what rocks your boat from historical, novel, biographical, poetic and musical, there is plenty available. Some of our curatorial and display of TO DO is less than perfect, so many of us will happily help you dig something out from some dark crevice, or help you do something new of interest to you. — billinghurst sDrewth 23:52, 2 November 2014 (UTC)
Oh, and Wikisource:For Wikipedians. — billinghurst sDrewth 23:54, 2 November 2014 (UTC)
Thanks, I already read through most of that and I know this is different from WP, but the short pages was mostly from the context of the new guy trying to learn things here. Searching through the links provided above will tell me the ones that need to be proofread and need work certainly, but as a new guy I would prefer to start by looking at some smallish pages rather than pages with paragraphs of text. If there was a way to simply pull up a few that were only a couple sentences or less, then I could rake through a bunch of those rather quickly while learning the ropes here. Thanks again. Reguyla (talk) 15:02, 3 November 2014 (UTC)
Unfortunately not. That would need something Page namespace specific, that interprets the proofreading status (1 or 2), AND then drills down to the size. We don't get that sort of service here. A number of the tools and special pages built for WPs aren't as functional here. <shrug> But don't sweat where you choose, we patrol newcomers' edits, and if we see an issue then someone will usually politely say something helpful. — billinghurst sDrewth 11:40, 4 November 2014 (UTC)
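
The kind of tool described in the reply above would amount to a filter on proofreading status followed by a sort on size. A minimal sketch in Python, assuming the status and size of each page were already available as plain records, might look like this. The `PageRecord` type, the `shortest_pages` function and the sample titles are all hypothetical, and no such service exists on the site according to the reply.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    title: str
    status: int  # proofreading status number, as referred to in the reply
    size: int    # length of the page text, in bytes


def shortest_pages(pages, wanted_statuses=(1, 2), limit=10):
    """Keep pages with a wanted status, then return the smallest ones first."""
    candidates = [p for p in pages if p.status in wanted_statuses]
    return sorted(candidates, key=lambda p: p.size)[:limit]


# Hypothetical sample data, for illustration only.
sample = [
    PageRecord("Page:Example.djvu/1", status=1, size=120),
    PageRecord("Page:Example.djvu/2", status=3, size=40),
    PageRecord("Page:Example.djvu/3", status=2, size=60),
    PageRecord("Page:Example.djvu/4", status=1, size=2500),
]

for page in shortest_pages(sample, limit=2):
    print(page.title, page.size)
```

The hard part on a live wiki is obtaining the status and size for every page in the Page namespace; the filtering step itself is trivial once that data exists.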

Another related question

I found a page, Page:A Treatise on Geology, volume 2.djvu/8, with some writing on it that needs to be proofread. In these cases would we go ahead and add the text, do we mark it as blank (since it's clearly not a "part" of the text), or do we do something else with it? Reguyla (talk) 15:11, 3 November 2014 (UTC)

@Reguyla: We leave it blank as the text itself is blank--we don't want to transcribe any hand-written notes or stamps from a library or stickers put on by a book club. The appropriate response is to leave it blank and when proofreading, check the light grey "Without text" radio button. —Justin (koavf)TCM 17:58, 3 November 2014 (UTC)
Koavf is correct about our general approach: we are not interested in scratchings, library marks, etc., though there are annotations that are valid edge cases. The way that we look at it is that the transcluded work is the work of the author, and the remainder is informational and maybe notational.

What I have done previously where handwritten text is pertinent to the scan, e.g. signed by the author, or something of particular note, is to add a {{user annotation}} (to the header or footer sections, or wrapped in <noinclude>), or you can put specific text on the Page talk: page so that it is findable by search engines. If the work is transcluded and you think that an annotation is worthy of note, you can add something to the work's note section, e.g. front matter contains a personal note and signature of the author; if not transcluded then you can scribe something on the Index talk: page (and hope that someone sees it). Other components might be that it was a scan of 1 of 500 of a limited edition print run, which is another sort of valid note, but not something that we transclude as the "body of the work". — billinghurst sDrewth 23:29, 3 November 2014 (UTC)

Great good to know thanks. Reguyla (talk) 02:51, 4 November 2014 (UTC)

17:28, 3 November 2014 (UTC)

What do we need to note and check?

My take on this note:

  • Check whether we have used Special:Cite anywhere and update if we do (though it does redirect). For information, this is a toolbar link that provides a citation for our WS page, though it cites our work, not the original author, and it is something which we should be considering for bugzilla now that metadata is better coordinated these days.
  • Review page and look at our "protection" templates, and the featured and nominated text templates, and look to migrate. Also see if other components can utilise this aspect.

billinghurst sDrewth 00:46, 4 November 2014 (UTC)

I can't find any mention of Special:Cite [20]. WhatamIdoing (talk) 15:09, 4 November 2014 (UTC)
I'm fairly sure I ported the change to Special:CiteThisPage for the handful of affected pages last week so it shouldn't be any surprise we can't find anything using/pointing to the old Special: page now that the code for it has actually rolled out to all wikis.
  • as for the new indicator tags replacing what amounts to the old top-icon scheme - it's also live now. I concur - the protection/featured type of templates need a revamp to utilize this new approach, as do the tools at the top of the Index: page template. I'm sure there are others but I can't think of them at the moment & my free time has been limited as well. -- George Orwell III (talk) 20:57, 7 November 2014 (UTC)

Wikisource on front page of en:Wikipedia

For info: In about an hour there will be a link to Encyclopedia of Needlework from the "Did You Know" section of the main page of the English Wikipedia. This is a part complete book by @Durova and the article is inspired by an article about an encyclopedist. Victuallers (talk) 22:56, 17 November 2014 (UTC)

Page needs updating. Studies of a Biographer 4 is complete and needs to be replaced with another title. I would do it myself, but don't want to mess things up. Londonjackbooks (talk) 01:21, 19 November 2014 (UTC)

Done by BWC, thanks for the prod. — billinghurst sDrewth 12:45, 19 November 2014 (UTC)

A User with a message

I recently received an unhelpful message from a user who may or may not be the same user as User:Enjoymypresencelifewasters (contribs)—an account created just after the message was posted. Thanks, Londonjackbooks (talk) 20:22, 20 November 2014 (UTC)

Looks to have been handled. — billinghurst sDrewth 12:06, 21 November 2014 (UTC)
Yes, Thank you. Londonjackbooks (talk) 14:27, 21 November 2014 (UTC)

Fat-fingered key

Index:George Washington National Moument.djvu Would someone who has the knowledge please correct the spelling on the pages of the above book? —Maury (talk) 04:39, 23 November 2014 (UTC)

@William Maury Morris II: There are only seven pages proofread so far, so I'll happily look at those. —Justin (koavf)TCM 05:36, 23 November 2014 (UTC)
I think perhaps Maury was asking someone to move the index and pages to a title in which "Monument" is spelled correctly. Hesperian 12:05, 23 November 2014 (UTC)
@Hesperian: Right. Gotcha. Thanks. —Justin (koavf)TCM 12:20, 23 November 2014 (UTC)
I thank you both. Kindest regards, —Maury (talk) 15:37, 23 November 2014 (UTC)
