Wikisource talk:Maintenance of the Month

Potential tasks for the coming months
Month | Task
June 2014 | Proposed policies and guidelines
August 2014 | Wikisource:Maintenance of the Month/Orphans
 | Work index revision

Is this the right approach?

While in principle I think this is a good idea, the name sounds off-putting. Maybe "Focus on improvement" or something similar would work. Dominic suggested "Reform", which is far more radical than maintenance, though that is undoubtedly one aspect.

Is a monthly theme right for maintenance? The suggested theme for this month (i.e. "what do we want to do?") reveals the problem: some things might take a week, others a year.

Many maintenance tasks need to be quantified to be of any use, and that presumably requires tools, e.g. to count the number of uncategorised articles (on some dimension) and to set a target for reducing them.

But since this is all new, maybe the first step is to set up a channel through which the direction could be established, probably a talk page for active contributors. The emphasis should be on identifying and prioritising the actions Wikisource needs. I'd guess that this would produce some very different perspectives, but that it would be a useful exercise. Chris55 (talk) 23:04, 8 July 2012 (UTC)

This was the result of the discussion on Scriptorium. I left time before acting to see if anyone had a better idea, but no one responded. That goes for the name and the period too. While criticism is necessary for this project to function, I don't agree with the points raised so far. I don't see how setting up the process reveals any problem when I am, in fact, setting up the process. Few of the current suggestions are incompatible with a monthly schedule, and some of those that are cannot be achieved on Wikisource anyway (changing the search function, for example, needs a software change either through Meta or Bugzilla). We already have ample maintenance categories and special pages to provide information, and adding more could be the focus of a future month. Also, this page is the talk page for contributors and a channel by which to establish a direction. - AdamBMorgan (talk) 12:01, 9 July 2012 (UTC)
Ok, Adam, you're right about the name: there was a long pause and I had forgotten that the MotM idea was backed. Sorry. And I hadn't looked at the list of suggestions. (I'm still not used to the brown bands of which WS is so fond.) But I AM still working on the Help files! Could I suggest that you add a column to the suggestions tables for volunteers so it's easier to see which tasks need more? Chris55 (talk)
Sure, volunteers column added. - AdamBMorgan (talk) 14:51, 9 July 2012 (UTC)

Categorization + Portalization

I am in favor of the categorization task - I think categories are a great navigational tool, and easier to maintain than portals. Granted, portals have their benefits, but I think the first step should be to add appropriate categories, then these can be used to assemble lists at Portals.

We should add a {{{categories}}} parameter to {{header}}, which should function the same way the {{{portals}}} parameter does - I mean that multiple categories can be separated by "/". However, more than 3 categories should be supported - 10 should be more than enough. And both {{{categories}}} and {{{portals}}} should be added to the default preloaded header code. Also, there should be convenient links somewhere to help the user find appropriate categories and portals, probably via search boxes. (See {{engine}})
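To make the proposed behaviour concrete, here is a minimal Python sketch of how a "/"-separated {{{categories}}} value could be expanded into category links with the suggested cap of 10. It is not the actual {{header}} template (which is wikitext); the function name `expand_categories` and the exact output format are illustrative assumptions.

```python
# Hypothetical sketch of a "/"-separated {{{categories}}} parameter.
# Not the real {{header}} code; names and output format are illustrative.
MAX_CATEGORIES = 10  # cap suggested in the discussion


def expand_categories(param):
    """Split e.g. "Plays/1920s plays" into [[Category:...]] links."""
    # Drop empty pieces so stray or doubled slashes are harmless.
    names = [name.strip() for name in param.split("/") if name.strip()]
    if len(names) > MAX_CATEGORIES:
        raise ValueError(f"at most {MAX_CATEGORIES} categories are supported")
    return [f"[[Category:{name}]]" for name in names]


print(expand_categories("Plays/1920s plays"))
```

One consequence of using "/" as the separator, as {{{portals}}} does: a category whose own name contains "/" could not be passed through this parameter.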

So far, so good. Next, we will need to actually categorize the works! We will need a list of all works. That seems like a lot, and one month may not be enough. The initial list can be generated and broken down into sections using Special:PrefixIndex. Then all subpages (items with "/") should be removed; I know this can be done with AWB. Once this first giant leap is done, a new list can be periodically generated from Special:NewPages, thus getting all pages created since the last list was made.
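The filtering step above can be sketched in a few lines of Python, assuming the full list of mainspace titles has already been exported (e.g. from Special:PrefixIndex). The helper names and the sample titles "Example Work" and "Example Work/Chapter 1" are hypothetical; this is an illustration, not the AWB procedure itself.

```python
# Hypothetical sketch: keep only "base pages" (titles without "/") and group
# them into sections by first character. "Example Work" titles are made up.
from collections import defaultdict


def base_pages(titles):
    """Return sorted mainspace titles that are not subpages."""
    return sorted(t for t in titles if "/" not in t)


def sections_by_initial(titles):
    """Group titles into lists keyed by their upper-cased first character."""
    sections = defaultdict(list)
    for title in titles:
        sections[title[0].upper()].append(title)
    return dict(sections)


titles = [
    "Example Work",
    "Example Work/Chapter 1",
    "Magic Morning (Kouznetsov)",
    "1911 Encyclopædia Britannica/Medicine",
]
pages = base_pages(titles)
print(pages)  # ['Example Work', 'Magic Morning (Kouznetsov)']
print(sections_by_initial(pages))
```

A plain "/" test is crude: it also drops encyclopedia articles such as 1911 Encyclopædia Britannica/Medicine, and any work whose own title legitimately contains a slash.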

Portal links could be added to the works at the same time, or that task could be deferred - I think it will be considerably easier to do the categories first, then use them to add portal links, then use the "what links here" at the portal page to generate a list of items that might belong there. --Eliyak T·C 05:22, 15 July 2012 (UTC)

I have made the necessary changes to {{header}} and {{header/preload}}. See it in action here: Magic Morning (Kouznetsov). --Eliyak T·C 05:56, 15 July 2012 (UTC)
(e/c)As an initial step, you could use the Maintenance Reports section of Special:SpecialPages to find uncategorised categories, files, pages & templates as well as unused and wanted categories. Beeswaxcandle (talk) 05:58, 15 July 2012 (UTC)
  • Comment -- I'd support this direct addition to header templates. At the same time, it would be nice if our category namespace behaved more like Wikipedia's, where the category tree is collapsed to a level or more, without any need to click into the sub- or parent category to see the list above or below it. -- George Orwell III (talk) 06:17, 15 July 2012 (UTC)
It is configurable per instance or per default - see mw:Extension:CategoryTree. --Eliyak T·C 06:41, 15 July 2012 (UTC)
tweaked thru a MediaWiki message value. I've been to that page many times before & never put two and two together until now. Again, thanks for the pointer :) -- George Orwell III (talk) 07:03, 15 July 2012 (UTC)

I don't understand. Where is the benefit of adding categories via the header template, as opposed to adding them in the usual way using standard wikicode? Hesperian 06:25, 15 July 2012 (UTC)

I'm sorry I wasn't clear. The objective is to prompt the user for the necessary categories, and leave an obvious blank space where they are missing. --Eliyak T·C 06:30, 15 July 2012 (UTC)
Hmm, if all pages used {{{categories}}} in the header, we could generate a category of pages without topical categories (these would still likely have date, license and technical categories). --Eliyak T·C 06:33, 15 July 2012 (UTC)
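The tracking idea can be sketched as follows: when the {{{categories}}} value is empty, the header would emit a maintenance category instead of topical ones. This is a Python illustration of the logic only; the tracking-category name "Works without topical categories" and the function name are assumptions, not existing Wikisource names.

```python
# Hypothetical sketch of the tracking idea: when {{{categories}}} is empty,
# the header would put the page into a maintenance category instead.
# The tracking-category name is an assumption for illustration.
TRACKING_CATEGORY = "Works without topical categories"


def header_categories(categories_param):
    """Return topical categories, or the tracking category if none were given."""
    topical = [c.strip() for c in (categories_param or "").split("/") if c.strip()]
    return topical if topical else [TRACKING_CATEGORY]


print(header_categories("Plays/1920s plays"))  # ['Plays', '1920s plays']
print(header_categories(""))  # ['Works without topical categories']
```

Date, licence and technical categories would be added separately, so only the absence of topical categories triggers the tracking category.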
  • Contrary: I too am in favour of categories in general, but I'm not sure they work on Wikisource (quite apart from the fact that they have been badly maintained). For one thing, the structures are so broad for many works (e.g. fiction) as to be almost useless. Some of the categories have been badly drawn up. Unlike French WS which categorizes by century for the last 3 or 4, English WS has "Early Modern" from 1631-1899 which covers 80% of material. As others have pointed out, large numbers of entries have no subject classification at all.
Also there is no current mechanism in MediaWiki for joining categories e.g. someone might want 19th century romantic fiction. And the "incategory:" search mechanism on WS is useless and misleading for several reasons: it does not search subordinate categories and it has the transclusion problem. So I don't see how one could set up any sensible search boxes using the category system. Without that, one is left looking at categories that have thousands of entries and are thus practically useless.
With these limitations, I can see why people have put more energy into developing the portal system. It is a purely manual system but it's relatively easy to tell people how to add a new work into the system. We certainly need some tools to find omitted works and if we could develop the category system to provide these, I'd be more enthusiastic, but it could be harder than simply concentrating on portals.
I'm happy to debate these issues, but I've tried and failed to use the category system to find anything. Chris55 (talk) 10:30, 15 July 2012 (UTC)
In all honesty - that's pretty much the way I saw this all playing out too. It is still worth a try if enough folks really want to get behind it but I don't see how the "old" way Hesperian mentions and the "new" way can coexist in the end. If it further bloats the header template on the way to hopefully breaking it for good - all the better. -- George Orwell III (talk) 13:35, 15 July 2012 (UTC)
As Chris55 points out, the categories need an overhaul. This doesn't mean deleting categories; it mainly means breaking them down into subcategories. So, for example, Category:19th century works should have the subcategory Category:19th century romantic fiction, and that subcategory should also be in Category:Romantic fiction. That way it can be reached in both directions. When we have enough such works to clog up that category too, there should also be Category:19th century romantic fiction originally in French, Category:19th century romantic fiction by Jane Austen, etc. --Eliyak T·C 14:28, 15 July 2012 (UTC)
To demonstrate the possibilities, I have started by categorizing any modern plays I could find into sub-categories of Category:Plays by decade. I figure there are perhaps another 20-50% that I didn't find by brute force. --Eliyak T·C 17:47, 15 July 2012 (UTC)
Now you're just confusing me. I see you put Plays by Decade under Works by Decade, which makes complete and obvious sense, but Works by Decade shouldn't be a subcategory of Works to begin with; it should be a subcategory of Works by date, no? The cross-categorization wanted there is works by date vs. works by type. -- George Orwell III (talk) 20:50, 15 July 2012 (UTC)
Absolutely correct. That was a mistake, now fixed. --Eliyak T·C 21:30, 15 July 2012 (UTC)

I just finished generating the list of all "base pages", meaning mainspace pages excluding subpages. You can see it here: Wikisource:Maintenance of the Month/Base Pages. It had to be broken up into subpages itself because it is pretty big. This should be a useful list whether or not it is used for its intended purpose of improved categorization. --Eliyak T·C 14:42, 15 July 2012 (UTC)

Thanks Eliyak, those are very useful pages for other purposes. e.g. one can estimate that at least one third of the base pages come from one work, namely the DNB, and another one third are US court cases.
I think your project for subdividing plays is an interesting pilot. It's unfortunate that the "Works by Genre" doesn't include a single category for plays - it's divided somewhat oddly into "Drama" and "Comedies". But supposing that anomaly to be resolved, let's see what implications your approach might have if applied across the board.
You've added about 10 plays-by-decade categories to cover the 19th and 20th centuries, so let's suppose there would be twice as many to cover the other identifiable periods. If we look at the genres to which we could apply this treatment, there are about 100 in the top few levels, excluding those that are already dated in some way. So this might give us up to 2,000 new categories (about 20 per genre × 100 genres) if carried out consistently.
Would this help the user? By itself it would be a clickathon, but if we provided some automatic search facility with a certain "fuzzy" element in it, it might work. But it assumes that every work has been put in the right cubby-hole. Getting people to do anything is, as we know, a significant problem, and I suggest that it would only work if we had a bot to do it.
Having got through that, how does it compare with a portal? You get a bare title, whereas in a portal you get title, author and probably more. So for common/similar titles the portal approach still has a lot to offer.
This does expose the poverty of the search mechanism in MediaWiki. I'm aware that powerful searches of the category forest could be a CPU drain, but for it to be useful we really need something like that. Wikipedia works partly because article titles are a dead easy searching mechanism, but we don't have those benefits here. The traditional library catalogue, which is now at the base of the portal system, still has its advantages. There's plenty of work still needed, but I think it would pay off better than spending too much on an alternative approach, even if it's more 'wiki'. Chris55 (talk) 12:02, 16 July 2012 (UTC)
Eliyak, do you mind those subpages being edited? Given the interest here, categorisation makes the most sense for the first month (and probably later months too). It would help to point to a list (such as these) where users could mark off the ones they have done. It would also be useful to do the same for portals and years in the header but focussing on one thing, with the option to do more, makes more sense to start with. - AdamBMorgan (talk) 22:59, 16 July 2012 (UTC)
I do not mind at all. If we ever need the originals, there are the history pages. And, at some point (1 year? 6 mo?), we will presumably want to generate an updated list or else a list of new pages since the original was made. --Eliyak T·C 23:46, 16 July 2012 (UTC)

Can someone have a look at Wikisource:Maintenance of the Month/Base Pages, especially the "Categorization" section? I'm trying to write a quick guide for the August task but I can't seem to get it right. I think I keep trying to overcomplicate it, when all I want is "Add [this stuff] to works." - AdamBMorgan (talk) 21:29, 24 July 2012 (UTC)

It's a great start - already a candidate for replacing Help:Categorization, which is not very helpful at all. The first 2 sections are great, and I've added "date" as a standard check. I think the licence should be a 4th standard check but I'll let you do that!
The list of "sometimes" categories has problems. Most of them don't seem very sensible categories to me which shows in the few entries they have.
But the big problem to me seems the size of the task. There are 89,000 titles to be checked, of which 24,000 are court cases and 22,000 are DNB entries, both of which need special treatment. Of the remaining 42,000, many are individual poems and the like which are more sensibly catalogued in their collections.
The "crossing out" procedure is doable but possibly doubles the time of the procedure and may be unreliable. An alternative would be for Eliyak's script to identify those that still remain to be done, but this in turn requires it to identify those head pages that are inadequately categorized in the first place. If that could be done, then I think the time could be properly spent. Chris55 (talk) 10:47, 25 July 2012 (UTC)
I don't think any computer could accurately judge adequate categorisation. You could possibly make a big list of categories and tell the program "at least one from column A, one from column B, etc." but I would expect that to be a lot of tedious work to set up. It still wouldn't be guaranteed to be correct (for example, if two or more genres apply, or a new category is created during the categorisation) and it would need to run multiple times per day to keep the pages up to date (otherwise you risk wasting effort when the same page is checked by lots of people). Striking out is a pretty robust, low-tech solution. It was the simplest thing I could think of and it has worked with similar lists in the past. ({{closed}} will work for blocks of links too, which could be used after everything in a block is struck out). - AdamBMorgan (talk) 22:04, 25 July 2012 (UTC)
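For reference, the "at least one from column A, one from column B" rule described above would look something like this minimal Python sketch. The group names and category sets are invented examples; as the comment points out, a real list would be long and tedious to maintain, and passing the check would not prove a page is correctly categorised.

```python
# Hypothetical sketch of the "one from column A, one from column B" rule.
# The groups and category names are invented examples, not a real list.
REQUIRED_GROUPS = {
    "genre": {"Plays", "Poems", "Novels", "Short stories"},
    "date": {"1900s works", "1910s works", "1920s works"},
}


def missing_groups(page_categories):
    """Return the names of required groups with no matching category on the page."""
    cats = set(page_categories)
    return [group for group, allowed in REQUIRED_GROUPS.items() if not cats & allowed]


print(missing_groups(["Plays", "1920s works"]))  # []
print(missing_groups(["Plays"]))  # ['date']
```

The weakness raised in the discussion shows up directly: any category created after the lists were written is invisible to the check until someone updates `REQUIRED_GROUPS`.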

Biographical articles

How (if at all) should we differentiate in categorizing Author pages and biographical articles about a person? For example, Category:Presidents of the U.S. is in the Author pages category tree, so where should we categorize Encyclopædia Britannica, Ninth Edition/Abraham Lincoln? Perhaps we need Category:Works about presidents of the U.S., etc.? Or should that be Category:Biographies of presidents of the U.S.? --Eliyak T·C 23:06, 24 July 2012 (UTC)

I would say all three: Category:Presidents of the U.S. > Category:Works about Presidents of the United States > Category:Biographies of Presidents of the United States. I expect most of the things written about any given President are along the lines of "This President is awful and here's why," which goes into 'works about.' Biographies would be a work about a President but there should be enough of them (in Wikisource's potential future if not now) to fill a separate category that is also a subcategory of Category:Biographies. I wouldn't put works directly into Category:Presidents of the U.S.. - AdamBMorgan (talk) 22:09, 25 July 2012 (UTC)

Number of Works

On the basis of Eliyak's listings I've done a little counting of the most common "base pages" on Wikisource and listed every article starting with a common prefix having more than 1,000 entries. The results are surprising.

Type | Count | % of total | Cumulative | Cumulative %
Total | 89,011 | | |
Court cases | 24,428 | 27.4 | 24,428 | 27.4
DNB entries | 22,358 | 25.1 | 46,786 | 52.6
Executive Orders | 3,657 | 4.1 | 50,443 | 56.7
Proclamations | 3,329 | 3.7 | 53,772 | 60.4
Presidential addresses | 1,370 | 1.5 | 55,142 | 61.9
UNSC Resolutions | 1,319 | 1.5 | 56,461 | 63.4
Others | 32,550 | 36.6 | |
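The percentage and cumulative columns of the table can be recomputed from the raw counts with a short Python sketch. The counts and the 89,011 total are copied from the table; the function name `share_table` is just for illustration.

```python
# Recomputes the "%" and "cumulative" columns of the table from the raw
# counts. Counts and the 89,011 total are copied from the table.
TOTAL = 89_011
COUNTS = [
    ("Court cases", 24_428),
    ("DNB entries", 22_358),
    ("Executive Orders", 3_657),
    ("Proclamations", 3_329),
    ("Presidential addresses", 1_370),
    ("UNSC Resolutions", 1_319),
]


def share_table(counts, total):
    """Return (name, count, %, cumulative, cumulative %) rows plus an Others row."""
    rows, running = [], 0
    for name, n in counts:
        running += n
        rows.append((name, n, round(100 * n / total, 1),
                     running, round(100 * running / total, 1)))
    others = total - running
    return rows, ("Others", others, round(100 * others / total, 1))


rows, others = share_table(COUNTS, TOTAL)
for row in rows:
    print(row)
print(others)  # ('Others', 32550, 36.6)
```

Every figure in the table checks out against this recomputation at one decimal place.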

The first surprise is the total number. We were recently claiming to have 783K articles on Wikisource on the main page and George recently changed this to 260K as another counter recorded. But this measure would suggest the number is under 90K.

But a large part of these are particular types of document which arguably should not be in the main page-space base pages at all. The sample from which I drew the "one-third" comment above isn't entirely representative, partly because of the other big hitters above. There are other big categories, such as disambiguation pages (1,648 in category), which aren't listed above. It's easy to see how many of those above could have been added by almost automated means. We may be missing out by not advertising them. There are, incidentally, 61,243 entries in the category Category:PD-USGov.

All of these numbers presumably have significance. Which do we want to use? Chris55 (talk) 14:47, 16 July 2012 (UTC)

Don't forget that some (many) articles are not being represented - most importantly encyclopedia articles such as 1911 Encyclopædia Britannica/Medicine which are not "base pages." I had originally thought to do a more complex filter of the complete page list via regex - to remove anything that has "/Chapter 1" etc. in the title. I decided against, since a) this will not actually remove all subpages which should not be counted, and b) encyclopedias can easily be identified and worked on separately. --Eliyak T·C 14:57, 16 July 2012 (UTC)
No, I certainly wasn't. When I talked about those that mightn't be in the main page-space I meant they mightn't be "base pages", in your terminology. All of these, if they contain content, take work to create and maintain. What we're presumably discussing is how best to maintain them. Chris55 (talk) 15:06, 16 July 2012 (UTC)
Thinking a little more, Eliyak - is the difference between 260K and 783K simply a matter of namespace? 3:1 might correspond to main+index+page:main? I don't have the access to count. Chris55 (talk) 09:23, 17 July 2012 (UTC)
Just to fill in some of the back story on those numbers - I'm sitting on over 70 years' worth of scanned & uploaded Presidential docs, covering roughly 8700 Proclamations and 13600 Executive Orders. Eventually, everything listed will also have PR'd scans to back up all those automated imports from some point in time before I stumbled into this place. Tarmstro99 has over 100 volumes of passed legislation scanned & uploaded but no free time to do them any justice. Slaporte is behind all those court cases and will need to do the same securing-of-scans, if and when he comes back to his project. The 3 branches of Federal government have produced a whole lot of stuff over the years -- all of it free from copyright -- and, surprisingly, a good slice of it is still relevant to the day's current events. So there are bumps in outside interest when "we" get lucky enough to be ahead of the online curve with a key document already handy, but those are becoming fewer and farther between, admittedly. -- George Orwell III (talk) 15:35, 16 July 2012 (UTC)
Thanks for the background, George. What I was really suggesting was that we need a few more portals or ways of drawing attention to all this good stuff. e.g. tables of Proclamations by year, administration, area or whatever. Very few have any marking other than Category:PD-USGov. I didn't put any focused search boxes for legislation or court results because I didn't find any suitable portals to hang them on. The portals currently list maybe a few hundred cases and there's no easy handle on them. I got some of my stats by looking for " v. " and I suppose one could use that and "Proclamation" is a good prefix. At least most of them don't yet have the transclusion problem:-)
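The rough counting heuristics mentioned here (" v. " for court cases, a "Proclamation" prefix for proclamations) can be sketched in Python. The sample titles "Smith v. Jones" and "Proclamation 1234" are hypothetical, and these plain string tests will misclassify some titles.

```python
# Hypothetical sketch of the crude title heuristics mentioned above:
# " v. " marks a court case, a "Proclamation" prefix marks a proclamation.
from collections import Counter


def classify(title):
    """Guess a document type from its title alone."""
    if " v. " in title:
        return "court case"
    if title.startswith("Proclamation"):
        return "proclamation"
    return "other"


def count_kinds(titles):
    """Count how many titles fall into each guessed type."""
    return Counter(classify(t) for t in titles)


sample = ["Smith v. Jones", "Proclamation 1234", "Magic Morning (Kouznetsov)"]
print(count_kinds(sample))
```

Running such a count over the base-page list gives a quick size estimate per group, which could help decide where a portal or list page would be worth building.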
in passim: As a Brit I was slightly encouraged to see the Category:OGL - but it'll take us another century to catch up with the US! (And I only found the category by accident 2nd time round! How to get there from category or portal I haven't a clue!) Chris55 (talk) 18:56, 16 July 2012 (UTC)
Still, one must come to terms with the realities at hand. The limits on what and how recent the items are that we can host here dictate the possibilities available to us, not the other way around. You're 1000% right about the organization part though. Until all the Presidents had their own lists of Procs and EOs on their own author subpages, coupled with an easy-to-navigate category setup to see what we had and what was missing, all that stuff was sitting around gathering virtual dust for the most part. Now that the framework is in place & all the EOs have a mild standardization, folks dealing with those themes who manage to stumble onto either the lists or the CATs come away mildly surprised and genuinely appreciative of the efforts (from what little I can passively glean, that is). Some of our stuff started showing up on other sites soon afterward (a hat-tip to Carl's investigative skills). Formulating an approach was the easy part; sticking to it was far harder. Converting all existing and new EOs to use the same header took almost 2 years of fumbling around in the wiki-dark by itself, for example. -- George Orwell III (talk) 20:40, 16 July 2012 (UTC)
Yes, I found those author pages eventually and it's simple to make up a page of pointers. (I've passed it on to Adam). I can see how much work has gone into this. Chris55 (talk) 21:29, 16 July 2012 (UTC)
I've created a portal for the proclamations. I've also expanded Portal:Presidents of the United States to show which presidents have the standard subpages (non-standard ones can be added manually to the final column). This should be a dynamic display that detects subpages as and when they are created, assuming the same naming conventions, to future proof it a little. (Which is incidentally a benefit of having categories as well; they are always automatically up to date with only minimal manual input—adding the category). - AdamBMorgan (talk) 22:44, 16 July 2012 (UTC)
Fantastic! One note: Obama's weekly addresses are video rather than radio and should be moved a column - I guess that may be a pattern for the future. Chris55 (talk) 09:14, 17 July 2012 (UTC)

Ooo! I'm glad you brought this up. It reminds me that we have hundreds of these weekly addresses by the President from at least some point in Reagan's administration on forward, and in light of the fact there were some only in print before the radio and now the video incarnations, we should really get around to renaming them to an agreed upon standard that drops any mention of the radio or video mediums completely. The date of course will still need to be part of whatever title format is selected for cat and disambig reasons.

  • President's weekly address (2016-07-21)

... seems fine to me but I guess folks might feel slighted if it doesn't also include (... the United States) in some way. Thoughts? -- George Orwell III (talk) 03:39, 18 July 2012 (UTC)
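The proposed title format can be sketched in Python, building each title from the address date alone so that radio and video addresses share one naming scheme. The function name is hypothetical, and the exact wording (for instance whether to add "of the United States") is still open in the discussion.

```python
# Hypothetical sketch of the proposed naming standard: the title is built
# from the address date alone, with no mention of radio or video.
from datetime import date


def weekly_address_title(day):
    """Build a title in the proposed "President's weekly address (YYYY-MM-DD)" style."""
    return f"President's weekly address ({day.isoformat()})"


print(weekly_address_title(date(2012, 7, 21)))  # President's weekly address (2012-07-21)
```

Using the ISO date keeps titles distinct as long as there is one address per date, and makes them sort chronologically in category listings.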

Let's work on help pages soon!

They really need it, and it is something I heard actual users asking for at Wikimania. Editors coming from en.WP expect to be able to find out how things work. Honestly, a lot of us could probably benefit. I know there are things I avoid because all the templates changed and I never learned the new way. I have skipped marking a page proofread after I DID proofread it and at the bottom found a hyphenated word. I couldn't remember the name of the stupid template, tried to find instructions in the help, and gave up. I have seen validated pages that did not use the template. Maybe we have proofreaders working right now who don't know how to handle hyphenated words. I feel that I have to start working on these help pages and making them up to date and more prominent. NARA wrote their own instructions in some kind of skin, because they did not find ours useful. BirgitteSB 00:23, 19 July 2012 (UTC)

OK, help pages are provisionally earmarked for September's task. - AdamBMorgan (talk) 11:30, 19 July 2012 (UTC)
I think people should get on with it now! I've been working through the pages as I learn how Wikisource works, but there are many aspects I'm still in the dark about. But I do have the advantage of knowing what they look like to a relative newcomer. Chris55 (talk) 11:59, 19 July 2012 (UTC)
People can if they like (and could anyway). I mean the Maintenance of the Month for September is tentatively Help Pages, just as the Maintenance of the Month for August is Categorisation. Part of the point of a project like this is to focus attention on a specific area; more than one task per month dilutes that. - AdamBMorgan (talk) 13:15, 19 July 2012 (UTC)
There's over a month to go yet, but I have done two things to prepare the way for this. (1) The subpage Wikisource:Maintenance of the Month/201209 Help workflow is a process I think will work best for collaborative creation and review of help pages. It allows lots of people to work on the same area, in as large or as small amounts as they like, and for the process to be easily managed. (2) To facilitate this, and for future work, I have imported Wikipedia's stub template framework (and concept) for use in the non-content pages of Wikisource. The template {{stub}} should be enough on its own, although it is supported by half a dozen other templates. It's a simple concept and one people familiar with Wikimedia should understand. (NB: Wikisource:Stub is itself a stub at this point.) - AdamBMorgan (talk) 22:39, 19 July 2012 (UTC)

Not once and done

Unlike a {{PotM}}, there will not always be a pre-definable completion to many {{MotM}} projects. Some may be too large to complete in one month. Listing priority areas and reachable goals will be important. Some tasks may also need to be revisited after a time. Help pages are one that is unlikely to be completed in a single month. JeepdaySock (talk) 15:17, 19 July 2012 (UTC)

This project and the Main Page

I think we should add something like this to the "Collaboration" box of the Main Page.--Erasmo Barresi (talk) 19:18, 19 July 2012 (UTC)

Wikisource-maintenance2.png
The current Maintenance of the Month is focused on:
Categorisation

Front pages

Since the topics for months are being quickly decided, can I propose that we revise the main pages and left column? At the moment we still have a lot to learn from French Wikisource. Chris55 (talk) 08:29, 21 July 2012 (UTC)

Why do you use the plural form "pages"? :-) If you are writing about the Main Page, I agree to revise it. We can use ideas from a past proposal, in particular the subdivision of "New texts" into proofread ones and validated ones.--Erasmo Barresi (talk) 18:13, 21 July 2012 (UTC)
[Covering this and the previous topic] I would like to see more collaborative projects on the main page—adding to the current projects rather than replacing them—maybe including things like PSM and EB1911. However, that is a topic for Scriptorium rather than here. Also, I'm not sure if the main page is the best place to point out our flaws (with MotM) but that debate can be held on Scriptorium too. - AdamBMorgan (talk) 19:59, 21 July 2012 (UTC)
Sorry for the misunderstanding (English is not my native language): I wrote "put" to mean "add". The only things I'd remove are the recent collaborations: In my opinion just the last one would be enough.
Chris: I added the suggestion to revise the Main Page here, with me and you as volunteers.--Erasmo Barresi (talk) 19:27, 22 July 2012 (UTC)
I've made a copy of the Main Page's sandbox here: Main page mockup. It might help; you can edit that page without affecting anything else. Some of the templates might pose a small problem (although you could transclude in sandbox versions of alternatives, possibly using subpages of that page, to solve that) but you should be able to get an outline of what you want Main Page 2.0 to look like. - AdamBMorgan (talk) 12:33, 23 July 2012 (UTC)
I said "pages" to include "Community portal" as well as some of the other pointers in the left column. e.g. we have 3 "random" links which all give a mix from the page and index space which tend to swamp main - must be confusing to the newcomer for whom presumably they are designed. The French have links to authors and portals there which seems very sensible. I'm sure there are many other possible ideas. Chris55 (talk) 20:27, 22 July 2012 (UTC)
The page MediaWiki:Sidebar – editable by administrators only – determines the links which appear in the left column. I agree to remove the confusing links "Random author" and "Random book". A link to the index of authors is already in the main page header; a link to Portal:Portals could be added in the same place.--Erasmo Barresi (talk) 08:12, 24 July 2012 (UTC)
I don't have any problem with "Random author" but I think both "Random page" and "Random author" could be replaced by "Random root page" - tho probably called "Random book". Maybe these are all for the convenience of admins rather than newcomers. But I'm only just an admin (:-) and haven't yet discovered how these special pages are configured. Chris55 (talk) 10:27, 24 July 2012 (UTC)
I'm not sure if "Random root page" will work, nor if it would be something the average user would want. That said, I think there may be a benefit to changing "Random page" to "Random work" and rewording "Random book" to something like "Random index", "Random transcription" or "Random proofreading" (whichever makes more sense to a casual user). That should cover the main areas in which I'd expect a user to be interested. The current random page does frequently throw up pages in the Page namespace more than the Main namespace, at least in my experience, which I don't expect is what most people flicking through random pages want. (NB: Special:Random can work with any namespace; for example, Special:Random/Portal, Special:Random/Category etc. are all possible.) See also mw:Help:Random page. - AdamBMorgan (talk) 18:26, 24 July 2012 (UTC)

(returning to the left) I feel "Random book" is fine and clearer than "Random index". "Random page" should be renamed into "Random work" as it refers to the main namespace instead of the Page namespace. Other possible changes to make are: adding a link to Portal:Portals and turning "Random author" into "Index of authors" – linking to Wikisource:Authors.

Getting back to the starting theme, the front pages may be the Main Page, the Community Portal, and Wikisource:News. I'm adding a table at the top of this page with the maintenance tasks for the coming months. Feel free to edit it.--Erasmo Barresi (talk) 15:07, 25 July 2012 (UTC)

I use the "random author" link more than the others. (Also, "random scan" might make more intuitive sense for "random book"). Anyway, the table is fine. I've floated it to the right, so it doesn't leave too much whitespace. The three "front pages" could possibly be merged into one month. It might be difficult to co-ordinate a whole project working on just one page for a month. - AdamBMorgan (talk) 20:18, 25 July 2012 (UTC)
As a result of this discussion, this is the summary of the changes to be made to the sidebar:
Random book → Random transcription ("transcription" represents our work better than "scan")
Random page → Random work
+ Portals
You're right about the front pages, but preparing all three drafts by October may be hard. Let's see how the work on categorization and help pages progresses.--Erasmo Barresi (talk) 10:36, 26 July 2012 (UTC)
I've added a proposal to Scriptorium as this affects everyone. (However, I amended the Random work link to limit it to the main namespace, as this was part of the original complaint.) Once there is some consensus, making the changes will be easy. - AdamBMorgan (talk) 11:56, 26 July 2012 (UTC)

OCR text layers

I would like to suggest fixing up books that have no OCR text layers. Is it as simple as just clicking the OCR button on the edit page and marking them as not proofread? If so, I would like to work on this. Asking here, before I put it up on the page listing all of the "assignments." - Lucyrocks=) (talk) 18:58, 25 July 2012 (UTC)

Feel free to add it to the suggestions. It might be something a bot could do, considering the OCR function already exists. It might have to run somewhat slowly to prevent the server being overloaded but it should be faster than a human. - AdamBMorgan (talk) 20:20, 25 July 2012 (UTC)
Running OCR without a plan to proofread the result may not be the best idea. Take Page:Latin for beginners (1911).djvu/353 as an example: the OCR version, without proofreading, is garbage. It has taken me almost a year to get through just these few pages, and that is with a plan to work on them. JeepdaySock (talk) 10:39, 26 July 2012 (UTC)
There's another problem with just zapping texts without OCR layers. We currently have about 5K books loaded - double what it was a year ago - and there are maybe 30K books here. So just erasing all the work done in the past doesn't help too much. There's a heap of work to do in proofreading the books we have got. In addition, unlike most of the WS languages the number of naked texts is still increasing so we're not even keeping up. Chris55 (talk) 17:38, 26 July 2012 (UTC)

I agree, Pages: without a text layer should have one, but hitting the OCR button is not the same as applying a text layer to the entire file via an OCR routine. The OCR button is a temporary, per-page aid while proofreading. Uploading the source file [PDF] to IA so they can apply a text layer through their OCR regime, or "uploading" the source file [DjVu] to Any2DjVu to do the same through their OCR process, is a far better solution, and it doesn't require creating pages first. The text layer would then always be part of the post-OCR'd file, ultimately [re]uploaded to Commons, so one can proofread the work right away or months from now, whatever the case might be.

For those Indexes built around one or more image files instead of true documents (i.e. PDFs or DjVus; a single file with any number of internal "pages"), there are no good options other than converting the series into a common document format first. Finally, only custom OCRs recognize languages other than English, so if your work is a mix of languages don't expect much in the way of a text layer, no matter whom you get to apply one online. (A Latin OCR'd text layer... what were you thinking? :) -- George Orwell III (talk) 01:15, 27 July 2012 (UTC)

Categorising index files

I have a suggestion of where we should concentrate a large part of our effort for next month's maintenance task. We have around 3,600 works which have not been properly proofread and the number is possibly growing by about 100 per month, dwarfing the 1,200-odd that have been proofed. An important part of the reason is that these files are not presented anywhere to Wikisource readers apart from the index category files, which present a bewildering list of names, many of which are not even real titles. I've been grappling with how to present target lists of books to be proofread and validated to users, and the lack of categorisation emerges as perhaps the biggest problem.

Effort put in at this stage doesn't have to be wasted - the categories could be automatically copied over with the other header information in the transclusion process, which would mean that chapters in the main space were automatically categorised when they were created as well as the main entry. For the future we could strongly encourage people to categorise the work immediately the index file is created. It's not an unattainable goal, certainly not compared with the task of improving the 90-230,000 mainspace files we have and could mean we concentrate our efforts where we most want to improve Wikisource.

I haven't simply added this to the suggested tasks as I realise it's not current policy to label index files - but I think it should be. Any comments? Chris55 (talk) 16:14, 30 July 2012 (UTC)

I agree with categorizing index pages. As Chris wrote in his essay, two types of index categorization may coexist: index pages which are part of a series (like periodicals) may be categorized by this membership; all others may be categorized thematically. The system would work, and letting readers become contributors would be simpler with thematic categories. We can write a guideline draft about categorization (not just categorization of index pages) and then propose it to become an official guideline at the Scriptorium.--Erasmo Barresi (talk) 14:18, 2 August 2012 (UTC)
The reason we haven't been categorising Index files is that they then get mixed up with mainspace works and, of course, end up in the categories twice (or more for multi-volume works). I think there are two questions here: 1) What works about New Zealand history do you have that I can read? and 2) What can I do to help improve the New Zealand history section? I don't believe that the answers should come from the same source and so I would only be happy with this proposal if there was a separate categorisation tree for Index files with just a few broad categories and on marking an Index as "done" the category would be removed.

Addressing Chris' other suggestion of auto-categorising at transclusion, I'm not keen on flooding the mainspace categories with chapters, sections and other subpages. For example, when completed, Tracts for the Times will have 90 subpages all of which could sit in Category:Anglicanism all in the format "Tracts for the Times/Tract xx" but what's the point of doing that? A single reference in the category to the main page will be sufficient and navigation can be from there. Beeswaxcandle (talk) 00:34, 4 August 2012 (UTC)

Having thought a little more, I agree that repeating category entries for chapters (as subpages) is undesirable for the reason you give. It would help improve categorisation statistics on Wikisource, but that's not a good argument; it's the statistics that should change. It does work for poetry etc. where they are not subpages, but I'm dubious about those for other reasons.
I do think that categories are increasingly for aiding the creation process; portals are where we should send readers. And in that respect we need the information as soon as a work is loaded as an index file, not waiting till it is transcluded to the mainspace. But it's not easy to see that a parallel index would work. Maybe the category information should be transferred rather than transcluded.
But I've just checked a small sample of index files (20) and a majority of these don't have any entries in the mainspace, and quite a few are almost finished proofreading. Maybe we should just concentrate on getting those done: both entered in the mainspace and categorised. This thread started from a belief that attempting to check all the base pages in the mainspace was a hopeless task. I still believe that. Chris55 (talk) 11:11, 4 August 2012 (UTC)

Toss out the word "Modern"

Eras
Era           Period
Ancient       Before AD 600
Medieval      601–1420
Renaissance   1421–1630
Early modern  1631–1899
Modern        1900–present
Contemporary  living

In looking over (briefly, because there is much to look over closely) the text, I encountered the word Modern for texts. What is modern? It is "for this moment". I think the word "modern" should be tossed out completely and replaced by the word "century". Century is definite, whereas "modern" always becomes the past. I myself do not want to touch the good work others have set up and are setting up because it may create confusion. However, I can make suggestions here, and am doing so. I previously wrote to two people on talk pages: "Create an area other than here for suggestions, make it stand out in a different color and point to it on Scriptorium." Very Respectfully, Maury -- William Maury Morris II (talk) 17:33, 2 August 2012 (UTC)

You are bringing up one of my pet subjects! I believe modern to be the most terrible word in the English language. It is worse than useless; a useless word would merely lack significant meaning. "Modern" rather signifies a meaning that is so completely dependent on the moment that it is nearly guaranteed to fail to be understood after a short time, and very often the meaning is lost irretrievably. Avoid this word! --BirgitteSB 23:33, 2 August 2012 (UTC)
I think we got stuck trying to find an alternative. People seem happy with Ancient and Medieval, it's everything after that that is the problem. Strictly, that's all "modern history" but the word can be confusing, especially as it covers everything from the Tudors to the Cold War. I do like the idea of having labelled eras, so it goes Year > Decade > Century > Era or even Year > Decade > Century > Period > Era (the Medieval era is divided quite clearly into Early, High and Late periods). It's a convenient way of grouping authors and works into a small number of easily understood blocks of time. - AdamBMorgan (talk) 00:55, 3 August 2012 (UTC)
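As a rough illustration of the Year > Decade > Century part of that hierarchy, here is a minimal Python sketch. The function names are hypothetical, and nothing here claims to be how Wikisource's templates compute categories. It assumes AD years and the strict convention that a century runs from year 1 to year 100, so 1900 falls in the 19th century; a project could equally adopt the colloquial "1900s" grouping instead.

```python
def ordinal(n: int) -> str:
    """Return an English ordinal such as '18th' or '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def time_blocks(year: int) -> dict[str, str]:
    """Hypothetical helper: group an AD year into year, decade and century labels."""
    if year < 1:
        raise ValueError("only AD years are handled in this sketch")
    decade = year // 10 * 10
    # Strict counting: years 1801-1900 form the 19th century.
    century = (year - 1) // 100 + 1
    return {
        "year": str(year),
        "decade": f"{decade}s",
        "century": f"{ordinal(century)} century",
    }


print(time_blocks(1720))
```

Each label is derived from the year alone, which is what makes centuries "definite" in the sense argued above; only the top-level era needs a lookup table.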
I think we have to switch to centuries well before the 20th! Let's get rid of "Early Modern" as well. The Renaissance has very few takers apart from poetry so why not start with 15th? Clearly there are boundary problems particularly for Authors and it will need a bot to sort out the 10,000 entries in the Early Modern. But anything that divides up such a category is welcome! Chris55 (talk) 12:10, 3 August 2012 (UTC)
Authors are sorted by the {{what era is}} template, so that only needs to be changed once for the majority of the authors to be re-sorted automatically. However, I'm not comfortable with mixing types. Centuries and eras are not the same thing, and this would set up problems for the Ancient and Medieval authors/works, which will have both a century and an era. I'd prefer either (1) dropping all eras, (2) coming up with a new name for the post-medieval age (probably covering both "Early modern" and "Modern", to align with the history system) or (3) splitting the post-medieval age into a set of new eras. Any system will do as long as we are consistent and the labels are fairly intuitive (for example, also based on history terms, the Enlightenment, Industrial, Atomic and Information ages). - AdamBMorgan (talk) 13:17, 3 August 2012 (UTC)
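To illustrate why a single template change re-sorts most authors at once, here is a minimal Python sketch of a year-to-era lookup using the boundaries in the Eras table above. It is hypothetical and is not the code of {{what era is}}. It assumes the input is whatever single year the template keys on, treats AD 600 (which the table leaves between "Before AD 600" and "601–1420") as Ancient, and models "Contemporary" as a flag for living authors rather than a date range.

```python
from bisect import bisect_left

# Inclusive upper bounds from the Eras table; AD 600 is assumed to be Ancient.
UPPER_BOUNDS = [600, 1420, 1630, 1899]
ERAS = ["Ancient", "Medieval", "Renaissance", "Early modern", "Modern"]


def what_era_is(year: int, living: bool = False) -> str:
    """Hypothetical era lookup: map a year (negative for BC) to an era label."""
    if living:
        return "Contemporary"
    # bisect_left returns the index of the first upper bound >= year;
    # a year past the last bound falls through to "Modern".
    return ERAS[bisect_left(UPPER_BOUNDS, year)]
```

Whichever of the three options is chosen, re-sorting would mean editing only the two lists; every page that calls the lookup picks up the change.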
A very random thought: Given that we are a library, would the Print Age (or some variation) be an appropriate alternative? Our current eras are shown in the table above (which I have added for reference). However, according to Wikipedia's Middle Ages, the medieval era ended between about 1450 and 1500, depending on which historian you listen to. The Gutenberg Bible was the first major printed publication and led to the "Printing Revolution;" the first copies of this book were available in 1454-1455. Why not remove our last three eras, extend our Medieval age up to 1454 and start a new age from 1455 onwards? It could be called something like the Print Age, the Gutenberg Age etc.
Continuing on this theme, we could have a fourth age (Digital Age, Electronic Age etc) marked by the use of e-texts. Wikipedia's E-text article says the first was created in the 1940s while large scale use started in the 1960s (File Retrieval and Editing System and NLS (computer system), both in 1968). Either is a good place to end the previous age and start a new one extending to the present.
Is this acceptable to anyone? - AdamBMorgan (talk) 22:11, 3 August 2012 (UTC)
It is interesting what "authors" use, but we too are working with books, and I do not believe that every person who reads here, or considers working here, understands those terms and why authors use those specific terms, or even the fact that they do use those terms. It is easy for anyone to understand works by the century. I tend to prefer nautical works around the 1720s or 1850s and I understand what centuries those are in. But that author classification above is not what I would immediately and correctly think of. I have 3 degrees from universities where I once did understand and use those terms above, but that was long ago. We need to consider what readers will recognize, and "modern" isn't it. 1900s? We are in the 21st century and the idea of the (especially early) 1900s is not "modern" to me. Technology rules this era. If it must be, then Adam's ideas of a "digital age" &c. are fine with me, but I still prefer century. I am in no argument with anyone here but I am giving my personal opinion. Everyone's opinion is of value. Needless to say, "Modern", is needless to say in my opinion. William Maury Morris II (talk) 23:58, 3 August 2012 (UTC)
I'm relieved to know that all the era (and century) categories are produced by template. It's not impossible to mix eras and centuries. The French do it (see fr:Catégorie:Périodes). I can see the benefit of them for the ancient world but what's the point now? Chris55 (talk) 11:37, 4 August 2012 (UTC)

Kill categories

Crazy proposal brought on by this MotM. Right now we're juggling five organizational schemes: categories, portals, indices (i.e. Wikisource:Index; seems kinda dead?), author pages, and index pages (i.e. Index: namespace). Categories and portals are in many respects redundant: topical, hierarchical, many-to-many relation with works. Categories have pretty much one thing going for them: MediaWiki integration. Portals have the benefit of being organized individually thanks to being wiki pages and organized with relation to one another thanks to AdamBMorgan et al.'s tremendous work. I think it would be desirable to somehow subordinate categories to the portal system. No idea how exactly that would work, just throwing stuff against the wall and seeing what sticks. Prosody (talk) 00:17, 3 August 2012 (UTC)

On the English Wikisource, portals have been a way to recycle old lists, so they seem similar to categories. But in the Wikimedia world portals are not mere lists: see w:wp:Portal guidelines. When we select portal improvement as a month's task, the aim will probably be to make our portals more than just lists. So I do not support your proposal.--Erasmo Barresi (talk) 14:21, 3 August 2012 (UTC)
Errr.... Portals on en.WS primarily mirror the Library of Congress Classification system and behave more like a category than a subject tree. Trying to mirror encyclopedic or topical subject matter on a site that is dedicated to hosting source material was neither feasible nor desired. -- George Orwell III (talk) 15:40, 3 August 2012 (UTC)

Undated works

Hi. I was taking a look at the suggested item "Undated works". I noticed that entries "Xxxx Yyyyy (DNB00)" are in Category:Undated works. I am not familiar with the DNB project; probably there is a reason behind it. Otherwise, wouldn't it be possible to modify Template:DNB00 to automatically identify the volume date and categorize accordingly? Bye--Mpaa (talk) 21:21, 5 August 2012 (UTC)

Done. AdamBMorgan (talk) 13:50, 7 August 2012 (UTC)
Thanks Adam. I noticed another issue that overpopulates this category. See e.g. 43rd Annual National Thanksgiving Turkey presentation.
The year parameter in the header is not set (so the page is placed in Category:Undated Works by default), but Category:1990 works is also specified. So the work belongs to two self-contradictory categories.
A simple solution would be to specify the year in the header instead of specifying the category explicitly. But if for some reason one does not want to specify the year in the header, a possible solution could be to provide something like "override_year" in the header (similar to "override_translator") to avoid the default Undated works setting. Just a proposal for discussion … --Mpaa (talk) 18:05, 7 August 2012 (UTC)
I could add something along those lines. However, I think in these cases the editor(s) just didn't know about the header parameter or added the texts before it existed. I know SDrewthbot went through everything, moving the category year to the header template (I've seen it often enough in page histories). Perhaps asking Billinghurst for another pass would help? - AdamBMorgan (talk) 23:58, 7 August 2012 (UTC)
Couldn't there also be an "aesthetic" reason, i.e. not wanting the year displayed in brackets? BTW, I also saw pages with a year in the page title but no year parameter or category, e.g. 2008_U.S._Presidential_Debate_-_September_26. I would be glad to help but, not being aware of some choices, I wouldn't like to just go ahead and step on some toes :-)--Mpaa (talk) 06:47, 8 August 2012 (UTC)
Hi. One additional observation. Version pages are based on the header template, but the year parameter is usually not applicable to such pages, so version pages fall into the Undated works category. Currently there are 547 version pages.
I think the header template should provide a way not to tag a page as an Undated work if the year parameter is missing. --Mpaa (talk) 12:58, 9 August 2012 (UTC)
Two new parameters added: (1) "override_year" as you suggested. I copied the function of override_author, so it will display the override but not attempt to categorise it. When I applied this to {{versions}} I hit a problem in that it still needs to display something to work. So, (2) "no_year" just turns off the categorisation, which is what you suggested in the first place. I have left "override_year" just in case it has some use in the future. I have checked a few works and these changes seem to work without breaking anything. - AdamBMorgan (talk) 01:50, 10 August 2012 (UTC)
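The year handling described above can be modelled roughly as follows. This is a hypothetical Python sketch, not the header template's wikitext. It assumes that override_year only changes what is displayed, that categorisation depends solely on year, and that no_year turns categorisation off entirely (no year category and no Undated works fallback).

```python
from typing import Optional


def header_year(year: Optional[str] = None,
                override_year: Optional[str] = None,
                no_year: bool = False) -> tuple[Optional[str], list[str]]:
    """Hypothetical model: return (text shown in the header, categories added)."""
    # override_year is display-only; it is never used for categorisation.
    shown = override_year or year
    if no_year:
        # e.g. versions pages: categorisation switched off altogether.
        return shown, []
    if year:
        return shown, [f"Category:{year} works"]
    # Fallback when no year is given and categorisation is not suppressed.
    return shown, ["Category:Undated works"]
```

Under these assumptions, a versions page would pass no_year and stay out of Category:Undated works, while an ordinary work with a year gets exactly one year category and nothing else.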


While turning off the categorisation is useful, there are instances where simply hiding the year from displaying (while keeping the auto-cat if the year parameter is populated) would also be just as useful. For example....

  • Proclamation of July 19, 1864 (1864)

... is as redundant as it is ugly. I'd much rather have the ability to turn off the display of the year while keeping its auto-categorization than have to create another independent header to correct for nonsense like that. I wish the header remained mostly for navigation purposes instead of both information storage, etc. & navigation, but that's a conversation for another day and elsewhere, I suppose. -- George Orwell III (talk) 02:13, 10 August 2012 (UTC)

I agree with GOIII regarding the possibility of hiding the year from display. This might also explain why 'year' is not used (and Category:nnnn works is used instead, or in the worst case nothing). I do not think we should postpone the discussion on this point (which I also hinted at above). If this is not the right place we can post it where it belongs (Scriptorium?).
In the meanwhile I shall proceed with fixing the conflict between Category:nnnn works and Undated work as done by sDrewth in the past.--Mpaa (talk) 00:05, 11 August 2012 (UTC)
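Finding the pages with that conflict amounts to checking each page's categories for both markers. The sketch below is hypothetical: it assumes each page's category list has already been collected (for example, via the MediaWiki API) into a plain mapping, and it treats any category named like "Category:1990 works" as a year category.

```python
import re

YEAR_CATEGORY = re.compile(r"^Category:\d+ works$")
UNDATED = "Category:Undated works"


def conflicting_pages(page_categories: dict[str, list[str]]) -> list[str]:
    """Return titles filed under both Undated works and an explicit year category."""
    conflicts = []
    for title, cats in page_categories.items():
        if UNDATED in cats and any(YEAR_CATEGORY.match(c) for c in cats):
            conflicts.append(title)
    return sorted(conflicts)
```

Each title returned is a candidate for moving the year from the explicit category into the header's year parameter.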
Past observation shows that editing such a high-usage template as our header is better served when the debate and discussion take place on its own talk page with a pointer to proposed changes placed in WS:S, or when the proposal itself takes place on WS:S. I have been just as lazy as other folks in following up and closing such discussions by moving them to the appropriate talk page, no matter where they originally took place. I will keep this in mind moving forward, however, & urge others to do the same housekeeping. Documenting any changes should be a given as well. -- George Orwell III (talk) 00:38, 11 August 2012 (UTC)
While poking around, there seem to be other problems with the year at the moment. I don't think any work before 1000 AD is being categorised properly (I noticed a similar problem with the Author template recently). I'll mention it on the template's talk page. - AdamBMorgan (talk) 02:09, 11 August 2012 (UTC)

Wikisource namespace

There are some pages in the Wikisource namespace that could use some maintenance:

  • Wikisource:Requested texts needs some work. It's a bit of a mess at the moment and hasn't been maintained much (some requests have been here longer than I have). It might just need cleaning up, but there do seem to be a few different things going on here. It may be useful to slim this down or split it into different pages.
  • Wikisource:Proofreading is cleaner but not very up to date. It does not seem to get many edits per year, although they do trickle through. It could also be advertised and integrated into the rest of WS more; I didn't even know it existed until about a month ago.

These pages serve useful purposes but they don't seem to be in good condition at the moment. There are probably more pages like this around. If so, please add to the list. - AdamBMorgan (talk) 21:04, 12 October 2012 (UTC)

Summarising maintenance categories using {{PAGESINCATEGORY}}

There is probably some value in creating a page that acts as a meta directory to the maintenance categories that we wish to watch, showing a count for each using the PAGESINCATEGORY magic word, so that those who come along can see a quick list of things that need a human touch. It is an expensive count, but as we are not pulling it up on a traffic-intensive page that should be okay. — billinghurst sDrewth 12:46, 13 December 2012 (UTC)
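A page like that could be generated rather than maintained by hand. This hypothetical Python sketch writes the wikitext for a table with one row per category, each row calling the {{PAGESINCATEGORY:name}} magic word; the category names in the usage line are simply ones mentioned on this page. Each call counts as an expensive parser function, and MediaWiki caps those per page (100 by default, via $wgExpensiveParserFunctionLimit), so the list should stay reasonably short.

```python
def maintenance_directory(categories: list[str]) -> str:
    """Build wikitext for a table listing each category with its live page count."""
    rows = ['{| class="wikitable sortable"', "! Category !! Pages"]
    for name in categories:
        rows.append("|-")
        # {{{{ and }}}} are escaped braces, so this emits {{PAGESINCATEGORY:name}}.
        rows.append(f"| [[:Category:{name}]] || {{{{PAGESINCATEGORY:{name}}}}}")
    rows.append("|}")
    return "\n".join(rows)


print(maintenance_directory(["Undated works", "Soft redirects"]))
```

The leading colon in [[:Category:...]] links to the category instead of adding the directory page to it.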


Restore excerpts

Can we restore the excerpts and templates removed by Wikisource:Maintenance of the Month/Sister-project link standardization? As discussed in Wikisource:Scriptorium#Purpose_of_author_pages they are appropriate. As discussed in Wikisource:Scriptorium#WikiData_to_Headers there are no other viable alternatives. Jeepday (talk) 11:31, 19 January 2013 (UTC)

I can't see any way to do this as a general project. We can track what is in place now but not what used to be in place. I didn't personally notice any excerpts get removed, just a few unused templates being deleted, so I'm not sure how much of this took place. - AdamBMorgan (talk) 20:35, 18 February 2013 (UTC)

Buffer

We should probably build a buffer of tasks a few months ahead, rather than trying to pick one at the last minute, so we all know what's going on and can prepare (if necessary). Personally I'd like at least 2-3 months on the list at all times. The pattern so far has been alternating between lots-of-small-things and one-big-thing, which seems reasonable to continue. Does anyone have anything they want to give a little attention in the near future? - AdamBMorgan (talk) 20:29, 18 February 2013 (UTC)

In April, we could focus on proposed policies and guidelines. There are currently three categories for them (1 2 3), and Wikisource:Text integrity does not have a clear status.
In May, we could continue the broad categorization task.--Erasmo Barresi (talk) 22:06, 19 February 2013 (UTC)
That sounds like a reasonable plan. Later on we might want to designate some month to be Categorisation Month, like PotM's Validation Month; it seems like a task that might be around for a while. Do you want to discuss the YouTube video and the possibility of putting it back into rotation? - AdamBMorgan (talk) 23:28, 21 February 2013 (UTC)

Current: Wikidata and subpages

Should we be adding Wikidata links to subpages? Currently they show up in the list of pages not connected to Wikidata because the subpage tag doesn't pull in the wikidata link from the main page. The Haz talk 05:12, 10 February 2014 (UTC)

I think you refer to pages like Author:Emily Dickinson/M. I don't think we should create new Wikidata items for them, but we can connect them to items that already exist if they match them exactly. For example, we could add the aforementioned page to the same item as the Wikipedia article "List of Emily Dickinson poems whose title starts with M", if this existed.--Erasmo Barresi (talk) 20:24, 27 February 2014 (UTC)
Sounds good. Thanks, The Haz talk 22:20, 27 February 2014 (UTC)

Marking Progress on Wikidata Integration

I propose that we keep an infographic showing the dates and the count of remaining author pages disconnected from WD. Initial data:

Date        Count
2014-03-21  ~5300
2014-03-23  4723
2014-03-24  4666
see Wikisource:Maintenance of the Month/Wikidata ...

Once we have a few more data points, then we can port {{Bar chart}} from WP and plug data in. This may help the community focus on the task at hand and encourage more involvement when people see real progress getting made. -- DutchTreat (talk) 11:04, 23 March 2014 (UTC)

Update table for 2014-03-24 DutchTreat (talk) 09:00, 24 March 2014 (UTC)
Moved table to subpage: Wikisource:Maintenance of the Month/Wikidata - DutchTreat (talk) 22:54, 24 March 2014 (UTC)
Interesting idea, but if you're going to do this please go back before March 21, as a good amount of work had been done before that. The table currently offers a somewhat limited view of a larger-scope project. Also, I think you should list the times (and time zone) at which those numbers were captured, for the sake of accuracy. You could take measurements 47 hours apart and the dates would still be only one day apart. My opinion, of course. The Haz talk 23:48, 26 March 2014 (UTC)
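Recording full timestamps, as suggested above, lets the progress rate be computed from elapsed time rather than from calendar dates. This is a minimal, hypothetical Python sketch: the function name is illustrative, and it assumes each measurement is stored as a (timestamp, remaining count) pair with all timestamps in the same time zone.

```python
from datetime import datetime


def pages_per_day(points: list[tuple[datetime, int]]) -> float:
    """Average number of author pages connected per day between the first and
    last measurements, using real elapsed time rather than calendar dates."""
    if len(points) < 2:
        raise ValueError("need at least two measurements")
    ordered = sorted(points)
    (t0, c0), (t1, c1) = ordered[0], ordered[-1]
    days = (t1 - t0).total_seconds() / 86400
    if days <= 0:
        raise ValueError("measurements must be taken at different times")
    # Counts fall as pages are connected, so earlier minus later is progress.
    return (c0 - c1) / days
```

With the two exact counts in the table (4723, then 4666), and assuming the readings were taken at the same time of day, it reports 57 pages per day; two readings 47 hours apart would correctly count as about 1.96 days.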

Proposal to resolve orphaned pages

I'd like to resolve the ~2,000 or so pages not linked to by some other page (Orphans) listed HERE at some point in the coming year. Thoughts? -- George Orwell III (talk) 08:00, 31 May 2014 (UTC)

Good idea. Jeepday (talk) 11:30, 1 June 2014 (UTC)
I like the concept. Is there a way to filter out redirect pages from this list? Many cases a page was an orphan only because it was moved to a new name. The original name exists on the list because it contains the redirect to the new name. The team would operate more efficiently if these were not considered for clean-up. Otherwise, I agree this is a useful maintenance activity. I cleaned up a few after I saw your message. -- DutchTreat (talk) 10:30, 4 June 2014 (UTC)
useful (?) view.--Mpaa (talk) 11:37, 4 June 2014 (UTC)
Many of these need to be deleted (if older than two months). I will start going through Category:Soft redirects, which should help limit these kinds of orphans. Have you seen any redirects in here that are not from the "dated soft redirect" template?--BirgitteSB 01:56, 6 June 2014 (UTC)
Most of the links are part of Category:Soft redirects to translated works.
There are also pages that have a link only to: Wikisource:Maintenance of the Month/Base Pages and subpages. This might hide them from being considered orphans.--Mpaa (talk) 08:10, 6 June 2014 (UTC)
On the former, I asked the creator of that template about the original intention and these should be resolvable one way or another. The latter are probably really orphans, unless I am misunderstanding you. They would be in the scope of this project. BirgitteSB 12:33, 6 June 2014 (UTC)
Thanks @Mpaa and @BirgitteSB. Confirmed. When we filter out these two kinds of soft redirects, we'll have a focussed and useful list. Shaping up to be a nice, productive maintenance project! - DutchTreat (talk) 12:59, 7 June 2014 (UTC)
Sorry for the late input - and I agree with what has been discussed so far - but if one scrolls high enough to pass the mainspace listed works, one can still find hundreds of Author:, Index: & etc. orphans that do not fall into the "redirect trap" mentioned so far. -- George Orwell III (talk) 21:24, 7 June 2014 (UTC)
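The filtering discussed in this thread can be sketched in a few lines of Python. This is a hypothetical sketch, not an existing tool. It assumes the orphan list has already been fetched along with each page's redirect flag and categories, uses the two soft-redirect categories named above, and groups what remains by namespace so that the Author: and Index: orphans are easy to see. The namespace set is an assumed subset.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Categories named in this thread; adjust if the templates use others.
SOFT_REDIRECT_CATEGORIES = {
    "Category:Soft redirects",
    "Category:Soft redirects to translated works",
}
# Assumed subset of namespaces; any other title is treated as mainspace.
NAMESPACES = {"Author", "Index", "Portal", "Page", "Wikisource"}


@dataclass
class Orphan:
    title: str
    is_redirect: bool = False
    categories: set[str] = field(default_factory=set)


def namespace_of(title: str) -> str:
    prefix, sep, _ = title.partition(":")
    # Mainspace titles may contain colons too, so only known prefixes count.
    return prefix if sep and prefix in NAMESPACES else "(Main)"


def worklist(orphans: list[Orphan]) -> dict[str, list[str]]:
    """Drop hard and soft redirects, then group the remaining orphans by namespace."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for page in orphans:
        if page.is_redirect or page.categories & SOFT_REDIRECT_CATEGORIES:
            continue
        grouped[namespace_of(page.title)].append(page.title)
    return dict(grouped)
```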

Scheduled for August.--Erasmo Barresi (talk) 09:40, 11 July 2014 (UTC)

Authors by nationality

Would it be a good idea to look for pages in the Author namespace that don't belong to any of the categories in Category:Authors by nationality? Most of these should be easy to fix, e.g. Author:Samuel Halkett or Author:John McLure Hamilton (these were the first two I randomly checked and they didn't have a nationality category). --Azertus (talk) 19:23, 16 September 2014 (UTC)
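The check could be run as a simple set comparison. This is a hypothetical Python sketch: it assumes you have already collected each Author: page's categories and the set of subcategories of Category:Authors by nationality (fetching both, e.g. through the MediaWiki API, is left out). It checks direct membership only, so nationality categories nested more deeply would need to be added to the set first.

```python
def missing_nationality(author_categories: dict[str, set[str]],
                        nationality_categories: set[str]) -> list[str]:
    """Return Author: pages that sit in none of the given nationality categories."""
    return sorted(
        title for title, cats in author_categories.items()
        # An empty intersection means no nationality category at all.
        if not cats & nationality_categories
    )
```

The sorted output gives a stable worklist that can be split among contributors alphabetically.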