Wikisource:Scriptorium/Archives/2017-07

Warning: Please do not post any new comments on this page. This is a discussion archive first created on 01 July 2017, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Announcements

Do you create PDFs on Wikimedia wikis?

Hi everyone, I’m looking for feedback from people who use the function to create PDFs on the Wikimedia wikis, which feels relevant for Wikisource. In short, the main technology we’re using to render them – OCG – is breaking down. The code is old, it’s difficult to maintain, and if we don’t replace it now we might suddenly find ourselves in a situation where we'd have to take it down without having planned to do so.

We have some plans for the future over at mw:Reading/Web/PDF Functionality. If you care about the PDF function, please head over there and tell us on the talk page if anything is missing, or if there’s something in there we shouldn’t spend our time and energy on. /Johan (WMF) (talk) 12:19, 18 May 2017 (UTC)

Proposals

Bot approval requests

Repairs (and moves)

Other discussions

Tech News: 2017-25

15:44, 19 June 2017 (UTC)

Ship track upload as documentary source?

I'm about to receive a track of the ACX Crystal, recently involved in a collision in Japanese waters. Would this be proper to upload here as a "documentary source"? I expect it to be in a tabular format that can then be converted to a graphic, but not yet plotted as a graphic. - Bri (talk) 18:03, 19 June 2017 (UTC)

what is the license? if it is a document of tabular data, you could argue for PD in the US, but the pdf of the document would go to commons first. or do you want to upload here as "fair use"? Slowking4SvG's revenge 11:42, 20 June 2017 (UTC)
Further to this, we don't allow "fair use" on Wikisource, and we also don't allow reference material such as tables of data unless it is published as part of a complete source text. —Beleg Tâl (talk) 12:25, 20 June 2017 (UTC)
but we very well could, would, and should. given the propensity of commons to delete books in use, it is a matter of time. Slowking4SvG's revenge 19:01, 20 June 2017 (UTC)
If it's added to Commons as '.map' data, it'd be plotted automatically. Like commons:Data:Wikimedians.map for example. I'm not sure Wikisource is the place for pure data. Sam Wilson 12:30, 20 June 2017 (UTC)
Commons:Structured data is acceptable to be uploaded to Commons, usual copyright applies. I would not think that a track would be copyright as fact is not copyrightable. — billinghurst sDrewth 05:05, 24 June 2017 (UTC)

Problem with a pdf file

A pdf file has a problem! When I download it and I go to page 172 using Acrobat reader, I see the page but in the wikisource, no page is shown. This is the page address in fa.wikisource.org. Please help me to solve it. --Yousef (talk) 11:17, 22 June 2017 (UTC)

The page is visible to me. You need to purge your cache. Hrishikes (talk) 12:18, 22 June 2017 (UTC)

Search projects from this project now active in English Wikipedia

Just to let you know, as announced via mailing list service, English Wikipedia is now receiving search results of this project, Wikisource, intended to direct Wikipedia users to this project. Currently, an option to suppress the search results of this project from the English Wikipedia search system is proposed at Village pump's "proposal" subpage, where I invite you to comment. --George Ho (talk) 19:04, 22 June 2017 (UTC)

How do you contribute to Wikisource?

Hi everyone,

I have been proofreading a few pages here, but I feel like I don't really understand how this place works. There are many, many projects started, some of them lingering for years. I don't even know how to find out how many books are finished, how many books are ongoing. It seems like a lot of people work for some pages on a book, alone, then very often give up, because this is a very long and sometimes boring task. Apart from a few discussions on the Current Collaborations, I don't see where people talk, so I don't feel like there is an active community. Am I missing a magical place where people discuss, exchange, organize?

A few years ago, I participated in PGDP, where there is a very active forum, with a thread for each project where the different proofreaders can exchange on the formatting or the difficulties to reach a consistent result, or even just share the most interesting/funny quotes of the books they are working on. There were also some specialized teams, like one named the gravediggers if I remember correctly, which focused on the oldest projects, or teams for texts on a specific topic, which could gang up on a given book at the same time. This was made possible by the existence of statistics at the book level, not only at the page level.

So:

  • Is there a lot of discussion and organisation going on somewhere I don't know (other talk pages? IRC? mailing-lists?)?
  • Would you be interested in statistics at the project level? (e.g. list of projects with the progress percentage, so that we can quickly finish works almost done, or focus on the oldest ones). I think I could code something giving regular updates. Actually, does it exist in other wikisources?

Koxinga (talk) 20:12, 22 June 2017 (UTC)

This very page (the Scriptorium) is our central discussion forum. You've come to the right place! Discussions regarding a specific project are done on the Index talk page. Bigger projects are organized as WikiProjects. Other discussion forums and lists of places to contribute are listed at Wikisource:Community portal. I'll let someone else speak to statistics as I don't know much about that. The best place to contribute if you don't know where to contribute is probably the proofread of the month. —Beleg Tâl (talk) 21:02, 22 June 2017 (UTC)
dashboard for wikisource progress? yes please! the example that comes to mind is Wikisource:WikiProject DNB/Statistics and Wikisource:WikiProject DNB/Progress. but in general we are too disorganized to actually do reporting, except ad hoc. some tools to make project management & progress communication would be fine. we should really do a wish list, or you could write an idealab - quick grant, if you could write up your own scope. Slowking4SvG's revenge 22:20, 22 June 2017 (UTC)
Of course I know about the talk pages and the Scriptorium, but it is just so empty. There is no feeling of community here. Koxinga (talk) 22:43, 23 June 2017 (UTC)
The Special Pages link on the left-hand side gives you access to a lot of interesting information, and particularly List of index pages is the page to see if you want to find projects at various stages of completion. — I think one of the strengths of English Wikisource is it (usually) allows you to start and work on all sorts of projects autonomously, but that does result in a lot of unfinished projects and makes the community spirit a little hard to see at times. I've put up a lot of index pages that I'd like to work on "some day" and a couple of times I've come across one that someone has taken on and finished, which was extremely gratifying. — One thing I do to contribute is search for common scan errors and correct them. One of my favorites has been "thou earnest" for "thou camest". That's a good way to get a glimpse of a lot of interesting material. Anyway, I hope you'll be sticking around, and I agree that more community interaction would be a good thing! Mudbringer (talk) 01:34, 23 June 2017 (UTC)
To add to this, a lot of editors will add a list of the projects they're working on to their user page, so you can get an idea of what people are up to by looking there. Special:RecentChanges will also show what people are currently working on. —Beleg Tâl (talk) 11:54, 23 June 2017 (UTC)
Yes, that's exactly what I mean. It is very gratifying to see someone else working on the same project. On the other hand, I have been back after a hiatus of a year, to find that not a single page had been proofread in the meantime. I do work on some rather specific topics, with Chinese characters that might frighten some contributors, but still, this is rather disheartening. Koxinga (talk) 22:43, 23 June 2017 (UTC)
I enjoy contributing to wikisource, it's one of my favorite pastimes. I like the idea of adding works here and making them available for future generations. Maybe someone 100 years down the line will be reading some of the works we've been adding. I also like the idea of me being able to read works I've never read before and also at the same time making them available for other readers to read. But it has to be enjoyable for me, so I mainly work on subjects I'm interested in and as you mentioned I often might start a book and get disinterested, and then just forget about it. I don't care. This isn't a job, I don't have to contribute if I don't want to, I can wake up tomorrow and never contribute to wikisource again and probably no one will ever notice. I don't want deadlines here, I have them at work. I like contributing here to get away from work and relax. So basically wikisource for me is something enjoyable to do in my free time and having to be forced to finish a work, or work on books we're not interested in just to get it done is the wrong way to go for me. Don't get me wrong, we should strive to get the works we're working on finished, but if we don't or can't who cares, someone else will probably get it done down the line. Jpez (talk) 11:26, 23 June 2017 (UTC)
I am not talking about setting deadlines or anything like that. It is fine if your motivation is entirely internal and you can work alone at your own pace. However, I do think we would get more contribution with more reporting on what is going on, what are the projects moving forward, what are the projects close to completion, etc. Koxinga (talk) 22:43, 23 June 2017 (UTC)
welcome to smaller wikis. there is less chatter and drama, and more work done. a little coaching (management) would be welcome. people tend to ask for help here, ad hoc, rather than systematic reporting; people team to get a project done. we could use a wikisource newsletter, or progress dashboard. if you could make some tools to report project progress semi-automatically, rather than by hand, that would be a big help. Slowking4SvG's revenge 15:19, 26 June 2017 (UTC)
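
The project-level progress figure discussed in this thread could be derived from per-page proofreading status. What follows is only a minimal sketch in Python, assuming the page counts for each Index have already been collected by some other means; the `IndexCounts` type, its field names, the 90% threshold, and the sample numbers are all hypothetical and are not taken from any existing Wikisource tool.

```python
from dataclasses import dataclass


@dataclass
class IndexCounts:
    """Page counts for one Index, by proofreading status (hypothetical input)."""
    title: str
    not_proofread: int
    problematic: int
    proofread: int
    validated: int
    without_text: int = 0  # blank pages, which need no further work

    @property
    def total(self) -> int:
        return (self.not_proofread + self.problematic + self.proofread
                + self.validated + self.without_text)

    def percent_proofread(self) -> float:
        """Share of pages that are at least proofread, or that need no text."""
        if self.total == 0:
            return 0.0
        done = self.proofread + self.validated + self.without_text
        return 100.0 * done / self.total


def nearly_done(indexes, threshold=90.0):
    """Return unfinished indexes at or above the threshold, most complete first."""
    candidates = [i for i in indexes if threshold <= i.percent_proofread() < 100.0]
    return sorted(candidates, key=lambda i: i.percent_proofread(), reverse=True)


# Made-up sample data, for illustration only.
sample = [
    IndexCounts("Index:Example A.djvu", not_proofread=5, problematic=0, proofread=80, validated=15),
    IndexCounts("Index:Example B.djvu", not_proofread=60, problematic=2, proofread=30, validated=8),
    IndexCounts("Index:Example C.djvu", not_proofread=0, problematic=0, proofread=40, validated=60),
]

for index in nearly_done(sample):
    print(f"{index.title}: {index.percent_proofread():.1f}% proofread")
```

Sorting by completion is one way to surface works that are almost done; sorting the same list by the date of the last edit would instead surface the oldest stalled projects.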

Collaboration products newsletter: 2017-06

08:41, 23 June 2017 (UTC)

License tags in Translation space

What is the best way to put license tags in Translation space? The original work needs an explicit license tag, but I'm not sure about the translation itself. I assume it will always be CC-BY-SA-3.0 and GFDL, but I've seen some editors explicitly release it into PD. Is this allowed? Should the CC-BY-SA-3.0/GFDL licenses be explicitly tagged? I've been tagging them explicitly, as below, but I just want to see if others have a better way.

{{translation license
| original = {{PD-old}}
| translation = {{CC-BY-SA-3.0}}{{GFDL}}
}}

Beleg Tâl (talk) 13:22, 23 June 2017 (UTC)

Our rider on saving is: "By saving changes, you agree to the Terms of Use, and you irrevocably agree to release your contribution under the CC BY-SA 3.0 License and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license." So that is what applies to contributor work in the Translation: namespace. Until we update that, that is what it is. — billinghurst sDrewth 22:33, 23 June 2017 (UTC)

The Time Machine (Heinemann text)

Hrishikes has brought an issue to my attention, which I have looked into as well. This is a bit complicated, so I will summarize, then say more at length.

Summary: Our copy of The Time Machine (Heinemann text) is not the 1895 Heinemann text of the novel by H. G. Wells, but seems rather to be the 1924 revised "Atlantic" text included in an omnibus edition The Time Machine, The Wonderful Visit and Other Stories published by T. Fisher Unwin. [17] As H. G. Wells died in 1946, his works are in PD in the UK. The omnibus was printed in the UK in 1924, and does not seem to have had copyright renewal in the US. So it may be in PD in the US. Hrishikes has located a scan of the Heinemann text and started transcription. So, if our copy of the "not-Heinemann" (Atlantic) text is in PD in the US, then we need to move it to a new location and make room for the actual Heinemann text. But if it is not in PD, then it should be deleted. As an added wrinkle, the "not-Heinemann text" is a Wikisource Featured text.

Identity of the text located at The Time Machine (Heinemann text): It is easily seen that our current copy is not the Heinemann text. Compare the table of contents for the actual Heinemann text with the one on our current copy. The number of chapters and their presentation are completely different. The Heinemann text has 16 chapters with chapter titles, but our copy has 12 chapters without titles. Neither did the 1895 Holt text have 12 chapters. The earliest edition with 12 chapters seems to be the "Atlantic" text that was the result of a revision. The "Atlantic" text may be seen here in an electronic version that preserves the original pagination and page headers.

The Atlantic text and copyright: The "Atlantic" text was published as part of an omnibus edition of Wells' works in the UK in 1924. Details of that publication may be found here. I do not know whether the text was simultaneously published in the US, possibly under a different title, or whether copyright was applied for at that time. However, a search has turned up no evidence of a renewal for that volume. If so, then it seems the copyright in the US for the Atlantic text has expired. The original text was published in 1895, so it would be PD in the US as well, and all of Wells' works entered PD in the UK at the beginning of this year, as it has now been more than 70 years since his death.

Proposed actions:

(1) Feedback and confirmation of findings thus far. Is our text the Atlantic text?
(2a) If our text is the Atlantic text, and in PD, then propose moving it to The Time Machine (Atlantic text), and then proofreading and transcluding the actual 1895 Heinemann text to The Time Machine (Heinemann text) from the scan Index:The Time Machine (H. G. Wells, William Heinemann, 1895).djvu begun by Hrishikes.
(2b) If our text is not the Atlantic text, or is but not in PD, then delete it and proceed with adding the actual Heinemann text from scan etc.
(3) Decide about Featured status for the text. (Let's wait on that discussion until we know whether we're following 2a or 2b).

Original discussion: User talk:EncycloPetey#The Time Machine (Heinemann text). --EncycloPetey (talk) 17:38, 23 June 2017 (UTC)

1- inclined to agree based on chapters, but could not find an internet archive version, or at hathi trust, and not near me at worldcat [18]
2- i would be inclined to keep both, and change the header data for the reprint. (is it Heinemann text, published by Atlantic?)
2- do not see a reason for deletion (although there is a Scribner 1924 edition)
3- we can have delisted featured, we should think about all the old versions not transcluded from page scans
4-- i imagine we will have more of this, as we research editions. (and as our scholarship improves) the metadata at internet archive is so bad, people could be easily confused. Slowking4SvG's revenge 17:54, 23 June 2017 (UTC)
It's not the Heinemann text. The two texts are completely different editions, even having a different number of chapters (16 versus 12). The concern over deletion is that, if this is a 1924 publication, and if copyright was renewed, this edition might not be in PD yet. My research didn't turn up anything, but someone else's search might do so. --EncycloPetey (talk) 18:33, 23 June 2017 (UTC)
if you did not find anything, that is good enough for me. under the current US copyright search, that is the best result you can get. there is no positive proof of non-renewal. we have to set the standard of "good faith search" even if there is a very small chance of facts emerging. this is the standard of hathi trust. Slowking4SvG's revenge 17:26, 24 June 2017 (UTC)
I'd prefer The Time Machine (1924) as the page name, but aside from that I agree with your assessment and support your proposed actions. —Beleg Tâl (talk) 18:20, 23 June 2017 (UTC)
Unless we can verify for certain that the text is specifically from a 1924 edition, I'd hesitate on adding a date to the filename. Doing so might require further changes to the name later, if research turns up additional information. But if we can verify that it is the "Atlantic text", from any edition of that text, then the proposed name will work regardless of the actual date. --EncycloPetey (talk) 18:33, 23 June 2017 (UTC)
  • Comment: It is now an edition of a work with an uncertain source, we could just delete it if it doesn't bring true value. With regard to its copyright status, that does not change whether it is a 1924 version, or not, the copyright will always be the original version. Any copyright in the remainder of the suspected publication will depend on each of the components, and the renewal aspects. — billinghurst sDrewth 22:30, 23 June 2017 (UTC)
    As far as I am aware, the 1924 edition was a complete revision of the text by Wells himself, and not merely an editorial version. Does that affect the possibility of copyright? --EncycloPetey (talk) 22:35, 23 June 2017 (UTC)
    If it wasn't published before 1923, and wasn't previously published in an authorized version in the US, the URAA would have restored it. It's hard to say where the line is legally between a non-copyrightable new version and copyrightable changes, but a decent revision should do it. It will be out of copyright in the US in 2020.--Prosfilaes (talk) 01:19, 24 June 2017 (UTC)
    Expert opinion from H. G. Wells's The Time Machine: A Reference Guide (2004) by John R. Hammond, page 19:

In the original edition of The Time Machine, published by Heinemann in 1895, the text was divided into sixteen chapters, and each chapter was given a title. When Wells revised his novels for a collected edition in 1924, the Atlantic Edition, he retained the text of The Time Machine virtually unaltered but reduced the number of chapters from 16 to 12, eliminating the chapter titles.

Most modern editions follow Wells's revision in dividing the text into twelve chapters. In the discussion that follows chapter references follow this practice.

A comparison of the chapter divisions is as follows:

Heinemann (1895)                        Atlantic (1924)
 1  Introduction                        1
 2  The Machine                         1
 3  The Time Traveller Returns          2
 4  Time Travelling                     3
 5  In the Golden Age                   4
 6  The Sunset of Mankind               4
 7  A Sudden Shock                      5
 8  Explanation                         5
 9  The Morlocks                        6
10  When the Night Came                 7
11  The Palace of Green Porcelain       8
12  In the Darkness                     9
13  The Trap of the White Sphinx        10
14  The Further Vision                  11
15  The Time Traveller's Return         12
16  After the Story                     12
    Epilogue                            Epilogue

As per above, Heinemann chapter divisions were original, but Atlantic chapter divisions are currently in vogue. "Virtually" no difference in text. So I propose that the text may be migrated to scan, with title unchanged, along with additional chapters. Two pages are missing in the scan, which I am going to fix by blank placeholders. The blanks may be proofread from the Atlantic text. Hrishikes (talk) 02:00, 24 June 2017 (UTC)
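
For anyone matching the existing 12-chapter text against the 16-chapter scan, the comparison table quoted above can be written down as a lookup. This is a rough sketch in Python; the chapter numbers and titles are copied from the table, while the variable and function names are made up for illustration. The unnumbered Epilogue, which appears in both editions, is left out.

```python
from collections import defaultdict

# Heinemann (1895) chapter number -> (chapter title, Atlantic chapter number),
# copied from the comparison table quoted above. The Epilogue is unnumbered
# in both editions and is therefore omitted.
HEINEMANN_TO_ATLANTIC = {
    1: ("Introduction", 1),
    2: ("The Machine", 1),
    3: ("The Time Traveller Returns", 2),
    4: ("Time Travelling", 3),
    5: ("In the Golden Age", 4),
    6: ("The Sunset of Mankind", 4),
    7: ("A Sudden Shock", 5),
    8: ("Explanation", 5),
    9: ("The Morlocks", 6),
    10: ("When the Night Came", 7),
    11: ("The Palace of Green Porcelain", 8),
    12: ("In the Darkness", 9),
    13: ("The Trap of the White Sphinx", 10),
    14: ("The Further Vision", 11),
    15: ("The Time Traveller's Return", 12),
    16: ("After the Story", 12),
}


def heinemann_chapters_by_atlantic(mapping):
    """Group Heinemann chapter numbers under the Atlantic chapter holding them."""
    grouped = defaultdict(list)
    for heinemann_no, (_title, atlantic_no) in sorted(mapping.items()):
        grouped[atlantic_no].append(heinemann_no)
    return dict(grouped)


grouped = heinemann_chapters_by_atlantic(HEINEMANN_TO_ATLANTIC)

# Atlantic chapters that combine more than one Heinemann chapter.
merged = {a: h for a, h in grouped.items() if len(h) > 1}

print(len(grouped))  # number of Atlantic chapters covered by the table
print(merged)
```

Going by the table, four Atlantic chapters (1, 4, 5 and 12) each combine two Heinemann chapters, which accounts for the drop from sixteen chapters to twelve.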

The disadvantage of that approach is that we will have no copy of The Time Machine with the chapter divisions that are now in vogue. If we can legally retain a copy of the Atlantic text, then we should do so for this reason. --EncycloPetey (talk) 02:03, 24 June 2017 (UTC)
Wells's books are PD-UK. But the policy here is PD-US. Non-US texts need not have copyright registration/renewal in the U.S.; the copyright is restored by the URAA for 95 years after publication. So we have to assess whether modification of chapter divisions, without alteration of text, amounts to significant change, attracting copyright. If the change is deemed as significant, then we cannot retain this text. Anyway, reduction in chapter number and elimination of chapter titles in the currently-in-vogue version of the work may be mentioned in the header note, that should suffice.
P. S. It seems that the Atlantic edition was published in U. S. in the same year (1924) by Charles Scribner's Sons (details at http://www.isfdb.org/cgi-bin/pl.cgi?614641) without copyright notice/renewal. Hrishikes (talk) 03:20, 24 June 2017 (UTC)
Adding chapter names might have been copyrightable, but removing them wouldn't be, and splitting a few chapters in two pieces wouldn't be either. I don't know whether that copyright renewal would have been needed, since it's within 30 days of first publication, but the changes don't seem copyrightable.--Prosfilaes (talk) 00:32, 25 June 2017 (UTC)
This site gives a date of October 15, 1924 for the first two volumes in the Atlantic Edition of The Works of H. G. Wells, which includes the text in question. —Beleg Tâl (talk) 01:52, 27 June 2017 (UTC)

Proposed action:

Given that: (a) the original work is PD in both UK and US, (b) the "Atlantic" text seems not to differ substantially except by removal of chapter titles and positioning of breaks, I propose we take the following actions:

(1) Move The Time Machine (Heinemann text) to The Time Machine (Atlantic text) to preserve this version.
(2) Add to the empty The Time Machine (Heinemann text) the front matter from the 1895 scan.
(3) Paste into each chapter subpage the relevant Atlantic text, then split-and-match to the Page namespace of the scan.
(4) Proofread the result against the Heinemann text scan, keeping alert for differences.
(4a) If proofreading demonstrates that the Atlantic text is indeed identical or inconsequentially different from the Heinemann text, then we keep both.
(4b) If proofreading reveals significant editorial changes, we can then delete the Atlantic text at its new location, perhaps moving a copy to Wikilivres, and restoring it in 2020 when the US copyright would expire.

--EncycloPetey (talk) 00:45, 25 June 2017 (UTC)

Agreed. I don't think copyright will matter, anyway, it is PD-US-no notice. Additionally, I propose that the header note should mention metadata of this edition, including UK publication by Unwin and US publication by Scribner. And the Featured Text status should move to this new location of the Atlantic text. Hrishikes (talk) 02:17, 25 June 2017 (UTC)
It's only PD-US-no notice if it was published in the US within 30 days of first publication in the UK. Otherwise the copyright (if any) was restored.--Prosfilaes (talk) 03:41, 25 June 2017 (UTC)
i do not believe we have deleted a work based on URAA, so you may not want to open that can of worms, given the WMF legal advice. Slowking4SvG's revenge 14:37, 26 June 2017 (UTC)
@Slowking4: URAA-based deletion is a regular feature here. Premchand's Idgah was deleted under URAA provision, and later restored when it was proved that it was PD-India on URAA date. The works of Jibanananda Das were shifted to Wikilivres under URAA provision. Same with Sokoli Tomari Iccha and Naya Kashmir. There are many more examples. Non-US works are regularly deleted here when it is found that they were not PD-source country on URAA date. The WMF legal advice you referred to is for allowing foreign works that are PD-source country on current date, not merely URAA date. On that advice, Commons has stopped deletion of works that were not PD-source country on URAA date. This practice has not yet started here. If it starts, then the works of Jibanananda Das will need to be restored. Adopting this policy here is risky. You will do well to remember the direct deletion of Anne Frank's Diary by WMF in Dutch Wikisource, overriding the local community, based on URAA. Hrishikes (talk) 17:00, 26 June 2017 (UTC)
this case is very clearly PD not renewed. what evidence do you need? do you want a transcribed catalog of copyright entries?
sorry to hear you are propagating the URAA hysteria. let the restorations begin. i remember that about Anne Frank, why don’t you let me upload it here as fair use, since it is PD in Australia, and i will take the risk. i do not think that the plaintiff will risk a DMCA takedown given w:Lenz v. Universal Music Corp. the federal judges are very consistent, and i have the $10k ante for federal court, don’t need any EFF help. Slowking4SvG's revenge 22:32, 26 June 2017 (UTC)
In order to state clearly this is "PD not renewed", we would need evidence that the edition was registered for copyright in the US within 30 days of the UK publication. Lacking evidence for that, we cannot say for certain this work falls under PD not renewed. If the original copyright was not filed in the US, or was not filed in 30 days, then the edition may retain copyright under URAA. That's rather the whole point. We need evidence of the original copyright filed and meeting the conditions, and we still need to verify that the text was not substantially altered. If no copyright was filed at the correct time, and if the text is substantially altered, this edition may still be under copyright. --EncycloPetey (talk) 22:47, 26 June 2017 (UTC)
If it was published with permission of the copyright holder within 30 days in the US, it's treated as a US work and is out of copyright for lack of notice as well as lack of renewal. If it wasn't an authorized edition, or it was more than 30 days after the UK edition, then any new copyrightable aspects will be under copyright.
Honestly, this seems like a bit much. There's no real evidence that there's anything copyrightable here, and if there is, there's three years left on its copyright. Someone should split and match it against the old scans, but marginal copyright questions like this shouldn't be that much of a concern, IMO.--Prosfilaes (talk) 04:42, 27 June 2017 (UTC)
I'm inclined to agree with this, especially if we aren't able to determine whether the two publications were 30 days apart. By the time we have all the information we need to know whether it is subject to URAA or not, the copyright may well have already expired. —Beleg Tâl (talk) 05:25, 27 June 2017 (UTC)
registration date is here - Oct. 17, 1924 [19] Slowking4SvG's revenge 14:57, 27 June 2017 (UTC)
Anne Frank's Diary doesn't belong here, since the English translation will be in copyright until 2045 (in the US), and the translator was alive as of 2013. Feel free to bring it up with Commons or nl.Wikisource.--Prosfilaes (talk) 04:42, 27 June 2017 (UTC)
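
The date arithmetic argued over in this thread can be laid out in a few lines. The following is a minimal sketch in Python of the rule as the participants describe it (a 95-year term after publication, and a 30-day window for authorized US publication); it is an illustration of the reasoning, not a statement of the law. The function names are invented, and the two dates are the ones cited in the thread, used here only as sample inputs, since the thread does not settle which publication each date belongs to.

```python
from datetime import date
from typing import Optional

TERM_YEARS = 95                # term mentioned in the discussion
SIMULTANEOUS_WINDOW_DAYS = 30  # window mentioned in the discussion


def us_public_domain_year(publication_year: int) -> int:
    """First calendar year in which a 95-year term has fully run out."""
    return publication_year + TERM_YEARS + 1


def treated_as_us_work(first_publication: date,
                       us_publication: Optional[date],
                       authorized: bool) -> bool:
    """True if an authorized US edition appeared within the 30-day window.

    Assumes the US edition did not precede the first publication.
    """
    if us_publication is None or not authorized:
        return False
    gap = (us_publication - first_publication).days
    return 0 <= gap <= SIMULTANEOUS_WINDOW_DAYS


# Sample inputs only: the two dates cited in the thread.
first = date(1924, 10, 15)
later = date(1924, 10, 17)

print(us_public_domain_year(1924))
print(treated_as_us_work(first, later, authorized=True))
print(treated_as_us_work(first, date(1925, 1, 1), authorized=True))
```

Under these assumptions a 1924 publication leaves copyright in 2020, which matches the year given in the discussion, and a two-day gap falls well inside the window while a gap of several months does not.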

Disambiguation quandary

The work Once a Week is a literary magazine, but it shares the title with a book by Author:A. A. Milne.

Ordinarily, we would move Once a Week to something like Once a Week (magazine), and use the base name for disambiguation. But the current title is a literary magazine that already has multiple subpages for its series, volumes, and articles. A move would permanently extend the filename of all of the subpages, and require editing all of the links within and between these pages, both in headers and in the Page: namespace.

In this instance, where there is a multi-volume literary magazine involved, would it make more sense to set the disambiguation page at Once a Week (disambiguation), and leave the magazine where it is? --EncycloPetey (talk) 19:19, 23 June 2017 (UTC)

I'm willing to use AWB to disambiguate properly on the magazine. However, is the Milne work being added imminently? If not, there is no need to disambig yet. —Beleg Tâl (talk) 19:33, 23 June 2017 (UTC)
Although the Milne book is not being done yet (there is a good scan at IA [20]), the literary magazine is actively and rapidly growing on Wikisource each day. The longer we delay, the more moves and changes will have to be made. --EncycloPetey (talk) 19:41, 23 June 2017 (UTC)
That's a good rationale. I'll move it over when I'm on my other PC. —Beleg Tâl (talk) 19:54, 23 June 2017 (UTC)
Just to note that the articles are being created as mainspace base pages rather than subpages of the issue. e.g. The philosophy of advertising. Beeswaxcandle (talk) 20:09, 23 June 2017 (UTC)
Good to know. I'll move them to the proper path while I'm at it. —Beleg Tâl (talk) 20:12, 23 June 2017 (UTC)
The Mainspace articles probably ought to be subpages within series, volume, etc., but with redirects left from the Main namespace. I was looking into making those moves when I discovered the disambiguation issue, and decided it ought to be taken care of first. --EncycloPetey (talk) 20:16, 23 June 2017 (UTC)
Agreed. —Beleg Tâl (talk) 20:29, 23 June 2017 (UTC)

Facsimiles of older United States Reports post Google Books' typical full view cut off

Anybody know where these might be found? Prosody (talk) 19:20, 24 June 2017 (UTC)

These volumes are already present at {{List of United States Reports scanned volumes}}. Are you wanting something additional? Hrishikes (talk) 23:47, 24 June 2017 (UTC)
I was unclear, sorry. There are 564 volumes now, and Google Books only has facsimiles publicly available for US users for ones published before ~1920s (not sure what their copyright restriction policies are for users in other countries). Since asking I've found that Internet Archive seems to have some more. Prosody (talk) 17:06, 25 June 2017 (UTC)
the National Archives has it on microfilm through 1997 https://www.archives.gov/research/guide-fed-records/groups/267.html let’s see if i can find a digital copy at citizen archivist. Slowking4SvG's revenge 23:14, 25 June 2017 (UTC)
can’t find a systemic digitization. we have US govt documents, but they are haphazard. maybe a project with a sweep of the scans available would be a start. we have a few of these large projects that are stalled because the scans are crummy and it is so humongous. Slowking4SvG's revenge 01:32, 28 June 2017 (UTC)

Tech News: 2017-26

15:38, 26 June 2017 (UTC)

A word about clearing the cache and page refresh

We are not alone. Ineuw talk 19:30, 26 June 2017 (UTC)

How to see edit history on a whole text

Is it possible to see the edit history of a whole text? I can see the changes made in the last 30 days through selecting "On Watchlist" in the general Wikisource "Recent Changes" page. I would like to look back and see if anyone or any bot has been working on the project I have been working on, namely An_Exposition_of_the_Old_and_New_Testament_(1828). PeterR2 (talk) 09:31, 27 June 2017 (UTC)

@PeterR2: I'm not sure I understand what you mean by seeing the edit history of a whole text, but if you open the page An Exposition of the Old and New Testament (1828) and then click the "Related changes" link in the left panel (in the "Tools" section), is that what you need? The page opened this way shows edits made to the viewed page and to its subpages (and also to other pages linked from the main page), so you can see the edits to the whole text of the work (since the whole text consists of the main page combined with all of its subpages). P.S. Sorry if I misunderstood your request. --Nigmont (talk) 21:16, 27 June 2017 (UTC)
I would love to see an option on the watchlist to automatically watch all the subpages of a given page. There are some mediawiki extensions doing that, was the possibility already discussed here? Koxinga (talk) 21:57, 27 June 2017 (UTC)
There is a gadget (although, I can't find it right now because I can't remember what it was called) for watching all pages in a category. There was an idea earlier this year to extend it to cope with following all pages linked on an Index page, but I don't think that bit was finished. As for seeing all history of a work, I think Special:RelatedChanges is the only way, and that has some limitations (mainly that it only goes back 30 days, because it's using data from RecentChanges). Sam Wilson 22:57, 27 June 2017 (UTC)
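
One possible way around the 30-day limit is to ask the MediaWiki Action API for the latest revision of every subpage of a work. The snippet below is a sketch in Python that only builds the request URL and does not contact the site; the helper name `latest_revisions_query` is invented, and the query has not been run against the live wiki. It covers main-namespace subpages only, so edits in the Page: namespace would need a separate query, and results beyond `limit` would need the API's continuation mechanism.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://en.wikisource.org/w/api.php"


def latest_revisions_query(work_title: str, limit: int = 50) -> str:
    """Build an API URL listing the latest revision of each subpage of a work."""
    params = {
        "action": "query",
        "format": "json",
        # Enumerate main-namespace pages whose titles start with "<work>/".
        "generator": "allpages",
        "gapnamespace": 0,
        "gapprefix": work_title + "/",
        "gaplimit": limit,
        # For each page found, return who last edited it and when.
        "prop": "revisions",
        "rvprop": "timestamp|user",
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = latest_revisions_query("An Exposition of the Old and New Testament (1828)")
print(url)

# Decode the query string again to check what would be sent.
sent = parse_qs(urlsplit(url).query)
print(sent["gapprefix"][0])
```

Because the request is not tied to RecentChanges, it reports the most recent edit to each subpage however long ago it was made, though it shows only the latest revision per page rather than a full history.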

Tech News: 2017-27

15:31, 3 July 2017 (UTC)

Pagelists

Anyone want to finally clear this backlog? There are some I don't feel happy working with for copyright reasons. ShakespeareFan00 (talk) 14:10, 4 July 2017 (UTC)

Join the strategy discussion. How do our communities and content stay relevant in a changing world?

Hi!

I'm a Polish Wikipedian currently working for WMF. My task is to ensure that various online communities are aware of the movement-wide strategy discussion, and to facilitate and summarize your talk. Now, I’d like to invite you to Cycle 3 of the discussion.

Between March and May, members of many communities shared their opinions on what they want the Wikimedia movement to build or achieve. (The report written after Cycle 1 is here, and a similar report after Cycle 2 will be available soon.) At the same time, designated people conducted research outside of our movement. They:

  • talked with more than 150 experts and partners from technology, knowledge, education, media, entrepreneurs, and other sectors,
  • researched potential readers and experts in places where Wikimedia projects are not well known or used,
  • researched by age group in places where Wikimedia projects are well known and used.

Now the research conclusions are published, and Cycle 3 has begun. Our task is to discuss the identified challenges and think about how we want to change or adapt to the changes happening around us. Each week, a new challenge will be posted. The discussions will take place until the end of July. The first challenge is: How do our communities and content stay relevant in a changing world?

All of you are invited! If you want to ask a question, please ping me. You might also take a look at the FAQ (recently changed and updated).

Thanks! SGrabarczuk (WMF) (talk) 14:53, 5 July 2017 (UTC)

Wikilivres is now Bibliowiki

Wikilivres has moved and rebranded; they are now Bibliowiki and are located at https://biblio.wiki . Our internal references to Bibliowiki need to be updated.

  • Documentation needs to be updated (I can do this, although it may take a while for me to get to it).
  • The interwiki map for [[wikilivres:foobar]] needs to be updated to point to the correct location, and [[bibliowiki:foobar]] should be created as a preferred alternative.
  • Probably other stuff I haven't thought of.

Beleg Tâl (talk) 15:23, 4 July 2017 (UTC)

wikilivres has been redirected and bibliowiki has been created in the global interwiki map. I suggest moving the template to the new name, and updating as necessary. — billinghurst sDrewth 12:30, 8 July 2017 (UTC)

Tech News: 2017-28

15:07, 10 July 2017 (UTC)

Per project statistics

Following my previous post about progress statistics by project, I decided to do some analysis myself. Based on the latest database dump, I looked at the Page: namespace and only counted the edits which change the status of a page.

It is possible to find many interesting tidbits of information from the different projects. For example:

However, it is mostly interesting to check the status of the backlog. For example:

What I mostly wanted was to know which projects people are currently working on. Dumps are not the most appropriate way to go about it, as we miss a few days, but it is possible to know what happened in June using the latest dump from July 1st. In June, 419 projects were edited (i.e. at least one page changed status), the most active being:

Edits by project in June 2017 (as of July 1st)
Index name | Index status | Pages validated | Pages proofread | Pages empty | Pages remaining | Number of pages modified | Number of revisions | Number of authors
Index:Travels in Mexico and life among the Mexicans.djvu | To be proofread | 228 | 396 | 62 | 0 | 667 | 910 | 4
Index:Tarzan of the Apes.djvu | Validated | 407 | 0 | 17 | 6 | 410 | 769 | 3
Index:Thoreau - His Home, Friends and Books (1902).djvu | Validated | 346 | 0 | 38 | 0 | 384 | 745 | 15
Index:The Shaving of Shagpat.djvu | Validated | 306 | 0 | 20 | 0 | 308 | 611 | 2
Index:Ballantyne--The Battery and the Boiler.djvu | Proofread | 116 | 316 | 16 | 4 | 432 | 475 | 2
Index:The Novels and Tales of Henry James, Volume 1 (New York, Charles Scribner's Sons, 1907).djvu | Validated | 550 | 0 | 18 | 0 | 455 | 455 | 2
Index:The Novels and Tales of Henry James, Volume 2 (New York, Charles Scribner's Sons, 1907).djvu | Validated | 564 | 0 | 14 | 0 | 434 | 434 | 2
Index:The Bostonians (London & New York, Macmillan & Co., 1886).djvu | Validated | 451 | 0 | 13 | 0 | 430 | 430 | 1
Index:Cuthbert Bede--Little Mr Bouncer and Tales of College Life.djvu | Proofread | 48 | 256 | 14 | 0 | 311 | 425 | 3
Index:Royal Naval Biography Marshall sp4.djvu | To be proofread | 14 | 420 | 18 | 29 | 383 | 385 | 2
Index:Maud Howe - Atlanta in the South.djvu | To be proofread | 7 | 339 | 10 | 6 | 348 | 352 | 2
Index:Morley--Travels in Philadelphia.djvu | Proofread | 203 | 69 | 16 | 0 | 255 | 347 | 2
Index:Ballinger Price--Us and the Bottle Man.djvu | Validated | 162 | 0 | 14 | 0 | 176 | 338 | 4

Is there any interest in this kind of statistics and analysis? I understand Wikisource is currently driven by very dedicated users who start and often finish a work all by themselves. However, for a more casual editor, who simply wants to proofread a few pages and see a complete book including their work without having to wait for years, this could be a good extension to the proofread of the month (which is clearly visible in the table above!).

Technical description: I parsed a dump of the database to extract each project (based on the index pages), each page (based on the Page: namespace) and each revision changing the status of the page (not proofread, proofread, validated, etc.). The link between the page and the project is made by looking at the page name. This approach means I don't deal well with all the projects where the page is not a subpage of the index (there are 8769 of these). I also extracted the number of pages of the file, in order to take into account pages not yet created (I did not find a way to get this data from the database directly, so I had to scrape the HTML of the Commons page).

Koxinga (talk) 08:45, 9 July 2017 (UTC)
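The technical description above can be sketched roughly as follows. This is not Koxinga's actual code. It assumes each revision is already available as a `(page_title, wikitext)` pair in chronological order, and that the page status is stored in a `<pagequality level="N" ...>` tag inside the wikitext. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the proofreading status is recorded as <pagequality level="N" ...>.
PAGEQUALITY_RE = re.compile(r'<pagequality level="(\d)"')


def index_for_page(page_title):
    """Map 'Page:Foo.djvu/12' to 'Index:Foo.djvu' using the page name only.

    Returns None when the page is not a subpage of an index, which is the
    case the description says is not handled well.
    """
    if not page_title.startswith("Page:") or "/" not in page_title:
        return None
    base = page_title[len("Page:"):].rsplit("/", 1)[0]
    return "Index:" + base


def quality_level(wikitext):
    """Return the status level found in the wikitext, or None if absent."""
    match = PAGEQUALITY_RE.search(wikitext)
    return int(match.group(1)) if match else None


def count_status_changes(revisions):
    """Count, per index, the revisions that change a page's status.

    The first revision of a page counts as a change, because the page
    moves from "not created" to some status.
    """
    last_level = {}
    changes = defaultdict(int)
    for title, text in revisions:
        index = index_for_page(title)
        level = quality_level(text)
        if index is None or level is None:
            continue
        if last_level.get(title) != level:
            changes[index] += 1
        last_level[title] = level
    return dict(changes)
```

Revisions that only touch the text without changing the level are skipped, which matches the rule of counting only edits that change the status of a page.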

From the perspective of "completeness" of works, we are interested in works that are nearly proofread, or nearly validated, that have not been edited for a period of time, so we can put resources to them. They are cheap wins with true value. If you are looking to see missing/non-created pages of a work, then you probably want to get a count of pages from the File: and compare that with the number of subpages of Index:. That would be a neat comparison as that would be another indicator of near completeness.

The other factoids are interesting trivia, though I am not sure that they are particularly enlightening for the site or our work (though I could just be considered a boring, unexcitable, unromantic, task-focused fart). Note that the stats about projects don't consider our multiple-volume projects (EB1911, DNB, DMM, +++). Thinking of what would be useful: numbers of Index: works with counts for images missing, scores missing, etc., so we could focus efforts, or promote efforts to assist completion. Numbers of edits on works are not relevant, though the time from creation to validation may have some social interest, though even that is dodgy if the work has advertising. We already track our validated and proofread works, and try to keep on top of transclusion status. (As said, I may have the wrong focus for what is interesting to the trivia buffs.) — billinghurst sDrewth 13:19, 9 July 2017 (UTC)

Note that pages remaining (not proofread) can be due to works having their advertising pages remaining, e.g. Ballantyne's work above. A work like that has been marked as proofread (important), and we track by a category that its advertising is not done. So unread pages in a proofread or validated work are expected, whereas a small number of unread pages in a work not yet proofread is interesting. We are a complex beast. :-) — billinghurst sDrewth 13:25, 9 July 2017 (UTC)
Finding a work that had no pages remaining (ie. nothing to be proofread) for a work marked as "not proofread" is very useful as it enables us to review and reclassify as required by the review. — billinghurst sDrewth 14:03, 9 July 2017 (UTC)
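The check described here, finding works that are still marked "To be proofread" although no pages remain, could be sketched as below. The `IndexStats` record and the `ready_for_review` helper are hypothetical names, and the three sample rows are copied from the June table above.

```python
from typing import NamedTuple


class IndexStats(NamedTuple):
    """One row of per-project statistics (field names are illustrative)."""
    name: str
    status: str
    validated: int
    proofread: int
    empty: int
    remaining: int


def ready_for_review(stats):
    """Return names of works marked 'To be proofread' with nothing left to proofread."""
    return [
        s.name
        for s in stats
        if s.status == "To be proofread" and s.remaining == 0
    ]


JUNE_SAMPLE = [
    IndexStats("Index:Travels in Mexico and life among the Mexicans.djvu",
               "To be proofread", 228, 396, 62, 0),
    IndexStats("Index:Royal Naval Biography Marshall sp4.djvu",
               "To be proofread", 14, 420, 18, 29),
    IndexStats("Index:Tarzan of the Apes.djvu", "Validated", 407, 0, 17, 6),
]

if __name__ == "__main__":
    for name in ready_for_review(JUNE_SAMPLE):
        print(name)
```

The sketch deliberately ignores works already marked proofread or validated that still have pages remaining, since those may simply have untranscribed advertising pages.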
Thank you for your comments. A few answers:
  • The trivia was to show the different possibilities, but I mostly aim to do something useful for project tracking and motivation of the different users. I know that at least for me, it would be motivating to see which works are being actively worked on, so that I can see progress when I come back to it, I know I can ask questions and exchange about the project, etc.
  • Yes, I use the actual number of pages of the uploaded file, so I can find the pages not yet created, I mostly consider them the same as the "not proofread" pages but it can be separated if needed.
  • I don't trust the "proofread", "validated", etc. flag in the index. It is manually set, so there can be mistakes in one direction or another. That's why I think it is useful to compare it to the actual situation of each page.
  • It is possible to remove the advertisement pages from my analysis, based on the <pagelist> tags, but we need to define a consistent marking for them. I saw some adv, adv., advt, advert (with a bonus "index to advert"), advertisement. Do we allow all of these or do we try to normalize?
  • I can take into account multi-volume projects and group them together, by looking at the Volumes part of the index page, especially the Category:Scanned volume navigation templates, I will look into it.
Koxinga (talk) 19:43, 9 July 2017 (UTC)
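If the labels were normalized, a small helper along these lines could map the variants listed above to one marking. This is only a sketch: the canonical label "Advert" and the function name `normalize_label` are assumptions, not an agreed convention, and labels such as "index to advert" are deliberately left untouched.

```python
import re

# Matches adv, adv., advt, advert, advertisement (and plural forms), ignoring case.
ADVERT_RE = re.compile(r"^adv(?:t|ert|ertisement)?s?\.?$", re.IGNORECASE)


def normalize_label(label, canonical="Advert"):
    """Return the canonical label for advert variants, else the stripped label."""
    stripped = label.strip()
    return canonical if ADVERT_RE.match(stripped) else stripped


if __name__ == "__main__":
    for raw in ["adv", "adv.", "advt", "advert", "advertisement", "index to advert"]:
        print(raw, "->", normalize_label(raw))
```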
There is something wrong with the information gathered. Index:Popular Science Monthly Volume 12.djvu was completely validated a long time ago, and the proofreading of Index:Travels in Mexico and life among the Mexicans.djvu was also completed, perhaps at the beginning of this month. — Ineuw talk 04:41, 12 July 2017 (UTC)
My analysis is based on a database dump, using the most recent one from July 1st. At the time of this dump, Page:Popular Science Monthly Volume 12.djvu/430 was not yet validated; that was done after I posted this message. For Index:Travels in Mexico and life among the Mexicans.djvu, I did say that there were 0 pages remaining; however, at the time of the dump, even though all the pages had been proofread, the index status was still "to be proofread". It was also changed just after I posted this message. If there is interest, it would be possible to use the recent changes to update the data more frequently, but judging from the lack of response here, it does not seem worth it. Koxinga (talk) 01:12, 13 July 2017 (UTC)
Thanks for clarifying. It's interesting. — Ineuw talk 09:50, 13 July 2017 (UTC)