User:Chris55/Essay1

From Wikisource

My Interests

I'm interested in the use of Wikisource for high-quality ebooks. There are various high-volume sources, such as Google and archive.org, but they mainly depend on scanning and OCR and have no community for the real work of proofreading and tidying. Then there's Gutenberg, which is rather secretive, and I don't know what's happening to it since Michael Hart died. But I like the discipline that's been established here for proofreading and checking.

Now that one can easily get ebooks out of Wikisource (EPUB at least), I'd like to help make it more accessible to the newcomer. At the moment the user documentation is severely lacking and that's where I'll put my effort, hoping that people will take enough notice to correct things when I get them wrong. I started by puzzling over why all the thousands of pages in Portal: are invisible to the newcomer. Is there any organisation, or have they just been created for the joy of creating? I'm still puzzled why the only links on the front page are to categories, not portals.

I realise the community is very small and has done a magnificent job, but I think it's a chicken-and-egg situation at the moment. I've taken two trial dips in the past and decided it wasn't worth the effort, and I doubt I'm alone. If we can encourage people to visit and use the books here, I'm sure the word will spread and the community will grow as it did in other wiki projects. But at the moment I haven't the first idea how many real documents there are in Wikisource. The stats say 785,721 content pages, but does that mean books or chapters/articles? Does it include disambiguation pages and pages in the non-Main namespaces? In other words, is the number meaningful within an order of magnitude? How many articles in the EB are actually there?

Even the guidance about what you can put in is vague. Is it encouraged to download DjVu files and text from archive.org/Google, upload them here and get to work on them, or are there issues with that? I'm sure I'll be asking these questions in the Scriptorium soon if I can't find answers first.


So what is the state of health of Wikisource? After a couple of posts I made on the Scriptorium, I decided to look at the stats on Wikimedia and reproduce a summary of a few of them below. To avoid being overwhelmed but still give some perspective, I've concentrated on the top 5 languages.

Wikisource-growth.png

OK, by Wikipedia standards it's tiny, but 11M views a month is something most websites would be happy about. I'm not sure what happened at the beginning of 2010, but it may be that a lot of publicity caused a big spike which wasn't maintained. Since then the growth has been steady. It's interesting that the French site has a similar spike, but not the other languages. The Russian site is, I believe, a special case as it's been used to publish a lot of government documents.

The pattern of edits per month is very volatile and suggests that it's largely data-driven: when there's a new batch of material there's a lot of activity, but there aren't many editors doing the long, steady task of improving documents. A look at the number of new documents lends some support to this: there was a large peak in March 2011 for English. In similar projects I've been involved in, there's a lot of goal-setting and highlighting of those who've done the work, and maybe that's what's needed here to keep people motivated.

Wikisource-editors.png

The number of active members is tiny compared with Wikipedia: only about 70 currently (though 350 have done 1 edit/month). That is about 1/350 of the number on WP, whereas the number of documents (215,412) is about 1/13 of that on WP. Incidentally, that number (articles) seems more meaningful to me than the widely quoted figure of 786,503 for the number of pages.
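To make the comparison concrete, here is a rough back-of-envelope sketch in Python. It uses only the figures quoted above; the Wikipedia numbers are back-calculated from the two ratios and are an assumption, not reported statistics, and the variable names are mine.

```python
# Figures quoted in the text above.
ws_active_editors = 70
ws_documents = 215_412

# Ratios quoted in the text (Wikipedia relative to Wikisource).
editor_ratio = 350    # WP has roughly 350 times the active editors
document_ratio = 13   # WP has roughly 13 times the documents

# Implied Wikipedia figures: derived from the ratios, NOT reported statistics.
wp_active_editors = ws_active_editors * editor_ratio
wp_documents = ws_documents * document_ratio

# Documents per active editor on each project.
ws_docs_per_editor = ws_documents / ws_active_editors
wp_docs_per_editor = wp_documents / wp_active_editors

# How many times more documents each Wikisource editor has to look after.
workload_factor = ws_docs_per_editor / wp_docs_per_editor

print(f"Wikisource: about {ws_docs_per_editor:,.0f} documents per active editor")
print(f"Wikipedia (implied): about {wp_docs_per_editor:,.0f} documents per active editor")
print(f"Workload factor: about {workload_factor:.0f}x")
```

The factor is simply 350/13, so if the quoted ratios are right, each active editor here has roughly 27 times as many documents to look after as an active editor on Wikipedia.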

It's interesting that the core set of editors ("very active librarians") is the only graph in which English doesn't stand out from the other languages: we have quite a small nucleus and, even more worrying, it has shrunk from 20 to 7 in the last year. Correction: I am informed that these statistics only record updates in the main namespace. If we look at the local stats we get a more reliable answer: there are 40 in the very active category, plus one hyper-active bot.