Help:Importing

Importing from other projects

There are several ways to import content from other projects, and choosing the right one matters. Picking the wrong method, or using the right one incorrectly, can cause significant problems, so this help page provides some guidance to help you do it right.

Main import methods

These are the main ways content can be imported, and a brief explanation of what each method is appropriate for.

Method | Used for…
Cut & paste | Only trivial things where attributing the original contributors does not matter. Cut & paste is very rarely the right approach for transferring content between projects.
imagetransfer.py | Pywikibot's imagetransfer.py script can transfer files from Commons to Wikisource, preserving the upload history on the file description page. It does not preserve the actual revision history.
FileImporter | Imports files to Commons, and is the preferred method for transferring files there. Unfortunately, it does not work in the other direction, from Commons to Wikisource. Support for this has been requested, but until that materialises FileImporter is mentioned here mostly for completeness.
Special:Import | Good for importing a template or Lua module from another project, or in the rare cases where content that is in scope for Wikisource has been added to a different project. Special:Import is powerful, and thus also dangerous, and should only be used by someone who knows what they are doing. Does not support importing files from Commons!

Cut & paste

Cut & paste of text, including wikimarkup, template code, Lua module code, etc., works just fine, but only for trivial things. Anything complicated (multiple interdependent templates, stylesheets, etc.) will probably be unwieldy.

More importantly, almost all content on Wikimedia projects is licensed under a Creative Commons BY-SA licence, which requires attribution. If you cut & paste content you have to take particular care to provide such attribution for every single contributor of the original content. The easiest way to do that is to provide a link back to the original content in the edit summary, so that the original's revision history serves as attribution. This is easy to get wrong, and then we have to start putting huge infoboxes on the talk page to provide the link.
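As an illustration of the link-back approach, here is a minimal sketch of a helper that composes such an edit summary. The function name and the exact wording of the summary are hypothetical; the only thing taken from the text above is that the summary should link to the original page so its revision history can serve as attribution.

```python
def attribution_summary(source_prefix: str, source_title: str, note: str = "") -> str:
    """Build an edit summary that links back to the original page.

    source_prefix is an interwiki prefix such as "w" (English Wikipedia)
    or "c" (Commons); source_title is the page name including namespace.
    The wording is an illustrative assumption, not a required format.
    """
    link = f"[[{source_prefix}:{source_title}]]"
    summary = f"Copied from {link}; see that page's history for attribution"
    if note:
        summary += f" ({note})"
    return summary


print(attribution_summary("w", "Template:Example"))
```

Pasting the resulting string into the edit summary field when saving keeps the attribution attached to the revision itself rather than relying on a talk-page notice.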

So the rule of thumb is to not use cut & paste for importing content unless it is really trivial.

imagetransfer.py

Pywikibot's imagetransfer.py script can transfer files from Commons to Wikisource, preserving the upload history on the file description page. You'll need the reupload-shared permission on your account to use it, since by definition the file will already exist on Wikisource as a shared file (i.e. one that exists on Commons but is visible here). That means that by default only admins and members of the Commons media local uploader group can use it (the latter is mostly used for bots and bot-like entities), but you can request the permission by seeking community consensus and posting a request on the administrators' noticeboard.

A typical usage would be:

$ pwb.py imagetransfer -lang:commons -family:commons "File:Laws of the Earliest English Kings.djvu" -tolang:en -tofamily:wikisource -keepname -force_if_shared
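If you transfer files regularly, the same invocation could be assembled programmatically. The sketch below only builds the argument list and prints it; it does not run Pywikibot. The helper name is hypothetical, and the flags are exactly the ones shown in the command above.

```python
import shlex
from typing import List


def imagetransfer_command(file_title: str,
                          to_lang: str = "en",
                          to_family: str = "wikisource") -> List[str]:
    """Return the argument list for the imagetransfer invocation shown above."""
    return [
        "pwb.py", "imagetransfer",
        "-lang:commons", "-family:commons",          # source: Commons
        file_title,                                  # e.g. "File:Name.djvu"
        f"-tolang:{to_lang}", f"-tofamily:{to_family}",  # target wiki
        "-keepname",                                 # keep the original file name
        "-force_if_shared",                          # file already visible as a shared file
    ]


args = imagetransfer_command("File:Laws of the Earliest English Kings.djvu")
# shlex.join quotes the title so the printed line can be pasted into a shell.
print(shlex.join(args))
```

Keeping the arguments as a list avoids quoting mistakes with file names that contain spaces; the list could be handed to a process runner once you have confirmed that your account has the reupload-shared right.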

Several admins can do this with their existing tools, and there is a shared Wikisource-bot running on Toolforge that technical contributors (mostly admins) have access to. Anyone with the requisite technical skills can also use PAWS (the Wikimedia deployment of Jupyter Notebooks) to run a shell with access to Pywikibot (but see the caveat about the reupload-shared user right above).

If you need a file transferred you can request it on Wikisource:Scriptorium, Wikisource:Scriptorium/Help, or WS:AN depending on what makes sense for your particular request.

Special:Import

Special:Import is restricted to admins only for a reason: it is very powerful, and hence also very dangerous. If you click the wrong thing on that page you can end up causing massive and very visible breakage on nearly every single page on Wikisource, and make revision histories completely unreadable in a way that in practice isn't possible to fix. Even admins should not try to use this tool without being very sure they know what they are doing!

Note especially: Special:Import does not import files from Commons! It will import the file's description page, but not the file itself, leading to what appears to be a "broken" file. For importing files (DjVu, PDF, JPEG, PNG, etc.) see the imagetransfer.py section above.

Special:Import operates on revisions of wikipages: it reads the list of revisions from the database for the foreign project, and inserts those revisions into the revision history in Wikisource's database. This can cause all sorts of weird and non-intuitive behaviour. For example, if you are importing a new version of a template from Commons or Wikipedia, and the local copy has changes that are newer than the newest revisions of the imported template, nothing will appear to happen. Special:Import imported the revisions from the other wiki but inserted them chronologically where they belong in the revision history; and since none of them were the latest revision the later local revision wins. But the opposite also holds true: if the foreign wiki has revisions later than the local template, the foreign template will overwrite any local modifications.
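The behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch, not how MediaWiki is implemented: it assumes a page history is just a list of revisions ordered by timestamp, and that the page shows whichever revision is chronologically last. All names and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Revision:
    timestamp: str  # ISO 8601 in UTC, so string order equals chronological order
    origin: str     # "local", or an interwiki prefix such as "w"
    text: str


def merge_histories(local: List[Revision], imported: List[Revision]) -> List[Revision]:
    """Insert imported revisions where they belong chronologically."""
    return sorted(local + imported, key=lambda rev: rev.timestamp)


def current_text(history: List[Revision]) -> str:
    """The page displays the chronologically latest revision."""
    return history[-1].text


local = [
    Revision("2020-01-01T00:00:00Z", "local", "local v1"),
    Revision("2023-06-01T00:00:00Z", "local", "local fix"),
]
older_remote = [
    Revision("2021-03-01T00:00:00Z", "w", "remote A"),
    Revision("2022-03-01T00:00:00Z", "w", "remote B"),
]
newer_remote = older_remote + [Revision("2024-01-01T00:00:00Z", "w", "remote C")]

# Remote edits all predate the last local edit: nothing appears to happen.
print(current_text(merge_histories(local, older_remote)))
# A remote edit is newer than every local edit: it silently replaces the local text.
print(current_text(merge_histories(local, newer_remote)))
```

In both cases the merged history now interleaves local and remote revisions, which is why the result is hard to read afterwards even when the visible page did not change.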

This has scary implications when you use the "Include all templates and transcluded pages" option. Trying to import, e.g., a file from Commons will instead import the file description page, every template used on that page, and every template and Scribunto module they use, overwriting our local copies of these templates and modules if Commons has edited them more recently. In one instance of this happening, approximately 60 templates and modules were inadvertently imported. Luckily, in that case most of them were not in use or had more recent local edits, but the import still added several hundred old revisions that will permanently pollute the revision histories of the affected pages. Untangling the mess after such an import can be devilishly tricky: by its nature it scrambles the revision history, making it very difficult to tell which revisions are which, or even to revert any change.
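To see why a single import can balloon like this, consider the set of pages reachable by following transclusions. The sketch below is a plain breadth-first traversal over a hand-written dependency map; the page names and the map itself are invented for illustration and do not describe the real templates on Commons.

```python
from collections import deque
from typing import Dict, List, Set


def pages_pulled_in(start: str, transclusions: Dict[str, List[str]]) -> Set[str]:
    """Return the start page plus everything it transcludes, directly or indirectly."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for dependency in transclusions.get(page, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


# Hypothetical dependency map for a file description page.
transclusions = {
    "File:Example.djvu": ["Template:Information", "Template:PD-old"],
    "Template:Information": ["Module:Information", "Template:Description"],
    "Module:Information": ["Module:Arguments"],
    "Template:PD-old": ["Module:Arguments"],
}

affected = pages_pulled_in("File:Example.djvu", transclusions)
print(len(affected), "pages would be imported:")
for title in sorted(affected):
    print(" ", title)
```

Even this toy map turns one requested page into six imported ones; on a real project the dependency chains are much deeper.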

Thus, the long and short of it is: even admins should not assume they can use this function unless they have prior experience. Ask for guidance the first few times, and read this page carefully.

Special:Import usage

The main user interface for Special:Import looks like this:

[Screenshot of the Special:Import form]

The main options you always have to use are "Source wiki" and "Source page". The first is a list of the interwiki prefixes of the Wikimedia projects from which importing has been enabled and the second is the page name including namespace prefix of the wikipage on that wiki you want to import. A little further down there is also a "Comment" text field, which will essentially show up as the "Edit summary" for the import in affected pages' revision history. Apart from the need to remember interwiki prefixes these options are relatively straightforward.

Then there's "Copy all the revisions for this page". This does sort of what it says on the tin, but in general you will not want to use it. It fills up the revision history for the local template with every single edit on the remote project. This is bad enough when you mix local and remote edits, but it gets even worse when you import from multiple projects. Some of our templates now have several hundred apparent edits that are a mix of edits made on Commons, on English Wikipedia, and locally. It is impossible to disentangle them and understand the actual history of the code. In every normal circumstance you should leave this option unchecked. When it is off, all the remote changes will be bundled into a single local revision, with an edit summary of the form "4 revisions imported from w:Template:Example", along with the "Comment" you provided above. You'll then end up with a nice linear revision history like "Initial import from Wikipedia" → "re-sync from latest version on Wikipedia" → "Local edit to fix the flurble gizmo" → "Switch to importing the Commons version" → "re-apply local flurble fixes".
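The bundled edit summary can be sketched as a small formatting function. The summary text follows the form quoted above; how the "Comment" is appended (here after a colon) and the singular form for a single revision are assumptions made for illustration.

```python
def bundled_import_summary(revision_count: int, source: str, comment: str = "") -> str:
    """Compose the summary for one bundled import revision.

    source is the interwiki-prefixed page name, e.g. "w:Template:Example".
    """
    noun = "revision" if revision_count == 1 else "revisions"
    summary = f"{revision_count} {noun} imported from {source}"
    if comment:
        # Assumption: the comment follows the generated text after a colon.
        summary += f": {comment}"
    return summary


print(bundled_import_summary(4, "w:Template:Example"))
print(bundled_import_summary(4, "w:Template:Example", "re-sync from latest version on Wikipedia"))
```

Each import then contributes exactly one entry to the local history, which is what keeps the history linear and readable.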

Then there's "Include all templates and transcluded pages". Do not use this option! When this option is on, every single template used on the remote page, and every single template and module those templates use, will be imported locally and overwrite our existing templates. There are a few situations where you might theoretically want to use this option, but even in those cases it is much easier and less fraught to just do it manually (import each thing you want one by one). Unless you are extremely certain you know what you are doing, do not use this option!
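The advice so far can be condensed into a pre-flight checklist. The sketch below is purely illustrative: the class, its field names and the warning texts are hypothetical and are not part of MediaWiki or any existing tool. It simply encodes the recommendations on this page as checks you could run over your intended settings before submitting the form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImportOptions:
    source_wiki: str                  # interwiki prefix, e.g. "w" or "c"
    source_page: str                  # page name including namespace prefix
    comment: str = ""
    copy_all_revisions: bool = False  # "Copy all the revisions for this page"
    include_templates: bool = False   # "Include all templates and transcluded pages"


def preflight_warnings(opts: ImportOptions) -> List[str]:
    """Return a list of warnings; an empty list means nothing obviously risky."""
    warnings = []
    if opts.source_page.startswith("File:"):
        warnings.append("Special:Import only imports the description page, not the file; "
                        "use imagetransfer.py for files.")
    if opts.include_templates:
        warnings.append("Including templates will import, and may overwrite, every "
                        "transcluded template and module.")
    if opts.copy_all_revisions:
        warnings.append("Copying all revisions interleaves remote edits with the "
                        "local history.")
    return warnings


for warning in preflight_warnings(
        ImportOptions("c", "File:Example.djvu", include_templates=True)):
    print("WARNING:", warning)
```

A plain template import with both options left off produces no warnings, which matches the recommended way of using the form.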

"Assign edits to local users where the named user exists locally" makes the revision history show "Example" instead of "w > Example" when importing remote edits by Example. It should usually be turned on, but only really matters when you also use "Copy all the revisions for this page" (which you generally shouldn't do).

Finally, the three namespace-related options give various ways to adapt the import if you're not just doing a straightforward import of a template. The default is "Import to original namespace", which just copies things one to one: c:Template:Example → Template:Example. "Import to a namespace" will import the wiki page specified in the "Source page" field to the local namespace you choose, but will still overwrite templates etc. if you turn on "Include all templates and transcluded pages". Typically you'd use this to, for example, import: w:Wikipedia:Blocking policy → Wikisource:Blocking policy. "Import as subpages of the following page" puts everything imported, including namespace prefixes, into subpages: w:Draft:The Great Gatsby → User:Example/Draft:The Great Gatsby.
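The three mappings can be summarised as a small function from source page name to local page name. This is a simplified sketch: the mode names are made up for illustration, the source page is given without its interwiki prefix (that goes in "Source wiki"), and the naive split on the first colon assumes the title actually starts with a namespace prefix.

```python
def target_title(source_page: str, mode: str = "original",
                 namespace: str = "", root_page: str = "") -> str:
    """Map a source page name to the local page name for each import mode.

    mode is one of "original", "namespace" or "subpage" (hypothetical labels
    for the three options on the form).
    """
    if mode == "original":
        # Import to original namespace: copied one to one.
        return source_page
    if mode == "namespace":
        # Import to a namespace: swap the namespace prefix, keep the page name.
        _, sep, name = source_page.partition(":")
        name = name if sep else source_page
        return f"{namespace}:{name}" if namespace else name
    if mode == "subpage":
        # Import as subpages: the full name, prefix included, goes under the root.
        return f"{root_page}/{source_page}"
    raise ValueError(f"unknown mode: {mode}")


print(target_title("Template:Example"))
print(target_title("Wikipedia:Blocking policy", "namespace", namespace="Wikisource"))
print(target_title("Draft:The Great Gatsby", "subpage", root_page="User:Example"))
```

The three calls reproduce the three examples given in the paragraph above.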

See also