Help:Internet Archive

From Wikisource
Internet Archive
Shortcut: H:IA
Guidelines for downloading files from, and uploading files to, the Internet Archive

The Internet Archive is a non-profit digital library that holds nearly 3 million digitised books as well as music, audio, video and other files. It is one of the main sources of DjVu files for use on Wikisource. As well as files based on their own scans, the Internet Archive will also derive files (including DjVu files) from scans uploaded by its users. This can be a useful way to convert user-made scans into a DjVu file compatible with Wikisource (as well as making the work available for others).

This help page focuses on DjVu files, because that is the most used file type on Wikisource, but the process can be used for any other file type available from the Internet Archive.


Getting files

Searching

  1. Go to the Internet Archive.
  2. Search for the book (or other text) you want. The basic search has a text field and a drop-down list. Type the title of the book in the text field and set the drop-down to "Texts".
  3. Click "Go".
  4. If the work is on the Archive, it should appear in the search results. If there are several suitable copies, select the one you judge best. This is subjective, but a clear scan works best for proofreading, so aim for the best quality available (note also that some scans have dirt or writing on the pages, which may or may not make proofreading harder). Different scans may come from different editions; in that case the choice is yours, but the earliest available edition is a popular choice.
  5. If the search is unsuccessful, you can also try following links, searching by subject or by author, or using the Advanced Search function.
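The same title search can also be expressed as a URL, which is handy for bookmarking or scripting. The sketch below assumes the Archive's advanced-search endpoint (`https://archive.org/advancedsearch.php`) with its `q`, `fl[]`, `rows` and `output` parameters; the function name `build_text_search_url` is hypothetical, and the web search form described above remains the supported route.

```python
from urllib.parse import urlencode

# Assumption: archive.org's advanced-search endpoint, which accepts a
# Lucene-style query string and can return its results as JSON.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_text_search_url(title: str, rows: int = 50) -> str:
    """Build a search URL for texts whose title matches `title` (hypothetical helper)."""
    # Restricting mediatype to "texts" mirrors setting the drop-down to "Texts".
    query = f'title:("{title}") AND mediatype:texts'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each search result
        ("fl[]", "title"),
        ("rows", str(rows)),     # maximum number of results
        ("output", "json"),
    ]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_text_search_url("Pride and Prejudice"))
```

Opening the printed URL in a browser should list matching item identifiers; each identifier leads to a details page at `https://archive.org/details/<identifier>`.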

DjVu file

1. On the left side of the item's details page is a box titled "View the Book", as shown in Fig. 1.
2. Click on the "HTTP" link to get to the list of files. This is indicated by the red arrow in Fig. 1.


Fig. 1. A basic form of the "View the Book" box, found on the details pages of the Internet Archive.


3. This will open a list of files, as shown in Fig. 2.
4. Locate the file with the .djvu suffix. This is indicated by the red arrow in Fig. 2.
Other files can be downloaded instead of the DjVu. If required, proceed with the most appropriate file from the list.
  • PDF documents, with the .pdf suffix, are an alternative format for text.
  • Audio files in the Ogg Vorbis format have the .ogg suffix.
  • Video files in the Ogg Theora format have the .ogv suffix.
  • The original scans are available from this list as well. In this example, the file sikhafghansinco00shahrich_jp2.zip is an archive of JPEG 2000 images of individual pages. This can sometimes be useful as it will contain high quality versions of illustrations, photographs and other elements of the book.
5. This is the file that needs to be uploaded to Wikimedia Commons. See Uploading (below).


Fig. 2. Example of a list of files on the Internet Archive.
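Picking the right file from the list is a matter of matching its suffix, and each file in the list has a direct download link. The sketch below assumes that links follow the pattern `https://archive.org/download/<identifier>/<file name>`; the helper names are hypothetical, and apart from `sikhafghansinco00shahrich_jp2.zip` (from the example above) the file names in the usage note are made up for illustration.

```python
from typing import Iterable, Optional
from urllib.parse import quote

# Assumption: direct links to an item's files use this base URL.
DOWNLOAD_BASE = "https://archive.org/download"


def pick_file(file_names: Iterable[str], suffix: str = ".djvu") -> Optional[str]:
    """Return the first file name ending in `suffix` (case-insensitive), or None."""
    for name in file_names:
        if name.lower().endswith(suffix.lower()):
            return name
    return None


def download_url(identifier: str, file_name: str) -> str:
    """Build the direct download link for one file of an item."""
    return f"{DOWNLOAD_BASE}/{quote(identifier)}/{quote(file_name)}"
```

For example, with a hypothetical list `["sikhafghansinco00shahrich.pdf", "sikhafghansinco00shahrich.djvu", "sikhafghansinco00shahrich_jp2.zip"]`, `pick_file` returns the `.djvu` entry, and passing `".pdf"` or `".ogg"` instead selects a different format (or `None` if it is absent). The resulting link is the one to copy for URL2Commons or to download manually.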

Uploading

There are two main ways to upload the file to Wikimedia Commons.

One: Automatic transfer

Use the URL2Commons tool to automatically transfer the DjVu file from the Internet Archive to Wikimedia Commons.

  1. Refer to Help:URL2Commons for information on using the tool.
  2. Right click on the appropriate file in the Internet Archive file list and select "Copy Shortcut" or equivalent.
  3. Paste this into the top panel of the URL2Commons tool.
  4. Proceed as described in the URL2Commons help document.

Two: Manual download and upload

Download the file to your own computer, then upload it to Wikimedia Commons manually.

  1. To download, right click on the appropriate file in the Internet Archive file list and select "Save Target As..." or equivalent.
    This may take some time, depending on the size of the file.
    If you use download manager software of any kind, follow the instructions for that software.
  2. Once downloaded, go to Wikimedia Commons' Upload Wizard (guided upload process with helpful steps) or Upload page (quicker but requires more knowledge of Commons' policies and methods).
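For large files, a script can stand in for "Save Target As...". This is a minimal sketch using only Python's standard library; `download_file` is a hypothetical name, and it assumes the URL points directly at the file (for example a link copied from the file list).

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks and return the number of bytes written."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a ".part" file first so an interrupted download never
    # leaves a truncated file under the final name.
    partial = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        # Copy in 1 MiB chunks rather than loading a large DjVu into memory.
        shutil.copyfileobj(response, out, chunk_size)
    partial.replace(dest)
    return dest.stat().st_size
```

Streaming in chunks keeps memory use flat however large the scan is; the finished file can then be uploaded through the Upload Wizard or Upload page as usual.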

Others

Other upload tools, such as Commonist, can also be used.

Adding files

Files can be added to the Internet Archive by any registered user. The following information is presented for ease of use and reference for Wikisource users. However, Wikisource is not affiliated with the Internet Archive, and any or all of these stages may be changed by the Archive at any time. Anyone attempting this is strongly advised to refer to the Internet Archive's own instructions and to follow those in preference to the steps listed here.

These instructions are:

The following Internet Archive blog posts might be useful as well:

Preparing the file

If uploading a collection of page scans:

  1. The page scans should each be in an image format. For example, JPEG format.
  2. The page scans should be named in the correct alphabetical order. It may be a good idea to use a naming format such as "MyScan001.jpg", "MyScan002.jpg" etc. If so, remember to use leading zeroes, otherwise page 10 will come after page 1 but before page 2.
  3. Make sure that the page scans are the only files in the folder you are using.
  4. Create a zip file of the folder containing your page scans. The file name should be in the format "Myscan_images.zip", where "Myscan" is whatever you want to call the file. The "_images" suffix is important; your scan may not derive properly later if this is omitted.
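Steps 2 and 4 above can be scripted. The sketch below shows why leading zeroes matter (plain alphabetical sorting puts "MyScan10.jpg" between "MyScan1.jpg" and "MyScan2.jpg") and builds the `_images.zip`. The helper names are hypothetical, and storing each page under the folder's name inside the zip is an assumption that mirrors zipping the folder itself.

```python
import zipfile
from pathlib import Path


def padded_name(prefix: str, page: int, total: int, ext: str = ".jpg") -> str:
    """Name a page scan with enough leading zeroes to cover `total` pages."""
    width = max(3, len(str(total)))  # at least three digits, e.g. MyScan001.jpg
    return f"{prefix}{page:0{width}d}{ext}"


def zip_page_scans(folder: Path, name: str) -> Path:
    """Zip every file in `folder` into `<name>_images.zip` next to the folder."""
    folder = Path(folder)
    archive = folder.parent / f"{name}_images.zip"  # the "_images" suffix matters
    # sorted() yields the alphabetical order the pages must follow.
    pages = sorted(p for p in folder.iterdir() if p.is_file())
    # Default ZIP_STORED: JPEGs are already compressed, so deflating gains little.
    with zipfile.ZipFile(archive, "w") as zf:
        for page in pages:
            # Assumption: keep the folder as the top-level entry, as zipping it would.
            zf.write(page, arcname=f"{folder.name}/{page.name}")
    return archive


if __name__ == "__main__":
    print(sorted(["MyScan1.jpg", "MyScan10.jpg", "MyScan2.jpg"]))
    print(padded_name("MyScan", 10, 250))
```

Run directly, the first line prints `['MyScan1.jpg', 'MyScan10.jpg', 'MyScan2.jpg']`, the out-of-order result the leading zeroes avoid, and the second prints `MyScan010.jpg`.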

Files such as PDFs can just be uploaded as they are.

Uploading

  1. Log in to the Internet Archive.
  2. Click the "Upload" button at the top right of the screen.
  3. Click the "Share" button at the top right of the screen.
  4. Select the file to upload.
  5. Fill in the information requested and choose an appropriate licence (these are similar to the licences used on Wikisource).
    • Title (required)
    • Description (required)
    • Keywords (required)
    • Author
    • Creative Commons Licence or Public Domain Mark
  6. Wait for the upload to complete.
  7. Click the "Share my File(s)" button.
  8. You will see the message "Please wait while your page is created..." then "Your Page is Ready!", followed by a link to the page.
  9. Clicking the link will result in a "Your item is not yet public" message.
  10. Pick a collection for your file. The options will include "movie, audio, text, etree" and "community video, community audio, community text". You will probably be using "text" and "community text". Select the appropriate collection and click the "Submit" button to the right.
    • At this stage, you might be told to wait and come back later. This text is: "Your item is in the process of being derived, and you may not replace the metadata until the derive has finished (because any changes queued now would roll back those being made by the derive). Please try this page again after your item has finished deriving. [Item History]" In this case, simply follow those instructions: try again later.
  11. In the Metadata Editor, fill in any further information (including the information from earlier stages).
  12. Click the "Submit" button. This enters the file into the task log; processing will take some time to complete.
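Before filling in the form, it can help to collect the required and optional fields in one place. This is a hedged sketch: the field names (`title`, `description`, `subject` for keywords, `creator` for the author, `licenseurl` for the licence) are assumed to follow archive.org's metadata conventions, and `build_item_metadata` is a hypothetical helper. A dictionary like this could, for example, be passed to the `metadata` argument of `upload()` in the separate `internetarchive` Python package, though that route is not covered by this page.

```python
from typing import Iterable, Optional


def build_item_metadata(
    title: str,
    description: str,
    keywords: Iterable[str],
    creator: Optional[str] = None,
    license_url: Optional[str] = None,
) -> dict:
    """Collect the upload form's fields into one metadata dictionary."""
    # Drop blank keywords; the form requires at least one real keyword.
    subjects = [k.strip() for k in keywords if k.strip()]
    if not title.strip():
        raise ValueError("title is required")
    if not description.strip():
        raise ValueError("description is required")
    if not subjects:
        raise ValueError("at least one keyword is required")
    metadata = {
        "mediatype": "texts",  # assumption: page scans of a book are a text item
        "title": title.strip(),
        "description": description.strip(),
        "subject": subjects,  # assumption: "subject" holds the keywords
    }
    if creator:
        metadata["creator"] = creator  # the form's "Author" field
    if license_url:
        # e.g. a Creative Commons licence or the Public Domain Mark URL
        metadata["licenseurl"] = license_url
    return metadata
```

Validating the three required fields up front mirrors the form, which will not accept an upload without them.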

Deriving

Derivation can take up to 24 hours. Progress can be monitored either through the filename or through the 'Contributions' page, which is accessible from the home page. The various formats of the work should be derived automatically from the uploaded files. If this has not happened, the "View the Book" box in the left-hand sidebar will not show the various available formats (DjVu, EPUB, Kindle, DAISY, etc.). Derivation can fail for many reasons, most of which are internal to the Internet Archive and have nothing to do with the uploaded file. However, one common cause of failure is a scan resolution that is too high for the Archive's software tools. As of this writing (December 9, 2012), a resolution greater than 300 ppi is not recommended.
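The effective resolution of a scan is its pixel size divided by the physical page size in inches, so it can be checked before uploading. This is a minimal sketch with hypothetical helper names; it assumes you know the page's physical dimensions, and the 300 ppi limit is the recommendation stated above.

```python
def scan_ppi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Effective pixels per inch of a page scan of known physical size."""
    # Take the larger of the two axes in case the page was not scanned square.
    return max(width_px / width_in, height_px / height_in)


def downscale_size(width_px: int, height_px: int, width_in: float,
                   height_in: float, max_ppi: float = 300) -> tuple:
    """Pixel size that brings a scan down to at most `max_ppi`, keeping its aspect ratio."""
    ppi = scan_ppi(width_px, height_px, width_in, height_in)
    if ppi <= max_ppi:
        return width_px, height_px  # already within the limit
    factor = max_ppi / ppi
    return round(width_px * factor), round(height_px * factor)
```

For example, a 6 × 9 inch page scanned at 3600 × 5400 pixels is 600 ppi; `downscale_size(3600, 5400, 6, 9)` returns `(1800, 2700)`, which is exactly 300 ppi. Any image editor can then resize the pages to that size before zipping.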

First, force the derivation from the file page:

  1. Click "Edit item"
  2. You will see two choices: "change the information" and "change the files". Click "change the information".
  3. Click "Item Manager"
  4. Click "Derive"

In case this fails:

  1. Go to the IA home page and click on 'Contributions'.
  2. Click on 'See your contribution tasks that are not yet completed.'
  3. The screen will display a list of your contribution tasks.
  4. If the derivation process is still running, then wait.
  5. If the process has stopped, is marked red and shows 'waiting for Admin', email info@archive.org to explain the problem and request a restart of the derivation process. Be sure to include the link to the uploaded item's page.
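Before emailing, it may be worth confirming whether a DjVu file was in fact derived. The sketch below assumes the Archive's metadata endpoint (`https://archive.org/metadata/<identifier>`), which returns JSON with a `files` list whose entries carry `name` and `source` (`"original"` or `"derivative"`). The helper names are hypothetical, and the checking functions take an already-fetched dictionary so they can be tried offline.

```python
import json
import urllib.request

# Assumption: per-item metadata, including the file list, is served as JSON here.
METADATA_ENDPOINT = "https://archive.org/metadata"


def fetch_metadata(identifier: str) -> dict:
    """Download an item's metadata as a dictionary."""
    with urllib.request.urlopen(f"{METADATA_ENDPOINT}/{identifier}") as resp:
        return json.load(resp)


def derived_files(metadata: dict) -> list:
    """Names of files the Archive generated from the upload (source == 'derivative')."""
    return [
        f["name"]
        for f in metadata.get("files", [])
        if f.get("source") == "derivative" and "name" in f
    ]


def has_djvu(metadata: dict) -> bool:
    """True if a .djvu file appears among the derived files."""
    return any(name.lower().endswith(".djvu") for name in derived_files(metadata))
```

If `has_djvu(fetch_metadata("<your identifier>"))` is still `False` after a day or so, that supports the case for asking the Archive to restart the derivation.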