User:SnowyCinema/QT.py/Workflow

  • How can we keep track of how much time it took for each stage of the process below? Is there a timer we can build?
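One possible answer to the timer question is a small context manager that wraps each stage and accumulates elapsed time. This is only a sketch: the `StageTimer` class and the stage names passed to it are hypothetical and not part of QT.py as described here.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named workflow stage (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded under the same stage name.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        # One line per stage, in the order the stages were first entered.
        return [f"{name}: {seconds:.2f}s" for name, seconds in self.durations.items()]


timer = StageTimer()
with timer.stage("Preparation"):
    time.sleep(0.01)  # stand-in for real work
with timer.stage("Transcription"):
    time.sleep(0.01)  # stand-in for real work

for line in timer.report():
    print(line)
```

Because the time is recorded in a `finally` clause, a stage that is aborted by an exception still gets its elapsed time logged, which seems to fit the later point that a job may have to be aborted.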

Preparation

  • How can we create or modify author pages?



  • How can we create or connect a Wikidata item for that author?



  • How can we create and maintain author disambiguation pages?
    • What if it needs to be in the mainspace? See also general disambig section



  • How can we find the best and most original scan of the work?




  • How can we create a versions page if necessary linking to other versions of the work?



  • How can we download the scan of the work?
    • What if it's from HathiTrust?
    • What if it's from Google Books?
    • What if it's from another site?



  • How can we upload the scan to Wikimedia Commons?



  • How can we crop, process, and save all images in the work?

Transcription

  • How can we take the OCR and initially correct likely scannos?
    • What if it was from Gutenberg? What corrections need to be made if so?



  • If a certain scanno etc. needs mass-fixing throughout the text, how can this be applied quickly?
    • QT markup needs to be separated out first, see below
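A mass fix of a single scanno could be sketched as a whole-word regular-expression replacement that also reports how many replacements were made. This assumes the QT markup has already been separated out, as noted above; the function name `fix_scanno` and the sample text are hypothetical.

```python
import re


def fix_scanno(text, wrong, right):
    """Replace every whole-word occurrence of `wrong` with `right`.

    Returns a (new_text, replacement_count) tuple. Matching is
    case-sensitive, and word boundaries keep longer words untouched.
    """
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    return pattern.subn(right, text)


sample = "Tbe cat sat on tbe mat. Tbe end, said Tbeodore."
fixed, count = fix_scanno(sample, "Tbe", "The")
print(count)  # 2
print(fixed)  # The cat sat on tbe mat. The end, said Tbeodore.
```

The lowercase `tbe` is left alone here on purpose: case variants would need their own pass so that capitalization is never guessed.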



  • On that note, how can QT markup be explicitly identified by the parser, and be separated from the rest of the text if need be?



  • How can we quickly and at least almost accurately split the OCR content page by page before beginning transcription?
    • What if the text layer was from Gutenberg and pages are listed?
    • What if it's from Gutenberg and they aren't listed, so a match and split method needs to be employed?
    • What if it's from IA?
    • What if it's from Google Books itself (strongly discouraged)?



  • How can we properly implement a page list?
    • The ---- system basically
    • So make a subpage of documentation on that



  • How can we deal with running header and footer automation? How can we set up rules for that?



  • How can we get a quick MediaWiki preview of what a page, chapter, or maybe the entire work will look like when transcluded?
    • Jump to parts of the verification process with QT parsing errors?

Verification

  • How can we identify and fix hyphenation inconsistencies?
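One rough way to identify hyphenation inconsistencies is to group words by their spelling with hyphens removed and flag any group that has more than one form. The sketch below assumes plain text with QT markup already stripped; the function name is hypothetical, and it does not catch forms split into two separate words (for example "to day").

```python
import re
from collections import Counter, defaultdict


def hyphenation_inconsistencies(text):
    """Map each de-hyphenated spelling to its competing forms and their counts."""
    # Words may contain internal hyphens, e.g. "to-day" or "mother-in-law".
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    counts = Counter(token.lower() for token in tokens)

    forms_by_key = defaultdict(dict)
    for word, n in counts.items():
        forms_by_key[word.replace("-", "")][word] = n

    # Keep only spellings that occur in more than one form.
    return {key: forms for key, forms in forms_by_key.items() if len(forms) > 1}


sample = "To-day we walk. Today we rest. He said to-day again."
result = hyphenation_inconsistencies(sample)
print(result)  # {'today': {'to-day': 2, 'today': 1}}
```

The counts give a hint about which form the work prefers, but the choice of the "correct" form still has to be checked against the scan.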



  • How can we identify and correct very likely scannos in order?



  • How can we keep track of different types of scannos, how likely it is that they are transcription errors, and report on them in the long run?



  • How can we identify and fix QT parsing errors?



  • How can we identify likely author names and work names used in the transcription for linking?



  • How can we pull existing data from works by the same author and use it to find hyphenation inconsistencies against those works only?



Transclusion

  • How can we upload our work images to Commons?
    • Use descriptive file names based on labels given within the work
    • Or DEFAULT file name is <WORK NAME> + iterative number, if no label is given
    • Automatically insert description as "This image was cropped from <WORK NAME> (<WORK YEAR>) by <AUTHOR>."
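The naming and description rules above could be sketched as two small functions. The description template is taken from the list above; the exact shape of a labelled file name, the separator before the number, and the default `png` extension are assumptions made for illustration, and the work details in the example are placeholders.

```python
def image_file_name(work_name, index, label=None, extension="png"):
    """Build a Commons file name for the image at position `index` in the work.

    If the work supplies a label, use it (assumed format: "<WORK NAME> - <label>").
    Otherwise fall back to the default: <WORK NAME> plus an iterative number.
    """
    base = f"{work_name} - {label}" if label else f"{work_name} {index}"
    return f"{base}.{extension}"


def image_description(work_name, work_year, author):
    """Build the automatic description text for a cropped image."""
    return f"This image was cropped from {work_name} ({work_year}) by {author}."


print(image_file_name("Example Work", 3))
# Example Work 3.png
print(image_file_name("Example Work", 4, label="Frontispiece"))
# Example Work - Frontispiece.png
print(image_description("Example Work", 1900, "Example Author"))
# This image was cropped from Example Work (1900) by Example Author.
```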



  • How can we insert the correct information into the Index page?



  • How can we copy the correct content into the Page namespace?




  • How can we transclude the content into the mainspace?
    • Depends somewhat on the type of work...



  • How can we create and manage disambiguation pages and redirects?



  • How can we create a Wikidata item for both WORK and VERSION?
    • How can we connect those Wikidata items to the correct pages?
    • How can we identify possible duplicate Wikidata items based on similar titles and similar entered information, and what should we do when that is found?
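For the duplicate question, one simple starting point is a fuzzy comparison of the new title against titles already retrieved from Wikidata. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; it does not query Wikidata itself, and the function name, the 0.85 threshold, and the sample titles are all assumptions to be tuned.

```python
from difflib import SequenceMatcher


def similar_titles(new_title, existing_titles, threshold=0.85):
    """Return (title, similarity) pairs at or above `threshold`, best match first."""

    def normalize(title):
        # Compare case-insensitively and ignore runs of whitespace.
        return " ".join(title.lower().split())

    matches = []
    for title in existing_titles:
        ratio = SequenceMatcher(None, normalize(new_title), normalize(title)).ratio()
        if ratio >= threshold:
            matches.append((title, round(ratio, 2)))
    return sorted(matches, key=lambda match: match[1], reverse=True)


candidates = similar_titles(
    "The Example Work",
    ["The example work", "An Unrelated Title"],
)
print(candidates)  # [('The example work', 1.0)]
```

A title match alone probably should not trigger any automatic merge; flagging the candidates for manual review, along with other entered information such as author and year, seems the safer default.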


  • How can we add a list of poems/short stories/articles as works from a collection to an author page?



  • How can we be notified remotely if the transclusion throws an obvious error?
    • Maybe a Discord bot?

Review

  • How can we quickly review all automated transclusion edits, especially the ones that could be wrong?
    • Review by task # with a WS gadget



  • How can we identify if the Wikidata item has a Wikipedia page to give our new Wikisource transcription free advertising?



  • How can we speed up even the process of adding to New texts? (IT IS AGAINST THE [UNSAID] RULES TO USE A BOT TO ADD TO NEW TEXTS DIRECTLY, but you can autogenerate the entry and add that text manually with your main account.)
    • Easy peasy.


Retrospection

  • How and where can we properly log the actions performed during this job, when and where they were done, and what the revision code is?



  • How and where can data on the transcription, not the bot task, be logged?
    • That's separate from the job, because the job will be an attempt which could have to be aborted.
    • A transcription could also be postponed or aborted.
    • Data on the work itself but also STATISTICS on how the transcription was done (time it took, time each element took, page-by-page estimation, length of each page, number of pages, etc.) that can be reviewed to see general and specific performance
    • How long each page took to proofread, compared with the amount of change that had to be made to the original, barely-past-OCR text



  • How can we find and add probable scannos collected during this session to the primary collection list (see Transcription)?



  • What else can we learn from this transcription project? How else could the software or workflow of the QT system be improved?
    • Are there any scannos not commonly occurring, or that cause other problems, that might need to be removed from the list?



  • Is there any way this transcription experience could help build better QT documentation?


I, the copyright holder of this work, hereby release it into the public domain. This applies worldwide.

In case this is not legally possible:

I grant anyone the right to use this work for any purpose, without any conditions, unless such conditions are required by law.
