User:Inductiveload

Inductiveload User Area
Main User Page	Talk Page	Talk archives	Contributions
WELCOME to my user page. Have a look around my galleries and contributions if you like, or leave messages on my talk page. If you can suggest improvements to my own work, tell me. Don't let any poor quality work hang around!
Wikisource user page	Commons user page	Wikibooks user page	Wikipedia user page
Languages: (native), (basic), (basic), (basic), (like, totally fluent), (passable), (basic), Lua (basic)

Those who proofread don't mind, and those who mind don't proofread.

The Monthly Challenge for October contains 70 works. You can help by reading the guide and contributing to the current challenge.

This month:

Pages processed: 4301 (143.4% of target)
Avg. pages/day: 139
Yesterday: 140

Last month:

Pages processed: 4789 (159.6% of target)
Avg. pages/day: 160

Ready for export: 712 works

Ask me to do things

I will need information:

Tools and scripts

User preferences and custom javascripts:

common.js
monobook.css
Regexp toolbar.js
Running header.js - completes a {{rh}} template by copying the header from the last-but-one page and incrementing the number.
Custom toolbar buttons.js
InlinePagenums.js - toggle display of inline pagenums
ColourBackground.js - shade text boxes and background in edit mode to avoid eye strain.
Visibility.js - custom visibility switching
index preview.js - preview index page thumbnails with alt-click on the page-list
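As a rough illustration of what completing a running header involves, here is a minimal sketch. It is not the code in Running header.js: the function name, the three-slot {{rh}} layout, the Arabic-only page numbers and the step of 2 (because the source is the last-but-one page) are all assumptions made for the example.

```javascript
// Hypothetical helper, not the code in "Running header.js".
// Assumes a three-slot header such as {{rh|12|CHAPTER I.|}} and Arabic page numbers.
function nextRunningHeader(lastButOneHeader, step = 2) {
  const match = /^\{\{rh\|([^|}]*)\|([^|}]*)\|([^|}]*)\}\}$/i.exec(lastButOneHeader.trim());
  if (!match) {
    return null; // not a simple three-slot {{rh}} call
  }
  // Bump any slot that is purely a number; leave titles and empty slots untouched.
  const slots = match.slice(1).map(function (slot) {
    return /^\d+$/.test(slot.trim()) ? String(Number(slot.trim()) + step) : slot;
  });
  return '{{rh|' + slots.join('|') + '}}';
}
```

Copying from the last-but-one page keeps the page number on the same side of the header, since left-hand and right-hand pages usually mirror each other.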

Commons scripts

commons:User:Inductiveload/basic upload templates.js: add buttons for preloading book templates on the Basic Upload Form

Popups Reloaded

Popups, but way better.

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/popups_reloaded.js&action=raw&ctype=text/javascript');
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/popups_reloaded.css&action=raw&ctype=text/css', "text/css");

Quick Access

Keyboard-driven tool access

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/quick_access.js&action=raw&ctype=text/javascript');

Cleanup (Alpha)

This is a much more in-depth version of User:Samwilson/PageCleanUp.js that includes hundreds of regexes for scannos that have unambiguous or almost-certain corrections. For example, no word in English ends in -abte: this is almost certainly -able.

This is an alpha-level tool. Configuration, especially, is likely to change.

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/cleanup.js&action=raw&ctype=text/javascript');

This tool works with the default configuration, but it is more reliable with a custom configuration.
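The scanno idea can be sketched as a table of regex rules applied in order. This is only an illustration: the rule list, the rule object shape and the function name are assumptions, not the actual cleanup.js rule set or configuration format, and the "tbe" rule is a made-up second example.

```javascript
// Illustrative rules only; the real cleanup.js rule set and config format are not shown here.
const SCANNO_RULES = [
  // "-abte" is not an English word ending, so treat it as a misread "-able"
  { pattern: /(\w)abte\b/g, replacement: '$1able' },
  // "tbe" as a misreading of "the" (hypothetical example rule)
  { pattern: /\btbe\b/g, replacement: 'the' }
];

// Apply every rule in turn and return the corrected text.
function fixScannos(text, rules = SCANNO_RULES) {
  return rules.reduce(function (current, rule) {
    return current.replace(rule.pattern, rule.replacement);
  }, text);
}
```

Keeping the rules as data rather than code makes it easy to restrict the table to corrections that are unambiguous for a given work.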

Page Carousel

Quick buttons to load the next/previous pages of a book.

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/page carousel.js&action=raw&ctype=text/javascript');

Save/Load Actions

Run JavaScript actions on page save or page load. Can be used to implement custom text transforms, for example from [[/Foo/]] to [[Link/Foo|Foo]].

This tool probably needs configuration in your JS. Just adding it won't do anything useful. Consult the documentation for more information.

Preview markup

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/show_markup.js&action=raw&ctype=text/javascript');
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/show_markup.css&action=raw&ctype=text/css', "text/css");

Maintenance Wizard and Replacer

Perform maintenance without going into edit mode.

maintain.js

// maintain script has no purpose in special
if (mw.config.get("wgCanonicalNamespace") !== "Special") {
  mw.loader.using(['ext.gadget.utils-difference', 'mediawiki.util', 'mediawiki.api',
      'oojs-ui-core', 'oojs-ui-windows', 'oojs-ui-widgets']).done(function() {
    mw.loader.load("/w/index.php?title=User:Inductiveload/maintain.js&action=raw&ctype=text/javascript");
    mw.loader.load("/w/index.php?title=User:Inductiveload/maintain-ws-tools.js&action=raw&ctype=text/javascript");
  });
}

Jump to file

Add a button to go to the book file at Commons from the Index or Page namespace, and to the transcluding page from the Page namespace

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/jump to file/load.js&action=raw&ctype=text/javascript');

MiniPane

Add a mini image page in the Page namespace for keeping the proofread text nearer the input box.

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/MiniPane.js&action=raw&ctype=text/javascript');

IaUploadPopup

Add a small popup to assist uploading works at the IA via IA-Upload

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/IaUploadPopup.js&action=raw&ctype=text/javascript');

Scan Transcludes

Show, in the pagelist grid, which pages are transcluded and which are not, colouring the pages according to expected transclusion status.

scan transcludes.js

mw.loader.load("/w/index.php?title=User:Inductiveload/scan_transcludes.js&action=raw&ctype=text/javascript");

ActivePageAlert

Display an icon if a page has been edited recently, with the ability to vary the definition of "recently" on a per-other-user basis.

mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/ActivePageAlert.js&action=raw&ctype=text/javascript');
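The per-user definition of "recently" could look something like the sketch below. The user names, thresholds, default value and function name are all invented for illustration and do not reflect how ActivePageAlert.js is actually configured.

```javascript
// Hypothetical threshold table: names and numbers are made up for illustration.
const DEFAULT_RECENT_MINUTES = 60;
const RECENT_MINUTES_BY_USER = {
  'SlowAndSteady': 24 * 60, // treat this (made-up) user's edits as "recent" for a day
  'QuickFixBot': 5
};

// True if the last edit falls inside the window that applies to its editor.
function isRecentEdit(lastEditor, lastEditTime, now = new Date()) {
  const limitMinutes = Object.prototype.hasOwnProperty.call(RECENT_MINUTES_BY_USER, lastEditor)
    ? RECENT_MINUTES_BY_USER[lastEditor]
    : DEFAULT_RECENT_MINUTES;
  const ageMinutes = (now.getTime() - new Date(lastEditTime).getTime()) / 60000;
  return ageMinutes >= 0 && ageMinutes <= limitMinutes;
}
```

A longer window suits editors who work on a page over several sessions; a short one suits bots that finish in a single pass.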

Tweaks

Show an indicator when a script loads

I use this to check my local script is loading

$(function() {
  $(".mw-indicators").append($("<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/OOjs_UI_icon_chem.svg/20px-OOjs_UI_icon_chem.svg.png\">"));
});

Add `nocache=1` to WS-export sidebar links

$(function() {
  $("#p-coll-print_export a[href*='ws-export.wmcloud.org/?format']").each(function(i, a) {
    $(a).attr("href", $(a).attr("href") + "&nocache=1");
  });
});

Add a button to the main page POTM to edit it

if (mw.config.get("wgTitle") === "Main Page") {
  $(function() {
    $(".collaboration-potm tr:first-child td:last-child").prepend(
      $("<a>")
        .css({float: "right", "font-size": "70%"})
        .attr("href", mw.config.get("wgScript") + "?title=Module:PotM/data&action=edit")
        .text("[+]")
    )
  });
}

Add a link to your bot's contributions

$( function () {
  var botname = "InductiveBot";
  mw.util.addPortletLink(
    'p-personal',
    '/wiki/Special:Contributions/' + botname,
    botname,
    'pt-botcontribs',
    'Contributions by ' + botname,
    '',
    '#pt-logout'
  );
} );

Special icon for Internet Archive links

a.external[href^="https://archive.org"] {
    background: url("//upload.wikimedia.org/wikipedia/commons/thumb/1/13/Internet_Archive_7x8px.svg/7px-Internet_Archive_7x8px.svg.png") no-repeat right;
    /* @noflip */
    padding-right: 10px;
}

Add a custom panel of special characters to the WikiEditor toolbar

	var addCharacters = function () {

		$( '#wpTextbox1' ).wikiEditor( 'addToToolbar', {
			section: 'characters',
			pages: {
				yazidi: {
					layout: 'characters',
					label: 'Yazidi',
					characters: [ 'Ḍ', 'ḍ', 'Ḳ', 'ḳ', 'Ṣ', 'ṣ', 'Ḫ', 'ḫ', 'Š', 'š', 'â', 'î' ]
				}
			}
		} );
	};

	/* Check if view is in edit mode and that the required modules are available.
	 * Then, customize the toolbar … */
	if ( [ 'edit', 'submit' ].indexOf( mw.config.get( 'wgAction' ) ) !== -1 ) {
		mw.loader.using( 'user.options' ).then( function () {
			// This can be the string "0" if the user disabled the preference
			// ([[phab:T54542#555387]]), so coerce to a number before comparing
			if ( Number( mw.user.options.get( 'usebetatoolbar' ) ) === 1 ) {
				$.when(
					mw.loader.using( 'ext.wikiEditor' ), $.ready
				).then( function () {
					addCharacters();
				} );
			}
		} );
	}

Show microformat data under the header

#ws-data {
    display: block !important;
}

#ws-data > span {
    margin-right: 1ex;
    padding: 0 4px;
    background-color: #e8e8e8;
    border: 1px solid #aaa;
    border-radius: 4px;
}

"Steal" the preview accesskey (p) for the "preview with this template" button instead

	$( function () {
		var sandboxPrev = $( 'input[name="wpTemplateSandboxPreview"]' );

		if ( sandboxPrev.length ) {
			var ak = 'p';
			// unbind other users of this accessKey
			$ ( '*[accessKey="' + ak + '"]' )
				.attr( 'accessKey', null );

			// add it to our button
			sandboxPrev.attr( 'accessKey', ak );
		}
	} );

Sticky header without hiding all the buttons (Vector)

.skin-vector-legacy #mw-head {
    position: fixed;
    background-image: linear-gradient(to bottom,#ffffff 50%,#f6f6f6 100%);
}

Browser UserScripts

Add the Hathi IDs to the catalog page tables

// ==UserScript==
// @name Add Id to Hathi Catalog
// @match https://catalog.hathitrust.org/*
// @version 1.1
// @updateURL https://gist.github.com/inductiveload/fc64a5d528654b76fe72e702887773a4#file-add_ids_to_hathi_catalog-js
// ==/UserScript==


/* Creates a simple cell element with the given text content */
function create_cell_with_text(tag, text) {
    const cell = document.createElement(tag);
    cell.textContent += text;
    return cell;
}

window.addEventListener('load', function () {
    const entries = document.querySelectorAll('.viewability-table tr a');

    for (const link of entries) {
        const href = link.getAttribute('href');
        const id = href.split('/').at(-1);
        const row = link.closest('tr');

        row.appendChild(create_cell_with_text('td', id));
    }

    document.querySelector('.viewability-table thead tr')
        .appendChild(create_cell_with_text('th', 'ID'));
});

One-liners

Pywikibot

List all indexes which have a page with a linter error

python pwb.py listpages -ns:Page -linter:misnested-tag -format:"Index:{page.title}" | sed -E 's/\/[0-9]+$//' | uniq

Image Processing

Convert all files of type X to type Y, in parallel

find . -type f  -name '*.png' -print0 | parallel -0 convert {} {.}.pbm

Make an image showing only not-black and not-white pixels

convert input.png -colorspace rgb -fill white -opaque black -fill red +opaque white output.png

PDF/DJVU processing

Fix a PDF that chokes ImageMagick due to "bad streams"

gs -o generated.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf

Functions

Some extra functions that might be handy for other scripts:

Roman numerals.js - couple of functions that might be handy

The rant zone

What does a $130 million budget not get you?

phab:T95878 (filed 2015) There's still no mobile editing for Wikidata, after 9 years. Seriously.
- phab:T259183 The totally-not-beta UI is also still trash on "narrow" screens.
~~phab:T278104 Commons uploading is still totally broken for files over 100MB and no one cares, even though it's blocking IA Upload all the time and also preventing various other uploads~~ Fixed after 7 months.
phab:T288980 ~~WVUI~~ Codex might one day get usable by general folk?
OOUI is now not only barely documented and horrific to actually use (OK actually it's pretty good from the server side), but also no-one cares because it's going to get "one day, promise" replaced by WVUI
VisualEditor (aka the 2017 editor!) is still completely useless, and not just at Wikisource
Literally every data model is still work in progress for both WD and SDC and no one cares.
phab:T121646 (filed 2016) There's still no API for evicting data from local storage.
There's still no guidance for best practices for gadgets. None.
The chat ecosystem is completely fragmented into siloed commercial apps and trying to bridge them together is, at best, shunted to "Volunteer time". Just pay for a darn Matrix homeserver like every other FOSS project and stop pretending like everyone will use IRC in 2021 instead of leaving for Discord and Telegram. Even the WMF doesn't really use IRC, it hides its machinations on Slack so no-one can see what the hell is going on.
Speaking of, why does Community Tech not work in the open?

But they still need your cash for...something? And the begging banner will use every dark pattern they can fit in there.

And a textbook example of the Mr Micawber principle (Annual income twenty pounds, annual expenditure nineteen nineteen six, result happiness. Annual income twenty pounds, annual expenditure twenty pounds nought and six, result misery):

Tasks created in (2021-12): 1617
Tasks closed in (2021-12): 1325
Open and stalled tasks in total: 49056

Median age in days of open tasks by priority:

Unbreak now: 24
Needs Triage: 726
High: 1033
Normal: 1601
Low: 2210
Lowest: 2290

The counter-rant zone (aka why maybe it's not all bad)

The Special:APISandbox is brilliant
The Wikipedia Library is brilliant
Commons providing basically infinite storage is brilliant
Toolforge and WMCS are brilliant
Site Reliability knows where their towels are
An API that allows Pywikibot can't be all that bad

Maintenance and reports

Below are lists of pages in Wikisource which are useful for various purposes. All of these could be out of date. If you really need up-to-date reports, just leave me a note, and I will do it as soon as I can.

templates A list of all templates in use on enWS, along with links and usage counts.
wikisource pages A list of all Wikisource namespace pages.
portals A list of all Portal pages.
ws-portal redirects A list of all Wikisource pages which redirect to Portal pages. No pages should link to these.
ws-wp no backlink A list of Wikisource pages linking to Wikipedia pages which do not link back here.
false root pages A list of pages that should be subpages but aren't.

site-css-js: in-progress CSS tidying-up - look here for CSS and/or JS moved out of MediaWiki namespace (rather than being deleted)

SPARQL: Useful SPARQL queries.
HTML processing: Useful HTML transforms for extracting data

Live thumbnail generation times

petscan:19363685: Proofread indexes not transcluded
petscan:19363689: Validated indexes not transcluded

Bot activities

I operate a bot, InductiveBot, which performs minor maintenance tasks. It is based on pywikipedia and is quite flexible. If you have a specific request, please let me know on my talk page, and I'll see what I can do!

InductiveBot information: details of the custom scripts running over pywikipedia, etc.

MW dev

Run extension linter

docker-compose exec mediawiki bash -c "cd extensions/ProofreadPage && composer install"
docker-compose exec mediawiki bash -c "cd extensions/ProofreadPage && composer test"
docker-compose exec mediawiki bash -c "cd extensions/ProofreadPage && composer phan"

Run extension parser tests

docker-compose exec mediawiki php tests/parser/parserTests.php --file=extensions/ProofreadPage/tests/parser/proofreadpage_pages_pagelist.txt

Or to run all the tests in a directory:

docker-compose exec mediawiki sh -c 'find extensions/ProofreadPage/tests -name "*.txt" -exec php tests/parser/parserTests.php --file={} \;'

Run extension unit tests

docker-compose exec mediawiki php tests/phpunit/phpunit.php extensions/ProofreadPage/tests/phpunit

Run linter

npm run-script test

Snippets

PHP: Logging

\MWDebug::log('foobar');

PDF munging

Extract page labels from a PDF → JSON

For meanings of /P, /S, /St see §8.3.1 Page Labels in the PDF Spec

qpdf --json --object-streams=disable ss.pdf | jq '[ .pages[] | .label ]'

Pywikibot

Transfer an image from Commons to enWS

./pwb.py imagetransfer -site:commons:commons -tosite:wikisource:en -keepname -force_if_shared "File:Foobar.djvu"

Random links

mw.hook fire points: https://codesearch.wmcloud.org/search/?q=mw%5C.hook(.*)%5C.fire&i=nope&files=&excludeFiles=&repos=
Current lag: https://en.wikisource.org/w/api.php?action=query&titles=MediaWiki&format=json&maxlag=-1
All wikis edit firehose: https://event-streams.toolforge.org/
Wikisource Image Uploader uploads at Commons: https://commons.wikimedia.org/wiki/Special:RecentChanges?tagfilter=OAuth+CID%3A+2348
Zuul Status (Gerrit → Jenkins pipeline): https://integration.wikimedia.org/zuul/
Thumbor thumbnails - Grafana dashboard: https://grafana.wikimedia.org/d/Pukjw6cWk/thumbor
Upload files to Phabricator: https://phabricator.wikimedia.org/file/upload/

Useful things to share

These are some useful scripts I have hacked together. I guarantee nothing! They are certainly not always neatly coded or structured, but they work for quick and dirty jobs.

Script development: general notes on JS script development
Wikisource in Docker: setting up a dev Wikisource in Docker

Universal batch image to DJVU converter. This script takes JPG, GIF, PNG, TIFF and anything else that ImageMagick can convert to PPM.
DJVU OCRing script which uses Tesseract to OCR and insert a text layer into a DJVU
Pagewise DJVU OCR extractor
Index page tabulator, creates tables of individual files for use in collecting files into an index page. See for example Index:The Complete Collection of Pictures & Songs by Randolph Caldecott.jpg.
Template usage tabulator. This script generates a table of all templates, along with the number of uses. Results can be found at User:Inductiveload/templates. Ask me if you want it regenerating, but bear in mind that it is a lot of requests to the server.
Page namespace editor A simple script to decompose Page: namespace pages, perform operations on the header, footer, and body separately, and reupload.
Page shifter A script to shift a set of Page: pages within the same index, or move to a different index.
PDF page converter A shell script to convert a PDF to images (threaded and doesn't run out of memory and die after a few hundred pages like convert can do)
Image splitter A Python script to split images in two. This is useful if you have books scanned as two-page spreads.
Page concatenator A Python/Pywikipedia script to grab a bunch of pages and string them together. Good for assembling a complete text out of many chapter subpages prior to match and split.
Archive.org API How to use the Internet Archive S3-like API to upload large files, instead of the flaky web-client.
Move to subpage Python script to bulk-move pages to subpages. Documentation on how to drive it is at /Requests/Moves to subpages.
/Scripts/upload_image.py Uploads images for works based on a YAML data file

Tesseract retraining project

Very slowly, I am working on a retraining of some models for Tesseract: Tesseract

General Python scripts

Integer to Roman numerals converter (eg. 11 -> XI)
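For reference, the conversion itself is short. The sketch below is written in JavaScript to match the other scripts on this page; it is not the linked Python script, and the function name and the 1 to 3999 range limit are choices made for the example.

```javascript
// Convert a positive integer to a Roman numeral using the standard subtractive pairs.
function toRoman(n) {
  if (!Number.isInteger(n) || n < 1 || n > 3999) {
    throw new RangeError('toRoman expects an integer from 1 to 3999');
  }
  const table = [
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100, 'C'], [90, 'XC'], [50, 'L'], [40, 'XL'],
    [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']
  ];
  let result = '';
  // Greedily take the largest value that still fits.
  for (const [value, numeral] of table) {
    while (n >= value) {
      result += numeral;
      n -= value;
    }
  }
  return result;
}
```

Front-matter page labels are normally lower case, so a caller would typically apply toLowerCase() to the result.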

GIMP scripts

The best one:
- Whiten-background.scm Gimp script to remove the background of an image. An adaptation of a script by Leonid Koninin. This is fairly harsh on some images, so use with care.

Less good ones:
Remove-paper-texture.scm Gimp script to remove the paper background from a scan of a black and white image by a pretty brutal adjustment of the levels
Remove-background-colour.scm Gimp script to remove a flat background colour from an image. This is a fairly brutal algorithm, use with care on delicate images. Essentially, this is just a "bundle" of "select colour, erase the colour, and desaturate". It'll likely be better to use one of the above scripts, but this one is fairly instructive from a script-writing point of view.

Bibliographic junk

http://www.rdaregistry.info/termList/
- Wikidata items with RDA term mappings

Data module ecosystem

None of these work fully. Yet.

Core modules

Module:Work data: Provides data about a work in general.
- Module:Work data/properties: Core property ID maps
- Module:Work license: provides information about the license of a work

Client modules

Module:Work link

One touch template wrapping with Autohotkey

If you use Autohotkey (and you should), the following is a useful function that lets you wrap the current mouse selection in a template, which saves you having to paste in the contents.

F2 & s :: wrapTemplate("sc") ; small caps

wrapTemplate( name )  
{
    front :="{{}{{}" . name . "|"
    back :="{}}{}}"  
    wrapTags( front, back) 
    return
}

wrapTags( front, back ) 
{
    AutoTrim Off               ; Retain any leading and trailing whitespace on the clipboard.
    ClipSaved := ClipboardAll  ; Save the entire clipboard so we can restore it when we're done
    clipboard =                ; clear the clipboard
    SendInput ^x               ; cut the selection to the clipboard
    ClipWait                   ; wait for the clipboard to contain something
    SendInput %front%%clipboard%%back% ; Output what was selected, surrounded by front and back
    Clipboard := ClipSaved     ; Restore the original clipboard
    ClipSaved =                ; Free the memory in case the clipboard was very large.
    return
}

Regular expressions

Function	Search pattern	Replacement Pattern
Remove single newlines. Useful for OCR'd text	/([^\n])\n([^\n])/g	'$1 $2'
Convert relative links to static links. Useful when putting a TOC in the Page: namespace.	/\[\[\/(.*?)\/\]\]/g	'[[$1|$1]]'
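Both transforms can be tried outside the wiki as plain JavaScript functions. This is a minimal sketch: the function names are invented here, and the link pattern uses a lazy match so that two links on one line are handled separately.

```javascript
// Join lines split by a single newline; blank lines (paragraph breaks) are kept.
function removeSingleNewlines(text) {
  return text.replace(/([^\n])\n([^\n])/g, '$1 $2');
}

// Rewrite relative links such as [[/Chapter 1/]] as [[Chapter 1|Chapter 1]].
function relativeToStaticLinks(text) {
  return text.replace(/\[\[\/(.*?)\/\]\]/g, '[[$1|$1]]');
}
```

Because regex matches cannot overlap, a run of one-character lines is not fully joined by a single pass of the first pattern; running it twice covers that case.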

My requests

Works I'm keeping an eye out for.

Periodicals

Scientific Machinist (only have v. 10)
The Optician and Scientific Instrument Maker (only have v. 62)
Cornell Studies in Classical Philology (lots on the IA)

Technical wishlist

Some things I'd like to see done (that aren't actual proofreading). Some of it is unimportant, some of it may be controversial and undiscussed, but it would be nice to address and tighten up.

Get dynamic layouts to work for non scan-backed works and scrap {{prose}} and other hard-coded formatting.
Fix poem tags - only by having all lines as p or span tags can we have hanging-indented continuation lines, as 95% of all printed poems do. Stanzas should be divs. Might need a whole new tag in the poem extension, but might not be that hard?(???)
Train Tesseract specifically for 1700s-style printing esp. with long-s
Allow match-and-split to match to PDFs (since there are now ~1m PDFs on Commons)
Get para breaks working in OCR loading: phab:T230415
Add common fonts:
- Cursive: phab:T166138
- JUnicode: phab:T173573
- Sans Outline (and remove hacks like ℕ𝔼𝕎 𝕐𝕆ℝ𝕂)
- Serif Outline
- Maybe a better Polytonic greek?
Move MediaWiki:Proofreadpage_index_template to a module
Move Template:Header to module

Half-done

Ebook review process leading to categorisation, then... Category:Ready for export
Improve index autofill to fetch author links from Commons creator templates: Getting there: Mediawiki:Gadget-Fill Index.js
~~Fix {{FI}} which is invoking full-size images every single time.~~ Merge with and/or deprecate {{large image}}.
Fix headers on mobile, the tabular structure is unfriendly on narrow screens (main header done, other namespaces pending main header module-ification). See {{header/main block}}

Done

Tool to import and convert IA page list JSON: Mediawiki:Gadget-ImportPagelist.js
Fix the print CSS: centre is broken. Fixed: diff
Fix page numbers in {{TOC begin}} c.f. phab:T232477
OPDS catalogue of "exportable" ebooks for integration into e-readers: phab:T270387. See Category:Ready for export for link
