User talk:Inductiveload/dp reformat

From Wikisource

Support to Transcode Greek

I found a project with Greek in it. DP uses a Greek transliteration. Is there any way to code an automatic conversion? See Page:The Letters of Cicero Shuckburg III.pdf/13. Languageseeker (talk) 04:51, 4 March 2021 (UTC)

I can do a crude conversion, but DP haven't done a full polytonic transliteration, so it won't be perfect and it'll be missing all the breathings and other marks. Inductiveload talk/contribs 06:48, 4 March 2021 (UTC)
I think it might be better to tag it as missing Greek, something along the lines of {{greek missing}}, for text such as "to mêden mega autô chrêsthai Pompêion". Languageseeker (talk) 13:32, 4 March 2021 (UTC)
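For illustration only, a crude conversion of the kind described above might look like the following sketch. The letter table is an assumption about the transliteration (ê for eta, ô for omega, th/ph/ch/ps as digraphs), and `crudeGreek` is a hypothetical name, not part of the actual script. As noted above, breathings and accents are not recovered: a standalone "h" is simply dropped.

```javascript
// Assumed transliteration table; the real DP scheme may differ in details.
const DIGRAPHS = { th: 'θ', ph: 'φ', ch: 'χ', ps: 'ψ' };
const SINGLES = {
  a: 'α', b: 'β', g: 'γ', d: 'δ', e: 'ε', z: 'ζ', 'ê': 'η', i: 'ι',
  k: 'κ', l: 'λ', m: 'μ', n: 'ν', x: 'ξ', o: 'ο', p: 'π', r: 'ρ',
  s: 'σ', t: 'τ', u: 'υ', y: 'υ', 'ô': 'ω',
};

// Hypothetical helper: converts transliterated text to unaccented Greek.
function crudeGreek(input) {
  const text = input.normalize('NFC'); // make sure ê and ô are single code points
  let out = '';
  let i = 0;
  while (i < text.length) {
    const isUpper = text[i] !== text[i].toLowerCase();
    const pair = text.slice(i, i + 2).toLowerCase();
    let greek;
    let used;
    if (pair.length === 2 && DIGRAPHS[pair]) {
      greek = DIGRAPHS[pair];
      used = 2;
    } else if (text[i].toLowerCase() === 'h') {
      // A lone "h" marks a rough breathing, which this sketch cannot restore.
      i += 1;
      continue;
    } else {
      // Unknown characters (spaces, punctuation) pass through unchanged.
      greek = SINGLES[text[i].toLowerCase()] ?? text[i];
      used = 1;
    }
    out += isUpper ? greek.toUpperCase() : greek;
    i += used;
  }
  // Use final sigma when a lowercase sigma is not followed by another letter.
  return out.replace(/σ(?!\p{L})/gu, 'ς');
}

console.log(crudeGreek('to mêden mega autô chrêsthai Pompêion'));
// prints "το μηδεν μεγα αυτω χρησθαι Πομπηιον"
```

The output would still need a proofreader to add the diacritics, so tagging the passage as missing Greek remains a reasonable alternative.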

Weird Dutch Symbols

I split up Engelsch-Nederlandsch Woordenboek / English-Dutch Dictionary and then noticed that it had weird symbols. I'm not sure how universal these codings are. Is there any way to run a regex over all the pages to deal with them? Languageseeker (talk) 04:51, 4 March 2021 (UTC) [1]

@Languageseeker: I'll add what I can to the script. If we find some things ([e] stands out as a bit "sus") aren't always valid to replace, we can add switches to the format parameter dialog. For handling the existing pages, you could use w:WP:AWB, or the following tool seems functional too: w:User:Joeytje50/JWB. However, for the latter, until this bug is resolved, you need to use the following method to install for use in WS page namespace:
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/jwb.js/load.js&action=raw&ctype=text/javascript');
(Remember: you're responsible for all edits made with it, just as in the normal edit flow.)
You can always ask me to do bot replacements, but you should provide a detailed list of replacements and complete instructions on how to apply them (the edit summary and so on), and then review the edits afterwards. It's usually easier for everyone if you can do this on a self-serve basis when possible (it's also good for your CPD :-)), unlike page shifts, which require admin attention to avoid redirects making a mess. Inductiveload talk/contribs 08:54, 4 March 2021 (UTC)
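As a rough sketch of what a replacement list with switches could look like: the rule names, patterns and replacement characters below are placeholders invented for illustration (the real symbol list for the dictionary is not given in this thread), and `applyRules` is a hypothetical helper rather than part of the script or of AWB/JWB.

```javascript
// Placeholder rules: both the patterns and the replacements are assumptions.
// Rules marked "risky" are skipped unless explicitly switched on.
const RULES = [
  { name: 'oe-ligature', pattern: /\[oe\]/g, replacement: 'œ', risky: false },
  { name: 'bracket-e', pattern: /\[e\]/g, replacement: 'ə', risky: true },
];

// Applies every safe rule, plus any risky rule whose name is in enabledRisky.
function applyRules(text, rules, enabledRisky = new Set()) {
  return rules.reduce((acc, rule) => {
    if (rule.risky && !enabledRisky.has(rule.name)) {
      return acc; // leave questionable codes untouched by default
    }
    return acc.replace(rule.pattern, rule.replacement);
  }, text);
}

console.log(applyRules('c[oe]ur [e]', RULES));
// prints "cœur [e]"
console.log(applyRules('c[oe]ur [e]', RULES, new Set(['bracket-e'])));
// prints "cœur ə"
```

Keeping the doubtful replacements opt-in means a default run never changes a code that might be legitimate text on some pages.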

Page Shift

Sorry, I didn't catch another page shift. Starting at Page:The Letters of Cicero Shuckburg III.pdf/32, the text needs to be shifted by +1. Languageseeker (talk) 04:51, 4 March 2021 (UTC)

Done. Inductiveload talk/contribs 10:17, 4 March 2021 (UTC)
Thanks. Languageseeker (talk) 13:32, 4 March 2021 (UTC)
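To make the request concrete, the renumbering behind a shift of this kind can be sketched as follows. This only models the page numbers in memory; it does not move anything on the wiki, and `shiftPages` and the `Map` layout are assumptions made for the example.

```javascript
// Hypothetical model: a Map from page number to page text.
// Every page at or after `start` moves to (number + offset).
function shiftPages(pages, start, offset) {
  const shifted = new Map();
  for (const [num, text] of pages) {
    shifted.set(num >= start ? num + offset : num, text);
  }
  return shifted;
}

const before = new Map([[31, 'a'], [32, 'b'], [33, 'c']]);
const after = shiftPages(before, 32, 1);
console.log([...after.entries()]);
// prints [ [ 31, 'a' ], [ 33, 'b' ], [ 34, 'c' ] ]
```

With a positive offset, the numbers vacated at the start of the shifted range (page 32 in this example) are left without text and need to be filled or marked separately.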

"F2" instead of "P3"?

Looking at this documentation, are you using the final F2 version? The P3 would have been, at the very least, good enough.

In a version of the future where I have a computer (one that boots, or a truthful explanation of why it doesn't), I have hundreds of P3 and F2. Image-heavy, as my inclination goes.

Can you note precisely that F2 is what is being used? Or whichever? --RaboKarbakian (talk) 14:37, 4 March 2021 (UTC)

Glad to see you joining! I created a page to record what was imported from PGDP. It absolutely makes sense to record whether it was a P3 or F2 version. We prefer F2 because it has more formatting and those versions get archived more quickly.
Link to texts that Inductiveload archived: [2].
Link to projects that just finished F2: [3]. Languageseeker (talk) 14:47, 4 March 2021 (UTC)

Problem with Split Index

Hi, I'm working on Index:The Age Of Justinian And Theodora Vol II (1912).pdf and the PGDP files split the columns of the index in two. Is there any way to unite them? Here is the text for the index: User:Languageseeker/problem. Languageseeker (talk) 18:13, 4 March 2021 (UTC)

@Languageseeker: Since there are only ~20 pages there, maybe just manually remove every other page marker and renumber the remainder to be consecutive? If it's a common thing we can look at an option, but it might be a bit complex to use then. Inductiveload talk/contribs 18:15, 4 March 2021 (UTC)
I've seen this on a few works. Would it be possible to write a tool that removes the extra page markers and renumbers the rest? They tend to do this with lots of multi-column works. So, a separate tool just to deal with split columns. Languageseeker (talk) 18:24, 4 March 2021 (UTC)
Maybe it could apply only to highlighted text and handle both the raw PGDP page numbers and the match-and-split format. It would save quite a bit of time.
Another case: User:Languageseeker/problem2. Languageseeker (talk) 19:17, 4 March 2021 (UTC)
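The manual procedure suggested above (drop every other page marker, then renumber what is left) could be sketched like this. The marker format `==[[Page:Name/N]]==` is an assumption about the match-and-split text, raw PGDP markers are not handled, and `mergeSplitColumns` is a hypothetical name. The sketch also assumes every page in the input was split into exactly two parts.

```javascript
// Assumed match-and-split marker, e.g. ==[[Page:Foo.pdf/12]]==
const MARKER = /^==\[\[Page:(.+)\/(\d+)\]\]==$/;

// Keeps the 1st, 3rd, 5th... markers, drops the others, and renumbers
// the kept markers consecutively from the first marker's page number.
function mergeSplitColumns(text) {
  let seen = 0;    // how many markers have been encountered so far
  let next = null; // page number to give the next kept marker
  const out = [];
  for (const line of text.split('\n')) {
    const m = line.match(MARKER);
    if (!m) {
      out.push(line); // ordinary text line: keep as is
      continue;
    }
    if (next === null) next = Number(m[2]);
    if (seen % 2 === 0) {
      out.push(`==[[Page:${m[1]}/${next}]]==`);
      next += 1;
    }
    seen += 1;
  }
  return out.join('\n');
}

const sample = [
  '==[[Page:X.pdf/10]]==', 'col A',
  '==[[Page:X.pdf/11]]==', 'col B',
  '==[[Page:X.pdf/12]]==', 'col C',
  '==[[Page:X.pdf/13]]==', 'col D',
].join('\n');
console.log(mergeSplitColumns(sample));
```

Restricting the function to a highlighted selection, as suggested above, would only require passing the selected text instead of the whole page.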

Overriding Unproofread OCR

I'm trying to match-and-split into Index:Modern_Greek_folklore_and_ancient_Greek_religion_-_a_study_in_survivals.djvu, but it already has unproofread OCR. Is there any way to override that? Languageseeker (talk) 01:10, 7 March 2021 (UTC)

Sadly not (at present) without deleting the pages first, or botting it in separately over the top. It would be good if the split bot could have an "overwrite" setting. This is why spamming raw OCR into Page namespace is not the greatest idea. Inductiveload talk/contribs 10:13, 9 March 2021 (UTC)
Is there any way to delete the pages for the project so that I can match-and-split? Thanks. Languageseeker (talk) 04:30, 11 March 2021 (UTC)

Shift text

@Inductiveload: Hi, can you shift the text of Index:What cheer, or, Roger Williams in banishment (1896).pdf by +2 starting with [[4]]? Thanks. Languageseeker (talk) 14:48, 12 March 2021 (UTC)

@Languageseeker: Done. Inductiveload talk/contribs 15:14, 12 March 2021 (UTC)
@Inductiveload: Thanks! Languageseeker (talk) 17:26, 12 March 2021 (UTC)

Question about Ability to add in missing pages

@Inductiveload: I don't want to hassle you, but I was wondering if you think you'll find time to write a JS script to insert blank pages. I know that you're super busy and are already helping out so much, but I keep running into DP projects with blank pages removed, which makes merging extremely difficult. Languageseeker (talk) 17:34, 12 March 2021 (UTC)
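As a minimal sketch of the logic such a script would need: given the DP page texts in order and the scan positions where blank pages were removed, re-insert empty entries so the texts line up with the scan again. The function name, the array layout and the 1-based positions are all assumptions for illustration, not an existing tool.

```javascript
// texts: DP page texts in order (index 0 = first page).
// blankPositions: 1-based positions in the FINAL (scan) numbering
// where a blank page should appear.
function insertBlankPages(texts, blankPositions) {
  const result = [...texts];
  // Insert in ascending order so each position refers to final numbering.
  for (const pos of [...blankPositions].sort((a, b) => a - b)) {
    result.splice(pos - 1, 0, '');
  }
  return result;
}

console.log(insertBlankPages(['a', 'b', 'c'], [2, 4]));
// prints [ 'a', '', 'b', '', 'c' ]
```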


Superscripts and Subscripts

@Inductiveload: According to [5], superscripts and subscripts have a set format. Can you add support for them? Languageseeker (talk) 20:45, 17 March 2021 (UTC)
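A possible shape for that support, as a sketch only: it assumes the format in question is `^{...}` (or `^x` for a single character) for superscripts and `_{...}` for subscripts, and that plain `<sup>`/`<sub>` tags are an acceptable target. Both points are assumptions, since the linked guideline is not quoted here, and `convertScripts` is a hypothetical name.

```javascript
// Converts the assumed ^{...}, ^x and _{...} notations to HTML-style tags.
function convertScripts(text) {
  return text
    .replace(/\^\{([^{}]+)\}/g, '<sup>$1</sup>')    // braced superscript
    .replace(/_\{([^{}]+)\}/g, '<sub>$1</sub>')     // braced subscript
    .replace(/\^([A-Za-z0-9])/g, '<sup>$1</sup>');  // single-character superscript
}

console.log(convertScripts('H_{2}O and Gen^{rl} W^m'));
// prints "H<sub>2</sub>O and Gen<sup>rl</sup> W<sup>m</sup>"
```

A bare underscore is deliberately left alone, because it appears in ordinary text far more often than a caret does.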