Index talk:Report On The Investigation Into Russian Interference In The 2016 Presidential Election.pdf

Some basic copy-paste templates

Seeing as this might take a while, I might as well post some generic templates to copy and paste wherever you need them.

Redacted texts
{{redacted|text={{white|'''Harm to Ongoing Matter'''}}}} 
{{redacted|text={{green|'''Personal Privacy'''}}}} 
{{redacted|text={{yellow|'''Investigative Technique'''}}}} 
{{redacted|text={{darkred|'''Grand Jury'''}}}}
Longer redactions
{{redacted|text={{darkred|'''Grand Jury'''}}}}{{redacted}} 
{{redacted|text={{darkred|'''Grand Jury'''}}}}{{redacted|full}} 
{{redacted|full}}
Footer with references
{{rule|15em|align=left}}
{{smallrefs}}
{{c|##}}

I’ll add anything I come across. Thanks, EggOfReason (talk) 01:20, 19 April 2019 (UTC)
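
For anyone new to the page layout, here is a rough sketch of how the templates above might combine on one page. The body text is placeholder lorem ipsum rather than text from the report, and putting the rule, {{smallrefs}} and page number together in the footer field is an assumption about how the footer block is meant to be used.

```wikitext
<!-- Page body: placeholder text, not taken from the report -->
Lorem ipsum dolor sit amet {{redacted|text={{darkred|'''Grand Jury'''}}}} consectetur adipiscing elit.<ref>Lorem ipsum dolor.</ref>

{{redacted|full}}

<!-- Footer field (assumed placement); 2 stands in for the printed page number -->
{{rule|15em|align=left}}
{{smallrefs}}
{{c|2}}
```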

Thanks Egg, also
  • {{***}} for the centered asterisks between paragraphs;
  • {{c|{{pagenum}} }} => {{c|2}} (for page 2) for the pagination on Part II
  • {{c|'''{{sc|Introduction to Volume II}}'''}} for the section titling font that looks like upper case
  • ''''' some extra apostrophes for bold and italics
Have been getting some edit conflicts, will try to pick a block where no one else is editing. Avery Jensen (talk)
Headers

I. Lorem Ipsum Dolor Sit Amet

A. Lorem Ipsum Dolor Sit Amet
1. Lorem Ipsum Dolor Sit Amet
a. Lorem Ipsum Dolor Sit Amet
i. Lorem Ipsum Dolor Sit Amet
{{sc|I. }}
:'''A. '''
::1. {{u| }}
:::'''''a. '''''
::::''i. ''

Here are some headers. EggOfReason (talk) 20:26, 19 April 2019 (UTC)
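
As a sketch of how the empty header templates are presumably meant to be filled in, here are the five sample levels with the placeholder heading text dropped into each one. Placing the text inside the template or the apostrophes (rather than after them) is an assumption based on the blank space left in each pattern.

```wikitext
{{sc|I. Lorem Ipsum Dolor Sit Amet}}
:'''A. Lorem Ipsum Dolor Sit Amet'''
::1. {{u|Lorem Ipsum Dolor Sit Amet}}
:::'''''a. Lorem Ipsum Dolor Sit Amet'''''
::::''i. Lorem Ipsum Dolor Sit Amet''
```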

Unusual characters (e.g. in legal text)
¶ - the pilcrow character
§ - means "section"

Redactions

In the absence of a global style guide, here's a quick one for this report.

The current redaction template is hard to match up with linebreaks: the Wikisource version is meant to work at many column widths, while our method of indicating long redactions ({{redacted|full}}) is counted in lines. That template also does not work well if there is any other text on the same line.

Rules of thumb:

  • Bold the text.
  • Use darkred. People had used both red and darkred equally; the latter matches the original closely.
  • In most cases -- short redactions that are longer than the length of the text and shorter than 1.5 lines -- add at least a wordlength of redaction:
    {{redacted|text={{green|'''Personal Privacy'''}}}}{{redacted}}

For longer redactions, try to make the result look clean, roughly the right length, with a minimum of artificial linebreaks.

  • For short redactions that continue on a second line, add an extra {{redacted}} to the end rather than adding a linebreak
  • For long redactions that continue on a final line, after the last {{redacted|full}} add:
    <br>{{redacted}}</br>
  • For long redactions that start at the beginning of a line, use three extra {{redacted}} wordlengths for the first line:
    {{redacted|text={{green|'''Personal Privacy'''}}}}{{redacted}}{{redacted}}{{redacted}}
  • For long redactions that extend to the end of a line after some text:
    If less than 1.5 full lines of redaction, add two wordlengths
    {{redacted|text={{green|'''Personal Privacy'''}}}}{{redacted}}{{redacted}}
    If more than 1.5 lines of redaction, add a full line of redaction per line.
    {{redacted|text={{green|'''Personal Privacy'''}}}}{{redacted}}{{redacted}}<br>{{redacted|full}}
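
As a sketch of how these rules stack up for one long redaction, here is a block that starts at the beginning of a line, runs for two further full lines, and ends partway through a final line. The line count is invented for illustration, and the rendering has not been checked at different column widths or after transclusion.

```wikitext
<!-- First line: label plus three extra wordlengths -->
{{redacted|text={{green|'''Personal Privacy'''}}}}{{redacted}}{{redacted}}{{redacted}}<br>{{redacted|full}}
<!-- One {{redacted|full}} per additional full line, with nothing else on the line -->
{{redacted|full}}
<!-- Final partial line -->
<br>{{redacted}}</br>
```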

Citation numbers

Is there a way to use the citation numbers from the actual document, and make them linkable to the foot citations? -ApexUnderground (talk) 06:11, 20 April 2019 (UTC)

This is being done with inline ref tags <ref></ref>. For an example of a page that is already completed, see this page and view the markup [1]. Avery Jensen (talk) 20:34, 20 April 2019 (UTC)
I'm talking about the tag number in the body of the transcribed text, and the tag number in the body of the source document text. For example on this page:
In total, the GRU stole hundreds of thousands of documents from the compromised email accounts and networks.[1]
The source document has different numbering:
In total, the GRU stole hundreds of thousands of documents from the compromised email accounts and networks.109
-ApexUnderground (talk) 22:23, 20 April 2019 (UTC)

Wikisource house style does not replicate the numbering of footnotes. Keep in mind also that once the pages are transcluded into the finished documents, the citation numbering will change, as multiple footnotes will then appear together on a single page and be numbered sequentially. --EncycloPetey (talk) 22:29, 20 April 2019 (UTC)

+1 I am working off a copy that can be copy-pasted and reproduces most of the footnote numbers if that is any help, but it still needs a lot of copyedit, especially around the redacted text. https://averyjensen.files.wordpress.com/2019/04/mueller-report.pdf
Here is also the markup for footnotes that are continued on the next page, see Footnotes that continue over page breaks:
  • <ref name="p51">Lorem ipsum dolor</ref> for the first page, and
  • <ref follow="p51">sit amet</ref> for the second page.
Avery Jensen (talk) 23:34, 20 April 2019 (UTC)
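
To make the two-page case concrete, here is a sketch of where each tag might sit. The sentence and footnote text are placeholders, and putting the follow tag at the very top of the second page, before any body text, is an assumption about the usual placement rather than something stated above.

```wikitext
<!-- Last lines of the first page -->
Lorem ipsum dolor sit amet.<ref name="p51">Consectetur adipiscing elit, sed do</ref>

<!-- First lines of the second page -->
<ref follow="p51">eiusmod tempor incididunt.</ref>
Ut labore et dolore magna aliqua.
```

Once both pages are transcluded, the two halves should be joined into a single footnote under the first page's marker.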

Redacted text blocks

Related question: is there some way to indicate the size of the redacted text? Some redactions are just one or two words while others are several lines or even entire sections. Or will this be invisible in the transclusion?

I've been using redacted|full and hinting at shorter lengths by sometimes stringing two fields together. Sj (talk) 00:04, 21 April 2019 (UTC)
cf. pg 52 -- I'm not sure if this is a helpful approach, will hold off on filling out more redacted pages for now. Sj (talk) 01:00, 21 April 2019 (UTC)
according to Template:Redact there is a length field. but i haven’t been able to get it to work. is it broken? Slowking4SvG's revenge 03:26, 23 April 2019 (UTC)
I think the text parameter cannot be used simultaneously with the length parameter as I haven't been able to get it to work either. I do not think it is a good idea to string two fields together to show an increased length. An alternative might be what I did here and add alternating breaking and non-breaking spaces at the end of the text parameter to extend the length of the black bar/size of the redaction. - PaulT+/C 19:30, 26 April 2019 (UTC)
Paul, that use of nbsp has the same effect as adding an extra redacted block, and is much longer and more confusing. I've suggested using multiple blocks above as easy to implement and parse, in the redacted style section above: that has worked flexibly for me without causing any visible problems. Sj (talk) 03:15, 11 May 2019 (UTC)
After using that technique a few times, I agree that the non-breaking spaces are a bit clunky and that adding an additional {{redacted}} template in specific places is likely the best approach (aside from actually getting the template fixed so that the length parameter works properly). Another option is to use the {{loop}} template to repeat spaces to have a similar effect (see this example), but that is also somewhat confusing if you aren't familiar with it (the max is 150). I don't think it is a good idea to alter the four types of redactions with non-breaking spaces as you suggest above. A better solution is to simply have a second {{redacted}} template. The "full" option should never be used in conjunction with any other content in the same line; it doesn't render properly and ends up bleeding off the end of the page. This is especially an issue with references. In those cases it is better to use a series of {{redacted}}s and line breaks to achieve the best reproduction of the source material. See this diff or this diff for examples of what I mean. - PaulT+/C 18:37, 14 May 2019 (UTC)
Actually, the best example is how it renders at Report On The Investigation Into Russian Interference In The 2016 Presidential Election/Russian "Active Measures" Social Media Campaign. Look at page 25 (17 in the report) and compare the rendering on the individual page vs the document page. All of the redactions go way off the right margin when using the "|full" option with anything else on the same line. - PaulT+/C 18:43, 14 May 2019 (UTC)
Good point! Ok, how about three /redacted/ wordlengths rather than a redact|full? Updated above. Sj (talk) 00:20, 19 May 2019 (UTC)
I'm not sure what is the best way to handle this. I'm leaning towards not needing any more than a single "extra" {{redacted}} template to indicate that the rest of the line is redacted (with varying "text" values if the length of this extra field needs to be shortened or extended) and then using the "full" option for full lines (as long as nothing else is on that line, including references or unredacted spaces). In lieu of specific guidance we are pretty much going to have to figure this out on our own. - PaulT+/C 22:36, 19 May 2019 (UTC)
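
For comparison, the two ways of lengthening a short redaction discussed in this thread might look roughly like this side by side. The number of spaces and their exact position inside the text parameter are guesses for illustration; neither form has been checked against the template's current behaviour.

```wikitext
<!-- Option 1: pad the text parameter with alternating non-breaking and ordinary spaces -->
{{redacted|text={{darkred|'''Grand Jury'''}}&nbsp; &nbsp; &nbsp; }}

<!-- Option 2: follow the labelled block with a second, unlabelled block -->
{{redacted|text={{darkred|'''Grand Jury'''}}}}{{redacted}}

<!-- Full lines go on their own, with no other text or references on the same line -->
{{redacted|full}}
```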

Stylistic choices

@Slowking4, @Avery Jensen, @EggOfReason:

  • Is it better to use {{c}} or {{rh}} to indicate the page numbers? There's no left- or right-side text in most/all of the page footers.
  • Should hyphens in number ranges (e.g. "101-102") be changed to en dashes (i.e. "101–102")? On the English Wikipedia this would usually be done, but Wikisource's style guide doesn't discuss the matter. Jc86035 (talk) 14:04, 20 April 2019 (UTC)
Re hyphens: according to the style guide, "The aim is to give an authentic digital transcription of the content, not an imitation of a printed page; to produce a type facsimile...", that is, the style should look like the style of the original document as much as possible. Avery Jensen (talk) 20:28, 20 April 2019 (UTC)
yeah, pagination appears centered to me. some of the reference lines are full rule, most are around 10em.
wikisource tends to use em dash if that is how it is printed, even in article title, and english has broken a few hundred links here because of this style clash.
also we need a consensus about indenting. i have been using "gap" and "gap5em" rather than colons. Slowking4SvG's revenge 22:49, 20 April 2019 (UTC)
Broken links? Surely WP house style for file names (or even redirects) and internal style for internal text reproductions? Avery Jensen (talk) 23:38, 20 April 2019 (UTC)

Hello, good to collab on this. Thanks for this style overview; Ref follow is lovely.

  • Page nums: I've been using c
  • Gaps: What about first-line indents for each paragraph? I'll start using gap rather than :, but on single lines it's easy to do the other.
  • Links: I think slowking's just saying that a careless search-and-replace can break links that then have to be fixed up.
  • Redactions: Is there a way to get a full-line after a bit of text, without justifying the REDACTION PHRASE?

Sj (talk) 00:01, 21 April 2019 (UTC)

First-line indents are not transcribed, as per Help:Beginner's guide to typography#Paragraphs and sentences. -Einstein95 (talk) 08:04, 21 April 2019 (UTC)

@Econterms, @Slowking4, @Avery Jensen, @EggOfReason, @Sj: Also, are indented section/subsection headers (e.g. on page II-48) supposed to be indented? Econterms added the indents using non-breaking spaces but I changed it to use {{em}}. On pages that I transcribed I left them without indents. Jc86035 (talk) 11:28, 21 April 2019 (UTC)

Einstein95 has replaced the {{em}} indents with list item "fake indents" (from the HTML purist perspective). Jc86035 (talk) 11:56, 21 April 2019 (UTC)

  • Thanks, Jc86035. I was just trying to move along to get some edits done, and see later what consensus appeared. These are better ways. The colons seem good -- concise and uncomplicated. Can we just do it that way? The em syntax is not familiar to me but if you think it's best I'll study it. -- econterms (talk) 20:05, 22 April 2019 (UTC)
Colons seem to render well in all transclusion modes, while em did not. Sj (talk) 03:16, 11 May 2019 (UTC)
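
For reference, the two indenting styles mentioned in this thread might look like this on the same pair of subsection headers. The heading text is placeholder, and reading "gap5em" as {{gap|5em}} is an assumption; the widths are illustrative only.

```wikitext
<!-- Colon indents, one colon per level -->
:'''A. Lorem Ipsum Dolor Sit Amet'''
::1. {{u|Lorem Ipsum Dolor Sit Amet}}

<!-- The same headers indented with {{gap}} instead -->
{{gap}}'''A. Lorem Ipsum Dolor Sit Amet'''

{{gap|5em}}1. {{u|Lorem Ipsum Dolor Sit Amet}}
```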

Validation

Going through the entire report with the JavaScript version of AWB, there's a somewhat concerning number of minor errors in pages marked "Validated", most notably "U.S," with a comma (as well as other assorted style guide violations). I haven't thoroughly looked at those pages, but another check is probably needed at some point. Jc86035 (talk) 11:50, 21 April 2019 (UTC)

I think a lot of this could be avoided by using a different/better OCR copy. The one I have been using seems to reproduce punctuation but still has problems with italics. In fact I noticed a lot of errors with the "see also" and "supra" in the footnotes of pages that were already done. Seems very time-consuming to go through page by page and add apostrophes but I don't know of a faster way. Avery Jensen (talk) 01:37, 26 April 2019 (UTC)

Tables

@Einstein95: Re: the index of acronyms on page B-14, the "scope" attribute is necessary to indicate that the header pertains to the row and not to the column. The reason that I deliberately added class=wikitable is that in some skins (notably Timeless) tables don't have cell padding by default, so there may be visual errors if it's assumed that the table will always be rendered with cell padding. Jc86035 (talk) 11:53, 21 April 2019 (UTC)

@Jc86035: I have checked and the current table looks exactly the same when using any of the skins. You can see that the Timeless skin shows the table the same as the Vector skin. -Einstein95 (talk) 12:00, 21 April 2019 (UTC)
@Einstein95: The issue is only apparent when the table becomes narrower. It shouldn't be a particularly severe issue, since there are probably a lot of other tables which have only been formatted with Vector in mind to begin with.
I've re-added the scope="row" attribute. Jc86035 (talk) 12:03, 21 April 2019 (UTC)
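
For anyone following along, here is a minimal sketch of the kind of markup being discussed: a two-column table in which the first cell of each row is a row header. The entries are placeholders, not the actual contents of page B-14.

```wikitext
{| class="wikitable"
! scope="row" | Lorem
| Ipsum dolor sit amet
|-
! scope="row" | Dolor
| Consectetur adipiscing elit
|}
```

Here scope="row" marks each header cell as describing its row rather than its column, and class="wikitable" supplies the cell padding mentioned above.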

A Commons file used in this index has been nominated for deletion

The following Wikimedia Commons file used here has been nominated for deletion:

Participate in the deletion discussion at the nomination page. FYI this is not a joke. There is a chance that the report cannot be hosted on commons due to a handful of images with unknown copyright status present in the report. Psantora (talk) 19:15, 22 April 2019 (UTC)

the toxic culture on commons is no joke. maybe time to revisit Wikisource:Scriptorium/Archives/2017-07#Proposal to allow "fair use" in certain limited scenarios given the perennial inability to deal productively with images in PD texts. Slowking4SvG's revenge 03:17, 23 April 2019 (UTC)
@Psantora: For what it's worth, it seems likely that the files will be kept, but someone will have to remove the images from the PDFs. Jc86035 (talk) 10:14, 23 April 2019 (UTC)
no, it is likely to be deleted, unless someone overwrites with blacked out images. (and none there are interested or capable). the fact other examples have not been deleted [2], is more from neglect or caprice, than consideration to others. Slowking4SvG's revenge 12:39, 23 April 2019 (UTC)
I am capable of cropping out the copyrighted images from the PDF, but I am not comfortable making changes to the official document that was originally hosted here since it will no longer be possible to directly trace the content back to the Justice.gov-hosted version (see my comment at the bottom of this discussion for a bit more about that concern). I think the "best" solution (assuming it cannot be kept as-is at Commons) will be to host the Justice.gov file at Wikipedia with limited rationales for non-free content as necessary (per wikipedia:WP:NFCC #s 1, 3, 4, 5, 7, and 8). But, I don't know if that solution is acceptable for content at this project. (Sorry for my inexperience here, these are my first edits at this particular sister project.) Psantora (talk) 14:59, 23 April 2019 (UTC)
go for it. you will have to download from commons, and edit the pdf, and then "Upload a new version of this file" at commons, to maintain the pagination. commons (and wikimedia community in general) do not value provenance of uploads, for example, the widespread overwriting without sourcing, and the "downsizing" and rev-diving fair use uploads at english. the "fair use" uploads to english wikipedia will probably get deleted as "not subject of critical commentary" unless you expand the article to suit. we do not have an EDP here, so they will get deleted here. Slowking4SvG's revenge 21:36, 23 April 2019 (UTC)
Sorry, I don't think I was clear. To simplify (using your words), I'm capable, but not interested (in getting into a potential debate about not directly using the file from justice.gov without any changes). But, are you saying that there may be obstacles to hosting the justice.gov version (as-is) at en-wikipedia? There is certainly enough commentary at wikipedia:Mueller Report and other related articles on the wikipedia:Special Counsel investigation (2017–2019), or at least that seems obvious to me. Is that not the case? And, if hosted at Wikipedia and not on Commons, would this project still be able to use the file for Mueller Report here? Psantora (talk) 11:48, 24 April 2019 (UTC)
@Psantora: As a purely technical matter, Wikisource cannot use media files hosted on enwp. --Xover (talk) 11:18, 25 April 2019 (UTC)
Thanks. That makes sense. In the meantime, is there some kind of "placeholder" image that can be used at the relevant pages here? For reference, here are the pages under dispute:
Ideally the placeholder would explain the reason for the removal of the content as well. Psantora (talk) 16:03, 26 April 2019 (UTC)
@Slowking4:, Atsme forced my hand. https://commons.wikimedia.org/w/index.php?title=Commons:Deletion_requests/File:Report_On_The_Investigation_Into_Russian_Interference_In_The_2016_Presidential_Election.pdf&diff=347714250&oldid=347711412 Now I need to figure out commons:User talk:Rillke/bigChunkedUpload.js so I can upload the new version directly on top of the original. - PaulT+/C 13:02, 29 April 2019 (UTC)
Note a new discussion was opened regarding the "fair use" (or lack thereof) policy pertaining to this document in general at Commons. You can find the discussion here: commons:Commons:Village pump/Copyright#Third-party copyright material included in PD-USGov documents. - PaulT+/C 17:48, 29 April 2019 (UTC)

So all work on this seems to have stopped pending the outcome on Commons?

Since there seem to be so many technical people here I have a question. I have found a medieval document I would like to put on Wikisource in three languages. I have Arabic and another language, I think Italian but would have to check for sure. The translation is PD, and I would like to translate to English as well. How easy is this to do and how do I start? Would I start by uploading the documents I have to Commons? I know there is an Arabic Wikisource, but I don't know how to link languages. Can the same text have an overlay from more than one language? Thanks in advance for any advice. Avery Jensen (talk) 01:49, 26 April 2019 (UTC)

This probably isn't the best place for a technical discussion on that point. Maybe try Wikisource:Scriptorium? As for your first question, I don't think work necessarily needs to stop. The vast, vast majority of the content can be hosted at commons and recreated here. Psantora (talk) 16:03, 26 April 2019 (UTC)
well, i am continuing to proofread (at a slower rate), while i await deletion at commons. the work will not be lost, just hidden until the copyright police get their act together. Slowking4SvG's revenge 19:24, 29 April 2019 (UTC)
FYI, the discussion was closed as keep with no further redactions needed. - PaulT+/C 19:27, 29 April 2019 (UTC)

Curious OCR bugs near redactions

How to handle new version of the report?

Jason Leopold filed a FOIA request that yielded a new version of the report where, in place of colored-text descriptions of each redaction, the specific exemption clauses that apply (among 7 options in the FOIA statute) are listed.

The result is much harder to read. Perhaps we can somehow add related indicators to the existing wikisource transcription. Thoughts? Sj (talk) 16:40, 7 May 2019 (UTC)

What new information will be shown by digging into this "new" report? Isn't it the same content redacted, just with different reasons? - PaulT+/C 19:27, 14 May 2019 (UTC)
It looks like just a little new information is added in the new one, stuff of only narrow interest. Some options: (1) finish this one then upload the other pdf then copy the pages from here to the index of that one then make tweaks there to show the new info; (2) add hidden comments to this version, e.g.
<!-- (b)(6), (b)(7)(A) -->
; (3) ignore the new one for now; less-redacted versions are coming and they will be of broader interest. Overall I wouldn't want to do much with the new one. -- econterms (talk) 18:51, 15 May 2019 (UTC)
Sorry, I'm still not clear on what you mean. There is *new* information in the FOIA version of the report? Meaning some information that was redacted in the DoJ version is now available to read in the FOIA version? Or do you just mean the specific FOIA exemption clauses being the new information? If it is the former, then I'd love to see what was "missed" (and we should 100% include it here)! If it is just the latter then I think your option 3 is probably the best approach in terms of the best way to allocate resources. - PaulT+/C 19:55, 16 May 2019 (UTC)
The only new info in the FOIA version, so far as I know, is the reference to the precise exemption clauses and this is very little information really. There is already a possibility that the Flynn-related clauses will be known by the end of the month, per Judge Sullivan's recent ruling, and those will be much more informative than the FOIA version. So we agree: let's ignore the new FOIA version for now. -- econterms (talk) 17:21, 17 May 2019 (UTC)
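
For the record, if option (2) were ever picked up, the hidden comment would presumably sit directly after the redaction it describes, along these lines. The surrounding text is placeholder, the exemption codes are simply the ones from the example comment above, and the pairing with a Personal Privacy redaction is illustrative only.

```wikitext
Lorem ipsum dolor sit amet {{redacted|text={{green|'''Personal Privacy'''}}}}<!-- (b)(6), (b)(7)(A) --> consectetur adipiscing elit.
```

Because it is an HTML comment, the note would be visible only in the page source and would not change how the redaction renders.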

New new version with footnotes

Here is a newer version with hyperlinked footnotes: https://pro.dp.la/ebooks/mueller-report Sj (talk) 01:09, 22 July 2019 (UTC)