User:Inductiveload/HTML processing

From Wikisource

Useful tools:

Hathi Trust

Extract links from table

curl -s https://catalog.hathitrust.org/Record/000505750 |
  pup '.viewability-table tbody td json{}' |
  jq '[.[] | {href: .children[0].children[0].href, text: .children[0].children[0].children[1].text, loc: .children[1].text}]'

This produces an array like:

[
  {
    "href": "https://hdl.handle.net/2027/mdp.39015080280129",
    "text": "v.36 1957",
    "loc": "University of Michigan"
  },
  ...
]

JQ

To TSV

jq -r '.[] | [.href, .text] | @tsv'
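
A minimal sketch of the TSV step on its own, assuming jq is installed. The sample record is the one shown in the Hathi Trust output above; the variable names (json, tsv, tab) are illustrative only. In real use the JSON would come from the curl | pup | jq pipeline rather than a literal string.

```shell
# One record copied from the example array above (hypothetical stand-in for live output).
json='[{"href":"https://hdl.handle.net/2027/mdp.39015080280129","text":"v.36 1957","loc":"University of Michigan"}]'

# -r prints raw strings, so @tsv emits real tab characters instead of a JSON-quoted string.
tsv=$(printf '%s\n' "$json" | jq -r '.[] | [.href, .text] | @tsv')
printf '%s\n' "$tsv"

# Read the rows back, splitting on tab only so "v.36 1957" stays one field.
tab=$(printf '\t')
printf '%s\n' "$tsv" | while IFS="$tab" read -r href text; do
  printf 'volume=%s url=%s\n' "$text" "$href"
done
```

Setting IFS to a tab alone matters here because the volume labels contain spaces; with the default IFS, read would split "v.36 1957" into two fields.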