Labeled Section Transclusion
Closed, Resolved (Public)

Description

Author: dovijacobs

Description:
Request for a function that allows calling up a predefined *portion* of a page,
without transcluding any of the rest of the wiki text on that page.

Beginning and end points of sections to be made available for individual
transclusion could be determined by using marks such as in the following
example, in which "section 1" could be individually transcluded on any other
wiki page:

Beginning of section 1: <span id="bookmark">start1</span>

End of section 1: <span id="">end1</span>.

Those are just examples. Any section of text marked off with <span
id="x"></span> at its beginning and end could be transcluded individually.

Please note that this is intuitively the *opposite* of <noinclude></noinclude>.
These *exclude* parts of a template not meant for transclusion. The suggestion
here is for designating exactly what *will* be included, rather than what will
*not* be included.

Discussion of this began with the formatting of Bible verses at Wikisource.

See
http://en.wikisource.org/wiki/Wikisource:Scriptorium#Technical_Question:_.22Bookmarks.22

and
http://en.wikisource.org/wiki/Wikisource:Scriptorium#More_radical_technical_bookmark_question


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/Labeled_Section_Transclusion

Details

Reference
bz5881

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:14 PM)
bzimport set Reference to bz5881.

dovijacobs wrote:

I apologize for a silly error above, which should read:

Beginning of section 1: <span id="start1"></span>

End of section 1: <span id="end1"></span>

The point is that sections could be defined using whatever labels make sense.

Isn't this already accomplished using the <onlyinclude> tag?

Can you have more than one of those on a page?

dovijacobs wrote:

This would be much more flexible, allowing the definition of sections 1, 2, 5a,
78, a, b, c - any number of different sections, each with its appropriate tag name.

In Wikisource, these could be individual chapters of books (where the whole book
is on one page), single numbered paragraphs of Aristotle, or (as in the recent
discussion) individual verses of the bible.

Templates might be made to recognize sections, which would at least partially
accomplish the above. For example, {{foo#Section_78}} would transclude
everything within ==Section 78== on [[Foo]].

zocky wrote:

This can already be done with parser functions, like #if and #switch.

dovijacobs wrote:

Thanks, Jesse, that is a good tool already available. I didn't know that was
possible. It is a bit clunky, however, in being dependent on == === type
sections.

Thanks too, Zoran, but could you please explain what you mean?

The problem with transcluding sections is whether you include sub-sections.

For example, if ==Section 78== includes ===Section 78.1===, should it be possible to transclude
just the portion between those two lines? Or should transcluding "Section 78" always include
all sub-sections?

In other words, if your source page [[foo]] looks like:

...
==Section 78==
aaaaa
===Section 78.1===
bbbbb
==Section 79==
...

should {{foo#Section 78}} cover the "aaaaa" AND "bbbbb" text or just the former? Should you be
able to specify just the former?

Also, does your partial transclusion include the associated section heading? Does
{{foo#Section 78}} start with "==Section 78==" or not? If not, then what do you do with any
sub-section headers which might be included? Should they remain at the given level or be
adjusted to take account of their level relative to the "top-level" header being transcluded?
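
The two readings of {{foo#Section 78}} raised above can be sketched in Python. This is only an illustration, not MediaWiki code: heading_section and its include_subsections flag are hypothetical names, and the sketch arbitrarily leaves the section's own heading line out of the result (the other open question above).

```python
import re

# Matches wiki headings such as "==Title==" or "===Title===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")

def heading_section(text: str, title: str, include_subsections: bool = True) -> str:
    """Return the body of the heading-delimited section named `title`."""
    out = []
    level = None  # heading depth of the requested section, once found
    for line in text.splitlines():
        m = HEADING.match(line)
        if level is None:
            # Still looking for the requested heading.
            if m and m.group(2) == title:
                level = len(m.group(1))
            continue
        if m:
            depth = len(m.group(1))
            # Stop at a sibling/parent heading, or at any heading at all
            # when sub-sections are not wanted.
            if depth <= level or not include_subsections:
                break
        out.append(line)
    return "\n".join(out).strip()

page = "==Section 78==\naaaaa\n===Section 78.1===\nbbbbb\n==Section 79==\nccccc"
print(heading_section(page, "Section 78"))
print(heading_section(page, "Section 78", include_subsections=False))
```

The first call returns "aaaaa", the sub-heading line and "bbbbb"; the second returns only "aaaaa". Neither choice is obviously right, which is the ambiguity being pointed out.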

dovijacobs wrote:

Phil is right. Only exact markers placed at beginning and end would be able to
define units of text without ambiguity.

But still pardon me, what do #if and #switch accomplish here?

zocky wrote:

something like this should do exactly what you want:

{{#ifeq:{{{section|name1}}}|name1|
text of section 1
}}

{{#ifeq:{{{section|name2}}}|name2|
text of section 2
}}

{{#ifeq:{{{section|name3}}}|name3|
text of section 3
}}

zocky wrote:

Maybe that wasn't clear enough. You put that code on the page from which you
want to transclude text, and then transclude it with {{pagename}}, which will
display all text, or with {{pagename|section=name}}, which will display only the
section you want.
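
A rough Python model of the behaviour described above may help. It is a sketch under stated assumptions, not how ParserFunctions are implemented: BLOCKS and render are hypothetical names, and passing None stands for "the section parameter was not supplied".

```python
from typing import Optional

# Hypothetical model of the page above: one (name, text) pair per #ifeq block.
BLOCKS = [
    ("name1", "text of section 1"),
    ("name2", "text of section 2"),
    ("name3", "text of section 3"),
]

def render(section: Optional[str] = None) -> str:
    """Mimic {{#ifeq:{{{section|nameN}}}|nameN|text}} for every block."""
    shown = []
    for name, text in BLOCKS:
        # {{{section|nameN}}} falls back to the block's own name when the
        # parameter is missing, so every comparison succeeds in that case.
        effective = section if section is not None else name
        if effective == name:
            shown.append(text)
    return "\n".join(shown)

print(render())         # like {{pagename}}: all three texts
print(render("name2"))  # like {{pagename|section=name2}}: one text
```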

dovijacobs wrote:

Thanks. Is there a page that gives information on the #if and #switch functions?

dovijacobs wrote:

Tried it, and it doesn't seem to work, but rather transcludes the entire page:

User:Dovi/Bible (King James)/Genesis/45

Have I made an error someplace?

Also, I realized that by delineating the text within the brackets (as in this
suggestion), there is no way to allow for overlapping sections. *Independent*
beginning and end markers would allow for that.

[In reply to comment #10] Zoran Obradovic's suggestion does technically work,
but is far from sufficient to accomplish what Dovi is looking for. There are
several limitations to using ParserFunctions for this: the contents of the page
will be hidden on the page itself, since the parameter is not specified. This
can be circumvented, but it's tedious and messy. Another limitation is that the
pipe is used as a separator in the ParserFunctions, so any pipe in the page will
have to be escaped with {{!}}. Further, it would be extremely messy to implement
this on all pages, which is something that would need to be done to address the
scope of Dovi's proposal. If this is to be accomplished, it would be much better
to do so with new functionality.

[In reply to comment #7] The expected behaviour when calling {{:foo#Section}}
would be to include the section title. In those circumstances where this is not
desirable, it could be removed using ParserFunctions (which are more suited to
that particular task than to the greater proposal). Alternatively, a new syntax as
suggested by Dovi might be better suited for more control. We could implement
this in the same general syntax as the ParserFunctions: "{{#section:name|start}}
[...] {{#section:name|end}}". This would have the benefit of being unambiguous,
avoiding any confusion with anchors and actual <span> containers.

dovijacobs wrote:

Jesse's final syntax suggestion:

{{#section:name|start}} [...] {{#section:name|end}}

sounds exactly like what I envisioned, but I had no idea how to express it as a
code suggestion.

The suggestion is clear and unambiguous. Plus it is completely flexible in terms
of how the text may be divided (even allowing for complementary overlapping
divisions such as exist in many classical texts).

If such a function were implemented, it would not only be an extremely useful
tool for Wikisource, but would probably find a great many positive
implementations in other contexts as well.

Overlapping portions would interact very badly with formalized syntax;
this is not likely to be supported.

dovijacobs wrote:

The above syntax is not likely to be supported, or the use of it in overlapping
ways?

dovijacobs wrote:

(Just to clarify: in the vast majority of cases, text sections are sequential
anyways.)

dovijacobs wrote:

This proposal was unfortunately only clarified towards its end, and has been
further clarified since the last comments in the discussion here.

Current status is reflected here:

http://en.wikisource.org/wiki/Wikisource:Labeled_section_transclusion

dovijacobs wrote:

Two further notes:

  1. This would also solve bug 3626 (transclusion of article introductions).
  2. This bug should be closed here and reopened as "Labeled section transclusion",
     since we are no longer talking about ParserFunctions.

Birgitte_SB wrote:

This is a very important feature for Wikisource as it will allow all sorts of
comparative texts to be created without the worry of errors in duplication. You
can imagine the difficulties involved in making sure all instances are
proofread equally when a text is re-used in many places.

dovijacobs wrote:

Adding the current coding suggestion from the URL:

A. Mark off a section of text

A section of text (an individual chapter, subsection of a chapter, a numbered
paragraph in a classical text, an individual verse) might be marked off
something like this:

<begin section="Paragraph 1"/> ...text of the section...<end section="Paragraph 1"/>

The slashes inside the tags are XML-style, which is similar to <br/>,
<references/>, etc.

NB: Sections may overlap (different sectioning mechanisms may be used within the
same text), so a syntax where the end tag is ambiguous is not acceptable.
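
To illustrate why labeled end tags make overlapping sections unambiguous, here is a minimal Python sketch that uses the marker syntax proposed in part A. The function name labeled_section is hypothetical, and stripping other sections' markers from the result is an assumption of this sketch, not something the proposal specifies.

```python
import re

def labeled_section(text: str, label: str) -> str:
    """Return the text between <begin section="label"/> and <end section="label"/>."""
    quoted = re.escape(label)  # labels may contain regex metacharacters
    pattern = re.compile(
        r'<begin section="' + quoted + r'"/>(.*?)<end section="' + quoted + r'"/>',
        re.DOTALL,
    )
    m = pattern.search(text)
    if m is None:
        return ""
    # Remove markers of other sections that fall inside the extracted span.
    return re.sub(r'<(?:begin|end) section="[^"]*"/>', "", m.group(1))

# Two overlapping divisions of the same text: a verse and a paragraph.
text = ('A <begin section="v1"/>one <begin section="p1"/>two'
        '<end section="v1"/> three<end section="p1"/> Z')
print(labeled_section(text, "v1"))  # one two
print(labeled_section(text, "p1"))  # two three
```

Because each end tag names its section, the verse and paragraph spans can cross each other and still be recovered independently, which an unlabeled closing tag could not express.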

B. Call up the section of text

When calling up that section, it would be done like this:

{{Text##Paragraph 1}}

The double-"#" syntax is suggested because a single "#" has the problem of being normally used for regular wiki sections, so users might intuitively try transcluding sections and be confused when it doesn't work. That is why "##" is
suggested as an alternative. Other non-title characters would work, too.

For further information, see:
http://en.wikisource.org/wiki/Wikisource:Labeled_section_transclusion

ayg wrote:

*** Bug 6785 has been marked as a duplicate of this bug. ***

ssanbeg wrote:

my implementation, from my local installation

I've implemented this on my test installation, and copied a few test cases to
wikisource, which are linked from the talk page referenced in the description.

attachment lst.patch ignored as obsolete

jeluf wrote:

You add a <section /> tag to the sanitizer. There is no such tag in HTML and
thus it should not be in that list.

Why is this needed at all? Instead of including a section, one could create a
third article containing the section and including it into both articles. This
would avoid the introduction of even more markup.

Birgitte_SB wrote:

Please read

http://en.wikisource.org/wiki/Wikisource:Labeled_section_transclusion

And the talk page for many details as to why this function is needed. The
suggestion you make is one already used at the Hebrew Wikipedia (although
instead of a third "article" they put it in the template namespace). This
method is outlined on the page I linked above along with several other
alternatives. The disadvantage listed there is "This method is extremely
labor-intensive, and also involves the creation of huge numbers of separate
pages." I would also like to add that it makes proofreading very difficult when you
have to look up every sentence or two on a new page. Basically, it
does not scale well. Imagine every verse in the Bible on a separate page, then
multiply that by the seven or so separate PD English translations available,
then imagine proofreading such a monstrosity!

Please anyone who thinks they have a better alternative read the above page on
Wikisource and be sure we have not already considered this.

ssanbeg wrote:

(In reply to comment #24)

> You add a <section /> tag to the sanitizer. There is no such tag in HTML and
> thus it should not be in that list.

This was done primarily to allow a section marker to be placed into a template,
and allowed to pass through a normal transclusion; i.e. to put <section
begin=discussion> in one template and <section end=discussion> in another, so
that a page that transcludes a lot of subpages could exclude all of the
discussions, thus making for a much smaller transclusion.

This feature removes unused section markers when rendering, so it isn't
necessary for the sanitizer to remove them.

It's not critical for the way it's used in WS, but it generally makes it more
flexible and more useful.

> Why is this needed at all? Instead of including a section, one could create a
> third article containing the section and including it into both articles. This
> would avoid the introduction of even more markup.

That was my thought at first, too. But we're not talking about one section;
there's already a markup for that, <onlyinclude>. But that won't work with
multiple sections. The editors have been understandably concerned about trying
to manage the editing when each page is split into potentially hundreds of tiny
articles, each of which is transcluded into two real articles.

Even if we did split everything up like this, we could wind up seeing the same
kind of problems on article space in WS as we occasionally see in project space
in W when a page consists of an unbounded number of transclusions.

ssanbeg wrote:

updated patch

I've decided to drop support for indirect LST (where begin/end can come from
separate templates).

It would be nice, but more work than I expected to get it working correctly,
and not important enough to hold up other projects that are waiting for it.

This obviates the Sanitizer changes, so this patch is simpler as well.

attachment lst.patch ignored as obsolete

ssanbeg wrote:

test cases

Now that I've been introduced to the regression testing system, I've ported the
test cases I previously posted on ws.

attachment lst-regression.txt ignored as obsolete

ssanbeg wrote:

enhancement to allow individual markers to subst from templates.

OK, I can't transclude markers from templates, but with a small block op on the
code, substing them will work.

attachment lst2.patch ignored as obsolete

ssanbeg wrote:

additional test cases for new enhancement

attachment lst-regression.txt ignored as obsolete

ssanbeg wrote:

reimplemented as an extension

This extension adds the <section> hook, and two parser functions, #lst to
include a section, and #lstx to exclude. It only needs to be included from
LocalSettings.php to be used.

attachment LabeledSectionTransclusion.php ignored as obsolete

ssanbeg wrote:

regression tests for extension

These are the regression tests for this extension, and show most of its
functionality.

Note that this depends on bug 7801 to run the regression test, since parser
functions currently can't be regression tested.

attachment lst-tests.txt ignored as obsolete

tokind wrote:

I attempted to use this extension with MediaWiki version 1.5.8. I have not been
able to fathom the syntax change for setFunctionHook(), as I assume the
undefined function has been replaced with another function.

Fatal error: Call to undefined function: setfunctionhook() in
/var/www/html/home/extensions/LabeledSectionTransclusion.php on line 18

I don't mind updating the extension myself. I'm just not sure where to begin.
Any advice appreciated.

ssanbeg wrote:

reformat, work on edit links a bit.

I hadn't thought too much about supporting older versions, and don't have much
experience with them; but I think this version should work and give you most of
the functionality.

There are two issues with these versions that I know about:

  1. I'm trying to do this with parser functions, since those work with subst:.
     That is where the SetFunctionHook is used, which was introduced around 1.6. I
     added a check to disable that in pre 1.6, and added a transclude tag, i.e.
     <transclude page=mypage include=mysection> (or exclude=mysection
     [replace=something]).
  2. The recursive parsing function was introduced in 1.8; I've added a
     workaround for that.

Of course, this is still in development, so any of that could change, but this
should work.

attachment LabeledSectionTransclusion.php ignored as obsolete

alexandreracine wrote:

1- @Kindig : This extension will only work with 1.8 and above.
See here : http://www.mediawiki.org/wiki/Labeled_Section_Transclusion

2- So... I have tried a couple of combinations of tags and it does not work. What
is the correct way of using this thing? In other words, what is the correct syntax?

ssanbeg wrote:

(In reply to comment #35)

True; for now I'm more concerned with getting something we can use in wikisource
than supporting a lot of versions. But it should be possible to make at least
some of it work in some older versions by checking the version number and
switching the behavior appropriately.

The main issues with older MWs are:

  1. Recursive parsing, to parse the markup in the template, changed in 1.8.
     There is a workaround in the previous attachment that should allow this for
     older versions.
  2. Parser functions don't exist prior to 1.6 or 1.7, so the #lst syntax can't be
     used in older versions. So while 1.7+ can use {{#lst:mypage|mysection}}, 1.5
     would use (in the previous attachment) <transclude page=mypage
     section=mysection/>, which should work for any version from 1.5 on.

Regardless of which version you use, you mark a document with section tags, i.e.

<section begin=mysection/>this is a section<section end=mysection/>, then
transclude it with one of the above syntaxes. Note that if you don't specify a
namespace to transclude, the extension uses the main namespace; i.e.
{{#lst:mypage... is different from {{#lst:template:mypage...
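
The include and exclude behaviour of #lst and #lstx on a page marked with these section tags can be approximated in Python. This is a sketch under assumptions, not the extension's code: the functions lst and lstx only borrow the parser function names, they take the page text directly instead of a title, and concatenating several same-named sections is a choice made here, not something stated in this thread.

```python
import re

def _span(label: str) -> "re.Pattern[str]":
    """Pattern for <section begin=label/> ... <section end=label/>."""
    quoted = re.escape(label)
    return re.compile(
        r"<section begin=" + quoted + r"\s*/>(.*?)<section end=" + quoted + r"\s*/>",
        re.DOTALL,
    )

# Any leftover section marker, used to clean up the result.
MARKER = re.compile(r"<section (?:begin|end)=[^/>]*/>")

def lst(page_text: str, label: str) -> str:
    """Include only the labeled section(s); empty string if there is none."""
    parts = [m.group(1) for m in _span(label).finditer(page_text)]
    return MARKER.sub("", "".join(parts))

def lstx(page_text: str, label: str, replacement: str = "") -> str:
    """Include everything except the labeled section(s)."""
    kept = _span(label).sub(lambda m: replacement, page_text)
    return MARKER.sub("", kept)

page = "intro <section begin=mysection/>this is a section<section end=mysection/> outro"
print(lst(page, "mysection"))            # this is a section
print(lstx(page, "mysection", "[...]"))  # intro [...] outro
```

The optional replacement argument loosely mirrors the replace=something option mentioned for the earlier transclude tag.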


robchur wrote:

When introducing a new extension, and a new kind of extension at that, it's
probably a good idea not to worry too much about supporting anything earlier
than the current stable branch; so concentrate on support for trunk and 1.8.

ssanbeg wrote:

extension

OK, cool; luckily, stable & branch seem pretty similar w.r.t. this.

So I seem to have found the right voodoo to transclude from an extension. This
version

  • handles sections correctly, with the right TOC & edit links (whew!)
  • handles cycles correctly, changing the loop into a self link
  • drops (at least for now) non-parserfunction support, which couldn't do edit
    sections.

Although I wish I understood noparse=true better, it does make things work,
allowing it to parse in the site (although not in the regression tests); all my
weird test cases look good.
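
The cycle handling mentioned above can be illustrated with a small Python sketch. It is only a model under assumptions: the {{Title}} call syntax, the pages dictionary and the expand function are hypothetical, and rendering the "self link" as [[Title]] is a guess at what that phrase means in practice.

```python
import re

# Hypothetical call syntax for this sketch: {{Title}} transcludes a whole page.
CALL = re.compile(r"\{\{([^{}|#]+)\}\}")

def expand(title: str, pages: dict, stack: tuple = ()) -> str:
    """Recursively expand transclusions, replacing a loop with a link."""
    if title in stack:
        # The page is already being expanded higher up: emit a link
        # instead of recursing forever.
        return "[[" + title + "]]"
    text = pages.get(title, "")
    return CALL.sub(
        lambda m: expand(m.group(1).strip(), pages, stack + (title,)),
        text,
    )

pages = {"A": "a then {{B}}", "B": "b then {{A}}"}
print(expand("A", pages))  # a then b then [[A]]
```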

attachment LabeledSectionTransclusion.php ignored as obsolete

ssanbeg wrote:

regression tests with closing /

I've added the / to the <section /> tags; the current extension drops support
for the non-well-formed version

attachment lst-tests.txt ignored as obsolete

ssanbeg wrote:

new extension with Nick's fixes

Implemented a few fixes recommended by Nick Jenkins:

  • add closing ?>
  • rm some dead code
  • handle case where title is invalid.

attachment LabeledSectionTransclusion.php ignored as obsolete

ssanbeg wrote:

extension, with better hook for recursive parsing.

Found a better hook for recursive parsing, so the parser can mark the headings
for us. This simplifies the code somewhat, and fixes the issues that Nick
brought up.

attachment LabeledSectionTransclusion.php ignored as obsolete

ssanbeg wrote:

regression tests (+ a few more)

Add Nick's test cases. With improved parsing hooks in previous version, all
tests work now.

attachment lst-tests.txt ignored as obsolete

ssanbeg wrote:

regression tests

fix error in last (js) test.

attachment lst-tests.txt ignored as obsolete

ssanbeg wrote:

extension, with fixed quoting

Apparently, \Q..\E doesn't interpolate into variables, so call preg_quote()
instead.

Attached:

ssanbeg wrote:

regression tests (+2 more)

Add test for transcluded section heading, since implementing that was a lot of
the effort.

Test quoting by transcluding a section called "/", which produced warnings (and
didn't work) before.
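
Why a section label has to be quoted before it is spliced into a pattern can be shown with Python's re.escape, which plays roughly the role that preg_quote() plays in PHP. This is an analogy, not the extension's code: has_begin_marker is a hypothetical helper, and the labels below are invented for the demonstration.

```python
import re

def has_begin_marker(text: str, label: str, quote: bool = True) -> bool:
    """Report whether text contains <section begin=label/>."""
    # Without quoting, regex metacharacters in the label change the pattern.
    name = re.escape(label) if quote else label
    return re.search(r"<section begin=" + name + r"/>", text) is not None

# Unquoted, "." matches any character, so label "1.5" wrongly finds "1x5".
print(has_begin_marker("<section begin=1x5/>", "1.5", quote=False))  # True
print(has_begin_marker("<section begin=1x5/>", "1.5"))               # False

# Unquoted, a label such as "a(b" is not even a valid pattern.
try:
    has_begin_marker("<section begin=a(b/>", "a(b", quote=False)
except re.error as exc:
    print("invalid pattern:", exc)
```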

Attached:

nickpj wrote:

Added to SVN in r17736.

cyril.dangerville wrote:

Patch for bug in wfLstInclude() when article doesn't have any labeled section for the given label

Hi Mr Sanbeg, thanks for your extension!
I am testing some use of it in DynamicPageList2.
I had just one bug when calling wfLstInclude($parser, $page, $sec) - directly
from DPL2 - on a page (title=$page) that doesn't have any section with
label=$sec. No match, so in particular, the $m[0] is empty, and therefore I get
the warning "Undefined offset: 0..." (where the $m[0][0][1] appears).
I know this is not the usual intended use of your extension, but when using
DPL, you don't expect every page in the output list to have labeled sections
for a given label.
Anyway, I made a quick fix (see attachment) to deal with it. You may fix that
better for SVN. Thanks.

Attached:

ssanbeg wrote:

Cool, thanks.

It's supposed to just transclude an empty string when there's no section. The
heading offset code is new, so that bug is recent. I've added your fix and a
regression test so it won't break again to r18301.
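
The shape of the fix can be sketched in Python: check that there is at least one match before indexing into the match list. This is an analogy only. The names HEADING and first_heading_offset are hypothetical, and the list of (text, offset) pairs merely imitates the $m[0][0][1] access quoted in the report.

```python
import re
from typing import Optional

# Wiki heading lines such as "==Title==".
HEADING = re.compile(r"^=+.*=+[ \t]*$", re.MULTILINE)

def first_heading_offset(section_text: str) -> Optional[int]:
    """Offset of the first heading in section_text, or None if there is none."""
    # Each entry is a (matched text, offset) pair.
    matches = [(m.group(0), m.start()) for m in HEADING.finditer(section_text)]
    if not matches:
        # Nothing matched (for example, the labeled section was not found
        # and the text is empty): report that instead of indexing [0].
        return None
    return matches[0][1]

print(first_heading_offset("abc\n==H==\nxyz"))  # 4
print(first_heading_offset(""))                 # None
```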

(In reply to comment #47, attachment id=2853: patch for bug in wfLstInclude()
when the article doesn't have any labeled section for the given label)

*** Bug 9514 has been marked as a duplicate of this bug. ***

ssanbeg wrote:

Brion enabled it on *.wikisource.org and test.wikipedia.org.