[PROPOSAL] Make {{html}} default behavior be clean=false

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PROPOSAL] Make {{html}} default behavior be clean=false

Caleb James DeLisle-3
Hello all,

Recently we discovered that in XWiki 8.2 the {{html}} macro cleaner now removes <video>
tag whereas in 7.4 it did not and this has unfortunately caused problems for the {{video}}
macro.

1. After some helpful investigation by a few XWiki developers, we have found that in fact
the {{html}} macro is cleaning for XHTML and not for HTML5 which is what XWiki uses and the
change has seemingly gone unnoticed since the upgrade to 8.0M1.

2. After some conversation with developers, I have observed that it is rare for an application
developer to *intend* to have their HTML cleaned. In the XWiki repositories we find that
amongst non-one-liner {{html}} macro invocations, 109 out of 292 (37%) of the invocations
explicitly request clean=false [1].

3. Nowhere is clean="true" specified and though one might think it obvious as it is a default,
we find that wiki="false" (also default) is indeed specified 28 times [2].

4. We see also that the HTML macro proves to be quite slow. During an investigation of SOLR
performance Thomas found that "about 23% of the request is spend executing html macros" [3].
I suppose it should be obvious that without the cleaner, the {{html}} macro should not cause
any significant performance degradation.


Given the status of this feature, I would like to propose that we make an intentional change
of behavior and change the default to clean=false. This way we application developers will not
need to type it all of the time and XWiki will become measurably faster.

Here is my +1 for changing the default.


Thanks,
Caleb



[1]:
$ grep -nr '{{html' | grep '.xml:' | grep 'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
109
$ grep -nr '{{html' | grep '.xml:' | grep -v 'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
183

[2]:
$ grep -nr '{{html' | grep '.xml:' | grep 'wiki=['\''"]*false' | wc -l
28

[3]: https://jira.xwiki.org/browse/XWIKI-12043
_______________________________________________
devs mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/devs
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Make {{html}} default behavior be clean=false

vmassol
Administrator

> On 06 Sep 2016, at 16:59, Caleb James DeLisle <[hidden email]> wrote:
>
> Hello all,
>
> Recently we discovered that in XWiki 8.2 the {{html}} macro cleaner now removes <video>
> tag whereas in 7.4 it did not and this has unfortunately caused problems for the {{video}}
> macro.
>
> 1. After some helpful investigation by a few XWiki developers, we have found that in fact
> the {{html}} macro is cleaning for XHTML and not for HTML5 which is what XWiki uses and the
> change has seemingly gone unnoticed since the upgrade to 8.0M1.

The HTML macro has always been cleaning to generate clean XHTML 1.0. This hasn’t changed. We do need to make it generate HTML5 for skins that output HTML5 but this hasn’t been done yet (see http://jira.xwiki.org/browse/XCOMMONS-901).

What has happened (most likely) is that the HTML Cleaner library was updated from version 2.10 to 2.16 in 8.0M1 and somewhere in there the HTML Cleaner project must have fixed a bug where previously they were not stripping the video tag when they should have (since it’s not valid XHTML 1.0 probably).

> 2. After some conversation with developers, I have observed that it is rare for an application
> developer to *intend* to have their HTML cleaned. In the XWiki repositories we find that
> amongst non-one-liner {{html}} macro invocations, 109 out of 292 (37%) of the invocations
> explicitly request clean=false [1].

Yes, developers should always use clean=false for 2 reasons:
* performance
* get an HTML validation error so that the error can be noticed and fixed in the HTML. Note: I hope that our validation tests are able to notice all the HTML errors and that as a consequence we won’t generate invalid X(HTML).

FTR I wasn’t initially in agreement but got convinced by Marius, see https://github.com/xwiki-contrib/macro-video/commit/d986b8346a4c21d55472dd85e41a96d27f24680f#commitcomment-18260475

I even mentioned introducing a new {{rawhtml}} macro or something like this ({{cleanhtml}}, {{validhtml}}, …) if we want to avoid having to write clean=false.

> 3. Nowhere is clean="true" specified and though one might think it obvious as it is a default,
> we find that wiki="false" (also default) is indeed specified 28 times [2].

That’s normal because you’re looking at development code and this code is supposed to generate valid HTML by default so there’s no need to clean it :)

> 4. We see also that the HTML macro proves to be quite slow. During an investigation of SOLR
> performance Thomas found that "about 23% of the request is spend executing html macros" [3].
> I suppose it should be obvious that without the cleaner, the {{html}} macro should not cause
> any significant performance degradation.
>
>
> Given the status of this feature, I would like to propose that we make an intentional change
> of behavior and change the default to clean=false. This way we application developers will not
> need to type it all of the time and XWiki will become measurably faster.
>
> Here is my +1 for changing the default.

We should not forget that the HTML macro is also meant to be used by users of the wiki (and not only by developers).

BTW we even said in the past that we should try to avoid using the {{html}} macro at all in our code so that it can be disabled as a macro, see http://jira.xwiki.org/browse/XRENDERING-27 and even http://jira.xwiki.org/browse/XWIKI-7894

And most users don’t set the clean=false parameter.

So I guess the question is whether we want to protect an XWiki instance against to generate invalid (X)HTML by default when the HTML macro is used by users, or whether we don’t care.

TBH I don’t remember exactly why we introduced the HTML Cleaner but we probably had issues at some point. Anyone remembers?

Thanks
-Vincent

> Thanks,
> Caleb
>
>
>
> [1]:
> $ grep -nr '{{html' | grep '.xml:' | grep 'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
> 109
> $ grep -nr '{{html' | grep '.xml:' | grep -v 'clean=['\''"]*false' | grep -v '{{/html}}' | wc -l
> 183
>
> [2]:
> $ grep -nr '{{html' | grep '.xml:' | grep 'wiki=['\''"]*false' | wc -l
> 28
>
> [3]: https://jira.xwiki.org/browse/XWIKI-12043
_______________________________________________
devs mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/devs
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Make {{html}} default behavior be clean=false

Marius Dumitru Florea
On Tue, Sep 6, 2016 at 7:43 PM, Vincent Massol <[hidden email]> wrote:

>
> > On 06 Sep 2016, at 16:59, Caleb James DeLisle <[hidden email]> wrote:
> >
> > Hello all,
> >
> > Recently we discovered that in XWiki 8.2 the {{html}} macro cleaner now
> removes <video>
> > tag whereas in 7.4 it did not and this has unfortunately caused problems
> for the {{video}}
> > macro.
> >
> > 1. After some helpful investigation by a few XWiki developers, we have
> found that in fact
> > the {{html}} macro is cleaning for XHTML and not for HTML5 which is what
> XWiki uses and the
> > change has seemingly gone unnoticed since the upgrade to 8.0M1.
>
> The HTML macro has always been cleaning to generate clean XHTML 1.0. This
> hasn’t changed. We do need to make it generate HTML5 for skins that output
> HTML5 but this hasn’t been done yet (see http://jira.xwiki.org/browse/
> XCOMMONS-901).
>
> What has happened (most likely) is that the HTML Cleaner library was
> updated from version 2.10 to 2.16 in 8.0M1 and somewhere in there the HTML
> Cleaner project must have fixed a bug where previously they were not
> stripping the video tag when they should have (since it’s not valid XHTML
> 1.0 probably).
>
> > 2. After some conversation with developers, I have observed that it is
> rare for an application
> > developer to *intend* to have their HTML cleaned. In the XWiki
> repositories we find that
> > amongst non-one-liner {{html}} macro invocations, 109 out of 292 (37%)
> of the invocations
> > explicitly request clean=false [1].
>
> Yes, developers should always use clean=false for 2 reasons:
> * performance
> * get an HTML validation error so that the error can be noticed and fixed
> in the HTML. Note: I hope that our validation tests are able to notice all
> the HTML errors and that as a consequence we won’t generate invalid X(HTML).
>
> FTR I wasn’t initially in agreement but got convinced by Marius, see
> https://github.com/xwiki-contrib/macro-video/commit/
> d986b8346a4c21d55472dd85e41a96d27f24680f#commitcomment-18260475
>
> I even mentioned introducing a new {{rawhtml}} macro or something like
> this ({{cleanhtml}}, {{validhtml}}, …) if we want to avoid having to write
> clean=false.
>
> > 3. Nowhere is clean="true" specified and though one might think it
> obvious as it is a default,
> > we find that wiki="false" (also default) is indeed specified 28 times
> [2].
>
> That’s normal because you’re looking at development code and this code is
> supposed to generate valid HTML by default so there’s no need to clean it :)
>
> > 4. We see also that the HTML macro proves to be quite slow. During an
> investigation of SOLR
> > performance Thomas found that "about 23% of the request is spend
> executing html macros" [3].
> > I suppose it should be obvious that without the cleaner, the {{html}}
> macro should not cause
> > any significant performance degradation.
> >
> >
> > Given the status of this feature, I would like to propose that we make
> an intentional change
> > of behavior and change the default to clean=false. This way we
> application developers will not
> > need to type it all of the time and XWiki will become measurably faster.
> >
> > Here is my +1 for changing the default.
>
> We should not forget that the HTML macro is also meant to be used by users
> of the wiki (and not only by developers).
>
> BTW we even said in the past that we should try to avoid using the
> {{html}} macro at all in our code so that it can be disabled as a macro,
> see http://jira.xwiki.org/browse/XRENDERING-27 and even
> http://jira.xwiki.org/browse/XWIKI-7894
>
> And most users don’t set the clean=false parameter.
>
> So I guess the question is whether we want to protect an XWiki instance
> against to generate invalid (X)HTML by default when the HTML macro is used
> by users, or whether we don’t care.
>
>

> TBH I don’t remember exactly why we introduced the HTML Cleaner but we
> probably had issues at some point. Anyone remembers?
>

The main reason for me is because we don't want simple users to break the
XWiki page layout when they copy & paste HTML from the web: "I pasted this
content, saved and now the edit button doesn't work anymore..". As Vincent
said, the HTML macro is not intended only for developers. So I'm -0, close
to -1.

Thanks,
Marius


>
> Thanks
> -Vincent
>
> > Thanks,
> > Caleb
> >
> >
> >
> > [1]:
> > $ grep -nr '{{html' | grep '.xml:' | grep 'clean=['\''"]*false' | grep
> -v '{{/html}}' | wc -l
> > 109
> > $ grep -nr '{{html' | grep '.xml:' | grep -v 'clean=['\''"]*false' |
> grep -v '{{/html}}' | wc -l
> > 183
> >
> > [2]:
> > $ grep -nr '{{html' | grep '.xml:' | grep 'wiki=['\''"]*false' | wc -l
> > 28
> >
> > [3]: https://jira.xwiki.org/browse/XWIKI-12043
> _______________________________________________
> devs mailing list
> [hidden email]
> http://lists.xwiki.org/mailman/listinfo/devs
>
_______________________________________________
devs mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/devs
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Make {{html}} default behavior be clean=false

Paul Libbrecht-2


Marius Dumitru Florea wrote:
>> TBH I don't remember exactly why we introduced the HTML Cleaner but we
>> > probably had issues at some point. Anyone remembers?
> The main reason for me is because we don't want simple users to break the
> XWiki page layout when they copy & paste HTML from the web: "I pasted this
> content, saved and now the edit button doesn't work anymore..". As Vincent
> said, the HTML macro is not intended only for developers. So I'm -0, close
> to -1.
Can such a breakage really happen?
Doesn't the editor's "back-from-html" already do the cleaning anyways?
Users that input html using the html macro in the menu or using the wiki
editor appear to be a responsible category of users.

Paul
_______________________________________________
devs mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/devs
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Make {{html}} default behavior be clean=false

Marius Dumitru Florea
On Wed, Sep 7, 2016 at 8:53 AM, Paul Libbrecht <[hidden email]> wrote:

>
>
> Marius Dumitru Florea wrote:
> >> TBH I don't remember exactly why we introduced the HTML Cleaner but we
> >> > probably had issues at some point. Anyone remembers?
> > The main reason for me is because we don't want simple users to break the
> > XWiki page layout when they copy & paste HTML from the web: "I pasted
> this
> > content, saved and now the edit button doesn't work anymore..". As
> Vincent
> > said, the HTML macro is not intended only for developers. So I'm -0,
> close
> > to -1.
> Can such a breakage really happen?
>


> Doesn't the editor's "back-from-html" already do the cleaning anyways?
>

Which editor? And WDYM by "back-from-html"?

The WYSIWYG editor doesn't clean the macro content: what you type in the
macro content text area (macro wizard) when you edit a macro from the
WYSIWYG edit mode is saved exactly as it is. Don't confuse this with
pasting content in the WYSIWYG editor's rich text area. That is cleaned but
it doesn't end up as HTML macro. I was referring strictly to pasting HTML
code inside the {{html}} macro using either the Wiki edit mode or WYSIWYG
editor's source text area / macro wizard.

Note that the WYSIWYG editor can easily be broken if the input HTML is not
valid, which is the case if the HTML macro doesn't clean its content. For
instance the editor relies on some XML comments (macro markers) to detect
the macros. When the browser tries to fix invalid HTML these XML comments
can be moved to the wrong place, thus breaking macro support in the WYSIWYG
editor. As a result the editor doesn't protect the macro output (read-only)
and so the editor ends up saving the macro output which is wrong.

Thanks,
Marius


> Users that input html using the html macro in the menu or using the wiki
> editor appear to be a responsible category of users.
>
> Paul
> _______________________________________________
> devs mailing list
> [hidden email]
> http://lists.xwiki.org/mailman/listinfo/devs
>
_______________________________________________
devs mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/devs