BaseObject ID collisions

BaseObject ID collisions

Marc Sladek
Dear XWiki devs

We are using the XWiki platform for our applications but sadly are still
stuck with 2.7.2. Lately we ran into issues on a large database and noticed
"disappearing" BaseObjects. We were able to link them to XWIKI-6990
<http://jira.xwiki.org/browse/XWIKI-6990>, where Hibernate IDs collided
(hash collisions) and overwrote other objects without any trace - neither
visible in the history nor in a log file.

We analysed the solution you implemented in 4.0+ in XWikiDocument
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841>
and BaseElement
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237>
and noticed that you changed the ID generation from the 32-bit
String#hashCode to a 64-bit truncated MD5 hash, which makes a collision
less likely. I have a few questions regarding your solution (a sketch of my
understanding follows the questions below):

1) Is there any specific reason why you chose MD5 over SHA-1 or SHA-2?

2) Collisions are still possible and would be extremely hard to notice
since they are completely silent. Have you considered implementing
collision detection to at least log occurring collisions - or, even better,
reserving 1-2 bits of the 64 as a collision counter in case one does
happen?

3) To question the concept of generating a hash for an ID in general:
wouldn't a database-defined "auto increment" be a much more robust solution
for the Hibernate IDs? A collision would be impossible, and
clustering/scalability would still be possible with e.g. the InnoDB
“interleaved” autoincrement lock mode. Why did you choose a hash-based
solution in the first place?
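
For clarity, here is a minimal sketch of how I understand the two schemes.
The key format is a simplified assumption on my part (the real key is built
from the serialized entity reference, plus locale for documents and class
name/number for objects), so this is not your actual oldcore code:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IdSchemeSketch
{
    /** Pre-4.0 scheme: the plain 32-bit Java string hash of the serialized key. */
    static int legacyId(String localKey)
    {
        return localKey.hashCode();
    }

    /** 4.0+ scheme (simplified): the first 64 bits of the MD5 digest of the key. */
    static long md5Id(String localKey) throws NoSuchAlgorithmException
    {
        byte[] digest = MessageDigest.getInstance("MD5")
            .digest(localKey.getBytes(StandardCharsets.UTF_8));
        // Only the first 8 of the 16 digest bytes are kept as the Hibernate id.
        return ByteBuffer.wrap(digest).getLong();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException
    {
        String key = "Main.WebHome"; // hypothetical serialized document key
        System.out.println("legacy 32-bit id: " + legacyId(key));
        System.out.println("MD5 64-bit id:    " + md5Id(key));
    }
}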

I'm sorry if these questions were already answered on the dev mailing list
or in issues; if so, please link me to them, since I couldn't find any
concrete answers.

Thanks for your time and regards

Marc Sladek
synventis gmbh
Re: BaseObject ID collisions

Denis Gervalle
Hi Marc,

Here are some answers:

1) MD5 was already a dependency of our oldcore, and using SHA-1 would have added a dependency without bringing much benefit. Since we only use 64 bits of the MD5 hash anyway, I doubt SHA-1 would have provided a better distribution.

2) Such collision detection is difficult to introduce into the existing code base; for some APIs it is even impossible. What you experienced with the 32-bit ids was my motivation for the changes in 4.x, and I can say, based on my long XWiki experience, that even with the poor 32-bit ids very few users were affected. Therefore, by improving the hash algorithm, the size of the ids, and the quality of the hashed key, we considered ourselves safe enough for normal usage.

3) That’s the toughest point. I cannot answer for the original decision, since I wasn’t involved yet, but regarding the changes introduced in 4.0, such a change was considered. The ids are only there to satisfy Hibernate and its loading mechanism. If we had used a counter, we would have had to manage a conversion table between ids and entity references, with all the additional complexity that implies (consistency issues, caching, ...). This is because we use entity references to point directly to documents (or even objects) everywhere in XWiki. Introducing that behaviour while keeping all the existing APIs unchanged would have been a huge amount of work, and it would probably have introduced a performance penalty as well. This is why we gave up on it and went for an improved hash solution. IMO, if we had to make such a change, we would be better off rewriting the storage service completely, and even stop using Hibernate, which, to be honest, does not bring much benefit to XWiki with its ORM aspects.

But if you really want the complete answers, you can look at those threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx

Regards,

--
Denis Gervalle
SOFTEC sa - CEO


Re: BaseObject ID collisions

Marc Sladek
Hi Denis,

Thanks a lot for your answer. I know it's been a while, but I'd still like
to follow up on it, since it's quite a fundamental issue.

> Therefore, by improving the hash algorithm, the size of the ids, and the
> quality of the hashed key, we considered ourselves safe enough for normal
> usage.

Still, with enough bad luck, documents and objects may be overwritten
without a trace. This is not a stable implementation. Even worse, if hash
collisions happen on any XWiki installation in the future (or have already
happened since 4.x), they probably won't be easily associated with this
issue, because it's nearly impossible to debug.

While I now understand the motivation to stick with hashes, I'm still not
sure why collision detection would be difficult to introduce and why it's
even "impossible" for some APIs. Let me briefly outline an idea:

In XWikiHibernateStore#saveXWikiDoc on L615
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615>,
an existence check on the document id is already performed. If
xwikidoc.fullName is also selected in that HQL, comparing it to
doc.getDocumentReference() can expose an imminent collision before data is
overwritten; at the very least, an XWikiException should be thrown in this
case. Something similar could be done before saving BaseObjects on L1203
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
to avoid collisions on object IDs.
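
To make this concrete, here is a rough sketch of the check I have in mind.
The method name and the plain Session usage are placeholders of mine, not
the actual oldcore code, and a real implementation should throw an
XWikiException rather than an unchecked exception:

import org.hibernate.Session;

// Sketch only: would be called from saveXWikiDoc with the current Hibernate
// session, the new document's id and its serialized full name.
void checkForIdCollision(Session session, long docId, String fullName)
{
    Object existing = session.createQuery(
            "select xwikidoc.fullName from XWikiDocument as xwikidoc"
            + " where xwikidoc.id = :id")
        .setParameter("id", docId)
        .uniqueResult();
    if (existing != null && !existing.equals(fullName)) {
        // A different document already owns this id: imminent hash collision.
        throw new IllegalStateException(
            "id collision between " + fullName + " and " + existing);
    }
}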

I don't think a change like this would be difficult to implement; I could
provide a PR along those lines. The performance penalty would have to be
tested on your systems though, since the full name isn't indexed, AFAIK.

Regards

Marc Sladek
synventis gmbh


Re: BaseObject ID collisions

Thomas Mortagne
For documents, what could already help a lot, without any performance
penalty, is to compare the loaded document reference with the passed one
in XWikiHibernateStore#loadXWikiDoc. That's because most of the code in
XWiki applies the following logic: getDocument(), modify it,
saveDocument().
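
A minimal sketch of that check, with illustrative names (not the actual
loadXWikiDoc code):

import com.xpn.xwiki.doc.XWikiDocument;

// Sketch: after loadXWikiDoc has fetched the row by id, compare the reference
// that came back with the one that was requested.
void assertLoadedMatchesRequested(XWikiDocument requested, XWikiDocument loaded)
{
    if (!requested.getDocumentReference().equals(loaded.getDocumentReference())) {
        // Two different references hashed to the same 64-bit id: fail the load
        // instead of silently returning (and later saving over) the wrong doc.
        throw new IllegalStateException("id collision: requested "
            + requested.getDocumentReference() + ", loaded "
            + loaded.getDocumentReference());
    }
}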

--
Thomas Mortagne

Re: BaseObject ID collisions

Marc Sladek
Introducing this certainly doesn't hurt, but I'm not sure how useful it is.
Firstly, it can only reveal a collision after a document has already been
overwritten, so the damage is already done. Secondly, loadXWikiDoc has to
be called for the document that no longer exists, which I guess doesn't
happen often, since the system won't list it anymore.


Re: BaseObject ID collisions

Thomas Mortagne
On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <[hidden email]> wrote:
> Introducing this certainly doesn't hurt, but I'm not sure how useful it is.

I never said it was the solution to all collisions, but it will cover
most of the (very rare and never reported) overwrites at zero cost.

> Firstly, it can only reveal a collision after a document has already been
> overwritten, so the damage is already done.

Again, keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()

so if getDocument() fails, you are not going to overwrite anything.
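
Roughly, as a sketch (xwiki and context stand for the usual oldcore
objects; the content change is illustrative):

// Load by reference, modify, save - the dominant pattern in platform code.
XWikiDocument doc = xwiki.getDocument(documentReference, context);
doc.setContent("updated content");
xwiki.saveDocument(doc, context);
// With the reference check in loadXWikiDoc, getDocument() would fail on a
// collision, so the save above never runs against the wrong document.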

> Secondly, loadXWikiDoc has to be called for the document that no longer
> exists, which I guess doesn't happen often, since the system won't list it
> anymore.

Not sure which use case you are referring to here. Are you talking about
a document being deleted, and then a document with a different reference
but the same hash being saved?

--
Thomas Mortagne

Re: BaseObject ID collisions

Marc Sladek
With "most of the code", do you also mean when new documents are being
created and stored (the scenario where collisions happen)?

Imho there is nothing preventing collisions when doing:
* new XWikiDocument(docRef)
* modify instance
* exists check and save
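
As a sketch (xwiki and context stand for the usual oldcore objects; the
content is illustrative):

XWikiDocument doc = new XWikiDocument(docRef);
doc.setContent("fresh content");
// exists() checks by reference/full name, not by id, so a hash collision
// with an unrelated existing document goes unnoticed here...
if (!xwiki.exists(docRef, context)) {
    // ...and this save silently overwrites the colliding document's row.
    xwiki.saveDocument(doc, context);
}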


Re: BaseObject ID collisions

Thomas Mortagne
By "most of the code" I mean most of the code I know :)

AFAIK everything in the XWiki platform and in most extensions does this,
and it of course includes everything you do through the standard UI.

--
Thomas Mortagne

Re: BaseObject ID collisions

Marc Sladek
OK, I understand now and agree, thanks for clarifying. Two caveats though:

1) As you mentioned, it applies only to document hash collisions. However,
I'd expect object hash collisions to be more frequent, since the average
database contains far more objects than documents.

2) It's not guaranteed that all projects using the XWiki platform call
getDocument before saveDocument on the same instance. For example, our CMS
'celements' uses an XWikiDocumentCreator
<https://github.com/celements/celements-xwiki/blob/dev/celements-model/src/main/java/com/celements/model/access/DefaultXWikiDocumentCreator.java#L23>
to build XWikiDocuments from scratch, without a getDocument call
beforehand. It's possible that more code out there creates documents this
way; it would not be covered by your proposed change.



Re: BaseObject ID collisions

Thomas Mortagne
Administrator
As I said, it's not a perfect solution, but while we wait for one that
does not significantly hurt performance, it costs nothing and probably
covers most existing use cases.
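
For reference, a minimal sketch of what such a load-side check could look
like (illustrative only, not the actual patch):

import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;
import org.xwiki.model.reference.DocumentReference;

public class LoadGuardExample {
    // After the store fetches a row by its 64-bit id, verify that the
    // stored reference matches the requested one before returning it.
    void checkForCollision(DocumentReference requested, XWikiDocument loaded)
            throws XWikiException {
        if (loaded != null && !requested.equals(loaded.getDocumentReference())) {
            // Same id, different reference: a hash collision that would
            // otherwise surface as a silent overwrite on the next save.
            throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
                XWikiException.ERROR_XWIKI_UNKNOWN,
                "Id collision: requested " + requested + " but loaded "
                    + loaded.getDocumentReference());
        }
    }
}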




--
Thomas Mortagne

Re: BaseObject ID collisions

Denis Gervalle-2
Hi Marc and Thomas,

I followed your discussion with great interest. I agree that Thomas's very lightweight proposal is worth putting in place, since it has almost no negative impact and only benefits. I think there is also a possibility to mitigate the object issue with something similar (checking the integrity of what we load, so that we at least detect an issue), but of course that's not perfect.

That said, I would like to point you to this interesting question on StackOverflow (https://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code-collisions) and remind you that, based on the Birthday Paradox, the release of 4.x raised our worrying threshold of documents/objects from 65,536 to more than 4 billion… and it took a while (four versions of XWiki) before we had the strong feeling we needed to raise it. So, while the worrying threshold before 4.x was really low, actual collisions were already rare.
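
To make the arithmetic concrete, here is a small self-contained sketch of the birthday-paradox estimate behind those thresholds (the class and method names are illustrative):

public class BirthdayBound {
    // Standard birthday-paradox approximation: the probability of at least
    // one collision among n uniformly distributed ids of the given bit
    // width is p ≈ 1 - exp(-n^2 / 2^(bits+1)).
    static double collisionProbability(double n, int bits) {
        return -Math.expm1(-n * n / (2.0 * Math.pow(2, bits)));
    }

    public static void main(String[] args) {
        // Old 32-bit ids: ~39% chance of a collision at 65,536 entities.
        System.out.println(collisionProbability(65_536, 32));
        // 64-bit ids: ~2.7e-8 at one million entities...
        System.out.println(collisionProbability(1e6, 64));
        // ...and ~39% only around 4.3 billion entities.
        System.out.println(collisionProbability(4.3e9, 64));
    }
}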

My own experience was that the risk before 4.x was really high with generated names, much higher than with names used by real users. When I was hit by that issue, I remember feeling really bad about it. This is also probably why you have raised this thread. The previous hash was too small and also had a questionable distribution.

The MD5 algorithm, like many cryptographic hashes, is particularly well suited to providing a good distribution (http://michiel.buddingh.eu/distribution-of-hash-values); truncating it to 64 bits may weaken this, but I doubt the effect would be significant for us. So, personally, I feel really comfortable with the current implementation, and I think you can sleep in peace as well.
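
For readers following along, here is a sketch of the kind of truncation being discussed, keeping the first 64 of MD5's 128 bits (illustrative, not necessarily the exact XWiki code):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class TruncatedMd5Example {
    // Hash the entity's reference key with MD5 and keep the first 8 of the
    // 16 digest bytes as a 64-bit Hibernate id.
    static long truncatedMd5(String key) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
            .digest(key.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, Arrays.copyOf(digest, 8)).longValue();
    }
}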

Just my thought about not raising fears when they are no longer really justified.
Regards,

--
Denis Gervalle
SOFTEC sa - CEO


Re: BaseObject ID collisions

Fabian Pichler
Hi Denis

I personally think that this is not a discussion about probabilities but
about a fail-safe and robust implementation.

I understand that you do not want to add logic to the id generation that
could mitigate collisions (e.g. a collision bit; see the sketch below),
because the probability that this logic ever solves a problem is "low" and
the code must be maintained. I agree that whether this feature is useful
and necessary is a question of probabilities.
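
For what it's worth, a rough sketch of that collision-bit idea (hash64 and
idExistsForDifferentKey are assumed helpers, not existing XWiki API):

public abstract class CollisionBitIdGenerator {
    // Assumed helpers: a 64-bit hash of the entity key, and a store-side
    // check whether an id is already taken by a *different* key.
    abstract long hash64(String key);
    abstract boolean idExistsForDifferentKey(long id, String key);

    // Reserve the top 2 bits of the id as a collision counter and keep 62
    // bits of hash; bump the counter until a free id is found. Detection
    // (the store-side check) is the hard part in practice.
    long generateId(String key) {
        long baseId = hash64(key) & 0x3FFFFFFFFFFFFFFFL; // lower 62 bits
        for (int attempt = 0; attempt < 4; attempt++) {
            long candidate = baseId | ((long) attempt << 62);
            if (!idExistsForDifferentKey(candidate, key)) {
                return candidate;
            }
        }
        throw new IllegalStateException("More than 4 keys collide on " + key);
    }
}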

Yet, I think that if anybody hits a collision, which may still happen with
enough bad luck, then this person wants the system to behave robustly and
fail-safe, and to never corrupt stored data. IMHO this is therefore not a
question of probabilities but a question of fail-safe implementation, and
thus a question of software quality.

Only my five cents.
Regards,
Fabian

