Lucene search Problem with Wiki Farm Member "it"

Tronicek
Hi,
we've updated XWiki 1.7 XE to 2.7.33656 and are using the Wiki Manager to run a wiki farm.

There is a strange behaviour related to search requests that we did not notice immediately.
It seems that the name of the virtual wiki is causing the problem. Its name is "it", and it is used as a solution base for IT problems.

We can reproduce the problem as follows:
- create a new virtual wiki named "it" (without the quotation marks)
- import xwiki-enterprise-wiki-2.7.xar
- search with Lucene (no results): .../view/Main/LuceneSearch?text=sandbox&space=
- search with the old engine (pages are found): .../view/Main/WebSearch?text=sandbox&space=

We tried to change the analyzer in xwiki.cfg:
xwiki.plugins.lucene.analyzer=org.apache.lucene.analysis.de.GermanAnalyzer
-> no success

Our virtual wikis are mapped via virtual path (xwiki.cfg: xwiki.virtual.usepath=1).

It would be nice to keep the virtual wiki name. Is there a workaround to handle this problem?

Regards,
Rudolf

Re: Lucene search Problem with Wiki Farm Member "it"

Thomas Mortagne
Administrator
Hi Rudolf,

Sorry for the delay.

It definitely looks like a bug and probably an escaping bug.

Could you create an issue on jira.xwiki.org with all the details needed to
reproduce it?

On Fri, Feb 4, 2011 at 11:15, Tronicek <[hidden email]> wrote:

> [...]

--
Thomas Mortagne
_______________________________________________
users mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/users

Re: Lucene search Problem with Wiki Farm Member "it"

Ludovic Dubost
In reply to this post by Tronicek

We have found some issues with the analyzer code that analyzes the wiki
name, though with the English analyzer this should not be a problem.
We are fixing this for the next versions of XWiki.

For now, if you are sure the problem is the name of the wiki, rename it and
use a wiki "alias".

The user will still see the same URL, but the wiki will have
a different name internally.

Ludovic

--
Ludovic Dubost
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost

Re: Lucene search Problem with Wiki Farm Member "it"

vmassol
Administrator
Is it this issue?
http://jira.xwiki.org/jira/browse/XWIKI-5976

If so, it's fixed in 3.0M3.

Thanks
-Vincent

On Feb 16, 2011, at 12:23 AM, Ludovic Dubost wrote:

> [...]

Re: Lucene search Problem with Wiki Farm Member "it"

Sergiu Dumitriu-2
On 02/17/2011 03:56 PM, Vincent Massol wrote:
> Is it this issue:
> http://jira.xwiki.org/jira/browse/XWIKI-5976
>
> ?
>
> If so it's fixed in 3.0M3

No, that's related to Hibernate. The problem is in the indexer, which
analyzes all the document fields. The default analyzer processes the
text in several ways:
- splits into tokens
- removes apostrophes and other punctuation
- *removes common words* (called "stop words")
- stems words
- transforms to lowercase

Since "it" is a common (stop) word, a document coming from the "it" wiki
ends up with an empty "wiki:" field.
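
To make the effect concrete, here is a minimal Python sketch of such an analysis chain. It is a toy model and not Lucene's API: the stop-word list, the naive stemmer, and the function names are all assumptions made for illustration.

```python
import re

# Toy stop-word list (an assumption for illustration, not Lucene's actual list).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def stem(token):
    """Very naive stemmer: strip a trailing plural 's' from longer tokens."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def analyze(text):
    """Lowercase, split into alphanumeric tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("The Sandbox pages"))  # ['sandbox', 'page']
print(analyze("it"))                 # [] -> the wiki name disappears entirely
```

With a chain like this, the value of the wiki field for the "it" wiki is analyzed down to nothing, so there is no term left to match.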

The solution appears to be simple: just stop analyzing the special fields
(wiki, language, space, creator...).
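
A rough sketch of that index-time idea, again as a toy Python model rather than the real indexer: the `UNANALYZED_FIELDS` set, the `index_document` helper, and the document layout are hypothetical names chosen for this example.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}  # toy list
# Fields stored verbatim; the names come from the message above,
# the set itself is a hypothetical construct.
UNANALYZED_FIELDS = {"wiki", "language", "space", "creator"}


def analyze(text):
    """Lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def index_document(doc):
    """Map each field to its indexed terms, skipping analysis for special fields."""
    indexed = {}
    for field, value in doc.items():
        if field in UNANALYZED_FIELDS:
            indexed[field] = [value]  # keep the exact value as a single term
        else:
            indexed[field] = analyze(value)
    return indexed


doc = {"wiki": "it", "space": "Sandbox", "content": "It is the sandbox"}
print(index_document(doc))
# {'wiki': ['it'], 'space': ['Sandbox'], 'content': ['sandbox']}
```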

In reality it is a lot more difficult: even if the indexed data
is correct, the default query parser runs the query through the
same analyzer, and it doesn't know that it should leave terms with an
explicit field unprocessed. This means that searching for "wiki:it" will
remove this term from the query. Searching for
"space:Apples" will try to match an "apple" token against the
"Apples" index entry, so it won't give the right results.

I've stopped working on this for the moment. If someone else wants to
pick up the remaining work (writing a more intelligent query parser that
knows which fields should be analyzed and which should not), I can help, but
it's not a priority for 3.0 for me.
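
The query-side problem, and the kind of field-aware parser that would address it, can be sketched as follows. This is a toy Python model under stated assumptions, not the Lucene query parser: the `parse` function, its `field_aware` flag, the default "content" field, and the `UNANALYZED_FIELDS` set are all hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}  # toy list
UNANALYZED_FIELDS = {"wiki", "language", "space", "creator"}  # hypothetical set


def analyze(text):
    """Lowercase, tokenize, drop stop words, strip a trailing plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in kept]


def parse(query, field_aware):
    """Turn whitespace-separated 'field:value' parts into (field, term) clauses."""
    clauses = []
    for part in query.split():
        field, sep, value = part.partition(":")
        if not sep:
            field, value = "content", part  # bare words search the content
        if field_aware and field in UNANALYZED_FIELDS:
            clauses.append((field, value))  # keep the value verbatim
        else:
            clauses.extend((field, term) for term in analyze(value))
    return clauses


query = "wiki:it space:Apples the pages"
print(parse(query, field_aware=False))
# [('space', 'apple'), ('content', 'page')]
print(parse(query, field_aware=True))
# [('wiki', 'it'), ('space', 'Apples'), ('content', 'page')]
```

In the first call the "wiki:it" clause vanishes and "Apples" is reduced to "apple"; in the second, the special fields pass through untouched while free text is still analyzed.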

A quick fix is possible for the wiki field, since it is not passed to
the query parser, but is manually added to the query, unanalyzed.
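
As a hedged illustration of that quick fix, the sketch below analyzes only the user's text and then appends the wiki restriction by hand. It is a toy Python model, not XWiki code: `build_query` and the clause representation are hypothetical, and it assumes the index also stores the wiki name unanalyzed.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}  # toy list


def analyze(text):
    """Lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_query(user_text, wiki_name):
    """Analyze the user's text, then add the wiki filter without analyzing it."""
    clauses = [("content", term) for term in analyze(user_text)]
    clauses.append(("wiki", wiki_name))  # added manually, never analyzed
    return clauses


print(build_query("the sandbox", "it"))
# [('content', 'sandbox'), ('wiki', 'it')]
```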


--
Sergiu Dumitriu
http://purl.org/net/sergiu/