Use Lucene Syntax in Fulltext Search

There is an open PR https://github.com/dhlab-basel/Knora/pull/1379 that unifies the handling of fulltext search for v1 and v2.

Until now, a fulltext search query has been preprocessed before being passed to the triplestore. I added this preprocessing because most people expect search terms to be combined with a logical AND, while Lucene’s default is a logical OR.
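
To make the idea concrete, here is a minimal sketch of that kind of preprocessing. It is not the actual Knora code (which is part of the Scala backend); the function name `combine_with_and` is hypothetical and only illustrates the principle of joining whitespace-separated terms with AND.

```python
def combine_with_and(search_input: str) -> str:
    """Join whitespace-separated search terms with Lucene's AND operator.

    Hypothetical helper for illustration, not taken from Knora.
    """
    # str.split() without arguments drops leading, trailing and repeated whitespace.
    terms = search_input.split()
    return " AND ".join(terms)


print(combine_with_and("Leonhard Euler"))
```

Without this step, Lucene would read `Leonhard Euler` as `Leonhard OR Euler`.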

Now I am wondering whether this preprocessing is a good approach. It is still very basic and does not correctly handle searches for words or characters that have a special meaning in Lucene. In the PR, I have started to make it more powerful using a regex, but I think this will get very complex while the benefits seem quite limited.
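
To give an impression of what the regex route involves, here is a rough sketch that escapes single characters with a special meaning in Lucene's classic query syntax. This is not the regex from the PR. The helper name `escape_lucene_term` is made up, and the character list is my reading of the Lucene query parser documentation, so please treat it as an assumption.

```python
import re

# Characters treated as special by Lucene's classic query syntax (assumed list):
# + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
_LUCENE_SPECIAL_CHARS = re.compile(r'[+\-!(){}\[\]^"~*?:\\/&|]')


def escape_lucene_term(term: str) -> str:
    """Prefix every special character in a single term with a backslash.

    Hypothetical helper for illustration, not taken from Knora.
    """
    return _LUCENE_SPECIAL_CHARS.sub(lambda m: "\\" + m.group(0), term)


print(escape_lucene_term("title:Euler"))
```

Even this does not cover the keywords AND, OR and NOT, and it makes it impossible to use wildcards or phrases on purpose, which is the kind of complexity I would like to avoid.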

I am rather inclined to let people use the Lucene syntax directly. This would need to be documented in the Knora docs and in the GUI, and the GUI would also have to provide some examples. For instance:

  • search for texts that contain both Leonhard and Euler: Leonhard AND Euler (alternatively: +Leonhard +Euler)
  • search for the exact phrase Leonhard Euler: "Leonhard Euler" (with straight double quotes)

I think preprocessing still makes sense for the “search as you type” cases, i.e. when searching for linked resources by their labels (when adding a link or when doing an extended search for a linked resource in the GUI).
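
For this case, the preprocessing could look roughly like the sketch below. The function name `label_search_query` is hypothetical, and turning the last (possibly incomplete) term into a prefix search with `*` is my assumption about what search as you type needs, not a description of the existing implementation.

```python
def label_search_query(partial_input: str) -> str:
    """Build a Lucene query for search as you type on resource labels.

    Hypothetical helper: all terms are combined with AND, and the last term
    is treated as a prefix because the user may still be typing it.
    """
    terms = partial_input.split()
    if not terms:
        return ""
    # Append a wildcard to the last term so that "Eu" also matches "Euler".
    terms[-1] = terms[-1] + "*"
    return " AND ".join(terms)


print(label_search_query("Leonhard Eu"))
```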

Please let me know what you think.

Addendum:

If necessary, the GUI could provide some preprocessing for the fulltext search. https://www.google.com/advanced_search does it that way: there is a text input field “any of these words:”, and the terms entered there get combined with an OR, resulting in “Leonhard OR Euler”.

So the Knora-ui fulltext search could offer different search modes with examples.
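
A rough sketch of how such search modes could map to Lucene queries is shown below. It is written in Python only to show the logic; the real thing would live in the GUI code. The mode names `all`, `any` and `exact` and the function `build_query` are made up for this example.

```python
def build_query(user_input: str, mode: str) -> str:
    """Translate plain user input into a Lucene query for a given search mode.

    Hypothetical helper: the mode names are assumptions for illustration.
    """
    terms = user_input.split()
    if mode == "all":
        # "all of these words"
        return " AND ".join(terms)
    if mode == "any":
        # "any of these words"
        return " OR ".join(terms)
    if mode == "exact":
        # "this exact phrase": Lucene expects straight double quotes
        return '"' + " ".join(terms) + '"'
    raise ValueError(f"unknown search mode: {mode}")


for mode in ("all", "any", "exact"):
    print(mode, "->", build_query("Leonhard Euler", mode))
```

Users who know Lucene could still type the query directly, and the modes would only be a convenience on top of that.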

I would also be in favour of using Lucene syntax directly. This would mean less processing in Knora and would therefore be less error-prone.

@t.schweizer If I understand correctly, this was done in PR 1379, is that right?

@benjamingeer yes, that’s right!