There is an open PR https://github.com/dhlab-basel/Knora/pull/1379 that unifies the handling of fulltext search for v1 and v2.
So far, fulltext search strings have been preprocessed before being executed by the triplestore. I added this preprocessing because most people would expect search terms to be combined with a logical AND, while Lucene’s default is a logical OR.
Now I am wondering whether this preprocessing is a good approach. It is still very basic and does not correctly handle cases in which people search for words that have a special meaning in Lucene. In the PR, I have started to make it more powerful using a regex, but I think this will get very complex while the benefits seem quite limited.
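To illustrate the kind of preprocessing I mean, here is a minimal sketch in Python. It is not the code from the PR; the function name `combine_with_and` and the regex are assumptions made up for this example. It joins whitespace-separated terms with AND and escapes the characters that are special in Lucene's classic query syntax.

```python
import re

# Characters and operators with special meaning in Lucene's classic query syntax.
# (Illustrative pattern, not the regex used in the PR.)
LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def combine_with_and(search_string: str) -> str:
    """Join whitespace-separated terms with AND, escaping special characters."""
    terms = search_string.split()
    # Prefix every special character or operator with a backslash.
    escaped = [LUCENE_SPECIAL.sub(r"\\\1", term) for term in terms]
    return " AND ".join(escaped)


print(combine_with_and("Leonhard Euler"))  # Leonhard AND Euler
```

Even this small version shows the problem: a user who types a quoted phrase such as "Leonhard Euler" gets the quotes escaped and the two words joined with AND, so the phrase search is lost, and the words AND, OR and NOT themselves would need yet another special case.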
I am rather inclined to let people use the Lucene syntax directly. I would then need to document this in the Knora docs and in the GUI. The GUI would also have to provide some examples. For instance:
- search for texts that contain Leonhard and Euler: Leonhard AND Euler, alternatively: +Leonhard +Euler
- search for an exact match of Leonhard Euler: "Leonhard Euler"
I think preprocessing still makes sense for the “search as you type” cases, i.e. when searching for linked resources by their labels (when adding a link or when doing an extended search for a linked resource in the GUI).
Please let me know what you think.
If necessary, the GUI could provide some preprocessing for the fulltext search. https://www.google.com/advanced_search does it that way. There is a text input field “any of these words:” and then those terms get combined with an OR, resulting in “Leonhard OR Euler”.
So the Knora-ui fulltext search could offer different search modes with examples.
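As a rough sketch of what such search modes could look like, the following Python function turns plain user input into a Lucene query string depending on the selected mode. The function name `build_query` and the mode names `all`, `any` and `exact` are hypothetical and only serve this example; the actual GUI code would live in Knora-ui and may look quite different.

```python
def build_query(user_input: str, mode: str) -> str:
    """Build a Lucene query string from plain words for a given search mode."""
    words = user_input.split()
    if mode == "all":
        # "all of these words": every term must match
        return " AND ".join(words)
    if mode == "any":
        # "any of these words": at least one term must match
        return " OR ".join(words)
    if mode == "exact":
        # "this exact phrase": wrap the words in straight double quotes
        return '"' + " ".join(words) + '"'
    raise ValueError(f"unknown search mode: {mode}")


print(build_query("Leonhard Euler", "any"))    # Leonhard OR Euler
print(build_query("Leonhard Euler", "exact"))  # "Leonhard Euler"
```

With this approach the backend would receive a valid Lucene query in every case, and users who know the syntax could still type it directly.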