lucene-solr-user mailing list archives

From Alexandre Rafalovitch <>
Subject Re: Design optimal Solr Schema
Date Thu, 11 Dec 2014 13:53:12 GMT

You have a difficult use case. You seem to be working in the speech
recognition domain, and you want to be able to search the transcribed text
with references back to the timing. It's an interesting problem, but not an
easy one, and certainly not something anyone can answer for you all at once.

The issue here is the representation of that text. You want it both
per-word (so you have the timing) and as flowing text (so you can find
it). And then you also have the problem of how to express it from PHP.
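To make the dual representation a bit more concrete, here is a minimal Python sketch done in application code rather than inside Solr. It assumes (my assumption, not something from your setup) that the recognizer gives you (word, start, end) tuples; build_dual_representation and time_at are hypothetical helper names. The idea is to keep the flowing text for searching plus a sorted list of character offsets, so that a match position in the text can be mapped back to the start time of the word it falls in.

```python
from bisect import bisect_right


def build_dual_representation(words):
    """Return the flowing text plus a sorted list of (char_offset, start_time).

    `words` is a hypothetical list of (word, start_seconds, end_seconds) tuples.
    """
    parts = []
    offsets = []  # (character offset where the word begins, its start time)
    pos = 0
    for token, start, _end in words:
        if parts:
            pos += 1  # account for the space that joins words
        offsets.append((pos, start))
        parts.append(token)
        pos += len(token)
    return " ".join(parts), offsets


def time_at(offsets, char_pos):
    """Start time of the word that contains (or precedes) char_pos."""
    starts = [offset for offset, _ in offsets]
    i = bisect_right(starts, char_pos) - 1
    return offsets[i][1]
```

Usage: build the text once, search it however you like, then call time_at with the character position of the hit.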

But here are things you need to think about:
1) Do you have groups in your word sequence? You say find "how are
you", but what about "there ah how", which would still be together in
the stream but is the end of one sentence and the start of another? If you
do want to find any sequence of consecutive words, you need to index
them together, and you end up with one very long document. If not, you
need to decide how you are going to break your continuous text into
groups (based on SILENCE, timing, or something else).
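A rough sketch of the grouping step from (1), assuming the same hypothetical (word, start, end) tuples, a literal "SILENCE" token emitted by the recognizer, and an arbitrary 0.5 second gap threshold. Each returned group would then become one Solr document; split_into_groups is an illustrative name only, and the threshold is something you would tune on your own data.

```python
def split_into_groups(words, max_gap=0.5, silence_token="SILENCE"):
    """Split (word, start, end) tuples into groups.

    A new group starts at every silence token, or whenever the pause between
    the end of one word and the start of the next exceeds max_gap seconds.
    """
    groups = []
    current = []
    prev_end = None
    for token, start, end in words:
        if token == silence_token:
            # Close the current group; the silence token itself is dropped.
            if current:
                groups.append(current)
            current = []
            prev_end = None
            continue
        if current and start - prev_end > max_gap:
            # Long pause without an explicit silence token: close the group too.
            groups.append(current)
            current = []
        current.append((token, start, end))
        prev_end = end
    if current:
        groups.append(current)
    return groups
```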

2) Then you have the association of a multi-word sequence with time. You
say "Good morning to you" is at 5.25, but that's not possible, as each
word has its own duration. Does it mean the word "Good" was at 5.25? Can
users find "Morning to you", and will it still return 5.25? Or 5.28?
This design decision will affect how you index it.
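For (2), here is one possible policy, sketched in Python: report the start of the first matched word and the end of the last matched word. This is just one choice among several (you could equally report the start of the whole group), and phrase_span is a hypothetical helper working on the same assumed (word, start, end) tuples.

```python
def phrase_span(words, phrase):
    """Return (start, end) times of the first occurrence of phrase, or None.

    Policy assumed here: the reported start is the start of the first matched
    word and the reported end is the end of the last matched word.
    """
    target = phrase.lower().split()
    if not target:
        return None
    tokens = [w.lower() for w, _, _ in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return words[i][1], words[i + n - 1][2]
    return None
```

With this policy, "Good morning to you" reports the start of "Good", while "morning to you" reports the start of "morning".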

3) And what happens if the matched text occurs twice, like "Chao" as
hello and "Chao" as goodbye? If you want two separate documents
returned, that implies two documents in Solr. So that goes hand in
hand with (1) above.
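And for (3), a sketch of turning each group into its own document, so that two occurrences of "Chao" in different groups come back as two hits. The field names (id, text, start, end) are made up for illustration and would need to match whatever your schema defines; matching_docs is only a naive stand-in for an actual Solr query.

```python
def groups_to_docs(groups, recording_id):
    """Turn word groups into one document dict per group (hypothetical fields)."""
    docs = []
    for n, group in enumerate(groups):
        docs.append({
            "id": f"{recording_id}-{n}",
            "text": " ".join(w for w, _, _ in group),
            "start": group[0][1],   # start of the first word in the group
            "end": group[-1][2],    # end of the last word in the group
        })
    return docs


def matching_docs(docs, word):
    """Naive stand-in for a Solr query: every document containing the word."""
    word = word.lower()
    return [d for d in docs if word in d["text"].lower().split()]
```

A list of dicts like this is roughly the shape you would serialize to JSON for Solr's update handler, but check the field names and types against your schema.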

4) Then you have the whole highlighting issue, which I am not even going
to start on, except to say that the text being highlighted needs to be in
one field, so that has an impact too.
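On (4), just to illustrate why the single field matters: if the whole field comes back highlighted (not a fragment), and the field was built by joining the recognized words with single spaces, you can walk the highlighted tokens and map them back to word timings by position. This assumes every matched word is wrapped individually in <em>...</em> markers; highlighted_times is a hypothetical helper, and real highlighter output may merge or fragment terms differently.

```python
def highlighted_times(highlighted_text, words, pre="<em>", post="</em>"):
    """Start times of the words wrapped in highlight markers.

    Assumes highlighted_text is the whole field (not a fragment) and that the
    field was built as " ".join of the recognized words, so the i-th
    whitespace-separated token corresponds to words[i].
    """
    times = []
    for i, token in enumerate(highlighted_text.split()):
        if token.startswith(pre) and token.endswith(post):
            times.append(words[i][1])
    return times
```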

Personal: and @arafalov
Solr resources and newsletter: and @solrstart
Solr popularizers community:

On 11 December 2014 at 03:33, tomas.kalas <> wrote:
> Thanks for the help, but as Alex wrote, I used the synonym filter and it does
> what I want. For example, I put Hello, Hi in the synonyms. The sentence is
> "Hello how are you" and my query is "Hi how are you", so it finds it too.