I don't think so ... Let me be specific:
First, consider the case of one 'analysis': an input token maps to a lemma
and a sequence of components.
So, we produce (PI = positionIncrement, PL = positionLength):
surface form
lemma PI 0
comp1 PI 0
comp2 PI 1
.....
with PL set appropriately to cover the pieces. All the information is there.
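
To make that concrete, here is a minimal sketch in plain Java. It is not the Lucene API: Tok, arcs and singleAnalysis are made-up names, and the PI of 1 on the surface form and the specific PL values are my assumptions about what 'set appropriately' would mean for a two-component compound. It just accumulates the increments the way a consumer would and prints the span each token covers.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleAnalysisSketch {
    /** Hypothetical stand-in for a token: term text plus PI and PL. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Resolve each token to "term from->to" by accumulating increments. */
    static List<String> arcs(List<Tok> stream) {
        List<String> out = new ArrayList<>();
        int pos = -1; // start before the first position, so a PI of 1 lands on 0
        for (Tok t : stream) {
            pos += t.posInc;
            out.add(t.term + " " + pos + "->" + (pos + t.posLen));
        }
        return out;
    }

    /** One analysis of a two-component compound (PL values are assumed). */
    static List<Tok> singleAnalysis() {
        List<Tok> s = new ArrayList<>();
        s.add(new Tok("surface", 1, 2)); // spans both components
        s.add(new Tok("lemma", 0, 2));   // same span as the surface form
        s.add(new Tok("comp1", 0, 1));   // first half
        s.add(new Tok("comp2", 1, 1));   // second half
        return s;
    }

    public static void main(String[] args) {
        for (String a : arcs(singleAnalysis())) {
            System.out.println(a);
        }
    }
}
```

Under those assumptions the surface form and the lemma both cover 0->2, while comp1 covers 0->1 and comp2 covers 1->2, so the two components chain up to the same span as the whole word.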
Now, if we have another analysis, we want to 'rewind' the position and deliver
another lemma and another set of components, but, of course, we can't do
that: a position increment can't be negative.
The best we could do is something like:
surface form
lemma1 PI 0
lemma2 PI 0
....
lemmaN PI 0
comp01 PI 0
comp11 PI 0
....
....
comp0N
compMN
That is, group all the first components, and all the second components.
But now the bits and pieces of the compounds are interspersed. Maybe that's
OK.
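
Here is a rough sketch of what 'interspersed' could mean for a consumer that reads the stream as a graph. Again this is plain Java, not Lucene: Arc, resolve, paths and the token names a1.c1 etc. (analysis 1, component 1, and so on) are all hypothetical, and I am assuming two analyses with two components each and a PL of 1 on every component. It lays the tokens out in the grouped order and then lists every chain of components that leads from the start of the word to its end.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupedAnalysesSketch {
    /** Hypothetical resolved token: term text with absolute start and end positions. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Accumulate position increments; each row is {term, PI, PL}. */
    static List<Arc> resolve(String[][] stream) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // start before the first position, so a PI of 1 lands on 0
        for (String[] t : stream) {
            pos += Integer.parseInt(t[1]);
            arcs.add(new Arc(t[0], pos, pos + Integer.parseInt(t[2])));
        }
        return arcs;
    }

    /** Collect every chain of component arcs leading from node 'at' to node 'end'. */
    static void paths(List<Arc> arcs, int at, int end, String prefix, List<String> out) {
        if (at == end) {
            out.add(prefix.trim());
            return;
        }
        for (Arc a : arcs) {
            // only follow component tokens, which are named "aX.cY" in this sketch
            if (a.from == at && a.term.contains(".c")) {
                paths(arcs, a.to, end, prefix + " " + a.term, out);
            }
        }
    }

    /** Grouped layout: lemmas first, then all first components, then all second components. */
    static List<String> componentPaths() {
        String[][] stream = {
            {"surface", "1", "2"},
            {"lemma1", "0", "2"},
            {"lemma2", "0", "2"},
            {"a1.c1", "0", "1"},
            {"a2.c1", "0", "1"},
            {"a1.c2", "1", "1"},
            {"a2.c2", "0", "1"},
        };
        List<String> out = new ArrayList<>();
        paths(resolve(stream), 0, 2, "", out);
        return out;
    }

    public static void main(String[] args) {
        for (String p : componentPaths()) {
            System.out.println(p);
        }
    }
}
```

With those assumptions the graph has four component paths, including the mixed ones 'a1.c1 a2.c2' and 'a2.c1 a1.c2', even though only two analyses went in. Whether that matters depends on what the consumer does with the positions.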
On Fri, Oct 24, 2014 at 5:44 PM, Will Martin <wmartinusa@gmail.com> wrote:
> Hi Benson:
>
> This is the case with ngramming (though you have a more complicated start
> chooser than most I imagine). Does that help get your ideas unblocked?
>
> Will
>
> -----Original Message-----
> From: Benson Margulies [mailto:bimargulies@gmail.com]
> Sent: Friday, October 24, 2014 4:43 PM
> To: javauser@lucene.apache.org
> Subject: A really hairy token graph case
>
> Consider a case where we have a token which can be subdivided in several
> ways. This can happen in German. We'd like to represent this with
> positionIncrement/positionLength, but it does not seem possible.
>
> Once the position has moved out from one set of 'subtokens', we see no way
> to move it back for the second set of alternatives.
>
> Is this something that was considered?
>
> 
> To unsubscribe, email: javauserunsubscribe@lucene.apache.org
> For additional commands, email: javauserhelp@lucene.apache.org
>
