lucene-solr-user mailing list archives

From Arturas Mazeika <maze...@gmail.com>
Subject Re: some parent documents
Date Tue, 03 Apr 2018 15:21:35 GMT
Hi Mikhail,

Thanks a lot for the reply.

You mentioned that

q=+{!parent which.. v='+text:hello +person:A'} +{!parent which.. v='+text:ciao +person:B'}

is the way to go. What would it look like precisely for the following
collection?

{
    "id":1,
    "_childDocuments_":
    [
        {"id":"1_1", "person":"Vai"         , "time":"3:14", "msg":"Hello"},
        {"id":"1_2", "person":"Arturas"     , "time":"3:14", "msg":"Hello"},
        {"id":"1_3", "person":"Vai"         , "time":"3:15", "msg":"Coz Mathias is working on another system- different screen."},
        {"id":"1_4", "person":"Vai"         , "time":"3:15", "msg":"It can get annoying"},
        {"id":"1_5", "person":"Arturas"     , "time":"3:15", "msg":"Thank you. this is very nice of you"},
        {"id":"1_6", "person":"Vai"         , "time":"3:16", "msg":"ciao"},
        {"id":"1_7", "person":"Arturas"     , "time":"3:16", "msg":"ciao"}
    ]
},
{
    "id":2,
    "_childDocuments_":
    [
        {"id":"2_1", "person":"Vai"         , "time":"4:14", "msg":"Hello"},
        {"id":"2_2", "person":"Arturas"     , "time":"4:14", "msg":"IBM Watson"},
        {"id":"2_3", "person":"Vai"         , "time":"4:15", "msg":"need to retain content"},
        {"id":"2_4", "person":"Vai"         , "time":"4:15", "msg":"It can get annoying"},
        {"id":"2_5", "person":"Arturas"     , "time":"4:15", "msg":"You can make all your meetings more access"},
        {"id":"2_6", "person":"Vai"         , "time":"4:16", "msg":"Make every meeting a Skype meeting"},
        {"id":"2_7", "person":"Arturas"     , "time":"4:16", "msg":"ciao"}
    ]
}
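
To make the question concrete, here is a minimal Python sketch of what I understand the two-clause query to mean for this collection: a parent is returned only if one child matches the first clause and some (possibly different) child matches the second. It is an in-memory simulation, not Solr itself. The helper names are made up for illustration, the field names msg and person come from the data above (not the text and person:A/B placeholders in the quoted query), and the parent filter type_s:parent is purely hypothetical: the documents as posted carry no field that marks parents, so one would have to be added at index time. The generated query string has not been run against a real Solr instance.

```python
import re

# The collection from above, reduced to the fields used in the query.
COLLECTION = [
    {"id": 1, "_childDocuments_": [
        {"id": "1_1", "person": "Vai", "msg": "Hello"},
        {"id": "1_2", "person": "Arturas", "msg": "Hello"},
        {"id": "1_3", "person": "Vai", "msg": "Coz Mathias is working on another system- different screen."},
        {"id": "1_4", "person": "Vai", "msg": "It can get annoying"},
        {"id": "1_5", "person": "Arturas", "msg": "Thank you. this is very nice of you"},
        {"id": "1_6", "person": "Vai", "msg": "ciao"},
        {"id": "1_7", "person": "Arturas", "msg": "ciao"},
    ]},
    {"id": 2, "_childDocuments_": [
        {"id": "2_1", "person": "Vai", "msg": "Hello"},
        {"id": "2_2", "person": "Arturas", "msg": "IBM Watson"},
        {"id": "2_3", "person": "Vai", "msg": "need to retain content"},
        {"id": "2_4", "person": "Vai", "msg": "It can get annoying"},
        {"id": "2_5", "person": "Arturas", "msg": "You can make all your meetings more access"},
        {"id": "2_6", "person": "Vai", "msg": "Make every meeting a Skype meeting"},
        {"id": "2_7", "person": "Arturas", "msg": "ciao"},
    ]},
]


def child_matches(child, person, word):
    """True if this child was said by `person` and contains `word` (case-insensitive)."""
    tokens = re.findall(r"\w+", child["msg"].lower())
    return child["person"] == person and word.lower() in tokens


def parents_with(collection, person, word):
    """Ids of parents that have at least one child matching (person, word)."""
    return {
        parent["id"]
        for parent in collection
        if any(child_matches(c, person, word) for c in parent["_childDocuments_"])
    }


def parents_with_both(collection, first, second):
    """Intersect the two parent sets, mirroring the two mandatory (+) clauses."""
    return sorted(parents_with(collection, *first) & parents_with(collection, *second))


def build_query(parent_filter, first, second):
    """Assemble the q parameter; parent_filter is a hypothetical 'all parents' filter."""
    def clause(person, word):
        return f"+{{!parent which='{parent_filter}' v='+msg:{word} +person:{person}'}}"
    return " ".join(clause(*pair) for pair in (first, second))


if __name__ == "__main__":
    first, second = ("Arturas", "hello"), ("Vai", "ciao")
    print(parents_with_both(COLLECTION, first, second))
    print(build_query("type_s:parent", first, second))
```

With "hello" said by Arturas and "ciao" said by Vai, only chat 1 should qualify, since Arturas never says "hello" in chat 2. Swapping the two persons should return both chats.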

Cheers,
Arturas


On Tue, Apr 3, 2018 at 4:33 PM, Mikhail Khludnev <mkhl@apache.org> wrote:

> Hello, Arturas.
>
> TL;DR: please find my answers inline below.
>
> On Tue, Apr 3, 2018 at 5:14 PM, Arturas Mazeika <mazeika@gmail.com> wrote:
>
> > Hi Solr Fans,
> >
> > I am trying to make sense of information retrieval using expressions like
> > "some parent", "*only parent*", " *all parent*". I am also trying to
> > understand the syntax "!parent which" and "!child of". On the technical
> > level, I am reading the following documents:
> >
> > [1]
> > https://lucene.apache.org/solr/guide/7_2/other-parsers.html#block-join-query-parsers
> > [2]
> > https://lucene.apache.org/solr/guide/7_2/uploading-data-with-index-handlers.html#nested-child-documents
> > [3] http://yonik.com/solr-nested-objects/
> >
> > and I am confused to read:
> >
> > This parser takes a query that matches some parent documents and returns
> > their children. The syntax for this parser is:
> > q={!child of=<allParents>}<someParents>. The parameter allParents is a
> > filter that matches *only parent documents*; here you would define the
> > field and value that you used to identify *all parent documents*. The
> > parameter someParents identifies a query that will match some of the
> > parent documents. The output is the children.
> >
> > The first sentence talks about "matching" but does not define what that
> > means (and why do only some parents match?). The second sentence
> > introduces the syntax of the parser, but blurs the understanding, as "some"
> > and "all" parents are combined into one sentence. My understanding is
> > that all documents that satisfy a query are retrieved. The query must
> > express some constraints on the parent node and some on the child node. I
> > have a feeling that "only parent documents" reads "the criteria are
> > formulated over the parent part of the {parent document}->{child document}
> > entity". My simplified conceptual world of solr looks as follows:
> >
> > 1. Every document has an ID.
> > 2. Every document may have additional attributes.
> > 3. Text attributes are what is at stake in solr. Sure, we can search for
> > products that cost at most X, but this is added functionality. For
> > simplicity I am neglecting those here.
> > 4. The user has an information need. She expresses it with (key)words and
> > hopes to find matching documents. For simplicity, I am skipping all issues
> > related to the presentation of the documents.
> > 5. The analysis chain (and the inverted index) are the key technologies
> > solr is based upon. Once the chain processing is applied, mathematical
> > logic kicks in, retrieving the documents (that are sets of processed,
> > normalized, enriched tokens) matching the query (processed, normalized and
> > enriched tokens). Clearly, the logic function can be a fancy one (at least
> > one of the query tokens is in the document's set of tokens, etc.); ranking
> > is used to sort the results.
> > 6. A nested document concept is introduced in solr. It needs to be uploaded
> > into the index structure using specific handlers [2]. A nested document
> > is a tree. A root may contain child documents, which may be parents of
> > grandchild documents.
> > 7. Querying nested documents is supported in the following manner:
> >     7.1 Child documents are returned that satisfy {parent
> > document}->{document}
> >     7.2 Parent documents are returned that satisfy {document}->{child
> > document}
> >
> > Would I be very wrong to have this conceptual picture?
> >
> > From this point, the situation is a bit blurry in my head. At the core, I
> > do not really understand what "a document" is anymore (since the complete
> > json or xml is a document, and so are a sub-json and a sub-xml; every
> > document must have an ID, so does that mean that the subdocuments must
> > have an ID too, or are sub-ids also fine?), how to formulate mathematical
> > expressions over documents, and what it means that a document satisfies my
> > (key)word query. Can we define a document to be the largest entity of
> > information that does not contain any other nested documents [4]? If this
> > is defined and communicated like this already, where can I find it? Such a
> > clarification would be useful, as the concept of the document means
> > different things in different contexts (e.g., you can update only the
> > "complete document" in the index vs. parent document, etc.).
> >
> > Is it possible to formulate what's going on using mathematical logic? Can
> > one express something like
> >
> > { give documents d : d is a document, d is parent of document c, d
> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> > C1',...,CM'}
> > { give documents c : c is a document, d is parent of document c, d
> > satisfies logical criteria C1,....,CN, c satisfies logical criteria
> > C1',...,CM'}
> >
> > here the meaning of document is as in definition [4] above.
> >
> > 1. Is it possible to retrieve all parent documents that have two children
> > c1 and c2? Consider a document that is a skype chat, and children are
> > individual lines of communication in the chat. I would be looking for the
> > (parent) documents that have "hello" said by person A and "ciao" said by
> > person B (as two different sub-documents).
> >
>
> q=+{!parent which.. v='+text:hello +person:A'} +{!parent which.. v='+text:ciao +person:B'}
> The query syntax is really tricky and cumbersome.
>
>
> >
> > 2. Is it possible to search for documents such that they have a
> > grandchild and the grandchild has the word "hello"?
> >
>
> http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
>
>
> >
> > 3. Is it possible to search for documents that do not have children?
> >
> q=-{!parent which..}type:child
> Beware that mixing parents and childfree products is not supported and
> causes pain. As a workaround, you need to add an empty placeholder child
> doc. Sic. Sorry.
>
>
> > Is this the right venue to discuss documentation of solr?
> >
> > Thanks!
> > Arturas
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
