lucene-solr-user mailing list archives

From Bram Biesbrouck <bram.biesbro...@reinvention.be>
Subject Question regarding negated block join queries
Date Mon, 17 Jun 2019 10:46:16 GMT
Dear all,

I'm new to this list, so let me introduce myself. I'm Bram, author of a
linked data framework called Stralo. We're working toward version 1.0, in
which we're integrating Solr indexing and querying of RDF triples (
https://github.com/republic-of-reinvention/com.stralo.framework/milestone/3)

I'm running into inconsistent results with block join queries and I
wondered if any of you could help me out. We're indexing our parent-child
relationships using a field called "parentUri". The field contains the URI
(the id of the document) of the parent document and is simply omitted when
the document itself is a parent.

Here's an example of a child document:

{
        "language":"en",
        "resource":"/resource/1130494009577889453",
        "parentUri":"/en/blah",
        "uri":"/resource/1130494009577889453",
        "label":"Label of the object",
        "description":"Example of some sub text",
        "typeOf":"ror:Page",
        "rdf:type":["ror:Page"],
        "rdfs:label":["Label of the object"],
        "ror:text":["Example of some sub text"],
        "ror:testNumber":[4],
        "ror:testDate":["2019-05-10T00:00:00Z"],
        "_version_":1636582287436939264
}

(Please ignore the CURIE syntax we're using as field names. We know it's
slightly illegal in Solr, but it works just fine and it makes our lives
indexing triples so much more convenient.)

Here's its parent document:

{
        "language":"en",
        "resource":"/resource/1106177060466942658",
        "uri":"/en/blah",
        "label":"rdfs label test 3",
        "description":"Hi, we are the Republic \n        we do video technology",
        "typeOf":"ror:BlogPost",
        "rdf:type":["ror:BlogPost"],
        "rdfs:label":["rdfs label test 3"],
        "meta:created":["2019-04-04T09:08:35.736Z"],
        "meta:creator":["/users/2"],
        "meta:modified":["2019-06-17T10:14:54.134Z"],
        "meta:contributor":["/users/2",
          "/users/1"],
        "ror:testEditor":["Blah, dit is inhoud van test editor"],
        "ror:testEnum":["af"],
        "ror:testDate":["2019-05-31T00:00:00Z"],
        "ror:testResource":["/resource/Page/800895161299715471"],
        "ror:testObject":["/resource/1130494009577889453"],
        "ror:text":["Hi, we are the Republic we do video technology"],
        "_version_":1636582287436939264
}

As said, we're struggling with block joins, because we don't have a clear
field that contains "this" for parent documents and "that" for child
documents. Instead, it's omitted for parent documents. So, to fire a block
join child query, we use this approach (just an example):

q={!parent which=-(parentUri:*)}*:*

What we expect is that the allParents filter selects all those documents
where the "parentUri" field doesn't exist using a negated wildcard query
(which works just fine when used alone). The someParents filter just
selects everything since this is an example. Alas, this doesn't yield any
results.
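
For anyone who wants to reproduce this, here is a minimal Python sketch that
builds the request URL for the query above, plus a variant whose parent
filter is anchored with an explicit *:* clause. The endpoint and the core
name "stralo" are assumptions, as are the helper names. The helper quotes the
"which" value so that spaces survive local-params parsing. The anchored
variant is only a guess at a workaround, based on the general behaviour that
a purely negative clause matches nothing unless something positive is there
to subtract from; it has not been checked against this index.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the host, port and core name "stralo" are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/stralo/select"


def block_join_parent_query(all_parents: str, some_children: str = "*:*") -> str:
    """Build a {!parent} block join query string for the q parameter."""
    # Quote the filter so that spaces inside it survive local-params parsing.
    escaped = all_parents.replace("\\", "\\\\").replace('"', '\\"')
    return '{!parent which="' + escaped + '"}' + some_children


def select_url(q: str) -> str:
    """URL-encode the query into a /select request URL."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json"})


# The query from this message: a purely negative parent filter.
original = block_join_parent_query("-(parentUri:*)")

# Variant with an explicit match-all anchor in front of the negation.
anchored = block_join_parent_query("*:* -parentUri:*")

print(select_url(original))
print(select_url(anchored))
```

Comparing the result counts of the two URLs would show whether the negation
inside "which" is the part that goes wrong.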

Since the docs say:
When subordinate clause (<someParents>) is omitted, it’s parsed as a
segmented and cached filter for children documents. More precisely,
q={!child of=<allParents>} is equivalent to q=*:* -<allParents>.

I tried to run this query (assuming a double negation becomes a plus):

*:* +(parentUri:*)

And this yields correct results, so I'm assuming it's possible, but I'm
overlooking something in my block join children query syntax.
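
The double-negation reasoning can be checked on a toy model. The sketch
below uses two trimmed-down, in-memory stand-ins for the documents shown
earlier and plain Python sets; it only models the set algebra and says
nothing about how Solr actually parses the filter.

```python
# Toy in-memory stand-ins for the indexed documents (fields trimmed).
docs = [
    {"uri": "/en/blah"},  # parent: no parentUri field
    {"uri": "/resource/1130494009577889453", "parentUri": "/en/blah"},  # child
]

all_docs = {d["uri"] for d in docs}
has_parent_uri = {d["uri"] for d in docs if "parentUri" in d}

# allParents: everything that lacks parentUri, i.e. *:* -parentUri:*
all_parents = all_docs - has_parent_uri

# The docs' equivalence: children = *:* -<allParents>
children = all_docs - all_parents

print(sorted(all_parents))
print(sorted(children))
```

In this model the children set is exactly the set of documents that have a
parentUri, which matches the result of the *:* +(parentUri:*) query.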

Could anyone point me in the right direction on using block join queries
with non-existent or existent fields?

all the best,

b.
