lucene-dev mailing list archives

From "Martin Grotzke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
Date Thu, 16 Jun 2011 14:13:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050435#comment-13050435 ]

Martin Grotzke commented on SOLR-2583:
--------------------------------------

bq. Are you sure real floats are actually needed?
In our case, score values are e.g. 158870000 (one example taken from one of the files).
With a value of this magnitude, the following test fails:
{noformat}
byte small = SmallFloat.floatToByte315(104626500f);
assertEquals(104626500f, SmallFloat.byte315ToFloat(small), 0f);
-> AssertionError: expected:<1.04626496E8> but was:<1.00663296E8>
{noformat}

This shows that even our own data contains values for which this produces wrong results. Even if
we could work around this in our case, someone else might run into the same issue.
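
To illustrate where the loss comes from: floatToByte315 keeps only 3 mantissa bits. The following is a minimal sketch that does not call Lucene at all. It emulates only the mantissa truncation with plain JDK calls, assuming truncation rather than rounding and ignoring the limited exponent range of the byte encoding. The class and method names are hypothetical.

```java
class MantissaTruncationSketch {
    // Keep the sign bit, the 8 exponent bits and the top 3 of the 23 mantissa bits
    // of an IEEE 754 float; clear the remaining 20 low mantissa bits.
    static float truncateTo3MantissaBits(float value) {
        int bits = Float.floatToRawIntBits(value);
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        float original = 104626500f; // sample value from the failing test above
        float truncated = truncateTo3MantissaBits(original);
        System.out.println(original + " -> " + truncated);
    }
}
```

For 104626500f the truncated value is 1.5 * 2^26 = 100663296, which is the 1.00663296E8 reported in the AssertionError. With 3 mantissa bits only 8 distinct values exist between consecutive powers of two, so scores of this magnitude cannot survive the round trip.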


bq. it would also good to measure performance...
I wouldn't expect boxing to make a real difference here, especially in relation to the
rest of the time spent during a search request.
A time-based performance comparison of real value would take some time: it would have
to be put in relation to the rest of a search request (how do you do this?), and the results
would require proper interpretation once everything is put together. Right now I don't think
it's worth the effort.


{quote}
bq. that uses a fixed size and an increasing number of puts
I'm not certain how realistic that is, remember behind the scenes compactbytearray uses blocks,
and if you touch every one (by putting every K docid or something) then you are just testing
the worst case.
{quote}
Do you want to change the test to something that's more realistic?


@Yonik: what do you say regarding the suggestion to use a HashMap up to ~5.5% and the
float[] above that?
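
A minimal sketch of what such a hybrid could look like, assuming the ~5.5% refers to the share of documents that have an entry in the external file. This is not the attached patch: the class, interface, method names and the exact threshold constant are hypothetical and only illustrate the selection logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ScoreStoreSketch {
    // Assumed cut-over point, taken from the "~5.5%" above; treat it as tunable.
    static final double MAP_FILL_THRESHOLD = 0.055;

    interface ScoreStore {
        float get(int doc);
    }

    // True if the sparse map representation should be used.
    static boolean useMap(int numEntries, int maxDoc) {
        return maxDoc == 0 || (double) numEntries / maxDoc <= MAP_FILL_THRESHOLD;
    }

    static ScoreStore create(Map<Integer, Float> scores, int maxDoc, float defaultScore) {
        if (useMap(scores.size(), maxDoc)) {
            // Sparse case: memory grows with the number of scored docs only.
            Map<Integer, Float> copy = new HashMap<>(scores);
            return doc -> copy.getOrDefault(doc, defaultScore);
        }
        // Dense case: one float per document, no boxing on lookup.
        float[] dense = new float[maxDoc];
        Arrays.fill(dense, defaultScore);
        for (Map.Entry<Integer, Float> e : scores.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return doc -> dense[doc];
    }
}
```

Callers would only see `ScoreStore.get(doc)`, so the choice of representation stays an internal detail that can be re-tuned without touching the lookup code.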

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -------------------------------------------------------------------------
>
>                 Key: SOLR-2583
>                 URL: https://issues.apache.org/jira/browse/SOLR-2583
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Martin Grotzke
>            Priority: Minor
>         Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring consumes a lot of memory, depending on the number of documents in the index.
> The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource
> is created per external scoring file. FileFloatSource creates a float array whose size is
> the number of docs (this is also done if the file to load is not found). If there are far
> fewer entries in the scoring file than there are docs in total, the big float array
> wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map contains as
> many entries as there are scoring entries in the external file, but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

