lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Multivalued field
Date Tue, 06 Dec 2011 19:06:34 GMT
<field name="id" type="string" stored="true" indexed="true" required="true" />
<field name="data" type="text_en" stored="true" indexed="false" />


Then sometime later
<uniqueKey>id</uniqueKey>

(all this in your schema.xml file).

That's it. The data field isn't indexed, so it's never analyzed and the type is
largely irrelevant. What you put in it is all your pairs of doubles in some
kind of delimited format, e.g. 2345.0,<timestamp> | 873945.7,<timestamp>
Now you just get your data field back, split it up and go.
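
A minimal sketch of the "split it up and go" step in Java, assuming the
timestamp is stored as epoch milliseconds and the pairs are joined with " | "
as in the example above. The names here (SeriesCodec, Point, encode, decode)
are made up for illustration and are not part of Solr or SolrJ:

```java
import java.util.ArrayList;
import java.util.List;

final class SeriesCodec {
    /** One sample; the timestamp is ASSUMED to be epoch milliseconds. */
    static final class Point {
        final double value;
        final long timestamp;

        Point(double value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Builds the string for the stored field: "value,ts | value,ts". */
    static String encode(List<Point> points) {
        StringBuilder sb = new StringBuilder();
        for (Point p : points) {
            if (sb.length() > 0) {
                sb.append(" | ");
            }
            sb.append(p.value).append(',').append(p.timestamp);
        }
        return sb.toString();
    }

    /** Splits the stored string back into points, in their original order. */
    static List<Point> decode(String data) {
        List<Point> points = new ArrayList<>();
        if (data == null || data.trim().isEmpty()) {
            return points;
        }
        for (String pair : data.split("\\|")) {
            String[] parts = pair.trim().split(",");
            points.add(new Point(Double.parseDouble(parts[0].trim()),
                                 Long.parseLong(parts[1].trim())));
        }
        return points;
    }
}
```

encode() builds the value you'd put in the data field at index time; decode()
is what the web app would run on the stored value it gets back.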

Getting the report document will be about as fast as anything you could
do in Solr: a lookup by what is essentially the primary key.

Updating your reports is just re-indexing (use the timestamp in your
DB) and it'll automatically replace documents with the same
id.

You *might* be able to use the "binary" type, but that's base64 encoded
so whether it would be faster than parsing your pairs from text
is an open question.
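
For a rough feel of the trade-off, here is a sketch that packs each pair as
16 raw bytes (an 8-byte double plus an 8-byte long) and base64-encodes the
result. It assumes epoch-millisecond timestamps; BinarySeries and its methods
are hypothetical helpers, and how the encoded value gets handed to a Solr
"binary" field is left out:

```java
import java.nio.ByteBuffer;
import java.util.Base64;

final class BinarySeries {
    /** Packs (double, long) pairs back to back and base64-encodes the bytes. */
    static String encode(double[] values, long[] timestamps) {
        if (values.length != timestamps.length) {
            throw new IllegalArgumentException("values and timestamps differ in length");
        }
        ByteBuffer buf = ByteBuffer.allocate(values.length * 16);
        for (int i = 0; i < values.length; i++) {
            buf.putDouble(values[i]);   // 8 bytes
            buf.putLong(timestamps[i]); // 8 bytes, ASSUMED epoch milliseconds
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    /** Returns a buffer to read with alternating getDouble()/getLong() calls. */
    static ByteBuffer decode(String base64) {
        return ByteBuffer.wrap(Base64.getDecoder().decode(base64));
    }
}
```

Base64 turns every 3 bytes into 4 characters, so each pair costs roughly 21 to
22 characters. Whether that beats the text format depends on how many digits
your values and timestamps print with, so measure with your own data.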

But what's really unclear is how ginormous your double/timestamp pairs
are. If you're pulling a billion pairs out, Solr performance won't be
your problem <G>....

Best
Erick


On Mon, Dec 5, 2011 at 2:24 PM, Alan Miller <alan.miller3@gmail.com> wrote:
>
> I know I'm using Solr for a task that is better suited for the DB to handle, but I'm
> doing this for reasons related to the overall design of my system. My DB is going to
> become very large over time and it is constantly being updated via Hadoop jobs that
> collect, analyze some data and generate the final (report) results.
>
> The front end web-app needs to be VERY fast and only needs access to a subset of the
> data.
> It also lets us decouple the state of the DB and the front end, i.e. we can control when
> we sync the data from the DB to the Solr indexes.
> You could say I'm using Solr as an in-memory cache of my DB indexes.
>
> We're also a small team and all our development is in Java, Hadoop, and GWT, so it was very
> easy for us to integrate Solr and SolrJ into our app.
>
> If somebody could toss in an example of what the schema might look like that'd be great.
> I have a very simple VALUE table that has columns:
>     value_pk INTEGER  ; primary-key
>     report_fk INT ; foreign-key to report table
>     tstamp TIMESTAMP
>     value NUMERIC(7,4)
>
> Alan
>
> On Dec 5, 2011, at 14:34, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> Well, Solr is a text search engine, and a good one. But this sure
>> feels like a problem that RDBMSs were built to handle. Why do
>> you want to do this? Is your current performance a problem?
>> Are you blowing your space resources out of the water? Do you
>> want to distribute your app to places not connected to your RDBMS?
>> Is there too much traffic on your RDBMS machine?
>>
>> Something about "if it ain't broke, don't fix it".
>>
>> In general, you have to tell us the problem you're trying to solve
>> so we don't go off into XY land.
>> http://people.apache.org/~hossman/#xyproblem
>>
>> Best
>> Erick
>>
>> On Fri, Dec 2, 2011 at 1:33 PM, Alan Miller <alan.miller3@gmail.com> wrote:
>>> Hi, I have a webapp that plots a bunch of time series
>>> data, which are just doubles coupled with a timestamp.
>>>
>>> Every chart in my webapp has a reportid in my DB, and I am wondering if it would
>>> be effective to use Solr to serve the data to my app instead of keeping the data in my RDBMS.
>>>
>>> Currently I'm using Hadoop to calc and generate the report data and then sticking
>>> it in my RDBMS, but I could use the SolrJ client to upload the data to a Solr index.
>>>
>>> I know Solr is for indexing text documents, but would it be effective to use Solr
>>> in this way?
>>>
>>> I want to query by reportid and get back a series of timestamp:double pairs.
>>>
>>> Regards
>>> Alan
