lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Command Line Indexer
Date Tue, 18 Sep 2018 21:16:00 GMT
Uhm, inline:

On 18 September 2018 at 17:05, Dan Brown <dan@likethecolor.com> wrote:
> 1. Thank you.
>
> 2. I think this is what you're looking for.  You'd be able to be more
> specific than with bin/post.  For instance:
> a. specify the CSV delimiter, CSV quote character, and multivalued field
> delimiter
http://lucene.apache.org/solr/guide/7_4/uploading-data-with-index-handlers.html
separator - the CSV delimiter (set globally, or per field with
f.<fieldname>.split=true and f.<fieldname>.separator for multivalued fields)
encapsulator - the CSV quote character
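
A minimal sketch of how those parameters could be passed to the CSV update
handler. The host, the collection name (my_collection), the field name (tags)
and the helper csv_update_url are made up for illustration; only the
parameter names come from the guide linked above. It builds the URL and does
not send anything:

```python
from urllib.parse import urlencode


def csv_update_url(base_url, collection, separator=",", encapsulator='"',
                   multivalued=None):
    """Build a URL for Solr's CSV update handler.

    `multivalued` maps a field name to the delimiter used inside that
    field, e.g. {"tags": "|"}. All names here are illustrative.
    """
    # Global CSV delimiter and quote character.
    params = [("separator", separator), ("encapsulator", encapsulator)]
    # Per-field settings: split the field into multiple values on its
    # own delimiter.
    for field, delimiter in (multivalued or {}).items():
        params.append((f"f.{field}.split", "true"))
        params.append((f"f.{field}.separator", delimiter))
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Assumed local Solr and a hypothetical collection and field.
url = csv_update_url(
    "http://localhost:8983/solr",
    "my_collection",
    separator="\t",
    encapsulator="'",
    multivalued={"tags": "|"},
)
print(url)
```

The CSV file itself would then be POSTed to that URL with a CSV content type
(for example with curl --data-binary and Content-type: application/csv).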

> b. the dynamic-fields feature lets you write plugins in Java to define
> values (very simple example: combine field values f_name, m_name, l_name to
> populate a full_name field)
UpdateRequestProcessors. Your example specifically:

> c. specify field order for mapping onto SOLR fields, data types, date
> formats of source data; perhaps your CSV headers/JSON keys don't cleanly
> map to SOLR field names
> d. flag whether the first row of a CSV is the header and should not be
> indexed
> e. use literal values - e.g., instead of having to alter the source data to
> have a column whose value is "foo" you can configure a field to always have
> the same literal value for all documents
> f. set the number of times to retry when there is an error and the amount
> of time between retries (e.g., sometimes zk was not consistently responsive)
> g. skip fields - e.g., your data have 10 columns but you only want to index
> columns 1, 3, 5, and 9
> h. send soft commits after a specified number of batches
> i. combine fields to generate the uniqueKey value
>
> 3. Yes, atomic updates.  For instance, index data using DIH then use this
> index to provide additional values to fields in those documents (e.g.,
> maybe the extra data come from a different data source like BigQuery).
>
> I hope this brings more clarity to this tool's features and answers all
> your questions.  Please ask questions if anyone has more.
>
> Dan
>
>
> On Tue, Sep 18, 2018 at 3:21 PM Christopher Schultz <
> chris@christopherschultz.net> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Dan,
>>
>> On 9/18/18 2:51 PM, Dan Brown wrote:
>> > I've been working on this for a while and it's finally in a state
>> > where it's ready for public consumption.
>> >
>> > This is a command line indexer that will index CSV or JSON
>> > documents: https://github.com/likethecolor/solr-indexer
>> >
>> > There are quite a few parameters/options that can be set.
>> >
>> > One thing to note is that it will update individual fields.  That
>> > is, unlike the Data Import Handler, it does not replace entire
>> > documents.
>> >
>> > Please check it out and let me know what you think.
>>
>> How is this different from the bin/post tool that ships with Solr?
>>
>> Or is that what you meant when you said "this is unlike the Data Import
>> Handler"?
>>
>> AIUI, Solr doesn't support updating a single field in a document. The
>> document is replaced no matter how hard you try to be surgical about
>> updating a single field.
>>
>> - -chris
>> -----BEGIN PGP SIGNATURE-----
>> Comment: GPGTools - http://gpgtools.org
>> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>>
>> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAluhXlYACgkQHPApP6U8
>> pFjIeQ/+PRIx+I+IDW9XTqGNV5TIWYf+yQKC/4JpTV4Ndj7MZLsEEw+cfMvFTvQt
>> 44dK7CnDKEDgQHZlMccWKd9/Th1k/5g40VMugBMsayRwUc83Onawdi4HQfnig4et
>> VN0/RaZ/IBo2AThsgEvUNplXYyY3BtyrUt6miiBsVkhKstI/BnmKqZvsRgvVjH0P
>> K1Xc5F2LNyXswvoIZqd3YmEa9p7CYMy7COsFV9KOeSymKlB7UoHulZqpJ9MRYkmn
>> YWjc9dHIRjpz5TUrJqWhZUG03uGXGtTnaXEku1Hb98WyIUZcHxkwN8W7qm6/B0CG
>> inPxfGRFH9EbUdcK4qeXmbQqty2sbKMQ6hogpRd/NEzgSWjDapiEUT1xz+p5V6wG
>> XM0ILaiLJ8zHJA6oUY0w5SNNyhdnd76CDpCK7T7YBm+aIxUDv9zoj6TLNceEaLi0
>> SjfI83LvaR1gM/ZeVO77d+1IY9maU1+5m0EZFjAETfMGj5dwYRvBub0Oo6QQuLUm
>> roF5R5b/bg/WjjPF1n4CJ7gTr/WBMzahKFnnQvoYD3OQqZpoasoEUifPpSd9OgvO
>> yEok0VqwxPeXdHgE+Vy+BlXn6QqshB3BYnUSNbpFXlNsOIQojfJXkjcCa+dP1nyF
>> JCElvmEgBG8K1WzGo4WAtVqJs7WDzQlmY2RDrETGsVbnqkTojXA=
>> =AmkJ
>> -----END PGP SIGNATURE-----
>>

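On point 3 (atomic updates) quoted above: a rough sketch of what an atomic
update request body looks like in Solr's JSON update format, where a field
value is replaced by a modifier such as "set", "add" or "inc". The document
id, the field names and the helper atomic_update are hypothetical. As far as
I know, Solr still rewrites the whole document internally when it applies
such an update, so both statements in the quoted thread hold in part: the
client sends only the changed fields, but the index sees a full replacement.

```python
import json

# Modifiers accepted here; a subset of what Solr's atomic updates support.
ALLOWED_OPS = {"set", "add", "remove", "removeregex", "inc"}


def atomic_update(unique_key, doc_id, changes):
    """Build one atomic-update document.

    `changes` maps a field name to an (operation, value) pair, e.g.
    {"price": ("set", 9.99)}. Field names are illustrative only.
    """
    doc = {unique_key: doc_id}
    for field, (op, value) in changes.items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported atomic update operation: {op}")
        # Each changed field is sent as {operation: value} instead of a
        # plain value.
        doc[field] = {op: value}
    return doc


# Hypothetical document and fields.
doc = atomic_update("id", "book-1", {"price": ("set", 9.99),
                                     "tags": ("add", "sale")})
payload = json.dumps([doc])
print(payload)
```

The payload would be POSTed to the collection's /update handler with a JSON
content type. Fields not mentioned in the request keep their values only if
Solr can recover them (stored or docValues fields).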