lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: MySQL data import
Date Mon, 12 Dec 2011 18:34:42 GMT
You might want to consider just doing the whole
thing in SolrJ with a JDBC connection. When things
get complex, it's sometimes more straightforward.
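
As a rough illustration of why the SolrJ route can be simpler: one SQL JOIN returns flat rows, and your own code folds them into one document per id, so no separate entity (or connection) is needed per multivalued field. The sketch below is stdlib-only and hypothetical: the class RowGrouper, its Row type and the column names are made up for the example, and a real client would read the rows from a java.sql.ResultSet instead of a hard-coded list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of SolrJ): collapses flat (id, field, value)
 * rows, as returned by a single SQL JOIN, into one document per id whose
 * fields can hold several values.
 */
public class RowGrouper {

    /** One flat row; in a real client these would be read from a java.sql.ResultSet. */
    public static final class Row {
        final String id;
        final String field;
        final String value;

        public Row(String id, String field, String value) {
            this.id = id;
            this.field = field;
            this.value = value;
        }
    }

    /** Groups rows by id; each field name maps to the list of its values, in row order. */
    public static Map<String, Map<String, List<String>>> group(List<Row> rows) {
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (Row r : rows) {
            Map<String, List<String>> doc = docs.computeIfAbsent(r.id, k -> new LinkedHashMap<>());
            if (r.value == null) {
                continue; // SQL NULL, e.g. a LEFT JOIN with no matching child row
            }
            doc.computeIfAbsent(r.field, k -> new ArrayList<>()).add(r.value);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("1", "title", "First article"),
                new Row("1", "keyword", "solr"),
                new Row("1", "keyword", "mysql"),
                new Row("2", "title", "Second article"),
                new Row("2", "keyword", null));
        // Prints {1={title=[First article], keyword=[solr, mysql]}, 2={title=[Second article]}}
        System.out.println(group(rows));
    }
}
```

Each inner map would then be copied into a SolrJ SolrInputDocument by calling addField(name, value) once per value; repeated calls with the same field name accumulate into a multivalued field, which is then sent to Solr through the SolrJ client.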

Best
Erick...

P.S. Yes, it's pretty standard to have a single
field be the destination for several copyField
directives.

On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty <gora@mimirtech.com> wrote:
> On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb
> <brian.lamb@journalexperts.com> wrote:
>> Hi all,
>>
>> I have a few questions about how the MySQL data import works. It seems it
>> creates a separate connection for each entity I create. Is there any way to
>> avoid this?
>
> Not sure, but I do not think that it is possible. However, from your description
> below, I think that you are unnecessarily multiplying entities.
>
>> By nature of my schema, I have several multivalued fields. Each one I
>> populate with a separate entity. Is there a better way to do it? For
>> example, could I pull in all the singular data in one sitting and then come
>> back later and populate the multivalued items?
>
> Not quite sure as to what you mean. Would it be possible for you
> to post your schema.xml, and the DIH configuration file? Preferably,
> put these on pastebin.com, and send us links. Also, you should
> obfuscate details like access passwords.
>
>> An alternate approach in some cases would be to do a GROUP_CONCAT and then
>> populate the multivalued column with some transformation. Is that possible?
> [...]
>
> This is how we have been handling it. A complete description would
> be long, but here is the gist of it:
> * A transformer will be needed. In this case, we found it easiest
>  to use a Java-based transformer. Thus, your entity should include
>  something like
>  <entity name="myname" dataSource="mysource"
> transformer="com.mycompany.search.solr.handler.JobsNumericTransformer" ...>
>  ...
>  </entity>
>  Here, the transformer attribute takes the fully qualified Java class
>  name, and the .jar containing that class needs to be made available to Solr.
> * The SELECT statement for the entity looks something like
>  select group_concat( myfield SEPARATOR '@||@')...
>  The separator should be something that does not occur in your
>  normal data stream.
> * Within the entity, define
>   <field column="myfield"/>
> * There are complications if NULL values are allowed
>   for the field, in which case you would need to use COALESCE,
>   maybe along with CAST.
> * The transformer would look up "myfield", split along the separator,
>   and populate the multi-valued field.
>
> This *is* a little complicated, so I would also like to hear about
> possible alternatives.
>
> Regards,
> Gora
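
The split step in Gora's last bullet is where the separator choice matters. Below is a hypothetical, JDK-only sketch of it: the class GroupConcatSplitter and the method splitColumn are made-up names, and "myfield" and "@||@" come from the example above. In an actual DIH setup the same logic would go into transformRow(Map<String, Object> row, Context context) of a class extending org.apache.solr.handler.dataimport.Transformer; that dependency is left out here so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical, stdlib-only sketch of the split step. In a real DIH setup the
 * same logic would live in transformRow(Map<String, Object>, Context) of a
 * class extending org.apache.solr.handler.dataimport.Transformer.
 */
public class GroupConcatSplitter {

    // Must match the SEPARATOR used in the GROUP_CONCAT query.
    static final String SEPARATOR = "@||@";

    // Pattern.compile() treats '|' as regex alternation,
    // so the separator is quoted to be matched literally.
    private static final Pattern SPLIT = Pattern.compile(Pattern.quote(SEPARATOR));

    /** Replaces the concatenated value of the given column with a List of its parts. */
    public static Map<String, Object> splitColumn(Map<String, Object> row, String column) {
        Object raw = row.get(column);
        // Assumption: the JDBC driver returns the GROUP_CONCAT result as a String.
        if (raw == null || raw.toString().isEmpty()) {
            // NULL (or COALESCE'd to '') means no values: drop the column entirely.
            row.remove(column);
            return row;
        }
        List<String> values = new ArrayList<>(Arrays.asList(SPLIT.split(raw.toString())));
        row.put(column, values);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("myfield", "red@||@green@||@blue");
        // Prints [red, green, blue]
        System.out.println(splitColumn(row, "myfield").get("myfield"));
    }
}
```

Writing a List back into the row is how DIH's own splitting transformers hand several values to a multiValued field. One MySQL detail worth checking with this approach: GROUP_CONCAT output is silently truncated at group_concat_max_len, which defaults to 1024 bytes.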
