lucene-solr-user mailing list archives

From Pierre Caserta <pierre.case...@gmail.com>
Subject Re: DataImportHandler with a managed-schema only import id and version
Date Wed, 10 Aug 2016 08:28:26 GMT
It did not work.
I tried many things and ended up with this:

  <requestHandler name="/dataimport" initParams="myInitParams" class="solr.DataImportHandler">
      <lst name="defaults">
        <str name="config">solr-data-config.xml</str>
      </lst>
  </requestHandler>
  <initParams name="myInitParams" path="/update/**,/dataimport">
    <lst name="defaults">
      <str name="update.chain">add-unknown-fields-to-the-schema</str>
    </lst>
  </initParams>
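
For comparison, a minimal and untested sketch of the more direct variant: set update.chain in the handler's own defaults instead of going through initParams. It assumes that DIH honors the update.chain parameter and that the add-unknown-fields-to-the-schema chain is defined in the same solrconfig.xml, as it is in the data_driven_schema_configs example.

```xml
<!-- Sketch only: assumes DIH picks up update.chain from its defaults -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">solr-data-config.xml</str>
    <!-- Chain name taken from the initParams section above -->
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</requestHandler>
```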

Regards,
Pierre

> On 10 Aug 2016, at 18:08, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
> 
> Your initParams section does not apply to /dataimport handler as
> defined. Try modifying it to say:
> path="/update/**,/dataimport"
> 
> Hopefully, that's all it takes.
> 
> Managed schema is enabled by default, but schemaless mode is the next
> layer on top. With managed schema, you can use the API to add your
> fields (or the Schema screen in the new Admin UI). With schemaless mode,
> Solr tries to guess the field type as it adds the field automatically.
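
As a rough illustration of the non-schemaless route, these are the kind of explicit field declarations that would end up in managed-schema for a few of the columns in the data-config.xml quoted below, whether they are added through the Schema API or the Admin UI. The field type names (int, tdate, text_general, string) are assumptions based on the data_driven_schema_configs example and should be checked against the actual schema.

```xml
<!-- Sketch only: field types are assumed, verify them against your managed-schema -->
<field name="postTypeId"   type="int"          indexed="true" stored="true" />
<field name="creationDate" type="tdate"        indexed="true" stored="true" />
<field name="title"        type="text_general" indexed="true" stored="true" />
<!-- tags is produced by splitBy, so it holds several values per document -->
<field name="tags"         type="string"       indexed="true" stored="true" multiValued="true" />
```
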
> 
> 
> Regards,
>    Alex.
> 
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
> On 10 August 2016 at 18:04, Pierre Caserta <pierre.caserta@gmail.com> wrote:
>> Hi Alex,
>> thanks for your answer.
>> 
>> Yes, my solrconfig.xml contains add-unknown-fields-to-the-schema:
>> 
>>  <initParams path="/update/**">
>>    <lst name="defaults">
>>      <str name="update.chain">add-unknown-fields-to-the-schema</str>
>>    </lst>
>>  </initParams>
>> 
>> I created my core using this command:
>> 
>> curl 'http://192.168.99.100:8999/solr/admin/cores?action=CREATE&name=solrexchange&instanceDir=/opt/solr/server/solr/solrexchange&configSet=data_driven_schema_configs_custom'
>> 
>> I am using the example configset data_driven_schema_configs and I simply added:
>> 
>>  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
>>  <requestHandler name="/dataimport" class="solr.DataImportHandler">
>>      <lst name="defaults">
>>        <str name="config">data-config.xml</str>
>>      </lst>
>>  </requestHandler>
>> 
>> I thought schemaless mode was enabled by default, but I also tried adding this
>> config and got the same result.
>> 
>>  <schemaFactory class="ManagedIndexSchemaFactory">
>>    <bool name="mutable">true</bool>
>>    <str name="managedSchemaResourceName">managed-schema</str>
>>  </schemaFactory>
>> 
>> How can I copy the schemaless URP chain and add the parameter so that the DIH handler calls it?
>> 
>> 
>>> On 10 Aug 2016, at 17:43, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>>> 
>>> Do you have the actual fields defined? If not, then I am guessing that
>>> your 'post' test was against a different collection that had
>>> schemaless mode enabled and your DIH one is against one where
>>> schemaless mode is not enabled (look for
>>> 'add-unknown-fields-to-the-schema' in the solrconfig.xml to confirm).
>>> Solr examples for DIH do not have schemaless mode enabled.
>>> 
>>> I _believe_ you can copy the schemaless URP chain and add the
>>> parameter that invokes it to the DIH handler, and it _should_ work.
>>> But I am not betting on it without testing it, as DIH also has some
>>> magic code to ignore fields not defined in the schema, because it is
>>> designed to extract only the relevant fields from the database even
>>> with a 'select *' statement.
>>> 
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> 
>>> On 10 August 2016 at 17:12, Pierre Caserta <pierre.caserta@gmail.com> wrote:
>>>> Hi,
>>>> It seems that using the DataImportHandler with an XPathEntityProcessor config
>>>> and a managed-schema setup only imports the id and _version_ fields.
>>>> 
>>>> data-config.xml
>>>> 
>>>> <dataConfig>
>>>>   <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>   <document>
>>>>       <entity name="post"
>>>>           processor="XPathEntityProcessor"
>>>>           stream="true"
>>>>           forEach="/posts/row/"
>>>>           url="${dataimporter.request.dataurl}"
>>>>           transformer="RegexTransformer,DateFormatTransformer,HTMLStripTransformer">
>>>>           <field column="id" xpath="/posts/row/@Id" />
>>>>           <field column="postTypeId" xpath="/posts/row/@PostTypeId" />
>>>>           <field column="acceptedAnswerId" xpath="/posts/row/@AcceptedAnswerId" />
>>>>           <field column="creationDate" xpath="/posts/row/@CreationDate"
>>>>               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>           <field column="postScore" xpath="/posts/row/@Score" />
>>>>           <field column="viewCount" xpath="/posts/row/@ViewCount" />
>>>>           <field column="body" xpath="/posts/row/@Body" stripHTML="true" />
>>>>           <field column="ownerUserId" xpath="/posts/row/@OwnerUserId" />
>>>>           <field column="lastEditorUserId" xpath="/posts/row/@LastEditorUserId" />
>>>>           <field column="lastEditorDisplayName" xpath="/posts/row/@LastEditorDisplayName" />
>>>>           <field column="lastActivityDate" xpath="/posts/row/@LastActivityDate"
>>>>               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>           <field column="title" xpath="/posts/row/@Title" />
>>>>           <field column="trimmedTags" xpath="/posts/row/@Tags" regex="&lt;(.*)&gt;" />
>>>>           <field column="tags" sourceColName="trimmedTags" splitBy="&gt;&lt;" />
>>>>           <field column="answerCount" xpath="/posts/row/@AnswerCount" />
>>>>           <field column="commentCount" xpath="/posts/row/@CommentCount" />
>>>>           <field column="favoriteCount" xpath="/posts/row/@FavoriteCount" />
>>>>           <field column="communityOwnedDate" xpath="/posts/row/@CommunityOwnedDate"
>>>>               dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss.SSS" />
>>>>       </entity>
>>>>   </document>
>>>> </dataConfig>
>>>> 
>>>> 
>>>> http://192.168.99.100:8999/solr/solrexchange/select?indent=on&q=*:*&wt=json
>>>> {
>>>> "responseHeader":{
>>>>   "status":0,
>>>>   "QTime":0,
>>>>   "params":{
>>>>     "q":"*:*",
>>>>     "indent":"on",
>>>>     "wt":"json",
>>>>     "_":"1470811193595"}},
>>>> "response":{"numFound":8,"start":0,"docs":[
>>>>     {
>>>>       "id":"38822",
>>>>       "_version_":1542258196375142400},
>>>>     {
>>>>       "id":"38836",
>>>>       "_version_":1542258196387725312},
>>>>     {
>>>>       "id":"63896",
>>>>       "_version_":1542258196388773888},
>>>>     {
>>>>       "id":"65406",
>>>>       "_version_":1542258196391919616},
>>>>     {
>>>>       "id":"1357173",
>>>>       "_version_":1542258196391919617},
>>>>     {
>>>>       "id":"5339763",
>>>>       "_version_":1542258196392968192},
>>>>     {
>>>>       "id":"9932722",
>>>>       "_version_":1542258196392968193},
>>>>     {
>>>>       "id":"9217299",
>>>>       "_version_":1542258196392968194}]
>>>> }}
>>>> 
>>>> data_search.xml (8 rows)
>>>> 
>>>> 
>>>> 
>>>> The URL I am hitting (with a custom dataurl parameter):
>>>> 
>>>> curl
>>>> 'http://192.168.99.100:8999/solr/solrexchange/dataimport?command=full-import&commit=true&dataurl=/code/solr/data/search/dih/data_search.xml'
>>>> 
>>>> I changed my data to use <add> <doc> <field> and used the bin/post tool, and
>>>> this works as expected.
>>>> Now I would like to make it work with the DataImportHandler.
>>>> How can I use the DataImportHandler to import my documents?
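
For reference, the <add> <doc> <field> layout mentioned above is Solr's XML update format. A minimal sketch with one document follows; the id is taken from the query response above, while the title and tag values are placeholders, not the real data.

```xml
<!-- Solr XML update format; title and tags values are placeholders -->
<add>
  <doc>
    <field name="id">38822</field>
    <field name="title">Placeholder title</field>
    <!-- Repeating a field name sends several values for that field -->
    <field name="tags">first-tag</field>
    <field name="tags">second-tag</field>
  </doc>
</add>
```
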
>>>> 
>>>> Thanks,
>>>> Pierre Caserta
>>>> 
>>>> 
>> 

