lucene-solr-user mailing list archives

From ahammad <>
Subject Re: Issue with delta import (not finding data in a column)
Date Wed, 12 May 2010 13:46:25 GMT


I was doing some more testing but I could not find a definitive reason for
this behavior. The following is my transformer:

	public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
	    List<Map<String, String>> fields = context.getAllEntityFields();
	    for (Map<String, String> field : fields) {
	        // Check if this field has blob="true" specified in the data-config
	        String blob = field.get("blob");
	        if ("true".equals(blob)) {
	            String columnName = field.get("column");
	            // Get the field's value from the current row
	            Blob data = (Blob) row.get(columnName);
	            // Transform the blob and store it back into the same column
	            if (data != null) {
	                row.put(columnName, process(data));
	            } else {
	                log.error("Blob is null.");
	            }
	        }
	    }
	    return row;
	}

Note: "process" is the function that performs the actual transformation of
the blob.
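
For context, the first step of such a function (getting the raw bytes out of
the Blob) might look roughly like the sketch below. This is not the real
"process" implementation: the class name BlobBytes and both method names are
made up for illustration, the PDFBox text extraction is left out, and
inMemoryBlob exists only so the sketch can be tried without a database.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper, not part of Solr or of the transformer shown above.
final class BlobBytes {
    private BlobBytes() {}

    /** Reads the whole blob into memory; a null blob yields an empty array. */
    static byte[] toBytes(Blob blob) {
        if (blob == null) {
            return new byte[0];
        }
        try (InputStream in = blob.getBinaryStream()) {
            return in.readAllBytes(); // requires Java 9 or later
        } catch (SQLException | IOException e) {
            // Wrapped so callers do not have to declare checked exceptions
            throw new IllegalStateException("Could not read blob", e);
        }
    }

    /** Builds an in-memory Blob for experiments (assumption: test data only). */
    static Blob inMemoryBlob(byte[] data) {
        try {
            return new SerialBlob(data);
        } catch (SQLException e) {
            throw new IllegalStateException("Could not create blob", e);
        }
    }
}
```

The resulting byte array would then be handed to whatever PDF parsing code
is in use.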

What I noticed is that the "row" variable only contains the ID, probably
because of the deltaQuery:

deltaQuery="select ID from TABLE1 where (LASTMODIFIED >
to_date('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS'))"

However, even if I change it to a "select * " statement, I get everything
except the column that contains the blob (it is returned as null).
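
To tell apart the possible cases (the key is absent from the row, present
under a differently cased name, or present with a null value), a small
diagnostic helper along these lines could be called from the transformer.
It is only a sketch: RowDiagnostics, describe and getIgnoreCase are
hypothetical names, and it works on a plain Map, so it assumes nothing about
Solr beyond the row being a Map<String, Object>.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper; not part of Solr's DataImportHandler.
final class RowDiagnostics {
    private RowDiagnostics() {}

    /** Returns "KEY=TypeName" pairs sorted by key, with "null" for null values. */
    static String describe(Map<String, Object> row) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<>(row).entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            Object value = e.getValue();
            sb.append(e.getKey()).append('=')
              .append(value == null ? "null" : value.getClass().getSimpleName());
        }
        return sb.toString();
    }

    /** Looks up a column ignoring case; returns null if it is absent or null. */
    static Object getIgnoreCase(Map<String, Object> row, String column) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getKey().equalsIgnoreCase(column)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Logging RowDiagnostics.describe(row) at the top of transformRow for both a
full-import and a delta-import would show exactly which keys and value types
each kind of import delivers.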

Something tells me that the data-config may be incorrect. I cannot explain
how this works for full-imports and not delta-imports.

I hope that I explained this issue properly. I am really stuck on this. Any
help would be highly appreciated.

ahammad wrote:
> I have a Solr core that retrieves data from an Oracle DB. The DB table has
> a few columns, one of which is a Blob that represents a PDF document. In
> order to retrieve the actual content of the PDF file, I wrote a Blob
> transformer that converts the Blob into the PDF file, and subsequently
> reads it using PDFBox. The blob is contained in a DB column called
> DOCUMENT, and the data goes into a Solr field called fileContent, which is
> required.
> This works fine when doing full imports, but it fails for delta imports. I
> debugged my transformer, and it appears that when it attempts to fetch the
> blob stored in the column, it gets nothing back (i.e. null). Because the
> data is essentially null, it cannot retrieve anything, and cannot store
> anything into Solr. As a result, the document does not get imported. I am
> not sure what the problem is, because this only occurs with delta imports.
> Here is my data-config file:
> <dataConfig>
>     <dataSource driver="oracle.jdbc.driver.OracleDriver" url="address"
>                 user="user" password="pass"/>
>     <document name="table1">
>         <entity name="TABLE1" pk="ID" query="select * from TABLE1"
>                 deltaImportQuery="select * from TABLE1 where ID ='${}'"
>                 deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS'))"
>                 transformer="BlobTransformer">
>             <field column="ID" name="id" />
>             <field column="TITLE" name="title" />
>             <field column="FILENAME" name="filename" />
>             <field column="DOCUMENT" name="fileContent" blob="true"/>
>             <field column="LASTMODIFIED" name="lastModified" />
>         </entity>
>     </document>
> </dataConfig>
> Thanks.
