lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Deleting Fields
Date Sat, 30 May 2015 18:48:30 GMT
Faceting on very high cardinality fields can use up memory, no doubt
about that. I think the entire delete question was a red herring, but
you know that already ;)....

So I think you can forget about the delete stuff. Although do note
that if you do re-index your old documents, the new version won't have
the field, and as segments are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs.... So you could gradually re-index your corpus and get this
stuff out of there.
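
A minimal sketch of what that gradual re-index could look like on the client side, assuming you still have the source documents as plain dicts and a list of the field names the bad data created. The names here (strip_fields, batches, the "bad_*" fields) are made up for illustration and are not part of Solr or SolrJ; actually sending each batch to Solr is left out.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set


def strip_fields(doc: Dict[str, Any], unwanted: Set[str]) -> Dict[str, Any]:
    """Return a copy of doc without the unwanted fields (hypothetical helper)."""
    return {name: value for name, value in doc.items() if name not in unwanted}


def batches(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` docs so the corpus can be re-indexed a slice at a time."""
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Illustrative data only: field names are assumptions, not taken from the real index.
unwanted_fields = {"bad_0001", "bad_0002"}
source_docs = [
    {"id": "1", "title": "a", "bad_0001": "x"},
    {"id": "2", "title": "b", "bad_0002": "y"},
    {"id": "3", "title": "c"},
]

cleaned = [strip_fields(doc, unwanted_fields) for doc in source_docs]
cleaned_batches = list(batches(cleaned, 2))
```

Each batch would then be re-submitted under the same ids, so the new version replaces the old one and the old copy's data goes away as segments merge.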

Best,
Erick

On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
<joeo@lovehorsepower.com> wrote:
> Thank you Erick.  I was thinking that it actually went through and removed
> the index data; thank you for the clarification.  What happened was I had
> some bad data that created a lot of fields (some 8000).  I was getting some
> errors adding new fields where solr could not talk to zookeeper, and I
> thought it may be because there are so many fields.  The index size is some
> 420million docs.
> I'm hesitant to try to re-create as when the shards crash, they leave a
> write.lock file in HDFS, and I need to manually delete that file (on 27
> machines) before bringing them back up.
> I believe this is the stack trace - but this looks to be related to facets,
> and I'm not 100% sure that this is the correct trace!  Sorry - if it
> happens again I will update.
>
> ERROR - 2015-05-29 20:39:34.707; [UNCLASS shard9 core_node14 UNCLASS]
> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: unable to create new native thread
>         at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
>         at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>         at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>         at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>         at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>         at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>         at org.eclipse.jetty.server.Server.handle(Server.java:368)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>         at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
>         at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>         at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>         at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
>         at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
>         at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:637)
>         at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:280)
>         at
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:106)
>         at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
>         at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
>         ... 26 more
>
> Then later:
>
> ERROR - 2015-05-29 21:57:22.370; [UNCLASS shard9 core_node14 UNCLASS]
> org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
> java.lang.OutOfMemoryError: Java heap space
>         at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
>         at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>         at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>         at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>         at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>         at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>         at org.eclipse.jetty.server.Server.handle(Server.java:368)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>         at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
>         at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
>         at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>         at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>         at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>
>
> -Joe
>
>
> On 5/30/2015 12:32 AM, Erick Erickson wrote:
>>
>> Yes, but deleting fields from the schema only means that _future_
>> documents will throw an "undefined field" error. All the documents
>> currently in the index will retain that field.
>>
>> Why you're hitting an OOM is a mystery though. But delete field isn't
>> removing the contents of indexed documents. Showing us the full stack
>> when you hit an OOM would be helpful.
>>
>> Best,
>> Erick
>>
>> On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger
>> <joeo@lovehorsepower.com> wrote:
>>>
>>> Thank you Shawn - I'm referring to fields in the schema.  With Solr 5, you
>>> can delete fields from the schema.
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField
>>>
>>> -Joe
>>>
>>>
>>> On 5/29/2015 7:30 PM, Shawn Heisey wrote:
>>>>
>>>> On 5/29/2015 5:08 PM, Joseph Obernberger wrote:
>>>>>
>>>>> Hi All - I have a lot of fields to delete, but noticed that once I
>>>>> started deleting them, I quickly ran out of heap space.  Is
>>>>> delete-field a memory intensive operation?  Should I delete one field,
>>>>> wait a while, then delete the next?
>>>>
>>>> I'm not aware of a way to delete a field.  I may have a different
>>>> definition of what a field is than you do, though.
>>>>
>>>> Solr lets you delete entire documents, but deleting a field from the
>>>> entire index would involve re-indexing every document in the index,
>>>> excluding that field.
>>>>
>>>> Can you be more specific about exactly what you are doing, what you are
>>>> seeing, and what you want to see instead?
>>>>
>>>> Also, please be aware of this:
>>>>
>>>> http://people.apache.org/~hossman/#threadhijack
>>>>
>>>> Thanks,
>>>> Shawn
>>>>
>>>>
>
