lucene-solr-user mailing list archives

From Mark <javam...@gmail.com>
Subject Re: indexing java byte code in classes / jars
Date Fri, 08 May 2015 21:27:09 GMT
Erik,

Thanks for the pretty much OOTB approach.

I think I'm going to just try a range of approaches, and see how far I get.

The "IDE does this" suggestion would be worth looking into as well.




On 8 May 2015 at 22:14, Mark <javamark@gmail.com> wrote:

>
> https://searchcode.com/
>
> looks really interesting, however I want to crunch as many searchable
> aspects as possible out of jars sitting on a classpath or under a project structure...
>
> Really early days so I'm open to any suggestions
>
>
>
> On 8 May 2015 at 22:09, Mark <javamark@gmail.com> wrote:
>
>> To answer why bytecode: mostly because my use case is to index as much
>> detail as possible from jars/classes:
>>
>> extract class names,
>> method names
>> signatures
>> packages / imports
>>
>> I am considering using ASM in order to generate an analysis view of the
>> class
>>
>> The sort of use cases I have would be method / signature searches.
>>
>> For example;
>>
>> 1) show any classes with a method named parse*
>>
>> 2) show any classes with a method named parse that takes a parameter whose type matches *json*
>>
>> ...etc
>>
>> In the past I have written something to reverse out javadocs from just
>> Java bytecode; using Solr would make this idea considerably more
>> powerful.
>>
>> Thanks for the suggestions so far
>>
>>
>>
>>
>>
>>
>>
>> On 8 May 2015 at 21:19, Erik Hatcher <erik.hatcher@gmail.com> wrote:
>>
>>> Oh, and sorry, I omitted a couple of details:
>>>
>>> # creating the “java” core/collection
>>> bin/solr create -c java
>>>
>>> # I ran this from my Solr source code checkout, so that
>>> SolrLogFormatter.class just happened to be handy
>>>
>>>         Erik
>>>
>>>
>>>
>>>
>>> > On May 8, 2015, at 4:11 PM, Erik Hatcher <erik.hatcher@gmail.com>
>>> wrote:
>>> >
>>> > What kinds of searches do you want to run?  Are you trying to extract
>>> class names, method names, and such and make those searchable?   If that’s
>>> the case, you need some kind of “parser” to reverse engineer that
>>> information from .class and .jar files before feeding it to Solr, which
>>> would happen before analysis.   Java itself comes with a javap command that
>>> can do this; whether this is the “best” way to go for your scenario I don’t
>>> know, but here’s an interesting example pasted below (using Solr 5.x).
>>> >
>>> > —
>>> > Erik Hatcher, Senior Solutions Architect
>>> > http://www.lucidworks.com
>>> >
>>> >
>>> > javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class > test.txt
>>> > bin/post -c java test.txt
>>> >
>>> > now search for "coreInfoMap"
>>> http://localhost:8983/solr/java/browse?q=coreInfoMap
>>> >
>>> > I tried to be cleverer and use the stdin option of bin/post, like this:
>>> > javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class | bin/post -c java -url http://localhost:8983/solr/java/update/extract -type text/plain -params "literal.id=SolrLogFormatter" -out yes -d
>>> > but something isn’t working right with the stdin detection like that
>>> (it does work to `cat test.txt | bin/post…` though, hmmm)
>>> >
>>> > test.txt looks like this, `cat test.txt`:
>>> > Compiled from "SolrLogFormatter.java"
>>> > public class org.apache.solr.SolrLogFormatter extends
>>> java.util.logging.Formatter {
>>> >  long startTime;
>>> >  long lastTime;
>>> >  java.util.Map<org.apache.solr.SolrLogFormatter$Method,
>>> java.lang.String> methodAlias;
>>> >  public boolean shorterFormat;
>>> >  java.util.Map<org.apache.solr.core.SolrCore,
>>> org.apache.solr.SolrLogFormatter$CoreInfo> coreInfoMap;
>>> >  public java.util.Map<java.lang.String, java.lang.String> classAliases;
>>> >  static java.lang.ThreadLocal<java.lang.String> threadLocal;
>>> >  public org.apache.solr.SolrLogFormatter();
>>> >  public void setShorterFormat();
>>> >  public java.lang.String format(java.util.logging.LogRecord);
>>> >  public void appendThread(java.lang.StringBuilder,
>>> java.util.logging.LogRecord);
>>> >  public java.lang.String _format(java.util.logging.LogRecord);
>>> >  public java.lang.String getHead(java.util.logging.Handler);
>>> >  public java.lang.String getTail(java.util.logging.Handler);
>>> >  public java.lang.String formatMessage(java.util.logging.LogRecord);
>>> >  public static void main(java.lang.String[]) throws
>>> java.lang.Exception;
>>> >  public static void go() throws java.lang.Exception;
>>> >  static {};
>>> > }
>>> >
>>> >> On May 8, 2015, at 3:31 PM, Mark <javamark@gmail.com> wrote:
>>> >>
>>> >> I am looking to use Solr search over the byte code in Classes and Jars.
>>> >>
>>> >> Does anyone know of, or have experience with, Analyzers, Tokenizers, and
>>> >> Token Filters for such a task?
>>> >>
>>> >> Regards
>>> >>
>>> >> Mark
>>> >
>>>
>>>
>>
>
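
One way the javap approach from the thread might be taken a step further is to split the javap text into separate fields before posting, so that method-name and signature searches (the "parse*" and "*json*" style use cases above) have something structured to match against. The following is only a rough sketch in Python, not anything posted in the thread: the field names (id, package, field_name, method_name, method_signature) and the helpers parse_javap and find_methods are hypothetical, the sample text is a trimmed copy of the test.txt listing, and parameter lists containing nested parentheses are not handled. The local glob matching merely stands in for what a Solr wildcard query such as method_name:parse* would do on the server.

```python
import re
from fnmatch import fnmatchcase

# Trimmed copy of the javap listing quoted in the thread (test.txt).
JAVAP_OUTPUT = """\
Compiled from "SolrLogFormatter.java"
public class org.apache.solr.SolrLogFormatter extends java.util.logging.Formatter {
  long startTime;
  public boolean shorterFormat;
  public org.apache.solr.SolrLogFormatter();
  public java.lang.String format(java.util.logging.LogRecord);
  public java.lang.String getHead(java.util.logging.Handler);
  public static void main(java.lang.String[]) throws java.lang.Exception;
  static {};
}
"""

# Matches the declared type name on the "public class ... {" line.
CLASS_RE = re.compile(r"\b(?:class|interface|enum)\s+([\w.$]+)")
# Matches "name(params)" on a method or constructor line.
METHOD_RE = re.compile(r"([\w.$]+)\s*\(([^)]*)\)")


def parse_javap(text):
    """Turn javap output for one class into a flat dict of hypothetical fields."""
    doc = {"id": None, "package": "", "field_name": [],
           "method_name": [], "method_signature": []}
    for raw in text.splitlines():
        line = raw.strip()
        if doc["id"] is None:
            # Skip everything until the class declaration line is seen.
            found = CLASS_RE.search(line)
            if found and line.endswith("{"):
                doc["id"] = found.group(1)
                doc["package"] = doc["id"].rpartition(".")[0]
            continue
        if not line.endswith(";") or "{" in line:
            continue  # closing brace, static initializer, or blank line
        found = METHOD_RE.search(line)
        if found:
            name, params = found.group(1), found.group(2)
            if name == doc["id"]:
                continue  # constructor: javap prints it under the class name
            doc["method_name"].append(name)
            doc["method_signature"].append(f"{name}({params})")
        else:
            # Field line: the last token before ';' is the field name.
            doc["field_name"].append(line.rstrip(";").split()[-1])
    return doc


def find_methods(doc, name_glob, signature_glob="*"):
    """Return signatures whose name matches name_glob (case-sensitive) and
    whose full signature matches signature_glob (case-insensitive)."""
    hits = []
    for name, sig in zip(doc["method_name"], doc["method_signature"]):
        if fnmatchcase(name, name_glob) and fnmatchcase(sig.lower(), signature_glob.lower()):
            hits.append(sig)
    return hits


if __name__ == "__main__":
    doc = parse_javap(JAVAP_OUTPUT)
    print(doc["id"], doc["method_name"])
    print(find_methods(doc, "get*"))
    print(find_methods(doc, "*", "*logrecord*"))
```

A dict like this could then be serialized to JSON and posted to the "java" core in place of the raw test.txt, assuming the schema has (or guesses) matching multi-valued fields. Whether javap text or an ASM-based visitor is the better source for these fields is left open, as it was in the thread.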
