lucene-solr-user mailing list archives

From "Reitzel, Charles" <>
Subject RE: indexing java byte code in classes / jars
Date Fri, 08 May 2015 20:57:49 GMT
There are a number of reverse compilers (decompilers) for Java. Some are quite good and very detailed,
so long as the byte code has not been deliberately obfuscated. Of course, the original sources
would be better for picking up comments, since comments do not survive compilation. But then you'd
need a Java parser (the compiler front end), of which there are a few available as well.
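If the original sources are available, one Java parser that needs no extra download ships inside the JDK itself. The Compiler Tree API (`com.sun.source.*`, in the `jdk.compiler` module) can parse a source file without compiling it. The following is a rough sketch and was not part of the thread. It assumes a JDK 9+ (not a bare JRE). The class name `SourceNames` and the "kind:name" output strings are hypothetical. It only collects class, field and method names; comments would need extra handling.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: parses (does not compile) Java source and lists declared names.
public class SourceNames {
    static List<String> names(String fileName, String source) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE without a compiler
        if (javac == null) throw new IllegalStateException("no system Java compiler; run on a JDK");
        // Wrap the source string as an in-memory compilation unit
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null, List.of(file));
        List<String> out = new ArrayList<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitClass(ClassTree node, Void unused) {
                out.add("class:" + node.getSimpleName());
                // Fields are the variable declarations directly inside the class body
                for (Tree member : node.getMembers()) {
                    if (member instanceof VariableTree) {
                        out.add("field:" + ((VariableTree) member).getName());
                    }
                }
                return super.visitClass(node, unused); // keeps scanning nested members
            }

            @Override
            public Void visitMethod(MethodTree node, Void unused) {
                out.add("method:" + node.getName());
                return super.visitMethod(node, unused);
            }
        };
        for (CompilationUnitTree unit : task.parse()) { // parse only: no attribution or codegen
            scanner.scan(unit, null);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String src = "class Demo { java.util.Map<String, String> coreInfoMap; void go() {} }";
        System.out.println(names("Demo.java", src));
    }
}
```

Each name could then go into a Solr field of its own instead of indexing the whole file as text.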

Hmm, this looks interesting ...

-----Original Message-----
From: Erik Hatcher [] 
Sent: Friday, May 08, 2015 4:19 PM
Subject: Re: indexing java byte code in classes / jars

Oh, and sorry, I omitted a couple of details:

# creating the “java” core/collection
bin/solr create -c java 

# I ran this from my Solr source code checkout, so that SolrLogFormatter.class just happened to be handy


> On May 8, 2015, at 4:11 PM, Erik Hatcher <> wrote:
> What kinds of searches do you want to run? Are you trying to extract class names, method
> names, and such, and make those searchable? If that's the case, you need some kind of “parser”
> to reverse-engineer that information from .class and .jar files before feeding it to Solr;
> that extraction happens before analysis. Java itself comes with a javap command that can do this.
> Whether this is the “best” way to go for your scenario I don't know, but here's an
> interesting example pasted below (using Solr 5.x).
> —
> Erik Hatcher, Senior Solutions Architect
> javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class > test.txt
> bin/post -c java test.txt
> now search for "coreInfoMap" 
> http://localhost:8983/solr/java/browse?q=coreInfoMap
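The javap step above handles one .class file. For a whole jar, one rough approach is to enumerate the jar's class entries, run javap on each, and write everything to one text file for bin/post. This sketch assumes JDK 9+: javap is invoked in-process through `java.util.spi.ToolProvider`, which should be equivalent to running `javap -cp some.jar <class>` on the command line. The class name `JarJavap` and the output layout are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.spi.ToolProvider;

// Hypothetical helper: dumps javap output for every class in one or more jars.
public class JarJavap {
    // Runs javap in-process; classPath may be null for classes the JDK already knows.
    static String javap(String classPath, String className) {
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; run on a full JDK"));
        List<String> args = new ArrayList<>();
        if (classPath != null) {
            args.add("-cp");
            args.add(classPath);
        }
        args.add(className);
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        int rc = tool.run(out, new PrintWriter(System.err, true), args.toArray(new String[0]));
        out.flush();
        if (rc != 0) throw new IllegalStateException("javap exited with " + rc + " for " + className);
        return buffer.toString();
    }

    // Maps entries like org/apache/Foo$Bar.class to binary names like org.apache.Foo$Bar,
    // skipping everything under META-INF/ and module-info.class.
    static List<String> classNames(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            for (JarEntry e : Collections.list(jf.entries())) {
                String n = e.getName();
                if (!n.endsWith(".class") || n.startsWith("META-INF/") || n.endsWith("module-info.class")) {
                    continue;
                }
                names.add(n.substring(0, n.length() - ".class".length()).replace('/', '.'));
            }
        }
        return names;
    }

    // Usage: java JarJavap some.jar > classes.txt
    public static void main(String[] args) throws IOException {
        for (String jar : args) {
            for (String cls : classNames(Paths.get(jar))) {
                System.out.print(javap(jar, cls));
            }
        }
    }
}
```

After compiling, something like `java JarJavap some.jar > classes.txt` followed by `bin/post -c java classes.txt` should mirror the single-class example. Since all classes end up in one file, they would most likely be indexed as a single Solr document. Writing one file per class would give per-class hits instead.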
> I tried to be cleverer and use the stdin option of bin/post, like this:
> javap build/solr-core/classes/java/org/apache/solr/SolrLogFormatter.class | bin/post -c java -url http://localhost:8983/solr/java/update/extract -type text/plain -params "" -out yes -d
> but something isn't working right with the stdin detection like that (piping
> `cat test.txt | bin/post…` does work, though, hmm).
> test.txt looks like this, `cat test.txt`:
> Compiled from ""
> public class org.apache.solr.SolrLogFormatter extends java.util.logging.Formatter {
>   long startTime;
>   long lastTime;
>   java.util.Map<org.apache.solr.SolrLogFormatter$Method, java.lang.String> methodAlias;
>   public boolean shorterFormat;
>   java.util.Map<org.apache.solr.core.SolrCore, org.apache.solr.SolrLogFormatter$CoreInfo> coreInfoMap;
>   public java.util.Map<java.lang.String, java.lang.String> classAliases;
>   static java.lang.ThreadLocal<java.lang.String> threadLocal;
>   public org.apache.solr.SolrLogFormatter();
>   public void setShorterFormat();
>   public java.lang.String format(java.util.logging.LogRecord);
>   public void appendThread(java.lang.StringBuilder, java.util.logging.LogRecord);
>   public java.lang.String _format(java.util.logging.LogRecord);
>   public java.lang.String getHead(java.util.logging.Handler);
>   public java.lang.String getTail(java.util.logging.Handler);
>   public java.lang.String formatMessage(java.util.logging.LogRecord);
>   public static void main(java.lang.String[]) throws java.lang.Exception;
>   public static void go() throws java.lang.Exception;
>   static {};
> }
>> On May 8, 2015, at 3:31 PM, Mark <> wrote:
>> I'm looking to use Solr to search over the byte code in classes and jars.
>> Does anyone know or have experience of Analyzers, Tokenizers, and 
>> Token Filters for such a task?
>> Regards
>> Mark
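On the analysis side, Java names are mostly compound words: dotted packages, `$` for nested classes, and camelCase. Solr's stock filters can probably cover much of this in schema.xml, for instance a word-delimiter filter with its case-change splitting and original-preserving options turned on. The standalone Java sketch below therefore only illustrates one possible set of splitting rules. `IdentifierTokens` is hypothetical and is not a Lucene `TokenFilter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits Java names into lowercase search tokens.
public class IdentifierTokens {
    // e.g. "org.apache.solr.SolrLogFormatter$CoreInfo" or "coreInfoMap";
    // emits each word part, plus the whole part (lowercased) when it was split.
    static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String part : name.split("[^A-Za-z0-9]+")) { // '.', '$', '_' act as separators
            if (part.isEmpty()) continue;
            // Break on lower->Upper ("coreInfo") and acronym->Word ("XMLParser" -> XML, Parser)
            String[] words = part.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
            for (String w : words) out.add(w.toLowerCase(Locale.ROOT));
            if (words.length > 1) out.add(part.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("coreInfoMap")); // [core, info, map, coreinfomap]
    }
}
```

Keeping the unsplit form (coreinfomap) alongside its parts lets both an exact query for coreInfoMap and a partial query for info match, assuming the query side is lowercased the same way.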

