lucene-java-user mailing list archives

From "John Bugger" <>
Subject Re: Search in HTML code
Date Tue, 03 Oct 2006 14:49:10 GMT
My crawler indexes the crawled pages with this code:
Document doc = new Document();
doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("title", page.getTitle(), Store.YES, Index.TOKENIZED));
doc.add(new Field("id", Integer.toString(page.getId()), Store.YES, Index.NO));
try {
    // ... (body not preserved in the archived message)
} catch (Exception e) {
    // ... (handler not preserved in the archived message)
}

I need to write an application that can search through the HTML code of the
indexed pages using code patterns like:
<table width="100%" height="50" style="border: 1px solid red;">
This should match all documents containing such code, regardless of the order
of the tag's attributes.
Is this possible with the Lucene engine?
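
To make the requirement concrete, here is a minimal sketch in plain Java (no Lucene calls) of what "matching regardless of attribute order" could mean. It assumes each opening tag is reduced to a set of terms, one for the tag name and one per attribute, and that a pattern matches when every one of its terms is present. The class TagTerms, its methods, and the "tag@name=value" term format are hypothetical names chosen for illustration, and the regular expressions only handle quoted attribute values.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Lucene, only an illustration.
final class TagTerms {
    // Captures the tag name of an opening tag, e.g. "table" in "<table ...>".
    private static final Pattern TAG_NAME =
            Pattern.compile("<\\s*([A-Za-z][A-Za-z0-9]*)");
    // Captures name="value" or name='value' pairs (quoted values only).
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Reduces one opening tag to an order-independent set of terms.
    static Set<String> termsFor(String tag) {
        Set<String> terms = new TreeSet<>();
        Matcher name = TAG_NAME.matcher(tag);
        if (!name.find()) {
            return terms; // not an opening tag: no terms
        }
        String tagName = name.group(1).toLowerCase();
        terms.add(tagName);
        Matcher attr = ATTRIBUTE.matcher(tag);
        while (attr.find()) {
            // Group 3 holds a double-quoted value, group 4 a single-quoted one.
            String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
            terms.add(tagName + "@" + attr.group(1).toLowerCase() + "=" + value);
        }
        return terms;
    }

    // True when the document tag carries every term of the pattern tag.
    static boolean matches(String documentTag, String patternTag) {
        return termsFor(documentTag).containsAll(termsFor(patternTag));
    }
}
```

With terms like these, each one could be stored as an untokenized value of a field, and a pattern could be expressed as a boolean query that requires all of its terms. One caveat of that design: such a query is evaluated per document, so terms coming from different tags on the same page could combine into a false match.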

