lucene-java-user mailing list archives

From Samuel Alfonso Velázquez Díaz <>
Subject Re: Regarding Setup Lucine for my site
Date Wed, 05 Mar 2003 16:28:41 GMT

Please point me to a web link where I can read more about Lucene. I have read all the documentation
that comes with the distribution (which is almost the same as the site).
About the problem you mentioned with URL-to-file mappings: what if I add a line of code like
  myurl = URLEncoder.encode(myurl, "UTF-8");
Wouldn't that solve possibly malformed URLs at the web app level?
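
A minimal sketch of what that call would do, assuming the standard java.net.URLEncoder is the class meant here. URLEncoder is designed for form data, so applied to a whole URL it also escapes ":" and "/". The class and method names below (UrlEncodingSketch, formEncode, encodePath) are hypothetical and exist only for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class for illustration; not part of Lucene or its demo.
final class UrlEncodingSketch {

    // Thin wrapper around URLEncoder.encode that hides the checked exception.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes each path segment separately so the "/" separators survive.
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            // URLEncoder writes a space as "+"; inside a URL path it should be "%20".
            out.append(formEncode(segments[i]).replace("+", "%20"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Whole URL: ":" and "/" are escaped too, so this is no longer a usable link.
        // Prints http%3A%2F%2Flocalhost%3A8080%2Fwww%2Fmy+doc.html
        System.out.println(formEncode("http://localhost:8080/www/my doc.html"));

        // Path only, segment by segment: the structure stays intact.
        // Prints http://localhost:8080/www/my%20doc.html
        System.out.println("http://localhost:8080" + encodePath("/www/my doc.html"));
    }
}
```

So encoding can tidy up unsafe characters in a path, but it does not turn a file path into a URL; the prefix still has to be mapped separately.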
* On the other hand, I'm using org.apache.lucene.demo.IndexHTML, which was provided with the
documentation. Is there any problem with using this demo class for a production web site?
I'm an application developer, and it would be hard to understand the whole Lucene code base in
order to use it. It would be almost impossible to do that within my development schedule.
* Regarding your comment "Lucene does not index web pages": I thought Lucene's main goal was to
index web pages, and that as an afterthought it could also index text files or other
information (for example, mail databases).
Regards, and thanks for your comments!
I'm considering the egothor search engine. I successfully set up a web application for searching my
web site, but I didn't see a mailing list or a forum with the level of participation that Lucene has.
Greetings to everyone!
Otis Gospodnetic <> wrote:
Samuel,

Some basic understanding of what Lucene is seems to be missing here.
Lucene does not index web pages.
Lucene indexes text.
Lucene is not automatically aware of your web site or your domain.
Lucene is aware only of what you 'feed it' at index time.
If you index files, which IndexDemo does, the Lucene index will have only
information about files (information such as the file path). Lucene has no
clue that you really want to index your web site.
Even if you could replace C:\..... with http://.... it wouldn't be a
good solution, as directory structures and file paths do not always map
directly to URLs.
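
For the simple case where the directory tree does mirror the site, the prefix swap discussed in this thread could be sketched as below. This is only an illustration under that assumption: the class and method names (PathToUrlSketch, toUrl) are hypothetical, the example path and base URL are the ones quoted in this thread, and nothing here is Lucene API. The same helper could be called at index time (to store a URL instead of a path) or at display time (to rewrite the stored path).

```java
// Hypothetical helper for illustration; not part of Lucene or its demo.
final class PathToUrlSketch {

    // Swaps the document-root prefix of a file path for a base URL.
    static String toUrl(String filePath, String docRoot, String baseUrl) {
        // Normalize Windows separators so "C:\a\b" and "C:/a/b" are treated alike.
        String path = filePath.replace('\\', '/');
        String root = docRoot.replace('\\', '/');
        if (!root.endsWith("/")) {
            root += "/";
        }
        // The comparison is case-sensitive, so "c:/..." and "C:/..." do not match.
        if (!path.startsWith(root)) {
            throw new IllegalArgumentException("Not under " + root + ": " + path);
        }
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + path.substring(root.length());
    }

    public static void main(String[] args) {
        // Prints http://localhost:8080/www/some_html/my_doc.html
        System.out.println(toUrl(
                "C:/filesToIndex/www/some_html/my_doc.html",
                "C:/filesToIndex/www/",
                "http://localhost:8080/www/"));
    }
}
```

When paths and URLs do not line up one-to-one (rewrites, virtual directories, dynamic pages), a plain prefix swap like this is not enough and the mapping has to come from the application.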

In short, you have a bit more reading to do :)
The information is all there, it just has to be read :(
Good luck!


--- Samuel Alfonso Velázquez Díaz wrote:
> Yes I have
> 1.- The directory with the files to index:
> C:/filesToIndex/www/
> 2.- A path where the index files from the search engine will be
> created, let's say
> C:/index/
> 3.- I have an internet domain whose name is:
> 4.- A web application context that runs at
> Once I have set up all of the above, I want to be able to use the
> search application:
> And I don't want the results that I get from the index (step 2)
> to look like
> Your file is at
> C:/filesToIndex/www/some_html/my_doc.html
> The results should be:
> Your file is at
> From the comments I have read (THANK YOU VERY MUCH) I conclude that
> there is no way to generate the index with some custom prefix (as
> for the documents at C:/filesToIndex/www/).
> It seems that I have to modify my web application
> ( to include some logic to
> replace "C:/filesToIndex/www/" with "".
> If you could point me to the place in the Lucene source code where I
> could add this logic and fix it once and for all, I would appreciate it a lot.
> The command I used to generate this index was:
> java org.apache.lucene.demo.IndexHTML -create -index C:\index
> C:\filesToIndex\www\
> Now in the web application I have to modify
> IndexSearcher searcher;
> Query query;
> Hits hits;
> // some code after...
> hits =;
> for ( /* iterate over the hit list */ ) {
>   Document doc = hits.doc(i);
>   String doctitle = doc.get("title");
>   String url = doc.get("url");
>   // I have to do something like:
>   // url = "" + url.substring("C:/filesToIndex/www/".length());
> }
> Regards!!!
> And thanks again
> Pinky Iyer wrote:
> I don't understand the explanation. When I index the
> documents as described in the examples, then run the app
> and do a sample search, the results point to the directory structure, say
> "c:/filesToIndex/www/", instead of "http://localhost:8080/www/". So
> how can this be changed to reflect the website domain you
> mentioned? Could you explain again? Say my docs are under the directory
> c:/filesToIndex/www/ and the website is, as you said,
> http://localhost:8080/ ; how should I proceed?
> Thanks in advance!
> Samuel Alfonso Velázquez Díaz wrote:
> Oh OK, I thought it was going to be something like the egothor
> search engine (a Java-based search engine). When you create the
> index, you issue a command like:
> java org.egothor.indexer.mirror.DoTanker /tmp/my_www
> Project/Egothor/var/www as http://localhost:8080
> /tmp/my_www: is the path to the directory where the index is to be
> created
> Project/Egothor/var/www: is the path to the local file system files
> to be indexed.
> and "as http://localhost:8080" is the prefix that the index will keep
> on the hit list. This way the index will be relative to
> http://localhost:8080, even if your production site is another
> site.
> Thanks for your comments; anyway, now I know that I have to modify
> code to do this.
> Regards!
> Jeff Linwood wrote:
> Hi,
> I'm not a hundred percent sure I understand what you are asking, but
> when you get the results back from Lucene (the hits) it's up to you to
> format them to display on a web page - you can always do the
> modification there when you display the links to the results.
> Jeff
> ----- Original Message -----
> From: "Samuel Alfonso Velázquez Díaz" 
> To: "Lucene Users List" 
> Sent: Tuesday, March 04, 2003 11:33 AM
> Subject: Regarding Setup Lucine for my site
> >
> > The documentation says:
> >
> > Once you've gotten this far you're probably itching to go. Let's start by
> > creating the index you'll need for the web examples. Since you've already
> > set your classpath in the previous examples, all you need to do is type
> > "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..".
> > You'll need to do this from a (any) subdirectory of your {tomcat}/webapps
> > directory (make sure you didn't leave off the ".." or you'll get a null
> > pointer exception). {index-dir} should be a directory that Tomcat has
> > permission to read and write, but is outside of a web accessible context. By
> > default the webapp is configured to look in /opt/lucene/index for this
> > index.
> >
> > A copy of my site is in:
> >
> > C:\CopiaSite20030228\
> >
> > My web application runs on
> >
> >
> >
> > How can I make the Lucene index map the URLs of the indexed files to:
> >
> >
> >
> >
> >
> > Please help!
> >
> >
> > Samuel Alfonso Velázquez Díaz
> >
> >
> >
> >
> > ---------------------------------
> > Do you Yahoo!?
> > Yahoo! Tax Center - forms, calculators, tips, and more
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Samuel Alfonso Velázquez Díaz