nutch-dev mailing list archives

From "Ilia S. Yatsenko" <shortn...@yandex.ru>
Subject RE: both html parser have bug with javascript
Date Mon, 04 Jul 2005 03:43:26 GMT
Also, this <%@ Language=VBScript %> is shown in summaries.

I thought ANY text between < and >, including unknown tags, should always be
ignored.

:)
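
To make that expectation concrete, here is a rough sketch in plain Java of the behavior described above. It is not the code of either Nutch HTML parser: the class name SummaryTextSketch and the method stripMarkup are made up for illustration, and using regular expressions instead of a real HTML parser is a simplifying assumption. It drops server-side blocks such as <%@ ... %>, whole script and style elements together with their bodies, comments, and then every remaining tag, known or unknown.

```java
import java.util.regex.Pattern;

public class SummaryTextSketch {
    // Server-side blocks such as <%@ Language=VBScript %> (assumed to end with %>)
    private static final Pattern SERVER_BLOCK =
        Pattern.compile("<%.*?%>", Pattern.DOTALL);

    // Whole <script> and <style> elements, including the code between the tags
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
        "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // HTML comments, which often wrap inline scripts
    private static final Pattern COMMENT =
        Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Any remaining tag, whether the tag name is known or not
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns only the text that should reach a summary
    static String stripMarkup(String html) {
        String text = SERVER_BLOCK.matcher(html).replaceAll(" ");
        text = SCRIPT_OR_STYLE.matcher(text).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Collapse the whitespace left behind by the removed markup
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<%@ Language=VBScript %><html><head>"
            + "<script language=\"JavaScript1.2\">var v = 2;</script>"
            + "</head><body><foo>my text</foo> description.</body></html>";
        System.out.println(stripMarkup(html));
    }
}
```

With input like the sample in main, the only text left for the summary is the page text itself; nothing from the script element or the VBScript directive survives.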

-----Original Message-----
From: Ilia S. Yatsenko [mailto:shortname@yandex.ru] 
Sent: Monday, July 04, 2005 6:33 AM
To: nutch-dev@lucene.apache.org
Subject: RE: both html parser have bug with javascript

I thought "javascript" was shown in summaries because I had enabled the
parse-js plug-in. I disabled it and built a new database, but got the same
result :(

-----Original Message-----
From: Ilia S. Yatsenko [mailto:shortname@yandex.ru] 
Sent: Sunday, July 03, 2005 7:09 PM
To: nutch-dev@lucene.apache.org
Subject: RE: both html parser have bug with javascript

Oops, I see my mistake O-)

-----Original Message-----
From: Ilia S. Yatsenko [mailto:shortname@yandex.ru] 
Sent: Sunday, July 03, 2005 6:06 PM
To: nutch-dev@lucene.apache.org
Subject: both html parser have bug with javascript

Hello :)

Sorry for my limited English.

I have an issue with both HTML parsers.

I see the following text in summaries:

 

2JavaScript1.3JavaScriptJavaScriptjavascriptjavascript1.1javascript1.2javascript1.3javascript my text description.

 

Or 

 

2javascript my text description.

 

Or

 

javascriptjavascript1.2javascript my text description.

 

But the summary should not contain it.

 

Respectfully 







