nutch-dev mailing list archives

From "Brian Whitman (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-414) parse-mp3 plugin concatenating previous tags for text field
Date Tue, 12 Dec 2006 15:29:20 GMT
parse-mp3 plugin concatenating previous tags for text field
-----------------------------------------------------------

                 Key: NUTCH-414
                 URL: http://issues.apache.org/jira/browse/NUTCH-414
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0
         Environment: -
            Reporter: Brian Whitman


The parse-mp3 plugin seems to keep state from previous parses in its text content. For every
new MP3 file parsed, it puts the contents of all the previously parsed files' text fields into
the plain-text field for that file.

You can see this by fetching a set of MP3s in one segment and then viewing their plain text in
the Nutch webapp. The plain text will include the contents of all files fetched in that round,
which makes searching useless.
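A minimal sketch of how this kind of accumulation can happen. It assumes (this is not confirmed by the report) that MetadataCollector appends each tag value to a String field and that MP3Parser reuses a single collector instance for every file. The Collector and Parser classes below are hypothetical stand-ins, not the plugin's actual code:

```java
import java.util.List;

public class AccumulatingCollectorDemo {

    // Hypothetical stand-in for MetadataCollector: assumes tag values are
    // appended to a String field that is never reset between parses.
    static class Collector {
        private String text = "";

        void notifyProperty(String name, String value) {
            text += value + "\n"; // every tag value is appended to the running text
        }

        String getText() {
            return text;
        }
    }

    // Hypothetical stand-in for MP3Parser: one collector is shared by every parse call.
    static class Parser {
        private final Collector collector = new Collector();

        String parse(List<String> tagValues) {
            for (String v : tagValues) {
                collector.notifyProperty("TIT2-Text", v);
            }
            return collector.getText();
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser();
        parser.parse(List.of("Song A"));
        // The second file's text still contains the first file's tag value.
        System.out.print(parser.parse(List.of("Song B"))); // prints "Song A" and "Song B" on separate lines
    }
}
```

Parsing the second file returns the text of both files, which matches the symptom seen in the webapp.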

I made a tiny band-aid change to MP3Parser.java and MetadataCollector.java against the nightly
build. It seems to fix the problem.



--- MP3Parser.java      2006-12-10 09:43:26.000000000 -0500
+++ MP3Parser.java.new  2006-12-10 16:37:03.000000000 -0500
@@ -67,7 +67,7 @@
       fos.write(raw);
       fos.close();
       MP3File mp3 = new MP3File(tmp);
-
+      metadataCollector.clearText();
       if (mp3.hasID3v2Tag()) {
         parse = getID3v2Parse(mp3, content.getMetadata());
       } else if (mp3.hasID3v1Tag()) {

--- MetadataCollector.java      2006-12-10 09:43:26.000000000 -0500
+++ MetadataCollector.java.new  2006-12-10 16:37:28.000000000 -0500
@@ -42,6 +42,10 @@
       this.conf = conf;
   }

+  public void clearText() {
+    text = "";
+  }
+
   public void notifyProperty(String name, String value) throws MalformedURLException {
     if (name.equals("TIT2-Text"))
       setTitle(value);
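The patch resets the shared collector's text before each file's tags are read. A minimal sketch of that pattern, again using hypothetical Collector and Parser stand-ins rather than the plugin's real classes, and assuming the same append-to-String behavior:

```java
import java.util.List;

public class ClearedCollectorDemo {

    static class Collector {
        private String text = "";

        // Mirrors the clearText() method added to MetadataCollector in the patch.
        void clearText() {
            text = "";
        }

        void notifyProperty(String name, String value) {
            text += value + "\n"; // tag values are still appended within one parse
        }

        String getText() {
            return text;
        }
    }

    static class Parser {
        private final Collector collector = new Collector();

        String parse(List<String> tagValues) {
            collector.clearText(); // discard text left over from the previous file
            for (String v : tagValues) {
                collector.notifyProperty("TIT2-Text", v);
            }
            return collector.getText();
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser();
        parser.parse(List.of("Song A"));
        System.out.print(parser.parse(List.of("Song B"))); // prints only "Song B"
    }
}
```

Clearing only the text is the smallest possible change. However, if the collector holds other per-file fields (the diff shows it also calls setTitle), those may still carry over from the previous file. In addition, a single shared collector is not safe if parse is called from several threads at once. Creating a new collector for each parse would avoid both issues, at the cost of one small allocation per file.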





-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
