From: Jack Tang
To: nutch-dev@incubator.apache.org
Date: Wed, 15 Jun 2005 18:27:03 +0800
Message-ID: <1263a5c70506150327db147dc@mail.gmail.com>
Subject: Nutch Query

Hi All,

I have customized some query filters over the past two weeks, and I have one question.

As I mentioned in my previous email, the target website is made up of two parts: text-only and graphic. My goal is to tag the index with "textonly" and "graphic". Here are two approaches to reach that goal. Both query filters implement FieldQueryFilter.

1. Tag the content (parse.getText()) with the name ("textonly" or "graphic"), so the query string looks like:

     textonly:queryString
   or
     graphic:queryString

2. Add another field named "version", whose available values are "textonly" and "graphic", so the query string looks like:

     version:textonly queryString
   or
     version:graphic queryString

In my eyes, if queryString is the same, the search results of the two approaches should be the same. Right? But in my test, the latter query filter shows all textonly/graphic pages and ignores the queryString. The first one seems OK. So, can someone explain this in more detail?

BTW, in Query.class:

  Query:      version:graphic file
  Parsed:     version:graphic file
  Translated: +version:graphic +(url:file^4.0 anchor:file^2.0 content:file)

Regards
/Jack
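
The expectation behind the translated query above can be made concrete with a small sketch. This is a toy model in plain Python, not Nutch or Lucene code: the document layout, the helper names `score` and `search`, and the "sum of boosts" scoring are all assumptions made for illustration. It only shows what "+clause +clause" is expected to mean, namely that a page must match the version clause and at least one of url/anchor/content for the query term.

```python
# Toy model of the translated query
#   +version:graphic +(url:file^4.0 anchor:file^2.0 content:file)
# Plain Python only; this is NOT Nutch or Lucene code. The document
# layout and the helper names are assumptions for illustration.

# Boosts copied from the translated query (content has an implicit 1.0).
DEFAULT_FIELDS = {"url": 4.0, "anchor": 2.0, "content": 1.0}


def score(doc, version, term):
    """Return a score > 0 only if BOTH required clauses match, else 0.0."""
    # Required clause 1: +version:<version>
    if doc.get("version") != version:
        return 0.0
    # Required clause 2: +(url:term anchor:term content:term)
    # At least one of the optional sub-clauses has to match.
    total = 0.0
    for field, boost in DEFAULT_FIELDS.items():
        if term in doc.get(field, []):
            total += boost  # toy scoring: sum of boosts, not Lucene's formula
    return total


def search(docs, version, term):
    """Return the ids of matching documents, best score first."""
    scored = [(score(d, version, term), d["id"]) for d in docs]
    hits = [(s, i) for s, i in scored if s > 0.0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in hits]


# Hypothetical pages: each field is already split into terms.
DOCS = [
    {"id": "g1", "version": "graphic", "url": ["example", "file"],
     "anchor": [], "content": ["download", "the", "file"]},
    {"id": "g2", "version": "graphic", "url": ["example", "gallery"],
     "anchor": [], "content": ["picture", "gallery"]},
    {"id": "g3", "version": "graphic", "url": ["example", "docs"],
     "anchor": ["file"], "content": ["manual"]},
    {"id": "t1", "version": "textonly", "url": ["example", "list"],
     "anchor": [], "content": ["file", "listing"]},
]

if __name__ == "__main__":
    print(search(DOCS, "graphic", "file"))
    print(search(DOCS, "textonly", "file"))
```

In this model the graphic page "g2" is excluded because it does not contain "file" in any of the three default fields. If a real index returns every graphic page for the same query, the second required clause is presumably not being applied the way the translated string suggests.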