manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: URL Mappings
Date Thu, 16 Apr 2015 10:20:43 GMT
Hi Luca,

It has been a long time since I've looked at this, and I don't see that
anyone wrote up this feature in the documentation either.  The problem is
in the mapping target format.  See the code below:

>>>>>>
    protected EvaluatorToken nextToken()
      throws ManifoldCFException
    {
      char x;
      // Fetch the next token
      while (true)
      {
        if (pos == text.length())
          return null;
        x = text.charAt(pos);
        if (x > ' ')
          break;
        pos++;
      }

      StringBuilder sb;

      if (x == '"')
      {
        // Parse text
        pos++;
        sb = new StringBuilder();
        while (true)
        {
          if (pos == text.length())
            break;
          x = text.charAt(pos);
          pos++;
          if (x == '"')
          {
            break;
          }
          if (x == '\\')
          {
            if (pos == text.length())
              break;
            x = text.charAt(pos++);
          }
          sb.append(x);
        }

        return new EvaluatorToken(sb.toString());
      }

      if (x == ',')
      {
        pos++;
        return new EvaluatorToken();
      }

      // Eat number at beginning
      sb = new StringBuilder();
      while (true)
      {
        if (pos == text.length())
          break;
        x = text.charAt(pos);
        if (x >= '0' && x <= '9')
        {
          sb.append(x);
          pos++;
          continue;
        }
        break;
      }
      String numberValue = sb.toString();
      int groupNumber = 0;
      if (numberValue.length() > 0)
        groupNumber = new Integer(numberValue).intValue();
      // Save the next char position
      int modifierPos = pos;
      // Go to the end of the word
      while (true)
      {
        if (pos == text.length())
          break;
        x = text.charAt(pos);
        if (x == ',' || x >= '0' && x <= '9' || x <= ' ' && x >= 0)
          break;
        pos++;
      }

      int style = EvaluatorToken.GROUPSTYLE_NONE;
      if (modifierPos != pos)
      {
        String modifier = text.substring(modifierPos,pos);
        if (modifier.startsWith("u"))
          style = EvaluatorToken.GROUPSTYLE_UPPER;
        else if (modifier.startsWith("l"))
          style = EvaluatorToken.GROUPSTYLE_LOWER;
        else if (modifier.startsWith("m"))
          style = EvaluatorToken.GROUPSTYLE_MIXED;
        else
          throw new ManifoldCFException("Unknown style: "+modifier);
      }
      return new EvaluatorToken(groupNumber,style);
    }
  }

<<<<<<

Basically, this is an *old style* mapping specification, dating from before
2009, and it was never upgraded to accept "new style" target
specifications.  You need to put any literal new text in quote marks
("), refer to groups using tokens that begin with a group number
and end with nothing or "u", "l", or "m", and separate all fields with
commas.

So try this as a target expression:

1
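
To make the rules above concrete, here is a minimal standalone sketch of the same tokenizing logic. It is a simplified re-creation for illustration, not the ManifoldCF source: the class name TargetSketch, the string-encoded tokens, and the expand() helper are all hypothetical, and how the real connector applies the tokens to a matched URL is an assumption here. It shows why a target of $(1) is rejected (there is no leading group number, so "$(" is read as a style modifier) while a bare 1 is accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified re-creation of the old-style target tokenizer.
class TargetSketch {

  // Returns "T:<text>" for quoted text and "G:<group>:<style>" for a group
  // reference. Commas only separate fields, so this sketch skips them.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < text.length()) {
      char x = text.charAt(pos);
      if (x <= ' ' || x == ',') { pos++; continue; }
      if (x == '"') {
        // Quoted literal text; a backslash escapes the next character
        pos++;
        StringBuilder sb = new StringBuilder();
        while (pos < text.length()) {
          x = text.charAt(pos++);
          if (x == '"') break;
          if (x == '\\') {
            if (pos == text.length()) break;
            x = text.charAt(pos++);
          }
          sb.append(x);
        }
        tokens.add("T:" + sb);
        continue;
      }
      // Optional group number, then an optional style modifier
      int start = pos;
      while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') pos++;
      int group = pos > start ? Integer.parseInt(text.substring(start, pos)) : 0;
      int modifierPos = pos;
      while (pos < text.length()) {
        x = text.charAt(pos);
        if (x == ',' || (x >= '0' && x <= '9') || x <= ' ') break;
        pos++;
      }
      String modifier = text.substring(modifierPos, pos);
      String style;
      if (modifier.isEmpty()) style = "none";
      else if (modifier.startsWith("u")) style = "upper";
      else if (modifier.startsWith("l")) style = "lower";
      else if (modifier.startsWith("m")) style = "mixed";
      else throw new IllegalArgumentException("Unknown style: " + modifier);
      tokens.add("G:" + group + ":" + style);
    }
    return tokens;
  }

  // Assumption: the mapped URL is the concatenation of the evaluated tokens,
  // and a URL that does not match the regex is returned unchanged.
  static String expand(String regex, String target, String url) {
    Matcher m = Pattern.compile(regex).matcher(url);
    if (!m.find()) return url;
    StringBuilder out = new StringBuilder();
    for (String token : tokenize(target)) {
      if (token.startsWith("T:")) { out.append(token.substring(2)); continue; }
      String[] parts = token.split(":");
      String value = m.group(Integer.parseInt(parts[1]));
      if (value == null) value = "";
      if (parts[2].equals("upper")) value = value.toUpperCase(Locale.ROOT);
      else if (parts[2].equals("lower")) value = value.toLowerCase(Locale.ROOT);
      // "mixed" is left unchanged: its real behavior is not shown in this thread
      out.append(value);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(tokenize("1"));                                       // [G:1:none]
    System.out.println(expand("(.*)/$", "1", "http://example.com/docs/"));   // http://example.com/docs
    try {
      tokenize("$(1)");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());                                    // Unknown style: $(
    }
  }
}
```

Under these assumptions, a target such as "http://",1l would tokenize to a literal followed by a lower-cased reference to group 1.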

Thanks,
Karl



On Thu, Apr 16, 2015 at 4:28 AM, Basso Luca <
LBasso@regione.emilia-romagna.it> wrote:

>  Here it is…
>
>
>
> ERROR 2015-04-15 12:23:27,006 (Worker thread '47') - Exception tossed: Unknown style: $(
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unknown style: $(
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$EvaluatorTokenStream.nextToken(WebcrawlerConnector.java:7512)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$EvaluatorTokenStream.peek(WebcrawlerConnector.java:7407)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$MappingRule.map(WebcrawlerConnector.java:7636)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$MappingRules.map(WebcrawlerConnector.java:7715)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$DocumentURLFilter.isDocumentIndexable(WebcrawlerConnector.java:8061)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocument(WebcrawlerConnector.java:1315)
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:747)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:379)
>
>
>
> Luca
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, April 15, 2015 16:38
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: URL Mappings
>
>
>
> Can you include the full text of the error?
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Wed, Apr 15, 2015 at 9:33 AM, Basso Luca <
> LBasso@regione.emilia-romagna.it> wrote:
>
> Hi Karl,
>
> we are running ManifoldCF 2.0.2 with the Web Repository connector and the
> Solr Output connector,
>
> which are working pretty well, but as soon as we try to use the ‘URL
> Mappings’ tab in the job definition,
>
> in order to remove a final slash, with the following simple regex:
>
>
>
> (.*)/$     -->    $(1)
>
>
>
> we get an ‘Unknown style…’ error.
>
> Please note that we successfully tested this rule in advance here:
>
> http://www.regular-expressions.info/javascriptexample.html
>
>
>
> What is going wrong?
>
> Is our syntax correct?
>
> Could you please test this on your reference system?
>
>
>
> Thank you.
>
>
>
> Best regards,
>
> Luca
>
>
>
>
>
>
>
