lucene-dev mailing list archives

From Jan Høydahl (JIRA) <>
Subject [jira] [Commented] (SOLR-7107) bin/post example should use for crawls
Date Sat, 14 Feb 2015 00:04:12 GMT


Jan Høydahl commented on SOLR-7107:

Crawling with {{bin/post}} fails with 500 errors because a number of CMS pages
lack the {{<html>}} and {{</html>}} tags. I don't know the history of this;
was it intentional? I tried to fix it, but it's a bit confusing.

I *think* we're fine if all templates referred to from {{lib/}} have {{<html>}}
tags added and none of them include each other. Currently, {{core.html}} is both a top-level page
and also included from {{mirrors-core-latest-redir.html}} and {{mirrors-core-redir.html}}
for some reason.
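
To find which templates still lack the wrapper tags, a small scan could help. This is only a rough sketch and not part of the attached patch: the function names {{missing_html_tags}} and {{scan_templates}} are made up for illustration, and it assumes the templates are {{*.html}} files under a directory you pass in. It does not detect templates that include each other.

```python
import re
from pathlib import Path

# Match an opening <html> tag (with or without attributes) and a closing </html>.
HTML_OPEN = re.compile(r"<html[\s>]", re.IGNORECASE)
HTML_CLOSE = re.compile(r"</html\s*>", re.IGNORECASE)


def missing_html_tags(text):
    """Return the wrapper tags that do not appear anywhere in text."""
    missing = []
    if not HTML_OPEN.search(text):
        missing.append("<html>")
    if not HTML_CLOSE.search(text):
        missing.append("</html>")
    return missing


def scan_templates(root):
    """Map each *.html file under root (hypothetical layout) to the tags it lacks."""
    report = {}
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        missing = missing_html_tags(text)
        if missing:
            report[str(path)] = missing
    return report
```

Calling {{scan_templates}} on the template directory returns only the files that need attention, so an empty result would suggest that every template has both tags.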

To reproduce the crawl errors:
bin/post -c gettingstarted

> bin/post example should use for crawls
> --------------------------------------------------------
>                 Key: SOLR-7107
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: scripts and tools
>            Reporter: Jan Høydahl
>            Assignee: Erik Hatcher
>            Priority: Minor
>             Fix For: 5.1
>         Attachments: SOLR-7107.patch
> We should not encourage crawl of non-ASF sites in examples and tutorials. The {{bin/post}}
> script will be changed from crawling to
> However, there are some bad 500 errors from Tika complaining about not well-formed HTML
> code on our site, so I'm committing some CMS fixes for that first.

This message was sent by Atlassian JIRA
