manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Problems while indexing Jira/ and an other problem
Date Mon, 25 Oct 2010 13:37:19 GMT
A couple of comments.
(1) The place where you start is called the "seed".  Let's use that
terminology so I don't get confused.
(2) Every session-protected area has a regular expression that is
supposed to match all protected pages in the area.  In the connection
edit page Access Credentials tab, for Session authentication, this
appears as a column called "URL regular expression".  It is this
expression I was referring to when I said that it seemed incorrect for
your task.  This has nothing to do with the seeds for the crawl.
(3) Even with a JavaScript login, it is usually possible to fill in the
form with information that performs the login.  The only exception is
when some complex logic, such as MD5 or string manipulation, is used
to fill in the form fields.  A good way to do it is to read the
JavaScript, figure out what it is trying to do, and then just fill
in the form fields with what the JavaScript would have done.
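
To illustrate point (2), here is a minimal Python sketch, not ManifoldCF
code. It assumes the connector treats a URL as protected when the "URL
regular expression" is found anywhere in it (Python's re.search); the
connector itself uses Java regular expressions, so details may differ. The
host jira.example.com and the helper in_protected_area are hypothetical,
since the thread elides the real host as "http://.../".

```python
import re

# Hypothetical host: the thread elides the real one as "http://.../".
BASE = "http://jira.example.com/jira"

# The narrow expression from the thread versus a broader one
# that covers the whole Jira area.
NARROW = re.compile(re.escape(BASE) + r"/secure/IssueNavigator\.jspa")
BROAD = re.compile(re.escape(BASE) + r"/")


def in_protected_area(pattern, url):
    """Return True if the pattern is found in the URL.

    Assumption: substring-style matching stands in for whatever
    matching rule the connector actually applies.
    """
    return pattern.search(url) is not None


urls = [
    BASE + "/secure/IssueNavigator.jspa",
    BASE + "/browse/project-5",
    BASE + "/login.jsp",
]

# Print, for each URL, whether the narrow and the broad expression cover it.
for url in urls:
    print(url, in_protected_area(NARROW, url), in_protected_area(BROAD, url))
```

Under these assumptions the narrow expression covers only the issue
navigator page, so an issue page such as .../jira/browse/project-5 falls
outside the protected area and would not receive the session cookies.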

Hope this helps.  I'd actually try crawling a Jira instance myself to
be of further assistance, but I'm really pressed for time right now.

Karl

On Mon, Oct 25, 2010 at 3:14 PM, Fred Schmitt <fredschmitt83@web.de> wrote:
> Hi,
> Sorry, it doesn't work, or I misunderstood you. Here is what I tried.
>
> I started at "http://.../jira/". This site redirects me to "http://.../jira/secure/Dashboard.jspa".
> That doesn't work, and I have to exclude many pages because I get content I don't want, and the
> login is not really working.
> "http://.../jira/secure/Dashboard.jspa" is another login page, but it needs JavaScript
> to log in, and that doesn't work.
> I have to be logged in for "jira/secure/IssueNavigator", because there I can browse through
> my projects, and also for the projects under "jira/browse/projectname" and their issues
> under "jira/browse/projectname-issuenumber".
>
> At the moment the only way is to log in for "http://.../jira/IssueNavigator.jspa"
> and start there, and then write another login sequence for each project, "http://.../jira/browse/projectname".
> Then I am logged in for all issues of that project, but only for one project per login sequence.
>
> Best regards,
> Fred
>
> -----Original Message-----
> From: "Karl Wright" <daddywri@gmail.com>
> Sent: 25.10.2010 11:29:16
> To: connectors-user@incubator.apache.org
> Subject: Re: Problems while indexing Jira/ and an other problem
>
>>Fred, did this answer help you?  Are you all set now?
>>Karl
>>
>>On Sat, Oct 23, 2010 at 3:20 AM, Karl Wright <daddywri@gmail.com> wrote:
>>> The web connector will not send a secured site's cookies to a page
>>> that does not match the regular expression that defines the overall
>>> secured area.  Your secured area URL seems to be
>>> "http://.../jira/secure/IssueNavigator.jspa", which is obviously not
>>> sufficient.
>>>
>>> Karl
>>>
>>> On Fri, Oct 22, 2010 at 4:21 AM, Fred Schmitt <fredschmitt83@web.de> wrote:
>>>> Hi,
>>>>
>>>> thanks for the quick answer.
>>>> I have tested your suggestion, started at the page http://..../jira/secure/IssueNavigator.jspa,
>>>> and wrote a login sequence.
>>>> Here is an extract of the Access Credentials of my web connector.
>>>>
>>>> URL regular expression
>>>> http://.../jira/secure/IssueNavigator.jspa
>>>>
>>>> Login Pages
>>>> Login URL regular expression   Page type   Form name/link target regular expression
>>>> (empty)                        link        http://.../jira/login.jsp
>>>> http://.../jira/login.jsp      form        (empty)
>>>>
>>>> Override form parameters (for the "form" page)
>>>> Parameter regular expression   Value   Password
>>>> username                       fred
>>>> password                               ******
>>>>
>>>> I am logged in and can browse through all issues, but when I fetch and index
>>>> an issue, for example "http://.../jira/browse/project-5", I get the message that I am not
>>>> logged in anymore. It works when I write a login sequence for this project, but only for the
>>>> specified one and its issues. That means that I have to write a login sequence for each project.
>>>> How can I solve this problem and log in for all pages without writing many
>>>> login sequences?
>>>> I have already tried to write a login sequence which included "http://.../jira/browse/*"
>>>> or "http://.../jira/browse/", but neither worked.
>>>>
>>>> Best regards,
>>>> Fred
>>>>
>>>> -----Original Message-----
>>>> From: "Karl Wright" <daddywri@gmail.com>
>>>> Sent: 20.10.2010 13:11:23
>>>> To: connectors-user@incubator.apache.org
>>>> Subject: Re: Problems while indexing Jira/ and an other problem
>>>>
>>>>>Hi,
>>>>>I think you should open a JIRA ticket for the Windows Share connector.
>>>>> It sounds like the javascript for handling the insert link might be
>>>>>broken in the UI.
>>>>>
>>>>>As for the web session login, the MCF crawler of course handles
>>>>>cookies - that is a major piece of session authentication.  The
>>>>>question is whether it is recording the cookie set that happens as a
>>>>>result of the login sequence.  What you want to be sure of is that all
>>>>>the parts of the login, including the final redirection back to the
>>>>>content page, are considered part of the login sequence.  You also
>>>>>want to be sure that you don't use as your seed URL the login page
>>>>>itself, because then there is no place to resume when the login is
>>>>>done.  Instead you want a seed which is the root or home page.  If
>>>>>login is mandatory, then presumably there would be a redirection that
>>>>>takes you to the login page.  That redirection should *also* be part
>>>>>of the login sequence.
>>>>>In short, the login sequence needs to cover every fetch that isn't
>>>>>actual indexable content.  The cookies that are set at the end of that
>>>>>sequence are what will be retained for all subsequent fetches from the
>>>>>protected area of the site that you specify with your url regular
>>>>>expression.
>>>>>
>>>>>Hope this helps.
>>>>>
>>>>>Karl
>>>>
>>>
