manifoldcf-user mailing list archives

From jetnet <jet...@gmail.com>
Subject Re: Session-based authentication
Date Thu, 07 Jul 2016 12:49:39 GMT
hi Karl,

the problem was that the seeding URL used the short host name, not the FQDN.
So, the default cookie policy works with FQDNs only.
That's why the obtained cookies were never used for further requests.
Changing the seeding URL to the "full host name" format solved the problem.
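
A minimal illustration of how a cookie domain-matching rule can reject a dotless host name. It uses the JDK's java.net.HttpCookie.domainMatches, which follows the RFC 2965 rule; this is not the HttpClient code path that ManifoldCF uses, so treat it as a sketch of the general idea only. The name wikisite.example.com is a made-up FQDN, since the real one is not given in this thread, and wouldSend is a hypothetical helper.

```java
import java.net.HttpCookie;

class CookieDomainCheck {
    // Hypothetical helper: asks the JDK's RFC 2965 rule whether a cookie
    // scoped to cookieDomain may be sent to requestHost.
    static boolean wouldSend(String cookieDomain, String requestHost) {
        return HttpCookie.domainMatches(cookieDomain, requestHost);
    }

    public static void main(String[] args) {
        // Short host name, as in the seeding URL http://wikisite/sitemap.xml
        System.out.println("wikisite: " + wouldSend("wikisite", "wikisite"));
        // Assumed FQDN, for illustration only
        System.out.println("wikisite.example.com: "
                + wouldSend("wikisite.example.com", "wikisite.example.com"));
    }
}
```

Under this rule a domain with no embedded dot never matches, which is the same kind of behaviour as described above.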

jeeeez, that was a weird one...

How about adding the following line to the code?

cookie.setAttribute(ClientCookie.DOMAIN_ATTR, "true");

Thanks!
Konstantin
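
For anyone repeating this diagnosis: the header log quoted further down shows no outgoing Cookie header at all, even though the connector logged "Adding 2 cookies". A rough sketch of automating that check follows. It assumes each log entry is on a single line (the mail client wrapped them in this thread) and that outgoing headers are marked with ">> " as in the quoted log. WireLogCookieCheck and sentCookieHeaders are hypothetical names, not part of ManifoldCF or HttpClient.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WireLogCookieCheck {
    // Text that precedes an outgoing Cookie header in the header log.
    private static final String MARKER = ">> Cookie:";

    // Returns the value of every outgoing Cookie header found in the log lines.
    static List<String> sentCookieHeaders(List<String> logLines) {
        List<String> found = new ArrayList<>();
        for (String line : logLines) {
            int i = line.indexOf(MARKER);
            if (i >= 0) {
                found.add(line.substring(i + MARKER.length()).trim());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Request lines taken from the log quoted in this thread, unwrapped.
        List<String> fromThread = Arrays.asList(
            "DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> GET /sitemap.xml HTTP/1.1",
            "DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> Host: wikisite",
            "DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> Connection: Keep-Alive");
        List<String> sent = sentCookieHeaders(fromThread);
        System.out.println(sent.isEmpty() ? "no Cookie header was sent" : "Cookie headers: " + sent);
    }
}
```

An empty result for a request that should carry session cookies points at the cookie policy rather than at the login sequence.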

2016-07-07 13:24 GMT+02:00 Karl Wright <daddywri@gmail.com>:
> Hi Konstantin,
>
> The mock site that the test crawls and logs into is generated by
> MockSessionWebService.java, under
> connectors/webcrawler/connector/src/test/java/org/apache/manifoldcf/crawler/connectors/webcrawler/tests.
> It does almost precisely what your site is doing.  The test itself is
> SessionTester.java.  Your setup should be similar to how the test sets up
> the login sequence and protected content area.
>
> Thanks,
> Karl
>
>
> On Thu, Jul 7, 2016 at 7:17 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> Hi Konstantin,
>>
>> There is an advanced Web Connector integration test, which currently
>> passes, that tests session login and cookie transmission.  I'll look over
>> the test to be sure it is complete, but if so you should really be looking
>> at your login sequence and verifying that the cookie set takes place in a
>> request that is part of the login sequence.
>>
>> Thanks,
>> Karl
>>
>>
>> On Thu, Jul 7, 2016 at 6:58 AM, jetnet <jetnet@gmail.com> wrote:
>>>
>>> Thanks for the hint regarding the httpclient logging!
>>> So, it turned out, the cookies do NOT get added to the request:
>>>
>>> DEBUG 2016-07-07 12:49:26,015 (Worker thread '4') - WEB: Get method
>>> for '/sitemap.xml'
>>> DEBUG 2016-07-07 12:49:26,015 (Worker thread '4') - WEB: Adding 2
>>> cookies for '/sitemap.xml'
>>> DEBUG 2016-07-07 12:49:26,015 (Worker thread '4') - WEB:  Cookie
>>> '[version: 0][name: PHPSESSID][value:
>>> 8jegbs2dqb6r9oc3mb4pt0q777][domain: wikisite][path: /][expiry: null]'
>>> added
>>> DEBUG 2016-07-07 12:49:26,015 (Worker thread '4') - WEB:  Cookie
>>> '[version: 0][name: authtoken][value:
>>> 920_636034784213249598_d2f40072be60b4de7bee72d74fc04400][domain:
>>> wikisite][path: /][expiry: Thu Jul 14 10:53:41 CEST 2016]' added
>>> DEBUG 2016-07-07 12:49:26,030 (Thread-1214) - CookieSpec selected:
>>> standard
>>> DEBUG 2016-07-07 12:49:26,093 (Thread-1214) - Auth cache not set in the
>>> context
>>> DEBUG 2016-07-07 12:49:26,093 (Thread-1214) - Connection request:
>>> [route: {}->http://wikisite:80][total kept alive: 0; route allocated:
>>> 0 of 1; total allocated: 0 of 20]
>>> DEBUG 2016-07-07 12:49:26,140 (Thread-1214) - Connection leased: [id:
>>> 0][route: {}->http://wikisite:80][total kept alive: 0; route
>>> allocated: 1 of 1; total allocated: 1 of 20]
>>> DEBUG 2016-07-07 12:49:26,140 (Thread-1214) - Opening connection
>>> {}->http://wikisite:80
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - Connecting to
>>> wikisite/10.0.0.100:80
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - Connection established
>>> 10.0.0.184:58501<->10.0.0.100:80
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0: set
>>> socket timeout to 300000
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - Executing request GET
>>> /sitemap.xml HTTP/1.1
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - Target auth state:
>>> UNCHALLENGED
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - Proxy auth state:
>>> UNCHALLENGED
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> GET
>>> /sitemap.xml HTTP/1.1
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >>
>>> User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler;
>>> email@wikisite.com)
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> From:
>>> email@wikisite.com
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> Accept:
>>> */*
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >>
>>> Accept-Encoding: gzip,deflate
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >> Host:
>>> wikisite
>>> DEBUG 2016-07-07 12:49:26,155 (Thread-1214) - http-outgoing-0 >>
>>> Connection: Keep-Alive
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 << HTTP/1.1
>>> 200 OK
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> Content-Type: application/xml; charset=utf-8
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> Server: Microsoft-IIS/7.5
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> X-Powered-By: PHP/5.2.14
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> Set-Cookie: PHPSESSID=bk9487elppchvshc38c7pfnv01; path=/; HttpOnly
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> X-Powered-By: ASP.NET
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 << Date:
>>> Thu, 07 Jul 2016 10:49:38 GMT
>>> DEBUG 2016-07-07 12:49:37,768 (Thread-1214) - http-outgoing-0 <<
>>> Content-Length: 684207
>>>
>>>
>>> Jira ticket? :)
>>>
>>> Thanks,
>>> Konstantin
>>>
>>>
>>> 2016-07-07 12:37 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>> > It really does add cookies as stated.
>>> >
>>> > That doesn't mean, however, that the cookies being sent correspond to a
>>> > session that is correctly logged in.  There's no way to tell this from
>>> > the logs.
>>> >
>>> > You can possibly get more information about the back-and-forth by
>>> > enabling httpcomponents/httpclient wire logging.  Headers only should be
>>> > sufficient.  You should see the exact cookies and be able to verify that
>>> > the cookies sent are the ones that were returned.  You still won't be
>>> > able to tell if the login was successful or not.
>>> >
>>> > Karl
>>> >
>>> >
>>> >
>>> > On Thu, Jul 7, 2016 at 6:25 AM, jetnet <jetnet@gmail.com> wrote:
>>> >>
>>> >> ok, so, it means, that I do not need the 3rd stage at all? As the
>>> >> second stage (form authentication) records the cookies and redirects
>>> >> back:
>>> >>
>>> >> the second stage:
>>> >>
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post method
>>> >> for '/Special:UserLogin'
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post
>>> >> parameter name 'username' value 'someuser' for '/Special:UserLogin'
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post
>>> >> parameter name 'returntourl' value 'http://wikisite/sitemap.xml' for
>>> >> '/Special:UserLogin'
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Post
>>> >> parameter name 'password' value 'XXXXXX' for '/Special:UserLogin'
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB: Adding 2
>>> >> cookies for '/Special:UserLogin'
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB:  Cookie
>>> >> '[version: 0][name: PHPSESSID][value:
>>> >> bughgf8fbjkkevk79ot4ef2vj1][domain: wikisite][path: /][expiry: null]'
>>> >> added
>>> >> DEBUG 2016-07-07 10:52:48,231 (Worker thread '79') - WEB:  Cookie
>>> >> '[version: 0][name: authtoken][value:
>>> >> 920_636034352097041592_136c71f2ac1fc2dd1ba72de805fcd1b5][domain:
>>> >> wikisite][path: /][expiry: Wed Jul 13 22:53:29 CEST 2016]' added
>>> >> DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB: Retrieving
>>> >> cookies...
>>> >> DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB:   Cookie
>>> >> '[version: 0][name: PHPSESSID][value:
>>> >> 589h3f20tjndhkc391nu5u0u51][domain: wikisite][path: /][expiry: null]'
>>> >> DEBUG 2016-07-07 10:52:48,434 (Worker thread '79') - WEB:   Cookie
>>> >> '[version: 0][name: authtoken][value:
>>> >> 920_636034783686256706_585415102d050458acfd91a9d1f223d5][domain:
>>> >> wikisite][path: /][expiry: Thu Jul 14 10:52:48 CEST 2016]'
>>> >>  INFO 2016-07-07 10:52:48,449 (Worker thread '79') - WEB: FETCH
>>> >> LOGIN|http://wikisite/Special:UserLogin|1467881568231+218|302|153|
>>> >> DEBUG 2016-07-07 10:52:48,449 (Worker thread '79') - WEB: Document
>>> >> 'http://wikisite/Special:UserLogin' did not match expected form, link,
>>> >> redirection, or content for sequence 'wikisite'
>>> >>
>>> >> so, the last message means, nothing matches in the sequence anymore -
>>> >> logon end.
>>> >> And the last two cookies are being used for the next fetch of the
>>> >> sitemap, but its content still matches the public pattern.
>>> >>
>>> >> Strange things happen... I just tried to use the authtoken cookie from
>>> >> the log directly in the browser - and it gets authenticated without
>>> >> problems: I get the "private" content. But ManifoldCF does not...
>>> >> weird...
>>> >>
>>> >> DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB: Adding 2
>>> >> cookies for '/sitemap.xml'
>>> >> DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB:  Cookie
>>> >> '[version: 0][name: PHPSESSID][value:
>>> >> 589h3f20tjndhkc391nu5u0u51][domain: wikisite][path: /][expiry: null]'
>>> >> added
>>> >> DEBUG 2016-07-07 10:52:48,543 (Worker thread '79') - WEB:  Cookie
>>> >> '[version: 0][name: authtoken][value:
>>> >> 920_636034783686256706_585415102d050458acfd91a9d1f223d5][domain:
>>> >> wikisite][path: /][expiry: Thu Jul 14 10:52:48 CEST 2016]' added
>>> >>  INFO 2016-07-07 10:52:58,500 (Worker thread '79') - WEB: FETCH
>>> >> URL|http://wikisite/sitemap.xml|1467881568543+9957|200|684072|
>>> >>
>>> >> size: 684072 - is public content.
>>> >>
>>> >> Does it **really** add the cookies to the request? :)
>>> >>
>>> >> Thanks!
>>> >> Konstantin
>>> >>
>>> >> 2016-07-07 11:44 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>> >> > "I thought, when the auth sequence is done
>>> >> > (exit login mode), the redirect to the original page happens
>>> >> > automatically (which is the case here, but somehow the content is
>>> >> > still "public")."
>>> >> >
>>> >> > That is correct BUT if the final redirection is what sets the cookies
>>> >> > THEN the cookies will only be recorded by the web connector if the
>>> >> > final redirection is part of the login sequence.
>>> >> >
>>> >> > Thanks,
>>> >> > Karl
>>> >> >
>>> >> >
>>> >> > On Thu, Jul 7, 2016 at 5:33 AM, jetnet <jetnet@gmail.com> wrote:
>>> >> >>
>>> >> >> hi Karl,
>>> >> >> thank you for the very prompt feedback!
>>> >> >>
>>> >> >> > 1) Have you made sure to include the redirection back to the
>>> >> >> > content?
>>> >> >> This is the step I don't quite understand - could you please clarify
>>> >> >> how that could be done? I thought, when the auth sequence is done
>>> >> >> (exit login mode), the redirect to the original page happens
>>> >> >> automatically (which is the case here, but somehow the content is
>>> >> >> still "public").
>>> >> >>
>>> >> >> > 2) your check for *entering* the login sequence is too broad and
>>> >> >> > fires again even though the private sitemap page is being returned.
>>> >> >> totally agree, that's why the first step is to look into the content
>>> >> >> of the page, to check, if there is a pattern which appears in the
>>> >> >> public version ONLY.
>>> >> >> This is the only solution I can imagine so far, but any ideas - very
>>> >> >> welcome!
>>> >> >>
>>> >> >> The simple history shows basically the same - the process never
>>> >> >> leaves the login stage.
>>> >> >>
>>> >> >> If I remove the 3rd step, then I see, that the login stage is over
>>> >> >> (logon end), but as the content of the sitemap.xml is still "public",
>>> >> >> the login process kicks in again.
>>> >> >>
>>> >> >> Thanks!
>>> >> >> Konstantin
>>> >> >>
>>> >> >> 2016-07-07 11:07 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>> >> >> > Hi Konstantin,
>>> >> >> >
>>> >> >> > There are two possibilities:
>>> >> >> >
>>> >> >> > (1) You have missed one stage when specifying the login sequence.
>>> >> >> > The cookies are getting set, but not during a step that's part of
>>> >> >> > the login sequence.  Have you made sure to include the redirection
>>> >> >> > back to the content?
>>> >> >> > (2) You really are logging in but your check for *entering* the
>>> >> >> > login sequence is too broad and fires again even though the private
>>> >> >> > sitemap page is being returned.
>>> >> >> >
>>> >> >> > You can also look at the simple history to get an idea of what MCF
>>> >> >> > is doing for your job for session handling.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Karl
>>> >> >> >
>>> >> >> >
>>> >> >> > On Thu, Jul 7, 2016 at 4:35 AM, jetnet <jetnet@gmail.com> wrote:
>>> >> >> >>
>>> >> >> >> Hi All,
>>> >> >> >>
>>> >> >> >> I've been trying to set up a session-based auth sequence for a
>>> >> >> >> forked MediaWiki site (Wiki connector does not work with this
>>> >> >> >> version), but somehow got stuck with the configuration.
>>> >> >> >> The idea is to index the site using its sitemap.xml with hops=1.
>>> >> >> >> The "public" version (user not logged in) of the sitemap.xml
>>> >> >> >> contains a different set of links than the "authenticated" one
>>> >> >> >> (user logged in).
>>> >> >> >> The current auth sequence looks like this (the job's seeding
>>> >> >> >> URL=http://wikisite/sitemap.xml):
>>> >> >> >>
>>> >> >> >> 1) the first call to the seeding URL should be redirected to the
>>> >> >> >> login page
>>> >> >> >> Login URL regexp: sitemap.xml
>>> >> >> >> Page type: content
>>> >> >> >> Identification regular expression: <some content from the "public"
>>> >> >> >> version>
>>> >> >> >> Override target URL: /Special:UserLogin
>>> >> >> >>
>>> >> >> >> 2) enter user's credentials on the login page
>>> >> >> >> Login URL regexp: Special:UserLogin
>>> >> >> >> Page type: form
>>> >> >> >> Override form parameters: username=someuser, password=******,
>>> >> >> >> returntourl=http://wikisite/sitemap.xml
>>> >> >> >>
>>> >> >> >> 3) the login page ***should*** redirect back to the seeding URL
>>> >> >> >> with the authorized content
>>> >> >> >> Login URL regexp: /Special:UserLogin
>>> >> >> >> Page type: redirection
>>> >> >> >> Identification regular expression: /sitemap.xml
>>> >> >> >>
>>> >> >> >> From the log file I can see that the first 2 steps work fine - the
>>> >> >> >> public content gets recognized, the form data get sent, the
>>> >> >> >> session's cookies get set. But the 3rd step returns the "public"
>>> >> >> >> version of the sitemap.xml again, and the login process is getting
>>> >> >> >> stuck in a loop.
>>> >> >> >> Am I on the right track or did I miss something?
>>> >> >> >>
>>> >> >> >> here is the log for the 3rd step:
>>> >> >> >>
>>> >> >> >>  INFO 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: FETCH
>>> >> >> >> LOGIN|http://wikisite/Special:UserLogin|1467838347082+203|302|153|
>>> >> >> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to
>>> >> >> >> match raw url 'http://wikisite/sitemap.xml'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Tried to
>>> >> >> >> match cooked url 'http://wikisite/sitemap.xml'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Redirection
>>> >> >> >> link lookup matched 'http://wikisite/sitemap.xml'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,285 (Worker thread '43') - WEB: Document
>>> >> >> >> 'http://wikisite/Special:UserLogin' matches preferred redirection, so
>>> >> >> >> determined to be login page for sequence 'wikisite'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Waiting for
>>> >> >> >> an HttpClient object
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: For
>>> >> >> >> http://wikisite/sitemap.xml, setting virtual host to wikisite
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Got an
>>> >> >> >> HttpClient object after 0 ms.
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Get method
>>> >> >> >> for '/sitemap.xml'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Adding 2
>>> >> >> >> cookies for '/sitemap.xml'
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie
>>> >> >> >> '[version: 0][name: PHPSESSID][value:
>>> >> >> >> 1vnhgi0f84dc9pi6eaoj0nau45][domain: wikisite][path: /][expiry: null]'
>>> >> >> >> added
>>> >> >> >> DEBUG 2016-07-06 22:52:27,394 (Worker thread '43') - WEB: Cookie
>>> >> >> >> '[version: 0][name: authtoken][value:
>>> >> >> >> 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain:
>>> >> >> >> wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]' added
>>> >> >> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Retrieving
>>> >> >> >> cookies...
>>> >> >> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie
>>> >> >> >> '[version: 0][name: PHPSESSID][value:
>>> >> >> >> vqfpr88pqa6d62nl6h4lp03nu1][domain: wikisite][path: /][expiry: null]'
>>> >> >> >> DEBUG 2016-07-06 22:52:35,660 (Worker thread '43') - WEB: Cookie
>>> >> >> >> '[version: 0][name: authtoken][value:
>>> >> >> >> 920_636034351472613318_616a5fd45ce4d5fed6c5318d73b38070][domain:
>>> >> >> >> wikisite][path: /][expiry: Wed Jul 13 22:52:27 CEST 2016]'
>>> >> >> >>  INFO 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: FETCH
>>> >> >> >> LOGIN|http://wikisite/sitemap.xml|1467838347394+9610|200|683773|
>>> >> >> >> DEBUG 2016-07-06 22:52:37,004 (Worker thread '43') - WEB: Document
>>> >> >> >> 'http://wikisite/sitemap.xml' is text, with encoding 'utf-8'; link
>>> >> >> >> extraction starting
>>> >> >> >> DEBUG 2016-07-06 22:52:37,019 (Worker thread '43') - WEB: Document
>>> >> >> >> 'http://wikisite/sitemap.xml' matches content, so determined to be
>>> >> >> >> login page for sequence 'wikisite'
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Thank you!
>>> >> >> >> regards, Konstantin
>>> >> >> >
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>
>>
>
