manifoldcf-user mailing list archives

From Gustavo Beneitez <gustavo.benei...@gmail.com>
Subject Re: web crawler not sharing cookies
Date Wed, 25 Jul 2018 20:35:59 GMT
I agree, but the fact is that if my "login sequence" defines a login
credential for domain "Z.com" and the crawler reaches "Y.Z.com" or
"X.Y.Z.com", neither sub-site receives that cookie. I had to write the
same cookie for every subdomain, which solves the problem (thankfully
it is a language cookie and not a dynamic one).

Regards.
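
For reference, the behaviour expected above corresponds to the standard
cookie domain-match rule (RFC 6265, section 5.1.3), under which a cookie
set for "z.com" is also sent to "y.z.com" and "x.y.z.com". The following
is only a minimal sketch of that rule, not ManifoldCF's actual code; the
function name domain_matches is hypothetical and used for illustration.

```python
import ipaddress


def domain_matches(host: str, cookie_domain: str) -> bool:
    """Sketch of the RFC 6265 domain-match rule (hypothetical helper).

    Returns True if a cookie stored for cookie_domain would be sent
    to host under the standard rule.
    """
    host = host.lower().rstrip(".")
    # A leading dot in the cookie's Domain attribute is ignored.
    cookie_domain = cookie_domain.lower().lstrip(".")

    # Identical strings always match.
    if host == cookie_domain:
        return True

    # Suffix matching applies only to host names, never to IP addresses.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass

    # The host must end with "." + cookie_domain, so that "notz.com"
    # does not match a cookie stored for "z.com".
    return host.endswith("." + cookie_domain)


if __name__ == "__main__":
    for candidate in ("z.com", "y.z.com", "x.y.z.com", "notz.com"):
        print(candidate, domain_matches(candidate, "z.com"))
```

If a crawler compared the cookie domain and the host for exact equality
instead, only the first of these hosts would receive the cookie, which
would be consistent with having to store one cookie row per subdomain.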

On Wed, 25 Jul 2018 at 19:17, Karl Wright (<daddywri@gmail.com>)
wrote:

> You should not need to fill the database by hand.  Your login sequence
> should include whatever redirection etc is used to set the cookies though.
>
> Karl
>
>
> On Wed, Jul 25, 2018 at 1:06 PM Gustavo Beneitez <
> gustavo.beneitez@gmail.com> wrote:
>
>> Hi again,
>>
>> Thanks Karl, I was able to do that after defining a "login
>> sequence", but only after also filling the database (cookiedata table)
>> with certain values because of "domain restrictions".
>> Before every web call, I suspect ManifoldCF only takes cookies whose
>> domain exactly matches the URL's host (e.g. x.y.z.com), so if you define
>> your cookie for "z.com" it won't be sent. I therefore added every
>> subdomain by hand, and it started to work.
>>
>> Regards.
>>
>>
>> On Fri, 20 Jul 2018 at 8:12, Gustavo Beneitez (<
>> gustavo.beneitez@gmail.com>) wrote:
>>
>>> Hi,
>>>
>>> Thanks a lot. I will check the documentation for an example
>>> of that.
>>>
>>> Regards!
>>>
>>> On Thu, 19 Jul 2018 at 21:54, Karl Wright (<daddywri@gmail.com>)
>>> wrote:
>>>
>>>> You are correct that cookies are not shared among threads.  That is by
>>>> design.
>>>>
>>>> The only way to set cookies for the WebConnector is to have there be a
>>>> "login sequence".  The login sequence sets cookies that are then used by
>>>> all subsequent fetches.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Thu, Jul 19, 2018 at 3:38 PM Gustavo Beneitez <
>>>> gustavo.beneitez@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I have tried to look for an answer before writing this email, no luck.
>>>>> Sorry for the inconvenience if it is already answered.
>>>>>
>>>>> I need to set a cookie at the beginning of the web crawl. The cookie
>>>>> determines the language in which content is served; there are several
>>>>> choices, and if no cookie is found a "default language" is used.
>>>>>
>>>>> I made a JSP which sets the cookie and contains several links (href),
>>>>> and pointed ManifoldCF to this page as the repository seed. I expected
>>>>> the crawling engine to fetch the links in the language indicated by
>>>>> the cookie, but what I actually got was a lot of content in the
>>>>> default language.
>>>>>
>>>>> My guess is that cookies are not shared between crawler threads,
>>>>> so cookies do not persist from one link to the next. The cookie
>>>>> domain is correct, and so is the cookie expiration.
>>>>>
>>>>> I would very much appreciate any help with this.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>>
>>>>>
