manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: URL Mapping
Date Thu, 28 May 2020 18:12:27 GMT
That's a much better case for using the url mapper, yes.
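
For illustration only, the staging-to-public rewrite described in the quoted message below could be sketched like this. It is written in Python just to show the regular expression and substitution. It is not ManifoldCF configuration syntax, and the function name `map_to_public_url` is hypothetical. It assumes every crawled URL on `mystaging-server.myco.com` has a counterpart at the same path on `www.myco.com`.

```python
import re

# Hypothetical mapping rule: match any http URL on the staging host and
# capture its path (if any). Illustrative only, not ManifoldCF syntax.
STAGING_PREFIX = re.compile(r"^http://mystaging-server\.myco\.com(/.*)?$")


def map_to_public_url(url: str) -> str:
    """Rewrite a staging URL to its public, load-balanced form."""
    match = STAGING_PREFIX.match(url)
    if match is None:
        # Not a staging URL; leave it unchanged.
        return url
    # Assumption: a bare host with no path maps to the site root.
    path = match.group(1) or "/"
    return "https://www.myco.com" + path


if __name__ == "__main__":
    print(map_to_public_url("http://mystaging-server.myco.com/index.html"))
```

URLs on any other host pass through untouched, so the rule only affects the staging server.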


On Thu, May 28, 2020 at 1:40 PM Michael Cizmar <michael.cizmar@mcplusa.com>
wrote:

> Right.  Another case that I'm exploring...crawling an internal site and
> wanting a load balanced url.  So you would crawl something like this:
>
> http://mystaging-server.myco.com/index.html
>
> and then want to change it to:
>
> https://www.myco.com/index.html
>
> Is that better for the url mapper?
>
>
>
> --
>
> Michael Cizmar
> Managing Director
>
> p: 312.585.6396
>
> d: 312.585.6286
> twitter: @michaelcizmar <http://twitter.com/michaelcizmar>
>
> http://www.mcplusa.com/
>
>
> ------------------------------
> *From:* Karl Wright <daddywri@gmail.com>
> *Sent:* Thursday, May 28, 2020 12:03 PM
> *To:* user@manifoldcf.apache.org <user@manifoldcf.apache.org>
> *Subject:* Re: URL Mapping
>
> Thanks!  It's far better to implement this than to try and hack it.  A
> general way of removing session information with regular expressions is
> probably not going to cut it either, so for now it's got to be in Java.
>
> Karl
>
>
> On Thu, May 28, 2020 at 12:47 PM Michael Cizmar <
> michael.cizmar@mcplusa.com> wrote:
>
> The "!ut" followed by a bunch of session information comes from WebSphere
> Portal.  Some information about it here:
>
> https://books.google.com/books?id=bqAXnpmj5LwC&pg=PA180&lpg=PA180&dq=%22!ut%22+session+variables+websphere#v=onepage&q=%22!ut%22%20session%20variables%20websphere&f=false
>
> I'll look at making a change to the web crawler to support this, like the
> BV and ASP.NET session handling.
>
> ------------------------------
> *From:* Karl Wright <daddywri@gmail.com>
> *Sent:* Thursday, May 28, 2020 11:41 AM
> *To:* user@manifoldcf.apache.org <user@manifoldcf.apache.org>
> *Subject:* Re: URL Mapping
>
> Hi,
>
> There are provisions in the URL canonicalization part of the world for
> removal of session information from the URL.  It only knows about some
> widely used kinds of sessions: Java app server sessions and Broadvision
> sessions, for example.  If you can convince me that your session
> information is (a) uniquely identifiable, and (b) commonly used, the proper
> approach is to incorporate session removal in this framework.  Please let
> me know.
>
> Karl
>
>
> On Thu, May 28, 2020 at 12:11 PM Michael Cizmar <
> michael.cizmar@mcplusa.com> wrote:
>
> I've got a really long url with a bunch of unnecessary session query
> string parameters.  I've been trying unsuccessfully to map it to the same
> url without the session.
>
> An example of the url is below.  I thought I could do this:
>
> url map regular expression:
>
> (.*)\/!ut
>
> replacement configuration:
>
>
>
>
> So the goal would be for the url to be:
>
> http://localhost:8080/mcplusa/myportal/agents/portal/quoteenroll/digs%20-%20quoting%20%20enrollment%20(individual)/
>
> But the url gets rejected.
>
> Sample Crawl Url
>
>
> http://localhost:8080/mcplusa/myportal/agents/portal/quoteenroll/digs%20-%20quoting%20%20enrollment%20(individual)/!ut/p/a1/rZHLTsMwEEV_hS6yjDx5OWZpdRFImzYCAYk3lZM6D5TYSWoqPh8HFu2GQhHejEeae-aOLmIoQ0zyY1tz3SrJu7lneLfdBtTxI1iRhzsMFEfrpZ_6AFFoBnIzAN88Cj_pXxBDrJR60A3KeS2kvimV1KZaMKhJ886C8U1pIeSkOtNM3Pz5QewO3IJG9WIGDGW7RzkB7hZFIWxyyx3bL8LAJo6L7QoELitMPAH7r4WXLefmpvBkOoqfiTHth6vYTRxIAT1eufMy8D74Z2DqXg2Mf5Fz-zqOjJq05nzeNcr-FpchuVOyTGpjkOvGbmWlUHYmQtmZCGWfoqF_6omHq83G5gUBL-iOa0oXiw9FOxLu/dl5/d5/L0lJS2FZcHBpbW1LYVlwcGltbVlwcGchIS9vSHd3QUFBSXdpRUFJSkRBQ1VZaUVJVTVCZ09DbFFBQUlBQVNvU0FyUnFBQURBQWF0QXdMTzlRQUFFQUJ3WWVBR0tTQUFDa0k1Z21HU3dTaXJTQUFDZ0s5ZzBIUS80SmlHcGhxRWFoR29ScUVhbEdwaC9aNl9PTzVBMTRHMEs4Ukg2MEE2R0xDNFA0MDBHNy9hZ2VudCBjb250ZW50JTBwb3J0YWwlMHF1b3RlZW5yb2xsJTBkaWdzIC0gcXVvdGluZyAgZW5yb2xsbWVudCAoaW5kaXZpZHVhbCkvZjQ0YmEyOWUtODQwOC00YjFlLTg4MzktMTFlMjI4NDgxYTVhL2RpZ3MgLSBxdW90aW5nICBlbnJvbGxtZW50IChpbmRpdmlkdWFsKQ
>
>
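
As a rough illustration of the mapping the original question was after, here is a minimal Python sketch that drops everything from the first "/!ut/" segment onward. It assumes the session state always begins at that segment and runs to the end of the URL; the thread does not confirm this for every WebSphere Portal URL. The function name `strip_portal_session` is hypothetical, and as noted in the thread, a regular expression like this is probably not a general way to remove session information.

```python
import re

# Assumption: session state starts at the first "/!ut/" segment and runs to
# the end of the URL. The lazy (.*?/) keeps the trailing slash of the base
# path, which matches the target URL given in the question.
SESSION_SUFFIX = re.compile(r"^(.*?/)!ut/.*$")


def strip_portal_session(url: str) -> str:
    """Return the URL without the "/!ut/..." session suffix, if present."""
    match = SESSION_SUFFIX.match(url)
    if match is None:
        # No "!ut" segment found; leave the URL unchanged.
        return url
    return match.group(1)


if __name__ == "__main__":
    # Shortened, made-up session suffix for readability.
    sample = (
        "http://localhost:8080/mcplusa/myportal/agents/portal/quoteenroll/"
        "digs%20-%20quoting%20%20enrollment%20(individual)/!ut/p/a1/abc/dl5/d5/xyz"
    )
    print(strip_portal_session(sample))
```

Unlike the pattern `(.*)\/!ut` from the question, this variant captures the slash before "!ut", so the result ends in "/" as in the desired URL.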
