incubator-droids-dev mailing list archives

From "giulio.cesare@gmail.com" <giulio.ces...@gmail.com>
Subject Re: Selecting parser based on actual response content, and not just content-type header
Date Wed, 16 Mar 2011 12:55:34 GMT
Hello Bertil,

Attached is an initial patch that implements my suggested revision of
ParserFactory.

Please let me know whether it has any chance of getting into the main
codebase.

Thanks,

Giulio Cesare




On Wed, Mar 16, 2011 at 9:40 AM, giulio.cesare@gmail.com
<giulio.cesare@gmail.com> wrote:
> Ok. I will try to work first on the refactoring of the ParserFactory,
> and submit the patches for community evaluation.
>
> Later on it would be possible to group together all the different
> factories into a WorkerFactories class.
>
> Thanks,
>
> Giulio Cesare
>
>
>
> On Wed, Mar 16, 2011 at 8:39 AM, Chapuis Bertil <bchapuis@agimem.com> wrote:
>> There is an issue [1] about merging all the factories into a single worker
>> factory. In your scenario, after such a change you would have to create a
>> worker that is able to select the right parser. You may want to take this
>> issue into account in your changes. If the changes are valued by the
>> community, they will probably be merged back.
>>
>> [1] - https://issues.apache.org/jira/browse/DROIDS-108
>>
>>
>> On 15 March 2011 13:49, giulio.cesare@gmail.com <giulio.cesare@gmail.com>
>> wrote:
>>>
>>> Hello Bertil,
>>>
>>> looking at the code of Droids, I spotted the critical point in the
>>> CrawlerWorker class (line 81):
>>>          Parser parser = droid.getParserFactory().getParser(contentType);
>>>
>>> A nice option would be to pass the full downloaded entity to
>>> ParserFactory in order to pick the right parser for the task.
>>> You may then have a content-type based ParserFactory (like the one
>>> implemented right now), or any other custom form of ParserFactory that
>>> can analyze the full downloaded entity in order to make the right
>>> choice.
>>>
>>> Would such a change have a chance of being merged back into the main code
>>> base?
>>>
>>> Regards,
>>>
>>> Giulio Cesare
>>>
>>>
>>>
>>> On Tue, Mar 15, 2011 at 12:32 PM, Chapuis Bertil <bchapuis@agimem.com>
>>> wrote:
>>> > Hello Giulio,
>>> >
>>> > The Worker generally gets a Parser from the ParserFactory by calling the
>>> > method with the mime type as argument. If the mime type is enough for
>>> > your use case, you may want to create a custom ParserFactory. Otherwise a
>>> > possible solution may be to pass the ContentEntity to the ParserFactory
>>> > instead of the content type. In any case, do not hesitate to open a
>>> > ticket if you can't solve this issue.
>>> >
>>> > Best regards.
>>> >
>>> > On 15 March 2011 13:10, giulio.cesare@gmail.com
>>> > <giulio.cesare@gmail.com> wrote:
>>> >
>>> >> Hello everybody,
>>> >>
>>> >> I have just started using the Droids library and I am really enjoying
>>> >> it.
>>> >> I have customized a few classes and managed to create a simple proof
>>> >> of concept quite easily; the "major" problem was finding out where to
>>> >> actually set the User-Agent used for making requests.
>>> >>
>>> >> But trying to move to a little more complex scenario, I have stumbled
>>> >> into a problem.
>>> >>
>>> >> I would like to pick the Parser based not on the content-type header,
>>> >> but on actual content of the response.
>>> >>
>>> >> Before trying to sort out how to implement this, I wanted to ask
>>> >> whether it is worth extending the core classes to support it, or
>>> >> whether the problem does not merit a generic solution, in which case
>>> >> I had better find a way to implement it without touching the core
>>> >> classes.
>>> >>
>>> >> Thanks for your attention.
>>> >>
>>> >> Regards,
>>> >>
>>> >> Giulio Cesare
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Bertil Chapuis
>>> > Agimem Sàrl
>>> > http://www.agimem.com
>>> >
>>
>>
>>
>> --
>> Bertil Chapuis
>> Agimem Sàrl
>> http://www.agimem.com
>>
>
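
For readers without the attachment, the idea discussed in this thread (handing the whole downloaded entity to the factory instead of only the content type) could look roughly like the sketch below. It is not the attached patch and not the Droids API: DownloadedEntity, SketchParser, EntityParserFactory, ContentTypeFactory and SniffingFactory are hypothetical names invented for illustration, and the sniffing rules are deliberately naive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical stand-in for a downloaded entity; NOT the Droids ContentEntity. */
final class DownloadedEntity {
    private final String mimeType;
    private final byte[] body;

    DownloadedEntity(String mimeType, byte[] body) {
        this.mimeType = mimeType;
        this.body = body.clone();
    }

    String mimeType() {
        return mimeType;
    }

    /** Returns at most maxBytes of the body, decoded byte-for-byte, for cheap sniffing. */
    String head(int maxBytes) {
        int n = Math.min(maxBytes, body.length);
        return new String(body, 0, n, StandardCharsets.ISO_8859_1);
    }
}

/** Hypothetical parser type; a real parser would expose a parse method instead. */
interface SketchParser {
    String name();
}

/** Assumed shape of the revised factory: it receives the entity, not just a header. */
interface EntityParserFactory {
    SketchParser getParser(DownloadedEntity entity);
}

/** Header-based selection, expressed against the entity-based interface. */
final class ContentTypeFactory implements EntityParserFactory {
    @Override
    public SketchParser getParser(DownloadedEntity entity) {
        String type = entity.mimeType() == null
                ? ""
                : entity.mimeType().toLowerCase(Locale.ROOT);
        if (type.startsWith("text/html")) {
            return () -> "html";
        }
        if (type.startsWith("application/xml") || type.startsWith("text/xml")) {
            return () -> "xml";
        }
        return () -> "default";
    }
}

/** Content-based selection: inspect the first bytes, fall back to the header. */
final class SniffingFactory implements EntityParserFactory {
    private final EntityParserFactory fallback = new ContentTypeFactory();

    @Override
    public SketchParser getParser(DownloadedEntity entity) {
        String head = entity.head(512).trim().toLowerCase(Locale.ROOT);
        if (head.startsWith("<?xml")) {
            return () -> "xml";
        }
        if (head.startsWith("<!doctype html") || head.startsWith("<html")) {
            return () -> "html";
        }
        return fallback.getParser(entity);
    }
}
```

Under these assumptions, a worker would call getParser(entity) where it currently calls getParser(contentType). The existing header-based behaviour stays available as one implementation, and a custom factory can override it only when the body says otherwise.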
