nutch-dev mailing list archives

From: Cihad Guzel <cguz...@gmail.com>
Subject: Re: Nutch-1741 in GSOC 2015
Date: Wed, 20 May 2015 20:50:47 GMT

Hi all.

I have added my proposal to the Nutch wiki. You can find the details of the
"Sitemap Crawler" proposal here [1].

[1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler

--
Kind Regards


2015-05-19 1:19 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:

> Hi all,
>
> I want to introduce myself.
>
> I am a computer engineer and I am currently doing a master's degree. I like
> coding. I have been following several open source projects for about 3
> years, and my goal is to contribute to the open source community through
> GSoC.
>
> I have also worked on frontend, middleware and backend development with
> enterprise Java technologies. Furthermore, I have experience with "Web
> Technologies", "Search Technologies", "Cloud Computing", "Distributed
> Systems" and "Big Data". I took part in a search engine project that used
> Apache technologies such as Solr, HBase, Hadoop, Nutch and Gora, and I used
> Nutch actively in that project. You can find more information about me on
> my LinkedIn profile [1].
>
> Here is some information about my project. My subject is "Nutch-1741 -
> Support of Sitemaps in Nutch 2.x" [2]. As you know, in the Nutch crawler
> URLs can only be discovered from pages that have already been fetched.
> Also, the importance (priority) and change frequency of these URLs are not
> known; they can only be guessed. However, an up-to-date sitemap file can
> list all of a site's URLs, optionally with their priority and change
> frequency. For this reason, the sitemap files of websites should be crawled.
>
> I have explained the features of this project in my proposal. I'll add it
> to the wiki, and you can see its details there once I share it. You can see
> the Nutch sitemap lifecycle in the drawing [3].
>
> [1] https://tr.linkedin.com/in/cihadguzel
>
> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>
> [3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>
> Kind Regards
>
>
> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
>
>> OK Lewis,
>> I signed up to the wiki; my wiki username is cihadguzel.
>>
>> Thanks
>>
>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
>>
>>> Fantastic Cihad,
>>> Thank you for introducing yourself.
>>> As you are in the community bonding period right now, please feel free
>>> to provide your wiki username to me and I will grant you access to the wiki.
>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1:
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>> Thanks
>>> Lewis
>>>
>>>
>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I applied to GSoC 2015 for the Apache Nutch project and my application
>>>> has been accepted. The main reason I chose the Nutch project for GSoC is
>>>> that I know Nutch well. My subject is "Nutch-1741 - Support of Sitemaps
>>>> in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat Uyarer for
>>>> being my mentors in this process. I hope I can contribute to this
>>>> project.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>
>>>> Kind Regards
>>>>
>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>
