nutch-dev mailing list archives

From Cihad Guzel <cguz...@gmail.com>
Subject Re: Nutch-1741 in GSOC 2015
Date Mon, 18 May 2015 22:19:54 GMT
Hi all,

I want to introduce myself.

I am a computer engineer and I am currently doing a master's degree. I like
coding. I have been following several open source projects for about 3
years, and I aim to contribute to the open source community through GSoC.

I have also worked on frontend, middleware and backend development with
enterprise Java technologies, and I have experience with "Web Technologies",
"Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
Data". I took part in a search engine project built on Apache technologies
such as Solr, HBase, Hadoop, Nutch and Gora, and I used Nutch actively in
that project. You can find more information about me on my LinkedIn
profile [1].

Here is some information about my project. My subject is "Nutch-1741 -
Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
only discover URLs from pages it has already fetched. In addition, the
importance (priority) and change frequency of these URLs are not known;
they can only be guessed. An up-to-date sitemap file, however, can list all
of a site's URLs. For this reason, the sitemap files of websites should be
crawled.
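To illustrate what a sitemap adds over link discovery: under the sitemaps.org protocol, each <url> entry carries a required <loc> and optional <lastmod>, <changefreq> and <priority> hints. The sketch below is only an illustration, written in Python for brevity; it is not the NUTCH-1741 implementation (which would be Java code inside Nutch), and the names SitemapEntry, parse_urlset and the sample URLs are hypothetical. It assumes a plain <urlset> file (no sitemap index, no gzip).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# XML namespace used by the sitemaps.org 0.9 protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


@dataclass
class SitemapEntry:  # hypothetical container, not a Nutch class
    loc: str
    lastmod: Optional[str]
    changefreq: Optional[str]
    priority: float


def parse_urlset(xml_text: str) -> List[SitemapEntry]:
    """Return one SitemapEntry per <url> element of a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    entries: List[SitemapEntry] = []
    for url in root.findall(f"{NS}url"):
        loc = url.findtext(f"{NS}loc")
        if not loc:
            continue  # <loc> is mandatory; skip malformed entries
        priority = url.findtext(f"{NS}priority")
        entries.append(
            SitemapEntry(
                loc=loc.strip(),
                lastmod=url.findtext(f"{NS}lastmod"),
                changefreq=url.findtext(f"{NS}changefreq"),
                # The protocol treats 0.5 as the default when <priority> is absent.
                priority=float(priority) if priority else 0.5,
            )
        )
    return entries


# Hypothetical sample sitemap with one fully annotated and one bare entry.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>"""

if __name__ == "__main__":
    for entry in parse_urlset(SAMPLE):
        print(entry.loc, entry.changefreq, entry.priority)
```

A crawler could use such entries to inject URLs it would never reach through links, and to schedule refetches from changefreq and priority instead of guessing them.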

I have explained the features of this project in my proposal. I will add it
to the wiki, and you can see the details there once I share it. You can see
the Nutch sitemap lifecycle in the drawing [3].

[1] https://tr.linkedin.com/in/cihadguzel

[2] https://issues.apache.org/jira/browse/NUTCH-1741

[3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf

Kind Regards


2015-05-19 1:16 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:

> Ok Lewis,
> I signed up to wiki, my wiki username: cihadguzel
>
> Thanks
>
> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
>
>> Fantastic Cihad,
>> Thank you for introducing yourself.
>> As you are in the community bonding period right now, please feel free to
>> provide your wiki username to me and I will grant you access to the wiki.
>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>
>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>> Thanks
>> Lewis
>>
>>
>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I applied to GSoC 2015 for the Apache Nutch project and my application
>>> has been accepted. The main reason I chose the Nutch project for GSoC
>>> is that I know Nutch well. My subject is "Nutch-1741 - Support of
>>> Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat
>>> Uyarer for being my mentors in this process. I hope I can contribute to
>>> this project.
>>>
>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> Kind Regards
>>>
>>
>>
>>
>> --
>> *Lewis*
>>
>
>
