nutch-dev mailing list archives

From Talat Uyarer <ta...@uyarer.com>
Subject Re: Nutch-1741 in GSOC 2015
Date Mon, 25 May 2015 08:06:13 GMT
Superb, Cihad! This will make it easy to follow your work.

2015-05-25 9:53 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
> Hi all,
>
> I forked Nutch to my GitHub account [1], so you can follow my upcoming commits.
> [1] https://github.com/cguzel/nutch
>
> --
> Kind Regards
> Cihad Güzel
>
> 2015-05-20 23:50 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
>>
>> Hi all.
>>
>> I have added my proposal to the Nutch wiki. You can find the details of the
>> "Sitemap Crawler" project here [1].
>>
>> [1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>>
>> --
>> Kind Regards
>>
>>
>> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
>>>
>>> Hi all,
>>>
>>>
>>> I want to introduce myself.
>>>
>>>
>>> I am a computer engineer currently doing a master's degree, and I like
>>> coding. I have been following several open source projects for about 3
>>> years, and I aim to contribute to the open source community through GSoC.
>>>
>>>
>>> I have also worked on frontend, middleware and backend development with
>>> enterprise Java technologies, and I have experience with "Web Technologies",
>>> "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big
>>> Data". I took part in a search engine project built on Apache technologies
>>> such as Solr, HBase, Hadoop, Nutch and Gora, and I used Nutch actively in
>>> that project. You can find more information about me on my LinkedIn
>>> profile [1].
>>>
>>>
>>> Some information about my project: my subject is "Nutch-1741 -
>>> Support of Sitemaps in Nutch 2.x" [2]. Currently, the Nutch crawler can
>>> only discover URLs from pages it has already fetched. Moreover, the
>>> importance (priority) and change frequency of these URLs are not known,
>>> only guessed. An up-to-date sitemap file, however, can list all of a
>>> site's URLs. For this reason, Nutch should also crawl the sitemap files
>>> of websites.
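The point that a sitemap carries each URL together with importance and change-frequency hints can be illustrated with a minimal Java sketch, assuming the sitemaps.org 0.9 XML format and using only the JDK's DOM parser. `SitemapReader`, `SitemapEntry` and `parse` are hypothetical names for illustration; this is not Nutch's sitemap code or the NUTCH-1741 implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not Nutch's sitemap code.
public class SitemapReader {
    // XML namespace of the sitemaps.org 0.9 protocol.
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // One <url> entry: the location plus the optional hints that link
    // discovery alone cannot provide.
    static final class SitemapEntry {
        final String loc;
        final String changefreq; // null when the sitemap omits it
        final double priority;   // 0.5 when omitted (protocol default)

        SitemapEntry(String loc, String changefreq, double priority) {
            this.loc = loc;
            this.changefreq = changefreq;
            this.priority = priority;
        }
    }

    // Trimmed text of the first descendant element with this local name, or null.
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagNameNS(NS, name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Parses a sitemap document and returns its entries in document order.
    static List<SitemapEntry> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<SitemapEntry> entries = new ArrayList<>();
        NodeList urls = doc.getElementsByTagNameNS(NS, "url");
        for (int i = 0; i < urls.getLength(); i++) {
            Element url = (Element) urls.item(i);
            String loc = childText(url, "loc");
            if (loc == null || loc.isEmpty()) {
                continue; // <loc> is mandatory; skip malformed entries
            }
            String priority = childText(url, "priority");
            entries.add(new SitemapEntry(
                    loc,
                    childText(url, "changefreq"),
                    priority == null ? 0.5 : Double.parseDouble(priority)));
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/</loc>"
                + "<changefreq>daily</changefreq><priority>0.8</priority></url>"
                + "<url><loc>http://example.com/about</loc></url>"
                + "</urlset>";
        for (SitemapEntry e : parse(xml)) {
            System.out.println(e.loc + " changefreq=" + e.changefreq
                    + " priority=" + e.priority);
        }
    }
}
```

A crawler could feed `priority` and `changefreq` into its scheduling as hints, which is the information that otherwise has to be guessed from link discovery.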
>>>
>>>
>>> I have explained the features of this project in my proposal. I'll add
>>> it to the wiki and share it, so you can see the details there. You can
>>> see the Nutch sitemap lifecycle in the drawing [3].
>>>
>>>
>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>
>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> [3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>
>>>
>>> Kind Regards
>>>
>>>
>>>
>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
>>>>
>>>> Ok Lewis,
>>>> I signed up for the wiki; my wiki username is cihadguzel.
>>>>
>>>> Thanks
>>>>
>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney
>>>> <lewis.mcgibbney@gmail.com>:
>>>>>
>>>>> Fantastic Cihad,
>>>>> Thank you for introducing yourself.
>>>>> As you are in the community bonding period right now, please feel free
>>>>> to provide your wiki username to me and I will grant you access to the
>>>>> wiki.
>>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>> Thanks
>>>>> Lewis
>>>>>
>>>>>
>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I applied to GSoC 2015 for the Apache Nutch project and my
>>>>>> application has been accepted. The main reason I chose the Nutch
>>>>>> project for GSoC is that I know Nutch closely. My subject is "Nutch-1741 -
>>>>>> Support of Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and
>>>>>> Talat Uyarer for being my mentors in this process. I hope I can contribute
>>>>>> to this project.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>
>>>>>> Kind Regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lewis
>>>>
>>>>
>>>
>>
>



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
