Hi all,

I have forked Nutch on my GitHub account [1], so you can follow my upcoming commits there.
[1] https://github.com/cguzel/nutch

--
Kind Regards
Cihad Güzel

2015-05-20 23:50 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
Hi all,

I have added my proposal to the Nutch wiki. You can see the details of "Sitemap Crawler" here [1].


--
Kind Regards


2015-05-19 1:19 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:

Hi all,


I want to introduce myself.


I am a computer engineer and am currently pursuing a master's degree. I like coding. I have been following several open source projects for about 3 years. Through GSoC, I aim to contribute to the open source community.


I have also worked on frontend, middleware, and backend development with enterprise Java technologies. Furthermore, I have experience with "Web Technologies", "Search Technologies", "Cloud Computing", "Distributed Systems" and "Big Data". I took part in a search engine project built on Apache technologies such as Solr, HBase, Hadoop, Nutch and Gora, and I used Nutch actively in that project. You can find more information about me on my LinkedIn profile [1].


Here is some information about my project. My subject is "Nutch-1741 - Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can only discover URLs from pages that it has already crawled. Also, the importance and change frequency of these URLs are not known, only guessed. However, an up-to-date sitemap file can list all of a website's URLs. For this reason, the sitemap files of a website should be crawled.
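
To illustrate the kind of information a sitemap provides, here is a minimal sketch in Python. It is not Nutch code and not the proposed design. The sample XML, the example.com URLs, and the function name parse_sitemap are assumptions made for illustration. The element names (loc, changefreq, priority) and the default priority of 0.5 come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap; the URLs are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry with its loc, changefreq and priority."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # <loc> is required by the protocol; skip malformed entries
        # changefreq is optional and has no default value.
        changefreq = url.findtext("sm:changefreq", namespaces=SITEMAP_NS)
        # priority is optional; the protocol defines 0.5 as the default.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=SITEMAP_NS))
        entries.append({"loc": loc, "changefreq": changefreq, "priority": priority})
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE_SITEMAP):
        print(entry)
```

Each entry carries the URL together with the site owner's own hints about change frequency and priority, which a crawler would otherwise have to guess.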


I have described the features of this project in my proposal. I will add it to the wiki, and you will be able to see the details there once I share it. You can see the Nutch sitemap lifecycle in the drawing [3].


[1] https://tr.linkedin.com/in/cihadguzel

[2] https://issues.apache.org/jira/browse/NUTCH-1741

[3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf


Kind Regards



2015-05-19 1:16 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
Ok Lewis,
I have signed up for the wiki. My wiki username is: cihadguzel

Thanks

2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>:
Fantastic Cihad,
Thank you for introducing yourself.
As you are in the community bonding period right now, please feel free to provide your wiki username to me and I will grant you access to the wiki.
Please also feel free to pick up some lingering issues for Nutch 2.3.1
https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
Thanks
Lewis


On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi all, 

I applied to GSoC 2015 for the Apache Nutch project, and my application has been accepted. The main reason I chose the Nutch project for GSoC is that I already know Nutch closely. My subject is "Nutch-1741 - Support of Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat Uyarer for being my mentors in this process. I hope I can contribute to this project.


Kind Regards



--
Lewis