nutch-dev mailing list archives

From axi <axi...@gmail.com>
Subject Re: Alt text of images as anchor text
Date Wed, 20 Jan 2010 22:08:28 GMT

I'll try that, 
but the real anchor text is in  
On Wed, Jan 20, 2010 at 8:11 PM, axi <axierr@gmail.com> wrote:
>
> If you use an image as a link, it is commonly known that the alt text of that
> image is equivalent to the anchor text of a text link. However, if you put an
> image with alt text inside a link, the anchor text for that link is empty and
> the image's alt text is not counted.
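To make the behavior described above concrete, here is a minimal sketch of anchor-text extraction that treats an `<img>` element's `alt` attribute as text. It is not Nutch's parser code. The class `AltAnchorText` and its methods are hypothetical. It assumes well-formed XHTML input so that the JDK's built-in `javax.xml` DOM parser can be used:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of Nutch.
public class AltAnchorText {
    // Walks the subtree, appending text nodes and the alt text of any <img>.
    static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if (el.getTagName().equalsIgnoreCase("img")) {
                String alt = el.getAttribute("alt"); // "" when the attribute is absent
                if (!alt.isEmpty()) {
                    sb.append(' ').append(alt).append(' ');
                }
            }
            NodeList children = el.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                collect(children.item(i), sb);
            }
        }
    }

    // Anchor text of one <a> element, with whitespace collapsed.
    static String anchorText(Element a) {
        StringBuilder sb = new StringBuilder();
        collect(a, sb);
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<p><a href='/home'><img src='home.png' alt='Home page'/></a>"
                     + " <a href='/about'>About <b>us</b></a></p>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList links = doc.getElementsByTagName("a");
        for (int i = 0; i < links.getLength(); i++) {
            Element a = (Element) links.item(i);
            // prints: /home -> "Home page"   then   /about -> "About us"
            System.out.println(a.getAttribute("href") + " -> \"" + anchorText(a) + "\"");
        }
    }
}
```

Without the `img` branch, the first link would produce an empty anchor text. That is the behavior reported in this thread.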

Are you crawling for images? Note that the default URL filter skips them:

http://svn.apache.org/repos/asf/lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
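In this filter file, a leading `-` marks an exclude rule, and the rest of the line is a regular expression. The sketch below copies that regex verbatim and checks a few sample URLs with `java.util.regex`. It assumes the rule is applied as a substring search, as `Matcher.find()` does. The class name and the sample URLs are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo of the exclude rule above; not Nutch's filter class.
public class SuffixFilterDemo {
    // Regex body of the "-" rule from crawl-urlfilter.txt.template.
    static final Pattern SKIP_SUFFIXES = Pattern.compile(
        "\\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
        + "|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$");

    // True when the exclude rule matches, i.e. the URL would be skipped.
    static boolean isSkipped(String url) {
        return SKIP_SUFFIXES.matcher(url).find();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://example.com/logo.png",
            "http://example.com/photo.JPG",
            "http://example.com/index.html",
            "http://example.com/banner.Png"); // mixed case is not listed, so it is kept
        for (String u : urls) {
            System.out.println(u + " -> " + (isSkipped(u) ? "skipped" : "kept"));
        }
    }
}
```

Because the alternation lists only all-lowercase and all-uppercase suffixes, a mixed-case name like `banner.Png` slips through this rule.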

>
> Nutch Newbie wrote:
>>
>> On Wed, Jan 20, 2010 at 4:16 PM, axi <axierr@gmail.com> wrote:
>>>
>>> After several tests, I have noticed that Nutch ignores the alt text of
>>> images inside <a> tags.
>  So, this feature isn't implemented yet, right?
>>
>> What exactly do you want Nutch to do with the "alt text"? Index it?
>> Tokenize it? Make this field available for queries, e.g. "img_alt:my alt
>> tags"? Or something else?
>>
>>
>>>
>>>
>>> thanks in advance,
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27244358.html
>>> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27247820.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>
>



-- 
View this message in context: http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27249488.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.

