That ticket applies only to the JCIFS connector, and other connectors that have to map extensions to mime types. The Web connector does not have to do that.
The Web connector has certain mime types it knows it can extract links from, but whether the content itself gets indexed is left up to the output connection. Here's the code:
    // There are presumably mime types we can extract links from that we can't index?
    if (interestingMimeTypeMap.get(contentType) != null)
      return true;

    // Otherwise, ask the output connector whether it wants this mime type.
    boolean rval = activities.checkMimeTypeIndexable(contentType);
    if (rval == false && Logging.connectors.isDebugEnabled())
      Logging.connectors.debug("Web: For document '"+documentIdentifier+"', not fetching because output connector does not want mimetype '"+contentType+"'");
    return rval;
You can tell whether this is what is happening to your document by turning on connector debug logging (in properties.xml: <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>); the message above will then show up in the log for each rejected document. If you are using the Solr connector, you can select the desired mime types on one of the job tabs.
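If it helps to see the decision in isolation, here is a minimal, self-contained sketch of the same two-step check. It is not ManifoldCF code: the class name, the shouldFetch method, the contents of LINK_EXTRACTABLE, and the Predicate standing in for activities.checkMimeTypeIndexable() are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the connector's decision; not ManifoldCF code.
class MimeTypeCheckSketch {
    // Assumed example set of mime types the crawler can extract links from.
    static final Set<String> LINK_EXTRACTABLE =
        Set.of("text/html", "application/xhtml+xml");

    // outputAccepts plays the role of activities.checkMimeTypeIndexable(contentType).
    static boolean shouldFetch(String contentType, Predicate<String> outputAccepts) {
        if (contentType == null)
            return false;
        // Step 1: types we can extract links from are always fetched.
        if (LINK_EXTRACTABLE.contains(contentType))
            return true;
        // Step 2: everything else is up to the output connector.
        return outputAccepts.test(contentType);
    }

    public static void main(String[] args) {
        // An output connector configured to accept only PDF.
        Predicate<String> pdfOnly = "application/pdf"::equals;
        System.out.println(shouldFetch("text/html", pdfOnly));       // true
        System.out.println(shouldFetch("application/pdf", pdfOnly)); // true
        System.out.println(shouldFetch("image/png", pdfOnly));       // false
    }
}
```

The point of the sketch is that a document can only be skipped at the second step, so a rejected document means the output connection's mime type list does not include its content type.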