spark-issues mailing list archives

From "Craig Macdonald (JIRA)" <>
Subject [jira] [Commented] (SPARK-19683) Support for libsvm-based learning-to-rank format
Date Wed, 22 Feb 2017 10:52:44 GMT


Craig Macdonald commented on SPARK-19683:

One might argue that ranking tasks can be as prevalent as regression or classification. There
are also a multitude of LTR datasets:
  ** MSLR:
  ** LETOR:
  ** Yahoo learning to rank challenge:

I'm happy to build this as a separate application, but my secondary comment was that, given
it is a simple extension to the libsvm DataFrame reader, I was disappointed by how many of
the classes the libsvm reader relies on are private and cannot easily be reused.

> Support for libsvm-based learning-to-rank format
> ------------------------------------------------
>                 Key: SPARK-19683
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 2.1.0
>            Reporter: Craig Macdonald
>            Priority: Minor
> I would like to use Spark for reading/processing Learning to Rank files. The standard
> format is an extension of libsvm:
> {code}
> 0 qid:1 1:2.9 2:9.4 # docid=clueweb09-00-01492
> {code}
> Under the MLlib API, a LabeledPoint would need an extension called QueryLabeledPoint.
> I would also like to investigate use through the DataFrame API by extending the libsvm source;
> however, many of the classes/methods used there are private (e.g. LibSVMOptions, DataType.sameType(),
> VectorUDT). So would an extension to handle LTR format be better inside Spark or outside?
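
For illustration only, here is a minimal sketch of how a line in the format quoted above could be parsed outside Spark, in plain Python. It is not Spark code and does not use any Spark API. The function name parse_ltr_line is hypothetical, and the QueryLabeledPoint class here is only a stand-in that borrows the name proposed in the issue. The sketch assumes the layout is "label qid:<id> index:value ... # comment", with the trailing comment optional.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryLabeledPoint:
    """Hypothetical container: a labeled point plus the query id it belongs to."""
    label: float
    qid: str
    features: Dict[int, float]      # sparse features, keyed by the index as written in the file
    comment: Optional[str] = None   # text after '#', e.g. a document id


def parse_ltr_line(line: str) -> QueryLabeledPoint:
    """Parse one line of the libsvm-style learning-to-rank format (assumed layout)."""
    # Everything after the first '#' is treated as a free-text comment.
    body, sep, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"expected '<label> qid:<id> ...', got: {line!r}")

    label = float(tokens[0])
    qid = tokens[1][len("qid:"):]

    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, _, value = token.partition(":")
        features[int(index)] = float(value)

    return QueryLabeledPoint(label, qid, features, comment.strip() if sep else None)


point = parse_ltr_line("0 qid:1 1:2.9 2:9.4 # docid=clueweb09-00-01492")
print(point)
```

Keeping the qid as a string avoids assuming that query identifiers are always numeric; a Spark-based reader could make a different choice.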
