spark-issues mailing list archives

From "Craig Macdonald (JIRA)" <>
Subject [jira] [Created] (SPARK-19683) Support for libsvm-based learning-to-rank format
Date Tue, 21 Feb 2017 20:12:51 GMT
Craig Macdonald created SPARK-19683:

             Summary: Support for libsvm-based learning-to-rank format
                 Key: SPARK-19683
             Project: Spark
          Issue Type: New Feature
          Components: ML, MLlib
    Affects Versions: 2.1.0
            Reporter: Craig Macdonald
            Priority: Minor

I would like to use Spark for reading and processing learning-to-rank (LTR) files. The standard
format is an extension of libsvm that adds a qid field after the label:

0 qid:1 1:2.9 2:9.4 # docid=clueweb09-00-01492

Under the MLlib API, LabeledPoint would need an extension, called QueryLabeledPoint here.
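
To make the format concrete, here is a minimal sketch in plain Python (no Spark dependency) of how the example line above could be parsed. The QueryLabeledPoint dataclass and the parse_ltr_line helper are hypothetical names used only for illustration; they are not part of Spark, and the handling of the trailing "#" comment is an assumption based on the single example line.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryLabeledPoint:
    """Hypothetical record: a libsvm-style labeled point plus a query id."""
    label: float
    qid: str
    features: Dict[int, float]  # feature index as written in the file -> value
    comment: Optional[str] = None


def parse_ltr_line(line: str) -> QueryLabeledPoint:
    # Split off the optional trailing comment introduced by '#'.
    body, sep, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"expected '<label> qid:<id> ...', got: {line!r}")
    label = float(tokens[0])
    qid = tokens[1][len("qid:"):]
    features = {}
    for tok in tokens[2:]:
        # Each remaining token has the libsvm form '<index>:<value>'.
        idx, _, val = tok.partition(":")
        features[int(idx)] = float(val)
    return QueryLabeledPoint(label, qid, features, comment.strip() if sep else None)


point = parse_ltr_line("0 qid:1 1:2.9 2:9.4 # docid=clueweb09-00-01492")
print(point)
```

The sketch keeps the qid as a string and the comment as free text, since the example does not say whether either has further structure.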

I would also like to investigate using this through the DataFrame API by extending the libsvm
data source. However, many of the classes and methods used there are private (e.g. LibSVMOptions,
DataType.sameType(), VectorUDT). So would an extension to handle the LTR format be better placed
inside Spark or outside?

This message was sent by Atlassian JIRA
