lucene-dev mailing list archives

From "Varun Thacker (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data
Date Mon, 02 Jun 2014 08:40:02 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6127:
--------------------------------

    Attachment: freebase_film_dump.py

I thought Freebase would be a good place to get data from. 

[~thetaphi] - Would using the data from Freebase ( https://developers.google.com/freebase/faq#rules_for_using_data ) be a licensing issue?

If that's not a concern, here is a script which fetches 200 rows of film data ( http://www.freebase.com/film ) and dumps it into JSON, XML and CSV.

The number of documents can be adjusted. You need to put in your own API key for it to run.
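
To illustrate the "same documents in every format" idea, here is a minimal sketch of the dump step only. It is not the attached freebase_film_dump.py and makes no Freebase API calls: the sample records, field names and function names are all assumptions made up for illustration. The XML writer emits Solr's <add><doc><field name="..."> update format, and the "|" separator for multi-valued CSV fields is an arbitrary choice.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records; the ids and field names are illustrative only
# and do not come from Freebase or from the attached script.
FILMS = [
    {"id": "film-1", "name": "Example Film One", "genre": ["Drama", "Comedy"]},
    {"id": "film-2", "name": "Example Film Two", "genre": ["Thriller"]},
]


def to_json(docs):
    """Serialize the documents as a JSON array."""
    return json.dumps(docs, indent=2)


def to_solr_xml(docs):
    """Serialize the documents as a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for key, value in doc.items():
            # A multi-valued field becomes one <field> element per value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=key)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def to_csv(docs, multi_sep="|"):
    """Serialize the documents as CSV, joining multi-valued fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(docs[0].keys()))
    writer.writeheader()
    for doc in docs:
        row = {
            k: multi_sep.join(v) if isinstance(v, list) else v
            for k, v in doc.items()
        }
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_json(FILMS))
    print(to_solr_xml(FILMS))
    print(to_csv(FILMS))
```

Because all three writers read from the same list, the JSON, XML and CSV examples end up with the same number of documents and the same content.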

Any opinions on whether this is a good idea?

> Improve Solr's exampledocs data
> -------------------------------
>
>                 Key: SOLR-6127
>                 URL: https://issues.apache.org/jira/browse/SOLR-6127
>             Project: Solr
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Varun Thacker
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: freebase_film_dump.py
>
>
> Currently 
> - The CSV example has 10 documents.
> - The JSON example has 4 documents.
> - The XML example has 32 documents.
> 1. We should have an equal number of documents and the same documents in all the example formats
> 2. A data set which is slightly more comprehensive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

