mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: clusterdump - structure of JSON output
Date Wed, 02 Apr 2014 20:53:52 GMT
Looks like a bug to me as well. I would have expected something close to
what you describe, except with the "c" and "r" values as plain objects
rather than arrays of single-element objects:

{
    "cluster":"VL-10515",
    "n":5924,
    "c":
    {
        "action":0.023,
        "adherence":0.223,
        "administration":0.011
    },
    "r":
    {
       "action":0.446,
       "adherence":1.501,
       "administration":0.306
    }
}
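
To illustrate the difference between the two shapes, here is a minimal
Python sketch. It is not clusterdump code; the JSON literals are trimmed
copies of the examples in this thread, and merge_single_entry_objects is a
hypothetical helper named only for this illustration. With objects a term's
weight is a direct lookup, while the array form needs a merge step first.

```python
import json

# Trimmed copies of the two shapes discussed in this thread.
as_objects = json.loads('{"c": {"action": 0.023, "adherence": 0.223}}')
as_arrays = json.loads('{"c": [{"action": 0.023}, {"adherence": 0.223}]}')

# Object form: a term's weight is a single dictionary lookup.
weight_from_objects = as_objects["c"]["adherence"]


def merge_single_entry_objects(items):
    """Collapse [{"a": 1}, {"b": 2}] into {"a": 1, "b": 2}."""
    merged = {}
    for item in items:
        merged.update(item)
    return merged


# Array form: the single-element objects must be merged before lookup.
weight_from_arrays = merge_single_entry_objects(as_arrays["c"])["adherence"]

print(weight_from_objects, weight_from_arrays)
```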

Could you please file a ticket in Jira describing the issue as you've laid
it out here?

Thanks
Andrew


On Wed, Apr 2, 2014 at 3:45 PM, Terry Blankers <terry@amritanet.com> wrote:

> Hi all, I'm working on some automated analysis of the clusterdump output
> using '-of = JSON'. While digging into the structure of the representation
> of the data I've noticed something that seems a little odd to me.
>
> The 'cluster', 'n', 'c' & 'r' values for a particular cluster are all
> packed into one continuous string, so none of them can be accessed
> individually. For example:
>
> {"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223,
> administration:0.011 r=[action:0.446, adherence:1.501,
> administration:0.306]}"}
>
> This is also the case for the "point":
>
> {"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904,
> harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138",
> "weight":"1.0"}
>
> This leads me to believe that the only way I can get to the individual
> data in these items is by string parsing. For JSON deserialization I would
> have expected to see something along the lines of:
>
> {
>     "cluster":"VL-10515",
>     "n":5924,
>     "c":
>     [
>         {"action":0.023},
>         {"adherence":0.223},
>         {"administration":0.011}
>     ],
>     "r":
>     [
>         {"action":0.446},
>         {"adherence":1.501},
>         {"administration":0.306}
>     ]
> }
>
> and:
>
> {
>     "point": {
>         "body": 6.904,
>         "harm": 10.101
>     },
>     "vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
>     "weight": 1.0
> }
>
>
> Please forgive the naive question if I'm missing something obvious, but
> can anybody explain the rationale for the current structure of the JSON? Is
> there another efficient way to access the items in question using JSON
> without using custom string parsing logic? Or would it make sense to modify
> the json output from clusterdump?
>
>
> Thanks,
>
> Terry
>
>
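
Until the output format changes, the flattened strings quoted above can be
picked apart after normal JSON deserialization. The following is only a
sketch of that string-parsing workaround in Python. The layout it assumes
("<id>{n=<count> c=[term:weight, ...] r=[term:weight, ...]}" for clusters
and "<name> = [term:weight, ...]" for points) is inferred solely from the
two samples in this thread, and parse_pairs, parse_cluster and parse_point
are hypothetical helper names, not part of Mahout. It will misparse terms
that themselves contain "," or the text " r=[".

```python
import json
import re

# Assumed layouts, inferred only from the samples quoted in this thread.
# The "]" closing the c list is optional because the quoted sample lacks it.
CLUSTER_RE = re.compile(
    r"^(?P<id>[^{]+)\{n=(?P<n>\d+)\s+c=\[(?P<c>.*?)\]?\s+r=\[(?P<r>.*?)\]\}$"
)
POINT_RE = re.compile(r"^(?P<name>.*?)\s*=\s*\[(?P<v>.*)\]$")


def parse_pairs(text):
    """Turn 'action:0.023, adherence:0.223' into {'action': 0.023, ...}."""
    pairs = {}
    for item in text.split(","):
        term, sep, weight = item.strip().rpartition(":")
        if sep:  # skip empty fragments
            pairs[term] = float(weight)
    return pairs


def parse_cluster(raw):
    """Split the flattened 'cluster' string into id, n, c and r."""
    match = CLUSTER_RE.match(" ".join(raw.split()))  # normalize whitespace
    if match is None:
        raise ValueError("unrecognized cluster string: %r" % raw)
    return {
        "cluster": match["id"],
        "n": int(match["n"]),
        "c": parse_pairs(match["c"]),
        "r": parse_pairs(match["r"]),
    }


def parse_point(raw):
    """Split the flattened 'point' string into vector name and weights."""
    match = POINT_RE.match(" ".join(raw.split()))
    if match is None:
        raise ValueError("unrecognized point string: %r" % raw)
    return {"vector_name": match["name"], "point": parse_pairs(match["v"])}


# Sample lines copied from this thread (line wraps removed).
cluster_line = (
    '{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223, '
    'administration:0.011 r=[action:0.446, adherence:1.501, '
    'administration:0.306]}"}'
)
point_line = (
    '{"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904, '
    'harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138",'
    '"weight":"1.0"}'
)

cluster = parse_cluster(json.loads(cluster_line)["cluster"])
point = parse_point(json.loads(point_line)["point"])
print(json.dumps(cluster, indent=4))
print(json.dumps(point, indent=4))
```

The regular expressions are anchored at both ends so that a line in an
unexpected layout raises an error instead of being silently half-parsed.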
