johnzon-dev mailing list archives

From Hendrik Dev <hendrikde...@gmail.com>
Subject Re: JsonLocation.getStreamOffset() return value unclear
Date Thu, 24 Jul 2014 11:56:57 GMT
Yes, let's make this the default.

But besides exception reporting, there is another use case for the location:

"Provides the location information of a JSON event in an input source.
The JsonLocation information can be used to identify incorrect JSON or
can be used by higher frameworks to know about the processing location."

So maybe high level processing stuff will rely on this?
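
As a rough illustration of why that matters (a sketch only, not Johnzon or RI code; the OffsetDemo class and its utf8ByteOffset method are made up for this example): a tool that uses the reported offset to seek into the original input ends up at different positions depending on whether the offset counts chars or bytes, as soon as the input contains non-ASCII text.

```java
import java.nio.charset.StandardCharsets;

class OffsetDemo {

    // Number of UTF-8 bytes that precede the char at charOffset.
    // Assumes charOffset does not split a surrogate pair.
    static int utf8ByteOffset(String text, int charOffset) {
        return text.substring(0, charOffset).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Invalid JSON: "tru" instead of "true". The escape in "Zo\u00eb"
        // is a single char that takes 2 bytes in UTF-8.
        String json = "{\"name\":\"Zo\u00eb\",\"x\":tru}";

        int charOffset = json.indexOf("tru");
        int byteOffset = utf8ByteOffset(json, charOffset);

        System.out.println("char offset: " + charOffset); // 18
        System.out.println("byte offset: " + byteOffset); // 19
    }
}
```

With pure ASCII input both numbers are identical, which is probably why the difference is easy to overlook.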

But on the other hand, the API says that it's perfectly valid to always
return -1, -1, -1:
"All the information provided by a JsonLocation is optional. For
example, a provider may only report line numbers. Also, there may not
be any location information for an input source."
Crazy, isn't it?
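
So any consumer has to cope with each of the three values being missing. A minimal sketch, assuming the three longs come from JsonLocation.getLineNumber(), getColumnNumber() and getStreamOffset(); the LocationFormat class and its describe method are hypothetical and only there for illustration:

```java
class LocationFormat {

    // Builds a human-readable position, skipping every value reported as -1.
    static String describe(long line, long column, long offset) {
        StringBuilder sb = new StringBuilder();
        append(sb, "line", line);
        append(sb, "column", column);
        append(sb, "offset", offset);
        return sb.length() == 0 ? "unknown location" : sb.toString();
    }

    private static void append(StringBuilder sb, String label, long value) {
        if (value < 0) {
            return; // the provider has no information for this value
        }
        if (sb.length() > 0) {
            sb.append(", ");
        }
        sb.append(label).append(' ').append(value);
    }

    public static void main(String[] args) {
        System.out.println(describe(3, 6, 18));    // line 3, column 6, offset 18
        System.out.println(describe(3, -1, -1));   // line 3
        System.out.println(describe(-1, -1, -1));  // unknown location
    }
}
```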

Thanks
Hendrik

On Thu, Jul 24, 2014 at 1:46 PM, Romain Manni-Bucau
<rmannibucau@gmail.com> wrote:
> Reviewing quickly: JsonLocation is only useful when there is an exception,
> e.g. "you suck at line 3, column 6, offset 18". So we need to be able to open
> the file, go there in gedit/notepad++/other and check the syntax error.
> Otherwise, whatever clever counting is done, it is really useless.
>
> If we need to break it to pass the TCKs we will, but I'm sure we'll keep
> this as the default, no?
>
>
>
> Romain Manni-Bucau
> Twitter: @rmannibucau
> Blog: http://rmannibucau.wordpress.com/
> LinkedIn: http://fr.linkedin.com/in/rmannibucau
> Github: https://github.com/rmannibucau
>
>
> 2014-07-24 13:42 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>
>> On Thu, Jul 24, 2014 at 1:35 PM, Romain Manni-Bucau
>> <rmannibucau@gmail.com> wrote:
>> > I think we start with 1. But for column I don't really care, we can
>> > align on the RI.
>> >
>> > For offset, I'm not sure what is complicated, but we should ensure the
>> > offset corresponds to the sum of the previously parsed columns for all
>> > lines. As long as this is consistent, the overall system works.
>>
>> I agree, but IMHO the API says different things (column is always chars,
>> offset can be bytes or chars according to the JSR).
>>
>> My proposal is to keep things easy for now until we have the TCK. I will
>> start column with 1 (it's common and expected IMHO) and defer the
>> byte/char counting until the TCK arrives.
>> The RI also does not count bytes and chars differently.
>>
>> Kind regards
>> Hendrik
>>
>>
>> > 2014-07-24 12:39 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>> >
>> >> Doing this efficiently is more complicated than I thought. Can we not
>> >> simply count 2 bytes for one char ;-)
>> >>
>> >> BTW, it seems the JsonLocation column value also leaves room for
>> >> interpretation:
>> >>
>> >> Is the leftmost column 0 or 1? Text editors, for example, start with
>> >> column 1 (there is never a column 0), but the RI starts with 0.
>> >>
>> >> Regards
>> >> Hendrik
>> >>
>> >>
>> >> On Wed, Jul 23, 2014 at 1:49 PM, Hendrik Dev <hendrikdev22@gmail.com>
>> >> wrote:
>> >> > agree, will make it so
>> >> >
>> >> > On Wed, Jul 23, 2014 at 1:28 PM, Romain Manni-Bucau
>> >> > <rmannibucau@gmail.com> wrote:
>> >> >> Hi
>> >> >>
>> >> >> I agree the wording is wrong, but IMO it is not ambiguous: we get an
>> >> >> InputStream or a Reader (and we *don't* want to check whether it is a
>> >> >> file or not), so we just count the chars or bytes we read. Any other
>> >> >> implementation would lead to confusion IMO (keep it friendly to
>> >> >> default text file readers).
>> >> >>
>> >> >> We can start this way and go further if we have issues, but I really
>> >> >> doubt we need it.
>> >> >>
>> >> >> What's your opinion?
>> >> >>
>> >> >> 2014-07-23 13:21 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> the JSR 353 API says about JsonLocation.getStreamOffset():
>> >> >>>
>> >> >>> "long getStreamOffset()
>> >> >>>
>> >> >>> Return the stream offset into the input source this location is
>> >> >>> pointing to. If the input source is a file or a byte stream then this
>> >> >>> is the byte offset into that stream, but if the input source is a
>> >> >>> character media then the offset is the character offset. Returns -1 if
>> >> >>> there is no offset available."
>> >> >>>
>> >> >>> There are IMHO two issues here:
>> >> >>>
>> >> >>> 1) How can we know that the input source is a file (stream)? We can
>> >> >>> only know whether the parser reads from an InputStream (= byte stream)
>> >> >>> or from a Reader (= character stream). The wording here is
>> >> >>> unclear/ambiguous.
>> >> >>>
>> >> >>> 2) Since a UTF-8 or UTF-16 character can map to one, two, three or four
>> >> >>> bytes, the output can be very confusing (especially if the user doesn't
>> >> >>> know whether the parser was constructed from a byte or a character
>> >> >>> stream and which charset is used).
>> >> >>>
>> >> >>> It seems that the RI does not implement these distinctions; if I looked
>> >> >>> correctly, it always returns character offsets.
>> >> >>>
>> >> >>> So what do we want to do?
>> >> >>>
>> >> >>> Thanks
>> >> >>> Hendrik
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Hendrik Saly (salyh, hendrikdev22)
>> >> >>> @hendrikdev22
>> >> >>> PGP: 0x22D7F6EC
>> >> >>>



-- 
Hendrik Saly (salyh, hendrikdev22)
@hendrikdev22
PGP: 0x22D7F6EC
