johnzon-dev mailing list archives

From Hendrik Dev <hendrikde...@gmail.com>
Subject Re: JsonLocation.getStreamOffset() return value unclear
Date Thu, 24 Jul 2014 14:26:26 GMT
https://issues.apache.org/jira/browse/FLEECE-9

On Thu, Jul 24, 2014 at 2:02 PM, Romain Manni-Bucau
<rmannibucau@gmail.com> wrote:
> yes
>
> but users and frameworks will look for the same thing I guess, at least
> in the short term.
>
>
>
> Romain Manni-Bucau
> Twitter: @rmannibucau
> Blog: http://rmannibucau.wordpress.com/
> LinkedIn: http://fr.linkedin.com/in/rmannibucau
> Github: https://github.com/rmannibucau
>
>
> 2014-07-24 13:56 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>
>> yes, let's make this the default
>>
>> but besides exception reporting there is another use case for the location:
>>
>> "Provides the location information of a JSON event in an input source.
>> The JsonLocation information can be used to identify incorrect JSON or
>> can be used by higher frameworks to know about the processing location."
>>
>> So maybe high level processing stuff will rely on this?
>>
>> But on the other hand the API says that it's perfectly valid to always
>> return -1,-1,-1:
>> "All the information provided by a JsonLocation is optional. For
>> example, a provider may only report line numbers. Also, there may not
>> be any location information for an input source."
>> Crazy, isn't it?
>>
>> Thanks
>> Hendrik
>>
>> On Thu, Jul 24, 2014 at 1:46 PM, Romain Manni-Bucau
>> <rmannibucau@gmail.com> wrote:
>> > reviewing quickly JsonLocation is only useful when there is an exception
>> > "you suck at line 3, column 6, offset 18". So we need to be able to open
>> > it, go here in gedit/notepad++/other and check the syntax
>> > error...otherwise whatever clever counting is done it is really useless.
>> >
>> > If we need to break it to pass the TCK we will, but I'm sure we'll keep
>> > this as the default, no?
>> >
>> > 2014-07-24 13:42 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>> >
>> >> On Thu, Jul 24, 2014 at 1:35 PM, Romain Manni-Bucau
>> >> <rmannibucau@gmail.com> wrote:
>> >> > I think we start with 1. But for column I don't really care, we can
>> >> > align on the RI.
>> >> >
>> >> > For offset I'm not sure what is complicated, but we should ensure the
>> >> > offset corresponds to the sum of previously parsed columns for all
>> >> > lines. As long as this is consistent, the whole system works.
>> >>
>> >> i agree but the API says IMHO different things (column is always chars,
>> >> offset can be bytes or chars according to the JSR)
>> >>
>> >> my proposal is to keep things easy for now until we have the TCK. Will
>> >> start column with 1 (it's common and expected IMHO) and defer byte/char
>> >> count stuff until the TCK arrives.
>> >> The RI also does not count differently for bytes and chars.
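
The convention proposed above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the Fleece or RI code:
the class LocationSketch and its method locate are hypothetical names,
line and column are 1-based, the offset is the 0-based count of chars
before the position, and '\n' is treated as the only line separator.

```java
final class LocationSketch {

    /**
     * Hypothetical helper: returns {line, column, offset} of the first
     * occurrence of target, or {-1, -1, -1} if it does not occur
     * (mirroring the "no information available" value of JsonLocation).
     */
    static long[] locate(CharSequence text, char target) {
        long line = 1;    // 1-based, like a text editor
        long column = 1;  // 1-based column of the char about to be inspected
        long offset = 0;  // number of chars already consumed (0-based)
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == target) {
                return new long[] {line, column, offset};
            }
            offset++;
            if (c == '\n') {
                // Assumption: only '\n' ends a line; '\r' counts as a column.
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new long[] {-1, -1, -1};
    }

    public static void main(String[] args) {
        // "tru" is an invalid token; locate its first char.
        long[] loc = locate("{\n  \"a\": tru\n}", 't');
        System.out.println("line " + loc[0] + ", column " + loc[1]
                + ", offset " + loc[2]);
    }
}
```

For the sample text this reports line 2, column 8, offset 9. The offset
equals the chars of the previous line (2, including the newline) plus
column - 1, which is the kind of consistency asked for above.
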
>> >>
>> >> Kind regards
>> >> Hendrik
>> >>
>> >>
>> >> >
>> >> > 2014-07-24 12:39 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>> >> >
>> >> >> doing this efficiently is more complicated than i thought. Can we
>> >> >> not simply just count 2 bytes for one char ;-)
>> >> >>
>> >> >> BTW, it seems the JsonLocation column value also leaves room for
>> >> >> interpretation:
>> >> >>
>> >> >> Is the leftmost column 0 or 1? Text editors for example start with
>> >> >> column 1 (there is never a column 0) but the RI starts with 0.
>> >> >>
>> >> >> Regards
>> >> >> Hendrik
>> >> >>
>> >> >>
>> >> >> On Wed, Jul 23, 2014 at 1:49 PM, Hendrik Dev
>> >> >> <hendrikdev22@gmail.com> wrote:
>> >> >> > agree, will make it so
>> >> >> >
>> >> >> > On Wed, Jul 23, 2014 at 1:28 PM, Romain Manni-Bucau
>> >> >> > <rmannibucau@gmail.com> wrote:
>> >> >> >> Hi
>> >> >> >>
>> >> >> >> I agree the wording is wrong but IMO it is not ambiguous: we get
>> >> >> >> an InputStream or a Reader (and we *don't* want to check whether
>> >> >> >> it is a file or not) so we just count the chars or bytes we read.
>> >> >> >> Any other implementation would lead to confusion IMO (make default
>> >> >> >> text file reader compliant friendly).
>> >> >> >>
>> >> >> >> We can start this way and if we have issues go further, but I
>> >> >> >> really doubt we need it.
>> >> >> >>
>> >> >> >> What's your opinion?
>> >> >> >>
>> >> >> >> 2014-07-23 13:21 GMT+02:00 Hendrik Dev <hendrikdev22@gmail.com>:
>> >> >> >>
>> >> >> >>> Hi,
>> >> >> >>>
>> >> >> >>> the JSR 353 API says about JsonLocation.getStreamOffset()
>> >> >> >>>
>> >> >> >>> "long getStreamOffset()
>> >> >> >>>
>> >> >> >>> Return the stream offset into the input source this location is
>> >> >> >>> pointing to. If the input source is a file or a byte stream then
>> >> >> >>> this is the byte offset into that stream, but if the input source
>> >> >> >>> is a character media then the offset is the character offset.
>> >> >> >>> Returns -1 if there is no offset available."
>> >> >> >>>
>> >> >> >>> There are IMHO two issues here:
>> >> >> >>>
>> >> >> >>> 1) How can we know that the input source is a file (stream)? We
>> >> >> >>> can only know whether the parser reads from an InputStream (=byte
>> >> >> >>> stream) or from a Reader (=character stream). The wording here is
>> >> >> >>> unclear/ambiguous.
>> >> >> >>>
>> >> >> >>> 2) Since a UTF-8 or UTF-16 character can map to one, two, three
>> >> >> >>> or four bytes, the output can be very confusing (especially if
>> >> >> >>> the user doesn't know whether the parser was constructed from a
>> >> >> >>> byte or character stream and which charset is used).
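
To make issue 2 concrete, here is a minimal sketch (not Fleece or RI code;
the class name OffsetDemo and the helper byteOffset are made up for
illustration) that computes, for the same position in a JSON text, the
character offset and the byte offset under a given charset. It assumes the
position does not fall in the middle of a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class OffsetDemo {

    /**
     * Hypothetical helper: number of bytes that precede the char at
     * charIndex when the text is encoded with the given charset.
     * Assumes charIndex does not split a surrogate pair.
     */
    static long byteOffset(String text, int charIndex, Charset charset) {
        return text.substring(0, charIndex).getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Invalid JSON: "tru" instead of "true"; \u00eb is the letter e with
        // diaeresis, which needs two bytes in UTF-8.
        String json = "{\"name\":\"Zo\u00eb\",\"ok\":tru}";
        int charOffset = json.indexOf("tru");

        System.out.println("char offset:   " + charOffset);
        System.out.println("UTF-8 offset:  "
                + byteOffset(json, charOffset, StandardCharsets.UTF_8));
        // UTF_16LE is used so that no byte order mark is counted.
        System.out.println("UTF-16 offset: "
                + byteOffset(json, charOffset, StandardCharsets.UTF_16LE));
    }
}
```

For this sample the broken token starts at character offset 19, but at
byte offset 20 in UTF-8 and 38 in UTF-16 (no BOM). A user who does not know
which of the three numbers a parser reports cannot reliably jump to the
error position.
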
>> >> >> >>>
>> >> >> >>> It seems that the RI does not implement these distinctions; if i
>> >> >> >>> looked correctly it always returns character offsets.
>> >> >> >>>
>> >> >> >>> So what do we want to do?
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Hendrik
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Hendrik Saly (salyh, hendrikdev22)
>> >> >> >>> @hendrikdev22
>> >> >> >>> PGP: 0x22D7F6EC
>> >> >> >>>



-- 
Hendrik Saly (salyh, hendrikdev22)
@hendrikdev22
PGP: 0x22D7F6EC
