commons-dev mailing list archives

From Rob Tompkins <chtom...@gmail.com>
Subject Re: [text][lang] string escaping
Date Sat, 19 Nov 2016 15:38:30 GMT

> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <britter@apache.org> wrote:
> 
> Hello Gary,
> 
> Gary Gregory <garydgregory@gmail.com> wrote on Sat, 19 Nov 2016 at
> 01:07:
> 
>> Just a thought:
>> 
>> Does all the current (and future) string escaping code (XML, HTML, ...)
>> really belong in [lang]? Would it be more natural to have it in [text]?
>> 
> 
> My view on the whole thing currently is that we put stuff that is related
> to strings in Lang. Code that works on texts should go to Text. To me a
> text is more than just a string. A text contains words, which make up
> sentences, which in turn build paragraphs.
> 
> Using this description, I'd argue that escaping belongs in lang and not
> in text, because it works on individual characters rather than on texts.

I think this is a difficult distinction to draw, because fundamentally any code that does substantial
text processing necessarily operates character by character. I propose below a
distinction more along the lines of potential usage.

> 
> But this would also raise the question whether the various edit distance
> algorithms work on texts or on strings. So maybe my distinction is not
> good at all.
> 
> Do we need to better specify the scope of text?

I definitely agree with the sentiment that we should find a clear line of distinction between
lang and text with regard to strings. Some thoughts that spring to mind are more in terms
of how the algorithms are to be used.

So let’s consider the two extremes of the spectrum of string/word/text algorithms. At one
extreme, we have utilities like “StringUtils.isBlank(String s)”, which are used ubiquitously
in day-to-day code and are a foundational extension of Java. At the other extreme, we have
algorithms like natural language processing or statistical processing of words for the analysis
of biological sequences (two chapters in M. Lothaire’s “Applied Combinatorics on Words”).
The first extreme points towards day-to-day usage in any variety of Java application, whereas
the other points to an application designed specifically for string/word/text
processing. I don’t see folks in everyday usage wanting to find the edit distance between two
strings unless they’re writing something that specifically does text processing or something
of that nature.
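
To make the contrast concrete, here is a rough sketch of one method from each end of that
spectrum. The class name StringExtremes and both method bodies are made up for illustration;
they are not the actual [lang] or [text] implementations, just minimal stand-ins for a
blank check and a Levenshtein edit distance.

```java
// Hypothetical illustration only; not the Commons Lang or Commons Text source.
final class StringExtremes {

    private StringExtremes() {
    }

    // Everyday utility: true if the input is null, empty, or whitespace only.
    static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Specialised algorithm: Levenshtein distance using two rolling rows.
    static int editDistance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(editDistance("kitten", "sitting"));
    }
}
```

Both methods walk the input character by character, so the difference lies in who would
call them, not in how they operate.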

Now clearly the problem with this distinction is the amount of grey area that it leaves in
figuring out what goes where, so I don’t know if it’s the right way to go. It was just
the thought that came to mind.

Any thoughts out there?

Cheers,
-Rob

> 
> Benedikt
> 
> 
>> 
>> Gary
>> 



