poi-dev mailing list archives

From Yegor Kozlov <yegor.koz...@dinom.ru>
Subject Re: Apache POI 3.8 (SXSSFWorkbook) - Unreadable Content
Date Wed, 03 Aug 2011 13:29:50 GMT
It would be a valuable contribution if you generated two workbooks
containing strings with \u00a0, one with SXSSF and the other with
XSSF. You can use my code snippet. Then unzip the files, compare them, and
figure out what goes wrong with the handling of \u00a0 in SXSSF.

You want to look at sharedStrings.xml and sheet1.xml.
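
For the comparison step, here is a minimal sketch that needs only the JDK (Java 9 or later), no POI. It assumes the two parts live at xl/sharedStrings.xml and xl/worksheets/sheet1.xml inside each .xlsx and that they are UTF-8 encoded. The class name XlsxPartDiff and both method names are hypothetical, made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of POI): compares XML parts of two .xlsx files. */
final class XlsxPartDiff {

    /** Reads one zip entry as UTF-8 text, or returns null if the entry is absent. */
    static String readPart(String xlsxPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** Returns the index of the first differing char, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file written with SXSSF, args[1]: file written with XSSF
        String[] parts = {"xl/sharedStrings.xml", "xl/worksheets/sheet1.xml"};
        for (String part : parts) {
            String a = readPart(args[0], part);
            String b = readPart(args[1], part);
            if (a == null || b == null) {
                System.out.println(part + ": missing in " + (a == null ? args[0] : args[1]));
                continue;
            }
            int at = firstDifference(a, b);
            System.out.println(part + (at < 0 ? ": identical" : ": first difference at char " + at));
        }
    }
}
```

Run it with the SXSSF file as the first argument and the XSSF file as the second. The two writers may lay out the XML differently, so treat the reported offset only as a starting point for reading the parts by hand.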

Yegor

On Wed, Aug 3, 2011 at 4:22 PM, Guilherme Vieira <jguilhermemv@gmail.com> wrote:
> I can help, if you like. We can work together to fix the issues and
> develop new features.
>
> Cheers.
> José Guilherme Macedo Vieira
>
>
> 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
>
>> So far the plan is to release it in late August.
>>
>> Yegor
>>
>> On Wed, Aug 3, 2011 at 3:01 PM, Guilherme Vieira <jguilhermemv@gmail.com> wrote:
>> > Yegor,
>> >
>> > I'm glad to help you track down this issue. This is exactly the problem. For
>> > now I'm trying to work around it in my own code by replacing these characters.
>> > Although beta4 is not out yet, I'm still using it in my project because I need
>> > to write an Excel file with more than 100,000 lines. Do you know when beta4 is
>> > going to be out?
>> >
>> > Cheers,
>> > José Guilherme Macedo Vieira
>> >
>> >
>> >
>> > 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
>> >
>> >> The culprit is the non-breaking space (char code \u00a0). I was able to
>> >> reproduce the trouble with the following code:
>> >>
>> >>        Workbook wb = new SXSSFWorkbook();
>> >>        Sheet sh = wb.createSheet();
>> >>        Row row = sh.createRow(0);
>> >>        row.createCell(0).setCellValue("ALEXANDRE\u00a0MARINHO DE SOUZA");
>> >>        FileOutputStream out = new FileOutputStream("/temp/test.xlsx");
>> >>        wb.write(out);
>> >>        out.close();
>> >>
>> >> The fix is coming soon and will be included in 3.8-beta4.
>> >>
>> >> Cheers,
>> >> Yegor
>> >>
>> >> On Wed, Aug 3, 2011 at 1:59 PM, Guilherme Vieira <jguilhermemv@gmail.com> wrote:
>> >> > Dear Yegor,
>> >> >
>> >> > Your tip didn't work, so I guessed that there was a non-printable
>> >> > character in place of the white spaces. To check, I encoded the name with
>> >> > URLEncoder.encode("the name goes here","ASCII"); and guess what? The
>> >> > encoded name is as below:
>> >> >
>> >> > ALEXANDRE%3BF+MARINHO+DE+SOUZA
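
URL-encoding is a roundabout way to spot a hidden character. A more direct option, sketched here with plain JDK calls, is to print the string with every char outside printable ASCII written as a backslash-u hex escape. The class name HiddenCharDump and the method escapeNonAscii are hypothetical names chosen for illustration.

```java
/** Hypothetical diagnostic helper: makes invisible characters visible. */
final class HiddenCharDump {

    /** Returns s with every char outside 0x20..0x7E written as a backslash-u hex escape. */
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c); // printable ASCII, including the plain space
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal below contains a real non-breaking space between the first two words.
        System.out.println(escapeNonAscii("ALEXANDRE\u00a0MARINHO DE SOUZA"));
    }
}
```

For the sample string, the non-breaking space comes out as the six visible characters backslash, u, 0, 0, a, 0, while the ordinary spaces are left alone.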
>> >> >
>> >> > It's interesting, because I can't remove it with replaceAll since it is a
>> >> > non-printable character. So I'm trying to find a regular expression
>> >> > that matches these sequences (%3BF and +, respectively). It would be
>> >> > nice if I could find a regular expression that matches any special
>> >> > non-printable character. So, how do I proceed?
>> >> >
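
One possible answer to the regex question, as a sketch rather than a recommendation from the thread: java.util.regex understands Unicode general categories, so a character class built from \p{Z} (all separators, which includes U+00A0) and \p{Cc} (control characters) catches the non-breaking space along with tabs and similar. The class name WhitespaceNormalizer and the method normalize are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical workaround helper: replaces unusual spacing characters with plain spaces. */
final class WhitespaceNormalizer {

    // Runs of Unicode separators (category Z, which includes U+00A0)
    // and control characters (category Cc, which includes tab and newline).
    private static final Pattern ODD_SPACES = Pattern.compile("[\\p{Z}\\p{Cc}]+");

    /** Collapses each run of separator or control chars into one plain space and trims the ends. */
    static String normalize(String s) {
        return ODD_SPACES.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("ALEXANDRE\u00a0MARINHO DE SOUZA"));
    }
}
```

Note that this also collapses legitimate double spaces into one, so it changes the data slightly. It is only a stopgap on the caller's side until the SXSSF fix is available.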
>> >> > And thanks in advance for your answer, as well as for your GREAT work on
>> >> > Apache POI with the Big Grid Demo approach. It is just wonderful. Can't wait
>> >> > for the final release (3.8-beta4).
>> >> >
>> >> > Best regards,
>> >> > José Guilherme Macedo Vieira
>> >> >
>> >> >
>> >> > 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
>> >> >
>> >> >> Tweak your report generator and try the following tricks before
>> >> >> passing strings to SXSSFCell:
>> >> >>
>> >> >>  (a) string = string.replaceAll("\\s+", " "); // replace multiple white spaces with a single space
>> >> >>  (b) string = string.replace(' ', '_'); // replace white spaces with underscores
>> >> >>
>> >> >> Does either (a) or (b) help?
>> >> >>
>> >> >> My hunch is that the problem is in something else, not in double
>> >> >> white spaces. At least, I can't reproduce the problem with the following
>> >> >> code snippet:
>> >> >>
>> >> >>        Workbook wb = new SXSSFWorkbook();
>> >> >>        Sheet sh = wb.createSheet();
>> >> >>        for(int i = 0; i < 10000; i++) {
>> >> >>            Row row = sh.createRow(i);
>> >> >>            row.createCell(0).setCellValue("ALEXANDRE__MARINHO DE SOUZA");
>> >> >>            row.createCell(1).setCellValue("ALEXANDRE MARINHO DE SOUZA");
>> >> >>            row.createCell(2).setCellValue("ALEXANDRE  MARINHO DE SOUZA");
>> >> >>            row.createCell(3).setCellValue("ALEXANDRE   MARINHO DE SOUZA");
>> >> >>        }
>> >> >>
>> >> >>        FileOutputStream out = new FileOutputStream("/temp/test.xlsx");
>> >> >>        wb.write(out);
>> >> >>        out.close();
>> >> >>
>> >> >> The generated file is readable and all spaces are there.
>> >> >>
>> >> >> Yegor
>> >> >>
>> >> >> On Tue, Aug 2, 2011 at 11:49 PM, Guilherme Vieira <jguilhermemv@gmail.com> wrote:
>> >> >> > So, I've searched column by column in the problematic line in order
>> >> >> > to identify the problem. The problem is quite weird. It's a string
>> >> >> > column in the database. This column stores people's names.
>> >> >> >
>> >> >> > In my case the name is: ALEXANDRE__MARINHO DE SOUZA
>> >> >> >
>> >> >> > Of course, without the underscore characters; each one is actually a
>> >> >> > whitespace character. So, with the double whitespace the file is
>> >> >> > corrupted. And when I manually remove one of the whitespaces in the IDE,
>> >> >> > the file is also corrupted. But when I change the whole name manually in
>> >> >> > the IDE, setting the value to ALEXANDRE_MARINHO DE SOUZA, it works. It's
>> >> >> > strange. I don't know why SXSSF is not accepting two whitespaces.
>> >> >> >
>> >> >> > Does anyone have a clue?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > 2011/8/2 jguilhermemv <jguilhermemv@gmail.com>
>> >> >> >
>> >> >> >> I tried without merged regions and it didn't work. Then I noticed that
>> >> >> >> there is one line in the file that triggers the error. It's line
>> >> >> >> 2451; up to line 2450 everything works great. But for some reason,
>> >> >> >> when it reaches line 2450 it just doesn't work. I checked whether
>> >> >> >> there were any null values, but there weren't. The writing routine is
>> >> >> >> right, otherwise it wouldn't write up to line 2450.
>> >> >> >>
>> >> >> >> What can I do now?
>> >> >> >>
>> >> >> >> Best regards.
>> >> >> >> José Guilherme Macedo Vieira
>> >> >> >>
>> >> >> >>
>> >> >> >> 2011/8/2 Nick Burch-11 [via Apache POI] <ml-node+4658878-753894702-237524@n5.nabble.com>
>> >> >> >>
>> >> >> >> > On Tue, 2 Aug 2011, jguilhermemv wrote:
>> >> >> >> > > Regarding the file, it makes use of some CellStyles and Merged Regions.
>> >> >> >> >
>> >> >> >> > Try without them, and see if that fixes it. You need to narrow your
>> >> >> >> > problem down before you can figure out what to correct. Try to
>> >> >> >> > identify the simplest file that fails and the most complex one that
>> >> >> >> > works; the gap between them is your issue.
>> >> >> >> >
>> >> >> >> > Nick
>> >> >> >> >
>> >> >> >>
>> >> >> >> --
>> >> >> >> View this message in context:
>> >> >> >> http://apache-poi.1045710.n5.nabble.com/Apache-POI-3-8-SXSSFWorkbook-Unreadable-Content-tp4658852p4659737.html
>> >> >> >> Sent from the POI - Dev mailing list archive at Nabble.com.
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>> >> >> For additional commands, e-mail: dev-help@poi.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

