poi-dev mailing list archives

From Guilherme Vieira <jguilherm...@gmail.com>
Subject Re: Apache POI 3.8 (SXSSFWorkbook) - Unreadable Content
Date Wed, 03 Aug 2011 13:31:48 GMT
Yegor,

Where can I find information about zipping and unzipping these files?

Best regards,
José Guilherme Macedo Vieira
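
On the zipping question: an .xlsx file is an ordinary zip archive, so any zip tool can open it (for example `unzip test.xlsx -d out`). It can also be done from Java with java.util.zip. The following is a minimal sketch, not part of POI: the class name XlsxParts and its helper methods are made up for illustration, the part names "xl/sharedStrings.xml" and "xl/worksheets/sheet1.xml" are the usual locations but are assumed here (the listing shows what a given file really contains), and InputStream.readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not a POI class): inspects the parts inside an .xlsx file.
class XlsxParts {

    // Returns the names of all entries in the archive.
    static List<String> listEntries(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Reads one entry as UTF-8 text, or returns null if the entry does not exist.
    static String readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args[0]; // path to the .xlsx file to inspect
        for (String name : listEntries(path)) {
            System.out.println(name);
        }
        // Assumed part names; adjust them to what the listing above prints.
        System.out.println(readEntry(path, "xl/sharedStrings.xml"));
        System.out.println(readEntry(path, "xl/worksheets/sheet1.xml"));
    }
}
```

Running this once against the SXSSF output and once against the XSSF output gives two sets of XML text that can be compared with any diff tool.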


2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>

> It would be a valuable contribution if you generate two worksheets
> containing strings with \u00a0. One with SXSSF and the other with
> XSSF. You can use my code snippet. Then unzip the files, compare and
> figure out what's wrong with handling  \u00a0 in SXSSF.
>
> You want to look at sharedStrings.xml and sheet1.xml.
>
> Yegor
>
> On Wed, Aug 3, 2011 at 4:22 PM, Guilherme Vieira <jguilhermemv@gmail.com>
> wrote:
> > I can help you if you want. We can work together to fix the issues and
> > develop new features.
> >
> > Cheers.
> > José Guilherme Macedo Vieira
> >
> >
> > 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
> >
> >> So far the plan is to release it in late August.
> >>
> >> Yegor
> >>
> >> On Wed, Aug 3, 2011 at 3:01 PM, Guilherme Vieira <jguilhermemv@gmail.com>
> >> wrote:
> >> > Yegor,
> >> >
> >> > I'm glad to help you find this issue. This is exactly the problem. For
> >> > now I'm trying to fix it in my own code by replacing these characters.
> >> > Although beta4 is not out yet, I'm still using it in my project because
> >> > I need to write an Excel file with more than 100,000 lines. Do you know
> >> > when beta4 is going to be out?
> >> >
> >> > Cheers,
> >> > José Guilherme Macedo Vieira
> >> >
> >> >
> >> >
> >> > 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
> >> >
> >> >> The culprit is the non-break space (charcode=\u00a0). I was able to
> >> >> reproduce the trouble with the following code:
> >> >>
> >> >>        Workbook wb = new SXSSFWorkbook();
> >> >>        Sheet sh = wb.createSheet();
> >> >>        Row row = sh.createRow(0);
> >> >>        row.createCell(0).setCellValue("ALEXANDRE\u00a0MARINHO DE SOUZA");
> >> >>        FileOutputStream out = new FileOutputStream("/temp/test.xlsx");
> >> >>        wb.write(out);
> >> >>        out.close();
> >> >>
> >> >> The fix is coming soon and will be included in 3.8-beta4.
> >> >>
> >> >> Cheers,
> >> >> Yegor
> >> >>
> >> >> On Wed, Aug 3, 2011 at 1:59 PM, Guilherme Vieira <jguilhermemv@gmail.com>
> >> >> wrote:
> >> >> > Dear Yegor,
> >> >> >
> >> >> > Your tip didn't work, so I guessed that there was a non-printable
> >> >> > character instead of white space. I then tried to encode the name with
> >> >> > URLEncoder.encode("the name goes here","ASCII"); and guess what? The
> >> >> > encoded name is as below:
> >> >> >
> >> >> > ALEXANDRE%3BF+MARINHO+DE+SOUZA
> >> >> >
> >> >> > It is interesting because I can't remove it with replace all, since we
> >> >> > have non-printable characters. So I'm trying to find a regular
> >> >> > expression that matches these sequences (%3BF and +, respectively). It
> >> >> > would be nice if I could find a regular expression that matches any
> >> >> > special non-printable character. So, how do I proceed?
> >> >> >
> >> >> > And thanks in advance for your answer, as well as for your GREAT work
> >> >> > on Apache POI with the Big Grid Demo approach. It is just wonderful.
> >> >> > Can't wait for the final release (3.8-beta4).
> >> >> >
> >> >> > Best regards,
> >> >> > José Guilherme Macedo Vieira
> >> >> >
> >> >> >
> >> >> > 2011/8/3 Yegor Kozlov <yegor.kozlov@dinom.ru>
> >> >> >
> >> >> >> Tweak your report generator and try the following tricks before
> >> >> >> passing strings to SXSSFCell:
> >> >> >>
> >> >> >>  (a) string.replaceAll("\\s+", " "); // replace multiple white spaces with a single space
> >> >> >>  (b) string.replace(' ', '_'); // replace white spaces with underscores
> >> >> >>
> >> >> >> Does either (a) or (b) help?
> >> >> >>
> >> >> >> My hunch is that the problem is in something else, not in double
> >> >> >> white spaces. At least, I can't reproduce the problem with the
> >> >> >> following code snippet:
> >> >> >>
> >> >> >>        Workbook wb = new SXSSFWorkbook();
> >> >> >>        Sheet sh = wb.createSheet();
> >> >> >>        for(int i = 0; i < 10000; i++) {
> >> >> >>            Row row = sh.createRow(i);
> >> >> >>            row.createCell(0).setCellValue("ALEXANDRE__MARINHO DE SOUZA");
> >> >> >>            row.createCell(1).setCellValue("ALEXANDRE MARINHO DE SOUZA");
> >> >> >>            row.createCell(2).setCellValue("ALEXANDRE  MARINHO DE SOUZA");
> >> >> >>            row.createCell(3).setCellValue("ALEXANDRE   MARINHO DE SOUZA");
> >> >> >>        }
> >> >> >>
> >> >> >>        FileOutputStream out = new FileOutputStream("/temp/test.xlsx");
> >> >> >>        wb.write(out);
> >> >> >>        out.close();
> >> >> >>
> >> >> >> The generated file is readable and all spaces are there.
> >> >> >>
> >> >> >> Yegor
> >> >> >>
> >> >> >> On Tue, Aug 2, 2011 at 11:49 PM, Guilherme Vieira
> >> >> >> <jguilhermemv@gmail.com> wrote:
> >> >> >> > So, I've searched column by column in the problematic line in order
> >> >> >> > to identify the problem. The problem is quite weird. It's a string
> >> >> >> > column in the database. This column stores people's names.
> >> >> >> >
> >> >> >> > In my case the name is: ALEXANDRE__MARINHO DE SOUZA
> >> >> >> >
> >> >> >> > Of course, without the underscore characters; they stand for
> >> >> >> > whitespace characters. So, with the double whitespace the file is
> >> >> >> > corrupted. And when I manually remove one whitespace in the IDE, the
> >> >> >> > file is also corrupted. But when I change the whole name manually in
> >> >> >> > the IDE, setting the value to ALEXANDRE_MARINHO DE SOUZA, it works.
> >> >> >> > It's strange. I don't know why SXSSF is not accepting two
> >> >> >> > whitespaces.
> >> >> >> >
> >> >> >> > Anyone have a clue?
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > 2011/8/2 jguilhermemv <jguilhermemv@gmail.com>
> >> >> >> >
> >> >> >> >> I tried without merged regions and it didn't work. Then I noticed
> >> >> >> >> that there is a line in the file which presents the error. It's
> >> >> >> >> line 2451; up to line 2450 everything works great. But for some
> >> >> >> >> reason when it reaches line 2451 it just doesn't work. I checked
> >> >> >> >> whether there were any null values, but there weren't. The writing
> >> >> >> >> routine is right, otherwise it wouldn't write up to line 2450.
> >> >> >> >>
> >> >> >> >> What can I do now?
> >> >> >> >>
> >> >> >> >> Best regards.
> >> >> >> >> José Guilherme Macedo Vieira
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> 2011/8/2 Nick Burch-11 [via Apache POI] <
> >> >> >> >> ml-node+4658878-753894702-237524@n5.nabble.com>
> >> >> >> >>
> >> >> >> >> > On Tue, 2 Aug 2011, jguilhermemv wrote:
> >> >> >> >> > > Regarding the file, it makes use of some CellStyles and Merged
> >> >> >> >> > > Regions.
> >> >> >> >> >
> >> >> >> >> > Try without them, and see if that fixes it. You need to narrow
> >> >> >> >> > your problem down before you can figure out what to correct. Try
> >> >> >> >> > to identify the simplest file that fails, and the most complex
> >> >> >> >> > one that works, the gap there is your issue
> >> >> >> >> >
> >> >> >> >> > Nick
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> View this message in context:
> >> >> >> >> http://apache-poi.1045710.n5.nabble.com/Apache-POI-3-8-SXSSFWorkbook-Unreadable-Content-tp4658852p4659737.html
> >> >> >> >> Sent from the POI - Dev mailing list archive at Nabble.com.
> >> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >>
> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> >> >> >> For additional commands, e-mail: dev-help@poi.apache.org

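On the interim workaround mentioned in the quoted thread (replacing the offending characters before the strings reach SXSSF): in Java, `\s` does not match U+00A0 unless Pattern.UNICODE_CHARACTER_CLASS is enabled, and `' '` is a different character from `'\u00a0'`, which would be consistent with tips (a) and (b) having no effect. The following is a minimal sketch, not POI code: the class name NbspCleaner and its methods are made up for illustration, and replacing every control character (tab and newline included) with a plain space is an assumption that suits single-line values such as names.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a POI class): finds and replaces no-break spaces
// and control characters before a string is written to a cell.
class NbspCleaner {

    // U+00A0 plus the ASCII control characters (this includes tab and newline).
    private static final Pattern SUSPECT = Pattern.compile("[\\u00a0\\p{Cntrl}]");

    // Replaces each suspect character with a plain space.
    static String clean(String s) {
        return SUSPECT.matcher(s).replaceAll(" ");
    }

    // Lists the position and code point of every character outside printable ASCII.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("index %d: U+%04X%n", i, (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "ALEXANDRE\u00a0MARINHO DE SOUZA";
        // The default \s class leaves U+00A0 untouched, so the string is unchanged.
        System.out.println(name.replaceAll("\\s+", " ").equals(name));
        System.out.print(describe(name));
        System.out.println(clean(name));
    }
}
```

The describe method is a more direct way to spot the hidden character than URL-encoding the value, since it reports the code point itself.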