jclouds-dev mailing list archives

From Shrinand Javadekar <shrin...@maginatics.com>
Subject Re: Performance degradation for jclouds-OpenStack Swift.
Date Fri, 25 Apr 2014 17:09:57 GMT
Sumit,

>> > After all experiments I feel there is a problem in Jclouds.

There may be a problem with jclouds. I was trying to understand the
test and the environment better so as to get to the bottom of this.


On Fri, Apr 25, 2014 at 12:39 AM, Sumit Gaur <sumitkgaur@gmail.com> wrote:
> Hi Ignasi
> https://github.com/sumitkgaur/test
>
> 1) Example8.java is the original programme and requires all the jclouds libs.
> 2) Worker.java is the delayed-delete programme.
>
> Thanks
> sumit
>
>
>
>
>
> On Apr 25, 2014 3:45 PM, "Ignasi Barrera" <nacx@apache.org> wrote:
>
>> Hi Sumit,
>>
>> Could you share the entire code of both programs in a git repo or a pastie
>> so we can better understand how your benchmark works, and reproduce it locally?
>> On 25/04/2014 02:39, "Sumit Gaur" <sumitkgaur@gmail.com> wrote:
>>
>> > Hi Shri,
>> > After all experiments I feel there is a problem in Jclouds.
>> >
>> > 1) I tried retrying on every 409 error. After a successful retry, jclouds
>> > started reporting that the blob no longer exists, but in reality it is
>> > still there in Swift storage.
>> > 2) I tried delaying the deletes until after 100 puts and, voila, there were
>> > no 409s in 24 hours. That clearly indicates some race condition in
>> > jclouds when we do an immediate DELETE after a PUT.
>> > 3) I know the 409 errors are coming all the way from the Swift object
>> > server, but the same does not happen even when I generate a much higher
>> > "concurrent" load with curl (PUT-GET-DEL cycles). I was getting 150 TPS.
>> > 4) I have run Swift without any extra daemons (auditor and others) to
>> > avoid conflicts caused by them. The storage nodes run only the
>> > object/container/account servers.
>> > 5) To generate a concurrent curl load, I send curl commands in the
>> > background. I ran this test for 48 hours and did not get a single 409
>> > error.
>> > 6) To give an idea of the sequence of client calls:
>> >
>> >  import org.jclouds.ContextBuilder;
>> >  import org.jclouds.blobstore.BlobStore;
>> >  import org.jclouds.blobstore.BlobStoreContext;
>> >  import org.jclouds.blobstore.domain.Blob;
>> >
>> >  static BlobStoreContext getSwiftClientView() {
>> >      return ContextBuilder.newBuilder("swift-keystone")
>> >              .credentials("test:tester", "test123")
>> >              .endpoint("http://a.x.y.z.:5000/v2.0/")
>> >              .buildView(BlobStoreContext.class);
>> >  }
>> >
>> >  BlobStoreContext context = getSwiftClientView();
>> >  BlobStore blobStore = context.getBlobStore();
>> >  blobStore.createContainerInLocation(null, containerName);
>> >  // build the blob from the local file, then PUT, GET and DELETE it
>> >  Blob blob = blobStore.blobBuilder(key).payload(file).build();
>> >  blobStore.putBlob(containerName, blob);
>> >  blobStore.getBlob(containerName, key);
>> >  blobStore.removeBlob(containerName, key);
>> >
>> > Let me know if you still see any gaps.
>> > Thanks
>> > sumit
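The source of Worker.java is not included in this thread, so the following is only a guess at the delayed-delete idea from point 2 above: issue each PUT immediately and defer the matching DELETEs until 100 PUTs have accumulated. ObjectStore and InMemoryStore are hypothetical stand-ins for the jclouds BlobStore; in the real test, put and delete would wrap putBlob and removeBlob.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the jclouds BlobStore calls used by the test.
interface ObjectStore {
    void put(String container, String key, byte[] data);
    void delete(String container, String key);
}

// In-memory ObjectStore, only for trying the sketch out without a Swift cluster.
class InMemoryStore implements ObjectStore {
    final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    public void put(String container, String key, byte[] data) {
        objects.put(container + "/" + key, data);
    }

    public void delete(String container, String key) {
        objects.remove(container + "/" + key);
    }
}

// Issues PUTs immediately but defers DELETEs until batchSize PUTs have accumulated.
class DelayedDeleter {
    private final ObjectStore store;
    private final int batchSize;
    private final List<String[]> pending = new ArrayList<>();  // {container, key} pairs

    DelayedDeleter(ObjectStore store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    void put(String container, String key, byte[] data) {
        store.put(container, key, data);  // network call made outside the lock
        List<String[]> batch = null;
        synchronized (this) {
            pending.add(new String[] {container, key});
            if (pending.size() >= batchSize) {
                batch = new ArrayList<>(pending);
                pending.clear();
            }
        }
        if (batch != null) {
            deleteAll(batch);
        }
    }

    // Deletes everything still pending, e.g. when the test shuts down.
    void flush() {
        List<String[]> batch;
        synchronized (this) {
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        deleteAll(batch);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    private void deleteAll(List<String[]> batch) {
        for (String[] entry : batch) {
            store.delete(entry[0], entry[1]);
        }
    }
}
```

The PUT and DELETE calls are made outside the lock so that the 10 worker threads do not serialize on it; only the bookkeeping of pending keys is synchronized.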
>> >
>> >
>> >
>> >
>> >
>> > On Apr 23, 2014 2:36 AM, "Shrinand Javadekar" <shrinand@maginatics.com> wrote:
>> >
>> > > So there are two problems:
>> > >
>> > > 1) 409 when deleting objects.
>> > > 2) Transactions taking longer after 24-48 hours.
>> > >
>> > > For (1), it looks like the request reached the Swift cluster but the
>> > > Swift cluster itself wasn't able to fulfill it. This could be because
>> > > of the "eventual consistency" semantics of blobstores. When the delete
>> > > request reached Swift, it could have been in the middle of some
>> > > operation on the object itself (e.g. reading the object for
>> > > replicating it, auditing it, etc.). jclouds did its job of actually
>> > > sending the request. So not sure what else can be done here. Maybe we
>> > > could add retries if the blobstore returns 409. But the main problem
>> > > lies on the Swift side. The Openstack mailing list would be a better
>> > > place for asking this question. There are many more Swift experts
>> > > there.
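The retry-on-409 idea above could look roughly like this sketch, which retries an operation only when it fails with HTTP 409 and doubles the back-off delay after each conflict. HttpStatusException is a hypothetical exception type; with jclouds, the status code would have to be read from whatever exception removeBlob actually throws in the version in use. Note Sumit's report in point 1 above that after a successful retry jclouds said the blob was gone while it was still in Swift, so retrying may mask the underlying problem rather than fix it.

```java
import java.util.function.Supplier;

// Hypothetical exception type carrying the HTTP status of a failed request.
class HttpStatusException extends RuntimeException {
    final int status;

    HttpStatusException(int status) {
        super("HTTP " + status);
        this.status = status;
    }
}

final class ConflictRetry {
    private ConflictRetry() {}

    // Runs op, retrying only when it fails with HTTP 409 (Conflict).
    // Waits initialDelayMs before the first retry and doubles the wait after each one.
    static <T> T withRetryOn409(Supplier<T> op, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (HttpStatusException e) {
                if (e.status != 409 || attempt >= maxAttempts) {
                    throw e;  // not a conflict, or out of attempts
                }
                sleep(delay);
                delay *= 2;
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

For a void call such as removeBlob, the lambda would return null: withRetryOn409(() -> { blobStore.removeBlob(container, key); return null; }, 5, 100).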
>> > >
>> > > For (2), from the curl example code, it looks like you're creating
>> > > multiple processes, each doing a put or a delete (no get). This is
>> > > different from jclouds spawning multiple threads. It would be great if
>> > > the experiments count the number of transactions they're doing and
>> > > whether they both reach the same number of transactions in the given
>> > > amount of time. If they do and yet there are fewer txns via jclouds
>> > > compared to the shell script, we can conclude that jclouds is the
>> > > cause of the problem.
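One way to run the comparison suggested above is to count completed and failed cycles over the same wall-clock window for both load generators and compare the resulting transactions per second. This is a minimal sketch: the Runnable passed in is a placeholder for one PUT-GET-DEL cycle (jclouds) or one PUT-DEL pair (to mirror the curl script), and any RuntimeException is counted as a failed cycle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Counts completed and failed cycles over a fixed wall-clock window so that two
// load generators (e.g. jclouds threads vs. a curl script) can be compared on TPS.
final class TxnCounter {
    final AtomicLong completed = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    // Runs cycle repeatedly on `threads` workers until durationMs has elapsed.
    // A cycle that throws a RuntimeException is counted as failed.
    void run(Runnable cycle, int threads, long durationMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (System.nanoTime() < deadline) {
                    try {
                        cycle.run();
                        completed.incrementAndGet();
                    } catch (RuntimeException e) {
                        failed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMs + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Completed cycles per second over the measured window.
    double tps(long durationMs) {
        return completed.get() * 1000.0 / durationMs;
    }
}
```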
>> > >
>> > > Now, answering some of the questions below.
>> > >
>> > > > It would be great if someone could let me know how jclouds delete works.
>> > > > Is there any internal queue during put or delete? I saw that if I put a
>> > > > small sleep of 300ms between the put and del calls, it works fine.
>> > >
>> > > I presume the blobstore object you're using in Example9.blobStore is
>> > > of type "BlobStore" and not "AsyncBlobStore". AsyncBlobStore is
>> > > deprecated. The BlobStore object is synchronous. There is no queue.
>> > > When you call removeBlob, the request gets created and sent to the
>> > > Swift cluster.
>> > >
>> > > > Also, I assume that jclouds calls are synchronous, and that put does
>> > > > not return until the object gets saved in Swift.
>> > >
>> > > For the BlobStore type, yes, it is sync.
>> > >
>> > > There are some jvm level settings that might also be at play here
>> > > related to the amount of memory you're allocating to the heap. You
>> > > could change the memory given to the jvm using the -Xms and -Xmx
>> > > options.
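To check whether heap pressure plays a part in the slowdown after 24-48 hours, a sketch like the following could log heap usage and cumulative GC count/time periodically next to the TPS figures, using the standard java.lang.management API. The heap size itself is set when launching the JVM, e.g. java -Xms2g -Xmx2g ... (2g is only an example value). A steadily rising used heap or GC time alongside falling TPS would point at the client JVM rather than at Swift.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Point-in-time view of heap usage and cumulative GC activity for the current JVM.
final class HeapSnapshot {
    final long usedBytes;
    final long maxBytes;   // -1 if the JVM reports no maximum
    final long gcCount;    // total collections so far, across all collectors
    final long gcTimeMs;   // total approximate time spent collecting, in ms

    private HeapSnapshot(long usedBytes, long maxBytes, long gcCount, long gcTimeMs) {
        this.usedBytes = usedBytes;
        this.maxBytes = maxBytes;
        this.gcCount = gcCount;
        this.gcTimeMs = gcTimeMs;
    }

    static HeapSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0L, gc.getCollectionCount());
            time += Math.max(0L, gc.getCollectionTime());
        }
        return new HeapSnapshot(heap.getUsed(), heap.getMax(), count, time);
    }

    @Override
    public String toString() {
        return String.format("heapUsed=%dMB heapMax=%dMB gcCount=%d gcTime=%dms",
                usedBytes >> 20, maxBytes >> 20, gcCount, gcTimeMs);
    }
}
```

For example, Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> System.out.println(HeapSnapshot.take()), 0, 1, TimeUnit.MINUTES) would print one line per minute for the length of the run.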
>> > >
>> > > -Shri
>> > >
>> > > > On Apr 22, 2014 11:59 AM, "Sumit Gaur" <sumitkgaur@gmail.com> wrote:
>> > > >
>> > > >> Hi
>> > > >> Please find my answer below
>> > > >>
>> > > >> On Apr 22, 2014 10:49 AM, "Jasdeep Hundal" <jasdeep.singh.hundal@gmail.com> wrote:
>> > > >> >
>> > > >> > Hey Sumit,
>> > > >> >
>> > > >> > I have a couple more questions that might help clarify the situation:
>> > > >> >
>> > > >> > 1. Are you running the stability test as a single long-running Java
>> > > >> > process (that just keeps cycling through the 10 uploads/gets/deletes)?
>> > > >> >
>> > > >>
>> > > >> Yes. But this process has threads.
>> > > >>
>> > > >> > 2. Are you always running the test in the same container, or are you
>> > > >> > creating new containers for each test iteration?
>> > > >> >
>> > > >> No, I am doing round-robin across 1000 containers.
>> > > >>
>> > > >> > 3. If the answer to #2 is that the test runs in a single container, how
>> > > >> > many objects does that container currently have?
>> > > >> >
>> > > >>
>> > > >> 0 in the ideal case. But as I am also facing 409 delete failures, some
>> > > >> objects are left in each container, only in the hundreds.
>> > > >>
>> > > >> > It may also help to time each of the individual blobstore actions as you
>> > > >> > run the test to see if any particular one is slowing down.
>> > > >> >
>> > > >>
>> > > >> Even the individual put and del times increase over time.
>> > > >>
>> > > >> > Jasdeep
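Following Jasdeep's suggestion of timing each blobstore action, the sketch below keeps per-operation count, mean and max latency, so it becomes visible which of put, get or delete slows down first. OpTimer is a hypothetical helper, not part of jclouds; in the test each call would be wrapped, e.g. timer.time("put", () -> blobStore.putBlob(container, blob)), and void calls such as removeBlob would return null from the lambda. Printing the numbers and calling reset() at a fixed interval (say, hourly) would give a trend over the 24-48 hour run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: per-operation latency statistics, safe to share between threads.
final class OpTimer {
    private static final class Stats {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
        final LongAccumulator maxNanos = new LongAccumulator(Long::max, 0L);
    }

    private final Map<String, Stats> stats = new ConcurrentHashMap<>();

    // Runs call, recording its latency under op ("put", "get", "delete", ...)
    // whether it returns normally or throws.
    <T> T time(String op, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            Stats s = stats.computeIfAbsent(op, k -> new Stats());
            s.count.increment();
            s.totalNanos.add(elapsed);
            s.maxNanos.accumulate(elapsed);
        }
    }

    long count(String op) {
        Stats s = stats.get(op);
        return s == null ? 0 : s.count.sum();
    }

    double meanMillis(String op) {
        Stats s = stats.get(op);
        long n = s == null ? 0 : s.count.sum();
        return n == 0 ? 0.0 : s.totalNanos.sum() / 1e6 / n;
    }

    double maxMillis(String op) {
        Stats s = stats.get(op);
        return s == null ? 0.0 : s.maxNanos.get() / 1e6;
    }

    // Starts a new measurement interval; samples recorded while reset() runs
    // may land in either interval.
    void reset() {
        stats.clear();
    }
}
```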
>> > > >> >
>> > > >> >
>> > > >> > On Mon, Apr 21, 2014 at 6:21 PM, Sumit Gaur <sumitkgaur@gmail.com> wrote:
>> > > >> >
>> > > >> > > hi Shri,
>> > > >> > > Please find answers below
>> > > >> > >
>> > > >> > > On Tue, Apr 22, 2014 at 9:23 AM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>> > > >> > > A few more questions to try and understand this better:
>> > > >> > >
>> > > >> > > 1) On the Swift instance you are using, how many replicas do you have?
>> > > >> > >
>> > > >> > > 3 replicas
>> > > >> > >
>> > > >> > > 2) Also, how are you using the curl command in the shell script?
>> > > >> > >
>> > > >> > > I send the commands below in the background for 10 iterations and then
>> > > >> > > wait, similar to the 10 threads in jclouds.
>> > > >> > >
>> > > >> > >   curl -X PUT -i -T 100k -H "X-Auth-Token: $OS_AUTH_TOKEN" \
>> > > >> > >     http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
>> > > >> > >   curl -X DELETE -i -H "X-Auth-Token: $OS_AUTH_TOKEN" \
>> > > >> > >     http://$PROXY_LOCAL_NET_IP:80/v1/AUTH_${KEYSTONE_ID}/zest1-${cn}/zest1-${k}-${i}-${j}.txt
>> > > >> > >
>> > > >> > > I think the shell script and jclouds-with-10-parallel-threads may not be
>> > > >> > > doing the same amount of work. In 20 hours jclouds might be doing much
>> > > >> > > more work than the shell script. If you let the shell script also go
>> > > >> > > up to that point, it might see failures too. Do you know how many
>> > > >> > > PUT-GET-DEL operations have been performed when you start seeing the
>> > > >> > > 409 errors?
>> > > >> > >
>> > > >> > > Actually, the 409 errors appear from the start of the test, but TPS
>> > > >> > > starts degrading after 24-48 hours.
>> > > >> > > On Apr 22, 2014 9:23 AM, "Shrinand Javadekar" <shrinand@maginatics.com> wrote:
>> > > >> > >
>> > > >> > > > A few more questions to try and understand this better:
>> > > >> > > >
>> > > >> > > > 1) On the Swift instance you are using, how many replicas do you have?
>> > > >> > > > 2) Also, how are you using the curl command in the shell script? I
>> > > >> > > > think the shell script and jclouds-with-10-parallel-threads may not be
>> > > >> > > > doing the same amount of work. In 20 hours jclouds might be doing much
>> > > >> > > > more work than the shell script. If you let the shell script also go
>> > > >> > > > up to that point, it might see failures too. Do you know how many
>> > > >> > > > PUT-GET-DEL operations have been performed when you start seeing the
>> > > >> > > > 409 errors?
>> > > >> > > >
>> > > >> > > > -Shri
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > On Mon, Apr 21, 2014 at 4:55 PM, Sumit Gaur <sumitkgaur@gmail.com> wrote:
>> > > >> > > > > FYI, this is the block of code. Also, I am using jclouds 1.7.1
>> > > >> > > > > (stable branch).
>> > > >> > > > >
>> > > >> > > > >     import java.util.UUID;
>> > > >> > > > >     import org.jclouds.blobstore.domain.Blob;
>> > > >> > > > >
>> > > >> > > > >     // key is declared outside the try so the catch block can log it
>> > > >> > > > >     String key = "objkey" + UUID.randomUUID();
>> > > >> > > > >     try {
>> > > >> > > > >         Blob blob = Example9.blobStore.blobBuilder(key)
>> > > >> > > > >                 .payload(Example9.file).build();
>> > > >> > > > >         Example9.blobStore.putBlob(Example9.containerName + count, blob);
>> > > >> > > > >         Example9.blobStore.getBlob(Example9.containerName + count, key);
>> > > >> > > > >         Example9.blobStore.removeBlob(Example9.containerName + count, key);
>> > > >> > > > >     } catch (Exception ace) {
>> > > >> > > > >         System.out.println("Request failed for objkey " + key + "  " + ace);
>> > > >> > > > >     }
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > > > On Tue, Apr 22, 2014 at 8:32 AM, Sumit Gaur <sumitkgaur@gmail.com> wrote:
>> > > >> > > > >
>> > > >> > > > >> Hi Shri,
>> > > >> > > > >> Thanks for paying attention to this. Please find my answers below:
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > > >> On Tue, Apr 22, 2014 at 2:31 AM, Shrinand Javadekar <shrinand@maginatics.com> wrote:
>> > > >> > > > >>
>> > > >> > > > >>> Sumit,
>> > > >> > > > >>>
>> > > >> > > > >>> I realize that you had sent out a similar email some time ago about
>> > > >> > > > >>> performance degradation. I'm not sure if anyone has run these types of
>> > > >> > > > >>> long-running experiments with jclouds. So this may be a first.
>> > > >> > > > >>>
>> > > >> > > > >> I tried to debug it over the last 2 weeks without success. I want to
>> > > >> > > > >> understand more about how the jclouds code handles this use case; any
>> > > >> > > > >> pointers that this is a problematic use case would also help.
>> > > >> > > > >>
>> > > >> > > > >>>
>> > > >> > > > >>> The 409 status is returned because of a conflict [1]. Are you sure you
>> > > >> > > > >>> didn't have two or more threads trying to delete the same object?
>> > > >> > > > >>>
>> > > >> > > > >> No two threads share the same object key in my program
>> > > >> > > > >> (String key = "objkey" + UUID.randomUUID();). It is some kind of race
>> > > >> > > > >> between the PUT and DEL calls. If I put, say, a 10 ms sleep between the
>> > > >> > > > >> calls, then there is no 409 error.
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > > >>> Also, I see that a 409 is returned by Swift if you try to delete a
>> > > >> > > > >>> container that isn't empty [2]. Is that something your test code
>> > > >> > > > >>> could've tried?
>> > > >> > > > >>>
>> > > >> > > > >> I am trying to delete objects .. not containers.
>> > > >> > > > >>
>> > > >> > > > >>>
>> > > >> > > > >>> When you say there was a similar test you're trying with curl, are you
>> > > >> > > > >>> using the curl command-line utility or the libcurl library?
>> > > >> > > > >>
>> > > >> > > > >> The curl command in a shell script with for loops.
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > > >>> How are you specifying the number of threads to use and what object each
>> > > >> > > > >>> thread should get/put/delete?
>> > > >> > > > >>>
>> > > >> > > > >>
>> > > >> > > > >> It is a Java test program using ThreadPoolExecutor, something similar
>> > > >> > > > >> to the example here:
>> > > >> > > > >> http://www.javacodegeeks.com/2013/01/java-thread-pool-example-using-executors-and-threadpoolexecutor.html
>> > > >> > > > >>
>> > > >> > > > >> The object is a 5KB file, the key is "objkey" + UUID.randomUUID(), and
>> > > >> > > > >> the pool has 10 threads.
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > > >> Hope this gives good insight. Let me know if you run into any problems
>> > > >> > > > >> here.
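For reference, a harness like the one described here (a pool of 10 threads, each running PUT-GET-DEL cycles on "objkey" + UUID keys, with containers chosen round-robin from 1000 as mentioned earlier in the thread) might look like this sketch. It is not Sumit's actual Example8/Example9 code, which lives in his repository; BlobOps is a hypothetical interface standing in for the jclouds BlobStore. Executors.newFixedThreadPool returns a ThreadPoolExecutor under the hood.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the three jclouds BlobStore calls used per cycle.
interface BlobOps {
    void put(String container, String key, byte[] data);
    byte[] get(String container, String key);
    void remove(String container, String key);
}

final class CycleHarness {
    private final BlobOps store;
    private final String containerPrefix;
    private final int containerCount;
    private final AtomicInteger next = new AtomicInteger();

    CycleHarness(BlobOps store, String containerPrefix, int containerCount) {
        this.store = store;
        this.containerPrefix = containerPrefix;
        this.containerCount = containerCount;
    }

    // Round-robin container choice: prefix0, prefix1, ..., prefix(N-1), prefix0, ...
    String nextContainer() {
        return containerPrefix + Math.floorMod(next.getAndIncrement(), containerCount);
    }

    // One PUT-GET-DEL cycle on a fresh UUID key, so no two threads share a key.
    void cycle(byte[] payload) {
        String container = nextContainer();
        String key = "objkey" + UUID.randomUUID();
        store.put(container, key, payload);
        store.get(container, key);
        store.remove(container, key);
    }

    // Runs cyclesPerThread cycles on each of `threads` pool threads and waits for them.
    void run(int threads, int cyclesPerThread, byte[] payload) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < cyclesPerThread; i++) {
                    try {
                        cycle(payload);
                    } catch (RuntimeException e) {
                        System.out.println("cycle failed: " + e);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Math.floorMod keeps the container index non-negative even if the shared counter ever wraps past Integer.MAX_VALUE in a very long run.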
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > > >>>
>> > > >> > > > >>> Thanks.
>> > > >> > > > >>> -Shri
>> > > >> > > > >>>
>> > > >> > > > >>> [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>> > > >> > > > >>> [2] https://bugs.launchpad.net/horizon/+bug/1096084
>> > > >> > > > >>>
>> > > >> > > > >>> On Sun, Apr 20, 2014 at 5:55 PM, Sumit Gaur <sumitkgaur@gmail.com> wrote:
>> > > >> > > > >>> > Hi,
>> > > >> > > > >>> > I am using the jclouds lib integrated with an OpenStack Swift + Keystone
>> > > >> > > > >>> > combination. Things are working fine except for the stability test.
>> > > >> > > > >>> > After 20-30 hours of testing, jclouds/Swift starts degrading in TPS and
>> > > >> > > > >>> > keeps going down over time.
>> > > >> > > > >>> >
>> > > >> > > > >>> > 1) I am running the (PUT-GET-DEL) cycle in 10 parallel threads.
>> > > >> > > > >>> > 2) I am also getting a lot of 409 responses and DEL failures from Swift.
>> > > >> > > > >>> > 3) A similar test run directly with curl does not show much impact,
>> > > >> > > > >>> > and TPS remains constant.
>> > > >> > > > >>> >
>> > > >> > > > >>> > Can somebody help me figure out what is going wrong here?
>> > > >> > > > >>> >
>> > > >> > > > >>> > Thanks
>> > > >> > > > >>> > sumit
>> > > >> > > > >>>
>> > > >> > > > >>
>> > > >> > > > >>
>> > > >> > > >
>> > > >> > >
>> > > >>
>> > >
>> >
>>
