ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: usage analytics
Date Tue, 03 Nov 2020 11:59:15 GMT
Folks,

I want to bump up this discussion and slightly change the format suggested
by Nikita. I don't think it is correct to gather any information related to
the user environment. However, can we collect just the fact of some of the
Ignite APIs/subsystems being used with no user information whatsoever?
Having started thinking about Ignite 3.0 I realized that we lack even some
very basic knowledge on the impact of changing one or another feature or
API.

To my knowledge, the Ignite website already uses Google Analytics, which is
available to the community. The Google Analytics platform already has
tooling to track app screen hits in a completely anonymous way, so we can
use this tool to track Ignite component usage (once per node startup),
sending solely the component name and a unique environment hash - no IP
addresses, no JDK/OS/other information. The information will be available
in the same toolkit we are already using to analyze the website and
optimize our docs.
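
As a rough illustration of how small the payload could be, here is a minimal
Java sketch. It assumes the environment hash is a SHA-256 digest of a randomly
generated installation ID (not of any host property). The names UsagePing,
IGNITE_USAGE_STATS_DISABLED, environmentHash and payload are hypothetical; none
of them exist in Ignite. The sketch only builds the string that would be
reported and deliberately leaves out the actual call to the analytics service.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class UsagePing {
    /** Hypothetical opt-out system property; not an existing Ignite property. */
    static final String OPT_OUT_PROP = "IGNITE_USAGE_STATS_DISABLED";

    /** Returns the lowercase hex SHA-256 digest of an installation-local ID. */
    static String environmentHash(String installationId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(installationId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the only data that would be reported: component name and hash. */
    static String payload(String component, String envHash) {
        return "component=" + component + "&env=" + envHash;
    }

    /** Reporting is on unless the opt-out property is set to "true". */
    static boolean enabled() {
        return !Boolean.getBoolean(OPT_OUT_PROP);
    }

    public static void main(String[] args) {
        if (!enabled())
            return;

        // Random ID generated here for illustration only.
        String installationId = UUID.randomUUID().toString();

        // Printed instead of sent: the transport is out of scope for this sketch.
        System.out.println(payload("SQL", environmentHash(installationId)));
    }
}
```

Hashing a random ID rather than host data means the value cannot be reversed
into anything about the user environment. Keeping it stable across restarts
would require storing the ID locally, which the sketch does not do.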

WDYT?

Wed, Jul 19, 2017 at 01:15, <dsetrakyan@apache.org>:

> I would try to ping legal again and see if they respond. If not, I think
> we will need to come up with a simpler approach, that does not require
> legal approval.
>
> D.
>
> On Jul 18, 2017, at 2:23 PM, Nikita Ivanov <nivanov30@gmail.com>
> wrote:
> >Igniters,
> >Just a quick update. I haven't gotten a response from ASF Legal on this
> >thread and I frankly don't know how to proceed here. What's the process
> >to arrive at a decision point here?
> >
> >Thanks!
> >--
> >Nikita Ivanov
> >
> >
> >On Mon, Jul 10, 2017 at 3:11 PM, Konstantin Boudnik <cos@apache.org>
> >wrote:
> >
> >> On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> >> > Cos,
> >> > Based on my experience having it off by default negates the entire
> >> > purpose... We need statistically meaningful data set to make any inferences
> >> > from it. Moreover, if we are going to ask folks to turn it on it will
> >> > significantly skew the resulting data set anyways and show full picture.
> >> > I think "on" by default is the better option if we are to collect usage
> >> > stats to begin with.
> >>
> >> yes, sure. But having this "on" by default is likely to expose us to
> >> another shit-storm down the road. An interesting dilemma to have indeed. In
> >> my experience, whenever I install something like a browser or an operating
> >> system, it would ask if I want to make the particular piece of software
> >> better by sending back some anonymized stats. Basically, I am given a way to
> >> explicitly opt-out if I wish.
> >>
> >> Turning the feature "on" by default is like saying: "we'll be collecting
> >> some stats, but if you don't want to you can go here and there and disable
> >> the collection. Oh, and by the way - you need to go and figure out the exact
> >> steps to disable it."
> >>
> >> > Also, I want to re-iterate it again to avoid misunderstanding: there is no
> >> > proposal nor will there be a technical way to attribute collected data back
> >> > to a certain company. That's not what this is all about. We should only be
> >> > interested in aggregated stats (community size, geo information, language
> >> > information, components usage).
> >>
> >> Yes, I think it is clear, but never hurts to re-iterate.
> >>
> >> Cos
> >>
> >> > Thoughts?
> >> >
> >> > --
> >> > Nikita Ivanov
> >> > Founder & CTO
> >> > GridGain Systems
> >> >
> >> > On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >> >
> >> > > Actually, that should be OFF by default. It sounds like this reduces the
> >> > > amount of the data collected, but this would address the concerns of
> >> > > companies like Roman's. I know for sure that a few of my clients would sue
> >> > > my ass out of existence if I gave them the platform collecting their
> >> > > data-centers info.
> >> > >
> >> > > Let's have it, set it off by default and document an easy way to turn it
> >> > > off.
> >> > > Then start making rounds asking our user base to share _some_ of the stats
> >> > > with the community, so we can track the growth of the install base, etc.
> >> > >
> >> > > Cos
> >> > >
> >> > > On Thu, Jul 06, 2017 at 08:20AM, Nikita Ivanov wrote:
> >> > > > The idea so far is to have a single system property in configuration that
> >> > > > turns this off completely. I envision that this will be prominently
> >> > > > featured on Ignite website so that everyone who would like to disable it -
> >> > > > can do it in seconds.
> >> > > >
> >> > > > Thoughts?
> >> > > >
> >> > > > --
> >> > > > Nikita Ivanov
> >> > > > Founder & CTO
> >> > > > GridGain Systems
> >> > > >
> >> > > > On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh <rshtykh@yahoo.com> wrote:
> >> > > >
> >> > > > > Nikita,
> >> > > > >
> >> > > > > Sending and storing (somewhere the company cannot securely handle) any
> >> > > > > information (OS version, IP addresses, etc.) that can be used to compromise
> >> > > > > the services would be unacceptable.
> >> > > > > Turning it off might be ok (possibly through the cluster settings, not via
> >> > > > > globally-accessible site), but the thing that there's a risk some
> >> > > > > information can leak outside (for any reason, starting from a human
> >> > > > > mistake) is scary.
> >> > > > >
> >> > > > > -- Roman
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov <nivanov@gridgain.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >
> >> > > > > Roman,
> >> > > > > Thanks for the feedback. What are those questions specifically? Are IP
> >> > > > > addresses and OS what is causing it?
> >> > > > >
> >> > > > > Thanks!
> >> > > > >
> >> > > > > --
> >> > > > > Nikita Ivanov
> >> > > > > Founder & CTO
> >> > > > > GridGain Systems
> >> > > > >
> >> > > > > On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh <rshtykh@yahoo.com.invalid>
> >> > > > > wrote:
> >> > > > >
> >> > > > > Nikita,
> >> > > > >
> >> > > > > While this will help improve Ignite, it will prevent its adoption by many
> >> > > > > projects -- sending and retaining IP addresses, OS versions, etc. raises
> >> > > > > tons of questions when considering to use Ignite. Even if it can be opted
> >> > > > > out.
> >> > > > > -- Roman
> >> > > > >
> >> > > > >
> >> > > > > On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov <nivanov30@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >
> >> > > > > Igniters,
> >> > > > > I would like to kick off the discussion on the idea of collecting Ignite
> >> > > > > usage statistics. The basic idea behind this is to better understand
> >> > > > > general and anonymous Ignite usage information to better calibrate
> >> > > > > community efforts in developing new features, improving existing ones,
> >> > > > > delivering better documentation - and in every other way to make our
> >> > > > > project a better software solution.
> >> > > > >
> >> > > > > Although such instrumentation is standard practice in commercially
> >> > > > > developed software, for an ASF project this could be a sensitive issue.
> >> > > > > Therefore I would like to initiate a full community discussion on how best
> >> > > > > to implement such practice for the benefit of project while ensuring the
> >> > > > > privacy protection of Ignite users.
> >> > > > >
> >> > > > > To ignite (pun intended) the discussion I'll outline below some of the
> >> > > > > basic thoughts that I have on this subject. They are here only to give an
> >> > > > > idea of what such instrumentation may potentially look like so that we can
> >> > > > > discuss the merits of this idea in a tangible context.
> >> > > > >
> >> > > > > Overview
> >> > > > > -------------
> >> > > > > Upon start and every hour thereafter each Ignite node will collect, encrypt
> >> > > > > and send usage statistics over HTTPS to the ASF-hosted server. That server
> >> > > > > will accept such HTTPS packets, decrypt them and store them in a
> >> > > > > time-series DB. A web interface will be provided to view the usage
> >> > > > > information.
> >> > > > >
> >> > > > > Opt-In or Opt-out
> >> > > > > -------------------------
> >> > > > > Opt-out. Ignite website will offer simple instructions (system property) on
> >> > > > > how to disable this instrumentation.
> >> > > > >
> >> > > > > Code, Infra, Access
> >> > > > > ---------------------------
> >> > > > > Ignite instrumentation will be part of the Ignite code base. The collection
> >> > > > > server will be a separate module in the Ignite code base (released
> >> > > > > separately from Ignite). The collection server will be hosted by ASF Infra.
> >> > > > >
> >> > > > > Usage statistics will be publicly accessible by anyone in the community.
> >> > > > >
> >> > > > > Private, Personal Data
> >> > > > > ------------------------------
> >> > > > > No private or personal data will ever be transferred. No emails, usernames,
> >> > > > > company names, grid names, etc.
> >> > > > >
> >> > > > > Data Retention
> >> > > > > --------------------
> >> > > > > All data will be retained for 1 year and deleted permanently thereafter.
> >> > > > >
> >> > > > > Usage Data
> >> > > > > ----------------
> >> > > > > The following data will be collected in each packet sent to the collection
> >> > > > > server:
> >> > > > > - GRID_SIZE (to correspond our testing environment with the more frequent
> >> > > > >   cluster sizes)
> >> > > > > - IP_ADDR (for general geo-tracking as well as to know what documentation
> >> > > > >   language should be a priority)
> >> > > > > - SES_ID (to track continuous uptime vs. re-starts)
> >> > > > > - USERNAME_TYPE (privileged username vs. standard, to track production vs.
> >> > > > >   dev/testing usage; note - this is not an actual username)
> >> > > > > - OS_NAME
> >> > > > > - OS_VER
> >> > > > > - OS_ARCH
> >> > > > > - JAVA_VER
> >> > > > > - JAVA_VENDOR
> >> > > > > - COMP_SQL (whether or not this feature was used)
> >> > > > > - COMP_COMPUTE (whether or not this feature was used)
> >> > > > > - COMP_DATAGRID (whether or not this feature was used)
> >> > > > > - COMP_STREAMING (whether or not this feature was used)
> >> > > > > - COMP_IGFS (whether or not this feature was used)
> >> > > > > - COMP_SERVICE (whether or not this feature was used)
> >> > > > > - COMP_PERSISTENCE (whether or not this feature was used)
> >> > > > >
> >> > > > > Please let's discuss this idea. Everyone's comments and suggestions are
> >> > > > > *extremely* welcome.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Nikita Ivanov.
> >> > > > >
