From: arijit chakraborty
To: dev@systemml.incubator.apache.org
Date: Tue, 18 Apr 2017 04:59:15 +0000
Subject: Re: Distinct Item of a column

Thank you, Niketan! Your answer completely answers my question.

Regards,
Arijit

________________________________
From: Niketan Pansare
Sent: Tuesday, April 18, 2017 12:55:28 AM
To: dev@systemml.incubator.apache.org
Subject: Re: Distinct Item of a column

Hi Arijit,

PySpark and SystemML are complementary, and they serve different purposes. PySpark primarily operates on a collection of data points (i.e., an RDD) or a DataFrame and exposes the Spark programming model (i.e., transformations and actions). SystemML primarily operates on matrices and provides a wide variety of linear algebra operators required for implementing machine learning algorithms. Personally, I would use PySpark for data preprocessing and SystemML for training/prediction (YMMV!). As an example, in our breast cancer project we use the PySpark APIs in https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/Preprocessing.ipynb and the SystemML APIs in https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/MachineLearning.ipynb ... Yes, some operations (such as distinct) can be done in both SystemML and PySpark, in which case you should choose the one that best fits your needs.

PySpark ML (or MLlib) is closer to SystemML. I agree with you that there are not enough comparisons out there, probably because benchmarking ML systems is non-trivial.
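Part of what makes benchmarking non-trivial is that accuracy and runtime have to be measured for the same model on the same data. Below is a minimal, hypothetical Python harness (not part of SystemML, PySpark, or scikit-learn) that times each system's train-and-predict call and scores its predictions with plain classification accuracy. The names `benchmark`, `majority_class`, and `one_nn` and the toy dataset are illustrative assumptions; a real comparison would call the SystemML and PySpark ML APIs inside each `fit_predict` function, use several realistic datasets, and pick metrics from scikit-learn's `sklearn.metrics` module.

```python
import time
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def benchmark(systems, datasets):
    """Time each system's train+predict call on each dataset and score it.

    systems:  name -> fit_predict(X_train, y_train, X_test) returning labels
    datasets: name -> (X_train, y_train, X_test, y_test)
    """
    results = []
    for ds_name, (X_tr, y_tr, X_te, y_te) in datasets.items():
        for sys_name, fit_predict in systems.items():
            start = time.perf_counter()
            y_pred = fit_predict(X_tr, y_tr, X_te)
            seconds = time.perf_counter() - start
            results.append({"dataset": ds_name, "system": sys_name,
                            "accuracy": accuracy(y_te, y_pred),
                            "seconds": seconds})
    return results


def majority_class(X_train, y_train, X_test):
    # Hypothetical baseline: predict the most frequent training label everywhere.
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


def one_nn(X_train, y_train, X_test):
    # Hypothetical second "system": 1-nearest-neighbour, squared Euclidean distance.
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [y_train[min(range(len(X_train)), key=lambda i: dist(X_train[i], x))]
            for x in X_test]


toy = {
    "toy": ([[0.0], [1.0], [2.0], [8.0], [9.0]], [0, 0, 0, 1, 1],  # train
            [[0.4], [8.6]], [0, 1]),                              # test
}
# Accuracy: majority 0.5, 1-NN 1.0 (runtimes vary from run to run).
for row in benchmark({"majority": majority_class, "1-NN": one_nn}, toy):
    print(row["system"], row["accuracy"], f'{row["seconds"]:.6f}s')
```

The harness deliberately leaves out hyperparameter tuning and convergence checks, which also need to be reasoned about before judging one system against another, and a single timed run on a toy dataset is far too small to say anything about real performance.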

For an apples-to-apples comparison, you need to compare both the accuracy and the runtime performance of a given ML model on a variety of datasets. I am using the term "accuracy" broadly, so please refer to http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics. Also, since different ML systems use different optimization algorithms (e.g., SGD, conjugate gradient, direct solve, ...), one needs to reason about hyperparameters as well as convergence behavior before making a judgement.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

PS: SystemML has recently added support for frames (http://apache.github.io/incubator-systemml/dml-language-reference.html#frames), which simplify common data transformation operations such as recoding, dummy coding, binning, and handling of missing values.

________________________________
From: arijit chakraborty
To: dev@systemml.incubator.apache.org
Date: 04/17/2017 08:50 AM
Subject: Distinct Item of a column

Hi,

I'm curious to know what the advantage of SystemML over PySpark is, especially in terms of performance. I tried looking for some reading on it, but could hardly find any.

Thank you!
Arijit
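On the subject line's question (distinct items of a column) and the PS about frames: the sketch below shows, in plain Python, what distinct, recoding, and dummy coding compute for a single column. It illustrates the semantics only and is not SystemML's or PySpark's implementation; the 1-based codes and first-seen ordering are assumptions chosen to mirror DML's 1-based indexing, and SystemML's own recoding may assign codes in a different order. In PySpark, assuming a DataFrame `df` with a column named `city`, the distinct values can be obtained with `df.select("city").distinct()`.

```python
def distinct(column):
    """Return the distinct values of a column, keeping first-seen order."""
    # dict keys are unique and (since Python 3.7) preserve insertion order.
    return list(dict.fromkeys(column))


def recode(column):
    """Replace each value by a 1-based integer code for its distinct value."""
    # Assumption: codes follow first-seen order; real systems may differ.
    mapping = {value: code for code, value in enumerate(distinct(column), start=1)}
    return [mapping[value] for value in column], mapping


def dummy_code(codes, num_levels):
    """One-hot (dummy) encode 1-based integer codes into rows of 0/1."""
    return [[1 if code == level else 0 for level in range(1, num_levels + 1)]
            for code in codes]


city = ["Paris", "Delhi", "Paris", "Tokyo", "Delhi"]
print(distinct(city))                        # ['Paris', 'Delhi', 'Tokyo']
codes, mapping = recode(city)
print(codes)                                 # [1, 2, 1, 3, 2]
print(dummy_code(codes, len(mapping))[:2])   # [[1, 0, 0], [0, 1, 0]]
```

Recoding is built directly on distinct (one code per distinct value), and dummy coding is built on the recoded column, which is why these transformations tend to be offered together.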