spark-user mailing list archives

From Amith sha <amithsh...@gmail.com>
Subject Spark Streaming
Date Wed, 30 Sep 2015 11:50:51 GMT
Hi All,
I am planning to handle streaming data from Kafka in Spark using Python code.
Earlier, with my own log files, I handled them in Spark using indexes, but
in the case of the Apache log
I cannot rely on indexes: the timestamp, the request and the user agent contain
spaces, so after splitting on whitespace the fields no longer sit at fixed indexes.
So, is it possible to use a regex in TransformRDD?
OR
are there any other possible ways to extract the different groups?
For example:
THIS IS THE APACHE LOG (split on whitespace):

[u'10.10.80.1', u'-', u'-', u'[08/Sep/2015:12:15:15', u'+0530]',
u'"GET', u'/', u'HTTP/1.1"', u'200', u'1213', u'"-"', u'"Mozilla/5.0',
u'(Windows', u'NT', u'10.0;', u'WOW64)', u'AppleWebKit/537.36',
u'(KHTML,', u'like', u'Gecko)', u'Chrome/45.0.2454.85',
u'Safari/537.36"']

I NEED IT LIKE THIS:
IP:             10.10.80.1
IDENTITY:       -
USER:           -
TIME:           08/Sep/2015:12:15:15 +0530
SERVER MESSAGE: GET /favicon.ico HTTP/1.1
STATUS:         404
SIZE:           514
REFERER:        http://xxxxxxxxxxxxxxxx.com/
CLIENT MESSAGE:
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/45.0.2454.85 Safari/537.36
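A regex with named groups is one way to get these fields, because it matches the quoted and bracketed parts as single units instead of counting whitespace-separated tokens. The sketch below is plain Python (only the standard `re` module) and assumes each Kafka message is one raw Apache "combined" format line, not an already-split list. The names `APACHE_COMBINED`, `parse_apache_line`, `format_record` and `LABELS` are hypothetical, chosen for illustration; the sample line is rebuilt from the split tokens in the message.

```python
import re

# Hypothetical pattern for the Apache "combined" log format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
# Named groups capture whole fields, so spaces inside the date,
# the request line and the user agent do not break the parsing.
APACHE_COMBINED = re.compile(
    r'^(?P<ip>\S+) '            # client IP / host
    r'(?P<identity>\S+) '       # RFC 1413 identity, usually "-"
    r'(?P<user>\S+) '           # authenticated user, usually "-"
    r'\[(?P<time>[^\]]+)\] '    # timestamp inside [ ]
    r'"(?P<request>[^"]*)" '    # request line, e.g. GET / HTTP/1.1
    r'(?P<status>\d{3}) '       # HTTP status code
    r'(?P<size>\S+)'            # response size in bytes, or "-"
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'  # optional combined-format fields
    r'\s*$'
)


def parse_apache_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = APACHE_COMBINED.match(line)
    return match.groupdict() if match else None


# Output labels paired with the regex group names (hypothetical layout).
LABELS = [
    ("IP", "ip"),
    ("IDENTITY", "identity"),
    ("USER", "user"),
    ("TIME", "time"),
    ("SERVER MESSAGE", "request"),
    ("STATUS", "status"),
    ("SIZE", "size"),
    ("REFERER", "referer"),
    ("CLIENT MESSAGE", "agent"),
]


def format_record(record):
    """Render a parsed record as 'LABEL: value' lines; missing fields become '-'."""
    return "\n".join(
        (label + ":").ljust(16) + (record.get(key) or "-")
        for label, key in LABELS
    )


if __name__ == "__main__":
    sample = ('10.10.80.1 - - [08/Sep/2015:12:15:15 +0530] "GET / HTTP/1.1" '
              '200 1213 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/45.0.2454.85 Safari/537.36"')
    print(format_record(parse_apache_line(sample)))
```

In a PySpark Streaming job, a function like `parse_apache_line` could be passed to the DStream's `map` (or applied per RDD inside `transform`), followed by `filter(lambda r: r is not None)` to drop lines that do not match. If the stream comes from the Kafka helpers in `pyspark.streaming.kafka`, the elements are likely (key, value) pairs by default, so you would map over the value first; treat that as an assumption to check against your Spark version.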

Thanks & Regards
Amithsha


