metron-dev mailing list archives

From Raghu Mitra Kandikonda <r...@hortonworks.com>
Subject [DISCUSS] Adding new fields to stored records
Date Mon, 27 Mar 2017 09:26:53 GMT
Hi All,

I would like to start a discussion about a good approach for appending data to existing records
that have already been processed by Metron. Here are a few options to start with.

1. Store the new fields only in ES, and allow records to differ between ES and HDFS.
2. Store the new fields in HBase as well as in ES.
a. We can create a new table in HBase that stores the guid + key (or any other unique key of
the record) and the new value.
b. The table name will be the same as the name of the file that originally contained the record.
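To make option 2 concrete, here is a minimal sketch of the key layout, assuming "guid + key" means a composite row key of the record's guid and the added field's name. The HBase tables are modeled as an in-memory dict so the example runs on its own; the names tables, row_key, put_field, added_fields and merged_record are hypothetical, not existing Metron or HBase APIs.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for HBase: one "table" per original file
# name, each mapping a row key to the stored value. A real deployment would
# go through the HBase client; only the key layout is illustrated here.
tables: dict[str, dict[str, str]] = defaultdict(dict)


def row_key(guid: str, field: str) -> str:
    # Assumption: "guid + key" is a composite row key of guid and field name.
    return f"{guid}:{field}"


def put_field(source_file: str, guid: str, field: str, value: str) -> None:
    """Store (or overwrite) one added field for one record."""
    tables[source_file][row_key(guid, field)] = value


def added_fields(source_file: str, guid: str) -> dict[str, str]:
    """Collect the added fields of a record by scanning its row-key prefix."""
    prefix = row_key(guid, "")
    return {
        key[len(prefix):]: value
        for key, value in tables.get(source_file, {}).items()
        if key.startswith(prefix)
    }


def merged_record(source_file: str, record: dict) -> dict:
    """Overlay the added fields on the original record read from HDFS."""
    return {**record, **added_fields(source_file, record["guid"])}
```

In this sketch a second put_field for the same guid and field simply replaces the earlier value, so reads need no file scan; the cost is an extra HBase lookup for every record that is read.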
3. Store the new fields in both ES and HDFS.
a. The new fields will be stored in the same file as the original record.
b. The new fields are stored along with the guid of the record.
c. Any change to a field's value is written as a new record instead of modifying the existing
record.
d. To read the latest value for a record, we need to parse the entire file.
Ex: File enrichment-null-0-0-1490335748664.json has 3 records:
{"key1": "value1", "key2": "value2", "key3": "value3", "guid": "id1"}
{"key1": "value11", "key2": "value21", "key3": "value31", "guid": "id2"}
{"key1": "value12", "key2": "value22", "key3": "value32", "guid": "id3"}
Now we have to store a new field for the record with guid id2. The file then looks as follows:
{"key1": "value1", "key2": "value2", "key3": "value3", "guid": "id1"}
{"key1": "value11", "key2": "value21", "key3": "value31", "guid": "id2"}
{"key1": "value12", "key2": "value22", "key3": "value32", "guid": "id3"}
{"guid": "id2", "newKey": "newValue"}
If the value of newKey for that record is later changed to newestValue, the file looks as
follows:
{"key1": "value1", "key2": "value2", "key3": "value3", "guid": "id1"}
{"key1": "value11", "key2": "value21", "key3": "value31", "guid": "id2"}
{"key1": "value12", "key2": "value22", "key3": "value32", "guid": "id3"}
{"guid": "id2", "newKey": "newValue"}
{"guid": "id2", "newKey": "newestValue"}
4. Store the new fields in both ES and HDFS.
a. The new fields will be stored in a different file from the one where the record originally
resides.
b. The new file will have the same name as the file where the record originally resides, but it
will be in a different folder.
c. The new fields are stored along with the guid of the record.
d. A new value for an existing field, or a new field, is appended to the end of the file instead
of modifying the record.
e. To read the latest value for a record, we need to parse the entire file.
Ex: File /apps/metron/indexing/indexed/snort/enrichment-null-0-0-1490335746765.json has the
following records:
{"key1": "value1", "key2": "value2", "key3": "value3", "guid": "id1"}
{"key1": "value11", "key2": "value21", "key3": "value31", "guid": "id2"}
{"key1": "value12", "key2": "value22", "key3": "value32", "guid": "id3"}
Now we have a 'newKey' with the value 'newValue' to be stored for the record with guid id2. The
file enrichment-null-0-0-1490335746765.json stays the same, but we will have a new file
/apps/metron/augmented/snort/enrichment-null-0-0-1490335746765.json with the following content:
{“guid”: “id2", “newKey”: “newValue”}
If the value of newKey is later changed to newestValue and a new key, newestKey, is added, the
file looks as follows:
{“guid”: “id2", “newKey”: “newValue”}
{“guid”: “id2", “newKey”: “newestValue”}
{“guid”: “id2", “newestKey”: “nextNewestValue”}

-Raghu


