spot-dev mailing list archives

From "Edwards, Brandon" <>
Subject Spot Suspicious Connects Description and questions related to 'feedback' from UI to ML
Date Thu, 25 May 2017 17:27:50 GMT
Hi all,

I am attaching the document that describes how Spot uses LDA to perform anomaly detection
on network events. I have also received multiple questions about how the ‘user scoring’
(‘feedback’) of particular items in the suspicious connects report (in the UI layer) is
used in ML. The attached document does not give much detail on this functionality, so
I thought I’d put an explanation out there. We can then discuss questions about it, and
what additional info should be included in the attached document.

The Spot team feels that changes are needed to this ‘feedback’ functionality, and sees
these changes as happening concurrently with improvements to our ability to carry context
from an LDA model trained on one batch of data forward to the next training run (or even
to training in a streaming use case). The value of ‘feedback’ depends on the quality of
the model context we can carry over.

The idea for feedback is as follows. Items that the user scores with a 1 (i.e. the user
identifies the item as benign and no longer wants to see it in the suspicious connects
report) are used to tell the machine learning component that such entries should no longer
be considered suspicious. Currently this is done by injecting artificial log entries into
the next batch of data, so that LDA sees many such entries and therefore no longer sees
them as anomalies.
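
As a rough sketch of the injection approach (this is not the actual Spot code: the
function name, the dict-based record layout, and the default number of copies are all
assumptions made for illustration), duplicating each benign-scored entry into the next
batch could look like this:

```python
from typing import Dict, List

# Hypothetical record layout: one log entry as a flat dict of field name -> value.
Record = Dict[str, str]


def inject_feedback(batch: List[Record],
                    benign: List[Record],
                    copies: int = 1000) -> List[Record]:
    """Return a new batch with each benign-scored record appended `copies` times.

    The input batch is not modified. `copies` is an illustrative knob, not a
    value taken from Spot.
    """
    augmented = list(batch)
    for record in benign:
        # Append independent copies so later edits to one do not affect the others.
        augmented.extend(dict(record) for _ in range(copies))
    return augmented
```

The duplicated entries raise the frequency of that pattern in the training data, which
is what makes a model trained on the augmented batch less likely to score it as rare.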

We have ideas for other ways to provide this functionality. For example, we could filter
entries matching the identified pattern out of the next batch BEFORE ML runs on it. For
items that the user scores as ‘3’ in the UI (for example, the user finds an IP so
suspicious that they want to see all future log entries associated with that IP), we could
pull future items matching such a pattern out of the batch so that they skip ML, and
instead report them in a separate pane of the UI or insert them at the top of the most
suspicious events.
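
A minimal sketch of this filtering idea, again with hypothetical names and a made-up
record layout (a "pattern" here is assumed to be a set of field values that must all
match exactly, and letting a '3' pattern win over a '1' pattern is my own choice, not
something we have decided):

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: one log entry as a flat dict of field name -> value.
Record = Dict[str, str]


def matches(record: Record, pattern: Record) -> bool:
    """True if every field in the pattern has the same value in the record."""
    return all(record.get(field) == value for field, value in pattern.items())


def route_batch(batch: List[Record],
                benign_patterns: List[Record],
                high_risk_patterns: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split a batch before ML runs.

    Returns (to_ml, flagged): `flagged` holds entries matching a pattern scored 3,
    which skip ML and go straight to the report; entries matching a pattern
    scored 1 are dropped; everything else goes to ML as usual.
    """
    to_ml: List[Record] = []
    flagged: List[Record] = []
    for record in batch:
        if any(matches(record, p) for p in high_risk_patterns):
            flagged.append(record)
        elif any(matches(record, p) for p in benign_patterns):
            continue  # user marked this pattern benign: keep it out of the ML input
        else:
            to_ml.append(record)
    return to_ml, flagged
```

For example, `route_batch(batch, [{"dst_ip": "10.0.0.5"}], [{"src_ip": "203.0.113.7"}])`
would drop traffic to 10.0.0.5, set aside everything from 203.0.113.7 for the separate
pane, and pass the rest on to ML.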

Comments, Questions?