metron-dev mailing list archives

From cestella <...@git.apache.org>
Subject [GitHub] metron pull request #878: METRON-1377: Stellar function to generate typosqua...
Date Tue, 19 Dec 2017 22:32:29 GMT
GitHub user cestella opened a pull request:

    https://github.com/apache/metron/pull/878

    METRON-1377: Stellar function to generate typosquatted domains (similar to dnstwist)

    ## Contributor Comments
    As a component of a strategy to detect [Typosquatting](https://en.wikipedia.org/wiki/Typosquatting), we need to be able to generate typosquatted domains. A Stellar function that replicates the functionality of dnstwist would therefore be useful.
    
    You can validate this in the REPL via:
    ```
    {17:10}[system]~/Documents/workspace/metron/fork/incubator-metron:typosquat ✗ ➭ mvn exec:java -Dexec.mainClass="org.apache.metron.stellar.common.shell.StellarShell" -pl metron-platform/metron-common
    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building metron-common 0.4.2
    [INFO] ------------------------------------------------------------------------
    [INFO]
    [INFO] --- exec-maven-plugin:1.5.0:java (default-cli) @ metron-common ---
    log4j:WARN No appenders could be found for logger (org.apache.metron.stellar.dsl.functions.resolver.BaseFunctionResolver).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Stellar, Go!
    Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
    [Stellar]>>> Functions loaded, you may refer to functions now...
    
    [Stellar]>>>
    [Stellar]>>> filter := REDUCE( DOMAIN_TYPOSQUAT( 'amazon' ), (s, d) -> BLOOM_ADD(s, d), BLOOM_INIT())
    [Stellar]>>> BLOOM_EXISTS( filter, 'amazon')
    true
    [Stellar]>>> BLOOM_EXISTS( filter, 'google')
    false
    [Stellar]>>> BLOOM_EXISTS( filter, 'amazoon')
    true
    [Stellar]>>>
    ```
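
To illustrate what the REPL session exercises, here is a minimal Python sketch of dnstwist-style candidate generation, followed by the same fold that `REDUCE` performs. It is not the implementation in this PR: the function name `typosquat_candidates` and the four strategies shown (omission, repetition, transposition, replacement) are assumptions chosen for illustration, and a plain set stands in for the bloom filter. The sketch also excludes the input itself, which may differ from what `DOMAIN_TYPOSQUAT` returns.

```python
import string
from functools import reduce


def typosquat_candidates(name: str) -> set:
    """Generate a few dnstwist-style variants of a domain label (no TLD).

    Hypothetical helper for illustration; not the DOMAIN_TYPOSQUAT implementation.
    """
    out = set()
    n = len(name)
    # Omission: drop one character ("amazon" -> "amzon").
    for i in range(n):
        out.add(name[:i] + name[i + 1:])
    # Repetition: double one character ("amazon" -> "amazoon").
    for i in range(n):
        out.add(name[:i] + name[i] + name[i:])
    # Transposition: swap two adjacent characters ("amazon" -> "amaozn").
    for i in range(n - 1):
        out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Replacement: substitute one character with another letter or digit.
    alphabet = string.ascii_lowercase + string.digits
    for i in range(n):
        for c in alphabet:
            if c != name[i]:
                out.add(name[:i] + c + name[i + 1:])
    # Exclude the original label and the empty string (a choice made by this sketch).
    out.discard(name)
    out.discard("")
    return out


# Fold the candidates into an accumulator, mirroring
# REDUCE(candidates, (s, d) -> BLOOM_ADD(s, d), BLOOM_INIT()).
# A frozenset stands in for the bloom filter here.
seen = reduce(lambda s, d: s | {d}, typosquat_candidates("amazon"), frozenset())

print("amazoon" in seen)
print("google" in seen)
```

A real bloom filter trades the exactness of this set for a compact, fixed-size structure that can report false positives but never false negatives.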
    Note: By itself, this is of some interest, but it is not a complete solution. As a follow-on, I suggest two JIRAs:
    1. A new mode for the flat-file loader that writes out serialized objects (e.g. a bloom filter containing all the typosquatted domains for a CSV of domains)
    2. The ability to take a serialized object from HDFS, load it into memory, and return it (e.g. `OBJECT_GET(path)`, with a cache in front of it)
    
    With these, in conjunction with the Stellar function from this PR, we should be able to detect typosquatted domains scalably at the enrichment phase:
    1. With the flat-file loader, generate a bloom filter containing the typosquatted domains derived from the set of known good domains
    2. Upload it to HDFS
    3. As an enrichment:
    ```
    is_typosquatted := BLOOM_EXISTS(OBJECT_GET('/apps/metron/typosquat/alexa1m.ser'), domain)
    ```
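
The three steps above could be sketched end to end as follows. This is a rough Python illustration under stated assumptions, not Metron code: `BloomFilter`, `write_filter`, and `object_get` are hypothetical stand-ins for the bloom filter, the proposed flat-file loader mode, and the proposed `OBJECT_GET(path)` respectively; the local filesystem stands in for HDFS; and the byte layout used for serialization is made up for the example.

```python
import hashlib
import struct
import tempfile
from functools import lru_cache
from pathlib import Path


class BloomFilter:
    """Minimal Bloom filter: k salted hash positions per item over an m-bit array."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        # Illustrative layout: two big-endian uint32 parameters, then the bit array.
        return struct.pack(">II", self.num_bits, self.num_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        num_bits, num_hashes = struct.unpack(">II", data[:8])
        bloom = cls(num_bits, num_hashes)
        bloom.bits = bytearray(data[8:])
        return bloom


def write_filter(items, path: Path) -> None:
    """Stand-in for the proposed flat-file loader mode: build and serialize a filter."""
    bloom = BloomFilter()
    for item in items:
        bloom.add(item)
    path.write_bytes(bloom.to_bytes())


@lru_cache(maxsize=16)
def object_get(path: str) -> BloomFilter:
    """Stand-in for the proposed OBJECT_GET(path): read once, then serve from cache."""
    return BloomFilter.from_bytes(Path(path).read_bytes())


# Usage: in the real flow the items would be the typosquatted variants of known good domains.
with tempfile.TemporaryDirectory() as tmp:
    ser = Path(tmp) / "typosquat.ser"
    write_filter(["amazoon", "amzon", "gooogle"], ser)
    is_typosquatted = "amazoon" in object_get(str(ser))
    print(is_typosquatted)
```

The cache in front of the loader matters because an enrichment runs once per message: without it, every lookup would re-read and deserialize the filter.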
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution, we ask that you follow these guidelines and double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not, one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).

    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are
trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically
master)?
    
    
    ### For code changes:
    - [x] Have you included steps to reproduce the behavior or problem that is being changed
or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested
manually?
    - [x] Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && build_utils/verify_licenses.sh 
      ```
    
    - [x] Have you written or updated unit tests and/or integration tests to verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies licensed in a way
that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?

    - [x] Have you verified the basic functionality of the build by building and running locally
with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [x] Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and verify the changes via `site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and
submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal
repository such that your branches are built there before submitting a pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron typosquat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #878
    
----
commit a95014ed1e145f9133dd95dcbfbf7e9212401fef
Author: cstella <cestella@...>
Date:   2017-12-19T22:26:03Z

    METRON-1377: Stellar function to generate typosquatted domains (similar to dnstwist)

----

