This tool parses log data and allows you to define analysis pipelines for anomaly detection.

Overview

logdata-anomaly-miner

This tool parses log data and allows you to define analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, making it suitable for production server use.

AECID Demo – Anomaly Detection with aminer and Reporting to IBM QRadar

Requirements

Installing logdata-anomaly-miner requires a Linux system with Python >= 3.6. Debian-based distributions are currently recommended.

See requirements.txt for further module dependencies.

Installation

Debian

There are Debian packages for logdata-anomaly-miner in the official Debian/Ubuntu repositories.

apt-get update && apt-get install logdata-anomaly-miner

From source

The following commands install the latest stable release:

cd $HOME
wget https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
chmod +x aminer_install.sh
./aminer_install.sh

Docker

For installation with Docker see: Deployment with Docker

Getting started

Here are some resources to read in order to get started with configurations:

Publications

Publications and talks:

A complete list of publications can be found at https://aecid.ait.ac.at/further-information/.

Contribution

We happily accept patches and other contributions. Please see the following links for how to get started:

Bugs

If you encounter any bugs, please create an issue on GitHub.

Security

If you discover any security-related issues, read SECURITY.md first and then report them.

License

GPL-3.0

Comments
  • Multiline support

    Multiline support

    Since issue 372 was closed, I am opening a new issue for multiline support. See https://github.com/ait-aecid/logdata-anomaly-miner/issues/372

    As I mentioned in the issue, it would be good to have an optional EOL parameter in the config to support simple multiline logs that are clearly separable, e.g., by \n\n that otherwise does not occur. We could also think about supporting more advanced multiline logs, in particular, json formatted logs where each json object spans over several lines rather than a single line. This could be solved by counting brackets, i.e., the ByteStreamAtomizer increases a counter (initially set to 0) for every "{" and decreases it for every "}" (or any other user-defined characters), and passes a log_atom to the parser every time this counter reaches 0.
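    A minimal standalone sketch of this bracket-counting idea (plain Python, not the actual ByteStreamLineAtomizer API; the function name and delimiters are made up for illustration, and braces inside strings are not handled):

    # Hypothetical sketch of the proposed bracket-counting atomization:
    # accumulate bytes until the opening and closing delimiters balance,
    # then emit the accumulated chunk as one log atom.
    def split_json_atoms(stream_bytes, open_char=b"{", close_char=b"}"):
        atoms = []
        depth = 0
        current = bytearray()
        for i in range(len(stream_bytes)):
            byte = stream_bytes[i:i + 1]
            current += byte
            if byte == open_char:
                depth += 1
            elif byte == close_char:
                depth -= 1
                if depth == 0:
                    atoms.append(bytes(current).strip())
                    current = bytearray()
        return atoms

    # A JSON object spanning several lines becomes a single atom:
    data = b'{\n  "a": 1,\n  "b": {"c": 2}\n}\n{"d": 3}\n'
    print(split_json_atoms(data))  # [b'{\n  "a": 1,\n  "b": {"c": 2}\n}', b'{"d": 3}']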

    enhancement 
    opened by landauermax 15
  • Allowlist and blocklist for detector path lists

    Allowlist and blocklist for detector path lists

    allowlisted_paths in ECD should be named blocklisted_paths, since these paths are not considered for detection.

    allowlisted_paths should also exist, but do the opposite: analysis should only be carried out when one of the paths in the log atom's match dictionary contains one of the allowlisted_paths.

    The attribute paths should overrule these lists.

    This feature should be available for all detectors that may be analyzing all available parser matches, such as the VTD.
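    A small sketch of the intended precedence (plain Python; the attribute names mirror the ones proposed above, but the helper itself is hypothetical and not part of the aminer code base):

    # Hypothetical helper: explicit `paths` overrule both lists; otherwise
    # blocklisted paths are dropped from detection, and if an allowlist is
    # given, the atom is only analyzed when at least one allowlisted path occurs.
    def select_paths(match_paths, paths=None, allowlisted_paths=None, blocklisted_paths=None):
        if paths:  # the attribute paths overrules these lists
            return [p for p in match_paths if p in paths]
        selected = [p for p in match_paths if not (blocklisted_paths and p in blocklisted_paths)]
        if allowlisted_paths and not any(p in allowlisted_paths for p in selected):
            return []
        return selected

    # Example with paths taken from a log atom's match dictionary:
    print(select_paths(["/model/user", "/model/ip"], blocklisted_paths=["/model/ip"]))  # ['/model/user']
    print(select_paths(["/model/user"], allowlisted_paths=["/model/other"]))            # []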

    enhancement 
    opened by landauermax 15
  • Fix import warnings

    Fix import warnings

    /usr/lib/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from spec or package, falling back on name and path

    return f(*args, **kwds)

    should not occur when running the aminer.

    bug 
    opened by 4cti0nfi9ure 15
  • %z makes parsing way too slow

    %z makes parsing way too slow

    When using %z in the parsing model (see slow.txt), I get around 50 lines per second; without it, I get around 1000 lines per second (see fast.txt). Something is wrong with parsing %z in the DateTimeModelElement.

    fast.txt slow.txt train.log config.py.txt

    bug high 
    opened by landauermax 12
  • added nullable functionality to JsonModelElements.

    added nullable functionality to JsonModelElements.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1061 Fixes #1074

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 11
  • Create backups of persistency

    Create backups of persistency

    There should be a command line parameter that backs up the persistency at regular intervals. Also, there should be a remote control command that saves the persistency when executed.

    The persistency should be copied into a directory /var/lib/aminer/backup/yyyy-mm-dd-hh-mm-ss/...

    There should also be the possibility to restore configs, config settings, etc. by remote control.
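    A sketch of the timestamped copy (plain Python standard library; the directories follow the paths mentioned above, everything else is illustrative):

    import shutil
    from datetime import datetime
    from pathlib import Path

    def backup_persistency(src="/var/lib/aminer", dst_root="/var/lib/aminer/backup"):
        # Copy the persistency directory into a yyyy-mm-dd-hh-mm-ss backup folder,
        # excluding the backup directory itself so backups are not copied recursively.
        timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
        target = Path(dst_root) / timestamp
        shutil.copytree(src, target, ignore=shutil.ignore_patterns("backup"))
        return target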

    enhancement 
    opened by landauermax 11
  • Tabs in logs

    Tabs in logs

    My log file contains tabulators (e.g. System name:\tTESTNAME). However, the byte strings in the parsing models cannot interpret these tabulators (\t): FixedDataModelElement('fixed1', b'System name:\t'),

    How can I make it possible for the tabs to be interpreted correctly?
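    As a plain-Python sanity check (independent of aminer itself), \t inside a bytes literal already is a real tab byte, so a fixed byte string compares against raw log bytes that contain tabs:

    # Plain Python check, not specific to the aminer parsing model classes:
    line = b"System name:\tTESTNAME"
    prefix = b"System name:\t"
    assert prefix.endswith(b"\x09")  # \t in a bytes literal is the tab byte 0x09
    assert line.startswith(prefix)   # so the fixed prefix matches log bytes containing a tab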

    opened by tschohanna 10
  • Add overall output for aminer

    Add overall output for aminer

    There should be a way to write everything that the AMiner outputs to a file. For example, at the beginning of the config, a parameter StandardOutput: "/etc/aminer/output.txt" could be set, and all output (anomalies, errors, etc.) would be written there in addition to the usual output components. By default, it should be None and not write anything.

    enhancement 
    opened by landauermax 10
  • Warning if two detectors persist on same file

    Warning if two detectors persist on same file

    It is possible to define two detectors of the same type that will end up persisting to the same file; this can especially happen by accident when the "Default" name is used. We should not prevent it completely, but at least print a warning when two or more detectors persist to the same file.
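    A hedged sketch of such a check (generic Python; the detector representation and the assumption that the pair of type and persistence_id determines the persistence file are illustrative, not the actual aminer internals):

    from collections import Counter

    def warn_on_shared_persistence(detectors):
        # Warn if two or more detectors would persist to the same file; each detector
        # is represented here as a dict with its type and an optional persistence_id.
        keys = [(d["type"], d.get("persistence_id", "Default")) for d in detectors]
        for (dtype, pid), count in Counter(keys).items():
            if count > 1:
                print(f"WARNING: {count} detectors persist to the same file: {dtype}/{pid}")

    # Example: two detectors of the same type both falling back to the "Default" id.
    warn_on_shared_persistence([
        {"type": "NewMatchPathValueDetector"},
        {"type": "NewMatchPathValueDetector"},
    ])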

    enhancement 
    opened by landauermax 9
  • AtomFilterMatchAction YAML support

    AtomFilterMatchAction YAML support

    There should be a way to use a MatchRule so that only logs that match are forwarded to a specific detector, using the AtomFilterMatchAction. This can be done in python configs, but not in yaml configs. Also, tests and documentation are missing.

    enhancement high 
    opened by landauermax 8
  • Paths to JSON list elements

    Paths to JSON list elements

    I have this sample data:

    [email protected]:/home/ubuntu# cat file3.log 
    {"a": ["success", "a.png"]}
    {"a": ["success", "b.png"]}
    {"a": ["fail", "c.png"]}
    {"a": ["success", "c.png"]}
    

    The values in the list should be detected with a value detector. They should not be mixed, i.e., the first and second element in the list are independent.

    I use the following config to parse the file:

    LearnMode: True
    
    LogResourceList:
      - "file:///home/ubuntu/file3.log"
    
    Parser:  
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - x
    
    Input:
            timestamp_paths: None
            verbose: True
            json_format: True
    
    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
    EventHandlers:
            - id: stpe
              json: true
              type: StreamPrinterEventHandler
    

    Note that I use a value detector on the list. The result is as follows:

    [email protected]:/home/ubuntu# cat /var/lib/aminer/NewMatchPathValueDetector/test 
    ["bytes:a.png", "bytes:c.png", "bytes:b.png"]
    

    Only the last element of each list has been learned, but I also want to learn the first element in the array.

    I propose to model all elements of the lists as their own elements, so that the parser looks like this:

    Parser:
           - id: y
             type: FixedWordlistDataModelElement
             name: 'y'
             args:
               - 'success'
               - 'fail'
                 
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - y
                 - x
    

    and the analysis could look like this, where each element can be addressed individually by an analysis component:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/y'
              learn_mode: true
              persistence_id: test
    

    The current implementation uses a single element to model all elements of the list. This can also be convenient and should be possible by introducing a new element called ListOfElements. It should parse any number of elements in the list with the specified parsing model element. For example, the list of elements here is a list of variable byte elements:

    Parser:
           - id: loe
             type: ListOfElements
             name: 'loe'
             args: z
                 
           - id: z
             type: VariableByteDataModelElement
             name: 'z'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - loe
    

    The ListOfElements element should then append the index of the element in the JSON list to the end of the path. For example, the following paths can be used in the analysis section:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/0'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/1'
              learn_mode: true
              persistence_id: test
    
    enhancement medium 
    opened by landauermax 8
  • extended FrequencyDetector wiki tests.

    extended FrequencyDetector wiki tests.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1008 Fixes #1009

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • fixed test26 so no fix definition number has to be added.

    fixed test26 so no fix definition number has to be added.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guide lines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Requests merges into the "development"-branch

    Fixes #1181

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • Random test fails when new detector is added

    Random test fails when new detector is added

    When adding a new detector and running the tests, they usually fail at test26_filter_config_errors in YamlConfigTest.py because there is an integer that needs to be incremented. For example, see PR #1180, where this had to be fixed when adding a new detector. It is hard to spot why this test fails, as it has nothing to do with the added detector and is not an indicator of something that needs to be fixed. I therefore suggest modifying this test case so that it passes no matter what integer comes after the "definition" keyword. Adding new detectors in the future should then not require updating this test.
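    One hedged way to make the comparison robust (an illustrative snippet, not the actual YamlConfigTest code) is to normalize the integer that follows the "definition" keyword before asserting:

    import re

    def normalize_definition_numbers(message):
        # Replace the integer after the "definition" keyword so expected and actual
        # error messages compare equal regardless of how many definitions exist.
        return re.sub(r"(definition\D*)\d+", r"\1<N>", message)

    # Adding a detector changes the number but no longer breaks the comparison:
    assert normalize_definition_numbers("error in definition 42") == \
           normalize_definition_numbers("error in definition 43")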

    test medium 
    opened by landauermax 0
  • Add possibility to run some LogResources as json input and some as normal text input.

    Add possibility to run some LogResources as json input and some as normal text input.

    LogResourceList:
    
       - url: "file:///var/log/apache2/access.log"
       - url: "unix:///var/lib/akafka/aminer.sock"
         type: json  # Configures the ByteStream
         parser_id: kafka_audit_logs  # Configures the associated parser
    
    
    Parser:
       - id: kafka_audit_logs
         type: AuditDingsParser
    
       - id: ApacheAccessModel
         start: true
    
    opened by ernstleierzopf 0
  • Shorten the build-time for docker builds

    Shorten the build-time for docker builds

    Currently the complete docker image is built at once. This takes a lot of time for each build. We could shorten the build time by inheriting from a pre-built image.

    enhancement 
    opened by whotwagner 0
Releases(V2.5.1)
  • V2.5.1(May 17, 2022)

    Bugfixes:

    • EFD: Fixed problem that appears with empty windows
    • Fixed index out of range if matches are empty in JsonModelElement array.
    • EFD: Enabled immediate detection without training, if both limits are set
    • EFD: Fixed bug related to auto_include_flag
    • Remove spaces in aminer logo
    • ParserCounter: Fixed do_timer
    • Fixed code to allow the usage of AtomFilterMatchAction in yaml configs
    • Fixed JsonModelElement when json object is null
    • Fix incorrect message of charset detector
    • Fix match list handling for json objects

    Changes:

    • Added nullable functionality to JsonModelElements
    • Added include-directive to supervisord.conf
    • ETD: Output warning when count first exceeds range
    • EFD: Added option to output anomaly when the count first exceeds the range
    • VTD: Added variable type 'range'
    • EFD: Added the function reset_counter
    • EFD: Added option to set the lower and upper limit of the range interval
    • Enhance EFD to consider multiple time windows
    • VTD: Changed the value of the parameter num_updates_until_var_reduction that tracks all variables from False to 0.
    • PAD: Used the binom_test of the scipy package to test whether the model should be reinitialized if fewer anomalies occur than expected
    • Add ParsedLogAtom to aminer parser to ensure compatibility with lower versions
    • Added script to add build-id to the version-string
    • Support for installations from source in install-script
    • Fixed and standardized the persistence time of various detectors
    • Refactoring
    • Improve performance
    • Improve output handling
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.5.0(Dec 6, 2021)

    Bugfixes:

    • Fixed bug in YamlConfig

    Changes:

    • Added supervisord to docker
    • Moved unparsed atom handlers to analysis(yamlconfig)
    • Moved new_match_path_detector to analysis(yamlconfig)
    • Refactor: merged all UnparsedHandlers into one python-file
    • Added remotecontrol-command for reopening eventhandlers
    • Added config-parameters for logrotation
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.4.2(Nov 24, 2021)

    Bugfixes:

    • PVTID: Fixed output format of previously appeared times
    • VTD: Fixed bugs (static -> discrete)
    • VTD: Fixed persistency-bugs
    • Fixed %z performance issues
    • Fixed error where optional keys with an array type are not parsed when being null
    • Fixed issues with JsonModelElement
    • Fixed persistence handling for ValueRangeDetector
    • PTSAD: Fixed a bug, which occurs, when the ETD stops saving the values of one analyzed path
    • ETD: Fixed the problem when entries of the match_dictionary are not of type MatchElement
    • Fixed error where json data instead of array was parsed successfully.

    Changes:

    • Added multiple parameters to VariableCorrelationDetector
    • Improved VTD
    • PVTID: Renamed parameter time_window_length to time_period_length
    • PVTID: Added check if atom time is None
    • Enhanced output of MTTD and PVTID
    • Improved docker-compose-configuration
    • Improved testing
    • Enhanced PathArimaDetector
    • Improved documentation
    • Improved KernelMsgParsingModel
    • Added pretty print for json output
    • Added the PathArimaDetector
    • TSA: Added functionality to discard arima models with too few log lines per time step
    • TSA: improved confidence calculation
    • TSA: Added the option to force the period length
    • TSA: Automatic selection of the pause area of the ACF
    • Extended EximGenericParsingModel
    • Extended AudispdParsingModel
    Source code(tar.gz)
    Source code(zip)
  • V2.4.1(Jul 23, 2021)

    Bugfixes:

    • Fixed issues with array of arrays in JsonParser
    • Fixed problems with invalid json-output
    • Fixed ValueError in DTME
    • Fixed error with parsing floats in scientific notation with the JsonModelElement.
    • Fixed issue with paths in JsonModelElement
    • Fixed error with \x encoded json
    • Fixed error where EMPTY_ARRAY and EMPTY_OBJECT could not be parsed from the yaml config
    • Fixed a bug in the TSA when encountering a new event type
    • Fixed systemd script
    • Fixed encoding errors when reading yaml configs

    Changes:

    • Add entropy detector
    • Add charset detector
    • Add value range detector
    • Improved ApacheAccessModel, AudispdParsingModel
    • Refactoring
    • Improved documentation
    • Improved testing
    • Improved schema for yaml-config
    • Added EMPTY_STRING option to the JsonModelElement
    • Implemented check to report unparsed atom if ALLOW_ALL is used with data with a type other than list or dict
    Source code(tar.gz)
    Source code(zip)
  • V2.4.0(Jun 10, 2021)

    Bugfixes:

    • Fixed error in JsonModelElement
    • Fixed problems with umlauts in JsonParser
    • Fixed problems with the start element of the ElementValueBranchModelElement
    • Fixed issues with the stat and debug command line parameters
    • Fixed issues if posix acl are not supported by the filesystem
    • Fixed issues with output for non ascii characters
    • Modified kafka-version

    Changes:

    • Improved command-line options of the install-script
    • Added documentation
    • Improved VTD CM-Test
    • Improved unit-tests
    • Refactoring
    • Added TSAArimaDetector
    • Improved ParserCount
    • Added the PathValueTimeIntervalDetector
    • Implemented offline mode
    • Added PCA detector
    • Added timeout-parameter to ESD
    Source code(tar.gz)
    Source code(zip)
  • V2.3.1(Apr 8, 2021)

  • V2.3.0(Mar 31, 2021)

    Bugfixes:

    • Changed pyyaml-version to 5.4
    • NewMatchIdValueComboDetector: Fix to allow multiple values per id path
    • ByteStreamLineAtomizer: fixed encoding error
    • Fixed too many open directory-handles
    • Added close() function to LogStream

    Changes:

    • Added EventFrequencyDetector
    • Added EventSequenceDetector
    • Added JsonModelElement
    • Added tests for Json-Handling
    • Added command line parameter for update checks
    • Improved testing
    • Split yaml-schemas into multiple files
    • Improved support for yaml-config
    • YamlConfig: set verbose default to true
    • Various refactoring
    Source code(tar.gz)
    Source code(zip)
  • V2.2.3(Feb 5, 2021)

  • V2.2.2(Jan 29, 2021)

  • V2.2.1(Jan 26, 2021)

    Bugfixes:

    • Fixed warnings due to files in Persistency-Directory
    • Fixed ACL-problems in dockerfile and autocreate /var/lib/aminer/log

    Changes:

    • Added simple test for dockercontainer
    • Negate result of the timeout-command. 1 is okay. 0 must be an error
    • Added bullseye-tests
    • Make tmp-dir in debian-bullseye-test and debian-buster-test unique
    Source code(tar.gz)
    Source code(zip)
  • V2.2.0(Dec 23, 2020)

    Changes:

    • Added Dockerfile
    • Added checks for acl of persistency directory
    • Added VariableCorrelationDetector
    • Added tool for managing multiple persistency files
    • Added suppress-list for output
    • Added suspend-mode to remote-control
    • Added requirements.txt
    • Extended documentation
    • Extended yaml-configuration-support
    • Standardize command line parameters
    • Removed --Forground cli parameter
    • Fixed Security warnings by removing functions that allow race-condition
    • Refactoring
    • Ethically correct naming of variables
    • Enhanced testing
    • Added statistic outputs
    • Enhanced status info output
    • Changed global learn_mode behavior
    • Added RemoteControlSocket to yaml-config
    • Reimplemented the default mailnotificationhandler

    Bugfixes:

    • Fixed typos in documentation
    • Fixed issue with the AtomFilter in the yaml-config
    • Fixed order of ETD in yaml-config
    • Fixed various issues in persistency
    Source code(tar.gz)
    Source code(zip)
  • V2.1.0(Nov 5, 2020)

    • Changes:
      • Added VariableTypeDetector, EventTypeDetector and EventCorrelationDetector
      • Added support for unclean format strings in the DateTimeModelElement
      • Added timezones to the DateTimeModelElement
      • Enhanced ApacheAccessModel
      • Yamlconfig: added support for kafka stream
      • Removed cpu limit configuration
      • Various refactoring
      • Yamlconfig: added support for more detectors
      • Added new command-line-parameters
      • Renamed executables to aminer.py and aminerremotecontrol.py
      • Run aminer in foreground-mode per default
      • Added various unit-tests
      • Improved yamlconfig and checks
      • Added start-config for parser to yamlconfig
      • Renamed config templates
      • Removed imports from __init__.py for better modularity
      • Created AnalysisComponentsPerformanceTests for the EventTypeDetector
      • Extended demo-config
      • Renamed whitelist to allowlist
      • Added warnings for non-existent resources
      • Changed default of auto_include_flag to false
    • Bugfixes:
      • Fixed some exit() in forks
      • Fixed debian files
      • Fixed JSON output of the AffectedLogAtomValues in all detectors
      • Fixed normal output of the NewMatchPathValueDetector
      • Fixed reoccurring alerting in MissingMatchPathValueDetector
    Source code(tar.gz)
    Source code(zip)
  • V2.0.2(Jul 17, 2020)

    • Changes:
      • Added help parameters
      • Added help-screen
      • Added version parameter
      • Added path and value filter
      • Change time model of ApacheAccessModel for arbitrary time zones
      • Update link to documentation
      • Added SECURITY.md
      • Refactoring
      • Updated man-page
      • Added unit-tests for loadYamlconfig
    • Bugfixes:
      • Fixed header comment type in schema file
      • Fix debian files
    Source code(tar.gz)
    Source code(zip)
  • V2.0.1(Jun 24, 2020)

    • Changes:
      • Updated documentation
      • Updated testcases
      • Updated demos
      • Updated debian files
      • Added copyright headers
      • Added executable bit to AMiner
    Source code(tar.gz)
    Source code(zip)
  • V2.0.0(May 29, 2020)

    • Changes:
      • Updated documentation
      • Added functions getNameByComponent and getIdByComponent to AnalysisChild.py
      • Update DefaultMailNotificationEventHandler.py to python3
      • Extended AMinerRemoteControl
      • Added support for configuration in yaml format
      • Refactoring
      • Added KafkaEventHandler
      • Added JsonConverterHandler
      • Added NewMatchIdValueComboDetector
      • Enabled multiple default timestamp paths
      • Added debug feature ParserCount
      • Added unit and integration tests
      • Added installer script
      • Added VerboseUnparsedHandler
    • Bugfixes including:
      • Fixed dependencies in Debian packaging
      • Fixed typo in various analysis components
      • Fixed import of ModelElementInterface in various parsing components
      • Fixed issues with byte/string comparison
      • Fixed issue in DecimalIntegerValueModelElement, when parsing integer including sign and padding character
      • Fixed unnecessary long blocking time in SimpleMultisourceAtomSync
      • Changed minimum matchLen in DelimitedDataModelElement to 1 byte
      • Fixed timezone offset in ModuloTimeMatchRule
      • Minor bugfixes
    Source code(tar.gz)
    Source code(zip)
Owner
AECID
Automatic Event Correlation for Incident Detection