This tool parses log data and lets you define analysis pipelines for anomaly detection.

Overview

logdata-anomaly-miner

This tool parses log data and lets you define analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, making it suitable for production server use.

AECID Demo – Anomaly Detection with aminer and Reporting to IBM QRadar

Requirements

In order to install logdata-anomaly-miner, a Linux system with Python >= 3.6 is required. Debian-based distributions are currently recommended.

See requirements.txt for further module dependencies.

Installation

Debian

There are Debian packages for logdata-anomaly-miner in the official Debian/Ubuntu repositories.

apt-get update && apt-get install logdata-anomaly-miner

From source

The following commands install the latest stable release:

cd $HOME
wget https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
chmod +x aminer_install.sh
./aminer_install.sh

Docker

For installation with Docker see: Deployment with Docker

Getting started

Here are some resources to read in order to get started with configurations:

Publications

Publications and talks:

A complete list of publications can be found at https://aecid.ait.ac.at/further-information/.

Contribution

We're happily taking patches and other contributions. Please see the following links for how to get started:

Bugs

If you encounter any bugs, please create an issue on Github.

Security

If you discover any security-related issues, read SECURITY.md first and then report them.

License

GPL-3.0

Comments
  • Multiline support

    Since issue 372 was closed, I am opening a new issue for multiline support. See https://github.com/ait-aecid/logdata-anomaly-miner/issues/372

    As I mentioned in that issue, it would be good to have an optional EOL parameter in the config to support simple multiline logs that are clearly separable, e.g., by a \n\n sequence that does not otherwise occur. We could also consider supporting more advanced multiline logs, in particular JSON-formatted logs where each JSON object spans several lines rather than a single line. This could be solved by counting brackets: the ByteStreamAtomizer increases a counter (initially set to 0) for every "{" and decreases it for every "}" (or any other user-defined characters), and passes a log_atom to the parser every time the counter returns to 0.
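The brace-counting idea can be sketched as follows. This is a minimal illustration, not aminer's actual ByteStreamAtomizer interface, and for simplicity it ignores braces that appear inside JSON strings:

```python
def split_json_atoms(stream: bytes):
    """Split a byte stream into JSON atoms by counting braces.

    A counter starts at 0, goes up on '{' and down on '}'; every time it
    returns to 0, the bytes since the matching opening brace form one atom.
    Braces inside string values are not handled in this sketch.
    """
    atoms, depth, start = [], 0, None
    for i, byte in enumerate(stream):
        char = chr(byte)
        if char == "{":
            if depth == 0:
                start = i
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0 and start is not None:
                atoms.append(stream[start:i + 1])
                start = None
    return atoms

# Two JSON objects, the second spanning multiple lines:
print(split_json_atoms(b'{"a": 1}\n{"b":\n {"c": 2}}\n'))
```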

    enhancement 
    opened by landauermax 15
  • Allowlist and blocklist for detector path lists

    allowlisted_paths in the ECD should be named blocklisted_paths, since these paths are not considered for detection.

    allowlisted_paths should also exist, but do the opposite: analysis should only be carried out when one of the paths in the log atom match dictionary contains one of the allowlisted_paths.

    The attribute paths should overrule these lists.

    This feature should be available for all detectors that may be analyzing all available parser matches, such as the VTD.

    enhancement 
    opened by landauermax 15
  • Fix import warnings

    /usr/lib/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from spec or package, falling back on name and path

    return f(*args, **kwds)

    should not occur when running the aminer.
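As a stopgap until the module layout itself is fixed, the spurious warning can be filtered out at interpreter level, e.g.:

```python
import warnings

# Workaround only: suppress the spurious ImportWarning emitted by
# importlib on Python 3.6. The proper fix is in the package layout.
warnings.filterwarnings(
    "ignore",
    message="can't resolve package from spec or package",
    category=ImportWarning,
)
```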

    bug 
    opened by 4cti0nfi9ure 15
  • %z makes parsing way too slow

    When using %z in the parsing model (see slow.txt), I get around 50 lines per second; without it, I get around 1000 lines per second (see fast.txt). Something is wrong with the handling of %z in the DateTimeModelElement.

    fast.txt slow.txt train.log config.py.txt
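Independent of the root cause in the DateTimeModelElement, a common mitigation for slow %z handling is to split off the numeric offset and cache the resulting tzinfo object, so strptime never sees %z at all. A sketch under that assumption, not aminer's actual implementation:

```python
from datetime import datetime, timedelta, timezone

_tz_cache = {}

def parse_with_cached_tz(line: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Parse 'YYYY-mm-dd HH:MM:SS +HHMM' without using %z on every call.

    The numeric offset (e.g. '+0100') is converted to a tzinfo object once
    and reused, so strptime only handles the cheap date/time part.
    """
    body, offset = line.rsplit(" ", 1)
    tz = _tz_cache.get(offset)
    if tz is None:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]),
                                       minutes=int(offset[3:5])))
        _tz_cache[offset] = tz
    return datetime.strptime(body, fmt).replace(tzinfo=tz)

print(parse_with_cached_tz("2021-03-04 10:11:12 +0100").isoformat())
# -> 2021-03-04T10:11:12+01:00
```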

    bug high 
    opened by landauermax 12
  • added nullable functionality to JsonModelElements.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #" notation
    • [x] This Pull Request merges into the "development" branch

    Fixes #1061 Fixes #1074

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 11
  • Create backups of persistency

    There should be a command line parameter that backs up the persistency at regular intervals. Also, there should be a remote control command that saves the persistency when executed.

    The persistency should be copied into a directory /var/lib/aminer/backup/yyyy-mm-dd-hh-mm-ss/...

    There should also be the possibility to restore configs, config settings, etc. by remote control.
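A minimal sketch of such a backup step; the paths and the function name are illustrative, not an existing aminer interface:

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_persistency(src="/var/lib/aminer", dst_root="/var/lib/aminer/backup"):
    """Copy the persistency directory into a timestamped backup directory."""
    stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    target = Path(dst_root) / stamp
    # Skip the backup directory itself so old backups are not copied recursively.
    shutil.copytree(src, target, ignore=shutil.ignore_patterns("backup"))
    return target
```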

    enhancement 
    opened by landauermax 11
  • Tabs in logs

    My log file contains tab characters (e.g. System name:\tTESTNAME). However, the byte strings in the parsing models do not seem to interpret these tabs (\t): FixedDataModelElement('fixed1', b'System name:\t'),

    How can I make the tabs be interpreted correctly?
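For reference, a Python byte literal already resolves the escape, so in a Python config the pattern contains a real tab byte; in a YAML config, \t is only interpreted inside double-quoted strings, not single-quoted ones. A quick check:

```python
# In a Python config the escape is resolved by Python itself:
pattern = b'System name:\t'
assert b'\x09' in pattern          # the byte string contains a real tab (0x09)

line = b'System name:\tTESTNAME'
assert line.startswith(pattern)    # a fixed-prefix match therefore succeeds

# In YAML, by contrast, '\t' (single quotes) stays a backslash plus 't';
# only "\t" (double quotes) yields a tab character.
print("tab matched")
```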

    opened by tschohanna 10
  • Add overall output for aminer

    There should be a way to write everything that the aminer outputs to a file. For example, at the beginning of the config, a parameter StandardOutput: "/etc/aminer/output.txt" could be set, so that all output (anomalies, errors, etc.) is written there in addition to the usual output components. By default, it should be None and write nothing.
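A hypothetical implementation of such a StandardOutput parameter could tee all output into a file. Both StandardOutput and TeeWriter below are illustrative names, not existing aminer options:

```python
import sys

class TeeWriter:
    """Duplicate everything written to stdout into an additional file."""

    def __init__(self, path):
        self.log_file = open(path, "a")
        self.stdout = sys.stdout

    def write(self, data):
        self.stdout.write(data)
        self.log_file.write(data)

    def flush(self):
        self.stdout.flush()
        self.log_file.flush()

# With StandardOutput set, the process could install the tee once at startup:
# sys.stdout = TeeWriter("/etc/aminer/output.txt")
```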

    enhancement 
    opened by landauermax 10
  • Warning if two detectors persist on same file

    It is possible to define two detectors of the same type that end up persisting to the same file; this can happen by accident, especially when the "Default" name is used. We should not prevent it completely, but at least print a warning when two or more detectors persist to the same file.
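One way to implement the warning is a small registry consulted when detectors are initialized. This is a sketch with illustrative names, not aminer's actual code:

```python
import warnings

_persistence_owners = {}

def register_persistence_file(detector_name, persistence_file):
    """Warn when a second detector registers the same persistence file."""
    owner = _persistence_owners.get(persistence_file)
    if owner is not None and owner != detector_name:
        warnings.warn(
            f"{detector_name} and {owner} both persist to {persistence_file}")
    _persistence_owners[persistence_file] = detector_name
```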

    enhancement 
    opened by landauermax 9
  • AtomFilterMatchAction YAML support

    There should be a way to use a MatchRule so that only logs that match are forwarded to a specific detector, using the AtomFilterMatchAction. This can be done in Python configs, but not in YAML configs. Also, tests and documentation are missing.

    enhancement high 
    opened by landauermax 8
  • Paths to JSON list elements

    I have this sample data:

    [email protected]:/home/ubuntu# cat file3.log 
    {"a": ["success", "a.png"]}
    {"a": ["success", "b.png"]}
    {"a": ["fail", "c.png"]}
    {"a": ["success", "c.png"]}
    

    The values in the list should be detected with a value detector. They should not be mixed, i.e., the first and second element in the list are independent.

    I use the following config to parse the file:

    LearnMode: True
    
    LogResourceList:
      - "file:///home/ubuntu/file3.log"
    
    Parser:  
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - x
    
    Input:
            timestamp_paths: None
            verbose: True
            json_format: True
    
    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
    EventHandlers:
            - id: stpe
              json: true
              type: StreamPrinterEventHandler
    

    Note that I use a value detector on the list. The result is as follows:

    [email protected]:/home/ubuntu# cat /var/lib/aminer/NewMatchPathValueDetector/test 
    ["bytes:a.png", "bytes:c.png", "bytes:b.png"]
    

    Only the last value has been learned, but I also want to learn the first element in the array.

    I propose to model all elements of the lists as their own elements, so that the parser looks like this:

    Parser:
           - id: y
             type: FixedWordlistDataModelElement
             name: 'y'
             args:
               - 'success'
               - 'fail'
                 
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - y
                 - x
    

    and the analysis could look like this, where each element can be addressed individually by an analysis component:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/y'
              learn_mode: true
              persistence_id: test
    

    The current implementation uses a single element to model all elements of the list. This can also be convenient and should be possible by introducing a new element called ListOfElements. It should parse any number of elements in the list with the specified parsing model element. For example, the list of elements here is a list of variable byte elements:

    Parser:
           - id: loe
             type: ListOfElements
             name: 'loe'
             args: z
                 
           - id: z
             type: VariableByteDataModelElement
             name: 'z'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - loe
    

    The ListOfElements element should then assign the index of the element in the JSON list at the end of the path. For example, the following paths can be used in the analysis section:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/0'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/1'
              learn_mode: true
              persistence_id: test
    
    enhancement medium 
    opened by landauermax 8
  • extended FrequencyDetector wiki tests.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #" notation
    • [x] This Pull Request merges into the "development" branch

    Fixes #1008 Fixes #1009

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • fixed test26 so no fix definition number has to be added.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #" notation
    • [x] This Pull Request merges into the "development" branch

    Fixes #1181

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • Random test fails when new detector is added

    When adding a new detector and running the tests, they usually fail at test26_filter_config_errors in YamlConfigTest.py, as there is an integer that needs to be incremented. For example, see PR #1180 where this had to be fixed when adding a new detector. It is hard to spot why this test fails, as it has nothing to do with the added detector and is not an indicator of something that needs to be fixed. I therefore suggest modifying this test case so that it passes no matter what integer comes after the "definition" keyword. Then adding new detectors in the future should not require updating this test.
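The test could, for example, normalize the expected and actual messages before comparing them, so any integer after "definition" passes. An illustrative sketch:

```python
import re

def normalize_definition_ids(message: str) -> str:
    """Replace 'definition <int>' with a placeholder so the comparison no
    longer depends on how many detector definitions are registered."""
    return re.sub(r"definition \d+", "definition N", message)

print(normalize_definition_ids("error in definition 27"))
# -> error in definition N
```

The test would then assert `normalize_definition_ids(actual) == normalize_definition_ids(expected)` instead of comparing raw strings.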

    test medium 
    opened by landauermax 0
  • Add possibility to run some LogResources as json input and some as normal text input.

    LogResourceList:
    
       - url: "file:///var/log/apache2/access.log"
       - url: "unix:///var/lib/akafka/aminer.sock"
      type: json  # configures the ByteStream
      parser_id: kafka_audit_logs  # configures the associated parser
    
    
    Parser:
       - id: kafka_audit_logs
         type: AuditDingsParser
    
       - id: ApacheAccessModel
         start: true
    
    opened by ernstleierzopf 0
  • Shorten the build-time for docker builds

    Currently the complete docker image is built at once. This takes a lot of time for each build. We could shorten the build time by inheriting from a pre-built image.

    enhancement 
    opened by whotwagner 0
Releases(V2.5.1)
  • V2.5.1(May 17, 2022)

    Bugfixes:

    • EFD: Fixed problem that appears with empty windows
    • Fixed index out of range if matches are empty in JsonModelElement array.
    • EFD: Enabled immediate detection without training, if both limits are set
    • EFD: Fixed bug related to auto_include_flag
    • Remove spaces in aminer logo
    • ParserCounter: Fixed do_timer
    • Fixed code to allow the usage of AtomFilterMatchAction in yaml configs
    • Fixed JsonModelElement when json object is null
    • Fix incorrect message of charset detector
    • Fix match list handling for json objects

    Changes:

    • Added nullable functionality to JsonModelElements
    • Added include-directive to supervisord.conf
    • ETD: Output warning when count first exceeds range
    • EFD: Added option to output anomaly when the count first exceeds the range
    • VTD: Added variable type 'range'
    • EFD: Added the function reset_counter
    • EFD: Added option to set the lower and upper limit of the range interval
    • Enhance EFD to consider multiple time windows
    • VTD: Changed the value of the parameter num_updates_until_var_reduction that tracks all variables from False to 0
    • PAD: Used the binom_test of the scipy package to test whether the model should be reinitialized when fewer anomalies occur than expected
    • Add ParsedLogAtom to aminer parser to ensure compatibility with lower versions
    • Added script to add build-id to the version-string
    • Support for installations from source in install-script
    • Fixed and standardized the persistence time of various detectors
    • Refactoring
    • Improve performance
    • Improve output handling
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.5.0(Dec 6, 2021)

    Bugfixes:

    • Fixed bug in YamlConfig

    Changes:

    • Added supervisord to docker
    • Moved unparsed atom handlers to analysis(yamlconfig)
    • Moved new_match_path_detector to analysis(yamlconfig)
    • Refactor: merged all UnparsedHandlers into one python-file
    • Added remotecontrol-command for reopening eventhandlers
    • Added config-parameters for logrotation
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.4.2(Nov 24, 2021)

    Bugfixes:

    • PVTID: Fixed output format of previously appeared times
    • VTD: Fixed bugs (static -> discrete)
    • VTD: Fixed persistency-bugs
    • Fixed %z performance issues
    • Fixed error where optional keys with an array type are not parsed when being null
    • Fixed issues with JsonModelElement
    • Fixed persistence handling for ValueRangeDetector
    • PTSAD: Fixed a bug, which occurs, when the ETD stops saving the values of one analyzed path
    • ETD: Fixed the problem when entries of the match_dictionary are not of type MatchElement
    • Fixed error where json data instead of array was parsed successfully.

    Changes:

    • Added multiple parameters to VariableCorrelationDetector
    • Improved VTD
    • PVTID: Renamed parameter time_window_length to time_period_length
    • PVTID: Added check if atom time is None
    • Enhanced output of MTTD and PVTID
    • Improved docker-compose-configuration
    • Improved testing
    • Enhanced PathArimaDetector
    • Improved documentation
    • Improved KernelMsgParsingModel
    • Added pretty print for json output
    • Added the PathArimaDetector
    • TSA: Added functionality to discard arima models with too few log lines per time step
    • TSA: improved confidence calculation
    • TSA: Added the option to force the period length
    • TSA: Automatic selection of the pause area of the ACF
    • Extended EximGenericParsingModel
    • Extended AudispdParsingModel
    Source code(tar.gz)
    Source code(zip)
  • V2.4.1(Jul 23, 2021)

    Bugfixes:

    • Fixed issues with array of arrays in JsonParser
    • Fixed problems with invalid json-output
    • Fixed ValueError in DTME
    • Fixed error with parsing floats in scientific notation with the JsonModelElement.
    • Fixed issue with paths in JsonModelElement
    • Fixed error with \x encoded json
    • Fixed error where EMPTY_ARRAY and EMPTY_OBJECT could not be parsed from the yaml config
    • Fixed a bug in the TSA when encountering a new event type
    • Fixed systemd script
    • Fixed encoding errors when reading yaml configs

    Changes:

    • Add entropy detector
    • Add charset detector
    • Add value range detector
    • Improved ApacheAccessModel, AudispdParsingModel
    • Refactoring
    • Improved documentation
    • Improved testing
    • Improved schema for yaml-config
    • Added EMPTY_STRING option to the JsonModelElement
    • Implemented check to report unparsed atom if ALLOW_ALL is used with data with a type other than list or dict
    Source code(tar.gz)
    Source code(zip)
  • V2.4.0(Jun 10, 2021)

    Bugfixes:

    • Fixed error in JsonModelElement
    • Fixed problems with umlauts in JsonParser
    • Fixed problems with the start element of the ElementValueBranchModelElement
    • Fixed issues with the stat and debug command line parameters
    • Fixed issues if posix acl are not supported by the filesystem
    • Fixed issues with output for non ascii characters
    • Modified kafka-version

    Changes:

    • Improved command-line-options install-script
    • Added documentation
    • Improved VTD CM-Test
    • Improved unit-tests
    • Refactoring
    • Added TSAArimaDetector
    • Improved ParserCount
    • Added the PathValueTimeIntervalDetector
    • Implemented offline mode
    • Added PCA detector
    • Added timeout-parameter to ESD
    Source code(tar.gz)
    Source code(zip)
  • V2.3.1(Apr 8, 2021)

  • V2.3.0(Mar 31, 2021)

    Bugfixes:

    • Changed pyyaml-version to 5.4
    • NewMatchIdValueComboDetector: Fix allow multiple values per id path
    • ByteStreamLineAtomizer: fixed encoding error
    • Fixed too many open directory-handles
    • Added close() function to LogStream

    Changes:

    • Added EventFrequencyDetector
    • Added EventSequenceDetector
    • Added JsonModelElement
    • Added tests for Json-Handling
    • Added command line parameter for update checks
    • Improved testing
    • Split yaml-schemas into multiple files
    • Improved support for yaml-config
    • YamlConfig: set verbose default to true
    • Various refactoring
    Source code(tar.gz)
    Source code(zip)
  • V2.2.3(Feb 5, 2021)

  • V2.2.2(Jan 29, 2021)

  • V2.2.1(Jan 26, 2021)

    Bugfixes:

    • Fixed warnings due to files in Persistency-Directory
    • Fixed ACL-problems in dockerfile and autocreate /var/lib/aminer/log

    Changes:

    • Added simple test for dockercontainer
    • Negate result of the timeout-command. 1 is okay. 0 must be an error
    • Added bullseye-tests
    • Make tmp-dir in debian-bullseye-test and debian-buster-test unique
    Source code(tar.gz)
    Source code(zip)
  • V2.2.0(Dec 23, 2020)

    Changes:

    • Added Dockerfile
    • Added checks for acl of persistency directory
    • Added VariableCorrelationDetector
    • Added tool for managing multiple persistency files
    • Added suppress-list for output
    • Added suspend-mode to remote-control
    • Added requirements.txt
    • Extended documentation
    • Extended yaml-configuration-support
    • Standardize command line parameters
    • Removed --Foreground cli parameter
    • Fixed Security warnings by removing functions that allow race-condition
    • Refactoring
    • Ethically correct naming of variables
    • Enhanced testing
    • Added statistic outputs
    • Enhanced status info output
    • Changed global learn_mode behavior
    • Added RemoteControlSocket to yaml-config
    • Reimplemented the default mailnotificationhandler

    Bugfixes:

    • Fixed typos in documentation
    • Fixed issue with the AtomFilter in the yaml-config
    • Fixed order of ETD in yaml-config
    • Fixed various issues in persistency
    Source code(tar.gz)
    Source code(zip)
  • V2.1.0(Nov 5, 2020)

    • Changes:
      • Added VariableTypeDetector, EventTypeDetector and EventCorrelationDetector
      • Added support for unclean format strings in the DateTimeModelElement
      • Added timezones to the DateTimeModelElement
      • Enhanced ApacheAccessModel
      • Yamlconfig: added support for kafka stream
      • Removed cpu limit configuration
      • Various refactoring
      • Yamlconfig: added support for more detectors
      • Added new command-line-parameters
      • Renamed executables to aminer.py and aminerremotecontrol.py
      • Run aminer in foreground mode per default
      • Added various unit-tests
      • Improved yamlconfig and checks
      • Added start-config for parser to yamlconfig
      • Renamed config templates
      • Removed imports from __init__.py for better modularity
      • Created AnalysisComponentsPerformanceTests for the EventTypeDetector
      • Extended demo-config
      • Renamed whitelist to allowlist
      • Added warnings for non-existent resources
      • Changed default of auto_include_flag to false
    • Bugfixes:
      • Fixed some exit() in forks
      • Fixed debian files
      • Fixed JSON output of the AffectedLogAtomValues in all detectors
      • Fixed normal output of the NewMatchPathValueDetector
      • Fixed recurring alerting in MissingMatchPathValueDetector
    Source code(tar.gz)
    Source code(zip)
  • V2.0.2(Jul 17, 2020)

    • Changes:
      • Added help parameters
      • Added help-screen
      • Added version parameter
      • Added path and value filter
      • Change time model of ApacheAccessModel for arbitrary time zones
      • Update link to documentation
      • Added SECURITY.md
      • Refactoring
      • Updated man-page
      • Added unit-tests for loadYamlconfig
    • Bugfixes:
      • Fixed header comment type in schema file
      • Fix debian files
    Source code(tar.gz)
    Source code(zip)
  • V2.0.1(Jun 24, 2020)

    • Changes:
      • Updated documentation
      • Updated testcases
      • Updated demos
      • Updated debian files
      • Added copyright headers
      • Added executable bit to AMiner
    Source code(tar.gz)
    Source code(zip)
  • V2.0.0(May 29, 2020)

    • Changes:
      • Updated documentation
      • Added functions getNameByComponent and getIdByComponent to AnalysisChild.py
      • Update DefaultMailNotificationEventHandler.py to python3
      • Extended AMinerRemoteControl
      • Added support for configuration in yaml format
      • Refactoring
      • Added KafkaEventHandler
      • Added JsonConverterHandler
      • Added NewMatchIdValueComboDetector
      • Enabled multiple default timestamp paths
      • Added debug feature ParserCount
      • Added unit and integration tests
      • Added installer script
      • Added VerboseUnparsedHandler
    • Bugfixes including:
      • Fixed dependencies in Debian packaging
      • Fixed typo in various analysis components
      • Fixed import of ModelElementInterface in various parsing components
      • Fixed issues with byte/string comparison
      • Fixed issue in DecimalIntegerValueModelElement, when parsing integer including sign and padding character
      • Fixed unnecessary long blocking time in SimpleMultisourceAtomSync
      • Changed minimum matchLen in DelimitedDataModelElement to 1 byte
      • Fixed timezone offset in ModuloTimeMatchRule
      • Minor bugfixes
    Source code(tar.gz)
    Source code(zip)
Owner
AECID
Automatic Event Correlation for Incident Detection