Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

Overview

Baselining, on steroids!

Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

The project aims to offer an open-source alternative to the famous NSRL or HashSets and allows you to generate baselines from your own systems. Plus, it is cross-platform, so you can use it the same way whether on a Windows or a GNU/Linux system.

Currently available extractors:

  • fs: Extracts filesystem-related metadata.
  • hash: Computes several hashes from the entry's data (e.g. MD5, SHA-1, ssdeep).
  • pe: Extracts detailed information from Portable Executable (PE) files.

Table of contents

Installation

Installing from PyPI

Baseline is currently not available on PyPI. The main reason is that the name is currently taken by another project.

Installing from source

Since Baseline uses Poetry as its packaging toolkit of choice, so installing it from source is as simple as:

git clone httpe://github.com/sk4la/baseline.git
cd baseline

python3 -m pip install poetry
python3 -m poetry install

Precompiled binaries

Precompiled binaries are available in the Releases section.

Docker

Baseline is also available as a Docker image.

To pull the latest image from Docker Hub:

docker pull sk4la/baseline

See the official Docker documentation for details on how to install and use it.

Usage

The help menu for the baseline command-line utility:

Usage: baseline 
   
     COMMAND [ARGS]...

  Command-line utility that creates file-oriented baselines.

Options:
  --ensure-administrator          Ensure that the current user has administrative privileges (i.e.
                                  is `root` or equivalent on GNU/Linux systems).
  --log-file 
    
                    Set the log file path (e.g. 'baseline.log').  [default:
                                  /home/sk4la/baseline/20211031110323.25756fb1b706.log]
  --logging-configuration 
     
        Set the logging configuration using a custom file. The file must
                                  adhere to the official specification. See
                                  https://docs.python.org/3/library/logging.config.html#logging-
                                  config-dictschema for more details.
  --monochrome                    Disable console output coloring. This can be useful when piping
                                  the output to a log file.
  -v, --verbose                   Increase the logging verbosity. Supports up to 4 occurrences of
                                  the same option (e.g. -vvvv).  [0<=x<=4]
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  new     Creates a new filesystem-based baseline.
  schema  Show the JSON representation of the actual schema.

     
    
   

The help menu for the baseline new subcommand:

Usage: baseline new 
    
    
     ...

  Creates a new filesystem-based baseline.

Options:
  --comment 
     
                       Add an arbitrary comment to the generated output file.
  --exclude-directory 
      
       ...
                                  Exclude a specific directory from the baseline. Can be specified
                                  multiple times (e.g. `--exclude-directory /dev --exclude-
                                  directory /proc`).
  --exclude-extractor [hash|pe|fs]
                                  Exclude extractors. Can be specified multiple times (e.g.
                                  `--exclude-extractor hash --exclude-extractor pe`).
  --max-size 
       
         Set the maximum file size (in bytes) to inspect. [default: 5000000; x>=1] -o, --output-file 
        
          Set the output file path (e.g. 'baseline.ndjson'). [default: /workspaces/baseline/20211102200452.58bd60a3b16a.ndjson] --output-file-encoding [utf-8|utf-16le] Set the output file encoding. Only applies when writing to an actual file. [default: utf-8] -f, --output-format [ndjson] Set the output format. [default: ndjson] --partition-size 
         
           Set the partition size (i.e. number of entries per process). [default: 200; x>=1] --processes 
          
            Set the number of parallel processes. [default: 2; x>=1] --recursive / --non-recursive Whether to walk the filesystem recursively. When set, the program will only inspect the files and directories specifies on the first level of any included path. For example, if '/mnt/image' is specified as an included path, then only the directory '/mnt/image' itself and its direct children will be inspected. [default: recursive] --remap 
           
            ... Artificially remap included paths (e.g. '/mnt/image:/'). Can be specified multiple times (e.g. `--remap /mnt/image:/ --remap /dev/null:/dev/void`). --report / --no-report Whether to show a final report at the end. [default: report] --skip-compression Whether to skip on-the-fly compression of the resulting file. --skip-directories Whether to skip directories. --skip-empty Whether to skip empty entries. --help Show this message and exit. 
           
          
         
        
       
      
     
    
   

The help menu for the baseline schema subcommand:

Usage: baseline schema 
   
    

  Show the JSON representation of the actual schema.

Options:
  --compact                       Render compact JSON instead of the default idented version.
  --output-file 
    
                 Set the output file path (e.g. 'schema.json').
  --output-file-encoding [utf-8|utf-16le]
                                  Set the output file encoding. Only applies when writing to an
                                  actual file.  [default: utf-8]
  --help                          Show this message and exit.

    
   

Creating a baseline of a live system

Creating a baseline of a live system is as simple as:

baseline new

When using Baseline from a removable device, you may want to exclude its path (for example /mnt/usb) from the generated baseline:

baseline --ensure-administrator new --exclude-directory /mnt/usb

See the Usage section for a complete list of options and arguments.

Creating a baseline from a mounted image

When creating a baseline of a mounted image, you may want the baseline to represent the files as if they were read from the actual system, not the mounted image.

For example, if your image is currently mounted on /mnt/IMG-001, you can then execute the following command to remap all entries read from this path to /:

baseline new --remap /mnt/IMG-001:/ /mnt/IMG-001

You can think of this as a chroot jail.

Displaying the schema

Baseline uses a fixed schema for rendering the information. This schema is enforced using the Pydantic package and produces a heavily-typed output that can later be ingested as-is.

To print the standardized JSON schema:

baseline schema

To dump a compact version of the JSON schema to schema.min.json:

baseline schema --compact --output-file schema.min.json

The JSON schema produced by Pydantic is compatible with the specifications from JSON Schema Core, JSON Schema Validation and OpenAPI Data Types. See the official Pydantic documentation for more details.

Advanced usage

Building binaries

Baseline currently supports the following packaging systems:

Although precompiled binaries are available in the Releases section, you should always build your own binaries.

To produce a binary using PyInstaller:

make pyinstaller-linux

To produce a binary using Nuitka:

make nuitka-linux

As Nuitka is a Python compiler by itself and does not rely on the standard CPython interpreter, you should be aware that there may be bugs and/or issues unrelated to Baseline itself.

Building the Docker image

To build the official Docker image:

make docker

Additional instructions can be added to the Dockerfile in order to customize the image.

The official Docker image is available at https://hub.docker.com/r/sk4la/baseline. You can use the FROM docker.io/sk4la/baseline:latest instruction in your own Dockerfile to derive your own image.

API

Using Baseline from Python is possible using the Baseline class:

from baseline.core import Baseline

with Baseline() as baseline:
    for record in baseline.compute(*[
        "/mnt/IMG-001",
        "/mnt/IMG-002",
    ]):
        print(record.json(exclude_none=True))

See the actual code for a more thorough example.

The Baseline class emits logging messages to the baseline logger, to which you can subscribe to if you wish. The command-line utility displays these messages to the console by default.

Contribute

Baseline is a work in progress, everyone is welcome to contribute! 👐

Writing a new extractor

In order for new extractors to be able to enrich the generated records, the global schema first needs to be updated. To do this, you must create a sublass of Pydantic's BaseModel in schema.py that references the fields that will eventually be filled by the extractor. This class will then be referenced in the schema's root Record class.

In this example, we want to extract the first 50 lines of any *.txt file. Here we arbitrarily decide that the extracted text will be stored in the content attribute and that the extractor's key will be text:

class Text(pydantic.BaseModel):
    content: str

class Record(pydantic.BaseModel):
    ...
    text: typing.Optional[Text]

We can then start to write the actual code. All extractors must inherit from the base Extractor class:

None: with self.entry.open() as stream: setattr( record, self.KEY, schema.Text( content=stream.read(50), ), ) ">
from baseline.models import Extractor
from baseline.schema import Text

class Text(Extractor):
    """Extracts the first line of any `.txt` file."""

    EXTENSION_FILTERS = (
      r"\.txt$",
    )
    KEY = "text"

    def run(self: object, record: schema.Record) -> None:
        with self.entry.open() as stream:
            setattr(
                record,
                self.KEY,
                schema.Text(
                    content=stream.read(50),
                ),
            )

The extractor's KEY class variable must correspond to the one that was specified in the schema's root Record class (text in this example).

Support

In case you encounter a problem or want to suggest a new feature, please submit a ticket.

License

Baseline is licensed under the GNU General Public License (GPL) version 3.

You might also like...
A Python module and command line utility for working with web archive data using the WACZ format specification

py-wacz The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification

A Python module and command-line utility for converting .ANS format ANSI art to HTML

ansipants A Python module and command-line utility for converting .ANS format ANSI art to HTML. Installation pip install ansipants Command-line usage

A command line utility to export Google Keep notes to markdown.

Keep-Exporter A command line utility to export Google Keep notes to markdown files with metadata stored as a frontmatter header. Supports exporting: S

A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.
A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.

A command line stock market / portfolio tracker originally insipred by Ericm's Stonks program, featuring unicode for incredibly high detailed graphs even in a terminal.

📦 A command line utility to put text in a box.
📦 A command line utility to put text in a box.

boxie A command line utility to put text in a box. Installation pip install boxie If you are on Linux you may need to use sudo to access this globally

Tiny command-line utility for mapping broken keys to other positions.

brokenkey Tiny command-line utility for mapping broken keys to other positions. Installation Clone this repository using git: git clone https://github

This is a CLI utility that allows you to view RedFlagDeals.com on the command line.
This is a CLI utility that allows you to view RedFlagDeals.com on the command line.

RFD Description Motivation Installation Usage View Hot Deals View and Sort Hot Deals Search Advanced View Posts Shell Completion bash zsh Description

img-proof (IPA) provides a command line utility to test images in the Public Cloud

overview img-proof (IPA) provides a command line utility to test images in the Public Cloud (AWS, Azure, GCE, etc.). With img-proof you can now test c

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

Comments
  • executable variable is not defined in extractors/executables

    executable variable is not defined in extractors/executables

    self.executable is not defined in baseline/extractors/executables.py, so in the __del__ method, it crashes.

    $ baseline new /etc      
    Exception ignored in: <function PortableExecutable.__del__ at 0x7fd1142c98b0>
    Traceback (most recent call last):
      File "/home/git/baseline/env/lib/python3.9/site-packages/baseline/extractors/executables.py", line 80, in __del__
        if self.executable:
    AttributeError: 'PortableExecutable' object has no attribute 'executable'
    All done! 💪
    
    Entries Processed : 1592
    Output Format     : ndjson
    Output File       : /home/git/baseline/20211106191044.heaven.ndjson.xz
    Size              : 1.2 MB
    Total Time        : 0:00:01.993753
    
    opened by jurelou 0
Releases(0.1.0)
Owner
Nelson
Random bits of code
Nelson
Kubernetes shell: An integrated shell for working with the Kubernetes

kube-shell Kube-shell: An integrated shell for working with the Kubernetes CLI Under the hood kube-shell still calls kubectl. Kube-shell aims to provi

CloudNative Labs 2.2k Jan 08, 2023
Python package with library and CLI tool for analyzing SeaFlow data

Seaflowpy A Python package for SeaFlow flow cytometer data. Table of Contents Install Read EVT/OPP/VCT Files Command-line Interface Configuration Inte

<a href=[email protected]"> 3 Nov 03, 2021
A powerful Minecraft command library.

Mecha A powerful Minecraft command library. from mecha import Mecha

32 Dec 10, 2022
🎈 A Mini CLI-based Operating System simulator

Mini OS Simulator Summary 🎈 A Mini CLI-based Operating System simulator, well, not really. It simulates a file system with songs and movies and stuff

Jaiyank S. 3 Feb 14, 2022
Command line tool to automate transforming the effects of one color profile to another, possibly more standard one.

Finished rendering the frames of that animation, and now the colors look washed out and ugly? This terminal program will solve exactly that.

Eric Xue 1 Jan 26, 2022
Tiny command-line utility for mapping broken keys to other positions.

brokenkey Tiny command-line utility for mapping broken keys to other positions. Installation Clone this repository using git: git clone https://github

0 Oct 04, 2021
ctree - command line christmas tree

ctree ctree - command line christmas tree It is small python script that prints a christmas tree in terminal. It is colourful and always gives you a d

15 Aug 15, 2022
Python3 library for multimedia functions at the command terminal

TERMINEDIA This is a Python library allowing using a text-terminal as a low-resolution graphics output, along with keyboard realtime reading, and a co

Joao S. O. Bueno 89 Dec 17, 2022
Password manager for the CLI simps.

CLI Password Manager Password manager for the CLI simps. Free software: MIT license

1 Dec 30, 2021
CLI translator based on Google translate API

Translate-CLI CLI переводчик основанный на Google translate API как пользоваться ? запустить в консоли скомпилированный скрипт (exe - windows, bin - l

7 Aug 13, 2022
The Pythone Script will generate a (.)sh file with reverse shell codes then you can execute the script on the target

Pythone Script will generate a (.)sh file with reverse shell codes then you can execute the script on the targetPythone Script will generate a (.)sh file with reverse shell codes then you can execute

Boy From Future 15 Sep 16, 2022
CLI Web-CAT interface for people who use VIM.

CLI Web-CAT CLI Web-CAT interface. Installation git clone https://github.com/phuang1024/cliwebcat cd cliwebcat python setup.py bdist_wheel sdist cd di

Patrick 4 Apr 11, 2022
Dynamically Generate GitHub Stats as like Terminal Interface

GitHub Stats Terminal Style Dynamically Generate GitHub Stats as like Terminal Interface Usage Create a New Repository using this Template or click he

YOGESHWARAN R 63 Jan 03, 2023
Generate an ASCII Art from keyword put in the cli

ascii-art-generator-cli Generate an ASCII Art from keyword put in the cli Install git clone https://github.com/Nathanlauga/ascii-art-generator-cli cd

Nathan Lauga 1 Nov 14, 2021
Shortcut-Maker - It is a tool that can be set to run any tool with a single command

Shortcut-Maker It is a tool that can be set to run any tool with a single command Coded by Dave Smith(Owner of Sl Cyber Warriors) Command list 👇 pkg

Dave Smith 10 Sep 14, 2022
Create argparse subcommands with decorators.

python-argparse-subdec This is a very simple Python package that allows one to create argparse's subcommands via function decorators. Usage Create a S

Gustavo José de Sousa 7 Oct 21, 2022
Rich is a Python library for rich text and beautiful formatting in the terminal.

The Rich API makes it easy to add color and style to terminal output. Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, tracebacks, and more — out of the bo

Will McGugan 41.4k Jan 03, 2023
Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)

Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).

1 Oct 31, 2021
Python commandline tool for remembering linux/terminal commands

ehh Remember linux commands Commandline tool for remembering linux/terminal commands. It stores your favorite commands in ~/ehh.json in your homedir a

56 Nov 10, 2022
A user-friendly python CLI for Fmask 4.3 software (GERS Lab, UCONN).

pyFmask What is pyFmask pyFmask is a user-friendly python CLI for Fmask 4.3 software (GERS Lab, UCONN; https://github.com/GERSL/Fmask). Fmask (Zhu et

1 Jan 05, 2022