Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

Overview

Baselining, on steroids!

The project aims to offer an open-source alternative to well-known hash sets such as the NSRL or HashSets, and lets you generate baselines from your own systems. It is also cross-platform, so you can use it the same way on Windows and GNU/Linux systems.

Currently available extractors:

  • fs: Extracts filesystem-related metadata.
  • hash: Computes several hashes from the entry's data (e.g. MD5, SHA-1, ssdeep).
  • pe: Extracts detailed information from Portable Executable (PE) files.
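
To give a rough idea of what a hash extractor does, the sketch below computes MD5 and SHA-1 digests of a file in fixed-size chunks using only the standard library. It is an illustration, not Baseline's actual implementation: the `compute_digests` name and the chunk size are assumptions, and ssdeep is left out because it requires a third-party package.

```python
import hashlib
from pathlib import Path


def compute_digests(path, chunk_size=65536):
    """Return MD5 and SHA-1 hex digests of a file (hypothetical helper)."""
    digests = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    with Path(path).open("rb") as stream:
        # Read in chunks so that large files never have to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Feeding every chunk to all digest objects in a single pass means the file is read only once, however many algorithms are requested.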

Installation

Installing from PyPI

Baseline is not currently available on PyPI, mainly because the name is already taken by another project.

Installing from source

Baseline uses Poetry as its packaging toolkit, so installing it from source is as simple as:

git clone https://github.com/sk4la/baseline.git
cd baseline

python3 -m pip install poetry
python3 -m poetry install

Precompiled binaries

Precompiled binaries are available in the Releases section.

Docker

Baseline is also available as a Docker image.

To pull the latest image from Docker Hub:

docker pull sk4la/baseline

See the official Docker documentation for details on how to install and use Docker.

Usage

The help menu for the baseline command-line utility:

Usage: baseline COMMAND [ARGS]...

  Command-line utility that creates file-oriented baselines.

Options:
  --ensure-administrator          Ensure that the current user has administrative privileges (i.e.
                                  is `root` or equivalent on GNU/Linux systems).
  --log-file                      Set the log file path (e.g. 'baseline.log').  [default:
                                  /home/sk4la/baseline/20211031110323.25756fb1b706.log]
  --logging-configuration         Set the logging configuration using a custom file. The file must
                                  adhere to the official specification. See
                                  https://docs.python.org/3/library/logging.config.html#logging-config-dictschema
                                  for more details.
  --monochrome                    Disable console output coloring. This can be useful when piping
                                  the output to a log file.
  -v, --verbose                   Increase the logging verbosity. Supports up to 4 occurrences of
                                  the same option (e.g. -vvvv).  [0<=x<=4]
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  new     Creates a new filesystem-based baseline.
  schema  Show the JSON representation of the actual schema.

The help menu for the baseline new subcommand:

Usage: baseline new ...

  Creates a new filesystem-based baseline.

Options:
  --comment                       Add an arbitrary comment to the generated output file.
  --exclude-directory ...         Exclude a specific directory from the baseline. Can be specified
                                  multiple times (e.g. `--exclude-directory /dev
                                  --exclude-directory /proc`).
  --exclude-extractor [hash|pe|fs]
                                  Exclude extractors. Can be specified multiple times (e.g.
                                  `--exclude-extractor hash --exclude-extractor pe`).
  --max-size                      Set the maximum file size (in bytes) to inspect.  [default:
                                  5000000; x>=1]
  -o, --output-file               Set the output file path (e.g. 'baseline.ndjson').  [default:
                                  /workspaces/baseline/20211102200452.58bd60a3b16a.ndjson]
  --output-file-encoding [utf-8|utf-16le]
                                  Set the output file encoding. Only applies when writing to an
                                  actual file.  [default: utf-8]
  -f, --output-format [ndjson]    Set the output format.  [default: ndjson]
  --partition-size                Set the partition size (i.e. number of entries per process).
                                  [default: 200; x>=1]
  --processes                     Set the number of parallel processes.  [default: 2; x>=1]
  --recursive / --non-recursive   Whether to walk the filesystem recursively. When set, the program
                                  will only inspect the files and directories specifies on the first
                                  level of any included path. For example, if '/mnt/image' is
                                  specified as an included path, then only the directory
                                  '/mnt/image' itself and its direct children will be inspected.
                                  [default: recursive]
  --remap ...                     Artificially remap included paths (e.g. '/mnt/image:/'). Can be
                                  specified multiple times (e.g. `--remap /mnt/image:/ --remap
                                  /dev/null:/dev/void`).
  --report / --no-report          Whether to show a final report at the end.  [default: report]
  --skip-compression              Whether to skip on-the-fly compression of the resulting file.
  --skip-directories              Whether to skip directories.
  --skip-empty                    Whether to skip empty entries.
  --help                          Show this message and exit.

The help menu for the baseline schema subcommand:

Usage: baseline schema

  Show the JSON representation of the actual schema.

Options:
  --compact                       Render compact JSON instead of the default idented version.
  --output-file                   Set the output file path (e.g. 'schema.json').
  --output-file-encoding [utf-8|utf-16le]
                                  Set the output file encoding. Only applies when writing to an
                                  actual file.  [default: utf-8]
  --help                          Show this message and exit.

Creating a baseline of a live system

Creating a baseline of a live system is as simple as:

baseline new

When using Baseline from a removable device, you may want to exclude its path (for example /mnt/usb) from the generated baseline:

baseline --ensure-administrator new --exclude-directory /mnt/usb

See the Usage section for a complete list of options and arguments.
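
The generated baseline is an NDJSON file, i.e. one JSON document per line, so it can be consumed with a few lines of standard-library Python. The sketch below is not part of Baseline: `read_records` is a hypothetical helper, and it assumes an uncompressed file such as one produced with `--skip-compression`. No record field names are assumed; run `baseline schema` to see them.

```python
import json


def read_records(path, encoding="utf-8"):
    """Yield one dictionary per non-empty line of an NDJSON file."""
    with open(path, encoding=encoding) as stream:
        for line in stream:
            line = line.strip()
            if line:  # Tolerate blank lines between records.
                yield json.loads(line)
```

Because the function is a generator, records are decoded one at a time and large baselines do not need to be loaded into memory at once.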

Creating a baseline from a mounted image

When creating a baseline of a mounted image, you may want the baseline to represent the files as if they were read from the actual system, not the mounted image.

For example, if your image is currently mounted on /mnt/IMG-001, you can then execute the following command to remap all entries read from this path to /:

baseline new --remap /mnt/IMG-001:/ /mnt/IMG-001

You can think of this as a chroot jail.
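
Conceptually, remapping replaces a leading path prefix with another one. The following sketch illustrates that idea with `pathlib`; it is an assumption about the behaviour described above rather than Baseline's actual code, and the `remap` function is hypothetical.

```python
from pathlib import PurePosixPath


def remap(path, source, target):
    """Replace the leading `source` prefix of `path` with `target`."""
    path = PurePosixPath(path)
    try:
        relative = path.relative_to(source)
    except ValueError:
        # The path is not located under `source`: leave it untouched.
        return str(path)
    return str(PurePosixPath(target) / relative)
```

With this helper, `remap("/mnt/IMG-001/etc/passwd", "/mnt/IMG-001", "/")` returns `/etc/passwd`, while a path outside `/mnt/IMG-001` is returned unchanged.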

Displaying the schema

Baseline uses a fixed schema for rendering the information. This schema is enforced using the Pydantic package and produces a heavily-typed output that can later be ingested as-is.

To print the standardized JSON schema:

baseline schema

To dump a compact version of the JSON schema to schema.min.json:

baseline schema --compact --output-file schema.min.json

The JSON schema produced by Pydantic is compatible with the specifications from JSON Schema Core, JSON Schema Validation and OpenAPI Data Types. See the official Pydantic documentation for more details.

Advanced usage

Building binaries

Baseline currently supports the following packaging systems:

  • PyInstaller
  • Nuitka

Although precompiled binaries are available in the Releases section, you should always build your own binaries.

To produce a binary using PyInstaller:

make pyinstaller-linux

To produce a binary using Nuitka:

make nuitka-linux

Nuitka is a Python compiler in its own right: it translates Python code to C instead of simply bundling the standard CPython interpreter. You may therefore run into bugs or issues that are unrelated to Baseline itself.

Building the Docker image

To build the official Docker image:

make docker

Additional instructions can be added to the Dockerfile in order to customize the image.

The official Docker image is available at https://hub.docker.com/r/sk4la/baseline. You can use the FROM docker.io/sk4la/baseline:latest instruction in your own Dockerfile to derive your own image.

API

Using Baseline from Python is possible using the Baseline class:

from baseline.core import Baseline

with Baseline() as baseline:
    for record in baseline.compute(*[
        "/mnt/IMG-001",
        "/mnt/IMG-002",
    ]):
        print(record.json(exclude_none=True))

See the actual code for a more thorough example.

The Baseline class emits logging messages to the baseline logger, to which you can subscribe if you wish. The command-line utility displays these messages on the console by default.
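
For instance, the following sketch attaches a console handler to that logger using the standard `logging` module. Only the logger name `baseline` comes from the text above; the format string and the level are arbitrary choices.

```python
import logging

# Retrieve the logger that the Baseline class writes to.
logger = logging.getLogger("baseline")
logger.setLevel(logging.INFO)

# Forward its messages to the console with a simple format.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)
```

Any other `logging.Handler` (a file handler, for example) can be attached in the same way.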

Contribute

Baseline is a work in progress, and everyone is welcome to contribute! 👍

Writing a new extractor

Before a new extractor can enrich the generated records, the global schema must be updated. To do this, create a subclass of Pydantic's BaseModel in schema.py that declares the fields the extractor will eventually fill. This class is then referenced in the schema's root Record class.

In this example, we want to extract the first 50 characters of any *.txt file. Here we arbitrarily decide that the extracted text will be stored in the content attribute and that the extractor's key will be text:

import typing

import pydantic

class Text(pydantic.BaseModel):
    content: str

class Record(pydantic.BaseModel):
    ...
    # Optional, since most entries are not text files.
    text: typing.Optional[Text] = None

We can then start to write the actual code. All extractors must inherit from the base Extractor class:

from baseline import schema
from baseline.models import Extractor

class Text(Extractor):
    """Extracts the first 50 characters of any `.txt` file."""

    EXTENSION_FILTERS = (
        r"\.txt$",
    )
    # Must match the attribute declared on `schema.Record`.
    KEY = "text"

    def run(self, record: schema.Record) -> None:
        # Read the beginning of the entry and attach it to the record.
        with self.entry.open() as stream:
            setattr(
                record,
                self.KEY,
                schema.Text(
                    content=stream.read(50),
                ),
            )

The extractor's KEY class variable must correspond to the one that was specified in the schema's root Record class (text in this example).
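
How `EXTENSION_FILTERS` is evaluated is not spelled out above. One plausible reading, sketched below purely as an assumption, is that an extractor only runs on entries whose name matches at least one of the regular expressions. The `matches_filters` helper is hypothetical and not part of Baseline.

```python
import re

# Same pattern as in the example extractor above.
EXTENSION_FILTERS = (r"\.txt$",)


def matches_filters(name, filters=EXTENSION_FILTERS):
    """Return True when `name` matches at least one regular expression."""
    return any(re.search(pattern, name) for pattern in filters)
```

Under this reading, `notes.txt` would be handed to the extractor, whereas `notes.txt.bak` would not, because the pattern is anchored at the end of the name.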

Support

In case you encounter a problem or want to suggest a new feature, please submit a ticket.

License

Baseline is licensed under the GNU General Public License (GPL) version 3.
