PerfSpect is a system performance characterization tool based on linux perf targeting Intel microarchitectures

Overview

PerfSpect

PerfSpect is a system performance characterization tool based on Linux perf that targets Intel microarchitectures. The tool has two parts:

  1. perf collection, which collects the underlying PMU (Performance Monitoring Unit) counters
  2. post-processing, which generates CSV output of performance metrics

Getting Started

Prerequisites

  1. Linux perf
  2. Python 3 or later

Building binaries from source code

Prerequisites

  1. Docker must be installed on the system
  2. Docker commands must run without sudo (for example, docker run hello-world should complete successfully)

Build binaries

  1. builder/build_docker_image
  2. builder/build

On a successful build, the binaries are created in the "dist" folder.

1. Perf collection:

(sudo) ./perf-collect (options) -- Some options can be used only with root privileges

Options:
  -h, --help (show this help message and exit)

  -v, --version (display version info)

  -e EVENTFILE, --eventfile EVENTFILE (Event file containing events to collect, default=events/)

  -i INTERVAL, --interval INTERVAL (interval in seconds for time series dump, default=1)

  -m MUXINTERVAL, --muxinterval MUXINTERVAL (event mux interval for events in ms, default=0, i.e. the system default is used. Requires root privileges)

  -o OUTCSV, --outcsv OUTCSV (perf stat output in csv format, default=results/perfstat.csv)

  -a APP, --app APP (Application to run with perf-collect, perf collection ends after workload completion)

  -p PID, --pid PID (perf-collect on selected PID(s))

  -t TIMEOUT, --timeout TIMEOUT (perf event collection time in seconds)

  --percore (Enable per core event collection)

  --nogroups (Disable perf event grouping, events are grouped by default as in the event file)

  --dryrun (Test if Performance Monitoring Counters are in use, and collect stats for 10 sec)

  --metadata (collect system info only, does not run perf)

Examples

  1. sudo ./perf-collect (collects PMU counters using the predefined architecture-specific event file until collection is terminated)
  2. sudo ./perf-collect -m 10 -t 30 (sets the event multiplexing interval to 10 ms and collects PMU counters for 30 seconds using the default architecture-specific event file)
  3. sudo ./perf-collect -a "myapp.sh myparameter" (collects perf data for myapp.sh)
  4. sudo ./perf-collect --dryrun (checks PMU usage and collects PMU counters for 10 seconds using the default architecture-specific event file)
  5. sudo ./perf-collect --metadata (collects system info and PMU event info without running perf; uses the default output file if the -o option is not given)
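
The invocations above can also be assembled from a script, for example in a test harness. The following is a minimal sketch and not part of PerfSpect: the helper name build_collect_command and its parameters are hypothetical, and it only emits flags that appear in the Options list.

```python
import shlex


def build_collect_command(binary="./perf-collect", use_sudo=True,
                          mux_interval_ms=None, timeout_s=None,
                          app=None, dry_run=False):
    """Return an argv list for perf-collect (hypothetical helper)."""
    cmd = ["sudo"] if use_sudo else []
    cmd.append(binary)
    if mux_interval_ms is not None:
        cmd += ["-m", str(mux_interval_ms)]  # event mux interval in ms
    if timeout_s is not None:
        cmd += ["-t", str(timeout_s)]        # perf event collection time
    if app is not None:
        cmd += ["-a", app]                   # workload passed as one argument
    if dry_run:
        cmd.append("--dryrun")
    return cmd


# Example 2 from the list above: 10 ms multiplexing, 30 second collection.
cmd = build_collect_command(mux_interval_ms=10, timeout_s=30)
print(shlex.join(cmd))  # prints: sudo ./perf-collect -m 10 -t 30
```

Returning a list rather than a single string keeps a workload such as "myapp.sh myparameter" as one argument when the list is handed to subprocess.run.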

Notes

  1. Intel CPUs (until Cascadelake) have 3 fixed PMU counters (cpu-cycles, ref-cycles, instructions) and 4 programmable PMU counters. The events in the event files are grouped with this assumption. However, some of the counters may not be available on some CPUs. You can check the correctness of the event file with --dryrun and inspect the output for anomalies. Typically the output will show "not counted", "not supported" or zero values for cpu-cycles if the number of available counters is less than the number of events in a group.
  2. Globally pinned events can limit the number of counters available for perf event groups. On x86 systems the NMI watchdog pins a fixed counter by default. The NMI watchdog is disabled during perf collection when the tool is run with sudo. If the NMI watchdog can't be disabled, event grouping is forcefully disabled so that the perf driver handles event multiplexing.
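
The checks described in these notes can be partly automated. The sketch below is an illustration under stated assumptions, not PerfSpect code: it assumes the raw output is plain text in which perf marks problem events with `<not counted>` or `<not supported>`, it assumes the default output path results/perfstat.csv, and the function names are hypothetical. The watchdog state is read from /proc/sys/kernel/nmi_watchdog, the standard Linux sysctl file.

```python
from pathlib import Path

# Markers that perf stat prints in place of a count when an event
# could not be counted or is not supported.
ANOMALY_MARKERS = ("<not counted>", "<not supported>")


def find_anomalies(lines):
    """Return (line_number, line) pairs that contain an anomaly marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in ANOMALY_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def nmi_watchdog_enabled(path="/proc/sys/kernel/nmi_watchdog"):
    """Return True/False for the watchdog state, or None if unreadable."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed location: the default -o path of perf-collect.
    raw = Path("results/perfstat.csv")
    if raw.exists():
        for number, line in find_anomalies(raw.read_text().splitlines()):
            print(f"line {number}: {line}")
    print("NMI watchdog enabled:", nmi_watchdog_enabled())
```

A non-empty result from find_anomalies after a dry run suggests that a group in the event file contains more events than there are counters available.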

2. Perf Postprocessing:

./perf-postprocess (options)

Options:

  -h, --help (show this help message and exit)

  -v, --version (display version info)

  -m METRICFILE, --metricfile METRICFILE (formula file, default=events/metric.json)

  -o OUTFILE, --outcsv OUTFILE (perf stat output file, csv or xlsx format is supported, default=results/metric_out.csv)
  
  --keepall (keep all intermediate csv files)
  
  --persocket (generate persocket metrics)

  --percore (generate percore metrics)
  
  --epoch  (time series in epoch format, default is sample count)

required arguments:

  -r RAWFILE, --rawfile RAWFILE (Raw CSV output from perf-collect)

Examples

./perf-postprocess -r results/perfstat.csv (post processes perfstat.csv and creates metric_out.csv, metric_out.average.csv, metric_out.raw.csv)

Notes

  1. metric_out.csv: Time series dump of the metrics. The metrics are defined in events/metric.json
  2. metric_out.average.csv: Average of the metrics over the collection period
  3. metric_out.raw.csv: CSV file with raw events normalized per second
  4. Socket/core level metrics: Additional CSV files .socket.csv/.core.csv will be generated. Socket/core level data will be added as new sheets if Excel output is chosen
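
The file names in these notes follow from the -o path. As a rough sketch, assuming the suffixes are always inserted before the .csv extension exactly as in the default names (the helper expected_outputs is hypothetical and not part of PerfSpect):

```python
from pathlib import Path


def expected_outputs(outcsv="results/metric_out.csv"):
    """Map each output kind to the path it is expected to have."""
    out = Path(outcsv)
    stem, parent = out.stem, out.parent
    return {
        "timeseries": out,                          # metrics per sample
        "average": parent / f"{stem}.average.csv",  # averages over the run
        "raw": parent / f"{stem}.raw.csv",          # raw events per second
    }


for kind, path in expected_outputs().items():
    print(f"{kind}: {path}")
```

A script that consumes the results can use such a mapping to verify that all three files exist before reading them.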

Things to note

  1. The tool can collect only the counters supported by the underlying Linux perf version.
  2. The current version supports Intel Icelake, Cascadelake, Skylake and Broadwell microarchitectures only.
  3. Perf collection overhead will increase with an increase in the number of counters and/or the dump interval. Use the right perf multiplexing interval to reduce overhead (see the perf collection Notes for more details).
  4. If you run into locale issues such as UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4519: ordinal not in range(128), the locale most likely needs to be set appropriately. You can also try running the post-processing step with LC_ALL=C.UTF-8 LANG=C.UTF-8 ./perf-postprocess -r result.csv
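
The locale workaround in item 4 can also be applied when the post-processing step is launched from a script. This is a minimal sketch, assuming ./perf-postprocess is in the current directory; the helper names utf8_env and run_postprocess are hypothetical.

```python
import os
import subprocess


def utf8_env(base=None):
    """Return a copy of the environment with a UTF-8 locale forced."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C.UTF-8"
    env["LANG"] = "C.UTF-8"
    return env


def run_postprocess(rawfile, binary="./perf-postprocess"):
    """Run perf-postprocess on rawfile under a UTF-8 locale."""
    return subprocess.run([binary, "-r", rawfile], env=utf8_env(), check=True)
```

For example, run_postprocess("results/perfstat.csv") is equivalent to the command line shown in item 4, with the raw file path substituted. The calling process's own environment is left unchanged.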

How to contribute

Create a pull request on github.com/intel/PerfSpect with your patch. Please make sure your patch builds without errors. A maintainer will contact you if there are questions or concerns.


Comments
  • Dependency simpleeval needed

    PerfSpect imports the module simpleeval. It is not available in a fresh installation of Python (at least it was not in mine) and needs a separate install: pip install simpleeval. I recommend adding the installation of simpleeval to the PerfSpect build.

    opened by jpf18 3
  • results/ directory created by perf-collect.py has owner root, without general write permission

    A small workflow inconvenience. When running perf-collect.py as root (i.e. sudo python perf-collect.py), the results/ directory is created with owner root and no write permission for regular users. This means a sudo chmod 777 results/ is needed before running perf-postprocess.py.

    opened by jpf18 2
  • Fixed the limitation when "no of Workloads" become greater than "no o…

    Fixed the limitation when the number of workloads becomes greater than the number of performance metrics / number of features. Similarity analysis now works for any number of workloads.

    opened by faqeerurrehmanIntel 1
  • Add support for Oracle Cloud (OCI)

    This PR adds support for Oracle Cloud (OCI)

    • Add events files for OCI CPU types
    • Add OCI flags for SKX and ICX arch types
    • Tested on multiple platforms[1]

    Note that this will still give warnings[2] but collection will still occur whereas the current behavior is that the application will print an error and exit[3].

    [1] Tested on the following:

    +--------------+--------------------+-----------------------------------+------------------------------+-----------------------------------------------+
    |      OS      |       Shape        |              Kernel               |             Perf             |                      CPU                      |
    +--------------+--------------------+-----------------------------------+------------------------------+-----------------------------------------------+
    | Ubuntu 20.04 | VM.Standard3.Flex  | 5.13.0-1018-oracle                | 5.13.19                      | Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz  |
    | Ubuntu 20.04 | VM.Optimized3.Flex | 5.13.0-1018-oracle                | 5.13.19                      | Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz      |
    | Ubuntu 20.04 | VM.Standard2.2     | 5.13.0-1018-oracle                | 5.13.19                      | Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz |
    | Ubuntu 18.04 | VM.Standard3.Flex  | 5.4.0-1070-oracle                 | 5.4.178                      | Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz  |
    | Ubuntu 18.04 | VM.Optimized3.Flex | 5.4.0-1070-oracle                 | 5.4.178                      | Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz      |
    | Ubuntu 18.04 | VM.Standard2.2     | 5.4.0-1070-oracle                 | 5.4.178                      | Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz |
    | OEL8         | VM.Standard3.Flex  | 5.4.17-2136.306.1.3.el8uek.x86_64 | 4.18.0-348.20.1.el8_5.x86_64 | Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz  |
    | OEL8         | VM.Optimized3.Flex | 5.4.17-2136.306.1.3.el8uek.x86_64 | 4.18.0-348.20.1.el8_5.x86_64 | Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz      |
    | OEL8         | VM.Standard2.2     | 5.4.17-2136.306.1.3.el8uek.x86_64 | 4.18.0-348.20.1.el8_5.x86_64 | Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz |
    +--------------+--------------------+-----------------------------------+------------------------------+-----------------------------------------------+
    

    [2] Example on VM.Optimized3.Flex, Ubuntu 20.04, Intel(R) Xeon(R) Gold 6354

    $ sudo ./perf-collect --cloud oci
    These events are not supported with current version of perf, will not be collected!
    topdown.slots,
    power/energy-pkg/,
    power/energy-ram/;
    upi/event=0x2,umask=0xf,name='UNC_UPI_TxL_FLITS.ALL_DATA'/,
    upi/event=0x2,umask=0x97,name='UNC_UPI_TxL_FLITS.NON_DATA'/,
    upi/event=0x1,umask=0x0,name='UNC_UPI_CLOCKTICKS'/;
    cha/event=0x35,umask=0xC816FE01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL'/,
    cha/event=0x35,umask=0xC8177E01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE'/,
    cha/event=0x35,umask=0xC896FE01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL'/,
    cha/event=0x35,umask=0xC8977E01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE'/;
    cha/event=0x00,umask=0x00,name='UNC_CHA_CLOCKTICKS'/;
    imc/event=0xd3,umask=0x01,name='UNC_M_TAGCHK.HIT'/,
    imc/event=0xd3,umask=0x02,name='UNC_M_TAGCHK.MISS_CLEAN'/,
    imc/event=0xd3,umask=0x04,name='UNC_M_TAGCHK.MISS_DIRTY'/;
    imc/event=0x04,umask=0x0f,name='UNC_M_CAS_COUNT.RD'/,
    imc/event=0x04,umask=0x30,name='UNC_M_CAS_COUNT.WR'/;
    Collecting perf stat for events in : /home/ubuntu/cloudcompute.perfspect/events/icx_oci.txt
    

    [3] Example

    $ sudo ./perf-collect --cloud oci
    These events are not supported with current version of perf, will not be collected!
    power/energy-pkg/,
    power/energy-ram/;
    upi/event=0x2,umask=0xf,name='UNC_UPI_TxL_FLITS.ALL_DATA'/,
    upi/event=0x2,umask=0x97,name='UNC_UPI_TxL_FLITS.NON_DATA'/,
    upi/event=0x1,umask=0x0,name='UNC_UPI_CLOCKTICKS'/;
    cha/event=0x35,umask=0xC816FE01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL'/,
    cha/event=0x35,umask=0xC8177E01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE'/,
    cha/event=0x35,umask=0xC896FE01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_LOCAL'/,
    cha/event=0x35,umask=0xC8977E01,name='UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_REMOTE'/;
    cha/event=0x00,umask=0x00,name='UNC_CHA_CLOCKTICKS'/;
    imc/event=0x04,umask=0x0f,name='UNC_M_CAS_COUNT.RD'/,
    imc/event=0x04,umask=0x30,name='UNC_M_CAS_COUNT.WR'/;
    Collecting perf stat for events in : icx.txt
    Consider using cloudtype flag to set instance type -> VM/BM; Default is VM
    Error:
    The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots).
    /bin/dmesg | grep -i perf may provide additional information.
    
    Collection complete! Calculating TSC frequency now
    
    perf stat dumped to /home/opc/perfspect/results/perfstat.csv
    

    Signed-off-by: Aaron Blakeman

    opened by amblakem 0
Releases(v1.1.3)
Owner
Intel Corporation