The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Overview

About

The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way. Having the data defined as a set of objects, specialized storage methods are used to get direct access to the separate attributes of the selected objects, without having to touch the bulk of the data. Included are histogramming methods in an arbitrary number of dimensions, curve fitting, function evaluation, minimization, graphics and visualization classes to allow the easy setup of an analysis system that can query and process the data interactively or in batch mode, as well as a general parallel processing framework, PROOF, that can considerably speed up an analysis.

Thanks to the built-in C++ interpreter cling, the command, scripting and programming language are all C++. The interpreter allows for fast prototyping of macros, since it removes the time-consuming compile/link cycle. It also provides a good environment to learn C++. If more performance is needed, the interactively developed macros can be compiled using a C++ compiler via a machine-independent, transparent compiler interface called ACLiC.

The system has been designed in such a way that it can query its databases in parallel on clusters of workstations or many-core machines. ROOT is an open system that can be dynamically extended by linking external libraries. This makes ROOT a premier platform on which to build data acquisition, simulation and data analysis systems.

License: LGPL v2.1+ CII Best Practices

Cite

When citing ROOT, please use both the reference reported below and the DOI specific to your ROOT version, available on Zenodo. For example, you can copy-paste and fill in the following citation:

Rene Brun and Fons Rademakers, ROOT - An Object Oriented Data Analysis Framework,
Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996,
Nucl. Inst. & Meth. in Phys. Res. A 389 (1997) 81-86.
See also "ROOT" [software], Release vX.YY/ZZ, dd/mm/yyyy,
(Select the right link for your release here: https://zenodo.org/search?page=1&size=20&q=conceptrecid:848818&all_versions&sort=-version).

Live Demo for CERN Users

Screenshots

These screenshots show some of the plots (produced using ROOT) presented when the Higgs boson discovery was announced at CERN:

CMS Data MC Ratio Plot

Atlas P0 Trends

See more screenshots on our gallery.

Installation and Getting Started

See https://root.cern/install for installation instructions. For instructions on how to build ROOT from these source files, see https://root.cern/install/build_from_source.

Our "Getting started with ROOT" page is then the perfect place to get familiar with ROOT.

Help and Support

Contribution Guidelines

Comments
  • Update ROOT's llvm to llvm13.


The things we need to do before merging this PR, which can probably be done by various people in parallel:

    Cling standalone:

    • [x] Fix cling CUDA tests
    • [ ] Fix the remaining test failures (6, see below)
    • [x] Revert the commit 'FIXME: Undo this change and debug why we have PendingInstances.'
    Cling test failures

    Failures in master on my system:

        Cling :: CodeUnloading/PCH/VTables.C
        Cling :: DynamicLibraryManager/callable_lib_L_AB_order1.C
    

    Remaining failures (excluding the ones above):

      Cling :: CodeGeneration/Symbols.C
      Cling :: CodeUnloading/AtExit.C
      Cling :: CodeUnloading/PCH/VTablesClingPCH.C
      Cling :: CodeUnloading/RereadFile.C
      Cling :: ErrorRecovery/StoredState.C
      Cling :: MultipleInterpreters/MultipleInterpreters.C
    

    ROOT:

    • [x] Compare the build size against master
    • [x] Compare the .pcm file size against master
    • [ ] Add flags to ignore compilation warnings coming from llvm
    • [x] Remove the FIXME from commit 'Add another symbol generator to resolve the generated lazy symbol' - the explanation is in the commit
    • [x] Fix the llvm::StringRef conversion failures on OSX
Binary size: this PR needs 13% more space (2.3 vs 2.0 GB)
    du -hs root-release-llvm13
    2.3G	.
    (base) [email protected] /build/vvassilev/root-release-llvm13 $ du -hs ../root-release-master/
    2.0G	../root-release-master/
    
    Module files need ~5% more space on disk (215 vs 206 MB)
    diff -y llvm13 master 
    424K	lib/ASImageGui.pcm				      |	444K	lib/ASImageGui.pcm
    468K	lib/ASImage.pcm					      |	484K	lib/ASImage.pcm
    4.2M	lib/_Builtin_intrinsics.pcm			      |	4.0M	lib/_Builtin_intrinsics.pcm
    48K	lib/_Builtin_stddef_max_align_t.pcm		      |	44K	lib/_Builtin_stddef_max_align_t.pcm
    200K	lib/Cling_Runtime_Extra.pcm			      |	132K	lib/Cling_Runtime_Extra.pcm
    100K	lib/Cling_Runtime.pcm					100K	lib/Cling_Runtime.pcm
    11M	lib/Core.pcm					      |	9.6M	lib/Core.pcm
    564K	lib/EG.pcm					      |	584K	lib/EG.pcm
    5.7M	lib/Eve.pcm					      |	5.4M	lib/Eve.pcm
    652K	lib/FitPanel.pcm				      |	656K	lib/FitPanel.pcm
    504K	lib/Foam.pcm					      |	520K	lib/Foam.pcm
    440K	lib/Fumili.pcm					      |	460K	lib/Fumili.pcm
    1.2M	lib/Gdml.pcm						1.2M	lib/Gdml.pcm
    960K	lib/Ged.pcm					      |	968K	lib/Ged.pcm
    432K	lib/Genetic.pcm					      |	456K	lib/Genetic.pcm
    2.9M	lib/GenVector.pcm				      |	2.8M	lib/GenVector.pcm
    868K	lib/GeomBuilder.pcm				      |	876K	lib/GeomBuilder.pcm
    500K	lib/GeomPainter.pcm				      |	520K	lib/GeomPainter.pcm
    3.4M	lib/Geom.pcm					      |	3.3M	lib/Geom.pcm
    860K	lib/Gpad.pcm						860K	lib/Gpad.pcm
    836K	lib/Graf3d.pcm					      |	844K	lib/Graf3d.pcm
    1.0M	lib/Graf.pcm						1.0M	lib/Graf.pcm
    540K	lib/GuiBld.pcm					      |	556K	lib/GuiBld.pcm
    588K	lib/GuiHtml.pcm					      |	604K	lib/GuiHtml.pcm
    3.5M	lib/Gui.pcm					      |	3.4M	lib/Gui.pcm
    496K	lib/Gviz3d.pcm					      |	516K	lib/Gviz3d.pcm
    468K	lib/GX11.pcm					      |	484K	lib/GX11.pcm
    412K	lib/GX11TTF.pcm					      |	432K	lib/GX11TTF.pcm
    3.6M	lib/HistFactory.pcm				      |	3.4M	lib/HistFactory.pcm
    484K	lib/HistPainter.pcm				      |	500K	lib/HistPainter.pcm
    5.9M	lib/Hist.pcm					      |	5.7M	lib/Hist.pcm
    1.5M	lib/Html.pcm						1.5M	lib/Html.pcm
    1.8M	lib/Imt.pcm					      |	1.7M	lib/Imt.pcm
    1.9M	lib/libc.pcm						1.9M	lib/libc.pcm
    12M	lib/MathCore.pcm				      |	11M	lib/MathCore.pcm
    1.6M	lib/Matrix.pcm						1.6M	lib/Matrix.pcm
    3.1M	lib/Minuit2.pcm					      |	3.0M	lib/Minuit2.pcm
    544K	lib/Minuit.pcm					      |	560K	lib/Minuit.pcm
    476K	lib/MLP.pcm					      |	496K	lib/MLP.pcm
    1.2M	lib/MultiProc.pcm					1.2M	lib/MultiProc.pcm
    1.1M	lib/Net.pcm						1.1M	lib/Net.pcm
    712K	lib/NetxNG.pcm						712K	lib/NetxNG.pcm
    728K	lib/Physics.pcm					      |	736K	lib/Physics.pcm
    492K	lib/Postscript.pcm				      |	508K	lib/Postscript.pcm
    564K	lib/ProofBench.pcm				      |	584K	lib/ProofBench.pcm
    948K	lib/ProofDraw.pcm				      |	940K	lib/ProofDraw.pcm
    1.6M	lib/Proof.pcm						1.6M	lib/Proof.pcm
    732K	lib/ProofPlayer.pcm				      |	744K	lib/ProofPlayer.pcm
    596K	lib/Quadp.pcm					      |	608K	lib/Quadp.pcm
    392K	lib/RCsg.pcm					      |	412K	lib/RCsg.pcm
    536K	lib/Recorder.pcm				      |	556K	lib/Recorder.pcm
    5.4M	lib/RGL.pcm					      |	5.1M	lib/RGL.pcm
    1.6M	lib/RHTTP.pcm					      |	1.5M	lib/RHTTP.pcm
    412K	lib/RHTTPSniff.pcm				      |	436K	lib/RHTTPSniff.pcm
    400K	lib/Rint.pcm					      |	420K	lib/Rint.pcm
    2.6M	lib/RIO.pcm					      |	2.5M	lib/RIO.pcm
    23M	lib/RooFitCore.pcm				      |	22M	lib/RooFitCore.pcm
    1.1M	lib/RooFitHS3.pcm				      |	1008K	lib/RooFitHS3.pcm
    16M	lib/RooFit.pcm					      |	15M	lib/RooFit.pcm
    424K	lib/RooFitRDataFrameHelpers.pcm			      |	448K	lib/RooFitRDataFrameHelpers.pcm
    4.3M	lib/RooStats.pcm				      |	4.1M	lib/RooStats.pcm
    468K	lib/RootAuth.pcm				      |	484K	lib/RootAuth.pcm
    120K	lib/ROOT_Config.pcm					120K	lib/ROOT_Config.pcm
    15M	lib/ROOTDataFrame.pcm				      |	14M	lib/ROOTDataFrame.pcm
    332K	lib/ROOT_Foundation_C.pcm				332K	lib/ROOT_Foundation_C.pcm
    620K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm		      |	600K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm
    140K	lib/ROOT_Rtypes.pcm					140K	lib/ROOT_Rtypes.pcm
    4.1M	lib/ROOTTMVASofie.pcm					4.1M	lib/ROOTTMVASofie.pcm
    412K	lib/ROOTTPython.pcm				      |	432K	lib/ROOTTPython.pcm
    2.6M	lib/ROOTVecOps.pcm				      |	2.5M	lib/ROOTVecOps.pcm
    652K	lib/SessionViewer.pcm				      |	668K	lib/SessionViewer.pcm
    3.0M	lib/Smatrix.pcm					      |	2.9M	lib/Smatrix.pcm
    436K	lib/SpectrumPainter.pcm				      |	456K	lib/SpectrumPainter.pcm
    572K	lib/Spectrum.pcm				      |	584K	lib/Spectrum.pcm
    424K	lib/SPlot.pcm					      |	440K	lib/SPlot.pcm
    624K	lib/SQLIO.pcm					      |	640K	lib/SQLIO.pcm
    18M	lib/std.pcm					      |	17M	lib/std.pcm
    1.6M	lib/Thread.pcm					      |	1.5M	lib/Thread.pcm
    568K	lib/TMVAGui.pcm					      |	588K	lib/TMVAGui.pcm
    18M	lib/TMVA.pcm					      |	17M	lib/TMVA.pcm
    2.6M	lib/Tree.pcm					      |	2.5M	lib/Tree.pcm
    4.5M	lib/TreePlayer.pcm				      |	4.3M	lib/TreePlayer.pcm
    668K	lib/TreeViewer.pcm				      |	684K	lib/TreeViewer.pcm
    536K	lib/Unfold.pcm					      |	552K	lib/Unfold.pcm
    424K	lib/X3d.pcm					      |	448K	lib/X3d.pcm
    1.1M	lib/XMLIO.pcm					      |	1.0M	lib/XMLIO.pcm
    444K	lib/XMLParser.pcm				      |	464K	lib/XMLParser.pcm
    

    cc: @hahnjo, @Axel-Naumann

    opened by vgvassilev 750
  • [Exp PyROOT] Build PyROOT with multiple Python versions


    The commits in this PR contain the steps necessary to allow the user to build PyROOT with more than one version of Python. The version in use can be changed with the usual `source thisroot.sh`, preceded by the specific Python version, e.g. `ROOT_PYTHON_VERSION=3.6 source bin/thisroot.sh` run inside the build directory. Quick summary of the commits: (1) set the necessary CMake variables to build the PyROOT libraries in lib/pythonX.Y, (2) modify thisroot.sh to allow the user to select the Python version, (3) make the necessary changes to the pyunittests and tutorials CMake variables, (4) installation.

    new feature 
    opened by maxgalli 346
  • RooFit::MultiProcess & TestStatistics part 2 redo: RooFitZMQ & MultiProcess


    This PR is a do-over of #8385 and #8412 and, as such, again the second part of a split and clean-up of #8294. The most important blocker in those PRs was the inclusion of a patched libzmq in RooFitZMQ itself. This patch has now been included in libzmq proper. Another blocking review comment was that libzmq symbols must not be allowed to be exported through our libraries. This has been solved in theory, and in practice is pending another PR to libzmq. Having fixed these two blockers, we should now be able to continue.

    To recap:

    In this PR, we introduce two packages: RooFitZMQ and RooFit::MultiProcess. It also adds two builtins for ZeroMQ to ease dependency management: libzmq and cppzmq. The builtin for libzmq is especially necessary at this point because libzmq has recently gained a necessary feature that has not been released yet.

    RooFit::MultiProcess is a task-based parallelization framework.

    It uses forked processes for parallelization, as opposed to threads. We chose this approach because (a) the existing RooRealMPFE parallelization framework already made use of forks, so we had something to build on, and (b) it was at the time deemed infeasible to check the entire RooFit codebase for thread-safety. Moreover, we use MultiProcess to parallelize gradients -- i.e. the tasks to be executed in parallel are partial derivatives -- and these tasks are large enough that communication between tasks is not a big concern in the big fits we aimed to parallelize.

    The communication between processes is done using ZeroMQ. The ZeroMQ dependency is wrapped in convenience classes contributed by @roelaaij which here are packaged as RooFitZMQ.

    Will un-draft the PR once the following is done (based on previous review comments by @guitargeek @hageboeck @amadio @lmoneta and also some other things from myself):

    • [x] includes: correct order (matching header, RooFit, ROOT, std) and ROOT includes in quotation marks
    • [x] fix ZMQ deprecation warnings
    • [x] refactor member names: underscore suffix
    • [x] document important things with doxygen
    • [x] remove commented out code and TODOs and other junk
    • [x] fix copyright headers + author lists (RooFitZMQ: me, Roel; MP: me, Inti, Vince)
    • [ ] rebase in 2-3 neat commits that all compile and pass tests
    • [x] clang-tidy up
    • [x] change libzmq builtin back to master after PR is merged: https://github.com/zeromq/libzmq/pull/4266
    • [ ] ~use enum class instead of template parameters for minimizer function implementation choice~ -> next PR

    Edit 18 Nov 2021: the following list is to keep track of unaddressed (at time of writing) comments made in this thread (because the thread is so long that it is very inconvenient to navigate on GitHub which doesn't load it all at once):

    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-773656413: only need to rebase, but that is already listed above.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-790026907: we have to double check whether the build issues still exist. They should be gone, because we don't build dictionaries anymore.
    • [x] https://github.com/root-project/root/pull/9078#discussion_r736998615: Related to the issue above, iiuc, because the include was missing from the dictionary, so this can probably also be marked resolved now.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791797535: change inc to res in RooFitZMQ and MultiProcess and only include these zmq header exposing include directories to specific targets that need them using target_include_directories. This way, we don't transitively expose zmq includes to ROOT users.
      • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791786326: The above solution also circumvents this issue with ZMQ_ENABLE_DRAFT preprocessor defines.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791883192: change copyright/license headers.

    Let me know if you find additional items for the todo list.

    in:RooFit/RooStats 
    opened by egpbos 312
  • [cxxmodules] Fix failing runtime_cxxmodules tests by preloading modules


    Currently, 36 tests are failing for runtime modules: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/ We want to make these tests pass so that we can say that runtime modules are finally working.

    This patch enables ROOT to preload all modules at startup time. In my environment, this patch fixes 14 tests for runtime cxxmodules.

    Preloading all the modules has several advantages:

    1. We do not have to rely on rootmap files, which don't support some features (namespaces and templates).
    2. Lookup is faster because we don't have to go through a trampoline via rootmap files.

    The only disadvantage of preloading all the modules is the startup time performance, measured with root.exe -q -l memory.C. This is a release build without modules:

     cpu  time = 0.091694 seconds
     sys  time = 0.026187 seconds
     res  memory = 133.008 Mbytes
     vir  memory = 217.742 Mbytes
    

    This is a release build with modules, with this patch:

     cpu  time = 0.234134 seconds
     sys  time = 0.066774 seconds
     res  memory = 275.301 Mbytes
     vir  memory = 491.832 Mbytes
    

    As you can see, preloading all the modules makes both time and memory 2 to 3 times worse at startup.

    Edit: with hsimple.C (root.exe -l -b tutorials/hsimple.C -q ~/CERN/ROOT/memory.C), release build without modules:

    Processing tutorials/hsimple.C...                                                                        
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.05 seconds                        
    (TFile *) 0x555ae2a9d560                                                                  
    Processing /home/yuka/CERN/ROOT/memory.C...                                                              
     cpu  time = 0.173591 seconds                                   
     sys  time = 0.011835 seconds                       
     res  memory = 135.32 Mbytes                                    
     vir  memory = 209.664 Mbytes 
    

    Release build with modules, with this patch:

    Processing tutorials/hsimple.C...
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.04 seconds
    (TFile *) 0x55d1b036d230
    Processing /home/yuka/CERN/ROOT/memory.C...
     cpu  time = 0.290742 seconds
     sys  time = 0.043851 seconds
     res  memory = 256.844 Mbytes
     vir  memory = 438.484 Mbytes
    

    However, a slower startup is expected if we load all the modules at startup rather than on demand. I haven't had a good benchmark for this yet but, in theory, it reduces execution time instead, as we would be loading the modules after startup anyway.

    opened by yamaguchi1024 282
  • [cmake] use only source dirs as include paths when building ROOT


    Fully exclude `${CMAKE_BINARY_DIR}/include` from the include paths when building ROOT libraries.

    Several generated files are placed first in `${CMAKE_BINARY_DIR}/ginclude` and then copied to `include`.

    Dictionary generation still uses only `${CMAKE_BINARY_DIR}/include`, otherwise cling complains about similar includes in different places. Once the problem with cling is fixed, the source dirs can be used there as well.
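    A hypothetical CMake fragment of the direction described (the target and directory names are invented for illustration; this is not the PR's actual code):

```cmake
# Sketch: a library takes its headers from the source tree (plus ginclude for
# generated headers) instead of the copied ${CMAKE_BINARY_DIR}/include.
target_include_directories(Foo PRIVATE
  ${CMAKE_CURRENT_SOURCE_DIR}/inc   # headers where they live in the sources
  ${CMAKE_BINARY_DIR}/ginclude)     # generated headers, before the copy step
```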

    new feature 
    opened by linev 252
  • [cxxmodules] Implement global module indexing to improve performance.


    The global module index represents an efficient on-disk hash table which stores the identifier->module mapping. Every time clang finds an unknown identifier we are informed and we can load the corresponding module on demand.

    This way we can provide a minimal set of loaded modules. Currently, we see that for hsimple.C only half of the modules are loaded. This can be further improved because we currently load all modules which mention an identifier; that is, when looking up TPad, for example, we will load all modules which contain the identifier TPad, including modules which contain only a forward declaration of it.

    Kudos Arpitha Raghunandan (@arpi-r)!

    We still need some performance measurements but the preliminary results are promising.

    Performance

    Methodology

    We have a forwarding root.exe which essentially calls /usr/bin/time -v root.exe $@. We have processed and stored this information in csv files. We have run in three modes:

    1. root master without modules (modulesoff)
    2. root master with modules (moduleson)
    3. root master with this PR with modules (gmi)

    Run on Ubuntu 18.10 on Intel® Core™ i5-8250U CPU @ 1.60GHz × 8

    Results Interpretation

    A general comparison between 2) and 3) shows that this PR makes ROOT about 3% faster and 25% more memory efficient.

    A general comparison between 1) and 3) shows that modules are still less efficient in a few cases, which is expected because the PR loads more modules than it should. This will be addressed in subsequent PRs. A good trend is that some tests already show that 3) is better than 1).

    The raw data can be found here. [work was done by Arpitha Raghunandan (@arpi-r)]

    Depends on #4005.

    opened by vgvassilev 219
  • [VecOps] RVec 2.0: small buffer optimization based on LLVM SmallVector


    • [x] add ARCHITECTURE.md
    • [x] use fCapacity == -1 to indicate memory-adoption mode
    • [x] switch asserts to throws
    • [x] expose the small buffer size as a template parameter (defaulted to sizeof(T)*8 > 1024 ? 0 : 8 or similar, see also https://lists.llvm.org/pipermail/llvm-dev/2020-November/146613.html and the way they currently do it in LLVM: https://llvm.org/doxygen/SmallVector_8h_source.html#l01101)
    • [x] re-check before/after benchmark runtimes (first measurements at https://eguiraud.web.cern.ch/eguiraud/decks/20201112_rvec_redesign_ppp )
    • [x] unit test for exceptions thrown during construction or resizing (and add note about lack of exception safety in docs)
    • [x] confirm that crediting of LLVM is ok (currently only in math/vecops/ARCHITECTURE.md)
    opened by eguiraud 200
  • [CMake] Add automatic FAILREGEX for gtests


    gtests can print errors using ROOT's message system, but these get ignored completely. Several problems could have been caught automatically, but they went undetected.

    This adds a default regex to all gtests that checks for "(Fatal|Error|Warning) in <", unless an explicit FAILREGEX is passed to ROOT_ADD_GTEST.

    How to fix the tests:

    • [Easy, but unsafe] Add FAILREGEX "" to ROOT_ADD_GTEST. In that case, we will not grep for anything.
    • [Safe] Use the macros from https://github.com/root-project/root/blob/master/test/unit_testing_support/ROOTUnitTestSupport.h and catch the diagnostics
    • Fix what triggers the warnings/errors
    opened by hageboeck 200
  • Add vectorized implementations of first batch of TMath functions


    This PR adds vectorized implementations of the following TMath functions using the VecCore backend:

    • Log2
    • Breit-Wigner
    • Gaus
    • LaplaceDist
    • LaplaceDistI
    • Freq
    • Bessel I0, I1, J0, J1

    The first batch includes functions for which a definite speedup is obtained. Left out are the ones with more conditional branches. Work is ongoing to implement them as well.

    Here is the PR for benchmarks.

    Benchmarks from a trial run :

    -----------------------------------------------------------------------
    Benchmark                                Time           CPU Iterations
    -----------------------------------------------------------------------
    BM_TMath_Log2                       340895 ns     340801 ns       2042
    BM_TMath_BreitWigner                 42236 ns      42227 ns      16562
    BM_TMath_Gaus                       280188 ns     280130 ns       2476
    BM_TMath_LaplaceDist                246254 ns     246176 ns       2834
    BM_TMath_LaplaceDistI               291277 ns     291221 ns       2405
    BM_TMath_Freq                       388384 ns     388278 ns       1816
    BM_TMath_BesselI0                   283500 ns     283445 ns       2466
    BM_TMath_BesselI1                   327932 ns     327847 ns       2134
    BM_TMath_BesselJ0                   744044 ns     743897 ns        938
    BM_TMath_BesselJ1                   735381 ns     735235 ns        937
    BM_VectorizedTMath_Log2              97462 ns      97433 ns       7079
    BM_VectorizedTMath_BreitWigner       20773 ns      20769 ns      33494
    BM_VectorizedTMath_Gaus             127413 ns     127385 ns       5519
    BM_VectorizedTMath_LaplaceDist      118903 ns     118870 ns       5845
    BM_VectorizedTMath_LaplaceDistI     130724 ns     130693 ns       5367
    BM_VectorizedTMath_Freq             267444 ns     267389 ns       2590
    BM_VectorizedTMath_BesselI0         177544 ns     177503 ns       3936
    BM_VectorizedTMath_BesselI1         206571 ns     206523 ns       3370
    BM_VectorizedTMath_BesselJ0         326378 ns     326312 ns       2144
    BM_VectorizedTMath_BesselJ1         343600 ns     343531 ns       2014
    
    new contributor 
    opened by ArifAhmed1995 164
  • [cxxmodules] Enable the semantic global module index to boost performance.


    The global module index (GMI) is an optimization which hides the overhead introduced by clang when pre-loading the C++ modules at startup.

    The GMI represents a mapping between an identifier and the set of modules which contain this identifier. This means that if TH1 is undeclared, the GMI will load all modules which contain this identifier, which is usually suboptimal, too.

    The semantic GMI maps identifiers only to modules which contain a definition of the entity behind the identifier. For cases such as typedefs, where the entity introduces a synonym (rather than a declaration), we map the first module we encounter. For namespaces we add all modules which have a namespace partition. The namespace case is still suboptimal and can be further improved by inspecting what exactly is being looked up in the namespace by the qualified lookup facilities.

    opened by vgvassilev 160
  • [tcling] Improve symbol resolution.


    This patch consolidates the symbol resolution facilities throughout TCling into a new singleton class Dyld, part of cling's DynamicLibraryManager.

    The new dyld is responsible for:

    • Symlink resolution -- it implements a memory-efficient representation of the full path to shared objects, allowing search in constant time O(1). This also fixes issues when resolving symbols on macOS, where the system libraries contain multiple levels of symlinks.
    • Bloom filter optimization -- it uses a probabilistic data structure which gives a definitive answer if a symbol is not in the set. The implementation checks the .gnu.hash section in ELF, which is the GNU implementation of a Bloom filter, and uses it. If the .gnu.hash section is not available, the implementation builds its own filter and uses it. The measured performance of the Bloom filter is a 30% speed-up for 2 MB more memory. The custom Bloom filter on top of the .gnu.hash filter gives 1-2% better performance. The advantage of the custom Bloom filter is that it works on all platforms which do not support .gnu.hash (Windows and macOS). It is also customizable if we want to further reduce the false positive rate (currently at p=2%).
    • Hash table optimization -- we build a hash table which contains all symbols for a given library. This allows us to avoid the fallback symbol iteration if multiple symbols from the same library are requested. The hash table optimization targets the case where the Bloom filter tells us the symbol may be in the library.

    Patch by Alexander Penev (@alexander-penev) and me!

    Performance Report

    |platform|test|PCH-time|Module-time|Module-PR-time|
    |:--------|:---|:---------:|:-----------:|:---------------|
    |osx 10.14|roottest-python-pythonizations|22,82|26,89|20,08|
    |osx 10.14|roottest-cling|589,67|452,97|307,34|
    |osx 10.14|roottest-python|377,69|475,78|311,5|
    |osx 10.14|roottest-root-hist|60,59|90,98|49,65|
    |osx 10.14|roottest-root-math|106,18|140,41|73,96|
    |osx 10.14|roottest-root-tree|1287,53|1861|1149,35|
    |osx 10.14|roottest-root-treeformula|568,43|907,46|531|
    |osx 10.15|root-io-stdarray|-|126.02|31.42|
    |osx 10.15|roottest-root-treeformula|-|327.08|231.14|

    The effect of running ctest -j8:

    |platform|test|PCH-time|Module-time|Module-PR-time|
    |:--------|:---|:---------:|:-----------:|:---------------|
    |osx 10.14|roottest-python-pythonizations|14,45|18,89|13,03|
    |osx 10.14|roottest-cling|88,96|118,94|100,1|
    |osx 10.14|roottest-python|107,57|60,93|100,88|
    |osx 10.14|roottest-root-hist|10,25|23,25|11,77|
    |osx 10.14|roottest-root-math|8,33|21,23|9,27|
    |osx 10.14|roottest-root-tree|555|840,89|510,97|
    |osx 10.14|roottest-root-treeformula|235,44|402,82|228,91|

    We think that with -j8 we lose the advantage of the new PR because the PCH had the rootmaps read in memory, and restarting the processes allowed the kernel to efficiently reuse that memory. The modules and this PR, on the other hand, scan the libraries from disk and build in-memory optimization data structures. Reading from disk seems to be the bottleneck (not verified), but if that becomes an issue in the future we can write out the index, making subsequent runs almost zero cost.

    opened by vgvassilev 159
  • Make ROOT terminology and workings easier to decipher


    Explain what you would like to see improved

    Documentation.

    I am having to spend way too much time trying to figure out what basic stuff is.

    E.g. a TTree is apparently a list of "independent columns", but so far it looks to me very much like the equivalent of a table used to back a higher-level representation of a table, in which case the columns are not independent - they would be related (unless "independent columns" is being used to mean statistically independent variables).

    And I came across "event" in code comments which sounds very like "event" is a "row" of data which'd make sense from a Cern perspective but is ambiguous/meaningless/confusing to a newbie.

    share how it could be improved

    A Glossary with ROOT term equivalents in other frames of reference

    Eg.

    Event ~ row ~ tuple ~ observation (assuming I guessed correctly)

    TTree ~ RDataFrame/TDataFrame ~ dataset ~ table ~ 1- or 2-dimensional array or tensor ~ a grid of data with one row per event/observation/record.
    TBranch ~ column of data in a grid or table of data.
    TLeaf ~ element ~ cell ~ a single observation of a single variable.

    And where these are not correct list the differences between them to clarify what they actually are.

    Without a clear and precise understanding of what the terms mean you are never sure about what you are doing.


    Some (more) high level notes on how the framework works would be very useful at the start of the primer or comments in the code to explain "magic" when it happens - I was scratching my head as to how one particular object knew to use another when no relationship appeared in the code anywhere;

    
        // The canvas on which we'll draw the graph
        auto mycanvas = new TCanvas();

        // lots of code like...

        // Draw the graph!
        graph.DrawClone("APE");

        // but no mention of mycanvas again until...

        mycanvas->Print("graph_with_law.pdf");
    

    which raises all sorts of questions ( as it is not obvious what is going on ).


    Basic stuff first:

    Most people will want to read in a multi-column file and get stats/analysis on those columns - fromCSV is buried pretty deep considering - why why why am I reading about "TTree"s when I can get going without it?

    improvement 
    opened by bobOnGitHub 0
  • [RF] Fix and improvements in `testSumW2Error`


    In testSumW2Error, a weighted clone of an unweighted dataset is created, where each event gets the weight 0.5.

    However, in the loop over the original dataset used to fill the new dataset, get(i) is never called, meaning the new weighted dataset is filled with the same values over and over. This resulted in an unstable fit, necessitating careful tweaking of the initial parameters to even get convergence. That's why it's better to copy the dataset correctly, even if this is just a test case. I noticed this problem when I was copy-pasting code to create another new unit test.

    Also, the binned dataset is now a binned clone of the unbinned dataset in the test, reducing the degree of randomness.

    Furthermore, some general code improvements are applied to the source file.

    in:RooFit/RooStats 
    opened by guitargeek 5
  • [RF] Completely implement `Offset("bin")` feature

    Fully implement and test the new Offset("bin") feature over the test matrix that is the tensor product of BatchMode(), doing an extended fit or not, RooDataSet vs. RooDataHist, and the SumW2 correction. The test should compute the likelihood for a template PDF created from the data, and the result should be numerically compatible with zero.

    void testOffsetBin()
    {
       using namespace RooFit;
       using RealPtr = std::unique_ptr<RooAbsReal>;
    
       // Create extended PDF model
       RooRealVar x("x", "x", -10, 10);
       RooRealVar mean("mean", "mean", 0, -10, 10);
       RooRealVar sigma("sigma", "width", 4, 0.1, 10);
       RooRealVar nEvents{"n_events", "n_events", 10000, 100, 100000};
    
       RooGaussian gauss("gauss", "gauss", x, mean, sigma);
       RooAddPdf extGauss("extGauss", "extGauss", gauss, nEvents);
    
       std::unique_ptr<RooDataSet> data{extGauss.generate(x)};
    
       {
          // Create weighted dataset and hist to test SumW2 feature
          RooRealVar weight("weight", "weight", 0.5, 0.0, 1.0);
          auto dataW = std::make_unique<RooDataSet>("dataW", "dataW", RooArgSet{x, weight}, "weight");
       for (int i = 0; i < data->numEntries(); ++i) {
             dataW->add(*data->get(i), 0.5); // try weights that are different from unity
          }
          std::swap(dataW, data); // try to replace the original dataset with weighted dataset
       }
    
       std::unique_ptr<RooDataHist> hist{data->binnedClone()};
    
       data->Print();
       hist->Print();
    
       // Create template PDF based on data
       RooHistPdf histPdf{"histPdf", "histPdf", x, *hist};
       RooAddPdf extHistPdf("extHistPdf", "extHistPdf", histPdf, nEvents);
    
       auto& pdf = extHistPdf;
    
       auto const bm = "off"; // it should also work with BatchMode("cpu")
    
       double nllVal01 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(false))}->getVal();
       double nllVal02 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(true)) }->getVal();
       double nllVal03 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(false))}->getVal();
       double nllVal04 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(true)) }->getVal();
    
       double nllVal1  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal2  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
       double nllVal3  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal4  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
    
       // The final unit test should also include the SumW2 option in the test matrix
    
       // For all configurations, the bin offset should have the effect of bringing
       // the NLL close to zero:
       std::cout << "Unbinned fit      :  " << nllVal01 << "    " << nllVal1 << std::endl;
       std::cout << "Unbinned ext. fit : " << nllVal02 << "   " << nllVal2 << std::endl;
       std::cout << "Binned fit        :  " << nllVal03 << "   " << nllVal3 << std::endl;
       std::cout << "Binned ext. fit   : " << nllVal04 << "   " << nllVal4 << std::endl;
    }
    
    new feature in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Remove RooFormula code for gcc <= 4.8 when minimum standard is raised to C++17

    This issue serves as a reminder that the code behind #ifndef ROOFORMULA_HAVE_STD_REGEX in RooFormula.cxx can be removed once the minimum C++ standard for ROOT is raised to C++17, because gcc 4.8 will not be supported anymore at that point. By then, std::regex will probably also work with Visual Studio, so the #ifndef _MSC_VER check can probably be removed in the same go.

    See #8583 as a reference for what files to check to know what the minimum supported C++ standard of ROOT is.

    in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Exclude `RooGrid` class from IO

    RooGrid is a utility class for RooMCIntegrator, which doesn't support IO itself. Therefore, it doesn't make sense for RooGrid to have a ClassDef(1) macro; it only puts the unnecessary burden of keeping backwards compatibility on the developers.

    Therefore, this commit leaves the ClassDef macro out of RooGrid and also removes the unnecessary base classes TObject and RooPrintable. Only one printing function makes sense anyway; it is kept without implementing the full RooPrintable interface.

    in:RooFit/RooStats 
    opened by guitargeek 6
  • [RF] Avoid code duplication with new private `Algorithms.h` file

    The RooMomentMorphND and RooMomentMorphFuncND classes duplicated some copy-pasted code from Stack Overflow. This is now factored out into a new private header file to avoid code duplication.

    Also, a semicolon is added after TRACE_CREATE and TRACE_DESTROY in order to not confuse clang-format.

    in:RooFit/RooStats 
    opened by guitargeek 6
Releases: v6-26-10
Owner: ROOT, a modular scientific software framework