The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Overview

About

The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficient way. Having the data defined as a set of objects, specialized storage methods are used to get direct access to the separate attributes of the selected objects, without having to touch the bulk of the data. Included are histogramming methods in an arbitrary number of dimensions, curve fitting, function evaluation, minimization, graphics and visualization classes to allow the easy setup of an analysis system that can query and process the data interactively or in batch mode, as well as a general parallel processing framework, PROOF, that can considerably speed up an analysis.

Thanks to the built-in C++ interpreter cling, the command, scripting and programming language are all C++. The interpreter allows for fast prototyping of macros, since it removes the time-consuming compile/link cycle. It also provides a good environment to learn C++. If more performance is needed, the interactively developed macros can be compiled using a C++ compiler via a machine-independent, transparent compiler interface called ACLiC.

The system has been designed in such a way that it can query its databases in parallel on clusters of workstations or many-core machines. ROOT is an open system that can be dynamically extended by linking external libraries. This makes ROOT a premier platform on which to build data acquisition, simulation and data analysis systems.

License: LGPL v2.1+ CII Best Practices

Cite

When citing ROOT, please use both the reference reported below and the DOI specific to your ROOT version, available on Zenodo. For example, you can copy-paste and fill in the following citation:

Rene Brun and Fons Rademakers, ROOT - An Object Oriented Data Analysis Framework,
Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996,
Nucl. Inst. & Meth. in Phys. Res. A 389 (1997) 81-86.
See also "ROOT" [software], Release vX.YY/ZZ, dd/mm/yyyy,
(Select the right link for your release here: https://zenodo.org/search?page=1&size=20&q=conceptrecid:848818&all_versions&sort=-version).

Live Demo for CERN Users

Screenshots

These screenshots show some of the plots (produced using ROOT) presented when the Higgs boson discovery was announced at CERN:

CMS Data MC Ratio Plot

Atlas P0 Trends

See more screenshots on our gallery.

Installation and Getting Started

See https://root.cern/install for installation instructions. For instructions on how to build ROOT from these source files, see https://root.cern/install/build_from_source.

Our "Getting started with ROOT" page is then the perfect place to get familiar with ROOT.

Help and Support

Contribution Guidelines

Comments
  • Update ROOT's llvm to llvm13.


The things we need to do before merging this PR, which can probably be done by various people in parallel:

    Cling standalone:

    • [x] Fix cling CUDA tests
    • [ ] Fix the remaining test failures (6, see below)
    • [x] Revert the commit 'FIXME: Undo this change and debug why we have PendingInstances.'
    Cling test failures

    Failures in master on my system:

        Cling :: CodeUnloading/PCH/VTables.C
        Cling :: DynamicLibraryManager/callable_lib_L_AB_order1.C
    

    Remaining failures (excluding the ones above):

      Cling :: CodeGeneration/Symbols.C
      Cling :: CodeUnloading/AtExit.C
      Cling :: CodeUnloading/PCH/VTablesClingPCH.C
      Cling :: CodeUnloading/RereadFile.C
      Cling :: ErrorRecovery/StoredState.C
      Cling :: MultipleInterpreters/MultipleInterpreters.C
    

    ROOT:

    • [x] Compare the build size against master
    • [x] Compare the .pcm file size against master
    • [ ] Add flags to ignore compilation warnings coming from llvm
    • [x] Remove the FIXME from commit 'Add another symbol generator to resolve the generated lazy symbol' - the explanation is in the commit
    • [x] Fix the llvm::StringRef conversion failures on OSX
Binary size: this PR needs 13% more space (2.3 vs 2.0 GB)
    du -hs root-release-llvm13
    2.3G	.
    (base) [email protected] /build/vvassilev/root-release-llvm13 $ du -hs ../root-release-master/
    2.0G	../root-release-master/
    
    Module files need ~5% more space on disk (215 vs 206 MB)
    diff -y llvm13 master 
    424K	lib/ASImageGui.pcm				      |	444K	lib/ASImageGui.pcm
    468K	lib/ASImage.pcm					      |	484K	lib/ASImage.pcm
    4.2M	lib/_Builtin_intrinsics.pcm			      |	4.0M	lib/_Builtin_intrinsics.pcm
    48K	lib/_Builtin_stddef_max_align_t.pcm		      |	44K	lib/_Builtin_stddef_max_align_t.pcm
    200K	lib/Cling_Runtime_Extra.pcm			      |	132K	lib/Cling_Runtime_Extra.pcm
    100K	lib/Cling_Runtime.pcm					100K	lib/Cling_Runtime.pcm
    11M	lib/Core.pcm					      |	9.6M	lib/Core.pcm
    564K	lib/EG.pcm					      |	584K	lib/EG.pcm
    5.7M	lib/Eve.pcm					      |	5.4M	lib/Eve.pcm
    652K	lib/FitPanel.pcm				      |	656K	lib/FitPanel.pcm
    504K	lib/Foam.pcm					      |	520K	lib/Foam.pcm
    440K	lib/Fumili.pcm					      |	460K	lib/Fumili.pcm
    1.2M	lib/Gdml.pcm						1.2M	lib/Gdml.pcm
    960K	lib/Ged.pcm					      |	968K	lib/Ged.pcm
    432K	lib/Genetic.pcm					      |	456K	lib/Genetic.pcm
    2.9M	lib/GenVector.pcm				      |	2.8M	lib/GenVector.pcm
    868K	lib/GeomBuilder.pcm				      |	876K	lib/GeomBuilder.pcm
    500K	lib/GeomPainter.pcm				      |	520K	lib/GeomPainter.pcm
    3.4M	lib/Geom.pcm					      |	3.3M	lib/Geom.pcm
    860K	lib/Gpad.pcm						860K	lib/Gpad.pcm
    836K	lib/Graf3d.pcm					      |	844K	lib/Graf3d.pcm
    1.0M	lib/Graf.pcm						1.0M	lib/Graf.pcm
    540K	lib/GuiBld.pcm					      |	556K	lib/GuiBld.pcm
    588K	lib/GuiHtml.pcm					      |	604K	lib/GuiHtml.pcm
    3.5M	lib/Gui.pcm					      |	3.4M	lib/Gui.pcm
    496K	lib/Gviz3d.pcm					      |	516K	lib/Gviz3d.pcm
    468K	lib/GX11.pcm					      |	484K	lib/GX11.pcm
    412K	lib/GX11TTF.pcm					      |	432K	lib/GX11TTF.pcm
    3.6M	lib/HistFactory.pcm				      |	3.4M	lib/HistFactory.pcm
    484K	lib/HistPainter.pcm				      |	500K	lib/HistPainter.pcm
    5.9M	lib/Hist.pcm					      |	5.7M	lib/Hist.pcm
    1.5M	lib/Html.pcm						1.5M	lib/Html.pcm
    1.8M	lib/Imt.pcm					      |	1.7M	lib/Imt.pcm
    1.9M	lib/libc.pcm						1.9M	lib/libc.pcm
    12M	lib/MathCore.pcm				      |	11M	lib/MathCore.pcm
    1.6M	lib/Matrix.pcm						1.6M	lib/Matrix.pcm
    3.1M	lib/Minuit2.pcm					      |	3.0M	lib/Minuit2.pcm
    544K	lib/Minuit.pcm					      |	560K	lib/Minuit.pcm
    476K	lib/MLP.pcm					      |	496K	lib/MLP.pcm
    1.2M	lib/MultiProc.pcm					1.2M	lib/MultiProc.pcm
    1.1M	lib/Net.pcm						1.1M	lib/Net.pcm
    712K	lib/NetxNG.pcm						712K	lib/NetxNG.pcm
    728K	lib/Physics.pcm					      |	736K	lib/Physics.pcm
    492K	lib/Postscript.pcm				      |	508K	lib/Postscript.pcm
    564K	lib/ProofBench.pcm				      |	584K	lib/ProofBench.pcm
    948K	lib/ProofDraw.pcm				      |	940K	lib/ProofDraw.pcm
    1.6M	lib/Proof.pcm						1.6M	lib/Proof.pcm
    732K	lib/ProofPlayer.pcm				      |	744K	lib/ProofPlayer.pcm
    596K	lib/Quadp.pcm					      |	608K	lib/Quadp.pcm
    392K	lib/RCsg.pcm					      |	412K	lib/RCsg.pcm
    536K	lib/Recorder.pcm				      |	556K	lib/Recorder.pcm
    5.4M	lib/RGL.pcm					      |	5.1M	lib/RGL.pcm
    1.6M	lib/RHTTP.pcm					      |	1.5M	lib/RHTTP.pcm
    412K	lib/RHTTPSniff.pcm				      |	436K	lib/RHTTPSniff.pcm
    400K	lib/Rint.pcm					      |	420K	lib/Rint.pcm
    2.6M	lib/RIO.pcm					      |	2.5M	lib/RIO.pcm
    23M	lib/RooFitCore.pcm				      |	22M	lib/RooFitCore.pcm
    1.1M	lib/RooFitHS3.pcm				      |	1008K	lib/RooFitHS3.pcm
    16M	lib/RooFit.pcm					      |	15M	lib/RooFit.pcm
    424K	lib/RooFitRDataFrameHelpers.pcm			      |	448K	lib/RooFitRDataFrameHelpers.pcm
    4.3M	lib/RooStats.pcm				      |	4.1M	lib/RooStats.pcm
    468K	lib/RootAuth.pcm				      |	484K	lib/RootAuth.pcm
    120K	lib/ROOT_Config.pcm					120K	lib/ROOT_Config.pcm
    15M	lib/ROOTDataFrame.pcm				      |	14M	lib/ROOTDataFrame.pcm
    332K	lib/ROOT_Foundation_C.pcm				332K	lib/ROOT_Foundation_C.pcm
    620K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm		      |	600K	lib/ROOT_Foundation_Stage1_NoRTTI.pcm
    140K	lib/ROOT_Rtypes.pcm					140K	lib/ROOT_Rtypes.pcm
    4.1M	lib/ROOTTMVASofie.pcm					4.1M	lib/ROOTTMVASofie.pcm
    412K	lib/ROOTTPython.pcm				      |	432K	lib/ROOTTPython.pcm
    2.6M	lib/ROOTVecOps.pcm				      |	2.5M	lib/ROOTVecOps.pcm
    652K	lib/SessionViewer.pcm				      |	668K	lib/SessionViewer.pcm
    3.0M	lib/Smatrix.pcm					      |	2.9M	lib/Smatrix.pcm
    436K	lib/SpectrumPainter.pcm				      |	456K	lib/SpectrumPainter.pcm
    572K	lib/Spectrum.pcm				      |	584K	lib/Spectrum.pcm
    424K	lib/SPlot.pcm					      |	440K	lib/SPlot.pcm
    624K	lib/SQLIO.pcm					      |	640K	lib/SQLIO.pcm
    18M	lib/std.pcm					      |	17M	lib/std.pcm
    1.6M	lib/Thread.pcm					      |	1.5M	lib/Thread.pcm
    568K	lib/TMVAGui.pcm					      |	588K	lib/TMVAGui.pcm
    18M	lib/TMVA.pcm					      |	17M	lib/TMVA.pcm
    2.6M	lib/Tree.pcm					      |	2.5M	lib/Tree.pcm
    4.5M	lib/TreePlayer.pcm				      |	4.3M	lib/TreePlayer.pcm
    668K	lib/TreeViewer.pcm				      |	684K	lib/TreeViewer.pcm
    536K	lib/Unfold.pcm					      |	552K	lib/Unfold.pcm
    424K	lib/X3d.pcm					      |	448K	lib/X3d.pcm
    1.1M	lib/XMLIO.pcm					      |	1.0M	lib/XMLIO.pcm
    444K	lib/XMLParser.pcm				      |	464K	lib/XMLParser.pcm
    

    cc: @hahnjo, @Axel-Naumann

    opened by vgvassilev 750
  • [Exp PyROOT] Build PyROOT with multiple Python versions


    The commits in this PR contain the steps necessary to allow the user to build PyROOT with more than one version of Python. The version in use can be changed with the usual `source thisroot.sh`, preceded by the specific Python version, e.g. `ROOT_PYTHON_VERSION=3.6 source bin/thisroot.sh` run inside the build directory. Quick summary of the commits: (1) set the necessary CMake variables to build the PyROOT libraries in lib/pythonX.Y, (2) modify thisroot.sh to allow the user to select the Python version, (3) make the necessary changes to the pyunittests and tutorials CMake variables, (4) installation.

    new feature 
    opened by maxgalli 346
  • RooFit::MultiProcess & TestStatistics part 2 redo: RooFitZMQ & MultiProcess


    This PR is a do-over of #8385 and #8412 and, as such, again the second part of a split and clean-up of #8294. The most important blocker in those PRs was the inclusion of a patched libzmq in RooFitZMQ itself. This patch has now been included in libzmq proper. Another blocking review comment was that libzmq symbols must not be allowed to be exported through our libraries. This has been solved in theory, and in practice is pending another PR to libzmq. Having fixed these two blockers, we should now be able to continue.

    To recap:

    In this PR, we introduce two packages: RooFitZMQ and RooFit::MultiProcess. It also adds two builtins for ZeroMQ to ease dependency management: libzmq and cppzmq. The builtin for libzmq is especially necessary at this point because libzmq has recently gained a necessary feature that has not been released yet.

    RooFit::MultiProcess is a task-based parallelization framework.

    It uses forked processes for parallelization, as opposed to threads. We chose this approach because (a) the existing RooRealMPFE parallelization framework already made use of forks, so we had something to build on, and (b) it was at the time deemed infeasible to check the entire RooFit codebase for thread-safety. Moreover, we use MultiProcess to parallelize gradients -- i.e. the tasks to be executed in parallel are partial derivatives -- and these tasks are large enough that communication between tasks is not a big concern in the big fits we aimed to parallelize.

    The communication between processes is done using ZeroMQ. The ZeroMQ dependency is wrapped in convenience classes contributed by @roelaaij which here are packaged as RooFitZMQ.

    Will un-draft the PR once the following is done (based on previous review comments by @guitargeek @hageboeck @amadio @lmoneta and also some other things from myself):

    • [x] includes: correct order (matching header, RooFit, ROOT, std) and ROOT includes in quotation marks
    • [x] fix ZMQ deprecation warnings
    • [x] refactor member names: underscore suffix
    • [x] document important things with doxygen
    • [x] remove commented out code and TODOs and other junk
    • [x] fix copyright headers + author lists (RooFitZMQ: me, Roel; MP: me, Inti, Vince)
    • [ ] rebase in 2-3 neat commits that all compile and pass tests
    • [x] clang-tidy up
    • [x] change libzmq builtin back to master after PR is merged: https://github.com/zeromq/libzmq/pull/4266
    • [ ] ~use enum class instead of template parameters for minimizer function implementation choice~ -> next PR

    Edit 18 Nov 2021: the following list is to keep track of unaddressed (at time of writing) comments made in this thread (because the thread is so long that it is very inconvenient to navigate on GitHub which doesn't load it all at once):

    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-773656413: only need to rebase, but that is already listed above.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-790026907: we have to double check whether the build issues still exist. They should be gone, because we don't build dictionaries anymore.
    • [x] https://github.com/root-project/root/pull/9078#discussion_r736998615: Related to the issue above, iiuc, because the include was missing from the dictionary, so this can probably also be marked resolved now.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791797535: change inc to res in RooFitZMQ and MultiProcess and only include these zmq header exposing include directories to specific targets that need them using target_include_directories. This way, we don't transitively expose zmq includes to ROOT users.
      • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791786326: The above solution also circumvents this issue with ZMQ_ENABLE_DRAFT preprocessor defines.
    • [x] https://github.com/root-project/root/pull/9078#pullrequestreview-791883192: change copyright/license headers.

    Let me know if you find additional items for the todo list.

    in:RooFit/RooStats 
    opened by egpbos 312
  • [cxxmodules] Fix failing runtime_cxxmodules tests by preloading modules


    Currently, 36 tests are failing for runtime modules: https://epsft-jenkins.cern.ch/view/ROOT/job/root-nightly-runtime-cxxmodules/ We want to make these tests pass so that we can say that runtime modules are finally working.

    This patch enables ROOT to preload all modules at startup time. In my environment, this patch fixes 14 tests for runtime cxxmodules.

    Preloading all the modules has several advantages:

    1. We do not have to rely on rootmap files, which don't support some features (namespaces and templates).
    2. Lookup is faster because we don't have to go through a trampoline via rootmap files.

    The only disadvantage of preloading all the modules is the startup time performance, measured with root.exe -q -l memory.C. This is a release build without modules:

     cpu  time = 0.091694 seconds
     sys  time = 0.026187 seconds
     res  memory = 133.008 Mbytes
     vir  memory = 217.742 Mbytes
    

    This is a release build with modules, with this patch:

     cpu  time = 0.234134 seconds
     sys  time = 0.066774 seconds
     res  memory = 275.301 Mbytes
     vir  memory = 491.832 Mbytes
    

    As you can see, preloading all the modules makes both time and memory 2 to 3 times worse at startup.

    Edit: with hsimple.C (root.exe -l -b tutorials/hsimple.C -q ~/CERN/ROOT/memory.C), release build without modules:

    Processing tutorials/hsimple.C...                                                                        
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.05 seconds                        
    (TFile *) 0x555ae2a9d560                                                                  
    Processing /home/yuka/CERN/ROOT/memory.C...                                                              
     cpu  time = 0.173591 seconds                                   
     sys  time = 0.011835 seconds                       
     res  memory = 135.32 Mbytes                                    
     vir  memory = 209.664 Mbytes 
    

    Release build with modules, with this patch:

    Processing tutorials/hsimple.C...
    hsimple   : Real Time =   0.04 seconds Cpu Time =   0.04 seconds
    (TFile *) 0x55d1b036d230
    Processing /home/yuka/CERN/ROOT/memory.C...
     cpu  time = 0.290742 seconds
     sys  time = 0.043851 seconds
     res  memory = 256.844 Mbytes
     vir  memory = 438.484 Mbytes
    

    However, a slower startup is expected if we load all the modules at startup rather than on demand. I haven't had a good benchmark for this yet but, in theory, it reduces execution time instead, as we would be loading the modules after startup anyway.

    opened by yamaguchi1024 282
  • [cmake] use only source dirs as include paths when building ROOT


    Fully exclude `${CMAKE_BINARY_DIR}/include` from the include paths when building ROOT libraries.

    Several generated files are placed first in `${CMAKE_BINARY_DIR}/ginclude` and then copied to `include`.

    Dictionary generation still uses only `${CMAKE_BINARY_DIR}/include`, otherwise cling complains about similar includes in different places. Once the problem with cling is fixed, the source dirs can be used there as well.
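    A hypothetical CMake fragment of the direction described (the target and directory names are invented for illustration; this is not the PR's actual code):

```cmake
# Sketch: a library takes its headers from the source tree (plus ginclude for
# generated headers) instead of the copied ${CMAKE_BINARY_DIR}/include.
target_include_directories(Foo PRIVATE
  ${CMAKE_CURRENT_SOURCE_DIR}/inc   # headers where they live in the sources
  ${CMAKE_BINARY_DIR}/ginclude)     # generated headers, before the copy step
```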

    new feature 
    opened by linev 252
  • [cxxmodules] Implement global module indexing to improve performance.


    The global module index represents an efficient on-disk hash table which stores the identifier->module mapping. Every time clang finds an unknown identifier we are informed and we can load the corresponding module on demand.

    This way we can provide a minimal set of loaded modules. Currently, we see that for hsimple.C only half of the modules are loaded. This can be further improved because we currently load all modules which mention an identifier; that is, when looking up TPad, for example, we will load all modules which contain the identifier TPad, including modules which contain only a forward declaration of it.

    Kudos Arpitha Raghunandan (@arpi-r)!

    We still need some performance measurements but the preliminary results are promising.

    Performance

    Methodology

    We have a forwarding root.exe which essentially calls /usr/bin/time -v root.exe $@. We have processed and stored this information in csv files. We have run in three modes:

    1. root master without modules (modulesoff)
    2. root master with modules (moduleson)
    3. root master with this PR with modules (gmi)

    Run on Ubuntu 18.10 on Intel® Core™ i5-8250U CPU @ 1.60GHz × 8

    Results Interpretation

    A general comparison between 2) and 3) shows that this PR makes ROOT about 3% faster and 25% more memory efficient.

    A general comparison between 1) and 3) shows that modules are still less efficient in a few cases, which is expected because the PR loads more modules than it should. This will be addressed in subsequent PRs. A good trend is that some tests already show that 3) is better than 1).

    The raw data can be found here. [work was done by Arpitha Raghunandan (@arpi-r)]

    Depends on #4005.

    opened by vgvassilev 219
  • [VecOps] RVec 2.0: small buffer optimization based on LLVM SmallVector


    • [x] add ARCHITECTURE.md
    • [x] use fCapacity == -1 to indicate memory-adoption mode
    • [x] switch asserts to throws
    • [x] expose the small buffer size as a template parameter (defaulted to sizeof(T)*8 > 1024 ? 0 : 8 or similar, see also https://lists.llvm.org/pipermail/llvm-dev/2020-November/146613.html and the way they currently do it in LLVM: https://llvm.org/doxygen/SmallVector_8h_source.html#l01101)
    • [x] re-check before/after benchmark runtimes (first measurements at https://eguiraud.web.cern.ch/eguiraud/decks/20201112_rvec_redesign_ppp )
    • [x] unit test for exceptions thrown during construction or resizing (and add note about lack of exception safety in docs)
    • [x] confirm that crediting of LLVM is ok (currently only in math/vecops/ARCHITECTURE.md)
    opened by eguiraud 200
  • [CMake] Add automatic FAILREGEX for gtests


    gtests can print errors using ROOT's message system, but these get ignored completely. Several problems could have been caught automatically, but they went undetected.

    This adds a default regex to all gtests that checks for "(Fatal|Error|Warning) in <", unless an explicit FAILREGEX is passed to ROOT_ADD_GTEST.

    How to fix the tests:

    • [Easy, but unsafe] Add FAILREGEX "" to ROOT_ADD_GTEST. In that case, we will not grep for anything.
    • [Safe] Use the macros from https://github.com/root-project/root/blob/master/test/unit_testing_support/ROOTUnitTestSupport.h and catch the diagnostics
    • Fix what triggers the warnings/errors
    opened by hageboeck 200
  • Add vectorized implementations of first batch of TMath functions


    This PR adds vectorized implementations of the following TMath functions using the VecCore backend:

    • Log2
    • Breit-Wigner
    • Gaus
    • LaplaceDist
    • LaplaceDistI
    • Freq
    • Bessel I0, I1, J0, J1

    The first batch includes functions for which a definite speedup is obtained. Left out are the ones with more conditional branches. Work is ongoing to implement them as well.

    Here is the PR for benchmarks.

    Benchmarks from a trial run :

    -----------------------------------------------------------------------
    Benchmark                                Time           CPU Iterations
    -----------------------------------------------------------------------
    BM_TMath_Log2                       340895 ns     340801 ns       2042
    BM_TMath_BreitWigner                 42236 ns      42227 ns      16562
    BM_TMath_Gaus                       280188 ns     280130 ns       2476
    BM_TMath_LaplaceDist                246254 ns     246176 ns       2834
    BM_TMath_LaplaceDistI               291277 ns     291221 ns       2405
    BM_TMath_Freq                       388384 ns     388278 ns       1816
    BM_TMath_BesselI0                   283500 ns     283445 ns       2466
    BM_TMath_BesselI1                   327932 ns     327847 ns       2134
    BM_TMath_BesselJ0                   744044 ns     743897 ns        938
    BM_TMath_BesselJ1                   735381 ns     735235 ns        937
    BM_VectorizedTMath_Log2              97462 ns      97433 ns       7079
    BM_VectorizedTMath_BreitWigner       20773 ns      20769 ns      33494
    BM_VectorizedTMath_Gaus             127413 ns     127385 ns       5519
    BM_VectorizedTMath_LaplaceDist      118903 ns     118870 ns       5845
    BM_VectorizedTMath_LaplaceDistI     130724 ns     130693 ns       5367
    BM_VectorizedTMath_Freq             267444 ns     267389 ns       2590
    BM_VectorizedTMath_BesselI0         177544 ns     177503 ns       3936
    BM_VectorizedTMath_BesselI1         206571 ns     206523 ns       3370
    BM_VectorizedTMath_BesselJ0         326378 ns     326312 ns       2144
    BM_VectorizedTMath_BesselJ1         343600 ns     343531 ns       2014
    
    new contributor 
    opened by ArifAhmed1995 164
  • [cxxmodules] Enable the semantic global module index to boost performance.


    The global module index (GMI) is an optimization which hides the overhead introduced by clang when pre-loading the C++ modules at startup.

    The GMI represents a mapping between an identifier and the set of modules which contain this identifier. This means that if TH1 is undeclared, the GMI will load all modules which contain this identifier, which is usually suboptimal, too.

    The semantic GMI maps identifiers only to modules which contain a definition of the entity behind the identifier. For cases such as typedefs, where the entity introduces a synonym (rather than a declaration), we map the first module we encounter. For namespaces we add all modules which have a namespace partition. The namespace case is still suboptimal and can be further improved by inspecting what exactly is being looked up in the namespace by the qualified lookup facilities.

    opened by vgvassilev 160
  • [tcling] Improve symbol resolution.


    This patch consolidates the symbol resolution facilities throughout TCling into a new singleton class Dyld, part of cling's DynamicLibraryManager.

    The new dyld is responsible for:

    • Symlink resolution -- it implements a memory-efficient representation of the full path to shared objects, allowing search in constant time O(1). This also fixes issues when resolving symbols on macOS, where the system libraries contain multiple levels of symlinks.
    • Bloom filter optimization -- it uses a probabilistic data structure which gives a definitive answer if a symbol is not in the set. The implementation checks the .gnu.hash section in ELF, which is the GNU implementation of a Bloom filter, and uses it. If the .gnu.hash section is not available, the implementation builds its own filter and uses it. The measured performance of the Bloom filter is a 30% speed-up for 2 MB more memory. The custom Bloom filter on top of the .gnu.hash filter gives 1-2% better performance. The advantage of the custom Bloom filter is that it works on all platforms which do not support .gnu.hash (Windows and macOS). It is also customizable if we want to further reduce the false positive rate (currently at p=2%).
    • Hash table optimization -- we build a hash table which contains all symbols for a given library. This allows us to avoid the fallback symbol iteration if multiple symbols from the same library are requested. The hash table optimization targets the case where the Bloom filter tells us the symbol may be in the library.

    Patch by Alexander Penev (@alexander-penev) and me!

    Performance Report

    |platform|test|PCH-time|Module-time|Module-PR-time|
    |:--------|:---|:---------:|:-----------:|:---------------|
    |osx 10.14|roottest-python-pythonizations|22,82|26,89|20,08|
    |osx 10.14|roottest-cling|589,67|452,97|307,34|
    |osx 10.14|roottest-python|377,69|475,78|311,5|
    |osx 10.14|roottest-root-hist|60,59|90,98|49,65|
    |osx 10.14|roottest-root-math|106,18|140,41|73,96|
    |osx 10.14|roottest-root-tree|1287,53|1861|1149,35|
    |osx 10.14|roottest-root-treeformula|568,43|907,46|531|
    |osx 10.15|root-io-stdarray|-|126.02|31.42|
    |osx 10.15|roottest-root-treeformula|-|327.08|231.14|

    The effect of running ctest -j8:

    |platform|test|PCH-time|Module-time|Module-PR-time|
    |:--------|:---|:---------:|:-----------:|:---------------|
    |osx 10.14|roottest-python-pythonizations|14,45|18,89|13,03|
    |osx 10.14|roottest-cling|88,96|118,94|100,1|
    |osx 10.14|roottest-python|107,57|60,93|100,88|
    |osx 10.14|roottest-root-hist|10,25|23,25|11,77|
    |osx 10.14|roottest-root-math|8,33|21,23|9,27|
    |osx 10.14|roottest-root-tree|555|840,89|510,97|
    |osx 10.14|roottest-root-treeformula|235,44|402,82|228,91|

    We think that with -j8 we lose the advantage of the new PR because the PCH had the rootmaps read in memory, and restarting the processes allowed the kernel to efficiently reuse that memory. The modules and this PR, on the other hand, scan the libraries from disk and build in-memory optimization data structures. Reading from disk seems to be the bottleneck (not verified), but if that becomes an issue in the future we can write out the index, making subsequent runs almost zero cost.

    opened by vgvassilev 159
  • Make ROOT terminology and workings easier to decipher


    Explain what you would like to see improved

    Documentation.

    I am having to spend way too much time trying to figure out what basic stuff is.

    E.g. a TTree is apparently a list of "independent columns", but so far it looks to me very much like the equivalent of a table used to back a higher-level representation of a table, in which case the columns are not independent - they would be related (unless "independent columns" is being used to mean statistically independent variables).

    And I came across "event" in code comments which sounds very like "event" is a "row" of data which'd make sense from a Cern perspective but is ambiguous/meaningless/confusing to a newbie.

    share how it could be improved

    A Glossary with ROOT term equivalents in other frames of reference

    Eg.

    Event ~ row ~ tuple ~ observation (assuming I guessed correctly)

    TTree ~ RDataFrame/TDataFrame ~ dataset ~ table ~ 1- or 2-dimensional array or tensor ~ a grid of data with one row per event/observation/record.
    TBranch ~ column of data in a grid or table of data.
    TLeaf ~ element ~ cell ~ a single observation of a single variable.

    And where these are not correct list the differences between them to clarify what they actually are.

    Without a clear and precise understanding of what the terms mean you are never sure about what you are doing.


    Some (more) high level notes on how the framework works would be very useful at the start of the primer or comments in the code to explain "magic" when it happens - I was scratching my head as to how one particular object knew to use another when no relationship appeared in the code anywhere;

    
        // The canvas on which we'll draw the graph
        auto mycanvas = new TCanvas();

        // lots of code like...

        // Draw the graph!
        graph.DrawClone("APE");

        // but no mention of mycanvas again until...

        mycanvas->Print("graph_with_law.pdf");
    

    which raises all sorts of questions ( as it is not obvious what is going on ).


    Basic stuff first:

    Most people will want to read in a multi-column file and get stats/analysis on those columns - fromCSV is buried pretty deep considering - why why why am I reading about "TTree"s when I can get going without it?

    improvement 
    opened by bobOnGitHub 0
  • [RF] Fix and improvements in `testSumW2Error`


    In testSumW2Error, a weighted clone of an unweighted dataset is created, where each event gets the weight 0.5.

    However, in the loop over the original dataset used to fill the new dataset, get(i) is never called, meaning the new weighted dataset is filled with the same values over and over. This resulted in an unstable fit, necessitating careful tweaking of the initial parameters to even get convergence. That's why it's better to copy the dataset correctly, even if this is just a test case. I noticed this problem when I was copy-pasting code to create another new unit test.

    Also, the binned dataset is now a binned clone of the unbinned dataset in the test, reducing the degree of randomness.

    Furthermore, some general code improvements are applied to the source file.

    in:RooFit/RooStats 
    opened by guitargeek 5
  • [RF] Completely implement `Offset("bin")` feature

    Fully implement and test the new Offset("bin") feature over the test matrix that is the tensor product of BatchMode(), doing an extended fit or not, RooDataSet vs. RooDataHist, and the SumW2 correction. The test should compute the likelihood for a template PDF created from the data, and the result should be numerically compatible with zero.

    void testOffsetBin()
    {
       using namespace RooFit;
       using RealPtr = std::unique_ptr<RooAbsReal>;
    
       // Create extended PDF model
       RooRealVar x("x", "x", -10, 10);
       RooRealVar mean("mean", "mean", 0, -10, 10);
       RooRealVar sigma("sigma", "width", 4, 0.1, 10);
       RooRealVar nEvents{"n_events", "n_events", 10000, 100, 100000};
    
       RooGaussian gauss("gauss", "gauss", x, mean, sigma);
       RooAddPdf extGauss("extGauss", "extGauss", gauss, nEvents);
    
       std::unique_ptr<RooDataSet> data{extGauss.generate(x)};
    
       {
          // Create weighted dataset and hist to test SumW2 feature
          RooRealVar weight("weight", "weight", 0.5, 0.0, 1.0);
          auto dataW = std::make_unique<RooDataSet>("dataW", "dataW", RooArgSet{x, weight}, "weight");
       for (int i = 0; i < data->numEntries(); ++i) {
             dataW->add(*data->get(i), 0.5); // try weights that are different from unity
          }
          std::swap(dataW, data); // try to replace the original dataset with weighted dataset
       }
    
       std::unique_ptr<RooDataHist> hist{data->binnedClone()};
    
       data->Print();
       hist->Print();
    
       // Create template PDF based on data
       RooHistPdf histPdf{"histPdf", "histPdf", x, *hist};
       RooAddPdf extHistPdf("extHistPdf", "extHistPdf", histPdf, nEvents);
    
       auto& pdf = extHistPdf;
    
       auto const bm = "off"; // it should also work with BatchMode("cpu")
    
       double nllVal01 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(false))}->getVal();
       double nllVal02 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(true)) }->getVal();
       double nllVal03 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(false))}->getVal();
       double nllVal04 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(true)) }->getVal();
    
       double nllVal1  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal2  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
       double nllVal3  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
       double nllVal4  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
    
       // The final unit test should also include the SumW2 option in the test matrix
    
       // For all configurations, the bin offset should have the effect of bringing
       // the NLL close to zero:
       std::cout << "Unbinned fit      :  " << nllVal01 << "    " << nllVal1 << std::endl;
       std::cout << "Unbinned ext. fit : " << nllVal02 << "   " << nllVal2 << std::endl;
       std::cout << "Binned fit        :  " << nllVal03 << "   " << nllVal3 << std::endl;
       std::cout << "Binned ext. fit   : " << nllVal04 << "   " << nllVal4 << std::endl;
    }
    
    new feature in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Remove RooFormula code for gcc <= 4.8 when minimum standard is raised to C++17

    This issue serves as a reminder that the code behind #ifndef ROOFORMULA_HAVE_STD_REGEX in RooFormula.cxx can be removed once the minimum C++ standard for ROOT is raised to C++17, because gcc 4.8 will not be supported anymore at that point. By then, std::regex will probably also work with Visual Studio, so the #ifndef _MSC_VER check can probably be removed in the same go.

    See #8583 as a reference for what files to check to know what the minimum supported C++ standard of ROOT is.

    in:RooFit/RooStats 
    opened by guitargeek 0
  • [RF] Exclude `RooGrid` class from IO

    RooGrid is a utility class for RooMCIntegrator, which doesn't support IO itself. Therefore, it doesn't make sense for RooGrid to have a ClassDef(1) macro; it only puts the unnecessary burden of keeping backwards compatibility on the developers.

    Therefore, this commit leaves the ClassDef macro out of RooGrid and also removes the unnecessary base classes TObject and RooPrintable. Only one printing function makes sense anyway; it is kept without implementing the full RooPrintable interface.

    in:RooFit/RooStats 
    opened by guitargeek 6
  • [RF] Avoid code duplication with new private `Algorithms.h` file

    The RooMomentMorphND and RooMomentMorphFuncND classes duplicated some copy-pasted code from Stack Overflow. This is now factored out into a new private header file to avoid code duplication.

    Also, a semicolon is added after TRACE_CREATE and TRACE_DESTROY in order to not confuse clang-format.

    in:RooFit/RooStats 
    opened by guitargeek 6
Releases: v6-26-10
Owner: ROOT, a modular scientific software framework