monolish: MONOlithic Liner equation Solvers for Highly-parallel architecture

Overview

monolish: MONOlithic LIner equation Solvers for Highly-parallel architecture

monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.


monolish let developer forget about:

  • Performance tuning
  • Processor differences which execute library (Intel CPU, NVIDIA GPU, AMD CPU, ARM CPU, NEC SX-Aurora TSUBASA, etc.)
  • Vendor specific data transfer APIs (host RAM to Device RAM)
  • Finding bottlenecks and performance benchmarks
  • The argument data type of matrix/vector operations
  • Matrix structures and storage formats
  • Cumbersome package dependency

License

Copyright 2021 RICOS Co. Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    Running the following simple program

    #include <iostream>
    #include <monolish_blas.hpp>
    
    int main() {
      monolish::matrix::Dense<double>x(2, 3, 0.0, 10.0);
      x.print_all();
      return 0;
    }
    
    $ g++ -O3 main.cpp -o main.out -lmonolish_cpu
    

    will produce results like this.

    [email protected]:/$ ./main.out
    1 1 5.27196
    1 2 2.82358 <--
    1 3 2.13893 <--
    2 1 9.72054
    2 2 2.82358 <--
    2 3 2.13893 <--
    [email protected]:/$ ./main.out
    1 1 5.3061
    1 2 9.75236
    1 3 7.15652
    2 1 5.28961
    2 2 2.05967
    2 3 0.59838
    [email protected]:/$ ./main.out
    1 1 9.33149 <--
    1 2 4.75639 <--
    1 3 8.71093 <--
    2 1 9.33149 <--
    2 2 4.75639 <--
    2 3 8.71093 <--
    

    The arrows (<--) indicate that the number is repeating.

    This is probably due to that the pseudo-random number generator does not split well when it is parallelized by OpenMP.

    https://github.com/ricosjp/monolish/blob/1b89942e869b7d0acd2d82b4c47baeba2fbdf3e6/src/utils/dense_constructor.cpp#L120-L127

    This may happen not only with Dense, but also with random constructors of other data structures.

    I tested this on docker image ghcr.io/ricosjp/monolish/mkl:0.14.1.

    opened by lotz84 5
  • impl. transpose matvec, matmul

    impl. transpose matvec, matmul

    I want to give modern and intuitive transposition information. But I have no idea how to implement it easily.

    First, we create the following function as a prototype

    matmul(A,B,C) // C=AB
    matmul_TNN(A, B, C); // C=A^TB
    matvec(A,x,y); // y = Ax
    matvec_T(A, x, y); // y=A^Tx
    

    This interface is not beautiful. However, it has the following advantages

    • It does not affect other functions.
    • Easy to trace with logger
    • Simple to implement FFI in the future.
    • When beautiful ideas appear in the future, these functions can be implemented wrapping it.
    opened by t-hishinuma 2
  • try -fopenmp-cuda-mode flag

    try -fopenmp-cuda-mode flag

    memo:

    Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give an additional performance and can be activated using the -fopenmp-cuda-mode flag. In Generic mode all local variables that can be shared in the parallel regions are stored in the global memory. In Cuda mode local variables are not shared between the threads and it is user responsibility to share the required data between the threads in the parallel regions.

    https://clang.llvm.org/docs/OpenMPSupport.html#basic-support-for-cuda-devices

    opened by t-hishinuma 2
  • Reserch the effect of the level information of the performance of cusparse ILU precondition

    Reserch the effect of the level information of the performance of cusparse ILU precondition

    The level information may not improve the performance but spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do csrsv2_analysis() with CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do csrsv2_solve() with CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run and with CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, picking faster one to perform the remaining iterations.

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    opened by t-hishinuma 2
  •  ignoring return value in test

    ignoring return value in test

    matrix_transpose.cpp:60:3: warning: ignoring return value of 'monolish::matrix::COO<Float>& monolish::matrix::COO<Float>::transpose() [with Float = double]', declared with attribute nodiscard [-Wunused-result]
       60 |   A.transpose();
    
    opened by t-hishinuma 2
  • Automatic deploy at release

    Automatic deploy at release

    impl. in github actions

    • [x] generate Doxyben (need to chenge version name)
    • [x] generate deb file
    • [x] generate monolish docker

    need to get version number...

    opened by t-hishinuma 2
  • write how to install nvidia-docker

    write how to install nvidia-docker

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee > /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update -y
    sudo apt install -y nvidia-docker2
    sudo systemctl restart docker
    
    opened by t-hishinuma 1
  • Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    What conflicts?

    libomp.so is contained in both package

    How to resolve?

    • (a) Use libomp5-12 distributed by ubuntu
    • (b) Create another package of libomp in allgebra stage (libomp-allgebra)
    opened by termoshtt 1
  • resolve curse of type name in src/

    resolve curse of type name in src/

    In src/, int and size_t are written. When change the class or function declarations in include/, I don't want to rewrite src/. Use auto or decltype() to remove them.

    opened by t-hishinuma 1
  • LLVM OpenMP Offloading can be installed by apt?

    LLVM OpenMP Offloading can be installed by apt?

    docker run -it --gpus all -v $PWD:/work nvidia/cuda:11.7.0-devel-ubuntu22.04
    ==
    apt update -y
    apt install -y git intel-mkl cmake ninja-build ccache clang clang-tools libomp-14-dev gcc gfortran
    git config --global --add safe.directory /work
    cd /work; make gpu
    

    pass??

    opened by t-hishinuma 0
  • cusparse IC / ILU functions is deprecated

    cusparse IC / ILU functions is deprecated

    but, sample code of cusparse is not updated

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    I dont like trial and error, so wait for the sample code to be updated.

    opened by t-hishinuma 0
Releases(0.17.0)
Owner
RICOS Co. Ltd.
株式会社科学計算総合研究所 / Research Institute for Computational Science Co. Ltd.
RICOS Co. Ltd.
Lingtrain Alignment Studio is an ML based app for texts alignment on different languages.

Lingtrain Alignment Studio Intro Lingtrain Alignment Studio is the ML based app for accurate texts alignment on different languages. Extracts parallel

Sergei Averkiev 186 Jan 03, 2023
A simple guide to MLOps through ZenML and its various integrations.

ZenBytes Join our Slack Community and become part of the ZenML family Give the main ZenML repo a GitHub star to show your love ZenBytes is a series of

ZenML 127 Dec 27, 2022
MIT-Machine Learning with Python–From Linear Models to Deep Learning

MIT-Machine Learning with Python–From Linear Models to Deep Learning | One of the 5 courses in MIT MicroMasters in Statistics & Data Science Welcome t

2 Aug 23, 2022
MICOM is a Python package for metabolic modeling of microbial communities

Welcome MICOM is a Python package for metabolic modeling of microbial communities currently developed in the Gibbons Lab at the Institute for Systems

57 Dec 21, 2022
Python ML pipeline that showcases mltrace functionality.

mltrace tutorial Date: October 2021 This tutorial builds a training and testing pipeline for a toy ML prediction problem: to predict whether a passeng

Log Labs 28 Nov 09, 2022
Apple-voice-recognition - Machine Learning

Apple-voice-recognition Machine Learning How does Siri work? Siri is based on large-scale Machine Learning systems that employ many aspects of data sc

Harshith VH 1 Oct 22, 2021
LibTraffic is a unified, flexible and comprehensive traffic prediction library based on PyTorch

LibTraffic is a unified, flexible and comprehensive traffic prediction library, which provides researchers with a credibly experimental tool and a convenient development framework. Our library is imp

432 Jan 05, 2023
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. Solve a variety of tasks with pre-trained models or finetune them in

Backprop 227 Dec 10, 2022
Covid-polygraph - a set of Machine Learning-driven fact-checking tools

Covid-polygraph, a set of Machine Learning-driven fact-checking tools that aim to address the issue of misleading information related to COVID-19.

1 Apr 22, 2022
ETNA is an easy-to-use time series forecasting framework.

ETNA is an easy-to-use time series forecasting framework. It includes built in toolkits for time series preprocessing, feature generation, a variety of predictive models with unified interface - from

Tinkoff.AI 674 Jan 07, 2023
This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning

This is a Cricket Score Predictor that predicts the first innings score of a T20 Cricket match using Machine Learning. It is a Web Application.

Developer Junaid 3 Aug 04, 2022
ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions

A library for debugging/inspecting machine learning classifiers and explaining their predictions

154 Dec 17, 2022
Implementation of linesearch Optimization Algorithms in Python

Nonlinear Optimization Algorithms During my time as Scientific Assistant at the Karlsruhe Institute of Technology (Germany) I implemented various Opti

Paul 3 Dec 06, 2022
Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Neighbourhood Retrieval with Distance Correlation Assign Pseudo class labels to datapoints in the latent space. NNDC is a slim wrapper around FAISS. N

The Learning Machines 1 Jan 16, 2022
Python package for stacking (machine learning technique)

vecstack Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API Convenient wa

Igor Ivanov 671 Dec 25, 2022
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023
A simple machine learning python sign language detection project.

SST Coursework 2022 About the app A python application that utilises the tensorflow object detection algorithm to achieve automatic detection of ameri

Xavier Koh 2 Jun 30, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022
Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc)

Built various Machine Learning algorithms (Logistic Regression, Random Forest, KNN, Gradient Boosting and XGBoost. etc). Structured a custom ensemble model and a neural network. Found a outperformed

Chris Yuan 1 Feb 06, 2022
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstraction

ZenML 2.6k Jan 08, 2023