Python & Julia port of codes in excellent R books

Overview

X4DS

This repo is a collection of

Python & Julia port of codes in the following excellent R books:

Python Stack Julia Stack
Language
Version
v3.9 v1.7
Data
Processing
  • Pandas
  • DataFrames
  • Visualization
  • Matplotlib
  • Seaborn
  • MakiE
  • AlgebraOfGraphics
  • Machine
    Learning
  • Scikit-Learn
  • MLJ
  • Probablistic
    Programming
  • PyMC
  • Turing
  • Code Styles

    2.1. Basics

    • prefer enumerate() over range(len())
    xs = range(3)
    
    # good
    for ind, x in enumerate(xs):
      print(f'{ind}: {x}')
    
    # bad
    for i in range(len(xs)):
      print(f'{i}: {xs[i]}')

    2.2. Matplotlib

    including seaborn

    • prefer Axes object over Figure object
    • use constrained_layout=True when draw subplots
    # good
    _, axes = plt.subplots(1, 2, constrained_layout=True)
    axes[0].plot(x1, y1)
    axes[1].hist(x2, y2)
    
    # bad
    plt.subplot(121)
    plt.plot(x1, y1)
    plt.subplot(122)
    plt.hist(x2, y2)
    • prefer axes.flatten() over plt.subplot() in cases where subplots' data is iterable
    • prefer zip() or enumerate() over range() for iterable objects
    # good
    _, ax = plt.subplots(2, 2, figsize=[12,8],constrained_layout=True)
    
    for ax, x, y in zip(axes.flatten(), xs, ys):
      ax.plot(x, y)
    
    # bad
    for i in range(4):
      ax = plt.subplot(2, 2, i+1)
      ax.plot(x[i], y[i])
    • prefer set() method over set_*() method
    # good
    ax.set(xlabel='x', ylabel='y')
    
    # bad
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    • Prefer despine() over ax.spines[*].set_visible()
    # good
    sns.despine()
    
    # bad
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)

    2.3. Pandas

    • prefer df['col'] over df.col
    # good
    movies['duration']
    
    # bad
    movies.duration
    • prefer df.query over df[] or df.loc[] in simple-selection
    # good
    movies.query('duration >= 200')
    
    # bad
    movies[movies['duration'] >= 200]
    movies.loc[movies['duration'] >= 200, :]
    • prefer df.loc and df.iloc over df[] in multiple-selection
    # good
    movies.loc[movies['duration'] >= 200, 'genre']
    movies.iloc[0:2, :]
    
    # bad
    movies[movies['duration'] >= 200].genre
    movies[0:2]

    LaTeX Styles

    Multiple lines

    Reduce the use of begin{array}...end{array}

    • equations: begin{aligned}...end{aligned}
    $$
    \begin{aligned}
    y_1 = x^2 + 2*x \\
    y_2 = x^3 + x
    \end{aligned}
    $$
    • equations with conditions: begin{cases}...end{cases}
    $$
    \begin{cases}
    y = x^2 + 2*x & x > 0 \\
    y = x^3 + x & x ≤ 0
    \end{cases}
    $$
    • matrix: begin{matrix}...end{matrix}
    $$
    \begin{vmatrix}
      a + a^′ & b + b^′ \\ c & d
      \end{vmatrix}= \begin{vmatrix}
      a & b \\ c & d
      \end{vmatrix} + \begin{vmatrix}
      a^′ & b^′ \\ c & d
    \end{vmatrix}
    $$

    Brackets

    • prefer \Bigg...\Bigg over \left...\right
    $$
    A\Bigg[v_1\ v_2\ \ v_r\Bigg]
    $$
    • prefer \underset{}{} over \underset{}
    $$
    \underset{θ}{\mathrm{argmax}}\ p(x_i|θ)
    $$

    Expressions

    • prefer ^{\top} over ^T for transpose

    $$ 𝐀^⊤ $$

    $$
    𝐀^{\top}
    $$
    • prefer \to over \rightarrow for limit

    $$ \lim_{n → ∞} $$

    $$
    \lim_{n\to \infty}
    $$
    • prefer underset{}{} over \limits_

    $$ \underset{w}{\rm argmin}\ (wx +b) $$

    $$
    \underset{w}{\rm argmin}\ (wx +b)
    $$

    Fonts

    • prefer \mathrm over \mathop or \operatorname
    $$
    θ_{\mathrm{MLE}}=\underset{θ}{\mathrm{argmax}}\ ∑_{i = 1}^{N}\log p(x_i|θ)
    $$

    ISLR

    References

    style <style> table { border-collapse: collapse; text-align: center; } </style>
    Owner
    Gitony
    Gitony
    Automatic data visualization in atom with the nteract data-explorer

    Data Explorer Interactively explore your data directly in atom with hydrogen! The nteract data-explorer provides automatic data visualization, so you

    Ben Russert 65 Dec 01, 2022
    flask extension for integration with the awesome pydantic package

    Flask-Pydantic Flask extension for integration of the awesome pydantic package with Flask. Installation python3 -m pip install Flask-Pydantic Basics v

    249 Jan 06, 2023
    a plottling library for python, based on D3

    Hello August 2013 Hello! Maybe you're looking for a nice Python interface to build interactive, javascript based plots that look as nice as all those

    Mike Dewar 1.4k Dec 28, 2022
    Graphical visualizer for spectralyze by Lauchmelder23

    spectralyze visualizer Graphical visualizer for spectralyze by Lauchmelder23 Install Install matplotlib and ffmpeg. Put ffmpeg.exe in same folder as v

    Matthew 1 Dec 21, 2021
    Investment and risk technologies maintained by Fortitudo Technologies.

    Fortitudo Technologies Open Source This package allows you to freely explore open-source implementations of some of our fundamental technologies under

    Fortitudo Technologies 11 Dec 14, 2022
    NumPy and Pandas interface to Big Data

    Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

    Blaze 3.1k Jan 01, 2023
    PolytopeSampler is a Matlab implementation of constrained Riemannian Hamiltonian Monte Carlo for sampling from high dimensional disributions on polytopes

    PolytopeSampler PolytopeSampler is a Matlab implementation of constrained Riemannian Hamiltonian Monte Carlo for sampling from high dimensional disrib

    9 Sep 26, 2022
    Python package for hypergraph analysis and visualization.

    The HyperNetX library provides classes and methods for the analysis and visualization of complex network data. HyperNetX uses data structures designed to represent set systems containing nested data

    Pacific Northwest National Laboratory 304 Dec 27, 2022
    This is a place where I'm playing around with pandas to analyze data in a csv/excel file.

    pandas-csv-excel-analysis This is a place where I'm playing around with pandas to analyze data in a csv/excel file. 0-start A very simple cheat sheet

    Chuqin 3 Oct 05, 2022
    Data parsing and validation using Python type hints

    pydantic Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. De

    Samuel Colvin 12.1k Jan 06, 2023
    Python module for drawing and rendering beautiful atoms and molecules using Blender.

    Batoms is a Python package for editing and rendering atoms and molecules objects using blender. A Python interface that allows for automating workflows.

    Xing Wang 1 Jul 06, 2022
    Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects

    carcassonne_tools Graphical display tools, to help students debug their class implementations in the Carcassonne family of projects NOTE NOTE NOTE The

    1 Nov 08, 2021
    Plotting library for IPython/Jupyter notebooks

    bqplot 2-D plotting library for Project Jupyter Introduction bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar

    3.4k Dec 30, 2022
    Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization

    Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization

    Glumpy 1.1k Jan 05, 2023
    The Python ensemble sampling toolkit for affine-invariant MCMC

    emcee The Python ensemble sampling toolkit for affine-invariant MCMC emcee is a stable, well tested Python implementation of the affine-invariant ense

    Dan Foreman-Mackey 1.3k Jan 04, 2023
    A Python function that makes flower plots.

    Flower plot A Python 3.9+ function that makes flower plots. Installation This package requires at least Python 3.9. pip install

    Thomas Roder 4 Jun 12, 2022
    LinkedIn connections analyzer

    LinkedIn Connections Analyzer 🔗 https://linkedin-analzyer.herokuapp.com Hey hey 👋 , welcome to my LinkedIn connections analyzer. I recently found ou

    Okkar Min 5 Sep 13, 2022
    Parallel t-SNE implementation with Python and Torch wrappers.

    Multicore t-SNE This is a multicore modification of Barnes-Hut t-SNE by L. Van der Maaten with python and Torch CFFI-based wrappers. This code also wo

    Dmitry Ulyanov 1.7k Jan 09, 2023
    Tools for calculating and visualizing Elo-like ratings of MLB teams using Retosheet data

    Overview This project uses historical baseball games data to calculate an Elo-like rating for MLB teams based on regular season match ups. The Elo rat

    Lukas Owens 0 Aug 25, 2021
    SummVis is an interactive visualization tool for text summarization.

    SummVis is an interactive visualization tool for analyzing abstractive summarization model outputs and datasets.

    Robustness Gym 246 Dec 08, 2022