Epidemiology analysis package

Overview

zEpid

Join the chat at https://gitter.im/zEpid/community

zEpid is an epidemiology analysis package providing easy-to-use tools for epidemiologists coding in Python 3.5+. The purpose of this library is to provide a toolset that makes epidemiology e-z. A variety of calculations and plots can be generated through its functions. For a sample walkthrough of what this library is capable of, see the tutorials available at https://github.com/pzivich/Python-for-Epidemiologists

A few highlights: basic epidemiology calculations, easy creation of functional form assessment plots, easy creation of effect measure plots, and causal inference tools. Implemented estimators include: inverse probability of treatment weights, inverse probability of censoring weights, inverse probability of missing weights, augmented inverse probability of treatment weights, the time-fixed g-formula, the Monte Carlo g-formula, the iterative conditional g-formula, and targeted maximum likelihood estimation (TMLE). Additionally, generalizability/transportability tools are available, including: inverse probability of sampling weights, the g-transport formula, and doubly robust generalizability/transportability formulas.

If you have any requests for items to be included, please contact me and I will work on adding any requested features. You can contact me either through GitHub (https://github.com/pzivich), email (gmail: zepidpy), or twitter (@zepidpy).

Installation

Installing:

You can install zEpid using pip install zepid

Dependencies:

pandas >= 0.18.0, numpy, statsmodels >= 0.7.0, matplotlib >= 2.0, scipy, tabulate

Module Features

Measures

Calculate measures directly from a pandas DataFrame object. Implemented measures include: risk ratio, risk difference, odds ratio, incidence rate ratio, incidence rate difference, number needed to treat, sensitivity, specificity, population attributable fraction, and attributable community risk.

Measures can be directly calculated from a pandas DataFrame object or using summary data.
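
For example, a risk ratio can be computed directly from a DataFrame. The minimal sketch below uses the packaged sample data loader; the column names ('art', 'dead') come from that sample dataset:

    from zepid import load_sample_data, RiskRatio

    # packaged sample cohort; drop rows with missing values for this simple example
    df = load_sample_data(timevary=False).dropna()

    rr = RiskRatio()
    rr.fit(df, exposure='art', outcome='dead')
    rr.summary()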

Other handy features include: splines, a Table 1 generator, interaction contrast, interaction contrast ratio, positive predictive value, negative predictive value, a screening cost analyzer, counternull p-values, and conversions between odds and proportions.

For guided tutorials with Jupyter Notebooks: https://github.com/pzivich/Python-for-Epidemiologists/blob/master/3_Epidemiology_Analysis/a_basics/1_basic_measures.ipynb

Graphics

Uses matplotlib in the background to generate some useful plots. Implemented plots include: functional form assessments (with statsmodels output), p-value function plots, spaghetti plots, effect measure plots (forest plots), receiver operating characteristic (ROC) curves, dynamic risk plots, and L'Abbé plots.
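
As a sketch, an effect measure (forest) plot can be built from lists of point estimates and confidence limits. The numbers below are made up for illustration, and the keyword names follow the documented EffectMeasurePlot interface:

    import matplotlib.pyplot as plt
    from zepid.graphics import EffectMeasurePlot

    labels = ['Crude', 'Adjusted', 'IPTW', 'TMLE']   # illustrative labels
    measures = [0.88, 0.75, 0.74, 0.76]              # made-up point estimates
    lower = [0.70, 0.60, 0.59, 0.62]                 # made-up lower confidence limits
    upper = [1.11, 0.94, 0.93, 0.94]                 # made-up upper confidence limits

    p = EffectMeasurePlot(label=labels, effect_measure=measures, lcl=lower, ucl=upper)
    p.plot()
    plt.show()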

For examples see: http://zepid.readthedocs.io/en/latest/Graphics.html

Causal

The causal branch includes various estimators for causal inference with observational data. Details on currently implemented estimators are below:

G-Computation Algorithm

The current implementation includes: the time-fixed exposure g-formula, the Monte Carlo g-formula, and the iterative conditional g-formula.
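
A minimal sketch of the time-fixed g-formula, assuming the packaged sample data and the documented TimeFixedGFormula interface:

    from zepid import load_sample_data
    from zepid.causal.gformula import TimeFixedGFormula

    df = load_sample_data(timevary=False).dropna()

    g = TimeFixedGFormula(df, exposure='art', outcome='dead')
    g.outcome_model('art + male + age0 + cd40 + dvl0', print_results=False)
    g.fit(treatment='all')     # counterfactual risk if everyone is treated
    r_all = g.marginal_outcome
    g.fit(treatment='none')    # counterfactual risk if no one is treated
    r_none = g.marginal_outcome
    print('Risk difference:', r_all - r_none)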

Inverse Probability Weights

The current implementation includes: inverse probability of treatment weights (IPTW), inverse probability of censoring weights (IPCW), and inverse probability of missing weights (IPMW). Diagnostics are also available for IPTW. IPMW supports monotone missing data.
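
A minimal IPTW sketch, assuming the post-v0.8.0 interface in which IPTW fits a marginal structural model directly:

    from zepid import load_sample_data
    from zepid.causal.ipw import IPTW

    df = load_sample_data(timevary=False).dropna()

    iptw = IPTW(df, treatment='art', outcome='dead')
    iptw.treatment_model('male + age0 + cd40 + dvl0', print_results=False)
    iptw.marginal_structural_model('art')
    iptw.fit()
    iptw.summary()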

Augmented Inverse Probability Weights

The current implementation includes the augmented IPTW estimator described by Funk et al. (2011, American Journal of Epidemiology).
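
A minimal sketch of the augmented IPTW estimator, assuming the documented AIPTW interface and the packaged sample data:

    from zepid import load_sample_data
    from zepid.causal.doublyrobust import AIPTW

    df = load_sample_data(timevary=False).dropna()

    aipw = AIPTW(df, exposure='art', outcome='dead')
    aipw.exposure_model('male + age0 + cd40 + dvl0', print_results=False)
    aipw.outcome_model('art + male + age0 + cd40 + dvl0', print_results=False)
    aipw.fit()
    aipw.summary()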

Targeted Maximum Likelihood Estimator

TMLE can be estimated through a standard logistic regression model or through user-input functions. Alternatively, users can supply machine learning algorithms to estimate the probabilities. Supported machine learning algorithms include scikit-learn estimators.
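
A minimal TMLE sketch with parametric nuisance models, assuming the documented TMLE interface; the custom_model argument for user-supplied learners is noted in a comment:

    from zepid import load_sample_data
    from zepid.causal.doublyrobust import TMLE

    df = load_sample_data(timevary=False).dropna()

    tmle = TMLE(df, exposure='art', outcome='dead')
    # exposure_model() / outcome_model() also accept a custom_model argument
    # for user-supplied (e.g., scikit-learn style) learners
    tmle.exposure_model('male + age0 + cd40 + dvl0', print_results=False)
    tmle.outcome_model('art + male + age0 + cd40 + dvl0', print_results=False)
    tmle.fit()
    tmle.summary()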

Generalizability / Transportability

For generalizing results, or transporting them to a different target population, several estimators are available. These include inverse probability of sampling weights, the g-transport formula, and doubly robust formulas.
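
A rough sketch of inverse probability of sampling weights; the constructor and method names here are assumptions about the zepid.causal.generalize interface, and the data set and column names (including the selection indicator 's') are hypothetical:

    from zepid.causal.generalize import IPSW

    # hypothetical DataFrame stacking the study sample and the target population,
    # with a selection indicator column 's' (1 = in study sample)
    data = ...

    ipsw = IPSW(data, exposure='art', outcome='dead', selection='s')
    ipsw.sampling_model('male + age0 + cd40', print_results=False)
    ipsw.fit()
    ipsw.summary()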

Tutorials for the usage of these estimators are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/c_causal_inference

G-estimation of Structural Nested Mean Models

Single time-point g-estimation of structural nested mean models is supported.
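
A minimal sketch of single time-point g-estimation, assuming the GEstimationSNM interface in zepid.causal.snm and the packaged sample data:

    from zepid import load_sample_data
    from zepid.causal.snm import GEstimationSNM

    df = load_sample_data(timevary=False).dropna()

    snm = GEstimationSNM(df, exposure='art', outcome='dead')
    snm.exposure_model('male + age0 + cd40 + dvl0', print_results=False)
    snm.structural_nested_model('art')
    snm.fit()
    snm.summary()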

Sensitivity Analyses

Includes a trapezoidal distribution generator and a corrected risk ratio (MonteCarloRR).
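
A small sketch of drawing bias parameters from the trapezoidal distribution; the parameter names follow the documented trapezoidal() signature but should be treated as assumptions:

    from zepid.sensitivity_analysis import trapezoidal

    # 10,000 draws of a bias parameter (e.g., a confounder-outcome risk ratio)
    draws = trapezoidal(mini=0.9, mode1=1.1, mode2=1.7, maxi=1.8, size=10000)
    print(draws.mean(), draws.min(), draws.max())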

Tutorials are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/d_sensitivity_analyses

Comments
  • Confusing about the time in MonteCarloGFormula

    Hi, I am confused about how to use the time and history data. It seems that we should build a model with data from questionnaires k-1 and k-2, but I couldn't find such a code implementation. I found that you used data from time k only while fitting your model.

    #176 exposure_model #207 outcome_model #274 add_covariate_model

    question 
    opened by Jnnocent 14
  • SingleCrossFit `invalid value encountered in log`

    @pzivich, when using SingleCrossfitTMLE for a continuous outcome with the statsmodels Gaussian GLM family, I have encountered the following error:

    xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1663: RuntimeWarning: invalid value encountered in log
      log = sm.GLM(ys, np.column_stack((h1ws, h0ws)), offset=np.log(probability_to_odds(py_os)),
    xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1669: RuntimeWarning: invalid value encountered in log
      ystar0 = np.append(ystar0, logistic.cdf(np.log(probability_to_odds(py_ns)) - epsilon[1] / pa0s))
    

    Here is how I defined the estimators for the super learner, as well as the parameter inputs.

    link_i = sm.genmod.families.links.identity()
    SL_glm = GLMSL(family = sm.families.family.Gaussian(link=link_i))
    GLMSL(family = sm.families.family.Binomial())
    
    sctmle = SingleCrossfitTMLE(dataset = df, exposure='treatment', outcome='paid_amt', continuous_bound = 0.01)
    sctmle.exposure_model('gender_cd_F + prospective_risk + age_nbr', GLMSL(family = sm.families.family.Binomial()), bound=[0.01, 0.99])
    sctmle.outcome_model('gender_cd_F + prospective_risk + age_nbr', SL_glm)
    sctmle.fit(n_splits = 2, n_partitions=3, random_state=12345, method = 'median')
    sctmle.summary()
    

    If I use any other ML estimator from sklearn, such as Lasso, GBM, or RandomForest, for the outcome model, it works fine. The error is only related to use of the GLMSL family.

    Could you share any ideas about the reason for this error and how I can fix it? Much appreciated!

    opened by miaow27 8
  • Add G-formula

    One lofty goal is to implement the g-formula. Would need to code two versions: time-fixed and time-varying. The chapter by Robins & Hernán is a good reference. I have code that implements the g-formula using pandas. It is reasonably fast.

    TODO: generalize to a class, allow input models then predict, need to determine how to allow users to input custom treatment regimes (all/none/natural course are easy to do), compare results (https://www.ncbi.nlm.nih.gov/pubmed/25140837)

    Time-fixed version will be relatively easy to write up

    Time-varying will need the ability to specify a large amount of models and specify the order in which the models are fit.

    Note: I am also considering reorganizing in v0.2.0 so that IPW/g-formula/doubly robust will all be contained within a folder called causal, rather than adding to the current ipw folder

    enhancement 
    opened by pzivich 8
  • generalize branch

    In the future, I think a zepid.causal.generalize branch would be a useful addition. This branch would contain some generalizability and transportability tools. Specifically, the g-transport formula, inverse probability of sampling weights, inverse odds of sampling weights, and doubly robust generalizability.

    Generally, I think I can repurpose a fair bit of the existing code. I need to consider how best to handle the distinction between generalizability (sample from the target population) and transportability (sample not from the target population). I am imagining that the user would pass in two data sets: the data to estimate the model on, and the data set to generalize to.

    As far as I know, these resources are largely lacking in all other commonly used epi software. Plus this is becoming an increasingly hot topic in epi (and I think it will catch on more widely once people recognize you can go from your biased sample to a full population under an additional set of assumptions)

    Resources:

    • g-transport and IPSW estimators: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466356/

    • inverse odds of sampling: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860052/

    • Doubly Robust estimator: https://arxiv.org/pdf/1902.06080.pdf

    • General introduction: https://arxiv.org/pdf/1805.00550.pdf

    Notes: Some record of Issa's presentation at CIRG. This is the more difficult doubly robust estimator. It applies when only a subset of the cohort has some measure needed for transportability. Rather than throwing out the individuals who don't have X2 measured, you can use the process in the arXiv paper. For the nested-nested population, the robust estimator has three parts. b_a(X_1, S) is harder to estimate, but you can use the following algorithm:

    1. model Y as a function of X={X1, X2} among S=1 and A=a

    2. Predict \hat{Y} among those with D=1

    3. Model \hat{Y} as a function of X1 and S among D=1

    4. Predict \hat{Y*} in D={0, 1}

    Also hard to estimate Pr(S=1|X) because X2 is only observed for a subset. Can use an m-estimator to obtain it. Can do this by a weighted regression with 1 for D=1 & S=1 and 1/C(X1, S) for D=1 & S=0. This is a little less obvious to me but seems doable

    enhancement Short-term Causal inference 
    opened by pzivich 7
  • TMLE & Machine Learning

    TMLE is not guaranteed to attain nominal coverage when used with machine learning. A simulation paper showing major problems is https://arxiv.org/abs/1711.07137. As a result, I don't feel that TMLE can continue to be supported with machine learning, especially since it can produce confidence intervals that are far too narrow (sometimes resulting in 0% coverage). I know this is a divergence from R's tmleverse, but I would rather enforce best practices/standards than allow incorrect use of methods

    Due to this issue, I will be dropping support for TMLE with machine learning. In place of this, I plan on adding CrossfitTMLE which will support machine learning approaches. The crossfitting will result in valid confidence intervals / inference.

    Tentative plan:

    • In v0.8.0, TMLE will throw a warning when using the custom_model argument.

    • Once the Crossfit-AIPW and Crossfit-TMLE are available (v0.9.0), TMLE will lose that functionality. If users want to use TMLE with machine learning, they will need to use a prior version

    bug change Short-term Causal inference 
    opened by pzivich 6
  • G-estimation of Structural Nested Models

    Add SNM to the zepid.causal branch. After this addition, all of Robins' g-methods will be implemented in zEpid.

    SNMs are discussed in the Causal Inference book (https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) and The Chapter. SAS code for search-based and closed-form solvers is available at that site. Ideally both will be implemented. Will start with the time-fixed estimator

    enhancement Long-term wishlist Causal inference 
    opened by pzivich 6
  • v0.8.0

    Update for version 0.8.0. Below are listed the planned additions

    • [x] ~Update how the weights argument works in applicable causal models (IPTW, AIPW, g-formula)~ #106 No longer using this approach

    Inverse Probability of Treatment Weights

    • [x] Changing entire structure #102

    • [x] Figure out why new structure is giving slightly different values for positivity calculations...

    • [x] Add g-bounds option to truncate weights

    • [x] Update tests for new structure

    • [x] Weight argument behaves slightly different (diagnostics now available for either IPTW alone or with other weights)

    • [x] New summary function for results

    • [x] ~Allowing for continuous treatments in estimation of MSM~ ...for later

    • [x] ~Plots available for binary or continuous treatments~ ...for later

    Inverse Probability of Censoring Weights

    • [x] ~Correction for pooled logistic weight calculation with late-entry occurring~ Raise ValueError if late-entry is detected. The user will need to do some additional work

    • [x] Create better documentation for when late-entry occurs for this model

    G-formula

    • [x] Add diagnostics (prediction accuracy of model)

    • [x] Add run_diagnostics()

    Augmented IPTW

    • [x] Add g-bounds

    • [x] Add diagnostics for weights and outcome model

    • [x] Add run_diagnostics()

    TMLE

    • [x] New warning when using machine learning algorithms to estimate nuisance functions #109

    • [x] Add diagnostics for weights and outcome model

    • [x] Add run_diagnostics()

    S-value

    • [x] Add calculator for S-values, a (potentially) more informative measure than p-values #107

    ReadTheDocs Documentation

    • [x] Add S-value

    • [x] Update IPTW

    • [x] Make sure run_diagnostics() and bound are sufficiently explained

    opened by pzivich 5
  • refactor spline so an anonymous function can be returned for use elsewhere

    Previously my code might look like:

    rossi_with_splines[['age0', 'age1']] = spline(rossi_with_splines, var='age', term=2, restricted=True)
    cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')
    
    # this part is nasty
    df = rossi_with_splines.drop_duplicates('age').sort_values('age')[['age', 'age0', 'age1']].set_index('age')
    (df * cph.hazards_[['age0', 'age1']].values).sum(1).plot()
    

    vs

    spline_transform, _ = create_spline_transform(df['age'], term=2, restricted=True)
    rossi_with_splines[['age0', 'age1']] = spline_transform(rossi_with_splines['age'])
    
    cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')
    
    ages_to_consider = np.arange(20, 45)
    y = spline_transform(ages_to_consider).dot(cph.hazards_[['age0', 'age1']].values)
    plot(ages_to_consider, y)
    
    opened by CamDavidsonPilon 5
  • v0.5.0

    Version 0.5.0

    Features to be implemented:

    • [x] Replace AIPW with the more specific AIPTW #57

    • [x] Add support for monotone IPMW #55

    • [ ] ~~Add support for nonmonotone IPMW #55~~ As I have read further into this, it gets a little complicated (even for the unconditional scenario). Will save for later implementation

    • [ ] Add support for continuous treatments in TimeFixedGFormula #49

    • [ ] ~~Add stratify option to measures #56~~

    • [x] TMLE with continuous outcomes #39

    • [x] TMLE with missing data #39 (only applies to missing outcome data)

    • [ ] ~~Add support for stochastic interventions into TMLE #52~~ Above two changes to TMLE will take precedence. Stochastic treatments are to be added later

    • [ ] ~~Add support for permutation weighting (TBD depending on complexity)~~ Will open a new branch for this project. No idea on how long implementation may take

    • [x] Incorporate random error in MonteCarloRR

    Maintenance

    • [x] Add Python 3.7 support

    • [x] Check to see if matplotlib 3 breaks anything. LGTM via test_graphics_manual.py

    • [x] Magic-g warning updates for g-formula #63

    opened by pzivich 5
  • Add interference

    Later addition, but since statsmodels 0.9.0 has GLIMMIX, I would like to add something to deal with interference for the causal branch. I don't have any part of this worked out, so I will need to take some time to really learn what is happening in these papers

    References: https://www.ncbi.nlm.nih.gov/pubmed/21068053 https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12184 https://github.com/bsaul/inferference

    Branch plan:

    ---causal
          |
          -interference
    

    Verification: the R package inferference has some datasets that I can compare results with

    Other: Will need to update requirements to need statsmodels 0.9.0

    enhancement Long-term wishlist Causal inference 
    opened by pzivich 5
  • Enhancements to Monte-Carlo g-formula

    As noted in #73 and #77 there are some further optional enhancements I can add to MonteCarloGFormula

    Items to add:

    • [x] Censoring model

    • [ ] Competing risk model

    Testing:

    • [x] Test censoring model works as intended (compare to Keil 2014)

    • [ ] Test competing risks. May be easiest to simulate up a quick data set to compare. Don't have anything on hand

    The updates to Monte-Carlo g-formula will be added to a future update (haven't decided which version they will make it into)

    Optional:

    • [x] Reduce memory burden of unneeded replicants

    I sometimes run into a MemoryError when replicating Keil et al. 2014 with many resamples. A potential way out of this is to "throw away" the observations that are not the final observation for that individual. Can add the option low_memory=True to throw out those unnecessary observations. The user could keep the full simulated dataframe with low_memory=False.
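
    A rough sketch of the idea in plain pandas (the column names 'id' and 't' are hypothetical): keep only each individual's final simulated row.

    import pandas as pd

    def keep_final_rows(sim_df: pd.DataFrame) -> pd.DataFrame:
        # keep only the last simulated observation per individual
        return sim_df.sort_values(['id', 't']).groupby('id').tail(1)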

    enhancement 
    opened by pzivich 4
  • Unable to install latest 0.9.0 version through pip

    Using the latest version of pip (22.2.2), I am unable to install the most recent zEpid 0.9.0 release on Python 3.7.0.

    pip install -Iv zepid==0.9.0

    ERROR: Could not find a version that satisfies the requirement zepid==0.9.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.5, 0.1.6, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.8.2) ERROR: No matching distribution found for zepid==0.9.0

    opened by aidanberry12 7
  • Saving DAGs programatically

    I had corresponded with @pzivich over email and am posting our communication here for the benefit of other users.

    JD.

    Is it possible to program saving figures of directed acyclic graphs (DAGs) using zEpid? E.g. using the M-bias DAG code in the docs, typing plt.savefig('dag.png') only saves a blank PNG. To save it to disk, I'd need to plot the figure then manually click-and-point on the pop-up to save it.

    PZ.

    Unfortunately, saving the drawn DAGs isn't too easy. In the background, I use NetworkX to organize and plot the diagram. NetworkX uses matplotlib, but it doesn't return the matplotlib axes object. So while you can tweak parts of the graph in various ways, NetworkX doesn't allow you to directly access the drawn part of the image. Normally, this isn't a problem, but when it gets wrapped up in a class object that returns the matplotlib axes (which is what DirectedAcyclicGraph.draw_dag(...) does) it can lead to the issues you noted.

    Currently, the best work-around is to generate the image by hand. Below is some code that should do the trick to match what is output by DirectedAcyclicGraph

    import networkx as nx
    import matplotlib.pyplot as plt
    from zepid.causal.causalgraph import DirectedAcyclicGraph
    
    dag = DirectedAcyclicGraph(exposure='X', outcome="Y")
    dag.add_arrows((('X', 'Y'),
                    ('U1', 'X'), ('U1', 'B'),
                    ('U2', 'B'), ('U2', 'Y')
                   ))
    
    fig = plt.figure(figsize=(6, 5))
    ax = plt.subplot(1, 1, 1)
    positions = nx.spectral_layout(dag.dag)
    nx.draw_networkx(dag.dag, positions, node_color="#d3d3d3", node_size=1000, edge_color='black',
                     linewidths=1.0, width=1.5, arrowsize=15, ax=ax, font_size=12)
    plt.axis('off')
    plt.savefig("filename.png", format='png', dpi=300)
    plt.close()
    
    

    Thanks Paul for the advice!

    For the longer term, it seems useful to build this or something similar into zEpid graphics to programmatically save (complex) DAGs in Python for publication. Possibly using position values from DAGs generated in DAGitty, which is handy for quickly graphing and analysing complex DAGs. Just a thought.

    Cheers

    opened by joannadiong 11
  • Add Odds Ratio and other estimands for AIPTW and TMLE

    Currently AIPTW only returns RD and RR. TMLE returns those and OR as well. I should add support for OR with AIPTW (even though I am not a huge fan of OR when we have nicer estimands)

    I should also add support for all / none, and things like ATT and ATU for TMLE and AIPTW both. Basically I need to look up the influence curve info in the TMLE book(s)

    enhancement Causal inference 
    opened by pzivich 0
  • MonteCarloGFormula

    Currently you need to set np.random.seed outside of the function for reproducibility (which isn't good). I should use a RandomState approach similar to the one the cross-fit estimators use
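
    A small sketch of the intended pattern (names are illustrative, not the actual implementation): accept a random_state argument and draw from a local RandomState rather than the global NumPy seed.

    import numpy as np

    def simulate_step(n, random_state=None):
        # local, reproducible generator instead of relying on np.random.seed
        rng = np.random.RandomState(random_state)
        return rng.uniform(size=n)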

    bug Causal inference 
    opened by pzivich 0
  • Update documentation (and possibly re-organize)

    I wrote most of the ReadTheDocs documentation 2-3 years ago now. It is dated (and my understanding has expanded), so I should go back and review everything after the v0.9.0 release

    Here are some things to consider

    • Use a different split than time-fixed and time-varying exposures
    • Add a futures section (rather than having embedded in documents)
    • Update the LIPTW / SIPTW info (once done)
    • Replace Chat Gitter button with GitHub Discussions
    • Add SuperLearner page to docs
    enhancement help wanted Website 
    opened by pzivich 2
Releases
  • latest-version(Oct 23, 2022)

  • v0.9.0(Dec 30, 2020)

    The 0.9.x series drops support for Python 3.5.x. Only Python 3.6+ is now supported. Support has also been added for Python 3.8

    Cross-fit estimators have been implemented for better causal inference with machine learning. Cross-fit estimators include SingleCrossfitAIPTW, DoubleCrossfitAIPTW, SingleCrossfitTMLE, and DoubleCrossfitTMLE. Currently, functionality is limited to treatment and outcome nuisance models only (i.e., no model for missing data). These estimators also do not accept weighted data (since most of sklearn does not support weights)

    Super-learner functionality has been added via SuperLearner. Additions also include the empirical mean (EmpiricalMeanSL), generalized linear model (GLMSL), and step-wise backward/forward selection via AIC (StepwiseSL). These new estimators are wrappers that are compatible with SuperLearner and mimic some of the R SuperLearner functionality.

    Directed Acyclic Graphs have been added via DirectedAcyclicGraph. These analyze the graph for sufficient adjustment sets, and can be used to display the graph. These rely on an optional NetworkX dependency.

    AIPTW now supports the custom_model optional argument for user-input models. This is the same as TMLE now.

    zipper_plot function for creating zipper plots has been added.

    Housekeeping: bound has been updated to a new procedure, print_results displays have been made uniform, a function was created to check missingness of input data in causal estimators, a warning was added regarding ATT and ATU variance for IPTW, and observation IDs were added back for MonteCarloGFormula

    Future plans: TimeFixedGFormula will be deprecated in favor of two estimators with different labels. This will more clearly delineate ATE versus stochastic effects. The replacement estimators are to be added

  • v0.8.1(Oct 3, 2019)

    Added support for pygam's LogisticGAM for TMLE with custom models (Thanks darrenreger!)

    Removed the warning for TMLE with custom models following updates to Issue #109. I plan on creating a smarter warning system that flags non-Donsker class machine learning algorithms and warns the user. I still need to think through how to do this.

  • v0.8.0(Jul 17, 2019)

    Major changes to IPTW. IPTW now supports calculation of a marginal structural model directly.

    Greater support for censored data in IPTW, AIPTW, and GEstimationSNM

    Addition of s-values

  • v0.7.2(May 19, 2019)

  • v0.7.1(May 3, 2019)

  • v0.6.0(Mar 31, 2019)

    MonteCarloGFormula now includes a separate censoring_model() function for informative censoring. Additionally, I added a low memory option to reduce the memory burden during the Monte-Carlo procedure

    IterativeCondGFormula has been refactored to accept only data in a wide format. This allows me to handle more complex treatment assignments and specify models correctly. Additional tests have been added comparing to R's ltmle

    There is a new branch in zepid.causal. This is the generalize branch. It contains various tools for generalizing or transporting estimates from a biased sample to the target population of interest. Options available are inverse probability of sampling weights for generalizability (IPSW), inverse odds of sampling weights for transportability (IPSW), the g-transport formula (GTransportFormula), and doubly-robust augmented inverse probability of sampling weights (AIPSW)

    RiskDifference now calculates the Fréchet probability bounds

    TMLE now allows for specified bounds on the Q-model predictions. Additionally, it avoids errors when predicted continuous values are outside the bounds.

    AIPTW now has confidence intervals for the risk difference based on influence curves

    spline now uses numpy.percentile to allow for older versions of NumPy. Additionally, new function create_spline_transform returns a general function for splines, which can be used within other functions

    Lots of documentation updates for all functions. Additionally, summary() functions are starting to be updated. Currently, only stylistic changes

  • v0.4.3(Feb 8, 2019)

  • v0.3.2(Nov 5, 2018)

    MAJOR CHANGES:

    TMLE now allows estimation of risk ratios and odds ratios. Estimation procedure is based on tmle.R

    TMLE variance formula has been modified to match tmle.R rather than other resources. This is beneficial for future implementation of missing data adjustment. Also would allow for mediation analysis with TMLE (not a priority for me at this time).

    TMLE now includes an option to place bounds on predicted probabilities using the bound option. Default is to use all predicted probabilities. Either symmetrical or asymmetrical truncation can be specified.

    TimeFixedGFormula now allows weighted data as an input. For example, IPMW can be integrated into the time-fixed g-formula estimation. Estimation for weighted data uses statsmodels GEE. As a result of the difference between GLM and GEE, the check on the number of dropped observations was removed.

    TimeVaryGFormula now allows weighted data as an input. For example, sampling weights can be integrated into the time-varying g-formula estimation. Estimation for weighted data uses statsmodels GEE.

    MINOR CHANGES:

    Added Sciatica Trial data set. Mertens, BJA, Jacobs, WCH, Brand, R, and Peul, WC. Assessment of patient-specific surgery effect based on weighted estimation and propensity scoring in the re-analysis of the Sciatica Trial. PLOS One 2014. Future plan is to replicate this analysis if possible.

    Added data from Freireich EJ et al., "The Effect of 6-Mercaptopurine on the Duration of Steroid-induced Remissions in Acute Leukemia: A Model for Evaluation of Other Potentially Useful Therapy" Blood 1963

    TMLE now allows general sklearn algorithms. Fixed an issue with how probabilities are generated from sklearn models (predict_proba() versus predict()). Looking at this, I am probably going to clean up the logic behind this and the rest of the custom_model functionality in the future

    AIPW object now contains risk_difference and risk_ratio to match RiskRatio and RiskDifference classes

  • v0.3.0(Aug 27, 2018)

  • v0.2.1(Aug 13, 2018)

  • v0.2.0(Aug 7, 2018)

    BIG CHANGES:

    IPW all moved to zepid.causal.ipw. zepid.ipw is no longer supported

    IPTW, IPCW, IPMW are now their own classes rather than functions. This was done since diagnostics are easier for IPTW and the user can access items directly from the models this way.

    Addition of TimeVaryGFormula to fit the g-formula for time-varying exposures/confounders

    effect_measure_plot() is now EffectMeasurePlot() to conform to PEP

    ROC_curve() is now roc(). Also 'probability' was changed to 'threshold', since it now allows any continuous variable for threshold determinations

    MINOR CHANGES:

    Added sensitivity analysis as proposed by Fox et al. 2005 (MonteCarloRR)

    Updated Sensitivity and Specificity functionality. Added Diagnostics, which calculates both sensitivity and specificity.

    Updated dynamic risk plots to avoid a merging warning. The input timeline is converted to an integer (x100000), merged, then converted back

    Updated spline to use np.where rather than list comprehension

    Summary data calculators are now within zepid.calc.utils

    FUTURE CHANGES:

    All pandas effect/association measure calculations will be migrating from functions to classes in a future version. This will better meet PEP syntax guidelines and allow users to extract elements/print results. Still deciding on the setup for this... No changes are coming to summary measure calculators (aside from possibly name changes). Intended as part of v0.3.0

    Addition of Targeted Maximum Likelihood Estimation (TMLE). No current timeline developed

    Addition of IPW for Interference settings. No current timeline but hopefully before 2018 ends

    Further conforming to PEP guidelines (my bad)

  • v0.1.6(Jul 16, 2018)

    See CHANGELOG for the full list of details

    Briefly,

    Added causal branch
    Added time-fixed g-formula
    Added double-robust estimator
    Updated some fixes to errors

  • v0.1.5(Jul 11, 2018)

  • v0.1.3(Jul 2, 2018)

  • v0.1.2(Jun 25, 2018)

Owner
Paul Zivich
Epidemiology post-doc working in epidemiologic methods and infectious diseases.