Tools for computational pathology

Overview

A toolkit for computational pathology and machine learning.

View documentation

Please cite our paper

Installation

There are several ways to install PathML:

  1. pip install (recommended for users)
  2. clone repo to local machine and install from source (recommended for developers/contributors)

Options (1) and (2) require that you first install all external dependencies:

  • openslide
  • JDK 8

We recommend using conda for environment management. Download Miniconda here

Note: these instructions are for Linux. Commands may be different for other platforms.

Installation option 1: pip install

Create conda environment

conda create --name pathml python=3.8
conda activate pathml

Install external dependencies (Linux) with Apt

sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev

Install external dependencies (macOS) with Homebrew

brew install openslide

Install OpenJDK 8

conda install openjdk==8.0.152

Optionally install CUDA (instructions here)

Install PathML

pip install pathml

Installation option 2: clone repo and install from source

Clone repo

git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml

Create conda environment

conda env create -f environment.yml
conda activate pathml

Optionally install CUDA (instructions here)

Install PathML:

pip install -e .

CUDA

To use GPU acceleration for model training or other tasks, you must install CUDA. This guide should work, but for the most up-to-date instructions, refer to the official PyTorch installation instructions.

Check the version of CUDA:

nvidia-smi

Install correct version of cudatoolkit:

# update this command with your CUDA version number
conda install cudatoolkit=11.0
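
The version lookup above can also be scripted. The following is a minimal sketch, not part of PathML: it assumes the nvidia-smi header contains a "CUDA Version: X.Y" field, and the sample line and the helper name parse_cuda_version are illustrative only. Note that nvidia-smi reports the highest CUDA version the driver supports, which is the upper bound for the cudatoolkit version you install.

```python
import re
from typing import Optional

# Illustrative header line in the style printed by nvidia-smi (assumption:
# real output varies by driver version).
SAMPLE_HEADER = (
    "| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |"
)


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Return the 'CUDA Version' value found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)*)", smi_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    version = parse_cuda_version(SAMPLE_HEADER)
    # Build the conda command shown above with the detected version
    print(f"conda install cudatoolkit={version}")
```

In practice the text would come from running nvidia-smi (for example via subprocess.run) rather than from a hard-coded sample.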

After installing PyTorch, optionally verify successful PyTorch installation with CUDA support:

python -c "import torch; print(torch.cuda.is_available())"

Using with Jupyter

Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:

Set JAVA_HOME environment variable

PathML relies on Java to enable support for reading a wide range of file formats. Before using PathML in Jupyter, you may need to manually set the JAVA_HOME environment variable specifying the path to Java. To do so:

  1. Get the path to Java by running echo $JAVA_HOME in the terminal in your pathml conda environment (outside of Jupyter)
  2. Set that path as the JAVA_HOME environment variable in Jupyter:
    import os
    os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
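
The two steps above can be folded into a small helper. This is a sketch under a stated assumption: that JAVA_HOME should point at the conda environment prefix (as in the example path above), so CONDA_PREFIX is used when JAVA_HOME is unset. The function name ensure_java_home is hypothetical; confirm the value against echo $JAVA_HOME in your own environment.

```python
import os
from typing import MutableMapping, Optional


def ensure_java_home(
    env: MutableMapping[str, str], fallback: Optional[str] = None
) -> Optional[str]:
    """Set JAVA_HOME in `env` if it is missing; return the value in effect."""
    if env.get("JAVA_HOME"):
        # Already configured: leave it untouched
        return env["JAVA_HOME"]
    # Assumption: the conda env prefix is the Java home (verify for your setup)
    candidate = env.get("CONDA_PREFIX") or fallback
    if candidate:
        env["JAVA_HOME"] = candidate
    return env.get("JAVA_HOME")


# In a notebook cell, before importing pathml:
# ensure_java_home(os.environ, fallback="/opt/conda/envs/pathml")
```

Passing the environment mapping as an argument keeps the helper easy to test without touching the real process environment.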
    

Register PathML as an IPython kernel

conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml

This makes PathML available as a kernel in JupyterLab or Jupyter Notebook.

Contributing

PathML is an open source project. Consider contributing to benefit the entire community!

There are many ways to contribute to PathML, including:

  • Submitting bug reports
  • Submitting feature requests
  • Writing documentation and examples
  • Fixing bugs
  • Writing code for new features
  • Sharing workflows
  • Sharing trained model parameters
  • Sharing PathML with colleagues, students, etc.

See contributing for more details.

License

The GNU GPL v2 version of PathML is made available via open source licensing. Users are free to use, modify, and distribute it under the terms of the GNU General Public License version 2.

Commercial license options are also available.

Contact

Questions? Comments? Suggestions? Get in touch!

[email protected]

Comments
  • Improve performance

    Currently, writing to h5 is the primary performance bottleneck when running a pipeline (see profile here).

    Perhaps by refactoring our h5 integration, we can boost performance. For example, maybe we should store tiles in separate groups instead of in one big array. This would potentially let us write in parallel and also make it trivial to support overlapping tiles (#223).

    Some work on this was being tracked in #200 but I am creating this issue so that we can discuss here instead of on the pull request

    enhancement 
    opened by jacob-rosenthal 16
  • Warnings associated with circulating a keras model among dask workers

    We are getting a set of warnings around the loading of a saved Keras checkpoint file. I think they contribute to a subsequent error (https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867) and to the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185.

    Here is the warning we get, which we get when we run the SegmentMIF function:

    WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.

    We believe that the saved Keras model is being passed to dask workers in a dirty state (existing locks not released, etc.), causing the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185 and, eventually, the error in https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867.

    To Reproduce Here is our pipeline. I cannot share the data for regulatory reasons.

    pipeline = Pipeline([
        CollapseRunsVectra(),    
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=2, image_resolution=0.5, 
                   gpu=False, postprocess_kwargs_whole_cell=None, 
                   postprocess_kwrags_nuclear=None),
        QuantifyMIF('nuclear_segmentation')   
    ])
    
    bug 
    opened by surya-narayanan 13
  • Docker ci

    Add a Dockerfile which builds a working environment for pathml and starts up a JupyterLab instance in the container, which users can connect to in order to get up and running quickly. Also add a GitHub Actions workflow to build the image and publish it to Docker Hub whenever we create a new release.

    This will close #145

    opened by jacob-rosenthal 11
  • Unable to open tile object (object 'array' doesn't exist)

    Describe the bug Unable to access tile array from TileDataset.__getitem__() KeyError: "Unable to open object (object 'array' doesn't exist)"

    To Reproduce Traceback:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_4463/806735975.py in <module>
    ----> 1 tile_dataset.__getitem__(0)
    
    ~/pathml/pathml/ml/dataset.py in __getitem__(self, ix)
         54         ### this part copied from h5manager.get_tile()
         55         tile_image = self.h5["tiles"][str(k)]["array"][:]
    ---> 56 
         57         # get corresponding masks if there are masks
         58         if "masks" in self.h5["tiles"][str(k)].keys():
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    /opt/conda/envs/wtf/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
        286                 raise ValueError("Invalid HDF5 object reference")
        287         else:
    --> 288             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
        289 
        290         otype = h5i.get_type(oid)
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/h5o.pyx in h5py.h5o.open()
    
    KeyError: "Unable to open object (object 'array' doesn't exist)"
    

    Expected behavior Should be able to access the tile array object. I created the h5 file with the following code:

    from pathml.core import SlideData, types

    slide_name = "<path to slide>"  # placeholder: replace with the path to your slide
    slide = SlideData(slide_name, backend = "bioformats", slide_type = types.Vectra)
    slide.write(f'/parent_directory/{slide.name}.h5')
    
    bug 
    opened by surya-narayanan 11
  • Adding to h5 file

    Is it possible to re-run a slide with a different pipeline and add to the h5 file, without re-doing tiling? Happy to provide an example, if that would be helpful.

    opened by surya-narayanan 9
  • Resolving dependencies between PathML and Deepcell

    Describe the bug When we run pipelines for multiparametric images we often want to include models from deepcell https://github.com/vanvalenlab/deepcell-tf (especially for the SegmentMIF transform). It is difficult for users to solve the environment since installing deepcell downgrades packages like numpy to incompatible versions. This has caused installation problems for @MohamedOmar2020 and other internal users

    To Reproduce

    pipe = Pipeline(
        [
            CollapseRunsVectra(),
            SegmentMIF(
                model="mesmer",
                nuclear_channel=0,
                cytoplasm_channel=7,
                image_resolution=0.5,
            ),
            QuantifyMIF(segmentation_mask="cell_segmentation"),
        ]
    )
    dataset.run(pipe)
    

    Expected behavior We would expect this to run, but following the default installation instructions (option 1, pip install) and then running pip install deepcell results in a series of numpy errors when we attempt to run the pipeline.

    Working Solution These dependency problems are resolved (at least to the extent that the above pipeline can run) by upgrading numpy after installing deepcell, as follows:

    conda create --name pathml python=3.8
    conda activate pathml
    sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev
    conda install openjdk==8.0.152
    pip install pathml
    pip install deepcell
    pip install --upgrade numpy
    

    The question is: should we include this in our installation instructions for users who want to use multiparametric pipelines? Should we create a docker container for multiparametric pipelines? Should we remove our dependency on deepcell and try to wrap the model more directly in PathML (or train our own)?

    enhancement 
    opened by ryanccarelli 8
  • Weird segmentation results

    Hello, I have a problem with the segmentation resulting from the mesmer model. It looks like the model is not identifying cells properly since many cells are too large with too many nuclei. This is the code used to process the image:

    pipe = Pipeline([
        CollapseRunsVectra(),
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=7, image_resolution=0.5),
        QuantifyMIF(segmentation_mask='cell_segmentation')
    ])

    slidedata.run(pipe, distributed = False, tile_size= (12784, 13234), tile_pad=False, overwrite_existing_tiles=True)

    img = slidedata.tiles[3].image[10000:10500,12000:12500, :]
    nuc_mask = slidedata.tiles[3].masks['nuclear_segmentation'][10000:10500,12000:12500, :]
    cell_mask = slidedata.tiles[3].masks['cell_segmentation'][10000:10500,12000:12500, :]

    img_fiji = np.expand_dims(img, axis=0)
    nuc_cytoplasm = np.stack((img_fiji[:,:,:,0], img_fiji[:,:,:,7]), axis=-1)
    rgb_image = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
    cell_segmentation_predictions = np.expand_dims(cell_mask, axis=0)
    overlay_cell = make_outline_overlay(rgb_data=rgb_image, predictions=cell_segmentation_predictions)

    This is how it looks when I overlay the segmentation on the original image in Fiji: (image: OverlaySeg1)

    I loaded a small part of the original image in Fiji, adjusted the brightness/contrast, and then used the mesmer model for segmentation (using deepcell directly, not pathml); the segmentation seems good. This is how it looks: (image: Screenshot 2021-07-19 at 1 39 19 PM)

    Is it right to assume that the bad segmentation shown in the first image has something to do with the brightness/contrast of the raw image? Any ideas how to fix this?

    Thanks in advance

    opened by MohamedOmar2020 8
  • indices should be either on cpu or on the same device as the indexed tensor (cpu)

    Describe the bug RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

    To Reproduce

    n_classes_pannuke = 6

    # load the model
    hovernet = HoVerNet(n_classes=n_classes_pannuke)

    # wrap model to use multi-GPU
    hovernet = torch.nn.DataParallel(hovernet)

    # set up optimizer
    opt = torch.optim.Adam(hovernet.parameters(), lr = 1e-4)
    # learning rate scheduler to reduce LR by factor of 10 each 25 epochs
    scheduler = StepLR(opt, step_size=25, gamma=0.1)

    # send model to GPU
    hovernet.to(device);

    n_epochs = 50

    # print performance metrics every n epochs
    print_every_n_epochs = None

    # evaluating performance on a random subset of validation mini-batches
    # this saves time instead of evaluating on the entire validation set
    n_minibatch_valid = 50

    epoch_train_losses = {}
    epoch_valid_losses = {}
    epoch_train_dice = {}
    epoch_valid_dice = {}

    best_epoch = 0

    # main training loop
    for i in tqdm(range(n_epochs)):
        minibatch_train_losses = []
        minibatch_train_dice = []

        ### put model in training mode
        hovernet.train()

        for data in train_dataloader:
            ### send the data to the GPU
            images = data[0].float().to(device)
            masks = data[1].to(device)
            hv = data[2].float().to(device)
            tissue_type = data[3]

            ### zero out gradient
            opt.zero_grad()

            ### forward pass
            outputs = hovernet(images)

            ### compute loss
            loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

            ### track loss
            minibatch_train_losses.append(loss.item())

            ### also track dice score to measure performance
            preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
            truth_binary = masks[:, -1, :, :] == 0
            dice = dice_score(preds_detection, truth_binary.cpu().numpy())
            minibatch_train_dice.append(dice)

            ### compute gradients
            loss.backward()

            ### step optimizer and scheduler
            opt.step()

        ### step LR scheduler
        scheduler.step()

        ### evaluate on random subset of validation data
        hovernet.eval()
        minibatch_valid_losses = []
        minibatch_valid_dice = []
        ### randomly choose minibatches for evaluating
        minibatch_ix = np.random.choice(range(len(valid_dataloader)), replace=False, size=n_minibatch_valid)
        with torch.no_grad():
            for j, data in enumerate(valid_dataloader):
                if j in minibatch_ix:
                    # send the data to the GPU
                    images = data[0].float().to(device)
                    masks = data[1].to(device)
                    hv = data[2].float().to(device)
                    tissue_type = data[3]

                    # forward pass
                    outputs = hovernet(images)

                    # compute loss
                    loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

                    # track loss
                    minibatch_valid_losses.append(loss.item())

                    # also track dice score to measure performance
                    preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
                    truth_binary = masks[:, -1, :, :] == 0
                    dice = dice_score(preds_detection, truth_binary.cpu().numpy())
                    minibatch_valid_dice.append(dice)

        ### average performance metrics over minibatches
        mean_train_loss = np.mean(minibatch_train_losses)
        mean_valid_loss = np.mean(minibatch_valid_losses)
        mean_train_dice = np.mean(minibatch_train_dice)
        mean_valid_dice = np.mean(minibatch_valid_dice)

        ### save the model with best performance
        if i != 0:
            if mean_valid_loss < min(epoch_valid_losses.values()):
                best_epoch = i
                torch.save(hovernet.state_dict(), f"hovernet_best_perf.pt")

        ### track performance over training epochs
        epoch_train_losses.update({i : mean_train_loss})
        epoch_valid_losses.update({i : mean_valid_loss})
        epoch_train_dice.update({i : mean_train_dice})
        epoch_valid_dice.update({i : mean_valid_dice})

        if print_every_n_epochs is not None:
            if i % print_every_n_epochs == print_every_n_epochs - 1:
                print(f"Epoch {i+1}/{n_epochs}:")
                print(f"\ttraining loss: {np.round(mean_train_loss, 4)}\tvalidation loss: {np.round(mean_valid_loss, 4)}")
                print(f"\ttraining dice: {np.round(mean_train_dice, 4)}\tvalidation dice: {np.round(mean_valid_dice, 4)}")

    # save fully trained model
    torch.save(hovernet.state_dict(), f"hovernet_fully_trained.pt")
    print(f"\nEpoch with best validation performance: {best_epoch}")

    Expected behavior Should start model training

    Screenshots image

    Additional context Does anyone else have this problem? I ran this on an HPC cluster with 4 GPUs, each with 16 GB of memory.

    bug 
    opened by luzy05111036 7
  • Issue with distributed processing

    Hello, thank you for fixing the distributed issue with the mesmer model. I am running the pipeline with the 'distributed = True' flag, but I am getting many warnings and errors. Additionally, the pipeline was supposed to return 145 tiles but it is returning only 3. This is part of the log message:

    def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
      def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
      warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
      def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_io/h5ad.py:64: FutureWarning: The force_dense argument is deprecated. Use as_dense instead.
      warn(
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
      warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    storing 'coords' as categorical
    storing 'slice' as categorical
    storing 'tile' as categorical
    2021-08-10 00:00:05.176962: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
    distributed.worker - WARNING - Compute Failed
    Function:  apply
    args:      (Tile(coords=(1598, 6616), name=None, image shape: (1598, 1654, 2), slide_type=SlideType(stain=Fluor, platform=Vectra, tma=None, rgb=None, volumetric=None, time_series=None), labels=None, masks=None, counts=None))
    kwargs:    {}
    Exception: OutOfRangeError()

    That last error (the OutOfRangeError reported by distributed.worker, starting at the tensorflow line) is repeated many times.

    Thanks in advance

    bug 
    opened by MohamedOmar2020 7
  • Error installing owing to cached version of torch

    If one tries to install pathml after a previously failed installation attempt, one runs into the following error, which I think is due to using cached files. One suggested solution (for just torch) is to run pip --no-cache-dir install torchvision, but I don't know whether this will solve the issue, or how to integrate it into installing pathml as a whole without installing each dependency one by one.

    (pathml) [email protected]:~$ pip install pathml
    Collecting pathml
      Using cached pathml-2.0.4-py3-none-any.whl (83 kB)
    Collecting scipy
      Using cached scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
    Collecting python-bioformats>=4.0.0
      Using cached python_bioformats-4.0.5-py3-none-any.whl (41.4 MB)
    Collecting scikit-image
      Using cached scikit_image-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)
    Collecting scikit-learn
      Using cached scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
    Requirement already satisfied: pip in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (22.0.3)
    Collecting openslide-python
      Using cached openslide-python-1.1.2.tar.gz (316 kB)
      Preparing metadata (setup.py) ... done
    Collecting dask[distributed]
      Using cached dask-2022.1.1-py3-none-any.whl (1.1 MB)
    Collecting pandas
      Using cached pandas-1.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
    Collecting matplotlib
      Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
    Collecting anndata>=0.7.6
      Using cached anndata-0.7.8-py3-none-any.whl (91 kB)
    Requirement already satisfied: numpy>=1.16.4 in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (1.22.2)
    Collecting h5py
      Using cached h5py-3.6.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
    Collecting opencv-contrib-python
      Using cached opencv_contrib_python-4.5.5.62-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (66.6 MB)
    Collecting pydicom
      Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
    Collecting torch
    SystemError: deallocated bytearray object has exported buffers
    ERROR: Exception:
    Traceback (most recent call last):
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
        status = run_func(*args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
        return func(self, options, args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 339, in run
        requirement_set = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
        result = self._result = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
        state = resolution.resolve(requirements, max_rounds=max_rounds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 373, in resolve
        failure_causes = self._attempt_to_pin_criterion(name)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 213, in _attempt_to_pin_criterion
        criteria = self._get_updated_criteria(candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 204, in _get_updated_criteria
        self._add_to_criteria(criteria, requirement, parent=candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
        if not criterion.candidates:
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
        return bool(self._sequence)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
        return any(self)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
        return (c for c in iterator if id(c) not in self._incompatible_ids)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
        candidate = func()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
        self._link_candidate_cache[link] = LinkCandidate(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
        super().__init__(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
        self.dist = self._prepare()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
        dist = self._prepare_distribution()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
        return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
        return self._prepare_linked_requirement(req, parallel_builds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
        local_file = unpack_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
        file = get_http_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
        from_path, content_type = download(link, temp_dir.path)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 133, in __call__
        resp = _http_get_download(self._session, link)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 116, in _http_get_download
        resp = session.get(target_url, headers=HEADERS, stream=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 542, in get
        return self.request('GET', url, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/session.py", line 454, in request
        return super().request(method, url, *args, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 529, in request
        resp = self.send(prep, **send_kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 645, in send
        r = adapter.send(request, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/adapter.py", line 48, in send
        cached_response = self.controller.cached_request(request)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/controller.py", line 151, in cached_request
        resp = self.serializer.loads(request, cache_data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 95, in loads
        return getattr(self, "_loads_v{}".format(ver))(request, data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 182, in _loads_v4
        cached = msgpack.loads(data, raw=False)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 128, in unpackb
        ret = unpacker._unpack()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 546, in _unpack
        typ, n, obj = self._read_header()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 488, in _read_header
        obj = self._read(n)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 407, in _read
        ret = self._buffer[i : i + n]
    MemoryError
    
    
    bug 
    opened by surya-narayanan 6
  • Extracting tile returns multi dim output from H&E qptiff

    I think this is specific to my scanner. When I run the following line on my qptiff H&E svs file:

    region = wsi.slide.extract_region(location = (900, 800), size = (500, 500))

    I get a 5-dimensional object, which is incompatible with downstream analysis.

    Do you think it may be useful to run a np.squeeze right before returning the array in wsi.slide.extract_region?
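
    For illustration, here is a minimal sketch of what np.squeeze would do to such an array. The shape (500, 500, 1, 3, 1) is an assumption made for this example (three singleton-free axes plus two length-1 axes); it is not taken from the actual return value of extract_region.

```python
import numpy as np

# Hypothetical 5-D region with two singleton axes (shape is an assumption)
region = np.zeros((500, 500, 1, 3, 1), dtype=np.uint8)

# Drop every length-1 axis
squeezed = np.squeeze(region)
print(region.shape, "->", squeezed.shape)

# Squeezing only named axes is stricter: it raises ValueError if one of them
# is not length 1, and it never drops a genuine single-channel axis by accident.
squeezed_named = np.squeeze(region, axis=(2, 4))
```

    An unconditional squeeze would also remove the channel axis of a single-channel image, so naming the axes to drop may be the safer choice.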

    opened by surya-narayanan 6
  • How to implement HoVer-Net Model in TIAToolBox?

    opened by WilliWespe 0
  • How to train hovernet starting with semantic-level mask image?

    Is your feature request related to a problem? Please describe. I have large WSI data and multi-class JPEG masks, but I have been unable to find a way to make them work with any HoVer-Net implementation.

    Describe the solution you'd like I'd like to be able to feed in my large WSI data along with the JPEG masks, and have tiling and training take place from there.

    Describe alternatives you've considered If I still need instance masks, I can generate them with watershed. But I still don't know what format PathML expects (e.g. npy, mat, json, jpeg, etc.).

    Additional context

    Any help is highly appreciated.

    Thanks!

    enhancement 
    opened by OmarAshkar 3
  • Deepcell segmentation without cytoplasm channel

    I note that Deepcell provides both a segmentation model (nuclear channel only) and the mesmer model (nuclear and cytoplasm channels) (https://www.deepcell.org/predict). Our datasets do not have a single general cytoplasm marker that captures the cytoplasm of all cell types, as the mesmer model requires, e.g. tumour cells vs. immune vs. stromal cells. Can both nuclear_channel=DAPI and cytoplasm_channel=DAPI be used in the mesmer model? Or can SegmentMIF(model='segmentation') be supported? Thanks!

    enhancement 
    opened by jamesMo84 1
  • Allow SlideData to use existing h5path files

    As motivated by https://github.com/Dana-Farber-AIOS/pathml/issues/332 and https://github.com/Dana-Farber-AIOS/pathml/issues/300, this modifies SlideData to read and update Tiles from an existing h5path file instead of requiring each pipeline run to recreate all tiles from scratch.

    This includes #335 as many transforms (e.g. BoxBlur) require np.uint8 data instead of the default float16 saved to h5path files. I was also working off my load-data-in-workers branch because it had significant performance changes for my use cases. Sorry about the branching messiness, hopefully the changes will be clearer as other branches are merged into dev.

    This makes breaking changes to the SlideData API, namely replacing generate_tiles with get_tiles and moving the tile parameterization from run to the SlideData constructor.

    opened by tddough98 0
  • Load tiles in parallel on workers and add options to `TissueDetectionHE`


    This contains two separate improvements

    • add drop_empty_tiles and keep_mask options to the TissueDetectionHE transform to bypass saving tiles with no detected H&E tissue and bypass saving masks
    • parallelize tile image loading by using dask.delayed to avoid loading images on the main thread

    The first part is both for convenience and performance. It's possible to generate all tiles and then filter out the empty tiles and remove masks before writing the h5path to disk, but that requires that all the tiles be added to the Tiles which takes IO time. If these tiles and masks are never saved even to in-memory objects, processing can finish faster.

    The second part is a core performance issue with distributed processing. I believe it's relevant to https://github.com/Dana-Farber-AIOS/pathml/issues/211 and https://github.com/Dana-Farber-AIOS/pathml/issues/299. When processing tiles, I've found that loading time >> processing time. Currently, the main thread loads the tile image data and then scatters each loaded tile to the workers. This prevents any parallelism, as all but one worker are always waiting for the main thread to load data and send them a tile.

    Additionally, as all tiles have to be loaded on the main thread, the block that generates the futures

    for tile in self.generate_tiles(
        level=level,
        shape=tile_size,
        stride=tile_stride,
        pad=tile_pad,
        **kwargs,
    ):
        if not tile.slide_type:
            tile.slide_type = self.slide_type
        # explicitly scatter data, i.e. send the tile data out to the cluster before applying the pipeline
        # according to dask, this can reduce scheduler burden and keep data on workers
        big_future = client.scatter(tile)
        f = client.submit(pipeline.apply, big_future)
        processed_tile_futures.append(f)
    

    has to load all tiles and send them all to workers before ANY tile can be added to the Tiles and the memory can be freed in the next block

    # as tiles are processed, add them to h5
    for future, tile in dask.distributed.as_completed(
        processed_tile_futures, with_results=True
    ):
        self.tiles.add(tile)
    

    causing the dramatic memory leaks seen in https://github.com/Dana-Farber-AIOS/pathml/issues/211.

    I've used dask.delayed to prevent reading from the input file until the image is accessed on the worker. The code that accesses the file and loads the image can now be run by each worker in parallel. To preserve the parallelism, we have to take care not to access and load tile.image on the main thread before loading it on the worker, or to at least wrap accesses in dask.delayed as in SlideData.generate_tiles.
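
    The effect of moving the load onto the workers can be sketched with the standard library alone. This uses concurrent.futures threads as a stand-in for a dask cluster, and load_tile and process are hypothetical placeholders rather than PathML functions. The point is only that the main thread hands out small coordinate tuples and each worker performs its own read.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_tile(coords):
    """Hypothetical stand-in for reading one tile from the slide file."""
    # Record which thread performed the (pretend) read.
    return {"coords": coords, "loaded_on": threading.current_thread().name}


def process(coords):
    """Load the tile inside the worker, then apply a placeholder pipeline."""
    tile = load_tile(coords)
    tile["processed"] = True
    return tile


coords = [(i, j) for i in range(2) for j in range(3)]
main_thread = threading.current_thread().name

results = []
with ThreadPoolExecutor(max_workers=3) as pool:
    # Only the coordinate tuples are sent to the workers, not image data.
    futures = [pool.submit(process, c) for c in coords]
    # Results are collected as they finish, so each can be stored and released.
    for future in as_completed(futures):
        results.append(future.result())

print(len(results), "tiles loaded and processed off the main thread")
```

    With dask the same shape applies: the image read is wrapped so that it runs only when a worker asks for it.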

    I had some issues with the backends not being picklable. The Backend has to be sent to each worker so it has access to the code that interfaces with the filesystem. I changed Backend filelike attributes to be lazily evaluated with the @property decorator.
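
    A minimal sketch of that pattern, assuming a hypothetical LazyBackend class (not the actual PathML Backend): only the path is stored, and the file handle is opened on first access through a property, so the object can be pickled and sent to a worker. The __getstate__ method is an extra assumption, added so that pickling also works after the handle has been opened.

```python
import os
import pickle
import tempfile


class LazyBackend:
    """Hypothetical backend that defers opening its file until first use."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing unpicklable is created here

    @property
    def handle(self):
        # Open the file lazily, on whichever process first needs it.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Assumption: drop any open handle so the object stays picklable.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


# Write a small temporary file to stand in for a slide.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"tile-bytes")

backend = LazyBackend(tmp.name)
clone = pickle.loads(pickle.dumps(backend))  # simulates sending to a worker
data = clone.handle.read()                   # file is opened only now
clone.handle.close()
os.remove(tmp.name)
```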

    opened by tddough98 4
  • Parameterize dtype for h5path with `SlideData` constructor


    Currently, PathML stores all images with float16, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is a dtype parameter in the SlideData constructor, but it's only used to assist the BioFormatsBackend in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.
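
    The cost of forcing everything through float16 can be illustrated with Python's struct module, whose "e" format is IEEE 754 half precision. This is only an illustration of the dtype trade-off, not PathML or h5py code: an 8-bit pixel doubles in size, while a 16-bit pixel value above 2048 may not survive the round trip.

```python
import struct


def float16_roundtrip(value):
    """Store a number as half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


uint8_bytes = struct.calcsize("<B")    # bytes per 8-bit pixel
float16_bytes = struct.calcsize("<e")  # bytes per half-precision pixel

pixel_8bit = 255     # largest uint8 value, exactly representable in float16
pixel_16bit = 2049   # a uint16 value beyond float16's exact integer range

restored_8bit = float16_roundtrip(pixel_8bit)
restored_16bit = float16_roundtrip(pixel_16bit)

print("storage per pixel:", uint8_bytes, "byte as uint8 vs", float16_bytes, "as float16")
print("8-bit value preserved:", restored_8bit == pixel_8bit)
print("16-bit value preserved:", restored_16bit == pixel_16bit)
```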

    I also changed masks to be stored as ENUM and to use the strongest compression setting, as boolean masks are highly compressible. The compression made a huge difference in file size: inspecting the files with HDFView (https://www.hdfgroup.org/downloads/hdfview/) showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types), but at least this is smaller than float16.
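
    Why masks compress so well can be seen with a rough sketch using zlib from the standard library. This assumes a DEFLATE-style filter and a synthetic 256 x 256 mask with one rectangular foreground region; it does not reproduce the HDF5 settings used in this PR, and the ratio it prints is not comparable to the HDFView figures above.

```python
import zlib

height, width = 256, 256

# One byte per pixel, all background (0) to start with.
mask = bytearray(height * width)

# Mark a 128 x 128 foreground square in the middle of the mask.
for row in range(64, 192):
    start = row * width + 64
    mask[start:start + 128] = b"\x01" * 128

# Level 9 is zlib's strongest compression setting.
compressed = zlib.compress(bytes(mask), 9)
ratio = len(mask) / len(compressed)

print(f"{len(mask)} bytes -> {len(compressed)} bytes (ratio {ratio:.0f}x)")
```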

    opened by tddough98 4
Releases(v2.1.0)
  • v2.1.0(Apr 22, 2022)

    What's Changed

    • Clean SegmentMIF by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/294
      • Removed GPU argument from SegmentMIF
      • Separated whole_cell and nuclear kwargs
    • Update README.md by @surya-narayanan in https://github.com/Dana-Farber-AIOS/pathml/pull/298
    • Update quantify mif by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/301
      • update the functional implementation F() to not require a tile object.
      • Add "label" property to counts matrix.
    • Fix tiling bug by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/306
      • Fixed bug in generate_tiles() within OpenSlideBackend and BioFormatsBackend for the case where the tile shape evenly divides into the slide shape
    • Added logging functionality by @BeeGass in https://github.com/Dana-Farber-AIOS/pathml/pull/304
      • Includes logger customization
    • Don't augment test or valid splits for PanNuke by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/309

    New Contributors

    • @BeeGass made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/304

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.4...v2.1.0

  • v2.0.4(Feb 7, 2022)

    What's Changed

    • Fix bug caused by mixing up (i, j) and (x, y) coordinate systems in BioFormatsBackend (#278)
    • Add option to not normalize image in BioFormatsBackend.extract_region() (#279)
    • Fix logic when inferring correct backend to use from file path which was failing on paths containing periods (#284)
    • Fix bug to correctly pass image_resolution argument to Mesmer model (#286)
    • Fix outdated url for PanNuke dataset (#287) by @Yu-AnChen
    • Fix GitHub Actions configuration which was causing testing suite to hang (#289)

    New Contributors

    • @dependabot made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/275
    • @Yu-AnChen made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/287

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.3...v2.0.4

  • v2.0.3(Jan 7, 2022)

  • v2.0.2(Jan 6, 2022)

    What's Changed

    • Streamline environment setup by removing spams as a dependency (#142) and updating environment.yml to create an environment with both PathML and deepcell (#259 #210)
    • Add a Dockerfile for another installation option, and a GitHub Actions workflow to build and publish it to Dockerhub on new release (#145)
    • Add series_as_channels flag to BioFormatsBackend.extract_region() to fix support for images from the MISI lab (#261)

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.1...v2.0.2

  • v2.0.dev4(Jan 4, 2022)

  • v2.0.dev3(Jan 4, 2022)

  • v2.0.dev2(Jan 4, 2022)

  • 2.0.dev1(Jan 4, 2022)

  • v2.0.1(Dec 25, 2021)

    What's Changed

    • Improve h5path read/write by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/260

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.0...v2.0.1

  • v2.0.0(Dec 19, 2021)

    What's new in v2.0.0:

    • Changed h5path format and refactored h5manager to improve performance (#231)
    • Support XYZCT images for TileDataset (#233)
    • Cleaned up versioning tracker (#236)
    • Fix bug when reading region from openslide backend at higher levels (#242)
    • Add support for multi-series images with BioFormatsBackend (#251)
    • Pin python-bioformats version to avoid any possibility of log4j hacks (#256)
    • Added optional flag in SlideDataset.run() to write slides to h5path as they finish processing (#226)
    • Added GitHub Actions workflow to automatically build package and publish to PyPI when a new release is created (#235)

    Because the file format is changed in this version, .h5path files saved in older versions will not be able to be loaded in this one, and vice versa (i.e. breaking backwards compatibility, hence the bumped major version).

  • v1.0.4(Nov 29, 2021)

  • v1.0.dev4(Nov 29, 2021)

Owner
AI Operations and Data Science Services group