Count the MACs / FLOPs of your PyTorch model.

Overview

THOP: PyTorch-OpCounter

How to install

pip install thop (now continuously integrated on GitHub Actions)

OR

pip install --upgrade git+https://github.com/Lyken17/pytorch-OpCounter.git

How to use

  • Basic usage

    import torch
    from torchvision.models import resnet50
    from thop import profile

    model = resnet50()
    input = torch.randn(1, 3, 224, 224)
    macs, params = profile(model, inputs=(input, ))
  • Define the rule for 3rd party module.

    import torch
    import torch.nn as nn
    from thop import profile

    class YourModule(nn.Module):
        # your definition
        ...

    def count_your_model(model, x, y):
        # your rule here
        ...

    model = YourModule()
    input = torch.randn(1, 3, 224, 224)
    macs, params = profile(model, inputs=(input, ),
                           custom_ops={YourModule: count_your_model})
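
    For a concrete (hypothetical) example, here is a rule for a small Swish module, charging one multiply per output element; the cost model is an illustration, not an official THOP rule:

    import torch
    import torch.nn as nn
    from thop import profile

    class Swish(nn.Module):
        def forward(self, x):
            return x * torch.sigmoid(x)

    def count_swish(m, x, y):
        # x is the tuple of inputs, y is the output; THOP accumulates into m.total_ops
        m.total_ops += torch.DoubleTensor([int(y.numel())])

    model = nn.Sequential(nn.Linear(16, 16), Swish())
    macs, params = profile(model, inputs=(torch.randn(1, 16),),
                           custom_ops={Swish: count_swish})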
  • Improve the output readability

    Call thop.clever_format to produce more readable output.

    from thop import clever_format
    macs, params = clever_format([macs, params], "%.3f")
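
    For illustration, using the raw resnet50 numbers reported in the issues below (the exact strings may vary slightly across THOP versions):

    macs, params = clever_format([4142713856.0, 25557032.0], "%.3f")
    # -> ('4.143G', '25.557M')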

Results of Recent Models

The implementations are adapted from torchvision. The following results can be obtained using benchmark/evaluate_famous_models.py.

Model                Params(M)  MACs(G)
alexnet              61.10      0.77
vgg11                132.86     7.74
vgg11_bn             132.87     7.77
vgg13                133.05     11.44
vgg13_bn             133.05     11.49
vgg16                138.36     15.61
vgg16_bn             138.37     15.66
vgg19                143.67     19.77
vgg19_bn             143.68     19.83
resnet18             11.69      1.82
resnet34             21.80      3.68
resnet50             25.56      4.14
resnet101            44.55      7.87
resnet152            60.19      11.61
wide_resnet101_2     126.89     22.84
wide_resnet50_2      68.88      11.46
resnext50_32x4d      25.03      4.29
resnext101_32x8d     88.79      16.54
densenet121          7.98       2.90
densenet161          28.68      7.85
densenet169          14.15      3.44
densenet201          20.01      4.39
squeezenet1_0        1.25       0.82
squeezenet1_1        1.24       0.35
mnasnet0_5           2.22       0.14
mnasnet0_75          3.17       0.24
mnasnet1_0           4.38       0.34
mnasnet1_3           6.28       0.53
mobilenet_v2         3.50       0.33
shufflenet_v2_x0_5   1.37       0.05
shufflenet_v2_x1_0   2.28       0.15
shufflenet_v2_x1_5   3.50       0.31
shufflenet_v2_x2_0   7.39       0.60
inception_v3         27.16      5.75
Issues
  • How to set counter for nn.functional.interpolate?

    In my model, nn.functional.interpolate is used, and its computation should also be counted in my case.

    nn.functional.interpolate: count_upsample_bilinear was appended in register_hooks in profile.py, and count_upsample_bilinear was implemented in count_hooks.py, but it did not work.

    I also tried to print m_type in add_hooks in profile, but the output did not include nn.functional.interpolate. It seems that we cannot add hooks to some APIs like nn.functional.interpolate which work differently from APIs from torch.nn in terms of parameters and handler. I wonder if we can only wrap them into 3rd party modules to set FLOPs counters for them like the example in README.md.

    Are there any workarounds that require minimal modification to the model definition code? (A wrapper sketch follows below.)

    opened by fengyuentau 9
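
    A workaround sketch along the lines discussed above: wrap the functional call in a thin nn.Module so a custom_ops rule can attach. The Interpolate wrapper and its cost model here are hypothetical, not part of THOP:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from thop import profile

    class Interpolate(nn.Module):
        """Thin wrapper so THOP's hook machinery can see the op."""
        def __init__(self, scale_factor=2, mode="bilinear"):
            super().__init__()
            self.scale_factor, self.mode = scale_factor, mode

        def forward(self, x):
            return F.interpolate(x, scale_factor=self.scale_factor,
                                 mode=self.mode, align_corners=False)

    def count_interpolate(m, x, y):
        # Rough cost model: a handful of MACs per output element for bilinear.
        m.total_ops += torch.DoubleTensor([int(4 * y.numel())])

    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), Interpolate(2))
    macs, params = profile(net, inputs=(torch.randn(1, 3, 32, 32),),
                           custom_ops={Interpolate: count_interpolate})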
  • Calculation of parameters is inaccurate when modules do not participate in the forward

    Hi, I found that the calculation of parameters is inaccurate if modules are defined but do not participate in the forward pass. (A quick cross-check is sketched below.)

    opened by colorjam 8
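
    A quick cross-check sketch (hypothetical module): comparing profile's parameter count with a direct sum over model.parameters() exposes the discrepancy when a submodule never runs:

    import torch
    import torch.nn as nn
    from thop import profile

    class WithUnused(nn.Module):
        def __init__(self):
            super().__init__()
            self.used = nn.Linear(8, 8)
            self.unused = nn.Linear(8, 8)  # defined but never called
        def forward(self, x):
            return self.used(x)

    m = WithUnused()
    macs, params = profile(m, inputs=(torch.randn(1, 8),))
    print(params, sum(p.numel() for p in m.parameters()))  # the two may differ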
  • Is the count_conv2d for FLOPs?

    I think the count_conv2d function counts MACs (multiplications), not FLOPs. In this function, total_ops is calculated as K x K x Cin x Wout x Hout x Cout. Isn't that the MAC count? (See the conversion sketch below.)

    opened by sungsooo 7
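
    For reference, the expression quoted above counts multiply-accumulates; under the common convention one MAC equals two FLOPs. A sketch with illustrative shapes:

    # Conv2d cost per the formula quoted above (illustrative shapes):
    K, Cin, Cout, Hout, Wout = 3, 64, 128, 56, 56
    macs = K * K * Cin * Hout * Wout * Cout  # multiply-accumulate count
    flops = 2 * macs                         # one multiply + one add per MAC
    print(macs, flops)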
  • skip connection structure

    It seems that the OpCounter doesn't take into account the skip connection structure.

    opened by ZhenpengChenCode 6
  • code is wrong

    kernel = torch.Tensor([*(x[0].shape[2:])]) // torch.Tensor(list((m.output_size,))).squeeze() ^ SyntaxError: invalid syntax

    opened by yxlijun 6
  • Calculation of trainable parameters is inaccurate due to storage as floating-point variable

    I noticed this when testing out the module on a single, large Linear() layer:

    layer = torch.nn.Linear(8153, 7533, bias=False)
    ...
    flops, params = profile(layer, inputs=(inputs,))
    print(f"{flops} FLOPS    {params} parameters")
    

    The tool returns:

    843934466048.0 FLOPS    61416548.0 parameters
    

    The latter figure is off by 1 from the expected answer of 8153*7533 = 61,416,549. The fact that it printed decimal precision was a hint, and pointed me to the following lines:

    https://github.com/Lyken17/pytorch-OpCounter/blob/1ede8b613c13808d9f52ce5666a18922972592be/thop/profile.py#L72-L76

    Since the above value is greater than 2^(24) = 16,777,216, it may not be perfectly represented in 32-bit floating-point format. Indeed, while p.numel() = 61416549, torch.Tensor([p.numel()]).dtype is torch.float32, so the value gets rounded to tensor([61416548.]).

    Total trainable parameters should probably be stored in variables with an explicit dtype=torch.int64. (A short demonstration follows below.)

    opened by felker 6
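
    A quick demonstration of the rounding described above (exact prints may vary by PyTorch version):

    import torch

    n = 8153 * 7533                                # 61,416,549 > 2**24
    print(torch.tensor([n], dtype=torch.float32))  # rounds to tensor([61416548.])
    print(torch.tensor([n], dtype=torch.int64))    # exact: tensor([61416549])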
  • pip install thop failed

    Hi! I want to install this package offline through pip install thop, but errors appear. Environment: Ubuntu, Python 3.6, torch 0.4.1. Thanks!

    opened by InstantWindy 6
  • It doesn't work for my resnet18.

    When I use it to calculate resnet18's FLOPs, an error occurs:

    [INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
    [INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
    [INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU6'>.
    [WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class '__main__.ResidualBlock'>. Treat it as zero Macs and zero Params.
    [INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
    [WARN] Cannot find rule for <class '__main__.ResNet'>. Treat it as zero Macs and zero Params.

    my net architecture is:

    ResNet( (conv1): Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6() ) (layer1): Sequential( (0): ResidualBlock( (left): Sequential( (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential() ) (1): ResidualBlock( (left): Sequential( (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential() ) ) (layer2): Sequential( (0): ResidualBlock( (left): Sequential( (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential( (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): ResidualBlock( (left): Sequential( (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential() ) ) (layer3): Sequential( (0): ResidualBlock( (left): Sequential( (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential( (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): ResidualBlock( (left): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential() ) ) (layer4): Sequential( (0): ResidualBlock( (left): Sequential( (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): 
BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): ResidualBlock( (left): Sequential( (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU6(inplace=True) (3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (shortcut): Sequential() ) ) (fc): Linear(in_features=512, out_features=10, bias=True) )

    I want to know how to deal with it. Many thanks.

    opened by JachinMa 5
  • NameError: name 'fprint' is not defined

    Hi, thank you for your work. I met the problem "NameError: name 'fprint' is not defined" when I ran the test demo, as shown below.

    opened by cheun726 5
  • [WARN] Cannot find rule of module

    Hello, thanks for your excellent work. But there is a warning I don't understand: what is the "rule" for a module? Can you give an example of how to set it for an individual module?

    WARN: [WARN] Cannot find rule for <class 'net.model.DC_blocks'>. Treat it as zero Macs and zero Params. (P.S. DC_blocks is my model.)

    opened by peylnog 5
  • Restriction on reused modules might be too strict

    It seems that the reuse of modules is banned by the THOP tool:

    https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/profile.py#L49-L51

    However, this is not quite reasonable. Reusing ReLU-type modules is very common; it makes the code neater and causes no harm, e.g.:

    import torch.nn as nn

    class Model(nn.Module):
      def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)
        self.relu = nn.ReLU()
        self.lin1 = nn.Linear(4096, 4096)
        self.lin2 = nn.Linear(4096, 4096)
        self.lin3 = nn.Linear(4096, 4096)

      def forward(self, x):
        output = self.relu(self.lin1(x))
        output = self.relu(self.lin2(output))
        output = self.dropout(self.relu(self.lin3(output)))
        ....
    

    Besides, I think network legality checking is not THOP's job. This kind of restriction is unnecessary and causes troublesome errors to be raised, as far as I can see.

    opened by Kylin9511 5
  • calculation of AvgPooling

    Could you please give a closer explanation of the calculation of the AvgPool FLOPs? Although I know that the impact of pooling layers is minimal, I'd like to understand the calculation. As far as I can tell from the code, the formula you use for 2-D average pooling is:

    (kernel_size + 1) * H_out * W_out * in_channels * out_channels

    But if I follow the formula on the PyTorch website, I would calculate it as follows. For each kernel applied to the input there are kernel_size * kernel_size - 1 additions, plus one division, plus one multiplication by the kernel_size inside the denominator. You apply out_channels kernels to in_channels input channels, which yields an H_out * W_out output. That covers all multiplications and additions. To be compliant with the usual understanding of MACs and FLOPs, you divide by 2, since one MAC corresponds to two FLOPs. So finally I get the formula (see also the sketch below):

    FLOPs = (kernel_size * kernel_size - 1) * H_out * W_out * in_channels * out_channels / 2

    opened by MarkusAmann 5
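
    For reference, a direct count under a simple cost model (an illustration, not THOP's official rule). Note that average pooling acts per channel, so there is no in_channels x out_channels cross term:

    # Per output element of 2-D average pooling: k*k - 1 additions plus 1 division.
    k, C, H_out, W_out = 3, 64, 112, 112          # illustrative shapes
    flops = (k * k - 1 + 1) * C * H_out * W_out   # additions + the division
    macs = flops / 2                              # halve if reporting MACs
    print(flops, macs)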
  • Different result of Resnet50

    Hi, thank you so much for your awesome work! I run the following code:

    from torchvision.models import resnet50
    from thop import profile
    model = resnet50()
    flops, params = profile(model, input_size=(1, 3, 224, 224))
    print(flops, params)

    But I get the output 4142713856.0 25557032.0, which looks different from your results in the README table. And I'm using PyTorch 1.0.1. Could you help me explain that? Thank you!

    opened by hellojialee 5
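
    For what it's worth, the raw numbers reported here appear to agree with the README table once units are applied:

    print(4142713856.0 / 1e9)  # ~4.14  G MACs, matching the table's resnet50 row
    print(25557032.0 / 1e6)    # ~25.56 M params, matching the table as well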
  • fix count_adap_avgpool

    Summary: torch.DoubleTensor(list((m.output_size,))) will raise an error when output_size contains None. Now, the output size is obtained from y directly.

    opened by ShoufaChen 4
  • Maybe a bug?

    I used the tool to count FLOPs for EXTD-pytorch. The FLOPs result is different from the one computed by EXTD-pytorch, which computes FLOPs with the following code:

    def compute_flops(model, image_size):
      import torch.nn as nn
      flops = 0.
      input_size = image_size
      for m in model.modules():
        if isinstance(m, nn.AvgPool2d) or isinstance(m, nn.MaxPool2d):
          input_size = input_size / 2.
        if isinstance(m, nn.Conv2d):
          if m.groups == 1:
            flop = (input_size[0] / m.stride[0] * input_size[1] / m.stride[1]) * m.kernel_size[0] ** 2 * m.in_channels * m.out_channels
          else:
            flop = (input_size[0] / m.stride[0] * input_size[1] / m.stride[1]) * m.kernel_size[0] ** 2 * ((m.in_channels/m.groups) * (m.out_channels/m.groups) * m.groups)
          flops += flop
          if m.stride[0] == 2: input_size = input_size / 2.
    
      return flops / 1000000000., flops / 1000000
    

    The result from OpCounter is 1.084 G, while the above code gives 11.15 G. The input size is 640x640.

    opened by li3cmz 4
  • AttributeError: 'torch.Size' object has no attribute 'numel'

    Traceback (most recent call last):
      File "/home/gpu/chen/IMIXNet-PyTorch/code/get_Netpara.py", line 21, in <module>
        print("Torch:", thop.profile(torch.nn.Conv2d(1, 128, (3, 3)), inputs=(torch.zeros((1, 1, 128, 128)),), verbose=False)[0])
      File "/home/gpu/.conda/envs/pytorch4/lib/python3.6/site-packages/thop/profile.py", line 92, in profile
        model(*inputs)
      File "/home/gpu/.conda/envs/pytorch4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 479, in __call__
        hook_result = hook(self, input, result)
      File "/home/gpu/.conda/envs/pytorch4/lib/python3.6/site-packages/thop/count_hooks.py", line 20, in count_convNd
        kernel_ops = m.weight.size()[2:].numel()  # Kw x Kh
    AttributeError: 'torch.Size' object has no attribute 'numel'

    What's the problem here? Is my version of thop wrong?

    opened by SuckChen 4
  • Missing hook functions for inherited layer

    Problem

    I have a layer inherited from Conv2d:

    import torch

    class KaimingInitConv2d(torch.nn.Conv2d):
      def reset_parameters(self):
        torch.nn.init.kaiming_normal_(self.weight.data, a=0.0, nonlinearity="leaky_relu")
        if self.bias is not None:
          self.bias.data.fill_(0.0)

    In thop, the add_hooks function can't find the inheritance relationship between KaimingInitConv2d and torch.nn.Conv2d, so it misses the MACs of every KaimingInitConv2d layer object.

    Possible solution

    Replace dict getitem

    def add_hooks(m):
        ...
        if m_type in custom_ops:  # if defined both op maps, use custom_ops to overwrite.
            fn = custom_ops[m_type]
        elif m_type in register_hooks:
            fn = register_hooks[m_type]
        ...
    

    With dict iteration:

    def add_hooks(m):
        ...
        for op_type, op_fn in register_hooks.items():
            if isinstance(m, op_type):
                fn = op_fn
                break
        for op_type, op_fn in custom_ops.items(): # if defined both op maps, use custom_ops to overwrite.
            if isinstance(m, op_type):
                fn = op_fn
                break
        ...
    

    I will send a PR as soon as available.

    opened by tuxzz 4
  • pip install failed, and 'torch.Size' object has no attribute 'numel'

    When I used pip install thop, I got

    Could not find a version that satisfies the requirement thop (from versions: )

    When I installed thop from source and used it on a resnet50 model, I got the following error message:


    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>()
    ----> 1 flops, params = profile(model, input_size=(1, 3, 224,224))

    <ipython-input> in profile(model, input_size, custom_ops, device)
         73     x = torch.zeros(input_size).to(device)
         74     with torch.no_grad():
    ---> 75         model(x)
         76
         77     total_ops = 0

    /anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        475             result = self._slow_forward(*input, **kwargs)
        476         else:
    --> 477             result = self.forward(*input, **kwargs)
        478         for hook in self._forward_hooks.values():
        479             hook_result = hook(self, input, result)

    /anaconda3/envs/pytorch/lib/python3.6/site-packages/torchvision/models/resnet.py in forward(self, x)
        137
        138     def forward(self, x):
    --> 139         x = self.conv1(x)
        140         x = self.bn1(x)
        141         x = self.relu(x)

    /anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        477             result = self.forward(*input, **kwargs)
        478         for hook in self._forward_hooks.values():
    --> 479             hook_result = hook(self, input, result)
        480         if hook_result is not None:
        481             raise RuntimeError(

    <ipython-input> in count_convNd(m, x, y)
         10     batch_size = x.size(0)
         11
    ---> 12     kernel_ops = m.weight.size()[2:].numel()
         13     bias_ops = 1 if m.bias is not None else 0
         14     ops_per_element = kernel_ops + bias_ops

    AttributeError: 'torch.Size' object has no attribute 'numel'

    opened by Kylin9511 4
  • Does the model need to be loaded everytime a profile is done?

    I have a model which needs to be profiled with inputs of different sizes. I tested the following example code in a Jupyter notebook, where the second run without reloading the model threw a warning and returned no result:

    import sys
    pyversion = sys.version.split(' ')[0]
    print('current python version: {:s}'.format(pyversion))
    
    >> current python version: 3.7.2
    
    import torch
    torchversion = torch.__version__
    print('current pytorch version: {:s}'.format(torch.__version__))
    
    import torchvision
    from torchvision import models
    print('current torchvision version: {:s}'.format(torchvision.__version__))
    
    >> current pytorch version: 1.1.0.post2
    >> current torchvision version: 0.3.0
    
    # VGG16
    vgg16 = models.vgg16()
    from thop import profile
    inp = torch.randn(1, 3, 224, 224, )
    profile(vgg16, inputs=(inp, ))
    
    >> .....(normal output, too long to list)
    
    inp = torch.randn(1, 3, 448, 448, )
    profile(vgg16, inputs=(inp, ))
    
    ---------------------------------------------------------------------------
    Warning                                   Traceback (most recent call last)
    <ipython-input-4-9bd5a4920d3c> in <module>
          1 inp = torch.randn(1, 3, 448, 448, )
    ----> 2 profile(vgg16, inputs=(inp, ))
    
    /usr/local/lib/python3.7/site-packages/thop/profile.py in profile(model, inputs, custom_ops, verbose)
         67 
         68     model.eval()
    ---> 69     model.apply(add_hooks)
         70 
         71     with torch.no_grad():
    
    /usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py in apply(self, fn)
        245         """
        246         for module in self.children():
    --> 247             module.apply(fn)
        248         fn(self)
        249         return self
    
    /usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py in apply(self, fn)
        245         """
        246         for module in self.children():
    --> 247             module.apply(fn)
        248         fn(self)
        249         return self
    
    /usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py in apply(self, fn)
        246         for module in self.children():
        247             module.apply(fn)
    --> 248         fn(self)
        249         return self
        250 
    
    /usr/local/lib/python3.7/site-packages/thop/profile.py in add_hooks(m)
         38         if hasattr(m, "total_ops") or hasattr(m, "total_params"):
         39             raise Warning("Either .total_ops or .total_params is already defined in %s.\n"
    ---> 40                           "Be careful, it might change your code's behavior." % str(m))
         41 
         42         m.register_buffer('total_ops', torch.zeros(1))
    
    Warning: Either .total_ops or .total_params is already defined in Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)).
    Be careful, it might change your code's behavior.
    

    Since I have a lot of inputs of different sizes (like images in a dataset), it is much slower if I reload the model every time an input comes in. Is this a PyTorch design feature or this repo's? Are there any workarounds? (One sketch follows below.)

    opened by fengyuentau 4
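
    One workaround sketch: profile a throwaway deep copy each time, so the hooks and the total_ops/total_params buffers never touch the original model:

    import copy
    import torch
    from torchvision import models
    from thop import profile

    vgg16 = models.vgg16()
    for size in (224, 448):
        inp = torch.randn(1, 3, size, size)
        macs, params = profile(copy.deepcopy(vgg16), inputs=(inp,))
        print(size, macs, params)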
  • Unexpected "Cannot find rule"

    Hi, I encountered an unexpected issue with inception_v3 model inside torchvision.

    Here is a minimal reproducible example (MRE):

    >>> import torchvision
    >>> import torch
    >>> from thop import profile                                                                                       
    >>> model = torchvision.models.inception_v3()                                                                                                                                                                                                                              
    >>> inputs = torch.Tensor(1,3,224,224)                                                                                                                                                                                                                                     
    >>> macs, params = profile(model, inputs=(inputs,))                                                                                                                                 
    [INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
    [INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.BasicConv2d'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionA'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionB'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionC'>. Treat it as zero Macs and zero Params.                                                                                                                                                      
    [INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionAux'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionD'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class 'torchvision.models.inception.InceptionE'>. Treat it as zero Macs and zero Params.                                                                                                                                                      
    [WARN] Cannot find rule for <class 'torchvision.models.inception.Inception3'>. Treat it as zero Macs and zero Params.
    >>> macs
    2847217792.0  # while it should be 5.75 G as indicated in README.md
    

    The model definition is as follows:

    class BasicConv2d(nn.Module):
        def __init__(self, in_channels, out_channels, **kwargs):
            super(BasicConv2d, self).__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
            self.bn = nn.BatchNorm2d(out_channels, eps=0.001)
    
        def forward(self, x):
            x = self.conv(x)
            x = self.bn(x)
            return F.relu(x, inplace=True)
    

    This should not happen, as BasicConv2d inherits from nn.Module and is thus "legal". I suppose this is a corner case that I have encountered? (See the input-size note below.)

    opened by MARMOTatZJU 4
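
    Two observations, offered as a likely explanation rather than a confirmed answer: the [WARN] lines are benign for pure container modules (their leaf Conv2d/BatchNorm2d children are still counted), and torchvision's inception_v3 is defined for 299x299 inputs, which is presumably what the README table uses:

    import torch
    import torchvision
    from thop import profile

    model = torchvision.models.inception_v3()
    macs, params = profile(model, inputs=(torch.randn(1, 3, 299, 299),))
    print(macs / 1e9)  # should land much closer to the table's 5.75 G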
  • Allow custom_ops to work with non-leaf modules

    Some complex modules have children and non-negligible FLOPs themselves. One example of this is nn.MultiheadAttention, which includes children Linear modules, but also significant computation internally.

    Unfortunately it's not possible to correctly define FLOPs for these modules because custom_ops is ignored for non-leaf modules.

    After this fix it's possible to define custom_ops = {nn.MultiheadAttention: count_mha} and have it work. (A usage sketch follows below.)

    opened by myleott 4
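
    A sketch of the usage enabled by this fix; count_mha below is a hypothetical cost model (projections plus attention matmuls), not a built-in THOP rule:

    import torch
    import torch.nn as nn
    from thop import profile

    def count_mha(m, x, y):
        q = x[0]                       # (seq, batch, embed) with batch_first=False
        L, E = q.shape[0], m.embed_dim
        ops = 4 * L * E * E            # q/k/v/out projections
        ops += 2 * L * L * E           # QK^T and the attention-weighted sum
        m.total_ops += torch.DoubleTensor([int(ops)])

    mha = nn.MultiheadAttention(embed_dim=64, num_heads=8)
    x = torch.randn(10, 1, 64)
    macs, params = profile(mha, inputs=(x, x, x),
                           custom_ops={nn.MultiheadAttention: count_mha})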
  • Does it work for LSTM layer?

    opened by zhipeng-fan 4
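
    Whether LSTM is handled out of the box depends on the THOP version; if not, a custom rule can be attached. A sketch under the standard four-gate cost model (an assumption, not a built-in rule):

    import torch
    import torch.nn as nn
    from thop import profile

    def count_lstm(m, x, y):
        inp = x[0]                              # (seq, batch, input_size)
        T, B = inp.shape[0], inp.shape[1]
        d_in, d_h = m.input_size, m.hidden_size
        macs_per_step = 4 * d_h * (d_in + d_h)  # the four gates' matmuls
        m.total_ops += torch.DoubleTensor([int(T * B * macs_per_step)])

    lstm = nn.LSTM(input_size=32, hidden_size=64)
    macs, params = profile(lstm, inputs=(torch.randn(10, 1, 32),),
                           custom_ops={nn.LSTM: count_lstm})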
  • Counting FLOPS during DropOut

    Firstly, I sincerely thank you for THOP, a much needed product for the community. However, I would like to know how to find the FLOPs when nodes are dropped. A follow-up question: if we freeze the layers, the forward propagation will still be executed, but the weights won't be updated during back propagation. How do we calculate FLOPs when layers are frozen?

    opened by Goutam-Kelam 4
  • About the Params(M) FLOPs(G) displayed in your result

    Hi,

    The tool you developed is very helpful.

    But I am a little confused about the unit used in your result.

    For example, here Params(M) should be the number of parameters, not the size?

    And what about FLOPs(G): is 1x10^9 FLOPs the total for a forward and backward pass of a model? The result you got is quite different from the TensorFlow profiler's result, so I am a little confused.

    opened by dujiangsu 3
  • How to calculate 3D CNN Flops

    How do I calculate the FLOPs of a 3D CNN? Thanks.

    opened by slighting666 3
  • GFLOPS

    Hi! When my convolution kernel values are only 1 and -1, or when my input values are only 1 and -1, will the GFLOPS of the convolution operation be smaller? Thanks!

    opened by InstantWindy 3
  • support multiple inputs?

    My model has two inputs. How can I use the code for a multiple-input model? Thanks. (A sketch follows below.)

    opened by xuxy09 3
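
    profile takes a tuple of inputs that is unpacked into forward, so a two-input model can be profiled directly. A minimal sketch with a toy two-input module:

    import torch
    import torch.nn as nn
    from thop import profile

    class TwoInput(nn.Module):  # toy two-input model for illustration
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(16, 4)
        def forward(self, a, b):
            return self.fc(a + b)

    model = TwoInput()
    a, b = torch.randn(1, 16), torch.randn(1, 16)
    macs, params = profile(model, inputs=(a, b))  # one tuple entry per forward arg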
  • flops for depthwise convolution layer

    Hi, first of all, thank you for your code; it really helps my work. I am curious about the FLOPs of a depthwise convolutional layer, which should differ from a standard conv. It seems there is no implementation for such a layer in your code. Could you please add support for it? Thanks! (A groups-aware count is sketched below.)

    opened by pgr2015 3
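
    For reference, a grouped convolution sees only C_in/groups input channels per output element; depthwise convolution is the special case groups == C_in == C_out. Whether a given THOP version applies this is version-dependent, so the count below is a sketch:

    # MACs for a conv with groups; depthwise is groups == C_in == C_out.
    k, C_in, C_out, H_out, W_out, groups = 3, 64, 64, 56, 56, 64
    macs = H_out * W_out * C_out * (C_in // groups) * k * k
    print(macs)  # 64x smaller than the groups=1 count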
  • About the maxpooling

    May I ask why you disable all the max-pooling parts of your code?

    https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/profile.py#L23-L28

    https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/count_hooks.py#L111-L125

    opened by Kylin9511 3
  • clever_format is not exposed yet

    The original output of profile is hard to read, and it seems that you added a formatting tool in utils:

    https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/utils.py#L2-L8

    However, the clever_format tool is not included in __init__, so it cannot be called conveniently. My recommendation is to add a clever_format flag to your profile arguments and make it True by default.

    https://github.com/Lyken17/pytorch-OpCounter/blob/1f4ddb7fb51c3b1a49d60708f9b857535e5dc4e1/thop/profile.py#L42-L43

    opened by Kylin9511 3
  • kwargs instead of args

    Using kwargs seems better than the current usage of args.

    opened by naviak 0
  • Different results when reusing the defined nn.LeakyReLU()

    Thank you for sharing the excellent tools.

    I ran into a problem when calculating the MACs of my model, in which a single defined nn.LeakyReLU() is reused. An example is as follows:

    import torch
    import torch.nn as nn
    from thop import profile
    
    class MyModel1(nn.Module):
        def __init__(self):
            super(MyModel1, self).__init__()
            act = nn.LeakyReLU()
            self.m1 = nn.Sequential(
                nn.Conv2d(32, 32, 3, 1, 1),
                act)
            self.m2 = nn.Sequential(
                nn.Conv2d(32, 32, 3, 1, 1),
                act)
    
        def forward(self, x):
            x = self.m1(x)
            x = self.m2(x)
            return x
    
    class MyModel2(nn.Module):
        def __init__(self):
            super(MyModel2, self).__init__()
            self.m1 = nn.Sequential(
                nn.Conv2d(32, 32, 3, 1, 1),
                nn.LeakyReLU())
            self.m2 = nn.Sequential(
                nn.Conv2d(32, 32, 3, 1, 1),
                nn.LeakyReLU())
    
        def forward(self, x):
            x = self.m1(x)
            x = self.m2(x)
            return x
    
    if __name__ == '__main__':
        model = MyModel1()
        input = torch.randn(1, 32, 256, 256)
        macs, params = profile(model, inputs=(input,))
        print('Model1: ', macs, params)
        # Output: Model1:  1228931072.0 18496.0
    
        model = MyModel2()
        input = torch.randn(1, 32, 256, 256)
        macs, params = profile(model, inputs=(input,))
        print('Model2: ', macs, params)
        # Output: Model2:  1216348160.0 18496.0
    

    For these two models, the operations are the same, but the calculated MACs differ. Is something wrong?

    Looking forward to your reply.

    opened by csbhr 0
  • different results when counting nn.BatchNorm1d in nn.Sequential

    import torch
    import torch.nn as nn
    from thop import profile, clever_format

    class test_model1(nn.Module):
        def __init__(self):
            super(test_model1, self).__init__()
            self.block = nn.Sequential(nn.BatchNorm1d(64))
    
        def forward(self, x):
            x = self.block(x)
            return x
    
    
    class test_model2(nn.Module):
        def __init__(self):
            super(test_model2, self).__init__()
            self.bn = nn.BatchNorm1d(64)
            self.block = nn.Sequential(
                self.bn,
            )
    
        def forward(self, x):
            x = self.block(x)
            return x
    
    
    if __name__ == "__main__":
    
        data = torch.rand(1, 64, 1)
    
        model = test_model1().eval()
        macs, params = profile(model, inputs=(data,))
        macs, params = clever_format([macs, params], "%.3f")
        print(f"model1: MACs {macs} Params {params}")
    
        model2 = test_model2().eval()
        macs, params = profile(model2, inputs=(data,))
        macs, params = clever_format([macs, params], "%.3f")
        print(f"model2: MACs {macs} Params {params}")
    
    
    

    then get

    [INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm1d'>.
    [WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class '__main__.test_model1'>. Treat it as zero Macs and zero Params.
    model1: MACs 128.000B Params 128.000B
    [INFO] Register count_bn() for <class 'torch.nn.modules.batchnorm.BatchNorm1d'>.
    [WARN] Cannot find rule for <class 'torch.nn.modules.container.Sequential'>. Treat it as zero Macs and zero Params.
    [WARN] Cannot find rule for <class '__main__.test_model2'>. Treat it as zero Macs and zero Params.
    model2: MACs 512.000B Params 256.000B

    Please check it.

    opened by razIove 0
  • thop: __floordiv__ is deprecated

    E:\Program Files\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\thop\vision\basic_hooks.py:92: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
      kernel = torch.DoubleTensor([*(x[0].shape[2:])]) // torch.DoubleTensor(list((m.output_size,))).squeeze()

    opened by liguge 1
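
    The replacement the warning itself recommends, sketched on the expression from basic_hooks.py (illustrative tensors):

    import torch

    a = torch.DoubleTensor([14.0, 14.0])
    b = torch.DoubleTensor([7.0, 7.0])
    kernel = torch.div(a, b, rounding_mode='floor')  # instead of a // b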
  • Check params and flops for custom densenet model

    I am trying to obtain the FLOPs and parameters of my custom multi-head DenseNet model, but I am not sure whether the returned values are accurate after receiving these warnings. One head learns RGB images and the second head learns depth images. Do I need to define a custom rule for the dense layer/dense block? If so, how can I go about doing it? Any advice or help would be appreciated, as I am a beginner in this field.

    Here is my model definition :

      import numpy as np
      import torch
      import torch.nn as nn
      from torchvision import models

      class RGBDMH(nn.Module):
          """Two-stream RGBD architecture

          Attributes
          ----------
          pretrained: bool
              If set to `True` uses the pretrained DenseNet model as the base. If set to `False`, the network
              will be trained from scratch.
              default: True
          num_channels: int
              Number of channels in the input.
          """

          def __init__(self, pretrained=True, num_channels=4):
      
              """ Init function
      
              Parameters
              ----------
              pretrained: bool
                  If set to `True` uses the pretrained densenet model as the base. Else, it uses the default network
                  default: True
              num_channels: int
                  Number of channels in the input. 
              """
              super(RGBDMH, self).__init__()
      
              dense_rgb = models.densenet161(pretrained=pretrained)
      
              dense_d = models.densenet161(pretrained=pretrained)
      
              features_rgb = list(dense_rgb.features.children())
      
              features_d = list(dense_d.features.children())
      
              temp_layer = features_d[0]
      
              mean_weight = np.mean(temp_layer.weight.data.detach().numpy(),axis=1) # for 96 filters
      
              new_weight = np.zeros((96,1,7,7))
        
              for i in range(1):
                  new_weight[:,i,:,:]=mean_weight
      
              features_d[0]=nn.Conv2d(1, 96, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      
              features_d[0].weight.data = torch.Tensor(new_weight)
      
              self.enc_rgb = nn.Sequential(*features_rgb[0:8])
      
              self.enc_d = nn.Sequential(*features_d[0:8])
      
              self.linear=nn.Linear(768,1)
      
              self.linear_rgb=nn.Linear(384,1)
      
              self.linear_d=nn.Linear(384,1)
      
              self.gavg_pool=nn.AdaptiveAvgPool2d(1)
              #import pdb; pdb.set_trace()
      
      
          def forward(self, img):
              """ Propagate data through the network
      
              Parameters
              ----------
              img: :py:class:`torch.Tensor` 
                The data to forward through the network. Expects Multi-channel images of size num_channelsx224x224
      
              Returns
              -------
              dec: :py:class:`torch.Tensor` 
                  Binary map of size 1x14x14
              op: :py:class:`torch.Tensor`
                  Final binary score.  
              gap: Gobal averaged pooling from the encoded feature maps
      
              """
      
              x_rgb = img[:, [0,1,2], :, :]
      
              x_depth = img[:, 3, :, :].unsqueeze(1)
      
              enc_rgb = self.enc_rgb(x_rgb)
      
              enc_d = self.enc_d(x_depth)
      
      
              gap_rgb = self.gavg_pool(enc_rgb).squeeze() 
              gap_d = self.gavg_pool(enc_d).squeeze() 
      
              gap_d=gap_d.view(-1,384)
      
              gap_rgb=gap_rgb.view(-1,384)
              
              gap_rgb = nn.Sigmoid()(gap_rgb) 
              gap_d = nn.Sigmoid()(gap_d) 
      
              op_rgb=self.linear_rgb(gap_rgb)
      
              op_d=self.linear_d(gap_d)
      
      
              op_rgb = nn.Sigmoid()(op_rgb)
      
              op_d = nn.Sigmoid()(op_d)
      
              gap=torch.cat([gap_rgb,gap_d], dim=1)
              op = self.linear(gap)
              op = nn.Sigmoid()(op)
       
              return gap, op, op_rgb, op_d
    
    opened by Speedarion 0
  • Update basic_hooks.py to avoid `//` warning

    To avoid the warning message raised when using the // (floordiv) operator:

    UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
      kernel = torch.DoubleTensor([*(x[0].shape[2:])]) // torch.DoubleTensor(

    opened by xiaoyuan0203 0
  • where is main function?

    opened by sunhufei 1
  • Remove redundant code.

    The code block has been moved into counter.py; remove it from basic_hooks.py.

    opened by lijunzh 0
  • Different op types / weighted operations count

    Hey, an exciting project you have here; looking forward to using it!

    A question: I'd be interested in a more detailed estimation of complexity, in particular one where different non-linearities carry different weights. Typically, for example, a log or exp operation performed on a CPU is 25 times more expensive than a regular multiply-accumulate (MAC). So, basically, I'm thinking there could be a configuration file for the profiler, with a table stating the proportional complexity of different non-linearities. Such weighting would make the profiler output better reflect the true cost of execution. Alternatively, the profiler could optionally output a breakdown of the operation types that were used. (A small sketch follows below.)

    So, what do you think?

    For a short example, see Table 1 in my wiki at https://wiki.aalto.fi/display/ITSP/Other+performance+measures The more detailed information which I use can be found on page 259 of https://www.itu.int/rec/T-REC-G.191-200911-S/en a newer version of that is available on page 277 of file STLmanual.pdf from https://www.itu.int/rec/T-REC-G.191-201901-I/en

    I'm aware that for best accuracy, we should also include for-loops and if-statements in the counting, but I would assume that they have less impact in big models. I'm primarily interested on the nonlinear operations, since in my experience they have a large contribution to the overall complexity in the models I use.

    If this has wider interest, then I can contribute something to it, but I don't have time to do it myself completely.

    cheers,

    Tom https://research.aalto.fi/en/persons/tom-b%C3%A4ckstr%C3%B6m

    opened by tombackstrom 3
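
    A sketch of the weighting idea (the relative costs and counts below are placeholders, not the ITU-T figures):

    REL_COST = {"mac": 1.0, "exp": 25.0, "log": 25.0}  # hypothetical cost table
    counts = {"mac": 1.2e9, "exp": 3.0e6}              # hypothetical per-op counts
    weighted = sum(REL_COST[op] * n for op, n in counts.items())
    print(weighted)                                    # weighted complexity estimate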