NeuroSurgeon.Masking package

Submodules

Mask Layer

class NeuroSurgeon.Masking.mask_layer.MaskLayer(*args: Any, **kwargs: Any)

Bases: Module

This is an abstract class that defines the minimum functionality of a mask layer. All mask layers inherit from this class.

Parameters:
  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.
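The ablation options above can be illustrated with a small pure-Python sketch. It is not NeuroSurgeon's implementation: `make_mask` is a hypothetical helper, masks are flat lists of 0/1, and "same size" is assumed to mean "same number of nonzero entries". The weight reinitialization performed by random_ablate is not shown.

```python
import random


def make_mask(standard_mask, ablation="none", rng=None):
    """Derive a binary mask from `standard_mask` (hypothetical helper)."""
    rng = rng or random.Random(0)
    n = len(standard_mask)
    k = sum(standard_mask)  # number of entries kept by the standard mask
    if ablation == "none":
        return list(standard_mask)
    if ablation in ("zero_ablate", "random_ablate"):
        # Invert the mask so the discovered subnetwork is removed.
        # (random_ablate would additionally reinitialize the removed weights.)
        return [1 - m for m in standard_mask]
    if ablation == "randomly_sampled":
        # Same number of ones, drawn from all positions.
        chosen = set(rng.sample(range(n), k))
    elif ablation == "complement_sampled":
        # Same number of ones, drawn only where the standard mask is zero.
        complement = [i for i, m in enumerate(standard_mask) if m == 0]
        chosen = set(rng.sample(complement, k))
    else:
        raise ValueError(f"unknown ablation: {ablation}")
    return [1 if i in chosen else 0 for i in range(n)]


standard = [1, 0, 0, 1, 0, 0, 0, 0]
inverted = make_mask(standard, "zero_ablate")
random_mask = make_mask(standard, "randomly_sampled")
complement_mask = make_mask(standard, "complement_sampled")
```

In this sketch the complement-sampled mask never overlaps the standard mask, which is what makes it usable as a size-matched control.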

property ablation
property mask_bias
property mask_unit
property use_masks
train(train_bool)
calculate_l0()

Returns the L0 norm of the mask. This is used for L0 regularization and for reporting on mask size.

Returns:

The total L0 norm of the mask

Return type:

float

calculate_max_l0()

Returns the maximum L0 norm of the mask (i.e. the number of prunable weights/neurons in the layer).

Returns:

The maximum L0 norm of the mask

Return type:

float
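For a hard binary mask, the two quantities can be sketched as follows. The function names are hypothetical and the mask is a flat Python list; the library's own methods operate on the layer's mask parameters.

```python
def l0_norm(mask):
    # Number of nonzero (unpruned) entries in the mask.
    return float(sum(1 for m in mask if m != 0))


def max_l0_norm(mask):
    # Upper bound: every prunable entry is kept.
    return float(len(mask))


mask = [1, 0, 1, 1, 0, 0]
l0 = l0_norm(mask)
max_l0 = max_l0_norm(mask)
density = l0 / max_l0  # fraction of entries left unpruned
```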

abstract reset_parameters()
abstract forward()
abstract from_layer()

Continuous Sparsification

class NeuroSurgeon.Masking.contsparse_layer.ContSparseLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Continuous Sparsification layers. Continuous Sparsification was introduced in Savarese et al. 2021 (https://arxiv.org/abs/1912.04427). It provides a deterministic approximation to the L0 penalty (in contrast to the stochastic approach implemented by the Hard Concrete layer). Continuous Sparsification introduces mask parameters, which are multiplied by a temperature parameter and then passed through a sigmoid to create a soft mask. To train a Continuous Sparsification layer, one must anneal the temperature parameter, increasing it after every epoch, so that the soft mask approaches a hard mask by the end of training. At eval time, the mask is explicitly binarized: all positive mask parameters map to 1, and all negative mask parameters map to 0.
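A minimal sketch of the soft and hard masks described above, in plain Python. The function names are hypothetical, and mapping a mask parameter of exactly zero to 0 is an assumption, since the text only covers positive and negative values.

```python
import math


def soft_mask(mask_params, temperature):
    # Training-time mask: sigmoid(temperature * mask_parameter).
    return [1.0 / (1.0 + math.exp(-temperature * p)) for p in mask_params]


def hard_mask(mask_params):
    # Eval-time mask: positive parameters map to 1, the rest to 0.
    return [1.0 if p > 0 else 0.0 for p in mask_params]


params = [-0.5, 0.1, 2.0]
early = soft_mask(params, temperature=1.0)    # still far from binary
late = soft_mask(params, temperature=200.0)   # nearly binary
binary = hard_mask(params)
```

As the temperature grows, the soft mask converges to the binary mask used at eval time, which is why annealing is needed during training.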

The easiest way to anneal the temperature parameter is with a callback. Here is an example:

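class TemperatureCallback:
    def __init__(self, total_epochs, final_temp):
        # Per-epoch multiplicative factor: a temperature that starts at 1.0
        # reaches final_temp after total_epochs updates.
        self.temp_increase = final_temp ** (1.0 / total_epochs)

    def update(self, model):
        # Call once at the end of every epoch.
        temp = model.temperature
        model.temperature = temp * self.temp_increase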
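The following sketch shows how such a callback might be driven from a training loop. `DummyModel` is a hypothetical stand-in that exposes only a `temperature` attribute, and the starting temperature of 1.0 is an assumption; with that starting value the schedule ends at `final_temp`.

```python
class TemperatureCallback:
    def __init__(self, total_epochs, final_temp):
        self.temp_increase = final_temp ** (1.0 / total_epochs)

    def update(self, model):
        model.temperature = model.temperature * self.temp_increase


class DummyModel:
    """Hypothetical stand-in for a model exposing `temperature`."""

    def __init__(self):
        self.temperature = 1.0  # assumed initial temperature


total_epochs = 10
model = DummyModel()
callback = TemperatureCallback(total_epochs=total_epochs, final_temp=200.0)

history = []
for epoch in range(total_epochs):
    # ... one epoch of training would run here ...
    callback.update(model)
    history.append(model.temperature)
```

The schedule is geometric, so the temperature grows by the same factor every epoch rather than by the same amount.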

Parameters:
  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.
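The difference between the two mask_unit settings can be sketched as below. This is an illustration under assumptions, not the library's code: `apply_mask` is a hypothetical helper, the weight matrix is a list of rows with one row per output neuron, and a neuron-level mask is assumed to hold one entry per output neuron.

```python
def apply_mask(weight, mask, mask_unit):
    """Apply a binary mask to a weight matrix given as a list of rows."""
    if mask_unit == "weight":
        # One mask entry per individual weight (same shape as `weight`).
        return [[w * m for w, m in zip(row, mask_row)]
                for row, mask_row in zip(weight, mask)]
    if mask_unit == "neuron":
        # One mask entry per output neuron: a whole row is kept or removed.
        return [[w * m for w in row] for row, m in zip(weight, mask)]
    raise ValueError(f"unknown mask_unit: {mask_unit}")


weight = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
weight_level = apply_mask(weight, [[1, 0, 1], [0, 1, 0]], "weight")
neuron_level = apply_mask(weight, [1, 0], "neuron")
```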

property mask_init_value
property temperature

A float that is multiplied by the mask parameters before they are passed into a sigmoid function for creating a soft mask.

Returns:

The temperature parameter

Return type:

float

property force_resample

A boolean value that determines whether a new mask is sampled every time a mask is created with ablation = “randomly_sampled” or ablation = “complement_sampled”. If False, a mask is sampled once and that same mask is returned on every subsequent call.

Returns:

A boolean determining whether masks are resampled when using ablation = “randomly_sampled” or “complement_sampled”

Return type:

bool
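The caching behaviour that force_resample controls can be pictured with the sketch below. `SampledMask` is a hypothetical class written for illustration; it only mimics the "sample once versus sample on every call" distinction and makes no claim about how the library stores its masks.

```python
import random


class SampledMask:
    """Hypothetical helper: caches a random mask unless force_resample is set."""

    def __init__(self, size, n_ones, force_resample=False, seed=0):
        self.size = size
        self.n_ones = n_ones
        self.force_resample = force_resample
        self._rng = random.Random(seed)
        self._cached = None

    def get(self):
        # Sample on the first call, and on every call if force_resample is True.
        if self.force_resample or self._cached is None:
            chosen = set(self._rng.sample(range(self.size), self.n_ones))
            self._cached = [1 if i in chosen else 0 for i in range(self.size)]
        return self._cached


fixed = SampledMask(size=1000, n_ones=500)
first, second = fixed.get(), fixed.get()      # same cached mask

fresh = SampledMask(size=1000, n_ones=500, force_resample=True)
mask_a = fresh.get()
mask_b = fresh.get()                          # newly sampled mask
```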

class NeuroSurgeon.Masking.contsparse_layer.ContSparseLinear(*args: Any, **kwargs: Any)

Bases: ContSparseLayer

A Linear Layer that implements Continuous Sparsification.

Parameters:
  • in_features (int) – Size of each input sample

  • out_features (int) – Size of each output sample

  • bias (bool) – If set to False, the layer will not learn an additive bias. Default: True

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Creates a ContSparseLinear layer from a nn.Linear layer.

Parameters:
  • layer (nn.Linear) – An instance of a nn.Linear layer.

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Linear layer with the same weights as the layer argument

Return type:

ContSparseLinear

reset_parameters()

Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters:

data (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.contsparse_layer.ContSparseGPTConv1D(*args: Any, **kwargs: Any)

Bases: ContSparseLayer

A GPT-style Conv1D Layer that implements Continuous Sparsification.

Parameters:
  • nf (int) – Number of output features

  • nx (int) – Number of input features

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Creates a ContSparseGPTConv1D layer from a Conv1D layer.

Parameters:
  • layer (Conv1D) – An instance of a Conv1D layer from pytorch_utils

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification GPTConv1D layer with the same weights as the layer argument

Return type:

ContSparseGPTConv1D

forward(x)

Performs a forward pass

Parameters:

x (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.contsparse_layer.ContSparseConv2d(*args: Any, **kwargs: Any)

Bases: _ContSparseConv

A Conv2d layer that implements continuous sparsification

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to all four sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Create a ContSparseConv2d layer from a nn.Conv2d layer

Parameters:
  • layer (nn.Conv2d) – A nn.Conv2d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Conv2d layer with the same weights as the layer argument

Return type:

ContSparseConv2d

class NeuroSurgeon.Masking.contsparse_layer.ContSparseConv1d(*args: Any, **kwargs: Any)

Bases: _ContSparseConv

A Conv1d layer that implements continuous sparsification

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to both sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Conv1d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Create a ContSparseConv1d layer from a nn.Conv1d layer

Parameters:
  • layer (nn.Conv1d) – A nn.Conv1d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Conv1d layer with the same weights as the layer argument

Return type:

ContSparseConv1d

Hard Concrete Masking

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Hard Concrete layers. Hard Concrete Masking was introduced in Louizos et al. 2018 (https://arxiv.org/abs/1712.01312). It introduces a stochastic approximation to the L0 penalty.

Parameters:
  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.

  • mask_init_percentage (float) – Determines the approximate fraction of parameters left unpruned if one creates a hard mask by sampling from the hard concrete distribution and binarizing

  • left_stretch (float) – Determines how much the binary concrete distribution is stretched to give more mass to 0.0

  • right_stretch (float) – Determines how much the binary concrete distribution is stretched to give more mass to 1.0

  • temperature (float) – Determines the sampling temperature of the binary concrete distribution
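How the stretch and temperature parameters interact can be sketched by sampling from the hard concrete distribution as defined in Louizos et al. 2018. The default values below (temperature 2/3, stretch limits -0.1 and 1.1) are the ones suggested in that paper and are assumptions here, as is the mapping of `left_stretch` and `right_stretch` onto the paper's stretch limits; `log_alpha` is a hypothetical name for the learned location parameter.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hard_concrete(log_alpha, temperature=2.0 / 3.0,
                         left_stretch=-0.1, right_stretch=1.1, rng=random):
    # Uniform noise, clamped away from 0 and 1 so the logs stay finite.
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
    # Binary concrete sample in (0, 1).
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / temperature)
    # Stretch to (left_stretch, right_stretch), then clip to [0, 1].
    stretched = s * (right_stretch - left_stretch) + left_stretch
    return min(1.0, max(0.0, stretched))


rng = random.Random(0)
samples = [sample_hard_concrete(0.0, rng=rng) for _ in range(1000)]
```

Because of the stretch-and-clip step, a nonzero share of the samples is exactly 0.0 or exactly 1.0, which is what lets the mask become truly sparse.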

property mask_init_percentage
property force_resample
calculate_l0()

Returns the L0 norm of the mask. This is used for L0 regularization and for reporting on mask size. This function overrides the default behavior (defined in MaskLayer) in order to provide the regularization term given in Louizos et al. 2018 during training.

Returns:

The L0 norm of the mask

Return type:

float
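The regularization term from Louizos et al. 2018 is the expected number of nonzero gates, where each gate is nonzero with probability sigmoid(log_alpha - temperature * log(-left_stretch / right_stretch)). The sketch below computes it in plain Python; the function and argument names are hypothetical, and the default values are taken from the paper rather than from this library.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_l0(log_alphas, temperature=2.0 / 3.0,
                left_stretch=-0.1, right_stretch=1.1):
    # Probability that each gate is nonzero, summed over all gates.
    shift = temperature * math.log(-left_stretch / right_stretch)
    return sum(sigmoid(log_alpha - shift) for log_alpha in log_alphas)


single_gate = expected_l0([0.0])          # one gate with log_alpha = 0
two_gates = expected_l0([50.0, -50.0])    # one almost surely on, one off
```

Unlike a count of nonzero entries, this quantity is differentiable in the gate parameters, so it can be added to a training loss.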

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteLinear(*args: Any, **kwargs: Any)

Bases: HardConcreteLayer

A Linear Layer that implements Hard Concrete Masking.

Parameters:
  • in_features (int) – Size of each input sample

  • out_features (int) – Size of each output sample

  • bias (bool) – If set to False, the layer will not learn an additive bias. Default: True

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Creates a HardConcreteLinear layer from a nn.Linear layer.

Parameters:
  • layer (nn.Linear) – An instance of a nn.Linear layer.

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Linear layer with the same weights as the layer argument

Return type:

HardConcreteLinear

reset_parameters()

Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters:

data (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteGPTConv1D(*args: Any, **kwargs: Any)

Bases: HardConcreteLayer

A GPT-style Conv1D Layer that implements Hard Concrete Masking.

Parameters:
  • nf (int) – Number of output features

  • nx (int) – Number of input features

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Creates a HardConcreteGPTConv1D layer from a Conv1D layer.

Parameters:
  • layer (Conv1D) – An instance of a Conv1D layer from pytorch_utils

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete GPTConv1D layer with the same weights as the layer argument

Return type:

HardConcreteGPTConv1D

forward(x)

Performs a forward pass

Parameters:

x (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteConv2d(*args: Any, **kwargs: Any)

Bases: _HardConcreteConv

A Conv2d layer that implements Hard Concrete Masking.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to all four sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Create a HardConcreteConv2d layer from a nn.Conv2d layer

Parameters:
  • layer (nn.Conv2d) – A nn.Conv2d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Conv2d layer with the same weights as the layer argument

Return type:

HardConcreteConv2d

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteConv1d(*args: Any, **kwargs: Any)

Bases: _HardConcreteConv

A Conv1d layer that implements Hard Concrete Masking.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to both sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Conv1d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Create a HardConcreteConv1d layer from a nn.Conv1d layer

Parameters:
  • layer (nn.Conv1d) – A nn.Conv1d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.

  • mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: True

  • mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Conv1d layer with the same weights as the layer argument

Return type:

HardConcreteConv1d
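
The mask_unit option distinguishes weight-level from neuron-level masks. The sketch below shows the difference on a small weight matrix in plain Python. It assumes that a "neuron" corresponds to one output unit, that is, one row of an (out, in) weight matrix. The helper names are hypothetical and this is not the library's code.

```python
def apply_weight_mask(weight, mask):
    """Weight-level masking: one mask entry per individual weight."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weight, mask)]


def apply_neuron_mask(weight, mask):
    """Neuron-level masking: one mask entry per output unit (row)."""
    return [[w * m for w in w_row] for w_row, m in zip(weight, mask)]


# Two output units, three inputs each.
weight = [[0.5, -1.0, 2.0],
          [1.5, 0.25, -0.75]]

weight_mask = [[1, 0, 1],
               [0, 1, 1]]   # same shape as the weight matrix
neuron_mask = [1, 0]        # one entry per output unit

print(apply_weight_mask(weight, weight_mask))
print(apply_neuron_mask(weight, neuron_mask))
```

A neuron-level mask has far fewer entries than a weight-level mask, so it removes whole units at once rather than individual connections.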

Magnitude Pruning

class NeuroSurgeon.Masking.magprune_layer.MagPruneLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Magnitude Pruning layers. Magnitude pruning is not a differentiable masking strategy: it deterministically prunes the N% of weights with the lowest magnitude, where N is set by prune_percentage. Masking always occurs at the weight level. This pruning strategy can serve as a baseline against which to compare gradient-based masking strategies (i.e. Continuous Sparsification or Hard Concrete Masking). It is a common strategy in model pruning, notably in work analyzing the Lottery Ticket Hypothesis (https://arxiv.org/abs/1803.03635).
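
As a rough illustration of the idea, the following plain-Python sketch builds a magnitude-pruning mask for a flat list of weights. The function name magnitude_mask is hypothetical, prune_percentage is treated as a fraction between 0 and 1 (consistent with the documented default of 0.2), and the truncating rounding of the pruned count is an assumption rather than the library's documented behavior.

```python
def magnitude_mask(weights, prune_percentage):
    """Return a 0/1 mask that removes the lowest-magnitude fraction of weights."""
    n_prune = int(len(weights) * prune_percentage)
    # Indices ordered by absolute value, smallest magnitude first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.9, -0.05, 0.4, -1.2, 0.01, 0.3, -0.7, 0.2, 0.6, -0.15]
mask = magnitude_mask(weights, prune_percentage=0.2)

# Applying the mask zeroes the two smallest-magnitude weights.
masked_weights = [w * m for w, m in zip(weights, mask)]
print(mask)
print(masked_weights)
```

Because the mask depends only on the current weight magnitudes, no mask parameters are trained, which is what makes this a convenient baseline.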

Parameters:
  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.

  • prune_percentage (float) – Determines the percentage of weights to prune

property prune_percentage
property force_resample

class NeuroSurgeon.Masking.magprune_layer.MagPruneLinear(*args: Any, **kwargs: Any)

Bases: MagPruneLayer

A Linear Layer that implements Magnitude Pruning.

Parameters:
  • in_features (int) – Size of each input sample

  • out_features (int) – Size of each output sample

  • bias (bool) – If set to False, the layer will not learn an additive bias. Default: True

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneLinear layer from an nn.Linear layer.

Parameters:
  • layer (nn.Linear) – An instance of an nn.Linear layer.

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Linear layer with the same weights as the layer argument

Return type:

MagPruneLinear

reset_parameters()

Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters:

data (torch.Tensor) – Input tensors

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.magprune_layer.MagPruneGPTConv1D(*args: Any, **kwargs: Any)

Bases: MagPruneLayer

A GPT-style Conv1D Layer that implements Magnitude Pruning.

Parameters:
  • nf (int) – Number of output features

  • nx (int) – Number of input features

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2
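
For readers unfamiliar with the GPT-style Conv1D from Hugging Face transformers: it behaves like a linear layer whose weight is stored transposed, with shape (nx, nf). The plain-Python sketch below shows a masked forward pass under that layout. The function name gpt_conv1d_forward is hypothetical, the mask values are chosen by hand for illustration, and the sketch leaves out any bias term.

```python
def gpt_conv1d_forward(x, weight, mask):
    """Compute y = x @ (weight * mask) for a weight stored as (nx, nf)."""
    nx, nf = len(weight), len(weight[0])
    # Elementwise masking of the weight matrix.
    masked = [[w * m for w, m in zip(w_row, m_row)]
              for w_row, m_row in zip(weight, mask)]
    # Each output feature j sums over the nx input features.
    return [sum(x[i] * masked[i][j] for i in range(nx)) for j in range(nf)]


# nx = 2 input features, nf = 3 output features.
weight = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
mask = [[1, 0, 1],
        [1, 1, 0]]
x = [1.0, 1.0]

y = gpt_conv1d_forward(x, weight, mask)
print(y)
```

Note that the rows of the weight index input features here, the opposite of nn.Linear, which stores its weight as (out_features, in_features).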

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneGPTConv1D layer from a Conv1D layer.

Parameters:
  • layer (Conv1D) – An instance of a transformers.pytorch_utils.Conv1D layer.

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning GPTConv1D layer with the same weights as the layer argument

Return type:

MagPruneGPTConv1D

forward(x)

Performs a forward pass

Parameters:

x (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

class NeuroSurgeon.Masking.magprune_layer.MagPruneConv2d(*args: Any, **kwargs: Any)

Bases: _MagPruneConv

A Conv2d layer that implements Magnitude Pruning.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to all four sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneConv2d layer from an nn.Conv2d layer.

Parameters:
  • layer (nn.Conv2d) – An nn.Conv2d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Conv2d layer with the same weights as the layer argument

Return type:

MagPruneConv2d

class NeuroSurgeon.Masking.magprune_layer.MagPruneConv1d(*args: Any, **kwargs: Any)

Bases: _MagPruneConv

A Conv1d layer that implements Magnitude Pruning.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • padding (int) – Padding added to both sides of the input. Default: 0

  • stride (int) – Stride of the convolution. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool) – If True, adds a learnable bias to the output. Default: True

  • padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Conv1d, ablation, mask_bias, prune_percentage)

Creates a MagPruneConv1d layer from an nn.Conv1d layer.

Parameters:
  • layer (nn.Conv1d) – An nn.Conv1d layer

  • ablation (str) –

    A string that determines how masks are produced from the mask layer parameters. Valid options include:

    • none: Producing a standard binary mask

    • zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.

    • random_ablate: Inverting the standard binary mask and reinitializing the zeroed elements. Used for pruning discovered subnetworks.

    • randomly_sampled: Sampling a random binary mask of the same size as the standard mask.

    • complement_sampled: Sampling a random binary mask of the same size as the standard mask, drawn only from entries outside the standard mask.

  • mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False

  • prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Conv1d layer with the same weights as the layer argument

Return type:

MagPruneConv1d

Module contents