NeuroSurgeon.Masking package

Submodules

`Mask Layer`

class NeuroSurgeon.Masking.mask_layer.MaskLayer(*args: Any, **kwargs: Any)

Bases: Module

This is an abstract class that defines the minimum functionality of a mask layer. All mask layers inherit from this class.

Parameters:

ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.
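
As a rough illustration of the five ablation options, the sketch below derives each variant from a flat binary mask stored as a Python list. The helper name `ablate_mask` and the list representation are assumptions made for illustration and are not NeuroSurgeon's implementation. For random_ablate only the mask is shown; the reinitialization of the removed weights is not.

```python
import random

def ablate_mask(mask, ablation, rng=None):
    """Return a variant of a flat 0/1 mask for the given ablation mode.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random(0)
    size = sum(mask)  # number of entries kept by the standard mask
    if ablation == "none":
        return list(mask)
    if ablation in ("zero_ablate", "random_ablate"):
        # Invert the mask; random_ablate would additionally reinitialize
        # the weights removed by the inverted mask (not shown here).
        return [1 - m for m in mask]
    if ablation == "randomly_sampled":
        candidates = list(range(len(mask)))
    elif ablation == "complement_sampled":
        candidates = [i for i, m in enumerate(mask) if m == 0]
    else:
        raise ValueError(f"Unknown ablation: {ablation}")
    # Sample as many entries as the standard mask keeps.
    chosen = set(rng.sample(candidates, size))
    return [1 if i in chosen else 0 for i in range(len(mask))]

mask = [1, 0, 0, 1, 0, 0]
inverted = ablate_mask(mask, "zero_ablate")
random_mask = ablate_mask(mask, "randomly_sampled")
complement_mask = ablate_mask(mask, "complement_sampled")
```

In this sketch, complement_sampled raises a ValueError when the complement holds fewer entries than the standard mask keeps; how the library handles that case is not covered here.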

property ablation

property mask_bias

property mask_unit

property use_masks

train(train_bool)

calculate_l0()

Returns the L0 norm of the mask. This is used for L0 regularization and for reporting on mask size

Returns: The total L0 norm of the mask
Return type: float

calculate_max_l0()

Returns the maximum L0 norm of the mask (i.e. the number of prunable weights/neurons in the layer).

Returns: The maximum L0 norm of the mask
Return type: float
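
For a binary mask, the two quantities reduce to simple counts. The sketch below assumes a flat 0/1 list and uses hypothetical helper names; differentiable layers may report a relaxed value during training instead.

```python
def l0_norm(mask):
    """Number of nonzero entries in a mask."""
    return float(sum(1 for m in mask if m != 0))

def max_l0_norm(mask):
    """Largest possible L0 norm: every maskable entry is kept."""
    return float(len(mask))

mask = [1, 0, 1, 1, 0, 0, 0, 1]
# Fraction of prunable entries that the mask keeps.
kept_fraction = l0_norm(mask) / max_l0_norm(mask)
```

Dividing the L0 norm by the maximum L0 norm gives a size measure that is comparable across layers of different shapes.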

abstract reset_parameters()

abstract forward()

abstract from_layer()

`Continuous Sparsification`

class NeuroSurgeon.Masking.contsparse_layer.ContSparseLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Continuous Sparsification layers. Continuous Sparsification was introduced in Savarese et al. 2021 (https://arxiv.org/abs/1912.04427). It introduces a deterministic approximation to the L0 penalty (in contrast to the stochastic approach implemented by the Hard Concrete Layer). Continuous Sparsification introduces mask parameters, which are multiplied by a temperature parameter and then passed through a sigmoid to create a soft mask. To train a Continuous Sparsification layer, one must anneal the temperature parameter, increasing it after every epoch. By the end of training, this turns the soft mask into an approximately hard mask. At eval time, the mask is explicitly binarized: all positive mask parameters map to 1 and all negative mask parameters map to 0.
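
The soft and hard masks described above can be sketched in a few lines of plain Python. The helper names are hypothetical, and the sketch follows the description in this section rather than the library's code. Mapping a parameter of exactly zero to 0 is an assumption, since the text only covers positive and negative values.

```python
import math

def soft_mask(mask_params, temperature):
    """Training-time mask: sigmoid(temperature * parameter)."""
    return [1.0 / (1.0 + math.exp(-temperature * p)) for p in mask_params]

def hard_mask(mask_params):
    """Eval-time mask: 1 for positive parameters, 0 otherwise.

    Treating a parameter of exactly 0 as masked out is an assumption.
    """
    return [1.0 if p > 0 else 0.0 for p in mask_params]

params = [-0.5, 0.1, 2.0]
early = soft_mask(params, temperature=1.0)    # values spread between 0 and 1
late = soft_mask(params, temperature=200.0)   # values close to 0 or 1
binary = hard_mask(params)
```

As the temperature grows, the soft mask approaches the binary mask used at eval time, which is why annealing lets training end close to the mask that is actually evaluated.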

The easiest way to anneal the temperature parameter is with a callback. Here is an example!

class TemperatureCallback:
    def __init__(self, total_epochs, final_temp):
        self.temp_increase = final_temp ** (1.0 / total_epochs)

    def update(self, model):
        temp = model.temperature
        model.temperature = temp * self.temp_increase

…
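
A minimal sketch of how the callback might be driven from a training loop. `ToyModel` is a hypothetical stand-in that only exposes a `temperature` attribute, and the starting temperature of 1.0 is an assumption; with that starting value, the temperature reaches `final_temp` after `total_epochs` updates.

```python
class TemperatureCallback:
    # Same callback as above, repeated so the snippet runs on its own.
    def __init__(self, total_epochs, final_temp):
        self.temp_increase = final_temp ** (1.0 / total_epochs)

    def update(self, model):
        model.temperature = model.temperature * self.temp_increase

class ToyModel:
    """Hypothetical stand-in exposing a `temperature` attribute."""
    def __init__(self):
        self.temperature = 1.0  # assumed starting temperature

total_epochs, final_temp = 10, 200.0
model = ToyModel()
callback = TemperatureCallback(total_epochs, final_temp)
history = []
for epoch in range(total_epochs):
    # ... one epoch of training would run here ...
    callback.update(model)
    history.append(model.temperature)
```

The schedule is geometric: each epoch multiplies the temperature by the same factor, so the mask sharpens slowly at first and quickly near the end.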

Parameters:

ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.

property mask_init_value

property temperature

A float that is multiplied by the mask parameters before they are passed into a sigmoid function for creating a soft mask.

Returns: The temperature parameter
Return type: float

property force_resample

A boolean value that determines whether a mask is resampled when creating a mask using ablation = “randomly_sampled” or ablation = “complement_sampled”. Otherwise a mask is sampled once and returned when calling these functions.

Returns: A boolean determining whether masks are resampled when using ablation = “randomly_sampled” or “complement_sampled”
Return type: bool
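
The difference between a mask that is sampled once and one that is resampled on every request can be sketched as follows. `RandomMaskSampler` is a hypothetical helper written for this illustration and does not mirror the library's internals.

```python
import random

class RandomMaskSampler:
    """Hypothetical helper showing cached versus resampled random masks."""

    def __init__(self, n_entries, n_kept, force_resample=False, seed=0):
        self.n_entries = n_entries
        self.n_kept = n_kept
        self.force_resample = force_resample
        self._rng = random.Random(seed)
        self._cached = None

    def sample(self):
        # Reuse the first sampled mask unless resampling is forced.
        if self._cached is not None and not self.force_resample:
            return self._cached
        kept = set(self._rng.sample(range(self.n_entries), self.n_kept))
        self._cached = [1 if i in kept else 0 for i in range(self.n_entries)]
        return self._cached

cached = RandomMaskSampler(n_entries=100, n_kept=50)
first, second = cached.sample(), cached.sample()

fresh = RandomMaskSampler(n_entries=100, n_kept=50, force_resample=True)
a, b = fresh.sample(), fresh.sample()
```

Keeping one fixed random mask makes repeated evaluations comparable, while resampling averages over many random subnetworks of the same size.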

class NeuroSurgeon.Masking.contsparse_layer.ContSparseLinear(*args: Any, **kwargs: Any)

Bases: ContSparseLayer

A Linear Layer that implements Continuous Sparsification.

Parameters:

in_features (int) – Size of each input sample
out_features (int) – Size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Creates a ContSparseLinear layer from a nn.Linear layer.

Parameters:

layer (nn.Linear) – An instance of a nn.Linear layer.
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Linear layer with the same weights as the layer argument

Return type:

ContSparseLinear

reset_parameters(): Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters: data (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor
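
A minimal sketch of what a masked linear forward pass computes, written with plain Python lists. It assumes that a weight-level mask has one entry per weight and that a neuron-level mask has one entry per output unit; both the helper name and these shapes are assumptions for illustration. The bias is left unmasked, matching mask_bias = False.

```python
def masked_linear(x, weight, bias, mask, mask_unit="weight"):
    """Compute y = (weight * mask) @ x + bias for one input vector.

    weight: list of rows, one row per output unit.
    mask:   same shape as weight for "weight"; one entry per row for "neuron".
    """
    out = []
    for i, row in enumerate(weight):
        if mask_unit == "weight":
            masked_row = [w * m for w, m in zip(row, mask[i])]
        elif mask_unit == "neuron":
            masked_row = [w * mask[i] for w in row]
        else:
            raise ValueError(f"Unknown mask_unit: {mask_unit}")
        out.append(sum(w * xi for w, xi in zip(masked_row, x)) + bias[i])
    return out

x = [1.0, 2.0]
weight = [[1.0, 1.0], [3.0, -1.0]]
bias = [0.5, 0.0]
y_weight = masked_linear(x, weight, bias, [[1, 0], [1, 1]], "weight")
y_neuron = masked_linear(x, weight, bias, [1, 0], "neuron")
```

With the neuron-level mask, the second output unit loses all of its incoming weights at once, whereas the weight-level mask removes individual connections.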

class NeuroSurgeon.Masking.contsparse_layer.ContSparseGPTConv1D(*args: Any, **kwargs: Any)

Bases: ContSparseLayer

A GPT-style Conv1D Layer that implements Continuous Sparsification.

Parameters:

nf (int) – Number of output features
nx (int) – Number of input features
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Creates a ContSparseGPTConv1D layer from a Conv1D layer.

Parameters:

layer (Conv1D) – An instance of a Conv1D layer from pytorch_utils
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification GPTConv1D layer with the same weights as the layer argument

Return type:

ContSparseGPTConv1D

forward(x)

Performs a forward pass

Parameters: x (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor

class NeuroSurgeon.Masking.contsparse_layer.ContSparseConv2d(*args: Any, **kwargs: Any)

Bases: _ContSparseConv

A Conv2d layer that implements continuous sparsification

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to all four sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Create a ContSparseConv2d layer from a nn.Conv2d layer

Parameters:

layer (nn.Conv2d) – A nn.Conv2d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Conv2d layer with the same weights as the layer argument

Return type:

ContSparseConv2d

class NeuroSurgeon.Masking.contsparse_layer.ContSparseConv1d(*args: Any, **kwargs: Any)

Bases: _ContSparseConv

A Conv1d layer that implements continuous sparsification

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to both sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

classmethod from_layer(layer: torch.nn.Conv1d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_value: float = 0.0)

Create a ContSparseConv1d layer from a nn.Conv1d layer

Parameters:

layer (nn.Conv1d) – A nn.Conv1d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_value (float) – The value that the mask parameters are initialized with. Default: 0.0

Returns:

Continuous Sparsification Conv1d layer with the same weights as the layer argument

Return type:

ContSparseConv1d

`Hard Concrete Masking`

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Hard Concrete layers. Hard Concrete Masking was introduced in Louizos et al. 2018 (https://arxiv.org/abs/1712.01312). It introduces a stochastic approximation to the L0 penalty.

Parameters:

ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.
mask_init_percentage (float) – Determines approximately what fraction of parameters is left unpruned if one creates a hard mask by sampling from the hard concrete distribution and binarizing
left_stretch (float) – Determines how much the binary concrete distribution is stretched to give more mass to 0.0
right_stretch (float) – Determines how much the binary concrete distribution is stretched to give more mass to 1.0
temperature (float) – Determines the sampling temperature of the binary concrete distribution
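
To make the roles of left_stretch, right_stretch and temperature concrete, here is a sketch of hard concrete sampling following Louizos et al. 2018. The function name is hypothetical, and the default values shown (-0.1, 1.1 and 2/3) are the ones suggested in the paper; whether this library uses the same defaults is an assumption.

```python
import math
import random

def sample_hard_concrete(log_alpha, temperature=2.0 / 3.0,
                         left_stretch=-0.1, right_stretch=1.1, rng=random):
    """Draw one hard concrete sample in [0, 1] (Louizos et al. 2018)."""
    # Keep u strictly inside (0, 1) so both logarithms are finite.
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
    # Binary concrete sample in (0, 1).
    logit = (math.log(u) - math.log(1.0 - u) + log_alpha) / temperature
    s = 1.0 / (1.0 + math.exp(-logit))
    # Stretch to (left_stretch, right_stretch), then clamp to [0, 1].
    stretched = s * (right_stretch - left_stretch) + left_stretch
    return min(1.0, max(0.0, stretched))

rng = random.Random(0)
samples = [sample_hard_concrete(0.0, rng=rng) for _ in range(1000)]
exact_zeros = sum(1 for z in samples if z == 0.0)
exact_ones = sum(1 for z in samples if z == 1.0)
```

Stretching the distribution past 0 and 1 before clamping is what places probability mass exactly on 0.0 and 1.0, so sampled gates can switch weights fully off or on.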

property mask_init_percentage

property force_resample

calculate_l0()

Returns the L0 norm of the mask. This is used for L0 regularization and for reporting on mask size. This function overrides the default behavior (defined in MaskLayer) in order to provide the regularization term given in Louizos et al. 2018 during training

Returns: The L0 norm of the mask
Return type: float
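
The regularization term from Louizos et al. 2018 is the sum, over all gates, of the probability that a gate is nonzero. The sketch below computes it with the standard library only. The function name is hypothetical, and the default stretch and temperature values are taken from the paper, not from this library.

```python
import math

def expected_l0(log_alphas, temperature=2.0 / 3.0,
                left_stretch=-0.1, right_stretch=1.1):
    """Sum over entries of the probability that each gate is nonzero."""
    # Shift term: temperature * log(-left_stretch / right_stretch).
    shift = temperature * math.log(-left_stretch / right_stretch)
    return sum(1.0 / (1.0 + math.exp(-(a - shift))) for a in log_alphas)

penalty = expected_l0([-3.0, 0.0, 3.0])
single_gate = expected_l0([0.0])
```

Because this expression is differentiable in the gate parameters, it can be added to the task loss and minimized with gradient descent, unlike a direct count of nonzero mask entries.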

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteLinear(*args: Any, **kwargs: Any)

Bases: HardConcreteLayer

A Linear Layer that implements Hard Concrete Masking.

Parameters:

in_features (int) – Size of each input sample
out_features (int) – Size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Creates a HardConcreteLinear layer from a nn.Linear layer.

Parameters:

layer (nn.Linear) – An instance of a nn.Linear layer.
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Linear layer with the same weights as the layer argument

Return type:

HardConcreteLinear

reset_parameters(): Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters: data (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteGPTConv1D(*args: Any, **kwargs: Any)

Bases: HardConcreteLayer

A GPT-style Conv1D Layer that implements Hard Concrete Masking.

Parameters:

nf (int) – Number of output features
nx (int) – Number of input features
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Creates a HardConcreteGPTConv1D layer from a Conv1D layer.

Parameters:

layer (Conv1D) – An instance of a Conv1D layer from pytorch_utils
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete GPTConv1D layer with the same weights as the layer argument

Return type:

HardConcreteGPTConv1D

forward(x)

Performs a forward pass

Parameters: x (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteConv2d(*args: Any, **kwargs: Any)

Bases: _HardConcreteConv

A Conv2d layer that implements Hard Concrete Masking.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to all four sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Create a HardConcreteConv2d layer from a nn.Conv2d layer

Parameters:

layer (nn.Conv2d) – A nn.Conv2d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Conv2d layer with the same weights as the layer argument

Return type:

HardConcreteConv2d

class NeuroSurgeon.Masking.hardconcrete_layer.HardConcreteConv1d(*args: Any, **kwargs: Any)

Bases: _HardConcreteConv

A Conv1d layer that implements Hard Concrete Masking.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to both sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

classmethod from_layer(layer: torch.nn.Conv1d, ablation: str = 'none', mask_unit: str = 'weight', mask_bias: bool = False, mask_init_percentage: float = 0.5)

Create a HardConcreteConv1d layer from a nn.Conv1d layer

Parameters:

layer (nn.Conv1d) – A nn.Conv1d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_unit (str) – A string that determines whether masks are produced at the weight or neuron level. Valid options include [“neuron”, “weight”]. Default: “weight”
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
mask_init_percentage (float) – The approximate fraction of parameters left unpruned by a sampled mask. Default: 0.5

Returns:

Hard Concrete Conv1d layer with the same weights as the layer argument

Return type:

HardConcreteConv1d

`Magnitude Pruning`

class NeuroSurgeon.Masking.magprune_layer.MagPruneLayer(*args: Any, **kwargs: Any)

Bases: MaskLayer

An abstract class defining the basic functionality of Magnitude Pruning layers. Magnitude pruning is not a differentiable masking strategy: it deterministically prunes the N% lowest-magnitude weights. Masking always occurs at the weight level. This pruning strategy can serve as a baseline to compare against gradient-based masking strategies (i.e. Continuous Sparsification or Hard Concrete Masking). It is a common strategy in model pruning, notably in work analyzing the Lottery Ticket Hypothesis (https://arxiv.org/abs/1803.03635).

Parameters:

ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms.
prune_percentage (float) – Determines the percentage of weights to prune

property prune_percentage

property force_resample
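The pruning rule described for this class (drop the lowest-magnitude weights, keep the rest) can be sketched in a few lines. This is a library-independent illustration, not the MagPruneLayer implementation: the function names magnitude_mask and apply_mask are hypothetical, plain Python lists stand in for weight tensors, and prune_fraction is assumed to be a value in [0, 1], in line with the documented default of 0.2 for prune_percentage.

```python
from typing import List


def magnitude_mask(weights: List[List[float]], prune_fraction: float) -> List[List[int]]:
    """Return a 0/1 mask that zeros the lowest-magnitude entries of `weights`."""
    # Collect (magnitude, row, col) for every weight in the layer.
    flat = [(abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)]
    n_prune = int(prune_fraction * len(flat))
    # Sort by magnitude (ties broken by position) and mark the smallest entries.
    pruned = {(r, c) for _, r, c in sorted(flat)[:n_prune]}
    return [
        [0 if (r, c) in pruned else 1 for c in range(len(row))]
        for r, row in enumerate(weights)
    ]


def apply_mask(weights: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Elementwise product of the weights and the binary mask."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]


weights = [[0.1, -2.0, 0.05], [3.0, -0.5, 1.0]]
mask = magnitude_mask(weights, prune_fraction=0.5)  # prune the 3 smallest of 6 weights
masked = apply_mask(weights, mask)
```

Because the mask depends only on the current weight magnitudes, no mask parameters are trained, which is what makes the method a convenient baseline.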

class NeuroSurgeon.Masking.magprune_layer.MagPruneLinear(*args: Any, **kwargs: Any)

Bases: MagPruneLayer

A Linear Layer that implements Magnitude Pruning.

Parameters:

in_features (int) – Size of each input sample
out_features (int) – Size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Linear, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneLinear layer from a nn.Linear layer.

Parameters:

layer (nn.Linear) – An instance of a nn.Linear layer.
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Linear layer with the same weights as the layer argument

Return type:

MagPruneLinear

reset_parameters(): Reset network parameters.

forward(data: torch.Tensor, **kwargs) → torch.Tensor

Performs a forward pass

Parameters: data (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor
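The ablation options listed above all derive an effective mask from the standard binary mask. The sketch below is one reading of those descriptions, not the library's code: the function ablate_mask is hypothetical, a flat list of 0/1 values stands in for the mask tensor, and "same size" is interpreted as "same number of kept entries". The weight reinitialization performed by random_ablate is not modeled.

```python
import random
from typing import List


def ablate_mask(mask: List[int], ablation: str, rng: random.Random) -> List[int]:
    """Derive the effective 0/1 mask from a standard mask for one ablation mode."""
    n_kept = sum(mask)
    if ablation == "none":
        return list(mask)
    if ablation in ("zero_ablate", "random_ablate"):
        # Both modes invert the mask; random_ablate would additionally
        # reinitialize the zeroed elements, which this sketch leaves out.
        return [1 - m for m in mask]
    if ablation == "randomly_sampled":
        candidates = list(range(len(mask)))
    elif ablation == "complement_sampled":
        # Only positions outside the standard mask are eligible.
        candidates = [i for i, m in enumerate(mask) if m == 0]
    else:
        raise ValueError(f"unknown ablation mode: {ablation}")
    # rng.sample raises ValueError if fewer than n_kept candidates exist.
    chosen = set(rng.sample(candidates, n_kept))
    return [1 if i in chosen else 0 for i in range(len(mask))]


rng = random.Random(0)
standard = [1, 1, 0, 0, 0, 1, 0, 0]
inverted = ablate_mask(standard, "zero_ablate", rng)
random_mask = ablate_mask(standard, "randomly_sampled", rng)
complement_mask = ablate_mask(standard, "complement_sampled", rng)
```

The two sampled modes serve as controls: they keep as many entries as the discovered subnetwork, so differences in behavior can be attributed to which entries were kept rather than to how many.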

class NeuroSurgeon.Masking.magprune_layer.MagPruneGPTConv1D(*args: Any, **kwargs: Any)

Bases: MagPruneLayer

A GPT-style Conv1D Layer that implements Magnitude Pruning.

Parameters:

nf (int) – Number of output features
nx (int) – Number of input features
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: transformers.pytorch_utils.Conv1D, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneGPTConv1D layer from a Conv1D layer.

Parameters:

layer (Conv1D) – An instance of a transformers.pytorch_utils.Conv1D layer.
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. This has no effect on this layer, as there are no bias terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning GPTConv1D layer with the same weights as the layer argument

Return type:

MagPruneGPTConv1D

forward(x)

Performs a forward pass

Parameters: x (torch.Tensor) – Input tensor
Returns: Output tensor
Return type: torch.Tensor

class NeuroSurgeon.Masking.magprune_layer.MagPruneConv2d(*args: Any, **kwargs: Any)

Bases: _MagPruneConv

A Conv2d layer that implements Magnitude Pruning.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to all four sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Conv2d, ablation: str = 'none', mask_bias: bool = False, prune_percentage: float = 0.2)

Creates a MagPruneConv2d layer from a nn.Conv2d layer.

Parameters:

layer (nn.Conv2d) – A nn.Conv2d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Conv2d layer with the same weights as the layer argument

Return type:

MagPruneConv2d

class NeuroSurgeon.Masking.magprune_layer.MagPruneConv1d(*args: Any, **kwargs: Any)

Bases: _MagPruneConv

A Conv1d layer that implements Magnitude Pruning.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
padding (int) – Padding added to both sides of the input. Default: 0
stride (int) – Stride of the convolution. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
padding_mode (str) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

classmethod from_layer(layer: torch.nn.Conv1d, ablation, mask_bias, prune_percentage)

Creates a MagPruneConv1d layer from a nn.Conv1d layer.

Parameters:

layer (nn.Conv1d) – A nn.Conv1d layer
ablation (str) –
A string that determines how masks are produced from the mask layer parameters. Valid options include:
- none: Producing a standard binary mask
- zero_ablate: Inverting the standard binary mask. Used for pruning discovered subnetworks.
- random_ablate: Inverting the standard binary mask and reinitializing zero’d elements. Used for pruning discovered subnetworks.
- randomly_sampled: Sampling a random binary mask of the same size as the standard mask.
- complement_sampled: Sampling a random binary mask of the same size as the standard mask from the complement set of entries as the standard mask.
mask_bias (bool) – Determines whether to mask bias terms in addition to weight terms. Default: False
prune_percentage (float) – The percentage of weights to prune. Default: 0.2

Returns:

Magnitude Pruning Conv1d layer with the same weights as the layer argument

Return type:

MagPruneConv1d
