sleap.nn.architectures.resnet#

ResNet-based backbones.

This module primarily generalizes the ResNet architectures for configurable output stride based on atrous convolutions, DeepLabv2-style (https://arxiv.org/abs/1606.00915).

ResNet variants that have pretrained weights can be loaded for transfer learning.

Based on the tf.keras.applications implementation: tensorflow/tensorflow

class sleap.nn.architectures.resnet.ResNet101(upsampling_stack: Optional[UpsamplingStack] = None, features_output_stride: int = 16, pretrained: bool = True, frozen: bool = False, skip_connections: bool = False, model_name=_Nothing.NOTHING, stack_configs=_Nothing.NOTHING)[source]#

ResNet101 backbone.

This model has a stack of 3, 4, 23 and 3 residual blocks.

upsampling_stack#

Definition of the upsampling layers that convert the ResNet backbone features into the output features with the desired stride. See the UpsamplingStack documentation for more. If not provided, the activations from the last backbone block will be the output.

Type:

Optional[sleap.nn.architectures.upsampling.UpsamplingStack]

features_output_stride#

Output stride of the standard ResNet backbone. Canonically, ResNets have 5 layers with a stride of 2, resulting in a final feature output layer with a stride of 32. If a lower value is specified, the strided convolution layers will be adjusted to have a stride of 1, and the receptive field is maintained by compensating with dilated (atrous) convolution kernel expansion, in the same style as DeepLabv2. Valid values are 1, 2, 4, 8, 16 or 32.

Type:

int

pretrained#

If True, initialize with weights pretrained on ImageNet. If False, random weights will be used.

Type:

bool

frozen#

If True, the backbone weights will not be trainable. This is useful for fast fine-tuning of ResNet features, but relies on having an upsampling stack with sufficient representational capacity to adapt the fixed features.

Type:

bool

skip_connections#

If True, form skip connections between outputs of each block in the ResNet backbone and the upsampling stack.

Type:

bool

Note

This defines the ResNetv1 architecture, not v2.

class sleap.nn.architectures.resnet.ResNet152(upsampling_stack: Optional[UpsamplingStack] = None, features_output_stride: int = 16, pretrained: bool = True, frozen: bool = False, skip_connections: bool = False, model_name=_Nothing.NOTHING, stack_configs=_Nothing.NOTHING)[source]#

ResNet152 backbone.

This model has a stack of 3, 8, 36 and 3 residual blocks.

upsampling_stack#

Definition of the upsampling layers that convert the ResNet backbone features into the output features with the desired stride. See the UpsamplingStack documentation for more. If not provided, the activations from the last backbone block will be the output.

Type:

Optional[sleap.nn.architectures.upsampling.UpsamplingStack]

features_output_stride#

Output stride of the standard ResNet backbone. Canonically, ResNets have 5 layers with a stride of 2, resulting in a final feature output layer with a stride of 32. If a lower value is specified, the strided convolution layers will be adjusted to have a stride of 1, and the receptive field is maintained by compensating with dilated (atrous) convolution kernel expansion, in the same style as DeepLabv2. Valid values are 1, 2, 4, 8, 16 or 32.

Type:

int

pretrained#

If True, initialize with weights pretrained on ImageNet. If False, random weights will be used.

Type:

bool

frozen#

If True, the backbone weights will not be trainable. This is useful for fast fine-tuning of ResNet features, but relies on having an upsampling stack with sufficient representational capacity to adapt the fixed features.

Type:

bool

skip_connections#

If True, form skip connections between outputs of each block in the ResNet backbone and the upsampling stack.

Type:

bool

Note

This defines the ResNetv1 architecture, not v2.

class sleap.nn.architectures.resnet.ResNet50(upsampling_stack: Optional[UpsamplingStack] = None, features_output_stride: int = 16, pretrained: bool = True, frozen: bool = False, skip_connections: bool = False, model_name=_Nothing.NOTHING, stack_configs=_Nothing.NOTHING)[source]#

ResNet50 backbone.

This model has a stack of 3, 4, 6 and 3 residual blocks.
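The number in each variant's name follows from its block counts. As a quick check, assuming the standard bottleneck design (three convolutions per residual block, plus one stem convolution and one final classifier layer), the depth can be computed as below. The `BLOCK_COUNTS` table and `named_depth` helper are illustrative names, not part of this module.

```python
# Residual blocks per stack for each ResNetv1 variant (standard values from
# the original ResNet paper; this table is illustrative, not SLEAP code).
BLOCK_COUNTS = {
    "resnet50": (3, 4, 6, 3),
    "resnet101": (3, 4, 23, 3),
    "resnet152": (3, 8, 36, 3),
}


def named_depth(blocks_per_stack):
    """Count weighted layers: 3 convs per bottleneck block + stem conv + classifier."""
    return 3 * sum(blocks_per_stack) + 2


for name, blocks in BLOCK_COUNTS.items():
    print(name, named_depth(blocks))
```

When used as a backbone the classifier layer is not present, so this count only explains the naming.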

upsampling_stack#

Definition of the upsampling layers that convert the ResNet backbone features into the output features with the desired stride. See the UpsamplingStack documentation for more. If not provided, the activations from the last backbone block will be the output.

Type:

Optional[sleap.nn.architectures.upsampling.UpsamplingStack]

features_output_stride#

Output stride of the standard ResNet backbone. Canonically, ResNets have 5 layers with a stride of 2, resulting in a final feature output layer with a stride of 32. If a lower value is specified, the strided convolution layers will be adjusted to have a stride of 1, and the receptive field is maintained by compensating with dilated (atrous) convolution kernel expansion, in the same style as DeepLabv2. Valid values are 1, 2, 4, 8, 16 or 32.

Type:

int

pretrained#

If True, initialize with weights pretrained on ImageNet. If False, random weights will be used.

Type:

bool

frozen#

If True, the backbone weights will not be trainable. This is useful for fast fine-tuning of ResNet features, but relies on having an upsampling stack with sufficient representational capacity to adapt the fixed features.

Type:

bool

skip_connections#

If True, form skip connections between outputs of each block in the ResNet backbone and the upsampling stack.

Type:

bool

Note

This defines the ResNetv1 architecture, not v2.

class sleap.nn.architectures.resnet.ResNetv1(model_name: str, stack_configs: Sequence[Mapping[str, Any]], upsampling_stack: Optional[UpsamplingStack] = None, features_output_stride: int = 16, pretrained: bool = True, frozen: bool = False, skip_connections: bool = False)[source]#

ResNetv1 backbone with configurable output stride and pretrained weights.

model_name#

Backbone name. Must be one of “resnet50”, “resnet101”, or “resnet152” if using pretrained weights.

Type:

str

stack_configs#

List of dictionaries containing the keyword arguments for each stack. The stack_fn will be called consecutively with each element of stack_configs expanded as keyword arguments. Each element must contain the “stride1” key specifying the stride of the first layer of the stack. This may be adjusted to achieve the desired target output stride by converting strided convs into dilated convs.

Type:

Sequence[Mapping[str, Any]]
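To illustrate how the entries of `stack_configs` are consumed, the sketch below expands each configuration as keyword arguments to a stack function. The `fake_stack_fn` stand-in and the specific filter counts are assumptions for illustration (the keys mirror the `stack_v1` signature documented in this module); a real `stack_fn` would build Keras layers instead of recording its arguments.

```python
# Hypothetical ResNet50-style configuration; values are illustrative.
stack_configs = [
    {"filters": 64, "blocks": 3, "stride1": 1, "name": "conv2"},
    {"filters": 128, "blocks": 4, "stride1": 2, "name": "conv3"},
    {"filters": 256, "blocks": 6, "stride1": 2, "name": "conv4"},
    {"filters": 512, "blocks": 3, "stride1": 2, "name": "conv5"},
]

calls = []


def fake_stack_fn(x, filters, blocks, stride1=2, dilation_rate=1, name=None):
    """Stand-in for a stack function: records its arguments and returns x."""
    calls.append((name, filters, blocks, stride1, dilation_rate))
    return x


x = "input tensor placeholder"
for config in stack_configs:
    # Each dict is expanded into keyword arguments for the stack function.
    x = fake_stack_fn(x, **config)

# Product of the per-stack strides (the stem's own stride is not included).
total_stack_stride = 1
for config in stack_configs:
    total_stack_stride *= config["stride1"]
```

Keeping "stride1" as an explicit key is what lets the backbone builder rewrite it when a smaller output stride is requested.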

upsampling_stack#

Definition of the upsampling layers that convert the ResNet backbone features into the output features with the desired stride. See the UpsamplingStack documentation for more. If not provided, the activations from the last backbone block will be the output.

Type:

Optional[sleap.nn.architectures.upsampling.UpsamplingStack]

features_output_stride#

Output stride of the standard ResNet backbone. Canonically, ResNets have 5 layers with a stride of 2, resulting in a final feature output layer with a stride of 32. If a lower value is specified, the strided convolution layers will be adjusted to have a stride of 1, and the receptive field is maintained by compensating with dilated (atrous) convolution kernel expansion, in the same style as DeepLabv2. Valid values are 1, 2, 4, 8, 16 or 32.

Type:

int

pretrained#

If True, initialize with weights pretrained on ImageNet. If False, random weights will be used.

Type:

bool

frozen#

If True, the backbone weights will not be trainable. This is useful for fast fine-tuning of ResNet features, but relies on having an upsampling stack with sufficient representational capacity to adapt the fixed features.

Type:

bool

skip_connections#

If True, form skip connections between outputs of each block in the ResNet backbone and the upsampling stack.

Type:

bool

Note

This defines the ResNetv1 architecture, not v2.

property down_blocks: int#

Return the number of downsampling steps in the model.
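For power-of-two strides, the number of stride-2 downsampling steps is the base-2 logarithm of the stride. The helper below sketches only that relationship; whether `down_blocks` is derived from `features_output_stride` in exactly this way is an assumption, and `downsampling_steps` is a hypothetical name.

```python
def downsampling_steps(stride: int) -> int:
    """Number of stride-2 steps needed to reach a power-of-two stride."""
    if stride < 1 or stride & (stride - 1) != 0:
        raise ValueError(f"stride must be a positive power of two, got {stride}")
    # For a power of two, bit_length() - 1 is the exact base-2 logarithm.
    return stride.bit_length() - 1


print(downsampling_steps(16))  # 4
print(downsampling_steps(32))  # 5
```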

classmethod from_config(config: ResNetConfig) ResNetv1[source]#

Create a model from a set of configuration parameters.

Parameters:

config – A ResNetConfig instance with the desired parameters.

Returns:

An instance of this class with the specified configuration.

make_backbone(x_in: Tensor) Tuple[Tensor, List[IntermediateFeature]][source]#

Create the full backbone starting with the specified input tensor.

Parameters:

x_in – Input tensor of shape (samples, height, width, channels).

Returns:

A tuple of the final output tensor at the stride specified by the upsampling_stack.features_output_stride class attribute, and a list of intermediate tensors after each upsampling step.

The intermediate features are useful when creating multi-head architectures with different output strides for the heads.

property maximum_stride: int#

Return the maximum stride that the input must be divisible by.
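In practice this means input height and width are usually padded up to the next multiple of the maximum stride before being fed to the model. A minimal sketch of that arithmetic, with `pad_to_stride` as a hypothetical helper name:

```python
def pad_to_stride(size: int, max_stride: int) -> int:
    """Round size up to the nearest multiple of max_stride."""
    if max_stride < 1:
        raise ValueError("max_stride must be >= 1")
    # Ceiling division, then scale back up.
    return -(-size // max_stride) * max_stride


print(pad_to_stride(1024, 32))  # 1024 (already divisible)
print(pad_to_stride(1000, 32))  # 1024
```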

property output_scale: float#

Return relative scaling factor of this backbone.

property output_stride: int#

Return stride of the output of the backbone.
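A stride of s means each output pixel covers s input pixels along each axis, so the relative scale is typically the reciprocal of the stride. The sketch below assumes that relationship (scale = 1 / stride) rather than taking it from the implementation; both helper names are hypothetical.

```python
def stride_to_scale(output_stride: int) -> float:
    """Relative scale of the output with respect to the input (assumed 1 / stride)."""
    return 1.0 / output_stride


def to_output_coords(x: float, y: float, output_stride: int):
    """Map image-space coordinates onto the coarser output grid."""
    scale = stride_to_scale(output_stride)
    return x * scale, y * scale


print(stride_to_scale(16))             # 0.0625
print(to_output_coords(320, 160, 16))  # (20.0, 10.0)
```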

sleap.nn.architectures.resnet.block_v1(x: Tensor, filters: int, kernel_size: int = 3, stride: int = 1, dilation_rate: int = 1, conv_shortcut: bool = True, name: Optional[str] = None) Tensor[source]#

Create a ResNetv1 residual block.

Parameters:
  • x – input tensor.

  • filters – integer, filters of the bottleneck layer.

  • kernel_size – default 3, kernel size of the bottleneck layer.

  • stride – default 1, stride of the first layer.

  • dilation_rate – default 1, atrous convolution dilation rate of first layer.

  • conv_shortcut – default True, use convolution shortcut if True, otherwise identity shortcut.

  • name – string, block label.

Returns:

Output tensor for the residual block.
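The `dilation_rate` parameter enlarges the kernel's footprint without adding weights: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input pixels. The helper below (a hypothetical name) works through that standard formula for the default kernel size of 3.

```python
def effective_kernel_size(kernel_size: int, dilation_rate: int) -> int:
    """Spatial extent covered by a dilated (atrous) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate))
# 1 3
# 2 5
# 4 9
```

This is why replacing a stride-2 layer with a stride-1 layer and doubling the dilation of the following convolutions keeps the receptive field comparable.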

sleap.nn.architectures.resnet.imagenet_preproc_v1(X: Tensor) Tensor[source]#

Preprocess images according to ImageNet/caffe/channels_last.

Parameters:

X – Tensor of shape (samples, height, width, 3) of dtype float32 with values in the range [0, 1]. The channels axis is in RGB ordering.

Returns:

Tensor of the same shape and dtype with channels reversed to BGR ordering, values scaled to [0, 255], and the ImageNet/caffe pretrained model channel means (103.939, 116.779 and 123.68 for B, G and R respectively) subtracted. The effective range of values will then be approximately [-128, 127].
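The transformation can be traced on a single pixel. This is a plain-Python sketch of the arithmetic described here, not the TensorFlow implementation; `preprocess_pixel` is a hypothetical helper.

```python
# Caffe-style ImageNet channel means, in BGR order.
BGR_MEANS = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Convert one RGB pixel with values in [0, 1] to mean-centered BGR."""
    r, g, b = rgb
    bgr = (b, g, r)  # Reverse channel order: RGB -> BGR.
    # Scale to [0, 255], then subtract the per-channel mean.
    return tuple(255.0 * value - mean for value, mean in zip(bgr, BGR_MEANS))


black = preprocess_pixel((0.0, 0.0, 0.0))
pure_red = preprocess_pixel((1.0, 0.0, 0.0))
print(black)
print(pure_red)
```

For the pure red pixel, the red value ends up in the last position of the output, which is where BGR ordering places it.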

sleap.nn.architectures.resnet.make_backbone_fn(stack_fn: Callable[[Tensor, Any], Tuple[Tensor, List[IntermediateFeature]]], stack_configs: Sequence[Mapping[str, Any]], output_stride: int) Callable[[Tensor, int], Tensor][source]#

Return a function that creates a block stack with output stride adjustments.

Parameters:
  • stack_fn – Function that takes a tensor as the first positional argument, followed by any number of keyword arguments. This function will construct each stack of blocks in the backbone.

  • stack_configs – List of dictionaries containing the keyword arguments for each stack. The stack_fn will be called consecutively with each element of stack_configs expanded as keyword arguments. Each element must contain the “stride1” key specifying the stride of the first layer of the stack. This may be adjusted to achieve the desired target output stride by converting strided convs into dilated convs.

  • output_stride – The desired target output stride. The final output of the returned backbone creation function will be at this stride relative to the input stride.

Returns:

Function that creates the backbone stacks based on the stack_configs.

This function will have the signature:

x_out, intermediate_feats = backbone_fn(x_in, current_stride)

The current stride describes the stride of the x_in input tensor.

Raises:

ValueError – If the desired output stride cannot be achieved.
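The stride adjustment can be sketched without any tensors. The function below is a simplified illustration of the DeepLab-style rule, not the module's actual code: once the running stride reaches the target, any remaining strided stack is given stride 1 and the dilation rate is multiplied by the stride it would have applied. `plan_strides` is a hypothetical name, and exactly where dilation is applied within a stack may differ in the real implementation.

```python
def plan_strides(stack_configs, input_stride, output_stride):
    """Return one (stride, dilation_rate) pair per stack.

    Raises ValueError if the final stride does not equal output_stride.
    """
    current_stride = input_stride
    dilation_rate = 1
    plan = []
    for config in stack_configs:
        stride = config["stride1"]
        if current_stride == output_stride and stride > 1:
            # Target reached: keep resolution and dilate instead of striding.
            dilation_rate *= stride
            stride = 1
        plan.append((stride, dilation_rate))
        current_stride *= stride
    if current_stride != output_stride:
        raise ValueError(
            f"Cannot reach output stride {output_stride}; ended at {current_stride}."
        )
    return plan


# Four stacks with the usual ResNet strides, entered at stride 4
# (assuming the stem has already downsampled by 2 x 2).
resnet_stacks = [{"stride1": 1}, {"stride1": 2}, {"stride1": 2}, {"stride1": 2}]
print(plan_strides(resnet_stacks, input_stride=4, output_stride=16))
# [(1, 1), (2, 1), (2, 1), (1, 2)]
```

In this sketch, requesting an output stride smaller than the input stride raises `ValueError`, since no stack can undo downsampling that has already happened.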

sleap.nn.architectures.resnet.make_resnet_model(backbone_fn: Callable[[Tensor, int], Tensor], preact: bool = False, use_bias: bool = True, model_name: str = 'resnet', weights: str = 'imagenet', input_tensor: Optional[Tensor] = None, input_shape: Optional[Tuple[int]] = None, stem_filters: int = 64, stem_stride1: int = 2, stem_stride2: int = 2) Tuple[Model, List[IntermediateFeature]][source]#

Instantiate the ResNet, ResNetV2 (TODO), and ResNeXt (TODO) architecture.

Optionally loads weights pre-trained on ImageNet.

Parameters:
  • backbone_fn – a function that returns output tensor for the stacked residual blocks.

  • preact – whether to use pre-activation or not (True for ResNetV2, False for ResNet and ResNeXt).

  • use_bias – whether to use biases for convolutional layers or not (True for ResNet and ResNetV2, False for ResNeXt).

  • model_name – string, model name.

  • include_top – whether to include the fully-connected layer at the top of the network.

  • weights – one of None (random initialization), ‘imagenet’ (pre-training on ImageNet), or the path to the weights file to be loaded.

  • input_tensor – optional Keras tensor (i.e. output of tf.keras.layers.Input()) to use as image input for the model.

  • input_shape – optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) with channels_last data format or (3, 224, 224) with channels_first data format). It should have exactly 3 input channels.

Returns:

A tuple of the tf.keras.Model mapping input to final feature outputs, and a list of `IntermediateFeature`s from every block in the backbone.

Raises:

ValueError – in case of invalid argument for weights.
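The accepted forms of `weights` can be checked up front. The sketch below follows the validation pattern used in `tf.keras.applications`, on which this module is based; the function name `check_weights` is hypothetical and the exact check in this module may differ.

```python
import os


def check_weights(weights):
    """Accept None, 'imagenet', or a path that exists on disk."""
    if weights is None or weights == "imagenet":
        return weights
    if isinstance(weights, str) and os.path.exists(weights):
        return weights
    raise ValueError(
        "weights must be None (random initialization), 'imagenet', "
        f"or a path to an existing weights file; got {weights!r}"
    )


check_weights(None)        # random initialization
check_weights("imagenet")  # pretrained weights
```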

sleap.nn.architectures.resnet.stack_v1(x: Tensor, filters: int, blocks: int, stride1: int = 2, dilation_rate: int = 1, name: Optional[str] = None) Tensor[source]#

Create a set of stacked ResNetv1 residual blocks.

Parameters:
  • x – input tensor.

  • filters – integer, filters of the bottleneck layer in a block.

  • blocks – integer, blocks in the stacked blocks.

  • stride1 – default 2, stride of the first layer in the first block.

  • dilation_rate – default 1, atrous convolution dilation rate of first layer in the first block.

  • name – string, stack label.

Returns:

Output tensor for the stacked blocks.
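In the `tf.keras.applications` implementation this module is based on, the first block of a stack applies the stride and uses a convolutional shortcut, while the remaining blocks use stride 1 with identity shortcuts. The sketch below lists the per-block settings under that assumption; `plan_stack` and the `_block{i}` naming are illustrative, and the handling of `dilation_rate` is left out.

```python
def plan_stack(filters, blocks, stride1=2, name="stack"):
    """List the settings each residual block in a stack would receive."""
    # First block: may downsample, so it needs a projection (conv) shortcut.
    plan = [
        {"name": f"{name}_block1", "filters": filters,
         "stride": stride1, "conv_shortcut": True}
    ]
    # Remaining blocks: same resolution and channels, identity shortcut.
    for i in range(2, blocks + 1):
        plan.append(
            {"name": f"{name}_block{i}", "filters": filters,
             "stride": 1, "conv_shortcut": False}
        )
    return plan


for block in plan_stack(filters=128, blocks=4, stride1=2, name="conv3"):
    print(block["name"], block["stride"], block["conv_shortcut"])
```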

sleap.nn.architectures.resnet.tile_channels(X: Tensor) Tensor[source]#

Tile single channel to 3 channel tensor.

This function is useful for replicating grayscale single-channel images into 3-channel monochrome RGB images.

Parameters:

X – Tensor of shape (samples, height, width, 1).

Returns:

Tensor of shape (samples, height, width, 3) where the channels are identical.
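The effect on the channel axis can be shown with nested lists standing in for a single image. This is a plain-Python sketch of the idea only; the real function operates on batched tensors, and `tile_image_channels` is a hypothetical name.

```python
def tile_image_channels(image):
    """Repeat the single channel of a [height][width][1] image three times."""
    return [[[pixel[0]] * 3 for pixel in row] for row in image]


gray = [[[0.2], [0.5]],
        [[0.0], [1.0]]]  # height=2, width=2, channels=1
rgb = tile_image_channels(gray)
print(rgb[0][1])  # [0.5, 0.5, 0.5]
```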