Training

sleap.nn.training

SLEAP model training.

class sleap.nn.training.NodeLoss(node_ind, name='node_loss', **kwargs)[source]

Compute node-specific loss.

Useful for monitoring the MSE for specific body parts (channels).

node_ind

Index of channel to compute MSE for.

name

Name of the loss tensor.

result()[source]

Computes and returns the metric value tensor.

Result computation is an idempotent operation that simply calculates the metric value using the state variables.

update_state(y_gt, y_pr, sample_weight=None)[source]

Accumulates statistics for the metric.

Note: This function is executed as a graph function in graph mode. This means:

  1. Operations on the same resource are executed in textual order. This should make it easier to do things like add the updated value of a variable to another, for example.

  2. You don’t need to worry about collecting the update ops to execute. All update ops added to the graph by this function will be executed.

As a result, code should generally work the same way with graph or eager execution.

Please use tf.config.experimental_run_functions_eagerly(True) to execute this function eagerly for debugging or profiling.

Parameters
  • *args, **kwargs – A mini-batch of inputs to the Metric.
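
The accumulate-then-report pattern of update_state() and result() can be sketched without TensorFlow. The class below is a hypothetical pure-Python stand-in (RunningNodeMSE is not part of SLEAP). It assumes each batch is shaped [sample][channel][pixel] as nested lists and it ignores sample_weight.

```python
class RunningNodeMSE:
    """Hypothetical stand-in for a per-node MSE metric (not SLEAP's class)."""

    def __init__(self, node_ind):
        self.node_ind = node_ind
        self.total = 0.0  # running sum of squared errors
        self.count = 0    # number of elements seen so far

    def update_state(self, y_gt, y_pr):
        # Accumulate statistics for the selected channel only.
        for gt_sample, pr_sample in zip(y_gt, y_pr):
            gt = gt_sample[self.node_ind]
            pr = pr_sample[self.node_ind]
            self.total += sum((g - p) ** 2 for g, p in zip(gt, pr))
            self.count += len(gt)

    def result(self):
        # Idempotent: reads the state without modifying it.
        return self.total / self.count if self.count else 0.0


metric = RunningNodeMSE(node_ind=1)
y_gt = [[[0.0, 0.0], [1.0, 1.0]]]  # one sample, two channels, two pixels
y_pr = [[[0.5, 0.5], [1.0, 0.0]]]
metric.update_state(y_gt, y_pr)
node_mse = metric.result()
```

Calling result() repeatedly returns the same value until update_state() is called again.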

class sleap.nn.training.OHKMLoss(K=2, weight=5, name='ohkm', **kwargs)[source]

Online hard keypoint mining loss.

This loss dynamically reweights the MSE of the top-K worst channels in each batch. This is useful when fine-tuning a model to improve performance on a body part that is hard to optimize (e.g., small, hard to see, or often not visible).

Note: This works with any type of channel, so it can work for PAFs as well.

K

Number of worst performing channels to compute loss for.

weight

Scalar factor to multiply with the MSE for the top-K worst channels.

name

Name of the loss tensor.
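
A rough pure-Python sketch of top-K channel reweighting follows. How SLEAP combines the weighted term with the base MSE is not stated here, so the sketch assumes the weighted top-K term is added to the plain mean MSE over all channels. Both function names are hypothetical.

```python
def channel_mse(y_gt, y_pr):
    """Mean squared error per channel for inputs shaped [channel][pixel]."""
    return [
        sum((g - p) ** 2 for g, p in zip(gt, pr)) / len(gt)
        for gt, pr in zip(y_gt, y_pr)
    ]


def ohkm_sketch(y_gt, y_pr, K=2, weight=5):
    """Hypothetical top-K reweighting; not SLEAP's exact formula."""
    per_channel = channel_mse(y_gt, y_pr)
    base = sum(per_channel) / len(per_channel)
    # The K channels with the largest error are the "hard" keypoints.
    worst = sorted(per_channel, reverse=True)[:K]
    return base + weight * sum(worst) / K


y_gt = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y_pr = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
per_channel = channel_mse(y_gt, y_pr)
loss = ohkm_sketch(y_gt, y_pr, K=2, weight=5)
```

Because the selection is recomputed on every call, the set of reweighted channels can change from batch to batch.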

sleap.nn.training.main()[source]

CLI for training.

sleap.nn.architectures

class sleap.nn.architectures.hourglass.StackedHourglass(num_stacks: int = 3, num_filters: int = 32, depth: int = 3, batch_norm: bool = True, intermediate_inputs: bool = True, upsampling_layers: bool = True, interp: str = 'bilinear', initial_stride: int = 1)[source]

Stacked hourglass block.

This function builds and connects multiple hourglass blocks. See hourglass for more specifics on the implementation.

Individual hourglasses can be customized by providing an iterable of hyperparameters for each of the arguments of the function (except num_output_channels). If scalars are provided, all hourglasses will share the same hyperparameters.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. If the number of channels is not the same as num_filters, an additional residual block is applied to this input.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • num_filters – The number of feature channels of the block. These features are used throughout the hourglass and will be passed on to the next block and need not match the num_output_channels. Must be divisible by 2.

  • depth – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • batch_norm – Apply batch normalization after each convolution.

  • intermediate_inputs – Re-introduce the input tensor x_in after each hourglass by concatenating with intermediate outputs.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

  • initial_stride – Stride of first convolution to use for reducing input resolution.

output(x_in, num_output_channels)[source]

Generate a tensorflow graph for the backbone and return the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out
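
Inputs whose height or width is not a multiple of the required power of two are commonly padded before being passed to a backbone like this one. A small helper for computing the padded size, assuming padding (rather than cropping or resizing) is acceptable. padded_size is a hypothetical name, not a SLEAP function.

```python
def padded_size(size, depth):
    """Smallest value >= size that is divisible by 2**depth."""
    multiple = 2 ** depth
    return -(-size // multiple) * multiple  # ceiling division


height, width = padded_size(250, 3), padded_size(384, 3)
```

Sizes that already satisfy the requirement are returned unchanged.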

sleap.nn.architectures.hourglass.hourglass_block(x_in, num_output_channels, num_filters, depth=3, batch_norm=True, upsampling_layers=True, interp='bilinear')[source]

Creates a single hourglass block.

This function builds an hourglass block from residual blocks and max pooling.

The hourglass is defined as a set of depth residual blocks followed by 2-strided max pooling for downsampling, then an intermediate residual block, followed by depth blocks of upsampling -> skip Add -> residual blocks.

The output tensors are then produced by linear activation with 1x1 convs.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have num_filters channels since the hourglass adds a residual to this input.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • num_filters – The number of feature channels of the block. These features are used throughout the hourglass and will be passed on to the next block and need not match the num_output_channels. Must be divisible by 2.

  • depth – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • batch_norm – Apply batch normalization after each convolution.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

Returns

x: tf.Tensor of the features output by the block with num_filters channels. This tensor can be passed on to the next hourglass or ignored if this is the last hourglass.

x_out: tf.Tensor of the output of the block of the same width and height as the input with num_output_channels channels.
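
To make the symmetry requirement concrete, the sketch below traces only the spatial size through the pooling and upsampling path described for this block. It assumes each pooling step halves the size, each upsampling step doubles it, and all convolutions preserve it. hourglass_sizes is a hypothetical helper.

```python
def hourglass_sizes(size, depth=3):
    """Spatial size after each pooling step, then after each upsampling step."""
    if size % (2 ** depth) != 0:
        raise ValueError(f"size {size} is not divisible by 2**{depth}")
    down = [size]
    for _ in range(depth):
        down.append(down[-1] // 2)  # 2-strided max pooling
    up = [down[-1]]
    for skip in reversed(down[:-1]):
        up.append(up[-1] * 2)       # upsampling
        assert up[-1] == skip       # the skip Add needs matching sizes
    return down, up


down, up = hourglass_sizes(64, depth=3)
```

With a size such as 60 and depth=3 the halving would stop being exact, which is why the helper rejects it up front.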

sleap.nn.architectures.hourglass.stacked_hourglass(x_in, num_output_channels, num_stacks=3, num_filters=32, depth=3, batch_norm=True, intermediate_inputs=True, upsampling_layers=True, interp='bilinear', initial_stride=1)[source]

Stacked hourglass block.

This function builds and connects multiple hourglass blocks. See hourglass for more specifics on the implementation.

Individual hourglasses can be customized by providing an iterable of hyperparameters for each of the arguments of the function (except num_output_channels). If scalars are provided, all hourglasses will share the same hyperparameters.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. If the number of channels is not the same as num_filters, an additional residual block is applied to this input.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • num_filters – The number of feature channels of the block. These features are used throughout the hourglass and will be passed on to the next block and need not match the num_output_channels. Must be divisible by 2.

  • depth – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • batch_norm – Apply batch normalization after each convolution.

  • intermediate_inputs – Re-introduce the input tensor x_in after each hourglass by concatenating with intermediate outputs.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

  • initial_stride – Stride of first convolution to use for reducing input resolution.

Returns

List of tf.Tensors of the output of the block of the same width and height as the input with num_output_channels channels.

Return type

x_outs
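
Because one output is returned per stack, a loss can be applied to each of them (intermediate supervision). A minimal sketch, assuming every stack is compared against the same target with equal weight. stacked_mse is a hypothetical helper that operates on flat lists rather than tensors.

```python
def mse(target, pred):
    """Mean squared error between two flat lists of equal length."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def stacked_mse(target, x_outs):
    """Sum of per-stack MSEs, one term per intermediate output."""
    return sum(mse(target, x_out) for x_out in x_outs)


target = [0.0, 1.0]
x_outs = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0]]  # outputs of 3 stacks
total = stacked_mse(target, x_outs)
```

Earlier stacks contribute to the total even when the last stack is already accurate, which is what pushes them towards useful intermediate predictions.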

Implements wrappers for constructing (optionally pretrained) DenseNets.

See original paper: https://arxiv.org/abs/1608.06993

class sleap.nn.architectures.densenet.DenseNet121(upsampling_layers: bool = True, interp: str = 'bilinear', up_blocks: int = 5, refine_conv_up: bool = False, pretrained: bool = True)[source]

DenseNet121 backbone.

This backbone has ~7M params.

upsampling_layers

Use upsampling instead of transposed convolutions.

interp

Method to use for interpolation when upsampling smaller features.

up_blocks

Number of upsampling steps to perform. The backbone downsamples its input by a factor of 32 (an output scale of 1/32). If set to 5, outputs will be upsampled to the input resolution.

refine_conv_up

If true, applies a 1x1 conv after each upsampling step.

pretrained

Load pretrained ImageNet weights for transfer learning. If False, random weights are used for initialization.

property down_blocks

Returns the number of downsampling steps in the model.

output(x_in, num_output_channels)[source]

Builds the layers for this backbone and returns the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

property output_scale

Returns relative scaling factor of this backbone.
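
The relation between up_blocks and the output resolution can be written out as arithmetic. This sketch assumes 5 downsampling steps (matching the stated 1/32 reduction) and that every upsampling step doubles the resolution. It is not the library's implementation of this property.

```python
def output_scale_sketch(up_blocks, down_blocks=5):
    """Output size relative to the input after down- then upsampling."""
    return (2 ** up_blocks) / (2 ** down_blocks)


scales = {up: output_scale_sketch(up) for up in (0, 3, 5)}
```

Under these assumptions, up_blocks=3 yields outputs at one quarter of the input resolution and up_blocks=5 restores the full resolution.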

class sleap.nn.architectures.densenet.DenseNet169(upsampling_layers: bool = True, interp: str = 'bilinear', up_blocks: int = 5, refine_conv_up: bool = False, pretrained: bool = True)[source]

DenseNet169 backbone.

This backbone has ~12.6M params.

upsampling_layers

Use upsampling instead of transposed convolutions.

interp

Method to use for interpolation when upsampling smaller features.

up_blocks

Number of upsampling steps to perform. The backbone downsamples its input by a factor of 32 (an output scale of 1/32). If set to 5, outputs will be upsampled to the input resolution.

refine_conv_up

If true, applies a 1x1 conv after each upsampling step.

pretrained

Load pretrained ImageNet weights for transfer learning. If False, random weights are used for initialization.

property down_blocks

Returns the number of downsampling steps in the model.

output(x_in, num_output_channels)[source]

Builds the layers for this backbone and returns the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

property output_scale

Returns relative scaling factor of this backbone.

class sleap.nn.architectures.densenet.DenseNet201(upsampling_layers: bool = True, interp: str = 'bilinear', up_blocks: int = 5, refine_conv_up: bool = False, pretrained: bool = True)[source]

DenseNet201 backbone.

This backbone has ~18.3M params.

upsampling_layers

Use upsampling instead of transposed convolutions.

interp

Method to use for interpolation when upsampling smaller features.

up_blocks

Number of upsampling steps to perform. The backbone downsamples its input by a factor of 32 (an output scale of 1/32). If set to 5, outputs will be upsampled to the input resolution.

refine_conv_up

If true, applies a 1x1 conv after each upsampling step.

pretrained

Load pretrained ImageNet weights for transfer learning. If False, random weights are used for initialization.

property down_blocks

Returns the number of downsampling steps in the model.

output(x_in, num_output_channels)[source]

Builds the layers for this backbone and returns the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

property output_scale

Returns relative scaling factor of this backbone.

class sleap.nn.architectures.densenet.GeneralizedDenseNet(n_dense_blocks_1: int = 3, n_dense_blocks_2: int = 6, n_dense_blocks_3: int = 12, n_dense_blocks_4: int = 8, upsampling_layers: bool = True, interp: str = 'bilinear', up_blocks: int = 5, refine_conv_up: bool = False)[source]

Generalized version of the 4-block DenseNet backbone.

This allows for selecting the number of blocks in each dense layer, but cannot use pretrained weights since the configuration may not have been previously used.

n_dense_blocks_1

Number of blocks in dense layer 1.

n_dense_blocks_2

Number of blocks in dense layer 2.

n_dense_blocks_3

Number of blocks in dense layer 3.

n_dense_blocks_4

Number of blocks in dense layer 4.

upsampling_layers

Use upsampling instead of transposed convolutions.

interp

Method to use for interpolation when upsampling smaller features.

up_blocks

Number of upsampling steps to perform. The backbone downsamples its input by a factor of 32 (an output scale of 1/32). If set to 5, outputs will be upsampled to the input resolution.

refine_conv_up

If true, applies a 1x1 conv after each upsampling step.

property down_blocks

Returns the number of downsampling steps in the model.

output(x_in, num_output_channels)[source]

Builds the layers for this backbone and returns the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

property output_scale

Returns relative scaling factor of this backbone.
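
How the per-layer block counts affect feature width can be estimated with the channel arithmetic from the DenseNet paper. The growth rate of 32, the 64 stem channels and the 0.5 transition compression are assumptions taken from the standard DenseNet configuration. They are not confirmed values for this class, and dense_channels is a hypothetical helper.

```python
def dense_channels(blocks, stem=64, growth_rate=32, compression=0.5):
    """Channel count at the end of each dense layer (assumed DenseNet rules)."""
    channels, out = stem, []
    for i, n in enumerate(blocks):
        channels += n * growth_rate  # each block concatenates new features
        out.append(channels)
        if i < len(blocks) - 1:
            channels = int(channels * compression)  # transition layer
    return out


default_channels = dense_channels([3, 6, 12, 8])
densenet121_channels = dense_channels([6, 12, 24, 16])
```

The second call uses the block counts of the standard DenseNet121 for comparison with the smaller default configuration.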

class sleap.nn.architectures.densenet.UDenseNet(stem_stride: int = 1, stem_filters: int = 64, dense_blocks: List[int] = [2, 4, 6, 8], output_scale: Union[float, List[float]] = 1.0, n_heads: int = 1, head_filters: Union[int, List[int]] = 64)[source]

UDenseNet backbone, a UNet-like architecture with skip connections to heads.

stem_stride

Initial downsampling stride in the stem block.

stem_filters

Initial number of conv filters in the stem block.

dense_blocks

List of integers defining the size of each dense block. Can be of any length > 0.

output_scale

Scale of the output tensor relative to the input.

n_heads

Number of heads to produce. Intermediate heads will pass features from every scale to the next head, starting from the first backbone with dense blocks and transitions.

head_filters

Filters to use in each head block after concatenation with previous filters.

property down_blocks

Returns the number of downsampling steps in the model.

output(x_in, n_output_channels)[source]

Builds the layers for this backbone and returns the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor.

  • n_output_channels – Number of output channels.

Returns

A tf.keras.Model with as many outputs as the n_heads attribute for this model.

class sleap.nn.architectures.leap.LeapCNN(down_blocks: int = 3, up_blocks: int = 3, upsampling_layers: int = True, num_filters: int = 64, interp: str = 'bilinear')[source]

LEAP CNN block.

Implementation generalized from original paper (Pereira et al., 2019).

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • down_blocks – The number of pooling steps applied to the input. The input height and width must be divisible by 2^down_blocks.

  • up_blocks – The number of upsampling steps applied after downsampling.

  • upsampling_layers – If True, use upsampling instead of transposed convs.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • interp – Method to use for interpolation when upsampling smaller features.

output(x_in, num_output_channels)[source]

Generate a tensorflow graph for the backbone and return the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

sleap.nn.architectures.leap.leap_cnn(x_in, num_output_channels, down_blocks=3, up_blocks=3, upsampling_layers=True, num_filters=64, interp='bilinear')[source]

LEAP CNN block.

Implementation generalized from original paper (Pereira et al., 2019).

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • down_blocks – The number of pooling steps applied to the input. The input height and width must be divisible by 2^down_blocks.

  • up_blocks – The number of upsampling steps applied after downsampling.

  • upsampling_layers – If True, use upsampling instead of transposed convs.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • interp – Method to use for interpolation when upsampling smaller features.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out
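
The doubling rule for num_filters can be tabulated as follows. Whether the implementation also doubles the filters for the block after the last pooling step is not stated here; the sketch assumes it does. filters_per_level is a hypothetical helper.

```python
def filters_per_level(num_filters=64, down_blocks=3):
    """Filters at each resolution level, doubling after every pooling step."""
    return [num_filters * 2 ** level for level in range(down_blocks + 1)]


schedule = filters_per_level()
```

With the defaults this gives one entry for the input resolution plus one per pooling step.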

class sleap.nn.architectures.unet.StackedUNet(num_stacks: int = 3, depth: int = 3, convs_per_depth: int = 2, num_filters: int = 16, kernel_size: int = 5, upsampling_layers: bool = True, intermediate_inputs: bool = True, interp: str = 'bilinear')[source]

Stacked U-net block.

See unet for more specifics on the implementation.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^depth.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • num_stacks – The number of blocks to stack on top of each other.

  • depth – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • convs_per_depth – The number of convolutions applied before pooling or after upsampling.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • kernel_size – Size of the convolutional kernels for each filter.

  • intermediate_inputs – Re-introduce the input tensor x_in after each block by concatenating with intermediate outputs.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

output(x_in, num_output_channels)[source]

Generate a tensorflow graph for the backbone and return the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

class sleap.nn.architectures.unet.UNet(down_blocks: int = 3, up_blocks: int = 3, convs_per_depth: int = 2, num_filters: int = 16, kernel_size: int = 5, upsampling_layers: bool = True, interp: str = 'bilinear')[source]

U-net block.

Implementation based off of CARE.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^depth.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • down_blocks – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • up_blocks – The number of upsampling steps applied after downsampling.

  • convs_per_depth – The number of convolutions applied before pooling or after upsampling.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • kernel_size – Size of the convolutional kernels for each filter.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

output(x_in, num_output_channels)[source]

Generate a tensorflow graph for the backbone and return the output tensor.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^down_blocks.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

Returns

tf.Tensor of the output of the block with num_output_channels channels.

Return type

x_out

sleap.nn.architectures.unet.stacked_unet(x_in, num_output_channels, num_stacks=3, depth=3, convs_per_depth=2, num_filters=16, kernel_size=5, upsampling_layers=True, intermediate_inputs=True, interp='bilinear')[source]

Stacked U-net block.

See unet for more specifics on the implementation.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^depth.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • num_stacks – The number of blocks to stack on top of each other.

  • depth – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • convs_per_depth – The number of convolutions applied before pooling or after upsampling.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • kernel_size – Size of the convolutional kernels for each filter.

  • intermediate_inputs – Re-introduce the input tensor x_in after each block by concatenating with intermediate outputs.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

Returns

tf.Tensor of the output of the block of the same width and height as the input with num_output_channels channels.

Return type

x_outs

sleap.nn.architectures.unet.unet(x_in, num_output_channels, down_blocks=3, up_blocks=3, convs_per_depth=2, num_filters=16, kernel_size=5, upsampling_layers=True, interp='bilinear')[source]

U-net block.

Implementation based off of CARE.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer. Must have height and width that are divisible by 2^depth.

  • num_output_channels – The number of output channels of the block. These are the final output tensors on which intermediate supervision may be applied.

  • down_blocks – The number of pooling steps applied to the input. The input height and width must be divisible by 2^depth to allow for symmetric pooling and upsampling with skip connections.

  • up_blocks – The number of upsampling steps applied after downsampling.

  • convs_per_depth – The number of convolutions applied before pooling or after upsampling.

  • num_filters – The base number of feature channels of the block. The number of filters is doubled at each pooling step.

  • kernel_size – Size of the convolutional kernels for each filter.

  • upsampling_layers – Use upsampling instead of transposed convolutions.

  • interp – Method to use for interpolation when upsampling smaller features.

Returns

tf.Tensor of the output of the block of the same width and height as the input with num_output_channels channels.

Return type

x_out
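
Together, down_blocks, convs_per_depth and kernel_size determine how much of the input each feature at the coarsest level can see. The standard receptive-field recurrence gives an estimate. The sketch assumes stride-1 convolutions and 2x2 pooling with stride 2, and it covers only the downsampling path. encoder_receptive_field is a hypothetical helper.

```python
def encoder_receptive_field(down_blocks=3, convs_per_depth=2, kernel_size=5):
    """Receptive field (in input pixels) after the last pooling step."""
    rf, jump = 1, 1  # jump = input pixels between adjacent feature positions
    for _ in range(down_blocks):
        for _ in range(convs_per_depth):
            rf += (kernel_size - 1) * jump  # stride-1 convolution
        rf += (2 - 1) * jump  # 2x2 pooling window
        jump *= 2             # pooling stride 2
    return rf


rf_default = encoder_receptive_field()
rf_small = encoder_receptive_field(kernel_size=3)
```

Comparing the two results shows how strongly the kernel size affects the context available at the coarsest level.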

sleap.nn.architectures.common.conv(num_filters, kernel_size=(3, 3), activation='relu', **kwargs)[source]

Convenience presets for Conv2D.

Parameters
  • num_filters – Number of output filters (channels)

  • kernel_size – Size of convolution kernel

  • activation – Activation function applied to output

  • **kwargs – Arbitrary keyword arguments passed on to keras.layers.Conv2D

Returns

keras.layers.Conv2D instance built with presets

sleap.nn.architectures.common.conv1(num_filters, **kwargs)[source]

Convenience presets for 1x1 Conv2D.

Parameters
  • num_filters – Number of output filters (channels)

  • **kwargs – Arbitrary keyword arguments passed on to keras.layers.Conv2D

Returns

keras.layers.Conv2D instance built with presets

sleap.nn.architectures.common.conv3(num_filters, **kwargs)[source]

Convenience presets for 3x3 Conv2D.

Parameters
  • num_filters – Number of output filters (channels)

  • **kwargs – Arbitrary keyword arguments passed on to keras.layers.Conv2D

Returns

keras.layers.Conv2D instance built with presets

sleap.nn.architectures.common.expand_to_n(x, n)[source]

Expands an object x to n elements if scalar.

This is a utility function that wraps np.tile functionality.

Parameters
  • x – Scalar of any type

  • n – Number of repetitions

Returns

Tiled version of x with __len__ == n.
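
A pure-Python approximation of this behaviour is shown below. It assumes that lists and tuples are passed through unchanged and that everything else counts as a scalar. The original wraps np.tile, which this sketch does not use, and expand_to_n_sketch is a hypothetical name.

```python
def expand_to_n_sketch(x, n):
    """Repeat a scalar n times; pass sequences through untouched."""
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x] * n


shared = expand_to_n_sketch(32, 3)
per_stack = expand_to_n_sketch([16, 32, 64], 3)
```

This is the kind of expansion that lets a scalar hyperparameter be shared by every block in a stack while an iterable customizes each block.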

sleap.nn.architectures.common.residual_block(x_in, num_filters=None, batch_norm=True)[source]

Residual bottleneck block.

This function builds a residual block that is used at every step of stacked hourglass construction. Note that the layers are actually instantiated and connected.

The bottleneck is constructed by applying a 1x1 conv with num_filters / 2 channels, a 3x3 conv with num_filters / 2 channels, and a 1x1 conv with num_filters. The output of this last conv is skip-connected with the input via an Add layer (the residual).

If the input x_in has a different number of channels than num_filters, an additional 1x1 conv is applied to the input whose output will be used for the skip connection.

Parameters
  • x_in – Input 4-D tf.Tensor or instantiated layer.

  • num_filters – The number of output channels of the block. If not specified, defaults to the same number of channels as the input tensor. Must be divisible by 2 since the bottleneck halves the number of filters in the intermediate convs.

  • batch_norm – Apply batch normalization after each convolution.

Returns

tf.Tensor of the output of the block of the same width and height as the input with num_filters channels.

Return type

x_out
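
The bottleneck layout can be summarized as a list of (kernel size, output channels) pairs. The function below only plans the layers and builds no Keras objects. bottleneck_plan is a hypothetical helper, not part of SLEAP.

```python
def bottleneck_plan(in_channels, num_filters=None):
    """Return (main path, skip path) as lists of (kernel, channels)."""
    if num_filters is None:
        num_filters = in_channels  # default: keep the input width
    if num_filters % 2 != 0:
        raise ValueError("num_filters must be divisible by 2")
    half = num_filters // 2
    main = [(1, half), (3, half), (1, num_filters)]
    # A 1x1 projection is needed only when the channel counts differ.
    skip = [(1, num_filters)] if in_channels != num_filters else []
    return main, skip


main, skip = bottleneck_plan(in_channels=3, num_filters=32)
same_main, same_skip = bottleneck_plan(in_channels=32)
```

An empty skip path means the input can be added to the main path output directly.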

sleap.nn.architectures.common.scale_input(X)[source]

Rescale input to [-1, 1].
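
A sketch of the rescaling arithmetic, assuming the input range is known in advance (for example [0, 255] for 8-bit images). The range the library expects is not stated here, so lo and hi are hypothetical parameters of a hypothetical helper.

```python
def rescale_to_unit_range(values, lo=0.0, hi=255.0):
    """Map values from [lo, hi] linearly onto [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


scaled = rescale_to_unit_range([0, 127.5, 255])
```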

sleap.nn.architectures.common.tile_channels(X)[source]

Tiles a single channel to 3 channels.

sleap.nn.architectures.common.upsampled_average_block(tensors: List[tensorflow.python.framework.ops.Tensor], target_size: int = None, interp: str = 'bilinear') → tensorflow.python.framework.ops.Tensor[source]

Upsamples tensors to a common size and reduces them by averaging.

Parameters
  • tensors – A list of tensors of possibly different heights/widths, but the same number of channels.

  • target_size – Size that the tensors will be upsampled to. If None, this is set to the size of the largest tensor in the list.

  • interp – Interpolation method (“nearest” or “bilinear”).

Returns

A single tensor with the target size that is the average of all of the input tensors.
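
A dependency-free illustration of the idea, using nearest-neighbour upsampling on single-channel 2-D grids. It assumes square inputs whose sizes divide the target size evenly. The function names are hypothetical, and the real function operates on tensors.

```python
def upsample_nearest(grid, target_size):
    """Nearest-neighbour upsampling of a square 2-D list to target_size."""
    factor = target_size // len(grid)
    return [
        [grid[r // factor][c // factor] for c in range(target_size)]
        for r in range(target_size)
    ]


def upsampled_average(grids, target_size=None):
    """Upsample every grid to a common size, then average element-wise."""
    if target_size is None:
        target_size = max(len(g) for g in grids)  # largest input sets the size
    resized = [upsample_nearest(g, target_size) for g in grids]
    return [
        [sum(g[r][c] for g in resized) / len(resized) for c in range(target_size)]
        for r in range(target_size)
    ]


small = [[4.0]]
large = [[0.0, 2.0], [4.0, 6.0]]
averaged = upsampled_average([small, large])
```

The 1x1 grid is first expanded to 2x2 so that both inputs contribute equally to every output position.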