sleap.nn.data.normalization

Transformers for normalizing data formats.

class sleap.nn.data.normalization.Normalizer(image_key: str = 'image', ensure_float: bool = True, ensure_rgb: bool = False, ensure_grayscale: bool = False, imagenet_mode: Optional[str] = None)

Data transformer to normalize images.

This is useful as a transformation for data streams that require specific data ranges, such as inputs to pretrained models with specific preprocessing constraints.

image_key

String name of the key containing the images to normalize.

Type:

str

ensure_float

If True, converts the image to a tf.float32 if not already.

Type:

bool

ensure_rgb

If True, converts the image to RGB if not already.

Type:

bool

ensure_grayscale

If True, converts the image to grayscale if not already.

Type:

bool

imagenet_mode

Specifies an ImageNet-based normalization mode commonly used in tf.keras.applications-based pretrained models. Has no effect if not set. Valid values are:
  • “tf”: Values will be scaled to [-1, 1] and expanded to RGB if grayscale.

  • “caffe”: Values will be scaled to [0, 255], expanded to RGB if grayscale, RGB channels flipped to BGR, and subtracted by a fixed mean.

  • “torch”: Values will be scaled to [0, 1], expanded to RGB if grayscale, subtracted by a fixed mean, and divided by a fixed standard deviation.

Type:

Optional[str]

classmethod from_config(config: PreprocessingConfig, image_key: str = 'image') → Normalizer

Build an instance of this class from its configuration options.

Parameters:
  • config – A PreprocessingConfig instance with the desired parameters.

  • image_key – String name of the key containing the images to normalize.

Returns:

An instance of this class.

property input_keys: List[str]

Return the keys that incoming elements are expected to have.

property output_keys: List[str]

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) → DatasetV2

Create a dataset that contains the normalized images.

Parameters:

ds_input – A dataset whose elements contain the key specified in the image_key attribute.

Returns:

A tf.data.Dataset with elements containing the same images with normalization applied.

sleap.nn.data.normalization.convert_rgb_to_bgr(image: Tensor) → Tensor

Convert an RGB image to BGR format by reversing the channel order.

Parameters:

image – Tensor of any dtype with shape (…, 3) in RGB format. If grayscale, the image will be converted to RGB first.

Returns:

The input image with the channels axis reversed.
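Reversing the channel axis only reindexes the data; the values themselves are untouched. The sketch below is plain Python with no TensorFlow. It assumes an image stored as nested lists of shape (height, width, 3). `rgb_to_bgr_lists` is a hypothetical helper and is not part of sleap.

```python
def rgb_to_bgr_lists(image):
    """Reverse the last (channel) axis of a (height, width, 3) nested-list image."""
    # Each pixel is a list [r, g, b]; slicing with [::-1] yields [b, g, r].
    return [[list(pixel[::-1]) for pixel in row] for row in image]


# A 1x2 image: one pure-red pixel and one pure-blue pixel (uint8-style values).
image = [[[255, 0, 0], [0, 0, 255]]]
print(rgb_to_bgr_lists(image))  # [[[0, 0, 255], [255, 0, 0]]]
```

On a real tensor, the same reordering is typically written as `tf.reverse(image, axis=[-1])` or `image[..., ::-1]`.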

sleap.nn.data.normalization.ensure_float(image: Tensor) → Tensor

Convert the image to a tf.float32.

Parameters:

image – Tensor of any dtype.

Returns:

A tensor of the same shape as image but with dtype tf.float32. If the image was already of tf.float32 dtype, it will not be changed.

If the input was of an integer type, it will be scaled to the range [0, 1] according to the dtype’s maximum value.

See also: tf.image.convert_image_dtype
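For unsigned integer inputs, the scaling amounts to dividing by the dtype's maximum value. The sketch below is plain Python and covers only uint8 and uint16. `DTYPE_MAX` and `int_to_unit_float` are hypothetical names, not sleap's implementation.

```python
# Maximum representable values for a few unsigned integer dtypes.
DTYPE_MAX = {"uint8": 2**8 - 1, "uint16": 2**16 - 1}


def int_to_unit_float(values, dtype="uint8"):
    """Map unsigned integer intensities to floats in [0.0, 1.0]."""
    max_val = DTYPE_MAX[dtype]
    # Dividing by the dtype maximum sends 0 -> 0.0 and max_val -> 1.0.
    return [v / max_val for v in values]


print(int_to_unit_float([0, 51, 255]))  # [0.0, 0.2, 1.0]
```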

sleap.nn.data.normalization.ensure_grayscale(image: Tensor) → Tensor

Convert image to grayscale if in RGB format.

Parameters:

image – Tensor of any dtype of shape (height, width, channels). Channels are expected to be 1 or 3.

Returns:

A grayscale image of shape (height, width, 1) of the same dtype as the input.

See also: tf.image.rgb_to_grayscale
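RGB-to-grayscale conversion collapses the three channels into one weighted sum per pixel. The plain-Python sketch below assumes the luma weights (0.2989, 0.5870, 0.1140), which to our knowledge are the ones tf.image.rgb_to_grayscale uses. It works on float values and does not model rounding back to integer dtypes. The helper names are hypothetical.

```python
# Luma weights for R, G, B (assumed to match tf.image.rgb_to_grayscale).
RGB_WEIGHTS = (0.2989, 0.5870, 0.1140)


def rgb_pixel_to_gray(pixel):
    """Collapse one [r, g, b] float pixel into a single-channel [gray] pixel."""
    r, g, b = pixel
    wr, wg, wb = RGB_WEIGHTS
    return [wr * r + wg * g + wb * b]


def ensure_grayscale_lists(image):
    """Convert a (height, width, 3) nested-list image to (height, width, 1).

    Images that already have a single channel are returned unchanged.
    """
    if len(image[0][0]) == 1:
        return image
    return [[rgb_pixel_to_gray(pixel) for pixel in row] for row in image]


print(ensure_grayscale_lists([[[1.0, 0.0, 0.0]]]))  # [[[0.2989]]]
```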

sleap.nn.data.normalization.ensure_int(image: Tensor) → Tensor

Convert the image to a tf.uint8.

If the image is a floating dtype, then converts and scales data from [0, 1] to [0, 255] as needed. Otherwise, returns image as is.

Parameters:

image – Tensor of any dtype.

Returns:

A tensor of the same shape as image but with dtype tf.uint8. If the image was not a floating dtype, then it will not be changed.

If the input was float with range [0, 1], it will be scaled to [0, 255].
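The plain-Python sketch below shows the float-to-uint8 direction. Two details are assumptions of this sketch, not statements about sleap. First, "as needed" is read as "multiply by 255 only when every value is at most 1.0". Second, values are rounded with Python's round() and then clipped to [0, 255]. `unit_float_to_uint8` is a hypothetical helper.

```python
def unit_float_to_uint8(values):
    """Convert float intensities to integers in the uint8 range [0, 255].

    Assumption: values are treated as [0, 1] data (and multiplied by 255)
    only if none of them exceeds 1.0; otherwise they are assumed to already
    be on a [0, 255] scale.
    """
    scale = 255.0 if max(values) <= 1.0 else 1.0
    # Python's round() uses round-half-to-even; clip into the valid uint8 range.
    return [min(255, max(0, round(v * scale))) for v in values]


print(unit_float_to_uint8([0.0, 0.5, 1.0]))  # [0, 128, 255]
```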

sleap.nn.data.normalization.ensure_min_image_rank(image: Tensor) → Tensor

Expand the image to a minimum rank of 3 by adding singleton dimensions.

Parameters:

image – Tensor of any rank and dtype.

Returns:

The image expanded to a minimum rank of 3.

If the input was rank-2, it is assumed to be of shape (height, width), so a singleton channels axis is appended to produce a tensor of shape (height, width, 1).

If the image was already of rank >= 3, it will be returned without changes.

See also: sleap.nn.data.utils.expand_to_rank
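The rank rule changes only the shape, so it can be sketched on shape tuples alone. The sketch below is plain Python and `min_image_rank_shape` is a hypothetical helper. The description above does not cover ranks below 2, so this sketch rejects them; that choice is an assumption.

```python
def min_image_rank_shape(shape):
    """Return the shape produced by expanding an image to rank >= 3.

    Rank-2 shapes (height, width) gain a trailing singleton channels axis;
    shapes of rank >= 3 are returned unchanged. Lower ranks are rejected
    because the documented behavior does not describe them.
    """
    shape = tuple(shape)
    if len(shape) < 2:
        raise ValueError(f"expected rank >= 2, got shape {shape}")
    if len(shape) == 2:
        return shape + (1,)
    return shape


print(min_image_rank_shape((480, 640)))     # (480, 640, 1)
print(min_image_rank_shape((480, 640, 3)))  # (480, 640, 3)
```

In TensorFlow, appending the trailing axis corresponds to `tf.expand_dims(image, axis=-1)`.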

sleap.nn.data.normalization.ensure_rgb(image: Tensor) → Tensor

Convert image to RGB if in grayscale format.

Parameters:

image – Tensor of any dtype of shape (height, width, channels). Channels are expected to be 1 or 3.

Returns:

An RGB image of shape (height, width, 3) of the same dtype as the input.

See also: tf.image.grayscale_to_rgb
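Grayscale-to-RGB conversion copies the single channel into all three output channels, which to our understanding is what tf.image.grayscale_to_rgb does. The sketch below is plain Python on nested lists, and `ensure_rgb_lists` is a hypothetical helper.

```python
def ensure_rgb_lists(image):
    """Convert a (height, width, 1) nested-list image to (height, width, 3).

    The single gray value is copied into all three channels, so each pixel
    has R == G == B. Images that already have three channels are returned
    unchanged.
    """
    if len(image[0][0]) == 3:
        return image
    return [[[pixel[0]] * 3 for pixel in row] for row in image]


print(ensure_rgb_lists([[[0], [128]]]))  # [[[0, 0, 0], [128, 128, 128]]]
```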

sleap.nn.data.normalization.scale_image_range(image: Tensor, min_val: float, max_val: float) → Tensor

Scale the range of image values.

Parameters:
  • image – Tensor of any shape of dtype tf.float32 with values in the range [0, 1].

  • min_val – The minimum number that values will be scaled to.

  • max_val – The maximum number that values will be scaled to.

Returns:

The scaled image of the same shape and dtype tf.float32. Values in the input that were 0 will now be scaled to min_val, and values that were 1.0 will be scaled to max_val.
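The two endpoint conditions (0 maps to min_val, 1.0 maps to max_val) determine a unique affine map: x * (max_val - min_val) + min_val. The sketch below applies that formula to a flat list of values in plain Python; `scale_unit_range` is a hypothetical name.

```python
def scale_unit_range(values, min_val, max_val):
    """Affinely map values from [0, 1] onto [min_val, max_val].

    0.0 maps to min_val, 1.0 maps to max_val, and values in between are
    interpolated linearly.
    """
    return [v * (max_val - min_val) + min_val for v in values]


print(scale_unit_range([0.0, 0.5, 1.0], -1.0, 1.0))   # [-1.0, 0.0, 1.0]
print(scale_unit_range([0.0, 0.5, 1.0], 0.0, 255.0))  # [0.0, 127.5, 255.0]
```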

sleap.nn.data.normalization.scale_to_imagenet_caffe_mode(image: Tensor) → Tensor

Scale images according to the “caffe” preprocessing mode.

This applies the preprocessing operations implemented in tf.keras.applications for models pretrained on ImageNet.

Parameters:

image – Any image tensor of rank >= 2. If rank >= 3, the last axis is assumed to be of size 3 corresponding to RGB-ordered channels.

Returns:

The preprocessed image of dtype tf.float32 and shape (…, height, width, 3) with BGR channel ordering.

Values will be in the approximate range of [-123.68, 151.06], depending on the channel.

Notes

The preprocessing steps applied are:
  1. If needed, expand to rank-3 by adding singleton dimensions to the end. This assumes rank-2 images are grayscale of shape (height, width) and will be expanded to (height, width, 1).

  2. Convert to RGB if not already in 3-channel format.

  3. Reverse the channel ordering to convert RGB to BGR format.

  4. Convert to tf.float32 in the range [0.0, 1.0].

  5. Scale the values to the range [0.0, 255.0].

  6. Subtract the ImageNet mean values (103.939, 116.779, 123.68) for channels in BGR format.

This preprocessing mode is required when using pretrained ResNetV1 models.
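The steps above can be traced on a single pixel. The sketch below is plain Python and applies steps 2 through 6 to one uint8 pixel given as (gray,) or (r, g, b). Step 1 (rank expansion) does not arise for a single pixel. The mean values come from step 6; `caffe_mode_pixel` is a hypothetical helper, not sleap's implementation.

```python
# ImageNet channel means in BGR order, as listed in step 6.
CAFFE_BGR_MEAN = (103.939, 116.779, 123.68)


def caffe_mode_pixel(pixel):
    """Apply the "caffe" steps to one uint8 pixel; returns floats in BGR order."""
    if len(pixel) == 1:                      # step 2: replicate gray into R, G, B
        pixel = tuple(pixel) * 3
    bgr = tuple(pixel)[::-1]                 # step 3: RGB -> BGR
    unit = [v / 255.0 for v in bgr]          # step 4: uint8 -> float in [0, 1]
    scaled = [v * 255.0 for v in unit]       # step 5: back to [0, 255]
    return tuple(v - m for v, m in zip(scaled, CAFFE_BGR_MEAN))  # step 6


print(caffe_mode_pixel((255, 0, 0)))  # approximately (-103.939, -116.779, 131.32)
```

Running black (0, 0, 0) and white (255, 255, 255) through the same function gives the per-channel extremes. Together they span roughly [-123.68, 151.06].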

sleap.nn.data.normalization.scale_to_imagenet_tf_mode(image: Tensor) → Tensor

Scale images according to the “tf” preprocessing mode.

This applies the preprocessing operations implemented in tf.keras.applications for models pretrained on ImageNet.

Parameters:

image – Any image tensor of rank >= 2.

Returns:

The preprocessed image of dtype tf.float32 and shape (…, height, width, 3) with RGB channel ordering.

Values will be in the range [-1.0, 1.0].

Notes

The preprocessing steps applied are:
  1. If needed, expand to rank-3 by adding singleton dimensions to the end. This assumes rank-2 images are grayscale of shape (height, width) and will be expanded to (height, width, 1).

  2. Convert to RGB if not already in 3-channel format.

  3. Convert to tf.float32 in the range [0.0, 1.0].

  4. Scale the values to the range [-1.0, 1.0].

This preprocessing mode is required when using pretrained ResNetV2, MobileNetV1, MobileNetV2 and NASNet models.
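On a single pixel, the "tf" steps reduce to dividing by 255 and then applying the affine map 2x - 1. The sketch below is plain Python for one uint8 pixel given as (gray,) or (r, g, b). `tf_mode_pixel` is a hypothetical helper, not sleap's implementation.

```python
def tf_mode_pixel(pixel):
    """Apply the "tf" steps to one uint8 pixel, given as (gray,) or (r, g, b)."""
    if len(pixel) == 1:                         # step 2: replicate gray into R, G, B
        pixel = tuple(pixel) * 3
    unit = [v / 255.0 for v in pixel]           # step 3: uint8 -> float in [0, 1]
    return tuple(v * 2.0 - 1.0 for v in unit)   # step 4: [0, 1] -> [-1, 1]


print(tf_mode_pixel((0, 255, 0)))  # (-1.0, 1.0, -1.0)
print(tf_mode_pixel((255,)))       # (1.0, 1.0, 1.0)
```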

sleap.nn.data.normalization.scale_to_imagenet_torch_mode(image: Tensor) → Tensor

Scale images according to the “torch” preprocessing mode.

This applies the preprocessing operations implemented in tf.keras.applications for models pretrained on ImageNet.

Parameters:

image – Any image tensor of rank >= 2. If rank >= 3, the last axis is assumed to be of size 3 corresponding to RGB-ordered channels.

Returns:

The preprocessed image of dtype tf.float32 and shape (…, height, width, 3) with RGB channel ordering.

Values will be in the approximate range of [-2.12, 2.64], depending on the channel.

Notes

The preprocessing steps applied are:
  1. If needed, expand to rank-3 by adding singleton dimensions to the end. This assumes rank-2 images are grayscale of shape (height, width) and will be expanded to (height, width, 1).

  2. Convert to RGB if not already in 3-channel format.

  3. Convert to tf.float32 in the range [0.0, 1.0].

  4. Subtract the ImageNet mean values (0.485, 0.456, 0.406) for channels in RGB format.

  5. Divide by the ImageNet standard deviation values (0.229, 0.224, 0.225) for channels in RGB format.

This preprocessing mode is required when using pretrained DenseNet models.
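The "torch" steps amount to a per-channel standardization of values that were first mapped to [0, 1]. The plain-Python sketch below applies steps 2 through 5 to one uint8 pixel, using the mean and standard deviation values listed in steps 4 and 5. `torch_mode_pixel` is a hypothetical helper, not sleap's implementation.

```python
# ImageNet per-channel statistics in RGB order, as listed in steps 4 and 5.
TORCH_RGB_MEAN = (0.485, 0.456, 0.406)
TORCH_RGB_STD = (0.229, 0.224, 0.225)


def torch_mode_pixel(pixel):
    """Apply the "torch" steps to one uint8 pixel, given as (gray,) or (r, g, b)."""
    if len(pixel) == 1:                   # step 2: replicate gray into R, G, B
        pixel = tuple(pixel) * 3
    unit = [v / 255.0 for v in pixel]     # step 3: uint8 -> float in [0, 1]
    return tuple(
        (v - m) / s                       # steps 4 and 5: subtract mean, divide by std
        for v, m, s in zip(unit, TORCH_RGB_MEAN, TORCH_RGB_STD)
    )


black = torch_mode_pixel((0, 0, 0))
white = torch_mode_pixel((255, 255, 255))
print(min(black), max(white))  # roughly -2.118 and 2.640
```

Because step 5 divides by standard deviations near 0.22, the outputs span a few units on either side of zero rather than a unit interval.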