sleap.nn.data.dataset_ops#

Transformers for dataset (multi-example) operations, e.g., shuffling and batching.

These are mostly wrappers for standard tf.data.Dataset ops.
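As a brief sketch of how these transformers compose (the dataset contents and the "x" key below are illustrative, not part of the API), a typical training pipeline chains transform_dataset calls in the recommended shuffle -> batch -> repeat order:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Shuffler, Batcher, Repeater

# Toy element-wise dataset of dictionary examples.
ds = tf.data.Dataset.from_tensor_slices({"x": tf.range(6, dtype=tf.float32)})

ds = Shuffler(shuffle=True, buffer_size=6).transform_dataset(ds)
ds = Batcher(batch_size=2, drop_remainder=True, unrag=True).transform_dataset(ds)
ds = Repeater(repeat=True, epochs=-1).transform_dataset(ds)  # repeat indefinitely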

class sleap.nn.data.dataset_ops.Batcher(batch_size: int = 8, drop_remainder: bool = False, unrag: bool = True)[source]#

Batching transformer for use in pipelines.

This class enables variable-length example keys to be batched by converting them to ragged tensors prior to concatenation, then converting them back to dense tensors.

See the notes in the Shuffler and Repeater transformers if training. If used for inference, this transformer will be used on its own without dropping remainders.

The ideal (training) pipeline follows the order:

shuffle -> batch -> repeat

batch_size#

Number of elements within a batch. The tensors for each key will be stacked along a new leading axis (with rank expansion as needed) so that it has length batch_size.

Type:

int

drop_remainder#

If True, the final batch will be dropped if it contains fewer than batch_size examples when the end of the input dataset is reached. This should be True for training and False for inference.

Type:

bool

unrag#

If False, any tensors that were of variable length will be left as tf.RaggedTensors. If True, the tensors will be converted to full tensors with NaN padding. Leaving tensors as ragged is useful when downstream ops will continue to use them in variable-length form.

Type:

bool

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with batched elements.

Parameters:

ds_input – Any dataset that produces dictionaries keyed by strings and values with any rank tensors.

Returns:

A tf.data.Dataset with elements containing the same keys, but with each tensor promoted to one rank higher (scalars of rank 0 are first expanded, so they end up at rank 2).

Each key of an element will contain batch_size individual examples stacked along axis 0, such that the length of the first axis (shape[0]) is equal to batch_size.

Any keys that had variable length elements within the batch will be padded with NaNs to the size of the largest element’s length for that key.
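A minimal sketch of this batching behavior, assuming a toy dataset with an illustrative variable-length "points" key (the key name and shapes are not part of the API):

import numpy as np
import tensorflow as tf
from sleap.nn.data.dataset_ops import Batcher

# Toy examples whose "points" tensors differ in length along axis 0.
def gen():
    yield {"points": np.zeros([2, 2], dtype="float32")}  # 2 points
    yield {"points": np.zeros([3, 2], dtype="float32")}  # 3 points

ds = tf.data.Dataset.from_generator(
    gen, output_signature={"points": tf.TensorSpec([None, 2], tf.float32)}
)

batcher = Batcher(batch_size=2, drop_remainder=False, unrag=True)
ds_batched = batcher.transform_dataset(ds)

example = next(iter(ds_batched))
# With unrag=True, the shorter element is padded with NaNs to the longest
# length in the batch, so "points" should come out with shape (2, 3, 2).
print(example["points"].shape)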

class sleap.nn.data.dataset_ops.LambdaFilter(filter_fn: Callable[[Dict[str, Tensor]], bool])[source]#

Transformer for filtering examples out of a dataset.

This class is useful for eliminating examples that fail to meet some criteria, e.g., when no peaks are found.

filter_fn#

Callable that takes an example dictionary as input and returns True if the element should be kept.

Type:

Callable[[Dict[str, tensorflow.python.framework.ops.Tensor]], bool]

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with filtering applied.

Parameters:

ds_input – Any dataset that produces dictionaries keyed by strings and values with any rank tensors.

Returns:

A tf.data.Dataset with elements containing the same keys, but with potentially fewer elements.
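A minimal sketch of filtering, assuming a toy dataset with an illustrative "n_peaks" key (filter_fn receives the example dictionary and returns a boolean):

import tensorflow as tf
from sleap.nn.data.dataset_ops import LambdaFilter

ds = tf.data.Dataset.from_tensor_slices({"n_peaks": [0, 3, 1, 0]})

# Keep only examples where at least one peak was found.
keep_nonempty = LambdaFilter(filter_fn=lambda example: example["n_peaks"] > 0)
ds_filtered = keep_nonempty.transform_dataset(ds)

print([int(example["n_peaks"]) for example in ds_filtered])  # [3, 1]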

class sleap.nn.data.dataset_ops.Prefetcher(prefetch: bool = True, buffer_size: int = -1)[source]#

Prefetching transformer for use in pipelines.

Prefetches elements from the input dataset to minimize the processing bottleneck as elements are requested, since prefetching can occur in parallel with downstream processing.

prefetch#

If False, returns the input dataset unmodified.

Type:

bool

buffer_size#

Keep buffer_size elements loaded in the buffer. If set to -1 (tf.data.experimental.AUTOTUNE), this value will be optimized automatically to decrease latency.

Type:

int

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with prefetching to maintain a buffer during iteration.

Parameters:

ds_input – Any dataset.

Returns:

A tf.data.Dataset with identical elements. Processing that occurs with the elements that are produced can be done in parallel (e.g., training on the GPU) while new elements are generated from the pipeline.
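A minimal usage sketch; with buffer_size=-1 the buffer size is autotuned (tf.data.experimental.AUTOTUNE), per the attribute description above:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Prefetcher

ds = tf.data.Dataset.range(100)
prefetcher = Prefetcher(prefetch=True, buffer_size=-1)
ds_prefetched = prefetcher.transform_dataset(ds)

for element in ds_prefetched.take(3):
    pass  # work done here (e.g., a training step) overlaps with buffer refills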

class sleap.nn.data.dataset_ops.Preloader[source]#

Preload elements of the underlying dataset to generate in-memory examples.

This transformer can lead to considerable performance improvements at the cost of memory consumption.

This is functionally equivalent to tf.data.Dataset.cache, except the cached examples are accessible directly via the examples attribute.

examples#

Stored list of preloaded elements.

Type:

List[Any]

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset that generates preloaded elements.

Parameters:

ds_input – Any tf.data.Dataset that generates examples as a dictionary of tensors. Should not be repeating infinitely.

Returns:

A dataset that generates the same examples.

This is similar to prefetching, except that examples are yielded through a generator and loaded when this method is called rather than during pipeline iteration.
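A minimal sketch, assuming a small toy dataset (the "x" key is illustrative); note that all examples are pulled into memory when transform_dataset is called:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Preloader

ds = tf.data.Dataset.from_tensor_slices({"x": tf.range(4)})

preloader = Preloader()
ds_preloaded = preloader.transform_dataset(ds)  # loads all examples now

print(len(preloader.examples))  # 4: examples are held in memory
print([int(example["x"]) for example in ds_preloaded])  # [0, 1, 2, 3]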

class sleap.nn.data.dataset_ops.Repeater(repeat: bool = True, epochs: int = -1)[source]#

Repeating transformer for use in pipelines.

Repeats the underlying elements indefinitely or for a number of “iterations” or “epochs”.

If placed before batching, this can create mini-batches with examples from across epoch boundaries.

If placed after batching, this may never reach examples that are dropped as remainders if not shuffling.

The ideal pipeline follows the order:

shuffle -> batch -> repeat

repeat#

If False, returns the input dataset unmodified.

Type:

bool

epochs#

If -1, repeats the input dataset elements infinitely. Otherwise, loops through the elements of the input dataset this number of times.

Type:

int

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with repeated loops over the input elements.

Parameters:

ds_input – Any dataset.

Returns:

A tf.data.Dataset with elements containing the same keys, but repeated for epochs iterations.
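A minimal sketch of finite repetition over a toy dataset:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Repeater

ds = tf.data.Dataset.range(3)

# Loop over the input twice; epochs=-1 would repeat indefinitely.
repeater = Repeater(repeat=True, epochs=2)
ds_repeated = repeater.transform_dataset(ds)

print([int(x) for x in ds_repeated])  # [0, 1, 2, 0, 1, 2]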

class sleap.nn.data.dataset_ops.Shuffler(shuffle: bool = True, buffer_size: int = 64, reshuffle_each_iteration: bool = True)[source]#

Shuffling transformer for use in pipelines.

The input to this transformer should not be repeated or batched (though the latter would technically work). Repeating prevents the shuffling from going through “epoch” or “iteration” loops in the underlying dataset.

Though batching before shuffling works and respects epoch boundaries, it is not recommended as it implies that the same examples will always be optimized together within a mini-batch. This is not as effective for promoting generalization as element-wise shuffling, which produces new combinations of elements within mini-batches.

The ideal pipeline follows the order:

shuffle -> batch -> repeat

shuffle#

If False, returns the input dataset unmodified.

Type:

bool

buffer_size#

Number of examples to keep in a buffer to sample uniformly from. If set too high, it may take a long time to fill the initial buffer, especially if it resets every epoch.

Type:

int

reshuffle_each_iteration#

If True, resets the sampling buffer every iteration through the underlying dataset.

Type:

bool

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with shuffled element order.

Parameters:

ds_input – Any dataset.

Returns:

A tf.data.Dataset with elements containing the same keys, but in a shuffled order, if enabled.

If the input dataset is repeated, this doesn’t really respect epoch boundaries since it never reaches the end of the iterator.
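A minimal usage sketch over a toy dataset; a buffer at least as large as the input gives a uniform shuffle, while smaller buffers only mix nearby elements:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Shuffler

ds = tf.data.Dataset.range(10)

shuffler = Shuffler(shuffle=True, buffer_size=10, reshuffle_each_iteration=True)
ds_shuffled = shuffler.transform_dataset(ds)

print([int(x) for x in ds_shuffled])  # e.g., [4, 0, 7, 2, 9, 1, 5, 8, 3, 6]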

class sleap.nn.data.dataset_ops.Unbatcher[source]#

Unbatching transformer for use in pipelines.

property input_keys: List[str]#

Return the keys that incoming elements are expected to have.

property output_keys: List[str]#

Return the keys that outgoing elements will have.

transform_dataset(ds_input: DatasetV2) DatasetV2[source]#

Create a dataset with unbatched elements.
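A minimal round-trip sketch pairing Batcher with Unbatcher (the "x" key is illustrative); unbatching splits each batch back into element-wise examples along axis 0:

import tensorflow as tf
from sleap.nn.data.dataset_ops import Batcher, Unbatcher

ds = tf.data.Dataset.from_tensor_slices(
    {"x": tf.constant([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])}
)
ds_batched = Batcher(batch_size=2, unrag=True).transform_dataset(ds)

# Recover the individual examples from the batched dataset.
ds_flat = Unbatcher().transform_dataset(ds_batched)
print([example["x"].numpy().tolist() for example in ds_flat])
# [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]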