sleap.nn.data.dataset_ops#
Transformers for dataset (multi-example) operations, e.g., shuffling and batching.
These are mostly wrappers for standard tf.data.Dataset ops.
- class sleap.nn.data.dataset_ops.Batcher(batch_size: int = 8, drop_remainder: bool = False, unrag: bool = True)[source]#
Batching transformer for use in pipelines.
This class enables variable-length example keys to be batched by converting them to ragged tensors prior to concatenation, then converting them back to dense tensors.
See the notes in the Shuffler and Repeater transformers if training. If used for inference, this transformer will be used on its own without dropping remainders.
- The ideal (training) pipeline follows the order:
shuffle -> batch -> repeat
- batch_size#
Number of elements within a batch. Every key will be stacked along its first axis (with expansion) such that it has length batch_size.
- Type:
int
- drop_remainder#
If True, final elements with fewer than batch_size examples will be dropped once the end of the input dataset iteration is reached. This should be True for training and False for inference.
- Type:
bool
- unrag#
If False, any tensors that were of variable length will be left as `tf.RaggedTensor`s. If True, the tensors will be converted to full tensors with NaN padding. Leaving tensors as ragged is useful when downstream ops will continue to use them in variable length form.
- Type:
bool
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
- transform_dataset(ds_input: DatasetV2) DatasetV2 [source]#
Create a dataset with batched elements.
- Parameters:
ds_input – Any dataset that produces dictionaries keyed by strings, with tensor values of any rank.
- Returns:
A tf.data.Dataset with elements containing the same keys, but with each tensor promoted to 1 rank higher (except for scalars of rank 0, which will be promoted to rank 2). Each key of each element will contain batch_size individual elements stacked along axis 0, such that the length (shape[0]) is equal to batch_size. Any keys that had variable-length elements within the batch will be padded with NaNs to the length of the largest element for that key.
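As a sketch of how this fits together, the example below builds a toy dataset with a variable-length "points" key and batches it with Batcher. The dataset contents and key names are illustrative assumptions, not part of the API; only the Batcher signature and transform_dataset call are taken from the documentation above.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import Batcher

# Toy dataset of dictionary examples; "points" varies in length per example,
# which is the case Batcher handles by going through ragged tensors.
ds = tf.data.Dataset.range(10).map(
    lambda i: {"points": tf.random.uniform([i % 3 + 1, 2])}
)

batcher = Batcher(batch_size=4, drop_remainder=True, unrag=True)
ds_batched = batcher.transform_dataset(ds)

for batch in ds_batched:
    # Each batch stacks 4 examples along axis 0; "points" is NaN-padded
    # up to the longest example in the batch.
    print(batch["points"].shape)
```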
- class sleap.nn.data.dataset_ops.LambdaFilter(filter_fn: Callable[[Dict[str, Tensor]], bool])[source]#
Transformer for filtering examples out of a dataset.
This class is useful for eliminating examples that fail to meet some criteria, e.g., when no peaks are found.
- filter_fn#
Callable that takes an example dictionary as input and returns True if the element should be kept.
- Type:
Callable[[Dict[str, tensorflow.python.framework.ops.Tensor]], bool]
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
- transform_dataset(ds_input: DatasetV2) DatasetV2 [source]#
Create a dataset with filtering applied.
- Parameters:
ds_input – Any dataset that produces dictionaries keyed by strings, with tensor values of any rank.
- Returns:
A tf.data.Dataset with elements containing the same keys, but with potentially fewer elements.
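A minimal sketch of filtering with LambdaFilter follows. The zero-point criterion and the "points" key are illustrative stand-ins for a real predicate such as "no peaks found"; only the filter_fn parameter and transform_dataset call come from the documentation above.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import LambdaFilter

# Toy dataset where every other example has zero points.
ds = tf.data.Dataset.range(6).map(
    lambda i: {"points": tf.random.uniform([i % 2, 2])}
)

# Keep only examples that contain at least one point.
keep_nonempty = LambdaFilter(filter_fn=lambda ex: tf.shape(ex["points"])[0] > 0)
ds_filtered = keep_nonempty.transform_dataset(ds)  # 3 of 6 examples remain
```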
- class sleap.nn.data.dataset_ops.Prefetcher(prefetch: bool = True, buffer_size: int = -1)[source]#
Prefetching transformer for use in pipelines.
Prefetches elements from the input dataset to minimize the bottleneck as elements are requested, since prefetching can occur in parallel with downstream processing.
- prefetch#
If False, returns the input dataset unmodified.
- Type:
bool
- buffer_size#
Keep buffer_size elements loaded in the buffer. If set to -1 (tf.data.experimental.AUTOTUNE), this value will be optimized automatically to decrease latency.
- Type:
int
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
- transform_dataset(ds_input: DatasetV2) DatasetV2 [source]#
Create a dataset with prefetching to maintain a buffer during iteration.
- Parameters:
ds_input – Any dataset.
- Returns:
A tf.data.Dataset with identical elements. Processing of the produced elements (e.g., training on the GPU) can be done in parallel while new elements are generated from the pipeline.
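For reference, a sketch of the documented usage alongside the raw tf.data op it wraps; the toy dataset is illustrative.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import Prefetcher

ds = tf.data.Dataset.range(100).map(lambda i: {"example_id": i})

# buffer_size=-1 maps to tf.data.experimental.AUTOTUNE per the docs above.
ds_prefetched = Prefetcher(prefetch=True, buffer_size=-1).transform_dataset(ds)

# Equivalent raw tf.data op:
# ds_prefetched = ds.prefetch(tf.data.experimental.AUTOTUNE)
```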
- class sleap.nn.data.dataset_ops.Preloader[source]#
Preload elements of the underlying dataset to generate in-memory examples.
This transformer can lead to considerable performance improvements at the cost of memory consumption.
This is functionally equivalent to tf.data.Dataset.cache, except the cached examples are accessible directly via the examples attribute.
- examples#
Stored list of preloaded elements.
- Type:
List[Any]
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
- transform_dataset(ds_input: DatasetV2) DatasetV2 [source]#
Create a dataset that generates preloaded elements.
- Parameters:
ds_input – Any tf.data.Dataset that generates examples as a dictionary of tensors. It should not repeat infinitely.
- Returns:
A dataset that generates the same examples.
This is similar to prefetching, except that examples are yielded through a generator and loaded when this method is called rather than during pipeline iteration.
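A short sketch of the documented interface; the finite toy dataset is an illustrative assumption.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import Preloader

# A finite dataset (Preloader must not receive an infinitely repeating one).
ds = tf.data.Dataset.range(100).map(lambda i: {"example_id": i})

preloader = Preloader()
ds_preloaded = preloader.transform_dataset(ds)  # loads everything into memory

# The cached elements are also accessible directly:
first_example = preloader.examples[0]
```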
- class sleap.nn.data.dataset_ops.Repeater(repeat: bool = True, epochs: int = -1)[source]#
Repeating transformer for use in pipelines.
Repeats the underlying elements indefinitely or for a number of “iterations” or “epochs”.
If placed before batching, this can create mini-batches with examples from across epoch boundaries.
If placed after batching and the data are not shuffled, examples dropped as batch remainders may never be reached.
- The ideal pipeline follows the order:
shuffle -> batch -> repeat
- repeat#
If False, returns the input dataset unmodified.
- Type:
bool
- epochs#
If -1, repeats the input dataset elements infinitely. Otherwise, loops through the elements of the input dataset this number of times.
- Type:
int
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
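To illustrate the recommended ordering, here is a sketch of a training pipeline built from the transformers in this module. It assumes Repeater exposes the same transform_dataset interface as the other transformers documented here, and the toy dataset is illustrative.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import Shuffler, Batcher, Repeater

ds = tf.data.Dataset.range(1000).map(lambda i: {"example_id": i})

# Recommended ordering from the notes above: shuffle -> batch -> repeat.
ds_train = Shuffler(shuffle=True, buffer_size=64).transform_dataset(ds)
ds_train = Batcher(batch_size=8, drop_remainder=True).transform_dataset(ds_train)
ds_train = Repeater(repeat=True, epochs=-1).transform_dataset(ds_train)
```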
- class sleap.nn.data.dataset_ops.Shuffler(shuffle: bool = True, buffer_size: int = 64, reshuffle_each_iteration: bool = True)[source]#
Shuffling transformer for use in pipelines.
The input to this transformer should not be repeated or batched (though the latter would technically work). Repeating prevents the shuffling from going through “epoch” or “iteration” loops in the underlying dataset.
Though batching before shuffling works and respects epoch boundaries, it is not recommended, as it implies that the same examples will always be optimized together within a mini-batch. This is not as effective for promoting generalization as element-wise shuffling, which produces new combinations of elements within mini-batches.
- The ideal pipeline follows the order:
shuffle -> batch -> repeat
- shuffle#
If False, returns the input dataset unmodified.
- Type:
bool
- buffer_size#
Number of examples to keep in a buffer to sample uniformly from. If set too high, it may take a long time to fill the initial buffer, especially if it resets every epoch.
- Type:
int
- reshuffle_each_iteration#
If True, resets the sampling buffer every iteration through the underlying dataset.
- Type:
bool
- property input_keys: List[str]#
Return the keys that incoming elements are expected to have.
- property output_keys: List[str]#
Return the keys that outgoing elements will have.
- transform_dataset(ds_input: DatasetV2) DatasetV2 [source]#
Create a dataset with shuffled element order.
- Parameters:
ds_input – Any dataset.
- Returns:
A tf.data.Dataset with elements containing the same keys, but in a shuffled order, if enabled. If the input dataset is repeated, this doesn't really respect epoch boundaries since it never reaches the end of the iterator.
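A minimal sketch of shuffling an unbatched, unrepeated dataset, per the notes above; the toy dataset is an illustrative assumption.

```python
import tensorflow as tf
from sleap.nn.data.dataset_ops import Shuffler

# Unbatched, unrepeated input, as recommended in the notes above.
ds = tf.data.Dataset.range(100).map(lambda i: {"example_id": i})

shuffler = Shuffler(shuffle=True, buffer_size=64, reshuffle_each_iteration=True)
ds_shuffled = shuffler.transform_dataset(ds)

# Equivalent raw tf.data op:
# ds_shuffled = ds.shuffle(64, reshuffle_each_iteration=True)
```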