Streaming


NDArray

streaming.base.partition.NDArray

alias of numpy.ndarray[Any, numpy.dtype[numpy._typing._array_like._ScalarType_co]]
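
As a rough illustration of how an alias like this is typically used, the sketch below annotates a small helper with `NDArray`. It imports the alias from `numpy.typing` so that it runs without Streaming installed; that this is the same object re-exported as `streaming.base.partition.NDArray` is an assumption based on the alias target shown above. The function `count_per_bucket` and its padding convention are hypothetical and not part of the Streaming API.

```python
import numpy as np
from numpy.typing import NDArray  # assumed to be the alias documented above


def count_per_bucket(sample_ids: NDArray[np.int64], num_buckets: int) -> NDArray[np.int64]:
    """Count how many sample IDs land in each bucket (ID modulo num_buckets).

    Hypothetical helper for illustration only. Negative IDs are treated as
    padding and skipped, which is an assumption of this sketch.
    """
    valid = sample_ids[sample_ids >= 0]
    return np.bincount(valid % num_buckets, minlength=num_buckets).astype(np.int64)


ids = np.array([0, 1, 2, 3, 4, -1], dtype=np.int64)
counts = count_per_bucket(ids, 2)
print(counts)
```

The subscript in `NDArray[np.int64]` only informs static type checkers; NumPy does not validate the dtype at runtime, so the values passed in are still ordinary `numpy.ndarray` instances.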

Copyright © 2022, MosaicML, Inc.