- streaming.base.util.merge_index(*args, **kwargs)
Merge index.json from partitions to form a global index.json.
This can be called as
merge_index(index_file_urls, out, keep_local, download_timeout)
merge_index(out, keep_local, download_timeout)
The first signature takes a list of index file URLs of MDS partitions. The second takes the root of an MDS dataset and parses the partition folders from there.
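As a rough illustration, the two call forms can be told apart by the type of the first positional argument: a list means index file URLs were passed, while a string or tuple means only `out` was passed. The helper `pick_call_form` below is hypothetical and not part of the streaming library, whose own dispatch may differ.

```python
from typing import Any


def pick_call_form(*args: Any) -> str:
    """Return the name of the leading parameter that the arguments match.

    Hypothetical helper for illustration only.
    """
    if not args:
        raise ValueError("expected at least one positional argument")
    first = args[0]
    if isinstance(first, list):
        # merge_index(index_file_urls, out, keep_local, download_timeout)
        return "index_file_urls"
    if isinstance(first, (str, tuple)):
        # merge_index(out, keep_local, download_timeout)
        return "out"
    raise TypeError(f"unsupported first argument type: {type(first).__name__}")
```

A list as the first argument selects the first form; a path string or a (local_dir, remote_dir) tuple selects the second.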
index_file_urls – The index.json files from all the partitions. Each element is either a single path string or a (local, remote) tuple of strings.
If index_file_urls is a list of local URLs, merge locally without downloading.
If index_file_urls is a list of (local, remote) URL tuples, check whether any local index.json files are missing and download them before merging.
If index_file_urls is a list of remote URLs, download all of them and merge.
out – The folder that contains the MDS partitions and where the merged index file is written. It can be one of:
A local directory: the merge happens locally.
A remote directory: download the index.json of every sub-directory, merge locally, and upload the result.
A tuple (local_dir, remote_dir): check whether each local index.json exists and download it if not.
keep_local (bool) – Keep local copy of the merged index file. Defaults to
download_timeout (int) – The allowed time for downloading each json file. Defaults to 60.
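The purely local case can be sketched with the standard library alone. This is a simplified illustration and not the library's implementation. It assumes each partition's index.json has a top-level `shards` list whose entries carry a `basename`; the real MDS index format has more fields. `merge_local_indexes` and `build_demo_dataset` are hypothetical names.

```python
import json
import os
import tempfile


def merge_local_indexes(index_paths, out_dir):
    """Concatenate the `shards` lists of local index.json files into out_dir/index.json."""
    merged_shards = []
    for path in index_paths:
        with open(path, encoding="utf-8") as f:
            partition_index = json.load(f)
        # Partition folder relative to the dataset root, e.g. "0" or "1".
        partition_dir = os.path.relpath(os.path.dirname(path), out_dir)
        for shard in partition_index["shards"]:
            entry = dict(shard)
            # Assumption: prefix the shard file name with its partition folder
            # so the merged index can locate it from the dataset root.
            entry["basename"] = os.path.join(partition_dir, shard["basename"])
            merged_shards.append(entry)
    merged_path = os.path.join(out_dir, "index.json")
    with open(merged_path, "w", encoding="utf-8") as f:
        json.dump({"shards": merged_shards}, f)
    return merged_path


def build_demo_dataset(root, partitions=("0", "1")):
    """Write one tiny index.json per partition folder and return their paths."""
    paths = []
    for name in partitions:
        part_dir = os.path.join(root, name)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "index.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"shards": [{"basename": "shard.00000.mds"}]}, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(merge_local_indexes(build_demo_dataset(root), root))
```

The remote and (local, remote) cases would add a download step before the merge and, for a remote `out`, an upload step afterwards.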