A very BIG ML dataset un-TAR GZIP command

I have learned that none of my GUI Mac programs could expand the 13 GB dataset; the command line, however, had no problem with it.


$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz

It would be great if it were this simple!

The command failed because I ran out of the 41 GB of free disk space before the archive was fully expanded.
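A failure like this can be anticipated by comparing the uncompressed size of the archive with the free space on the target disk before extracting. The following is a minimal sketch, not a definitive tool: the function names `uncompressed_bytes`, `free_bytes`, and `fits_on_disk` are hypothetical, and the result is only a rough estimate, since thousands of small files can occupy more disk blocks than the raw byte count suggests.

```shell
#!/bin/sh
# Sketch: check whether a .tar.gz is likely to fit before extracting it.
# All function names here are hypothetical, chosen for this example.

# Print the uncompressed size of gzip file $1 in bytes.
# The stream is decompressed into a pipe, so nothing is written to disk.
# `gzip -l` is avoided because its size field wraps around at 4 GiB.
uncompressed_bytes() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Print the free space, in bytes, on the file system holding directory $1.
free_bytes() {
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 is "Available" (assumes no spaces in the file system name).
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Succeed only if archive $1 looks like it would fit in directory $2.
fits_on_disk() {
    needed=$(uncompressed_bytes "$1")
    available=$(free_bytes "$2")
    echo "needed: $needed bytes, available: $available bytes"
    [ "$needed" -lt "$available" ]
}
```

After sourcing these functions, a guarded extraction could look like this (decompressing 13 GB just to count bytes takes a while):

$ fits_on_disk BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz . && tar xzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz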

Alternatively, I considered going one directory at a time,

# directory_path must match the name stored in the archive (normally without a leading slash)
$ tar xvzf BIG_DATASET_MANY_THOUSANDS_FOLDERS.tar.gz directory_path


with a script that traverses the directories. This way I can keep track of which directories were correctly expanded.
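One way such a script could look is sketched below. It is an illustration under assumptions, not the script I actually used: the function names `list_top_level` and `extract_by_directory` and the log file are hypothetical. It lists the top-level entries of the archive, extracts them one by one, and appends each finished entry to a log so that a rerun skips what is already done.

```shell
#!/bin/sh
# Sketch: extract a .tar.gz one top-level directory at a time and record
# which ones finished, so an interrupted run can be resumed.
# Function and file names are hypothetical.

# List the distinct top-level entries stored in archive $1,
# keeping a leading "./" if the archive stores names that way.
list_top_level() {
    tar tzf "$1" |
        awk -F/ '{ if ($1 == ".") { if ($2 != "") print "./" $2 } else print $1 }' |
        sort -u
}

# Extract archive $1 into directory $2, one top-level entry at a time.
# Entries already listed in the log file $3 are skipped.
extract_by_directory() {
    archive=$1
    dest=$2
    done_log=$3
    mkdir -p "$dest"
    touch "$done_log"
    list=$(mktemp)
    list_top_level "$archive" > "$list"
    status=0
    while IFS= read -r entry; do
        # Skip entries that a previous run already completed.
        if grep -qxF -- "$entry" "$done_log"; then
            continue
        fi
        if tar xzf "$archive" -C "$dest" "$entry"; then
            printf '%s\n' "$entry" >> "$done_log"
        else
            echo "failed: $entry" >&2
            status=1
            break
        fi
    done < "$list"
    rm -f "$list"
    return "$status"
}
```

Each `tar` call decompresses the archive from the beginning, so with many thousands of folders this trades a lot of time for the ability to resume. Passing several directory names to a single `tar` call would reduce that cost.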

At this point, I ended up with multiple directories on various disks, so a directory merging tool is very useful:

# parameters:
# -a, --archive: copy recursively and preserve permissions, timestamps, and symlinks
# -i, --itemize-changes: print a change summary for each file
# -h, --human-readable: print sizes in human-readable units
# -W, --whole-file: copy whole files, skipping the delta-transfer algorithm
# --progress: show transfer progress in the terminal
# --log-file=XYZ.log: log what was done to a file, which might be useful when resuming
$ rsync -aihW --progress --log-file=XYZ.log source_directory/ destination_directory/
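When the partial extractions sit in several places, the same command can be looped over all of them and followed by a dry run to confirm nothing is left to copy. This is a sketch under assumptions: `merge_into` and `pending_files` are hypothetical helper names, and the `>f` filter relies on the itemized output format of standard rsync, which may differ in other implementations.

```shell
#!/bin/sh
# Sketch: merge several partially extracted directories into one
# and check what is still missing. Function names are hypothetical.

# Merge every source directory given as $2, $3, ... into destination $1.
merge_into() {
    dest=$1
    shift
    mkdir -p "$dest"
    for src in "$@"; do
        # The trailing slash copies the contents of $src, not $src itself.
        rsync -aW "${src%/}/" "${dest%/}/" || return 1
    done
}

# Print the regular files rsync would still copy from $1 to $2.
# --dry-run changes nothing; itemized lines starting with ">f" are files
# that would be transferred. Empty output means nothing is pending.
pending_files() {
    rsync -ai --dry-run "${1%/}/" "${2%/}/" | grep '^>f' || true
}
```

If the same file exists in more than one source, the copy merged last wins whenever size or modification time differ; adding rsync's `--ignore-existing` option would keep the first copy instead.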


References:

  • https://www.thegeekstuff.com/2010/04/unix-tar-command-examples/
  • https://medium.com/@sethgoldin/a-gentle-introduction-to-rsync-a-free-powerful-tool-for-media-ingest-86761ca29c34










