Hugging Face Downloads
OMG what a POS this thing is.
So I got the new workstation, got a working GPU, got GPU passthrough on Proxmox into a Docker VM, and got vLLM running inside that. If you are not building something horizontally scalable, just stick to Ollama or LM Studio.
Once you’ve got vLLM running, you need models. These come from Hugging Face, and their downloader has great nostalgia for the 14.4k/28.8k modems of the mid-1990s. A bare invocation of the download command will run for all of fifteen seconds in my environment before it freaks out.
They should either clean the process up, or rebrand to Grabbing Ass, because this is the very soul of what Gunnery Sergeant Hartman meant when he referred to his recruits as “unorganized grabasstic pieces of amphibian shit”.
The following incantation will actually steadily pull that golf ball of a model through their garden hose of a CDN.
export HF_HUB_DISABLE_PROGRESS_BARS=0 # 0 = keep the progress bars on
export HF_HUB_VERBOSE=1
export TQDM_MININTERVAL=0.1 # seconds between progress bar refreshes
export HF_HUB_DISABLE_XET=1 # skip the Xet backend
export HF_HUB_ENABLE_HF_TRANSFER=0 # no hf_transfer either
export HF_HUB_DOWNLOAD_TIMEOUT=300 # seconds
export HF_HUB_DOWNLOAD_CHUNK_SIZE=1048576 # 1 MiB
export HF_HUB_MAX_WORKERS=4 # fewer threads
time /opt/anaconda3/bin/python -m huggingface_hub.cli.hf download \
  unsloth/Qwen2.5-Coder-7B-bnb-4bit --local-dir ./Qwen2.5-Coder-7B-bnb-4bit
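
If the download still falls over partway through, one option is to wrap it in a retry loop. This is a minimal sketch, not something from the Hugging Face docs: the `retry` helper is a hypothetical name of my own, the attempt count and delay in the example are made-up numbers, and it leans on the assumption that rerunning the same download command resumes from the files already on disk rather than starting from zero.

```shell
# Hypothetical helper: rerun a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local max_attempts
  local delay
  local attempt
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  # Keep running the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example call (assumed values: 20 attempts, 10 seconds apart):
#   retry 20 10 /opt/anaconda3/bin/python -m huggingface_hub.cli.hf download \
#     unsloth/Qwen2.5-Coder-7B-bnb-4bit --local-dir ./Qwen2.5-Coder-7B-bnb-4bit
```

Set the exports above first, then define the function in the same shell so the retried command inherits them. The fixed delay is deliberate: the point is to keep nudging the transfer along, not to be polite about backoff.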


