Class HuggingFaceHub

java.lang.Object
smile.util.HuggingFaceHub

public class HuggingFaceHub extends Object
Utility for downloading files from the Hugging Face Hub with local disk caching.

This class reproduces the on-disk cache layout of the official Python function hf_hub_download, so downloads performed by this class are interoperable with the Python library and vice versa.

Cache layout

$HF_HOME/hub/
  models--{owner}--{repo}/
    blobs/
      {sha256}          ← actual file content
    snapshots/
      {commit_hash}/
        {filename}      ← relative symlink → ../../blobs/{sha256}
    refs/
      {revision}        ← text file containing the resolved commit hash
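
The layout above can be sketched as plain path arithmetic. This is a minimal illustration, not the class's actual implementation: CacheLayoutSketch and its method names are hypothetical, and the "datasets--" and "spaces--" prefixes are an assumption carried over from the Python library, since only "models--" is shown above.

```java
import java.nio.file.Path;

/** Illustrative only: maps repository coordinates onto the cache layout. */
final class CacheLayoutSketch {

    /** Builds the repo folder name, e.g. "models--owner--repo". */
    static String repoFolderName(String repoType, String repoId) {
        // Assumption: the prefix is the plural of the repo type, as in the Python library.
        return repoType + "s--" + repoId.replace("/", "--");
    }

    /** Location of the content-addressed blob. */
    static Path blobPath(Path cacheRoot, String repoType, String repoId, String sha256) {
        return cacheRoot.resolve(repoFolderName(repoType, repoId))
                        .resolve("blobs").resolve(sha256);
    }

    /** Location of the snapshot entry for a given commit and file. */
    static Path snapshotPath(Path cacheRoot, String repoType, String repoId,
                             String commitHash, String filename) {
        return cacheRoot.resolve(repoFolderName(repoType, repoId))
                        .resolve("snapshots").resolve(commitHash).resolve(filename);
    }

    /** Relative symlink target pointing from a snapshot entry to its blob. */
    static Path relativeLinkTarget(Path snapshotFile, Path blob) {
        return snapshotFile.getParent().relativize(blob);
    }

    public static void main(String[] args) {
        Path root = Path.of("hub");
        Path blob = blobPath(root, "model", "owner/repo", "0123abcd");
        Path snap = snapshotPath(root, "model", "owner/repo", "deadbeef", "config.json");
        System.out.println(snap);
        System.out.println(relativeLinkTarget(snap, blob));
    }
}
```

For a file nested in a subdirectory (for example "data/train.csv"), the relative target gains one extra ".." segment per directory level.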

Environment variables

  • HF_HOME – base directory for all Hugging Face data (default: ~/.cache/huggingface).
  • HUGGINGFACE_HUB_CACHE – override cache root (default: $HF_HOME/hub).
  • HF_ENDPOINT – Hugging Face endpoint (default: https://huggingface.co).
  • HF_TOKEN – API token for private repositories (also read from ~/.cache/huggingface/token).

Supported repo types

  • model (default)
  • dataset
  • space

Field Details

Method Details

    • download

      public static Path download(String repoId, String filename) throws IOException
      Downloads a single file from a Hugging Face Hub repository and caches it locally.
      Parameters:
      repoId - the repository identifier in owner/name format (e.g. "google/bert-base-uncased").
      filename - the path of the file inside the repository (e.g. "config.json" or "data/train.csv").
      Returns:
      the local Path to the cached file.
      Throws:
      IOException - if a network or filesystem error occurs.
    • download

      public static Path download(String repoId, String filename, HuggingFaceHub.RepoType repoType, String revision, String subfolder, Path cacheDir, boolean forceDownload, boolean localFilesOnly) throws IOException
      Downloads a single file from a Hugging Face Hub repository and caches it locally. The access token is taken from the HF_TOKEN environment variable, then from the legacy HUGGING_FACE_HUB_TOKEN environment variable. If neither is set, the method falls back to the token file written by `huggingface-cli login`.
      Parameters:
      repoId - the repository identifier ("owner/name").
      filename - the file path inside the repository.
      repoType - the repository type (HuggingFaceHub.RepoType.MODEL, HuggingFaceHub.RepoType.DATASET, or HuggingFaceHub.RepoType.SPACE).
      revision - the git revision to download from (branch, tag, or full commit SHA). Defaults to "main".
      subfolder - an optional subdirectory prefix prepended to filename. May be null.
      cacheDir - override for the local cache root directory. When null, the cache root is resolved from the HUGGINGFACE_HUB_CACHE and HF_HOME environment variables, in that order.
      forceDownload - when true, bypasses the local cache and always re-downloads the file.
      localFilesOnly - when true, throws an IOException instead of making any network request if the file is not already cached.
      Returns:
      the local Path to the cached file.
      Throws:
      IOException - if a network or filesystem error occurs, or if localFilesOnly is true and the file is not cached.
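
The documentation does not spell out which URL is requested. The following is a minimal sketch, assuming the class follows the Hub's public "resolve" URL scheme used by the Python library. ResolveUrlSketch and fileUrl are hypothetical names, treating a null revision as "main" is an assumption, and percent-encoding of the revision and path is omitted for brevity.

```java
/** Illustrative only: composes a Hub file URL from the download parameters. */
final class ResolveUrlSketch {

    static String fileUrl(String endpoint, String repoType, String repoId,
                          String revision, String subfolder, String filename) {
        // The subfolder, when given, is prepended to the file name.
        String path = (subfolder == null || subfolder.isEmpty())
                ? filename
                : subfolder + "/" + filename;
        // Assumption: a null revision means the "main" branch.
        String rev = (revision == null) ? "main" : revision;
        // Assumption: models live at the root, other repo types under a plural prefix.
        String prefix = "";
        if ("dataset".equals(repoType)) {
            prefix = "datasets/";
        } else if ("space".equals(repoType)) {
            prefix = "spaces/";
        }
        return endpoint + "/" + prefix + repoId + "/resolve/" + rev + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(fileUrl("https://huggingface.co", "model",
                "owner/name", null, null, "config.json"));
        System.out.println(fileUrl("https://huggingface.co", "dataset",
                "owner/name", "main", "data", "train.csv"));
    }
}
```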
    • resolveCacheDir

      public static Path resolveCacheDir(Path cacheDir)
      Resolves the local cache root directory, using the first of the following that is set:
      1. cacheDir argument (if non-null)
      2. HUGGINGFACE_HUB_CACHE environment variable
      3. HF_HOME environment variable + "/hub"
      4. ~/.cache/huggingface/hub
      Parameters:
      cacheDir - explicit override; may be null.
      Returns:
      the resolved cache root path.
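
The four-step priority order can be sketched as follows. This is an illustration rather than the class's source: CacheDirSketch is a hypothetical name, the environment is passed in as a map so the logic can be exercised without touching the real process environment, and treating an empty variable as unset is an assumption.

```java
import java.nio.file.Path;
import java.util.Map;

/** Illustrative only: resolves the cache root in the documented priority order. */
final class CacheDirSketch {

    static Path resolve(Path cacheDir, Map<String, String> env, String userHome) {
        // 1. An explicit argument always wins.
        if (cacheDir != null) {
            return cacheDir;
        }
        // 2. HUGGINGFACE_HUB_CACHE names the cache root directly.
        String hubCache = env.get("HUGGINGFACE_HUB_CACHE");
        if (hubCache != null && !hubCache.isEmpty()) {
            return Path.of(hubCache);
        }
        // 3. HF_HOME names the base directory; the cache is its "hub" child.
        String hfHome = env.get("HF_HOME");
        if (hfHome != null && !hfHome.isEmpty()) {
            return Path.of(hfHome, "hub");
        }
        // 4. Fall back to the default location under the user's home directory.
        return Path.of(userHome, ".cache", "huggingface", "hub");
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, System.getenv(), System.getProperty("user.home")));
    }
}
```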
    • resolveEndpoint

      public static String resolveEndpoint()
      Returns the Hugging Face Hub API endpoint, read from the HF_ENDPOINT environment variable and falling back to DEFAULT_ENDPOINT when the variable is not set.
      Returns:
      the endpoint URL (no trailing slash).
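
A minimal sketch of the fallback and the trailing-slash normalization, assuming DEFAULT_ENDPOINT equals the documented default https://huggingface.co. EndpointSketch is a hypothetical name, and the environment is passed in as a map to keep the example testable.

```java
import java.util.Map;

/** Illustrative only: picks the endpoint and strips trailing slashes. */
final class EndpointSketch {

    // Assumption: mirrors the default listed under "Environment variables".
    static final String DEFAULT_ENDPOINT = "https://huggingface.co";

    static String resolve(Map<String, String> env) {
        String value = env.get("HF_ENDPOINT");
        String endpoint = (value == null || value.isBlank())
                ? DEFAULT_ENDPOINT
                : value.strip();
        // Remove trailing slashes so that callers can append "/path" safely.
        while (endpoint.endsWith("/")) {
            endpoint = endpoint.substring(0, endpoint.length() - 1);
        }
        return endpoint;
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv()));
    }
}
```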
    • resolveToken

      public static String resolveToken()
      Returns the API token to use for authenticated requests, checking in priority order:
      1. HF_TOKEN environment variable
      2. HUGGING_FACE_HUB_TOKEN environment variable (legacy)
      3. ~/.cache/huggingface/token file (written by huggingface-cli login)
      Returns:
      the token string, or null if none is configured.
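
The lookup order can be sketched as below. TokenSketch is a hypothetical name. The environment map and token file path are parameters so the example runs without real credentials. Trimming whitespace and skipping blank values are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Illustrative only: resolves a token from the environment, then from a file. */
final class TokenSketch {

    static String resolve(Map<String, String> env, Path tokenFile) throws IOException {
        // 1 and 2: environment variables, current name first, legacy name second.
        for (String name : new String[] {"HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"}) {
            String value = env.get(name);
            if (value != null && !value.isBlank()) {
                return value.strip();
            }
        }
        // 3: the token file, whose content is the token itself.
        if (tokenFile != null && Files.isRegularFile(tokenFile)) {
            String value = Files.readString(tokenFile).strip();
            if (!value.isEmpty()) {
                return value;
            }
        }
        // Nothing configured: requests are sent anonymously.
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(System.getProperty("user.home"), ".cache", "huggingface", "token");
        String token = resolve(System.getenv(), file);
        System.out.println(token != null ? "token found" : "no token configured");
    }
}
```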
    • download

      public static Path download(String repoId, String filename, String token) throws IOException
      Convenience overload that uses HuggingFaceHub.RepoType.MODEL and the "main" revision, and accepts an explicit token argument instead of reading environment variables.
      Parameters:
      repoId - the repository identifier ("owner/name").
      filename - the file path inside the repository.
      token - the Bearer token for private repositories, or null.
      Returns:
      the local Path to the cached file.
      Throws:
      IOException - if a network or filesystem error occurs.
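
How a Bearer token is typically attached to a request can be sketched with the JDK's java.net.http API. This is an assumption about the mechanism, not a statement of which HTTP client the class uses. AuthRequestSketch is a hypothetical name, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative only: builds a GET request with an optional Bearer token. */
final class AuthRequestSketch {

    static HttpRequest build(String url, String token) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        // A null or blank token means an anonymous request with no Authorization header.
        if (token != null && !token.isBlank()) {
            builder.header("Authorization", "Bearer " + token);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = build(
                "https://huggingface.co/owner/name/resolve/main/config.json", "hf_example");
        System.out.println(request.headers().firstValue("Authorization").isPresent());
    }
}
```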
    • tryLoadFromCache

      public static Optional<Path> tryLoadFromCache(String repoId, String filename, HuggingFaceHub.RepoType repoType, String revision, Path cacheDir)
      Returns the local path of a cached file without making any network request, or an empty Optional if the file is not currently cached.
      Parameters:
      repoId - the repository identifier ("owner/name").
      filename - the file path inside the repository.
      repoType - the repository type.
      revision - the git revision (branch, tag, or commit SHA).
      cacheDir - explicit cache root override; null to use the default.
      Returns:
      an Optional containing the cached path, or empty.
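
An offline lookup against the cache layout might work as sketched below. It assumes a branch or tag is first resolved through refs/{revision} and that a commit SHA is used directly, as in the Python library. CacheLookupSketch and find are hypothetical names, and repoDir stands for the already-resolved repository folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Illustrative only: looks up a file in a repo cache folder without network access. */
final class CacheLookupSketch {

    static Optional<Path> find(Path repoDir, String revision, String filename)
            throws IOException {
        // A branch or tag has a refs file holding the commit hash; a SHA has none.
        Path ref = repoDir.resolve("refs").resolve(revision);
        String commit = Files.isRegularFile(ref)
                ? Files.readString(ref).strip()
                : revision;
        Path file = repoDir.resolve("snapshots").resolve(commit).resolve(filename);
        // Files.exists follows symlinks, so a dangling link counts as "not cached".
        return Files.exists(file) ? Optional.of(file) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny throwaway cache to demonstrate a hit and a miss.
        Path repoDir = Files.createTempDirectory("hf-sketch");
        Files.createDirectories(repoDir.resolve("refs"));
        Files.writeString(repoDir.resolve("refs").resolve("main"), "abc123");
        Path snapshot = Files.createDirectories(repoDir.resolve("snapshots").resolve("abc123"));
        Files.writeString(snapshot.resolve("config.json"), "{}");
        System.out.println(find(repoDir, "main", "config.json").isPresent());
        System.out.println(find(repoDir, "main", "missing.bin").isPresent());
    }
}
```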
    • deleteRepoCache

      public static void deleteRepoCache(String repoId, HuggingFaceHub.RepoType repoType, Path cacheDir) throws IOException
      Deletes all cached data for a given repository.

      This removes the entire repo cache directory, including all blobs, snapshots, and refs. Use with care.

      Parameters:
      repoId - the repository identifier ("owner/name").
      repoType - the repository type.
      cacheDir - explicit cache root override; null to use the default.
      Throws:
      IOException - if a filesystem error occurs.
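
Removing a repo cache directory amounts to a recursive delete. The sketch below shows one common way to do that with java.nio.file and is not taken from the class's source. DeleteTreeSketch is a hypothetical name. Because Files.walk does not follow symbolic links by default, snapshot symlinks are removed as links rather than traversed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative only: deletes a directory tree, children before parents. */
final class DeleteTreeSketch {

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root, LinkOption.NOFOLLOW_LINKS)) {
            return; // nothing cached, nothing to do
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order places every entry before its parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        Path repoDir = Files.createTempDirectory("hf-delete-sketch");
        Files.createDirectories(repoDir.resolve("blobs"));
        Files.writeString(repoDir.resolve("blobs").resolve("0123abcd"), "data");
        deleteRecursively(repoDir);
        System.out.println(Files.exists(repoDir));
    }
}
```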