web2vec.utils module
- web2vec.utils.create_directories(*directories: str)
Create directories if they do not exist.
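The behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the web2vec implementation; the name `ensure_directories` is hypothetical.

```python
import os


def ensure_directories(*directories: str) -> None:
    """Create each directory (and any missing parents) if it does not exist.

    Hypothetical stand-in for illustration; not part of web2vec.
    """
    for directory in directories:
        # exist_ok=True turns the call into a no-op for existing directories.
        os.makedirs(directory, exist_ok=True)
```

Calling it twice with the same path is safe, because an existing directory is left untouched.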
- web2vec.utils.fetch_file_from_url(url, directory=None, headers=None, timeout=86400) → str
Check whether the file already exists in the directory and is newer than the timeout. If not, download the file from the URL and save it in the directory. Return the path to the file.
- Parameters:
url – URL of the file to download.
directory – Directory where the file should be saved.
timeout – Maximum age of the saved file in seconds before it is downloaded again (default: 86400, i.e. one day).
- Returns:
File path.
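The "newer than the timeout" check can be sketched as a comparison between the file's modification time and the current time. This is an illustration under that assumption, not the library's code; `is_cache_fresh` and its `now` parameter are hypothetical.

```python
import os
import time
from typing import Optional


def is_cache_fresh(path: str, timeout: float = 86400,
                   now: Optional[float] = None) -> bool:
    """Return True if `path` exists and was modified under `timeout` seconds ago.

    Hypothetical helper; `now` exists only to make the check testable.
    """
    if not os.path.exists(path):
        return False
    if now is None:
        now = time.time()
    # Age of the file in seconds, based on its last modification time.
    age = now - os.path.getmtime(path)
    return age < timeout
```

A caller would download the file again only when this check returns False, which is what keeps repeated calls within the timeout window from hitting the network.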
- web2vec.utils.fetch_file_from_url_and_read(url, directory=None, headers=None, timeout=86400) → str
Return the content of the file for the given URL.
- web2vec.utils.fetch_url(url, headers=None, ssl_verify=False)
Fetch the given URL and return the response.
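Note that the signature defaults to `ssl_verify=False`. Assuming this flag controls TLS certificate verification (the page does not say so explicitly), a standard-library sketch of the same idea could look as follows; `build_ssl_context` and `fetch` are hypothetical names, and web2vec may rely on a different HTTP client.

```python
import ssl
import urllib.request
from typing import Optional


def build_ssl_context(ssl_verify: bool) -> ssl.SSLContext:
    """Return a TLS context that verifies certificates only if requested."""
    context = ssl.create_default_context()
    if not ssl_verify:
        # check_hostname must be disabled before verify_mode can be relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url: str, headers: Optional[dict] = None,
          ssl_verify: bool = False, timeout: float = 10.0) -> bytes:
    """Fetch `url` and return the raw response body (performs a network call)."""
    request = urllib.request.Request(url, headers=headers or {})
    context = build_ssl_context(ssl_verify)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read()
```

Under that assumption, passing `ssl_verify=True` is the safer choice whenever the target host presents a valid certificate.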
- web2vec.utils.get_file_path_for_url(url, directory=None, timeout=86400) → str
Return the path to the file for the given URL.
- web2vec.utils.get_github_repo_release_info(repo: str) → dict
Return the latest release information for the given GitHub repository.
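GitHub exposes the latest release of a repository through its REST API at `/repos/{owner}/{repo}/releases/latest`. The sketch below assumes `repo` is given in `owner/name` form, which this page does not confirm; `latest_release_url` and `fetch_latest_release` are hypothetical names, not web2vec functions.

```python
import json
import urllib.request


def latest_release_url(repo: str) -> str:
    """Build the GitHub REST API URL for the latest release of `owner/name`."""
    return f"https://api.github.com/repos/{repo}/releases/latest"


def fetch_latest_release(repo: str, timeout: float = 10.0) -> dict:
    """Request the latest-release JSON document (performs a network call)."""
    request = urllib.request.Request(
        latest_release_url(repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The response body is a JSON object describing the release.
        return json.load(response)
```

Unauthenticated requests to the GitHub API are rate limited, so callers that poll for releases may want to cache the result.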
- web2vec.utils.get_ip_from_domain(domain: str) → str
Return the IP address for the given domain.
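A domain-to-address lookup of this kind is commonly done with `socket.gethostbyname`. The sketch below assumes an IPv4 result is wanted; `resolve_ipv4` is a hypothetical name, and the library may resolve names differently.

```python
import socket


def resolve_ipv4(domain: str) -> str:
    """Return one IPv4 address for `domain` as a dotted-quad string.

    Raises socket.gaierror if the name cannot be resolved.
    """
    return socket.gethostbyname(domain)
```

`socket.gethostbyname` returns a single address and does not support IPv6; `socket.getaddrinfo` is the usual alternative when all addresses are needed.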
- web2vec.utils.is_numerical_type(obj: object) → bool
Check if the given object is a simple type.
- web2vec.utils.sanitize_filename(filename)
Sanitize the filename by replacing invalid characters.
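Which characters count as invalid, and what replaces them, is not stated here. As an illustration only, the sketch below replaces characters that are not allowed in Windows file names, plus control characters, with an underscore; `sanitize` and its `replacement` parameter are hypothetical.

```python
import re

# Assumed set of invalid characters: Windows-reserved ones and control codes.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize(filename: str, replacement: str = "_") -> str:
    """Replace every character matched by _INVALID_CHARS with `replacement`."""
    return _INVALID_CHARS.sub(replacement, filename)
```

For example, `sanitize("https://example.com/a?b=1")` yields `"https___example.com_a_b=1"`, which is safe to use as a file name on common file systems.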