Downloader

class parfive.Downloader(max_conn=5, progress=True, file_progress=True, loop=None, notebook=None, overwrite=False)[source]

Bases: object

Download files in parallel.

Parameters:
max_conn : int, optional

The number of parallel download slots.

progress : bool, optional

If True, show a main progress bar tracking how many of the total files have been downloaded. If False, no progress bars are shown at all.

file_progress : bool, optional

If True and progress is also True, show max_conn progress bars, each detailing the progress of an individual file being downloaded.

loop : asyncio.AbstractEventLoop, optional

The event loop to use to download the files. If not specified, a new loop will be created and run in a new thread so that it does not interfere with any currently running event loop.

notebook : bool, optional

If True, tqdm will be used in notebook mode. If None, an attempt will be made to detect whether a notebook is in use and to choose the progress bar accordingly.

overwrite : bool or str, optional

Determines how to handle a download when a file with the same name already exists. If False, the download is skipped and the path to the existing file is returned. If True, the file is downloaded and the existing file is overwritten. If 'unique', the filename is modified to be unique.
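
As an illustration of the constructor parameters above, the following is a minimal sketch rather than a canonical recipe. The names DOWNLOADER_OPTIONS and make_downloader are hypothetical; only max_conn, progress and overwrite come from the signature documented here.

```python
# Hypothetical option set; the keys are constructor parameters documented above.
DOWNLOADER_OPTIONS = {
    "max_conn": 2,          # two parallel download slots
    "progress": False,      # no progress bars at all
    "overwrite": "unique",  # rename instead of skipping or overwriting
}


def make_downloader(options=DOWNLOADER_OPTIONS):
    """Create a Downloader from a dict of keyword arguments."""
    # Imported inside the function so that the options can be inspected
    # without parfive being installed.
    from parfive import Downloader

    return Downloader(**options)
```

Keeping the options in a plain dict makes it easy to log or vary them between runs; passing the keyword arguments directly to Downloader is equivalent.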

Attributes Summary

queued_downloads The total number of files already queued for download.

Methods Summary

download(self[, timeouts]) Download all files in the queue.
enqueue_file(self, url[, path, filename, …]) Add a file to the download queue.
retry(self, results) Retry any failed downloads in a results object.

Attributes Documentation

queued_downloads

The total number of files already queued for download.

Methods Documentation

download(self, timeouts=None)[source]

Download all files in the queue.

Parameters:
timeouts : dict, optional

Overrides for the default timeouts for HTTP downloads. Any key accepted by the aiohttp.ClientTimeout class is supported. The defaults are 5 minutes for the total session timeout and 90 seconds for the socket read timeout.

Returns:
filenames : parfive.Results

A list of files downloaded.

Notes

The defaults for the 'total' and 'sock_read' timeouts can be overridden by two environment variables: PARFIVE_TOTAL_TIMEOUT and PARFIVE_SOCK_READ_TIMEOUT.
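
The two ways of changing the timeouts described above might be sketched as follows. This assumes the values are given in seconds, as they are for aiohttp.ClientTimeout; the helper run_queue and the specific numbers are illustrative only.

```python
import os

# Option 1: change the defaults through the environment
# (values assumed to be in seconds).
os.environ["PARFIVE_TOTAL_TIMEOUT"] = "600"
os.environ["PARFIVE_SOCK_READ_TIMEOUT"] = "120"

# Option 2: override per call; the keys are aiohttp.ClientTimeout field names.
timeouts = {"total": 600, "sock_read": 120}


def run_queue(downloader):
    """Download everything queued on `downloader` with the overrides above."""
    # Follows the download(timeouts=None) signature documented in this section.
    return downloader.download(timeouts=timeouts)
```

The environment variables suit deployment-wide settings, while the timeouts argument is more convenient when only one call needs a longer limit.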

enqueue_file(self, url, path=None, filename=None, overwrite=None, **kwargs)[source]

Add a file to the download queue.

Parameters:
url : str

The URL to retrieve.

path : str, optional

The directory to retrieve the file into. If None, defaults to the current directory.

filename : str or callable, optional

The filename to save the file as. Can also be a callable which takes two arguments, the URL and the response object from opening that URL, and returns the filename. (Note that for FTP downloads the response will be None.) If None, the filename is read from the HTTP headers, falling back to the last segment of the URL.

overwrite : bool or str, optional

Determines how to handle a download when a file with the same name already exists. If False, the download is skipped and the path to the existing file is returned. If True, the file is downloaded and the existing file is overwritten. If 'unique', the filename is modified to be unique. If None, the value set when constructing the Downloader object is used.

kwargs : dict

Extra keyword arguments are passed to aiohttp.ClientSession.get or aioftp.ClientSession depending on the protocol.
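
A possible way to queue several URLs with explicit filenames is sketched below. The helpers local_name and enqueue_all, the "raw_" prefix and the "./data" directory are hypothetical; only the enqueue_file parameters and the queued_downloads attribute are taken from this page.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_name(url, prefix="raw_"):
    """Build a filename from the last path segment of `url`."""
    segment = PurePosixPath(urlparse(url).path).name
    # Fall back to a fixed name when the URL ends in a slash.
    return prefix + (segment or "download")


def enqueue_all(downloader, urls, directory="./data"):
    """Queue every URL on `downloader` and return the queue length."""
    for url in urls:
        downloader.enqueue_file(
            url,
            path=directory,
            filename=local_name(url),
            overwrite="unique",  # per-file override of the constructor setting
        )
    return downloader.queued_downloads
```

Computing the name up front and passing it as a string avoids relying on the callable form of filename, which is only needed when the name depends on the server's response.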

retry(self, results)[source]

Retry any failed downloads in a results object.

Note

This will start a new event loop.

Parameters:
results : parfive.Results

A previous results object. Its .errors property will be read and the failed downloads retried.

Returns:
results : parfive.Results

A modified version of the input results, containing the errors from this download attempt, with any newly downloaded files appended to the list of file paths.
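
One way to combine download and retry is a bounded retry loop, sketched here under the assumption that an empty .errors means every file succeeded. The function download_with_retries and the max_retries parameter are hypothetical and not part of parfive.

```python
def download_with_retries(downloader, max_retries=3):
    """Run the queue, then retry failed downloads up to `max_retries` times."""
    results = downloader.download()
    attempts = 0
    # Stop as soon as there are no errors left or the retry budget is spent.
    while results.errors and attempts < max_retries:
        results = downloader.retry(results)
        attempts += 1
    return results
```

Bounding the number of attempts keeps a permanently failing URL from looping forever; any errors that remain are still available on the returned results object.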