
Sometimes you need to download a file in parts. The reasons vary: the file may be too large, the bandwidth may be insufficient, or the server may limit how much data can be downloaded in one request.

In this article I describe how to download a file in small chunks in Java over HTTP.

About HTTP


For this purpose, HTTP provides the Range request header, which specifies the range of bytes to download. The Range header applies only to the body of the response; the response headers are not counted.

The specification defines the following formats for specifying header values:

Range: bytes=first-byte-pos "-" [last-byte-pos] 


first-byte-pos - the byte offset at which to start (or resume) the download; it must be greater than or equal to 0 and less than or equal to last-byte-pos;

last-byte-pos - the byte offset of the last byte to download; it must be greater than or equal to first-byte-pos and is normally at most the file size minus one, because it is an offset, that is, an index into the array of bytes. Both offsets are inclusive, and a last-byte-pos beyond the end of the file is treated by the server as the last byte of the file.

Examples


Download only the specified range

bytes=0-255 

bytes=256-512 

Download from first-byte-pos to the end

Range: bytes=first-byte-pos "-" 

bytes=512- 

Download the last suffix-length bytes of the file (the number here is a length counted from the end, not an offset)

Range: bytes="-" suffix-length 

bytes=-32 
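The three forms above can be produced by a small helper. This is only a sketch: the class name RangeHeaders and its method names are invented for illustration and are not part of the article's WebClient.

```java
/** Builds values for the HTTP Range header (hypothetical helper, not from the article). */
final class RangeHeaders {

    private RangeHeaders() {
    }

    /** bytes=first-last: both offsets are inclusive. */
    static String closed(long firstBytePos, long lastBytePos) {
        if (firstBytePos < 0 || lastBytePos < firstBytePos) {
            throw new IllegalArgumentException("require 0 <= first <= last");
        }
        return "bytes=" + firstBytePos + "-" + lastBytePos;
    }

    /** bytes=first-: everything from the offset to the end of the file. */
    static String from(long firstBytePos) {
        if (firstBytePos < 0) {
            throw new IllegalArgumentException("first must not be negative");
        }
        return "bytes=" + firstBytePos + "-";
    }

    /** bytes=-n: the last n bytes of the file. */
    static String suffix(long suffixLength) {
        if (suffixLength <= 0) {
            throw new IllegalArgumentException("suffix length must be positive");
        }
        return "bytes=-" + suffixLength;
    }

    public static void main(String[] args) {
        System.out.println(closed(0, 255));
        System.out.println(from(512));
        System.out.println(suffix(32));
    }
}
```

Running main prints the same three header values that are listed in the examples above.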

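The server answers such a request with one of two statuses:


  • 206 Partial Content - the requested part of the file is returned;
  • 416 Range Not Satisfiable - the requested range cannot be served.

Other responses are possible, of course (for example, a server that ignores the Range header answers 200 OK with the whole file), but they are outside the scope of this article.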
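On the client side it helps to turn the status code into an explicit decision. The sketch below is an assumption about how one might do that; the RangeStatus class and the Outcome names are hypothetical.

```java
/** Maps the status of a range request to what the client should do next (illustrative only). */
final class RangeStatus {

    enum Outcome {
        PARTIAL,               // 206: the body holds exactly the requested range
        WHOLE_BODY,            // 200: the server ignored Range and sent the whole file
        RANGE_NOT_SATISFIABLE, // 416: the requested range lies outside the file
        OTHER                  // anything else is not handled in this sketch
    }

    private RangeStatus() {
    }

    static Outcome classify(int statusCode) {
        switch (statusCode) {
            case 206:
                return Outcome.PARTIAL;
            case 200:
                return Outcome.WHOLE_BODY;
            case 416:
                return Outcome.RANGE_NOT_SATISFIABLE;
            default:
                return Outcome.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(206));
        System.out.println(classify(416));
    }
}
```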

The response also carries the Content-Range header, which indicates the returned range and the total size.

Content-Range: bytes 256-512/1024 

This header reports that the response contains bytes 256 through 512 of a resource that is 1024 bytes long.
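If the client wants to use these numbers, the header value has to be parsed. A minimal sketch follows, assuming only the "bytes first-last/total" form (with "*" allowed for an unknown total); the ContentRange class is hypothetical, and the "bytes */total" form that accompanies a 416 response is deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed form of a Content-Range value such as "bytes 256-512/1024" (illustrative only). */
final class ContentRange {

    private static final Pattern FORMAT = Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    final long first;
    final long last;
    final long total; // -1 when the server reports "*" (total size unknown)

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    static ContentRange parse(String headerValue) {
        Matcher matcher = FORMAT.matcher(headerValue.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + headerValue);
        }
        long total = matcher.group(3).equals("*") ? -1 : Long.parseLong(matcher.group(3));
        return new ContentRange(Long.parseLong(matcher.group(1)),
                Long.parseLong(matcher.group(2)), total);
    }

    /** Number of bytes in the returned range; both offsets are inclusive. */
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ContentRange range = parse("bytes 256-512/1024");
        System.out.println(range.first + " " + range.last + " " + range.total + " " + range.length());
    }
}
```

Note that the range 256-512 contains 257 bytes, because both ends are included.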

Implementation in Java 14


As the HTTP client, we will use the standard one from the JDK, available since Java 11: java.net.http.HttpClient.

To implement the logic of executing a request in chunks, we will write a wrapper class, WebClient.

Let's describe the interface of this class:

  • byte[] download(final String uri, int chunkSize) - downloads a file in chunks of the specified size;
  • Response download(final String uri, int firstBytePos, int lastBytePos) - downloads the specified byte range of a file.

If the passed URI is not valid, the method throws a URISyntaxException. An IOException is thrown on any unexpected I/O error.

WebClient and Response Classes


    package art.aukhatov.http;

    import java.io.BufferedInputStream;
    import java.net.http.HttpClient;
    import java.net.http.HttpHeaders;
    import java.time.Duration;

    public class WebClient {

        private final HttpClient httpClient;

        public WebClient() {
            this.httpClient = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
        }

        public static class Response {
            final BufferedInputStream inputStream;
            final int status;
            final HttpHeaders headers;

            public Response(BufferedInputStream inputStream, int status, HttpHeaders headers) {
                this.inputStream = inputStream;
                this.status = status;
                this.headers = headers;
            }
        }
    }

To represent the response, we define the Response class with three fields: a BufferedInputStream, the HTTP status, and the HTTP headers. This data is needed to assemble the resulting byte array and to decide whether to continue downloading.

Method Response download(final String uri, int firstBytePos, int lastBytePos)


    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.net.http.HttpClient;
    import java.net.http.HttpHeaders;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import static java.lang.String.format; // needed for the format(...) call below

    private static final String HEADER_RANGE = "Range";
    private static final String RANGE_FORMAT = "bytes=%d-%d";

    // Requests bytes firstBytePos..lastBytePos (both inclusive) of the resource.
    public Response download(final String uri, int firstBytePos, int lastBytePos)
            throws URISyntaxException, IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(new URI(uri))
                .header(HEADER_RANGE, format(RANGE_FORMAT, firstBytePos, lastBytePos))
                .GET()
                .version(HttpClient.Version.HTTP_2)
                .build();

        HttpResponse<InputStream> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());

        return new Response(new BufferedInputStream(response.body()),
                response.statusCode(), response.headers());
    }

This method downloads the specified range of data. Before we start, though, we need to know how much data to expect. To find out, we make a request that returns no content, using the HEAD method.

Method long contentLength(final String uri)


    import java.util.OptionalLong;

    private static final String HTTP_HEAD = "HEAD";
    private static final String HEADER_CONTENT_LENGTH = "content-length";

    // Sends a HEAD request and returns the Content-Length value, or 0 if the header is absent.
    private long contentLength(final String uri)
            throws URISyntaxException, IOException, InterruptedException {
        HttpRequest headRequest = HttpRequest
                .newBuilder(new URI(uri))
                .method(HTTP_HEAD, HttpRequest.BodyPublishers.noBody())
                .version(HttpClient.Version.HTTP_2)
                .build();

        HttpResponse<String> httpResponse =
                httpClient.send(headRequest, HttpResponse.BodyHandlers.ofString());

        OptionalLong contentLength = httpResponse
                .headers().firstValueAsLong(HEADER_CONTENT_LENGTH);

        return contentLength.orElse(0L);
    }

Now we have the expected file length in bytes.
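The headers of the HEAD response can be inspected without a network by building an HttpHeaders instance by hand. The sketch below makes two assumptions that go beyond the article: it also looks at the Accept-Ranges header, which servers commonly use to advertise byte-range support, and it returns -1 rather than 0 for a missing length so that "unknown" can be told apart from "empty". The HeadInspection class is hypothetical.

```java
import java.net.http.HttpHeaders;
import java.util.List;
import java.util.Map;

/** Reads range-related facts from response headers (illustrative only). */
final class HeadInspection {

    private HeadInspection() {
    }

    /** True if the server advertises "Accept-Ranges: bytes". */
    static boolean acceptsByteRanges(HttpHeaders headers) {
        return headers.allValues("Accept-Ranges").stream()
                .anyMatch(value -> value.trim().equalsIgnoreCase("bytes"));
    }

    /** Content-Length in bytes, or -1 if the header is absent. */
    static long contentLengthOrUnknown(HttpHeaders headers) {
        return headers.firstValueAsLong("Content-Length").orElse(-1L);
    }

    public static void main(String[] args) {
        // Hand-made headers standing in for a real HEAD response.
        HttpHeaders headers = HttpHeaders.of(
                Map.of("Content-Length", List.of("1042157"),
                        "Accept-Ranges", List.of("bytes")),
                (name, value) -> true);

        System.out.println(acceptsByteRanges(headers));
        System.out.println(contentLengthOrUnknown(headers));
    }
}
```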

Method byte[] download(final String uri, int chunkSize)


Now we can write the method that controls downloading a file in chunks. For convenience, the chunk size is passed as the second argument, although one could come up with a smarter way to choose it.

Determine File Size

final int expectedLength=(int) contentLength(uri); 

Initial Offset

int firstBytePos=0; 

Final Offset

int lastBytePos=chunkSize - 1; 
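Starting from these two offsets, each subsequent request moves the window forward by one chunk, and the last window is cut off at the end of the file. The arithmetic can be checked in isolation with the sketch below; the ChunkRanges class is hypothetical and uses long offsets, whereas the article works with int.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a file length into inclusive byte ranges of at most chunkSize bytes (illustrative only). */
final class ChunkRanges {

    private ChunkRanges() {
    }

    /** Returns {first, last} pairs that together cover offsets 0 .. totalLength - 1. */
    static List<long[]> split(long totalLength, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long first = 0;
        while (first < totalLength) {
            // Never point past the last byte of the file.
            long last = Math.min(first + chunkSize - 1, totalLength - 1);
            ranges.add(new long[] {first, last});
            first = last + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : split(1_042_157, 262_144)) {
            System.out.println("bytes=" + range[0] + "-" + range[1]);
        }
    }
}
```

For a file of 1,042,157 bytes and a chunk size of 262,144 this yields four ranges, the last of which ends at offset 1,042,156.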

Data

The data downloaded in each iteration must be accumulated. We create an array for this; its size is already known.

byte[] downloadedBytes=new byte[expectedLength]; 

Size of downloaded data

In addition to the array itself, we need to track the total amount of data downloaded so far, so we keep this length in a separate variable.

int downloadedLength=0; 
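How the result array and the counter work together can be tried without any network. In the sketch below an in-memory byte array plays the role of the server; the InMemoryAssembly class and the serveRange method are hypothetical stand-ins, and the counter doubles as the next offset, which is a simplification compared with the article's separate firstBytePos variable.

```java
import java.util.Arrays;

/** Assembles a byte array chunk by chunk from an in-memory "server" (illustrative only). */
final class InMemoryAssembly {

    private InMemoryAssembly() {
    }

    /** Stand-in for a range request: returns bytes first..last (inclusive) of source. */
    static byte[] serveRange(byte[] source, int first, int last) {
        int end = Math.min(last, source.length - 1) + 1;
        return Arrays.copyOfRange(source, first, end);
    }

    static byte[] assemble(byte[] source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int expectedLength = source.length;
        byte[] downloadedBytes = new byte[expectedLength];
        int downloadedLength = 0;

        while (downloadedLength < expectedLength) {
            int first = downloadedLength;
            int last = Math.min(first + chunkSize - 1, expectedLength - 1);
            byte[] chunk = serveRange(source, first, last);
            // Copy the chunk to the position it occupies in the whole file.
            System.arraycopy(chunk, 0, downloadedBytes, first, chunk.length);
            downloadedLength += chunk.length;
        }
        return downloadedBytes;
    }

    public static void main(String[] args) {
        byte[] source = new byte[1000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) (i % 251);
        }
        System.out.println(Arrays.equals(source, assemble(source, 256)));
    }
}
```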

Download cycle


The loop condition is simple: we keep downloading until we reach the expected size. After each chunk is downloaded successfully, we read it and copy it into the resulting array with System.arraycopy. Then we increase the count of downloaded bytes and move on to the next range. When advancing the range, take care not to go past the end of the file; that is why the new last offset is Math.min(lastBytePos + chunkSize, expectedLength - 1).

    private static final int HTTP_PARTIAL_CONTENT = 206;

        while (downloadedLength < expectedLength) {
            // No retries yet: any I/O error propagates to the caller.
            Response response = download(uri, firstBytePos, lastBytePos);

            try (response.inputStream) {
                byte[] chunkedBytes = response.inputStream.readAllBytes();
                if (!isPartial(response)) {
                    // Anything other than 206 means the range was not served;
                    // fail instead of requesting the same range forever.
                    throw new IOException("Unexpected HTTP status: " + response.status);
                }
                System.arraycopy(chunkedBytes, 0, downloadedBytes, firstBytePos, chunkedBytes.length);
                downloadedLength += chunkedBytes.length;
                firstBytePos = lastBytePos + 1;
                lastBytePos = Math.min(lastBytePos + chunkSize, expectedLength - 1);
            }
        }

        return downloadedBytes;
    }

    private boolean isPartial(Response response) {
        return response.status == HTTP_PARTIAL_CONTENT;
    }

Everything looks good. What is wrong?

If something goes wrong while downloading or reading, an I/O exception is thrown and the download stops. There is no fallback. Let's add a simple one: a limited number of retry attempts.

Define a field in the web client that holds the maximum number of attempts allowed for downloading a file.

    private int maxAttempts;

    public int maxAttempts() {
        return maxAttempts;
    }

    public void setMaxAttempts(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

We catch each exception separately and increment a local attempt counter. The download loop must stop once the number of attempts exceeds the allowed maximum, so we extend the loop condition.

    private static final int DEFAULT_MAX_ATTEMPTS = 3;

        int attempts = 1;
        // "<=" so that maxAttempts attempts are really made for each chunk.
        while (downloadedLength < expectedLength && attempts <= maxAttempts) {
            Response response;
            try {
                response = download(uri, firstBytePos, lastBytePos);
            } catch (IOException e) {
                attempts++;
                continue;
            }

            try (response.inputStream) {
                byte[] chunkedBytes = response.inputStream.readAllBytes();
                if (!isPartial(response)) {
                    // A non-206 answer counts as a failed attempt.
                    throw new IOException("Unexpected HTTP status: " + response.status);
                }
                System.arraycopy(chunkedBytes, 0, downloadedBytes, firstBytePos, chunkedBytes.length);
                downloadedLength += chunkedBytes.length;
                firstBytePos = lastBytePos + 1;
                lastBytePos = Math.min(lastBytePos + chunkSize, expectedLength - 1);
            } catch (IOException e) {
                attempts++;
                continue;
            }

            attempts = 1; // reset the counter after a successful chunk
        }
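The same retry idea can be separated from the download loop. The sketch below is one possible way to do it and is not the article's implementation: the Retry class and the IoAction interface are hypothetical, and only IOException is treated as retryable.

```java
import java.io.IOException;

/** Runs an action up to maxAttempts times, retrying on IOException (illustrative only). */
final class Retry {

    /** An action that may fail with an I/O error. */
    interface IoAction<T> {
        T run() throws IOException;
    }

    private Retry() {
    }

    static <T> T withAttempts(int maxAttempts, IoAction<T> action) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                lastFailure = e; // remember the error and try again
            }
        }
        // All attempts failed: report the most recent error.
        throw lastFailure;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        String result = withAttempts(3, () -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated failure " + calls[0]);
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Each chunk request could then be wrapped in one call to withAttempts, which keeps the attempt counting out of the loop body.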

Finally, we add logging to the method. The final version looks like this:

    package art.aukhatov.http;

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.net.http.HttpClient;
    import java.net.http.HttpHeaders;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.OptionalLong;

    import static java.lang.String.format;
    import static java.lang.System.err;
    import static java.lang.System.out;

    public class WebClient {

        private static final String HEADER_RANGE = "Range";
        private static final String RANGE_FORMAT = "bytes=%d-%d";
        private static final String HEADER_CONTENT_LENGTH = "content-length";
        private static final String HTTP_HEAD = "HEAD";
        private static final int DEFAULT_MAX_ATTEMPTS = 3;
        private static final int HTTP_PARTIAL_CONTENT = 206;

        private final HttpClient httpClient;
        private int maxAttempts;

        public WebClient() {
            this.httpClient = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
            this.maxAttempts = DEFAULT_MAX_ATTEMPTS;
        }

        public WebClient(HttpClient httpClient) {
            this.httpClient = httpClient;
            // Without this the field stays 0 and the download loop never runs.
            this.maxAttempts = DEFAULT_MAX_ATTEMPTS;
        }

        // Sends a HEAD request and returns the Content-Length value, or 0 if the header is absent.
        private long contentLength(final String uri)
                throws URISyntaxException, IOException, InterruptedException {
            HttpRequest headRequest = HttpRequest
                    .newBuilder(new URI(uri))
                    .method(HTTP_HEAD, HttpRequest.BodyPublishers.noBody())
                    .version(HttpClient.Version.HTTP_2)
                    .build();

            HttpResponse<String> httpResponse =
                    httpClient.send(headRequest, HttpResponse.BodyHandlers.ofString());

            OptionalLong contentLength = httpResponse
                    .headers().firstValueAsLong(HEADER_CONTENT_LENGTH);

            return contentLength.orElse(0L);
        }

        // Requests bytes firstBytePos..lastBytePos (both inclusive) of the resource.
        public Response download(final String uri, int firstBytePos, int lastBytePos)
                throws URISyntaxException, IOException, InterruptedException {
            HttpRequest request = HttpRequest
                    .newBuilder(new URI(uri))
                    .header(HEADER_RANGE, format(RANGE_FORMAT, firstBytePos, lastBytePos))
                    .GET()
                    .version(HttpClient.Version.HTTP_2)
                    .build();

            HttpResponse<InputStream> response =
                    httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());

            return new Response(new BufferedInputStream(response.body()),
                    response.statusCode(), response.headers());
        }

        // Downloads the whole file in chunks of chunkSize bytes.
        public byte[] download(final String uri, int chunkSize)
                throws URISyntaxException, IOException, InterruptedException {
            final int expectedLength = (int) contentLength(uri);
            int firstBytePos = 0;
            int lastBytePos = chunkSize - 1;

            byte[] downloadedBytes = new byte[expectedLength];
            int downloadedLength = 0;

            int attempts = 1;
            // "<=" so that maxAttempts attempts are really made for each chunk.
            while (downloadedLength < expectedLength && attempts <= maxAttempts) {
                Response response;
                try {
                    response = download(uri, firstBytePos, lastBytePos);
                } catch (IOException e) {
                    attempts++;
                    err.println(format("I/O error has occurred. %s", e));
                    out.println(format("Going to do %d attempt", attempts));
                    continue;
                }

                try (response.inputStream) {
                    byte[] chunkedBytes = response.inputStream.readAllBytes();
                    if (!isPartial(response)) {
                        // A non-206 answer counts as a failed attempt.
                        throw new IOException("Unexpected HTTP status: " + response.status);
                    }
                    System.arraycopy(chunkedBytes, 0, downloadedBytes, firstBytePos, chunkedBytes.length);
                    downloadedLength += chunkedBytes.length;
                    firstBytePos = lastBytePos + 1;
                    lastBytePos = Math.min(lastBytePos + chunkSize, expectedLength - 1);
                } catch (IOException e) {
                    attempts++;
                    err.println(format("I/O error has occurred. %s", e));
                    out.println(format("Going to do %d attempt", attempts));
                    continue;
                }

                attempts = 1; // reset the counter after a successful chunk
            }

            if (attempts > maxAttempts) {
                err.println("The file could not be downloaded. The number of attempts was exceeded.");
            }

            return downloadedBytes;
        }

        private boolean isPartial(Response response) {
            return response.status == HTTP_PARTIAL_CONTENT;
        }

        public int maxAttempts() {
            return maxAttempts;
        }

        public void setMaxAttempts(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        public static class Response {
            final BufferedInputStream inputStream;
            final int status;
            final HttpHeaders headers;

            public Response(BufferedInputStream inputStream, int status, HttpHeaders headers) {
                this.inputStream = inputStream;
                this.status = status;
                this.headers = headers;
            }
        }
    }

Testing


Now we can write a JUnit 5 test to check the file download. As an example, take a publicly available file that requires no authentication: file-examples.com/wp-content/uploads/2017/10/file-example_PDF_1MB.pdf

We save the file to a temporary directory and check its size.

    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.junit.jupiter.api.Test;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    class WebClientTest {

        @Test
        void downloadByChunk() throws IOException, URISyntaxException, InterruptedException {
            WebClient fd = new WebClient();
            byte[] data = fd.download(
                    "https://file-examples.com/wp-content/uploads/2017/10/file-example_PDF_1MB.pdf",
                    262_144);

            // Paths.get adds the separator that java.io.tmpdir may lack at its end.
            Path path = Paths.get(System.getProperty("java.io.tmpdir"), "sample.pdf");
            System.out.println("File has been downloaded to " + path);

            try {
                Files.write(path, data);
                assertEquals(1_042_157, Files.size(path));
            } finally {
                // Clean up even if the assertion fails.
                Files.deleteIfExists(path);
            }
        }
    }
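A test that depends on an external site can break when that site changes. As an alternative, the same check can run against a throwaway local server. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer and talks to it with HttpClient directly rather than through the article's WebClient; the LocalRangeServer class is hypothetical, and its handler assumes a well-formed "bytes=first-last" header that lies within the content.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

/** A tiny local server that serves byte ranges, plus a chunked client (illustrative only). */
final class LocalRangeServer {

    private LocalRangeServer() {
    }

    /** Starts a server on a free local port; the caller must stop it. */
    static HttpServer start(byte[] content) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/file", exchange -> {
            String range = exchange.getRequestHeaders().getFirst("Range");
            if (range == null || !range.startsWith("bytes=")) {
                exchange.sendResponseHeaders(400, -1); // this sketch serves range requests only
                exchange.close();
                return;
            }
            String[] bounds = range.substring("bytes=".length()).split("-");
            int first = Integer.parseInt(bounds[0]);
            int last = Math.min(Integer.parseInt(bounds[1]), content.length - 1);
            byte[] body = Arrays.copyOfRange(content, first, last + 1);
            exchange.getResponseHeaders().add("Content-Range",
                    "bytes " + first + "-" + last + "/" + content.length);
            exchange.sendResponseHeaders(206, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Downloads totalLength bytes in pieces of at most chunkSize bytes. */
    static byte[] download(HttpClient client, URI uri, int totalLength, int chunkSize)
            throws IOException, InterruptedException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] result = new byte[totalLength];
        for (int first = 0; first < totalLength; first += chunkSize) {
            int last = Math.min(first + chunkSize, totalLength) - 1;
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Range", "bytes=" + first + "-" + last)
                    .GET()
                    .build();
            HttpResponse<byte[]> response =
                    client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            if (response.statusCode() != 206) {
                throw new IOException("Expected 206, got " + response.statusCode());
            }
            System.arraycopy(response.body(), 0, result, first, response.body().length);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] content = new byte[1000];
        for (int i = 0; i < content.length; i++) {
            content[i] = (byte) i;
        }
        HttpServer server = start(content);
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/file");
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            byte[] downloaded = download(client, uri, content.length, 256);
            System.out.println("identical: " + Arrays.equals(content, downloaded));
        } finally {
            server.stop(0);
        }
    }
}
```

The server binds to port 0, so the operating system picks a free port and the test does not collide with other services on the machine.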

Conclusion


This article showed how to download a file in chunks of a predetermined size. For greater flexibility, you could use a dynamic chunk size that grows and shrinks depending on the server's behavior. Also, the exceptions that could be handled differently are not fully covered, for example the error CDMY15CDMY or CDMY16CDMY.
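One very simple policy for a dynamic chunk size is to double it after every successful chunk and halve it after every failure, within fixed bounds. This is only an illustration of the idea, not something the article implements; the AdaptiveChunkSize class and its bounds are hypothetical.

```java
/** Grows the chunk size on success and shrinks it on failure (illustrative policy only). */
final class AdaptiveChunkSize {

    private final int min;
    private final int max;
    private int current;

    AdaptiveChunkSize(int min, int max) {
        if (min <= 0 || max < min) {
            throw new IllegalArgumentException("require 0 < min <= max");
        }
        this.min = min;
        this.max = max;
        this.current = min; // start cautiously with the smallest chunk
    }

    int current() {
        return current;
    }

    /** Call after a chunk was downloaded successfully. */
    void onSuccess() {
        // The long arithmetic avoids int overflow before clamping to max.
        current = (int) Math.min((long) current * 2, max);
    }

    /** Call after a chunk failed to download. */
    void onFailure() {
        current = Math.max(current / 2, min);
    }

    public static void main(String[] args) {
        AdaptiveChunkSize size = new AdaptiveChunkSize(64 * 1024, 1024 * 1024);
        size.onSuccess();
        size.onSuccess();
        size.onFailure();
        System.out.println(size.current());
    }
}
```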

As homework, the code can be modified so that it does not make the first request to get the file size. You can also implement pausing and resuming a download.
