#development #java #kotlin

Recently, one of my friends asked me to download some pictures from a website. Instead of doing it manually (there were 90 images to download), I used the opportunity to automate it with Kotlin.

First, let's create an empty project:

$ mkdir test-jsoup
$ cd test-jsoup
$ gradle init --dsl kotlin \
              --project-name test-jsoup \
              --type kotlin-application \
              --package be.yellowduck.testjsoup

> Task :init
Get more help with your project: https://docs.gradle.org/7.0.2/samples/sample_building_kotlin_applications.html

BUILD SUCCESSFUL in 716ms
2 actionable tasks: 2 executed

We now have an empty project which we can build and run.

$ ./gradlew run

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 5s
2 actionable tasks: 2 executed

Next, let's add the dependencies we need. In the app/build.gradle.kts file, update the dependencies block to:

dependencies {
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))
    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    implementation("org.jsoup:jsoup:1.13.1")
    implementation("com.squareup.okhttp3:okhttp:4.9.1")
    implementation("org.slf4j:slf4j-api:1.7.30")
    implementation("ch.qos.logback:logback-classic:1.2.3")
    implementation("ch.qos.logback:logback-core:1.2.3")
    testImplementation("org.jetbrains.kotlin:kotlin-test")
    testImplementation("org.jetbrains.kotlin:kotlin-test-junit")
}

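We'll be using the following libraries:

  • Jsoup to download and parse the HTML
  • OkHttp to download the images
  • SLF4J with Logback for logging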

After adding the dependencies, the first thing I do is configure logging. For that, I change the app/src/main/kotlin/be/yellowduck/testjsoup/App.kt file to:

package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    @JvmStatic
    fun main(args: Array<String>) {
        log.info("Hello world")
    }

}

This does a couple of things:

  • It creates a singleton App containing a main function which will be the entry point of our app.
  • It configures the root logger so that info, warning and error messages are shown.
  • It configures a logger for the App class.
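
To see what the root level does in practice, here is a minimal sketch, assuming the same SLF4J and Logback dependencies as above. The helper `setRootLevel` and the logger name `demo` are made up for this example. With the root level at INFO, debug messages are dropped while info and warning messages pass through.

```kotlin
import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

// Sets the level of the Logback root logger. Loggers without a level of
// their own inherit it. (Hypothetical helper, not part of the app.)
fun setRootLevel(level: Level) {
    val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
    rootLogger.level = level
}

fun main() {
    setRootLevel(Level.INFO)

    // "demo" is a placeholder logger name used only for this example.
    val log = LoggerFactory.getLogger("demo")

    log.debug("not shown: DEBUG is below the root level")
    log.info("shown: INFO matches the root level")
    log.warn("shown: WARN is above the root level")
}
```

Changing the argument to `Level.DEBUG` would make the first message appear as well.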

Don't forget to update the main class name in app/build.gradle.kts before you run it:

application {
    mainClass.set("be.yellowduck.testjsoup.App")
}

When you now run the app, you'll get:

$ ./gradlew run

> Task :app:run
14:46:21.612 [main] INFO be.yellowduck.testjsoup.App - Hello world

BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date

Next, we'll use Jsoup to download the HTML, parse it, and collect all images that have the CSS class image. Let's change the main function to:

@JvmStatic
fun main(args: Array<String>) {

    val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

    log.info("Parsing: ${sourceUrl}")
    val doc = Jsoup.connect(sourceUrl).get()

    val urls = mutableSetOf<String>()
    doc.select("img.image").forEach {
        val url = it.attr("src").replace("thumbnail", "preview")
        urls.add(url)
    }

    if (urls.size == 0) {
        return
    }

    log.info("Downloading ${urls.size} image(s)")

}

The select function on the Jsoup document lets you use CSS queries to find elements. In our case, we take the src attribute of each matching image, replace thumbnail with preview in the URL, and collect the results in a set, so duplicate URLs are stored only once.
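
To try the selector without hitting a real website, here is a small sketch that parses an inline HTML fragment instead. The function name `extractPreviewUrls`, the example.com URLs and the file names are made up for illustration. It assumes the Jsoup dependency added earlier.

```kotlin
import org.jsoup.Jsoup

// Collects the preview URLs of all <img> elements with the class "image".
// Returning a set means duplicate URLs are kept only once.
fun extractPreviewUrls(html: String): Set<String> {
    val doc = Jsoup.parse(html)
    return doc.select("img.image")
        .map { it.attr("src").replace("thumbnail", "preview") }
        .toSet()
}

fun main() {
    // Made-up HTML fragment: one duplicate image and one image with another class.
    val html = """
        <img class="image" src="https://example.com/thumbnail/a.jpg">
        <img class="image" src="https://example.com/thumbnail/a.jpg">
        <img class="logo" src="https://example.com/thumbnail/logo.png">
        <img class="image" src="https://example.com/thumbnail/b.jpg">
    """.trimIndent()

    extractPreviewUrls(html).forEach { println(it) }
}
```

The logo is skipped because it doesn't match img.image, and the duplicate a.jpg only shows up once.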

The next step is to create a function which downloads a URL to a file. For that, I'll add the downloadFile function to the App object:

val client = OkHttpClient.Builder().build()

fun downloadFile(url: String, toDir: String) {

    val request = Request.Builder().url(URL(url)).get().build()

    // use {} closes the response, also when the status is not 200
    client.newCall(request).execute().use { response ->
        if (response.code == HttpURLConnection.HTTP_OK) {

            val body = response.body?.bytes()

            val outDir = File(toDir)
            outDir.mkdirs()

            val outPath = File(outDir, File(URL(url).path).name)

            if (body != null) {
                log.info("Saving: ${outPath}")
                outPath.writeBytes(body)
            }

        }
    }

}

Note that I'm adding a property containing the HTTP client to the App object, as well as a new function. This function uses OkHttp to download the file and saves it only when the server responds with HTTP 200. It takes the URL and the directory where the image should be saved as arguments. The file name is taken from the last part of the URL path. If the directory doesn't exist, it will be created automatically.
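
The file name handling is easy to check in isolation. Here is a minimal sketch using only the standard library. Note that it uses java.net.URI rather than the java.net.URL class from the function above, and that `targetFile`, the URL and the directory name are placeholders invented for this example.

```kotlin
import java.io.File
import java.net.URI

// Builds the target file for a download: the last segment of the URL path,
// placed inside toDir. The query string is not part of the URL path, so it
// does not end up in the file name.
fun targetFile(url: String, toDir: String): File {
    val fileName = File(URI(url).path).name
    return File(toDir, fileName)
}

fun main() {
    // Placeholder URL and directory, for illustration only.
    val target = targetFile("https://example.com/preview/photo_001.JPG?size=large", "out")
    println(target.name) // photo_001.JPG
}
```

One consequence of this approach: two URLs that end in the same file name will overwrite each other in the output directory.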

The last step is to download the images and save them:

val outPath = "/Users/me/Desktop/out"

urls.forEach {
    downloadFile(it, outPath)
}
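
As written, an exception in one download (a network error, for example) stops the whole loop. If you'd rather continue with the remaining images, one option is sketched below. The `downloadAll` helper is hypothetical and not part of the app. The lambda stands in for downloadFile so the example runs without network access.

```kotlin
import java.io.IOException

// Calls download for every URL and returns the URLs that failed, so a single
// failing download does not stop the remaining ones.
fun downloadAll(urls: Collection<String>, download: (String) -> Unit): List<String> {
    val failed = mutableListOf<String>()
    for (url in urls) {
        try {
            download(url)
        } catch (e: Exception) {
            failed.add(url)
        }
    }
    return failed
}

fun main() {
    // Made-up URLs; the lambda simulates a download that fails for one of them.
    val urls = listOf("https://example.com/a.jpg", "https://example.com/broken.jpg")
    val failed = downloadAll(urls) { url ->
        if (url.endsWith("broken.jpg")) throw IOException("simulated failure")
    }
    println("Failed: $failed")
}
```

In the app itself, the call would look like downloadAll(urls) { downloadFile(it, outPath) }.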

That's it. When you run the app, it saves all the images:

$ ./gradlew run

> Task :app:run
14:59:17.413 [main] INFO be.yellowduck.testjsoup.App - Parsing: https://www.yellowduck.be/documents/2/001.html
14:59:17.799 [main] INFO be.yellowduck.testjsoup.App - Downloading 30 image(s)
14:59:18.039 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01218.JPG
14:59:18.093 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01146.JPG
14:59:18.145 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01149.JPG
14:59:18.205 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01144.JPG
14:59:18.253 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01151.JPG
14:59:18.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01147.JPG
14:59:18.376 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01145.JPG
14:59:18.432 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01148.JPG
14:59:18.488 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01150.JPG
14:59:18.542 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01161.JPG
14:59:18.600 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01220.JPG
14:59:18.657 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00437.JPG
14:59:18.719 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00441.JPG
14:59:18.778 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00440.JPG
14:59:18.832 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00469.JPG
14:59:18.892 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00468.JPG
14:59:18.952 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00472.JPG
14:59:19.020 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00473.JPG
14:59:19.076 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00422.JPG
14:59:19.129 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00425.JPG
14:59:19.175 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00424.JPG
14:59:19.223 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00426.JPG
14:59:19.272 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00446.JPG
14:59:19.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00445.JPG
14:59:19.373 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00449.JPG
14:59:19.424 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00450.JPG
14:59:19.476 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01334.JPG
14:59:19.535 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01340.JPG
14:59:19.583 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01339.JPG
14:59:19.633 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01343.JPG

BUILD SUCCESSFUL in 3s
2 actionable tasks: 1 executed, 1 up-to-date

If you followed along, your app/src/main/kotlin/be/yellowduck/testjsoup/App.kt should now look like this:

package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.slf4j.LoggerFactory
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    val client = OkHttpClient.Builder().build()

    fun downloadFile(url: String, toDir: String) {

        val request = Request.Builder().url(URL(url)).get().build()

        // use {} closes the response, also when the status is not 200
        client.newCall(request).execute().use { response ->
            if (response.code == HttpURLConnection.HTTP_OK) {

                val body = response.body?.bytes()

                val outDir = File(toDir)
                outDir.mkdirs()

                val outPath = File(outDir, File(URL(url).path).name)

                if (body != null) {
                    log.info("Saving: ${outPath}")
                    outPath.writeBytes(body)
                }

            }
        }

    }

    @JvmStatic
    fun main(args: Array<String>) {

        val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

        log.info("Parsing: ${sourceUrl}")
        val doc = Jsoup.connect(sourceUrl).get()

        val urls = mutableSetOf<String>()
        doc.select("img.image").forEach {
            val url = it.attr("src").replace("thumbnail", "preview")
            urls.add(url)
        }

        if (urls.size == 0) {
            return
        }

        log.info("Downloading ${urls.size} image(s)")

        val outPath = "/Users/me/Desktop/out"

        urls.forEach {
            downloadFile(it, outPath)
        }

    }

}

In a future blog post, we'll add coroutines to speed things up.