I Got Code #5: Downloading linked images

A question got me thinking: how would I implement a program that downloads the pictures a web page links to?

Here is a sample console application I came up with:


using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.IO;
using System.Text.RegularExpressions;
using System.Drawing;

namespace ConsoleApplication
{
    class Program
    {
        static int totalFiles = 0;
        static int currentFiles = 0;

        static void Main(string[] args)
        {
            GetImages("http://www.textureking.com/index.php/category/all-textures");
        }

        static void GetImages(string url)
        {
            string responseString;
            HttpWebRequest initialRequest = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse initialResponse = (HttpWebResponse)initialRequest.GetResponse())
            {
                using (StreamReader reader = new StreamReader(initialResponse.GetResponseStream()))
                {
                    responseString = reader.ReadToEnd();
                }
            }

            List<string> imageset = new List<string>();
            // Group the extensions so the alternation applies only to the file extension.
            Regex regex = new Regex(@"f=""[^""]*\.(?:jpg|bmp|tif|gif|png)", RegexOptions.IgnoreCase);
            foreach (Match m in regex.Matches(responseString))
            {
                if (!imageset.Contains(m.Value))
                    imageset.Add(m.Value);
            }

            for (int i = 0; i < imageset.Count; i++)
                imageset[i] = imageset[i].Remove(0, 3);

            totalFiles = imageset.Count;
            currentFiles = totalFiles;

            Console.WriteLine(totalFiles.ToString() + " images will be downloaded.");

            foreach (string f in imageset)
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadImage), f);
            }

            Console.Read();
        }

        static void DownloadImage(object path)
        {
            // Interlocked keeps the shared counter consistent across pool threads.
            int remaining = Interlocked.Decrement(ref currentFiles);
            Console.WriteLine("Downloading " + Path.GetFileName(path.ToString()) + "... (" + (totalFiles - remaining) + "/" + totalFiles + ")");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(path.ToString());
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // Dispose the image once it has been written to disk.
                using (Image image = Image.FromStream(response.GetResponseStream()))
                {
                    image.Save(@"D:\Temporary\" + Path.GetFileName(path.ToString()));
                }
            }
            Console.WriteLine(Path.GetFileName(path.ToString()) + " downloaded.");
        }
    }

}

The sample URL passed to GetImages points to a page that links to several textures, and the program downloads each of them.

I am using a regex to find the URLs. The match ignores case because I am not sure whether the file extensions are written in caps. Since the same URL can appear more than once on a page, I check whether the List already contains the match before adding it, so there are no duplicates.
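To look at the matching and de-duplication step in isolation, here is a minimal sketch. The ImageLinkExtractor class and its Extract method are names made up for this illustration, and the pattern is a stricter variant of the one in the program (it assumes double-quoted href attributes and captures the URL in a group instead of trimming the prefix afterwards).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
static class ImageLinkExtractor
{
    // The extensions are grouped so the alternation covers only the extension,
    // and the closing quote must follow it directly.
    static readonly Regex ImageLink = new Regex(
        @"href=""([^""]*\.(?:jpg|bmp|tif|gif|png))""",
        RegexOptions.IgnoreCase);

    // Returns each linked image URL once, in the order it first appears.
    public static List<string> Extract(string html)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match m in ImageLink.Matches(html))
        {
            string url = m.Groups[1].Value;
            if (seen.Add(url))   // Add returns false when the URL was already seen
                result.Add(url);
        }
        return result;
    }
}
```

A HashSet does the same job as the Contains check on the List, but it avoids rescanning the whole list for every match, which may matter on pages with many links.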

The save path can also be changed; I left it hardcoded for testing purposes. To make the path dynamic, you can pass a generic collection or an array as the parameter for the DownloadImage method, cast it back to its original type, and read the needed values (identified by index, for example).
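One way the array idea could look is sketched below. DownloadJob, Pack and ResolveTargetPath are hypothetical names introduced only for this example; the point is that WaitCallback accepts a single object, so the URL and the target folder travel together in an object[] and are read back by index.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
static class DownloadJob
{
    // Packs both values into the single state object that WaitCallback accepts.
    public static object[] Pack(string url, string targetFolder)
    {
        return new object[] { url, targetFolder };
    }

    // Casts the state back to an array, reads the values by index,
    // and builds the local path the image would be saved to.
    public static string ResolveTargetPath(object state)
    {
        object[] parts = (object[])state;
        string url = (string)parts[0];
        string targetFolder = (string)parts[1];
        string fileName = Path.GetFileName(new Uri(url).LocalPath);
        return Path.Combine(targetFolder, fileName);
    }
}
```

With this in place, the queueing call would become something like ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadImage), DownloadJob.Pack(f, folder)), and DownloadImage would pass the result of ResolveTargetPath to image.Save instead of the hardcoded folder.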

NOTE: I am using ThreadPool here, so all threads are automatically background threads. If the application is closed, the download process is canceled. To avoid this and wait for all downloads to complete (which is probably not a good idea, but still a possibility), use the Thread class with IsBackground set to false.
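For completeness, this is roughly what the Thread-based alternative could look like. ForegroundWorkers and RunAll are hypothetical names, and the sketch starts one thread per item, which is only reasonable for a small number of images.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of the original program.
static class ForegroundWorkers
{
    // Runs the action once per item on its own foreground thread
    // and blocks until every thread has finished.
    public static void RunAll(IEnumerable<string> items, Action<string> work)
    {
        var threads = new List<Thread>();
        foreach (string item in items)
        {
            string current = item;                      // copy for the closure
            var worker = new Thread(() => work(current));
            worker.IsBackground = false;                // foreground: keeps the process alive
            threads.Add(worker);
            worker.Start();
        }

        foreach (Thread started in threads)
            started.Join();                             // wait for this worker to finish
    }
}
```

Calling ForegroundWorkers.RunAll(imageset, f => DownloadImage(f)) would replace the QueueUserWorkItem loop, and because Join blocks until the work is done, the Console.Read() call would no longer be needed to keep the program running.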
