Thursday, 9 January 2014

Download and Decompress a GZIP file with C Sharp

Using C# to download a remote GZIP file and then decompress it

By Strictly-Software

Recently I had the task of writing a program in C# that would obtain a list of proxies from various sources and then run code to ensure that they were working so I could then use the useful ones.

In this project I had a window that showed the proxy IP address, the country it came from and whether it was Anonymous, High Anonymous, Transparent, etc.

I also had a check button which, once the proxy list had loaded, would run a test against each IP address and port number to see how long it took to do the following:
  • Ping the proxy if possible, e.g. 408 ms
  • HTTP ping the proxy by using its details to request one of a randomly selected set of fast-loading pages, e.g. www.google.com, www.bing.com etc.
Personally I think the HTTP ping is more important when dealing with proxies than a normal PING.

A simple ping to an IP address could respond very quickly or not at all, but when you are using proxies to request HTML pages you want to know how long it takes to return such a page.
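To make the idea of an HTTP ping concrete, here is a minimal sketch of how one could be timed. It is not the code from my program: the class name ProxyTimer, the method HttpPing and its parameters are made up for illustration, and it assumes the proxy needs no authentication and that the URL is a plain http or https address.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;

public static class ProxyTimer
{
    // Hypothetical helper: times one GET request routed through the given proxy.
    // Returns the elapsed milliseconds, or -1 if the request failed or timed out.
    public static long HttpPing(string proxyHost, int proxyPort, string url, int timeoutMs)
    {
        Stopwatch timer = Stopwatch.StartNew();

        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Proxy = new WebProxy(proxyHost, proxyPort);
            request.Timeout = timeoutMs;          // limit for connecting and getting the response headers
            request.ReadWriteTimeout = timeoutMs; // limit for each read of the response body

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream body = response.GetResponseStream())
            {
                // read the whole page so the timing covers the full download
                body.CopyTo(Stream.Null);
            }

            return timer.ElapsedMilliseconds;
        }
        catch (WebException)
        {
            // connection refused, timeout, or an HTTP error status from the proxy or site
            return -1;
        }
    }
}
```

A call such as ProxyTimer.HttpPing("203.0.113.10", 8080, "http://www.bing.com/", 5000) would then give a figure you can sort proxies by (the IP address and port here are placeholders, not a real proxy).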

Anyway the whole point of the exercise was that I needed to have a list of countries that I could check the IP addresses against.

Luckily the great site http://geolite.maxmind.com has a free GeoIP.dat.gz file that you can download and use. It is pretty accurate (though not as accurate as the paid-for version), and the free version was good enough for what I needed.

The issue was that the .dat file came as a gzipped file, and once I had downloaded it I needed to decompress it. This isn't the same as a normal .zip decompress, but in .NET 4.5 it is pretty easy to accomplish.

I have shown a basic example of the class at the bottom of the page, but the most important part is the method that does the gzip decompression.


/// <summary>
/// Decompress a gzipped file, writing the result next to it without the .gz extension.
/// To compress instead, wrap the output stream in a GZipStream using CompressionMode.Compress.
/// </summary>
/// <param name="fileToDecompress">The .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        // strip the final extension, e.g. GeoIP.dat.gz becomes GeoIP.dat
        string currentFileName = fileToDecompress.FullName;
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}

The three namespaces you will need to accomplish all this, apart from anything else you intend to do, are:

using System.Net;
using System.IO;
using System.IO.Compression;

System.Net is required for the WebClient class to do its work downloading the remote file to our computer, and System.IO is required for checking that files and folders exist.

The last one, System.IO.Compression, is the most important as it's the one that lets us decompress the file.

You might have to add this in as a reference in Visual Studio. Just go to: Project > Add Reference > Framework > and tick the box next to System.IO.Compression.
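If you want to check that System.IO.Compression is wired up properly before touching any files, a small in-memory round trip is a handy test. This is only a sketch: the class name GzipRoundTrip and its two methods are names I have made up for the example, and it works on byte arrays rather than the files used in the rest of this article.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipRoundTrip
{
    // Compress a byte array into gzip format, entirely in memory.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // the GZipStream must be closed before we read the output,
            // otherwise the gzip footer has not been written yet
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }

            // ToArray still works after the MemoryStream has been closed
            return output.ToArray();
        }
    }

    // Decompress gzip data back into the original bytes.
    public static byte[] Decompress(byte[] gzipped)
    {
        using (MemoryStream input = new MemoryStream(gzipped))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Passing some bytes through Compress and then Decompress should hand back exactly what you started with. If that works, the namespace and reference are set up correctly.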

Also note that I am using .NET 4.5 on a Windows 7 64-bit machine. In Windows 7, for security's sake (I presume), most applications that need to read and write files, or download and hold data of some sort, do so in the C:\ProgramData folder.

You will notice that this directory is full of well-known names like Microsoft, Skype, Sun, Apple and many other software producers that need somewhere safe to log data.

In the old days people could just write programs that saved files all over the place which obviously wasn't safe. Especially if you were the admin of the computer and hit a button on a program that you thought was going to do one thing but was actually adding or deleting files all over your computer's hard drive.
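Rather than hard-coding C:\ProgramData, you can ask .NET where the folder is. The sketch below shows one way to do that. The class DataFolder and its methods are hypothetical names for this example, and "MyProgramName" is just a placeholder for your own application's name.

```csharp
using System;
using System.IO;

public static class DataFolder
{
    // Make sure baseFolder\programName exists and return its full path.
    public static string Ensure(string baseFolder, string programName)
    {
        string folder = Path.Combine(baseFolder, programName);

        // CreateDirectory does nothing if the folder is already there
        Directory.CreateDirectory(folder);

        return folder;
    }

    // Same as above, using the machine-wide application data folder
    // (C:\ProgramData on Windows 7) as the base.
    public static string EnsureCommon(string programName)
    {
        string programData = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);

        return Ensure(programData, programName);
    }
}
```

Calling DataFolder.EnsureCommon("MyProgramName") should give you a folder your program can write its downloads to. I have used Path.Combine here so there is no need to worry about where the backslashes go.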

Anyway the whole code is below. Make of it what you will but it's pretty simple and I found it very useful.


using System;
using System.Linq;
using System.Text;
// we need this to download the file from the web
using System.Net;
// these are the two we need to do our decompression job
using System.IO;
using System.IO.Compression;

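// this will hold any error message in case we get one and need to return it to the calling program
private string ErrorMessage = "";

// members used by SetUp() below; the file names are assumed from the MaxMind download
private string dataFolder;
private string GeoLiteCityDataFile = "GeoIP.dat";
private string ZippedGeoLiteCityDataFile = "GeoIP.dat.gz";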

/// <summary>
/// Ensure our special folder in C:\ProgramData\ exists, e.g. C:\ProgramData\MyProgram.
/// Then check that the file we need to map IPs to countries (from http://geolite.maxmind.com) exists and, if it doesn't, download it
/// to this folder. It is a gzip file (GeoIP.dat.gz), so we then decompress it to get the GeoIP.dat file we work with.
/// </summary>
public void SetUp()
{
    // ensure a folder we can write to exists - named after my program
    dataFolder = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData) + @"\MyProgramName";

    if (!Directory.Exists(dataFolder))
    {
        try
        {
            // The folder doesn't exist so try and create it now
            Directory.CreateDirectory(dataFolder);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The data folder could not be created: " + ex.Message;

            return;
        }
    }

    // we have a folder but do we have an uncompressed .dat file?
   
    // set up the paths
    // first the path of the uncompressed .dat file in case we already have it
    string geoCityDataPath = dataFolder + @"\" + this.GeoLiteCityDataFile;

    // then the path of the .dat.gz compressed file in case we need to uncompress
    string zipFilePath = dataFolder + @"\" + this.ZippedGeoLiteCityDataFile;

    // check for an uncompressed data file
    if (!File.Exists(geoCityDataPath))
    {
        try
        {
            // we don't have a file so download it from the website and copy it to our folder
            // we could schedule this behaviour to get the latest file by checking the dates or just doing a download once a week/month
            using (WebClient webClient = new WebClient())
            {
                webClient.DownloadFile("http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz", zipFilePath);
            }

            // now we have our file create a FileInfo object from it to pass to our gzip decompress method
            FileInfo gzFileInfo = new FileInfo(zipFilePath);

            // Decompress is static, so call it directly rather than through "this"
            Decompress(gzFileInfo);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The GeoIP.dat.gz file could not be downloaded or decompressed: " + ex.Message;

            return;
        }
    }
}


/// <summary>
/// Decompress a gzipped file, writing the result next to it without the .gz extension.
/// To compress instead, wrap the output stream in a GZipStream using CompressionMode.Compress.
/// </summary>
/// <param name="fileToDecompress">The .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        // strip the final extension, e.g. GeoIP.dat.gz becomes GeoIP.dat
        string currentFileName = fileToDecompress.FullName;
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}
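
One of the comments in the code above mentions refreshing the file by checking dates. I didn't build that into my program, but a rough sketch of how it could work is below. The class GeoIpRefresh, the method NeedsDownload and the idea of passing in a maximum age are all assumptions of mine for the example.

```csharp
using System;
using System.IO;

public static class GeoIpRefresh
{
    // Returns true if the data file is missing or was last written
    // longer ago than maxAge, meaning a fresh download is due.
    public static bool NeedsDownload(string dataFilePath, TimeSpan maxAge)
    {
        if (!File.Exists(dataFilePath))
        {
            return true;
        }

        // compare in UTC so daylight saving changes don't skew the age
        DateTime lastWrite = File.GetLastWriteTimeUtc(dataFilePath);

        return DateTime.UtcNow - lastWrite > maxAge;
    }
}
```

A check like GeoIpRefresh.NeedsDownload(geoCityDataPath, TimeSpan.FromDays(30)) could stand in for a plain File.Exists test, so the file gets fetched again roughly once a month.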
