
Thursday, 9 January 2014

Download and Decompress a GZIP file with C Sharp

Using C# to download a remote GZIP file and then decompress it

By Strictly-Software

Recently I had the task of writing a program in C# that would obtain a list of proxies from various sources and then test them so I could keep the ones that worked.

In this project I had a window that showed the proxy IP address, the country it came from and whether it was Anonymous, High Anonymous, Transparent etc.

I also had a check button which, once the proxy list had loaded, would run a test against each IP address and port number to see how long it took to do the following:
  • Ping the proxy if possible e.g 408ms
  • HTTP ping the proxy by using its details to request one of a randomly selected number of fast-loading pages e.g www.google.com, www.bing.com etc.
Personally I think the HTTP ping is more important when dealing with proxies than a normal PING.

A simple ping to an IP address could respond very quickly or not at all, but when you are using proxies to request HTML pages you want to know how long it takes to return such a page.
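To give an idea of what I mean by an HTTP ping, here is a rough sketch of timing a page request made through a proxy. The method name, the -1 return value for a dead proxy and the parameters are all my own choices for illustration, not part of any standard API:

```csharp
using System;
using System.Diagnostics;
using System.Net;

public class ProxyChecker
{
    // Time how long it takes to fetch a small page through the given proxy.
    // Returns the elapsed milliseconds, or -1 if the proxy failed or timed out.
    public static long HttpPing(string proxyHost, int proxyPort, string testUrl, int timeoutMs)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(testUrl);
        request.Proxy = new WebProxy(proxyHost, proxyPort);
        request.Timeout = timeoutMs;

        Stopwatch timer = Stopwatch.StartNew();
        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                timer.Stop();
                return timer.ElapsedMilliseconds;
            }
        }
        catch (WebException)
        {
            // the proxy is dead, too slow, or refused the connection
            return -1;
        }
    }
}
```

A working proxy would return a small number of milliseconds; a dead one returns -1 instead of hanging, because of the timeout.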

Anyway the whole point of the exercise was that I needed to have a list of countries that I could check the IP addresses against.

Luckily the great site http://geolite.maxmind.com has a free GeoIP.dat.gz file that you can download and use that is pretty accurate (though not as accurate as the paid-for version). However the free version was good enough for what I needed.

The issue was that the .dat file came as a GZIP file and once I had downloaded it I needed to decompress it. This isn't the normal .zip decompression, but in .NET 4.5 it is pretty easy to accomplish.

I have shown a basic example of the class at the bottom of the page but the most important part is the method which does the GZIP decompression.


/// <summary>
/// Decompress a gzipped file. To compress we can just use the CompressionMode.Compress parameter instead
/// </summary>
/// <param name="fileToDecompress">The .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        // strip the .gz extension to get the output file name e.g GeoIP.dat.gz => GeoIP.dat
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}
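As the comment on the method suggests, compressing is just the mirror image: wrap the output stream in a GZipStream with CompressionMode.Compress and copy the source file into it. A minimal sketch along the same lines as the method above (the class and method names are my own):

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipHelper
{
    // Compress a file to a .gz file alongside it e.g GeoIP.dat => GeoIP.dat.gz
    public static void Compress(FileInfo fileToCompress)
    {
        using (FileStream originalFileStream = fileToCompress.OpenRead())
        {
            using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz"))
            {
                using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
                {
                    originalFileStream.CopyTo(compressionStream);
                }
            }
        }
    }
}
```

Running a file through Compress and then the Decompress method should give you back exactly what you started with.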
The three namespaces you will need to accomplish all this, apart from anything else you intend to do, are:

using System.Net;
using System.IO;
using System.IO.Compression;

System.Net is required for the WebClient class to do its work downloading the remote file to our computer, and System.IO is required for checking that files and folders exist.

The last one is the most important, System.IO.Compression as it's the library that lets us decompress the file.

You might have to add this in as a reference in Visual Studio. Just go to: Project > Add Reference > Framework > and tick the box next to System.IO.Compression.

Also note that I am using .NET 4.5 on a Windows 7 64 bit machine. In Windows 7, for security's sake (I presume), most applications that need to read and write files, or download and hold data of some sort, do so in the C:\ProgramData folder.

You will notice that this directory is full of well known names like Microsoft, Skype, Sun, Apple and many other software producers that need somewhere to log data in a safe place.

In the old days programs could just save files all over the place, which obviously wasn't safe. Especially if you were the admin of the computer and hit a button on a program that you thought was going to do one thing but was actually adding or deleting files all over your hard drive.

Anyway the whole code is below. Make of it what you will but it's pretty simple and I found it very useful.


using System;
using System.Linq;
using System.Text;
// we need this to download the file from the web
using System.Net;
// these are the two we need to do our decompression job
using System.IO;
using System.IO.Compression;

// this will hold any error message in case we get one and need to return it to the calling program
private string ErrorMessage = "";

// the folder we can write to e.g C:\ProgramData\MyProgramName
private string dataFolder = "";

// the names of the uncompressed and compressed GeoIP data files
private string GeoLiteCityDataFile = "GeoIP.dat";
private string ZippedGeoLiteCityDataFile = "GeoIP.dat.gz";

/// <summary>
/// Ensure our special folder in C:\ProgramData\ exists e.g C:\ProgramData\MyProgram
/// Then check the file we need to relate countries to IPs from http://geolite.maxmind.com exists and if it doesn't download it
/// and copy it to this folder. Then we need to decompress it as it's a gzip file e.g GeoIP.dat.gz so we have the uncompressed GeoIP.dat file to work with
/// </summary>
public void SetUp()
{
    // ensure a folder we can write to exists - named after my program
    dataFolder = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData) + @"\MyProgramName";

    if (!Directory.Exists(dataFolder))
    {
        try
        {
            // The folder doesn't exist so try and create it now
            Directory.CreateDirectory(dataFolder);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The data folder could not be created: " + ex.Message;

            return;
        }
    }

    // we have a folder but do we have an uncompressed .dat file?

    // set up the paths
    // first the path of the uncompressed .dat file in case we already have it
    string geoCityDataPath = dataFolder + @"\" + this.GeoLiteCityDataFile;

    // then the path of the .dat.gz compressed file in case we need to uncompress
    string zipFilePath = dataFolder + @"\" + this.ZippedGeoLiteCityDataFile;

    // check for an uncompressed data file
    if (!File.Exists(geoCityDataPath))
    {
        try
        {
            // we don't have a file so download it from the website and copy it to our folder
            // we could schedule this behaviour to get the latest file by checking the dates or just doing a download once a week/month
            WebClient webClient = new WebClient();
            webClient.DownloadFile("http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz", zipFilePath);

            // now we have our file create a FileInfo object from it to pass to our gzip decompress method
            FileInfo gzFileInfo = new FileInfo(zipFilePath);

            // Call the static method to decompress the gzip file
            Decompress(gzFileInfo);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The GeoIP.dat.gz file could not be downloaded or decompressed: " + ex.Message;

            return;
        }
    }
}


/// <summary>
/// Decompress a gzipped file. To compress we can just use the CompressionMode.Compress parameter instead
/// </summary>
/// <param name="fileToDecompress">The .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        // strip the .gz extension to get the output file name e.g GeoIP.dat.gz => GeoIP.dat
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}

Sunday, 30 August 2009

Compression Comparison

Comparing Compressor Tools

There are many Javascript compressor tools online and they all do wonderful jobs. What makes the tool I made slightly different is that it allows you to customise a number of compression options which can aid you in getting the best compression rate possible. Whilst most tools offer a simple crunch method and maybe a pack method (changing the code to run through an eval statement to obfuscate the code) they don't allow you to do some simple things that can make a lot of difference such as:
  • Renaming global objects that are used frequently to reduce size e.g window, document, navigator.
  • Renaming your own global objects that you may use frequently.
  • Renaming your own commonly accessed functions to short names.
  • Replacing calls to document.getElementById with a call to a single letter function e.g $ or G.

These 4 options used together could drastically alter the compression rate of your script.

Also if you have a small script then choosing to pack it as well as crunch or minify it will most likely increase the size of the output rather than compress it. Packing may be worthwhile if you really want to hide your code's purpose from a casual user, but it's ultimately pointless as anyone can reverse engineer a packed script with a simple line of Javascript, either within the Error Console in Firefox or by using one of the unpacker tools available online.


Different Outputs

Also I note that on a few compressors the output may give a misleading impression of success to the user. If a file has been reduced in size by 30% it has been compressed by 30%. Some tools however will show the new file size as a ratio of the old size, which would be 70%, which is fine. However having a label that just says "Compressed" next to the figure of 70% may lead some people to believe their file has been compressed by 70% when in fact it's only been compressed by 30%.
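The arithmetic is trivial but worth spelling out, as the two figures always add up to 100% (the sizes here are made up for illustration):

```csharp
using System;

class CompressionFigures
{
    static void Main()
    {
        double originalSize = 1000.0;  // bytes before compression (made-up example)
        double newSize = 700.0;        // bytes after compression

        // the new size as a percentage of the old size - what some tools label "Compression"
        double ratio = newSize / originalSize * 100.0;   // 70

        // how much the file has actually been reduced by
        double reduction = 100.0 - ratio;                // 30

        Console.WriteLine("New size is " + ratio + "% of the old, i.e. reduced by " + reduction + "%");
    }
}
```

So a tool showing "70%" and a tool showing "30%" may be describing exactly the same output file.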

For example take this silly example of a function:


function tested(var2){
var fuv = "hello "
+ "mr smith "
+ "how are you ";
var pid = 1000
var ola = 3343

if(var2==fuv){

var rob = function(){

addEvent(pid,"click",function(e){

var donkey = function(e){
if(fuv == pid){
return true;
}
}
})
}
}
}

Now run it through my compressor


This outputs the following, which has reduced the size by 40.65%:


function tested(a){var b="hello mr smith how are you ";var c=1000,d=3343;if(a==b){var rob=function(){addEvent(c,"click",function(e){var donkey=function(e){if(b==c){return true}}})}}}


And now this other compressor tool.


This outputs the following code, and in a box labelled "Compression" it shows the value 63.67%:


function tested(A){var B="hello "+"mr smith "+"how are you ";var C=1000var D=3343if(A==B){var E=function(){addEvent(C,"click",function(e){var F=function(e){if(B==C){return true;}}})}}}

Now this is actually the size of the new code in relation to the old, not how much the code has been reduced by, which is 36.33%. This is not the only tool that does this and I am sure most people will be aware what the figures actually mean. However because my tool does the opposite and shows the percentage that the new code is reduced by, it may lead people to believe one tool has a better compression rate than another when in fact it doesn't.

I am not claiming my tool is perfect, as it never will be while it uses regular expressions instead of a Javascript engine, however most other tools do this as well and I have spent a lot of time handling issues such as missing terminators which other tools, like the one above, miss. Douglas Crockford's JSMin will just add a new line when a missing terminator is found whereas my tool adds the missing terminator. Other tools will just assume the code has been validated and checked before submission, which of course is the best way to avoid any errors at all.


Compression with GZIP

What's the benefit of minifying your script compared to using GZIP?

Well GZIP may offer superior compression but it's a way of delivering the content to the browser, which then uncompresses it. It uses server side processing to generate the compressed files, which may or may not be a problem depending on the load and the type of file being served.

With minification the code can stay compressed on the server as well as the client, plus it will run on older browsers. You also have the benefit that certain minification techniques should actually aid client side performance by reducing processing work, e.g reducing the number of variable declarations or reducing string concatenation by combining strings into one variable (see my compressed output).

There is nothing stopping you minifying as well as using GZIP. So if you haven't looked into compression then you should, as it's a very simple and easy way of increasing your site's performance.