Monday, 22 March 2010

Write your own Proxy Checker Tool

Creating your own Proxy Checker Tool

Finding good, reliable proxies is hard. Many sites offer free lists, but most of the proxies on them will be out of date or not working by the time you get round to testing them. If you are happy using web proxies then there are plenty about, but they don't help when the blocked content you want to view relies on complex AJAX to deliver movies or other country-specific goodies.

Therefore it's a good idea to have a proxy checker tool that can be run on demand to find working proxies. There are many tools you can buy or download for free, such as Proxyway, that will do this, but the problem with any executable is that you don't know what the code is doing behind the scenes.

Any executable you run on your PC that contacts multiple servers in Russia and China should be used with caution, as these are well-known hotspots for hackers, and hiding malware inside otherwise useful tools is a well-known tactic.

Therefore it's always a good idea, if you can, to write your own code. I have just knocked up a Proxy Checker tool for my own use that not only finds useful proxy lists at the click of a button but also checks the proxies within those lists to see if they are working.

1. The code is written in PHP and uses an HTML form and multiple AJAX requests to load the results into the DOM. This makes the script very usable as you're not waiting around to see the results: the page is updated as each result comes in, giving it the look and feel of a real-time app.

2. If you don't have access to a webserver to run PHP from, then install WAMP Server on your PC. This will allow you to run scripts from your localhost, plus you can enable all the extensions and settings that a shared webserver may not let you use, such as CURL or allow_url_fopen. I do like running "personal apps" from my localhost as it means I get all the flexibility of a webserver plus no-one else can use the code!

3. The HTML page contains a form with a textarea. Paste in the URLs of all the proxy list sites you want to scrape. If you don't have a list of proxy sites then you can use the "Find Proxy List" button to scan the web for some. This is not an extensive search but it will return some lists. Remember that good quality proxy lists are hard to come by and a quiet proxy is a quick proxy, so if you find a good, reliable proxy server keep it quiet and don't destroy it!

4. On hitting the "Check Proxies" button the form loops through the proxy list URLs, making AJAX calls to a helper script that scrapes any proxy details it can find. I am using a basic scraper function built on file_get_contents, but you can use CURL or fsockopen if you wish. On other sites I use a custom function that tries all 3, in case the server has blocked one or more of the options or the CURL settings don't allow Proxy Tunnelling.
// A very simple function to extract HTTP content remotely. Requires allow_url_fopen to be enabled on the server.
// A better function would check for CURL support and fall back on fsockopen
function getHttpContent($url, $useragent="", $timeout=10, $maxredirs=3, $method="GET", $postdata="", $proxy="") {

    $urlinfo = null;

    // simple test for a valid URL
    if(!preg_match("/^https?:\/\/.+/", $url)) return $urlinfo;

    $headers = "";
    $status = "";

    // create http array (max_redirects was previously accepted but never applied)
    $http = array(
        'method'=>$method,
        'user_agent'=>$useragent,
        'timeout'=>$timeout,
        'max_redirects'=>$maxredirs
    );

    // add proxy details if required
    if(!empty($proxy)){
        // the http wrapper expects a URI such as tcp://IP:PORT
        if(strpos($proxy, "://") === false) $proxy = "tcp://" . $proxy;
        $http["proxy"] = $proxy;
        // send the full URL in the request line, which most proxies need
        $http["request_fulluri"] = true;
    }

    // if we want to POST data format it correctly
    if($method=="POST"){
        $content_length = strlen($postdata);

        $http["content"] = $postdata;
        $headers .= "Content-Type: application/x-www-form-urlencoded\r\nContent-Length: $content_length";
    }

    // now add any headers
    $http["header"] = $headers;

    // set options
    $opts = array('http'=>$http);

    // create stream context
    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $html = @file_get_contents($url, false, $context);

    // check $http_response_header (set in the local scope) for the status line e.g HTTP/1.1 200 OK
    // after redirects it holds one status line per response, so keep the last one
    if(isset($http_response_header)){
        foreach($http_response_header as $line){
            if(preg_match("/^HTTP\/\S+\s+(\d{3})/", $line, $m)) $status = $m[1];
        }
    }

    // if we have a valid status code then use it otherwise default to 400
    if(is_numeric($status)){
        $urlinfo["status"] = $status;
    }else{
        $urlinfo["status"] = "400"; //bad request
    }

    // only return the HTML content for 200=OK status codes
    if($status == "200" && $html !== false){
        $urlinfo["html"] = $html;
    }

    // put all the headers into the array in case we want to access them (similar to CURL)
    if(isset($http_response_header)){
        $urlinfo['info'] = $http_response_header;
    }

    // return array containing HTML,Status,Info
    return $urlinfo;
}
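
The comment at the top of that function mentions checking for CURL and falling back on other options. A minimal sketch of how that choice could be made is below. The function names detectTransports and pickTransport are made up for this example and are not part of my script, and the preference order is only an assumption.

```php
<?php
// Hypothetical helpers (not part of the original script): report which
// HTTP transports this PHP install offers, then pick the first usable one.

function detectTransports() {
    return array(
        // the cURL extension must be loaded for curl_init() to exist
        'curl'      => extension_loaded('curl'),
        // allow_url_fopen must be on for file_get_contents() to open URLs
        'fopen'     => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // hosts sometimes disable fsockopen via disable_functions
        'fsockopen' => function_exists('fsockopen'),
    );
}

function pickTransport(array $available, array $preferred = array('curl', 'fopen', 'fsockopen')) {
    foreach ($preferred as $name) {
        if (!empty($available[$name])) {
            return $name;
        }
    }
    return null; // nothing usable: the caller should report a config problem
}

echo "Using transport: " . var_export(pickTransport(detectTransports()), true) . "\n";
```

The scraper can then branch on the returned name, so the same script keeps working when a host blocks one of the options.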



5. The content is decoded to get round people outputting HTML using Javascript or HTML encoding it to hide the goodies. It then looks for text in the format of IP:PORT, e.g.
// call function to get content from proxy list URL
$response = getHttpContent($url, "", 10, 3, "GET");

// did we get a good response? (the function returns null for an invalid URL)
if(is_array($response) && $response["status"]=="200" && !empty($response["html"])){

    // extract content and decode it to get round people using Javascript to hide HTML
    $content = urldecode(html_entity_decode($response["html"]));

    // now look for all instances of IP:PORT
    preg_match_all("/(\d+\.\d+\.\d+\.\d+):(\d+)/", $content, $matches, PREG_SET_ORDER);
}
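
That pattern will also pick up things that only look like proxies, such as 999.1.1.1:80 or a port above 65535, and the same proxy often appears on several lists. Below is a rough sketch of one way to filter and de-duplicate the matches before testing them. The function name extractProxies is hypothetical and is not taken from my script.

```php
<?php
// Hypothetical helper: pull unique, plausible IP:PORT pairs out of scraped text.
function extractProxies($content) {
    $proxies = array();

    // same idea as the pattern above, but bounded so 123456 is not read as a port
    preg_match_all('/\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b/', $content, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $ip   = $m[1];
        $port = (int) $m[2];

        // FILTER_VALIDATE_IP rejects octets above 255
        if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) continue;

        // valid TCP ports are 1 to 65535
        if ($port < 1 || $port > 65535) continue;

        // keying on IP:PORT drops duplicates
        $proxies["$ip:$port"] = array('ip' => $ip, 'port' => $port);
    }

    return array_values($proxies);
}

// Usage with a made-up scrap of list HTML
$sample = "<td>10.0.0.1:8080</td><td>10.0.0.1:8080</td><td>256.1.1.1:80</td><td>8.8.8.8:3128</td>";
print_r(extractProxies($sample));
```

Only the surviving pairs need to be sent back to the page, which also cuts down the number of test requests.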

6. I then return the list of IP:PORT pairs to the front end HTML page, which outputs them into a table with a "TESTING" status. As each unique IP:PORT is inserted into the report another AJAX call is made to test the Proxy Server out.

7. The Proxy test utilises the same HTTP scraper function, but this time it passes in the IP and PORT details of the Proxy we want to test. The page it calls is one of the many IP Checker tools that are available on the web. You can change the URL it calls, but I am using a page that returns the Country after doing a reverse IP check. This way if the proxy is working I know the country details.

8. Once the reverse IP test has been carried out through the Proxy the results are returned to the HTML report page and the relevant Proxy listing in the table is updated to either GOOD or BAD.
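
Steps 6 to 8 boil down to something like the sketch below. It is not my actual code: the names fetchThroughProxy and testProxy, the checker URL, and the idea that the checker page replies with just the country name are all assumptions made for the example. The fetcher is passed in as a parameter so the logic can be tried without touching the network.

```php
<?php
// Hypothetical sketch: test one proxy by fetching a checker page through it.

function fetchThroughProxy($url, $proxy, $timeout = 10) {
    $context = stream_context_create(array('http' => array(
        'method'          => 'GET',
        'timeout'         => $timeout,
        'proxy'           => 'tcp://' . $proxy, // the http wrapper expects a tcp:// URI
        'request_fulluri' => true,              // most proxies need the full URL in the request line
        'ignore_errors'   => true,              // still read the body on 4xx/5xx replies
    )));
    $body = @file_get_contents($url, false, $context);

    // first status line only; 0 means no response at all
    $status = 0;
    if (isset($http_response_header[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $http_response_header[0], $m)) {
        $status = (int) $m[1];
    }
    return array('status' => $status, 'html' => $body === false ? '' : $body);
}

function testProxy($ip, $port, $checkerUrl, $fetch = 'fetchThroughProxy') {
    $response = call_user_func($fetch, $checkerUrl, "$ip:$port");

    // ASSUMPTION: the checker page returns little more than the country name
    $body = trim(strip_tags($response['html']));
    $good = ((int) $response['status'] === 200 && $body !== '');

    return array(
        'proxy'   => "$ip:$port",
        'result'  => $good ? 'GOOD' : 'BAD',
        'country' => $good ? $body : null,
    );
}

// Usage with a stub fetcher so the example runs offline; drop the last
// argument to send a real request through the proxy.
$stub = function ($url, $proxy) {
    return array('status' => 200, 'html' => "<b>Germany</b>\n");
};
echo json_encode(testProxy('203.0.113.10', 8080, 'http://checker.example/ip', $stub)), "\n";
```

Returning JSON keeps the front end simple, as the AJAX callback only has to find the matching table row and swap TESTING for the result.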

I have found that a lot of other Proxy checker scripts only validate that a proxy is working by giving it a PING or opening a socket. Although this may show whether a server is accessible, it doesn't tell you whether using it as a proxy will work or not.

Therefore the best way to test whether a Proxy is working is to request a page through it and check for a valid response, and if you are going to call a page you might as well call a useful one: a page that returns the IP's location, or maybe one that shows any HTTP_VIA or HTTP_X_FORWARDED_FOR headers so you can detect whether the Proxy is anonymous or not.
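
If you host the test page yourself, the header check could look something like this. It is only a sketch: the function name classifyAnonymity, the header list and the three labels are this example's own choices rather than any standard, and it assumes you know your real public IP.

```php
<?php
// Hypothetical classifier for a checker page you host yourself.
// $server is a $_SERVER-style array; $realIp is your own public IP.
function classifyAnonymity(array $server, $realIp) {
    // headers that proxies commonly add (an assumed, non-exhaustive list)
    $proxyHeaders = array('HTTP_VIA', 'HTTP_X_FORWARDED_FOR', 'HTTP_FORWARDED', 'HTTP_CLIENT_IP');

    $seen = array();
    foreach ($proxyHeaders as $name) {
        if (!empty($server[$name])) {
            $seen[$name] = $server[$name];
        }
    }

    // no proxy headers at all: the request looks like a direct visit
    if (empty($seen)) {
        return 'elite';
    }

    // match the IP as a whole so 1.2.3.4 is not found inside 11.2.3.45
    $pattern = '/(?<![\d.])' . preg_quote($realIp, '/') . '(?!\d)/';
    foreach ($seen as $value) {
        if (preg_match($pattern, $value)) {
            return 'transparent'; // the proxy leaked your real IP
        }
    }

    // proxy headers present but your IP is not in them
    return 'anonymous';
}

// Usage on the checker page itself; 198.51.100.7 stands in for your real IP
echo classifyAnonymity($_SERVER, '198.51.100.7'), "\n";
```

The page can print the label next to the country so each row of the report shows both results from the one request.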

Remember when you find some good quality proxies store their details as they are worth their weight in gold!


Removed. I'm going to sell this bad boy!

3 comments:

Anonymous said...

Are you still selling this, where can I buy it?

Anonymous said...

Are you still selling this where can I buy it?

Robert Reid said...

Nope, didn't sell it. I lost use of my LINUX VPS, so I'm doing less PHP work now and more C#. Plus I spent so long doing the webpage version: decoding some well known proxy sites, working out how they were obfuscating the IP:Port by using JavaScript variables for each number used (IP and Port) e.g var a=1,b=2,c=3,d=4,e=5....w=8,x=0,y=0,z=8 etc, how they then used document.write(a+b+c+d+':'+w+x+y+z) etc, and converting it all to the IP:PORT that I can then test.