PHP for obtaining the follower count of a Twitter Account
The original code, shown below, seems like overkill to me: it is a mixture of regular expressions, string parsing, callback functions and a lot of head scratching.
The user obviously knows that the follower count has to reside in a DOM element with the id follower_count, so why not use one single regular expression to target that element and return its guts, instead of all the DOM loading, callbacks and string parsing?
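// Get the number of twitter followers

// Returns the part of $long_string that is missing from $short_string
function string_getInsertedString($long_string, $short_string, $is_html = false) {
    // Compare lengths (the original compared the string itself against a length)
    if (strlen($short_string) >= strlen($long_string)) return false;
    $insertion_length = strlen($long_string) - strlen($short_string);
    // Find the first position at which the two strings differ
    for ($i = 0; $i < strlen($short_string); ++$i) {
        if ($long_string[$i] != $short_string[$i]) break;
    }
    $inserted_string = substr($long_string, $i, $insertion_length);
    // The shared prefix can swallow the element's leading '<'; if so, rotate it back to the front
    if ($is_html && $inserted_string[$insertion_length - 1] == '<') {
        $inserted_string = '<' . substr($inserted_string, 0, $insertion_length - 1);
    }
    return $inserted_string;
}

// Serialises the document with and without the element and diffs the two strings.
// Note that this removes the element from the document.
function DOMElement_getOuterHTML($document, $element) {
    $html = $document->saveHTML();
    $element->parentNode->removeChild($element);
    $html2 = $document->saveHTML();
    return string_getInsertedString($html, $html2, true);
}

function getFollowers($username) {
    $x = file_get_contents("http://twitter.com/" . $username);
    $doc = new DOMDocument;
    @$doc->loadHTML($x);
    $ele = $doc->getElementById('follower_count');
    // Bail out if the element is not in the page
    if (!$ele) return false;
    // Strip the outer tag pair to leave only the element's contents
    $innerHTML = preg_replace('/^<[^>]*>(.*)<[^>]*>$/', "\\1", DOMElement_getOuterHTML($doc, $ele));
    return $innerHTML;
}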
// To display it
<?php echo getFollowers("username"); ?>
Here is the much shorter version I wrote. It works just as well, returning the follower count of the Twitter account whose username is passed into it.
function getFollowers($username) {
    $url = "http://twitter.com/" . $username;
    $count = 0;
    $x = file_get_contents($url);
    // Group 1 captures the tag name and \1 makes the closing tag match it;
    // the id may sit anywhere among the tag's attributes
    if ($x !== false && preg_match('@<([a-z][a-z0-9]*)\b[^>]*\sid="follower_count"[^>]*>\s*([0-9,]+)\s*</\1>@i', $x, $match)) {
        $count = $match[2];
    }
    return $count;
}
// let's check how poorly my twitter account is followed!
echo "StrictlyTweets: " . getFollowers("StrictlyTweets") . "<br><br>";
As you can see from the regular expression, I match the opening HTML tag (tags always start with < followed by a letter) and capture the tag name so that it can be reused as a backreference for the closing tag. That way, if the HTML changes from a SPAN to a DIV, there will still be a match as long as the element carries id="follower_count".
I could have loaded up the DOM, targeted the ID and then done some regex, but why bother when you can go straight for the jugular!
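To see the backreference at work without hitting the network, here is a minimal sketch that runs the same kind of pattern over a few hard-coded strings. The helper name extractFollowerCount and the sample markup are made up for illustration, and the pattern used here accepts the id anywhere in the opening tag.

```php
<?php
// Hypothetical helper (not part of the original code): returns the follower
// count found in $html as a string such as "1,234", or null when nothing matches.
function extractFollowerCount(string $html): ?string
{
    // Group 1 captures the tag name; \1 forces the closing tag to use the same name.
    $pattern = '@<([a-z][a-z0-9]*)\b[^>]*\sid="follower_count"[^>]*>\s*([0-9,]+)\s*</\1>@i';
    if (preg_match($pattern, $html, $match)) {
        return $match[2];
    }
    return null;
}

// Made-up sample markup, only to exercise the pattern.
$asSpan     = '<p><span id="follower_count" class="stats">1,234</span></p>';
$asDiv      = '<div id="follower_count">987</div>';
$mismatched = '<span id="follower_count">5</div>';

var_dump(extractFollowerCount($asSpan));     // the SPAN version matches
var_dump(extractFollowerCount($asDiv));      // so does the DIV version
var_dump(extractFollowerCount($mismatched)); // opening and closing tags differ, so no match
```

The third sample shows why the backreference is there: the opening and closing tag names have to agree, whatever the tag happens to be.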
Labels: hacking, PHP, RegEx, regular expression, Scrape, Twitter, Wordpress


