Automating Wordpress posts with WP-O-Matic
If you use Wordpress you should really look into the great WP-O-Matic plugin, which lets you automate postings by importing content from RSS or XML feeds. You can set up schedules to import at regular times or import on demand from the admin area.
One issue, however, which I have just spent ages getting to the bottom of, is the use of HTML such as OBJECT and EMBED tags. A lot of content feeds contain multimedia files nowadays, and you want this content to be imported directly into your site.
The problem with WP-O-Matic and Wordpress in their default mode is that you will only get this content imported when you run the import from the admin menu, or when you call the page that the CRONJOB uses directly whilst logged in as an admin or publisher.
If you try to run the page the cronjob calls, e.g. /wp-content/plugins/wp-o-matic/cron.php?code=XXXXX, whilst logged out, or allow the job to run by itself, you will find that certain HTML tags and attributes are removed, including OBJECT and EMBED tags. The reason is security, to prevent XSS hacks, but it's possible to get round this if you need to. This took me quite a long time to get to the bottom of as I am very new to Wordpress, but I managed it in the end.
1. WP-O-Matic makes use of another object called SimplePie, which is a tool for extracting content from XML and RSS. This object has a number of settings for stripping out HTML, and the behaviour depends on how the feed import is called.
When running the import from the admin menu, a setting called set_stupidly_fast is set to true, which bypasses all the normal formatting and HTML parsing. When the CRONJOB runs, this is set to false, so the reformatting is carried out. In reality you want to run the reformatting, as it does much more than just parse the HTML: it also removes excess DIVs and comment tags and orders the results by date.
If you don't care about this formatting you need to find the fetchFeed method in the \wp-content\plugins\wp-o-matic\wpomatic.php file and force the setting to be true all of the time, so that the HTML parsing is always bypassed:
$feed->set_stupidly_fast(true);
If you do want to keep the benefits of the reformatting (that is, leave set_stupidly_fast as false for the CRONJOB) but allow OBJECT and EMBED tags, then you can override the list of tags that SimplePie removes by calling its strip_htmltags method. You can do this in the same fetchFeed method in the wpomatic.php file, just before the init method is called, by passing in an array of the tags that you do want SimplePie to remove from the extracted content.
// Strip only the tags listed here; OBJECT, EMBED and PARAM are left out of the list so they are kept
$feed->strip_htmltags(array('base', 'blink', 'body', 'doctype', 'font', 'form', 'frame', 'frameset', 'html', 'iframe', 'input', 'marquee', 'meta', 'noscript', 'script', 'style'));
$feed->init();
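Rather than typing the list out by hand, it can be built by taking SimplePie's default strip list and removing the tags you want to keep. This is only a minimal sketch: the default list below is an assumption based on SimplePie 1.x and should be checked against the copy bundled with your plugin, and the variable names are made up for illustration.

```php
<?php
// Assumed SimplePie default strip list; verify it against your own simplepie file.
$default_strip_tags = array(
    'base', 'blink', 'body', 'doctype', 'embed', 'font', 'form', 'frame',
    'frameset', 'html', 'iframe', 'input', 'marquee', 'meta', 'noscript',
    'object', 'param', 'script', 'style',
);

// Tags that should survive the import (hypothetical variable name).
$tags_to_keep = array('object', 'embed', 'param');

// array_diff preserves the keys of the first array, so reindex with array_values.
$strip_tags = array_values(array_diff($default_strip_tags, $tags_to_keep));

// Inside fetchFeed the result would be handed to SimplePie before init():
// $feed->strip_htmltags($strip_tags);
// $feed->init();

echo implode(', ', $strip_tags), "\n";
```

Building the list this way makes it obvious which tags you have chosen to let through, which is handy when you come back to the file after a plugin upgrade.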
So that takes care of the WP-O-Matic class, but unfortunately we are not done yet, as Wordpress runs its own sanitisation on posts in a file called kses.php found in the wp-includes folder. If you are logged in as admin or a publisher you won't get this problem, but your CRONJOB will run into it, so you have two choices.
1. Comment out the hook that runs the kses sanitisation on post content. This isn't recommended for security reasons, but if you wanted to do it, the following add_filter line should be commented out in the kses_init_filters function, e.g.
function kses_init_filters() {
    // Normal filtering.
    add_filter('pre_comment_content', 'wp_filter_kses');
    add_filter('title_save_pre', 'wp_filter_kses');
    // Post filtering
    // comment out the hook that sanitises the post content
    //add_filter('content_save_pre', 'wp_filter_post_kses');
    add_filter('excerpt_save_pre', 'wp_filter_post_kses');
    add_filter('content_filtered_save_pre', 'wp_filter_post_kses');
}
Commenting out this line will ensure no sanitisation is carried out on your post content, whoever or whatever does the posting. Obviously this is bad for security: if a feed you are importing one day contained an inline script or an OBJECT that loaded a virus, you could be infecting all your visitors.
2. The other, safer way is to add the tags and attributes that you want to allow into the list of acceptable HTML content that the kses.php file uses when sanitising input. At the top of the kses file is an array called $allowedposttags, which contains a list of HTML elements and their allowed attributes.
If you wanted to allow the playing of videos and audio through OBJECT and EMBED tags, then the following section of code can just be inserted into the array.
'object' => array(
    'id' => array(),
    'classid' => array(),
    'data' => array(),
    'type' => array(),
    'codebase' => array(),
    'align' => array(),
    'width' => array(),
    'height' => array()
),
'param' => array(
    'name' => array(),
    'value' => array()
),
'embed' => array(
    'id' => array(),
    'type' => array(),
    'width' => array(),
    'height' => array(),
    'src' => array(),
    'bgcolor' => array(),
    'wmode' => array(),
    'quality' => array(),
    'allowscriptaccess' => array(),
    'allowfullscreen' => array(),
    'allownetworking' => array(),
    'flashvars' => array()
),
Obviously you can add whichever tags and attributes you like. In my opinion this is the preferred way of getting round the problem, as you are still whitelisting content rather than allowing anything.
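To make the shape of the whitelist concrete, here is a rough sketch of how a lookup against an array laid out like $allowedposttags could work. It is a simplified illustration, not the real kses code: the is_attribute_allowed function and the trimmed $allowed array are hypothetical and exist only for this example.

```php
<?php
// Trimmed copy of the whitelist shape: tag => (attribute => array()).
$allowed = array(
    'object' => array('data' => array(), 'type' => array(), 'width' => array(), 'height' => array()),
    'param'  => array('name' => array(), 'value' => array()),
    'embed'  => array('src' => array(), 'type' => array(), 'allowfullscreen' => array()),
);

// Hypothetical helper (not part of Wordpress): a tag/attribute pair passes
// only if both the tag and the attribute appear in the whitelist.
function is_attribute_allowed(array $allowed, $tag, $attribute)
{
    $tag = strtolower($tag);
    $attribute = strtolower($attribute);
    return isset($allowed[$tag]) && isset($allowed[$tag][$attribute]);
}

var_dump(is_attribute_allowed($allowed, 'EMBED', 'src'));     // bool(true)
var_dump(is_attribute_allowed($allowed, 'embed', 'onclick')); // bool(false)
var_dump(is_attribute_allowed($allowed, 'script', 'src'));    // bool(false)
```

Anything that is not explicitly listed fails the check, which is why adding entries to the array is safer than switching the filter off altogether.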
It took me quite a while to get to the bottom of this problem, but I now have all my automated feeds running correctly and importing media content into my blog. Hopefully this article will help some people out.