Tuesday 27 March 2012

Fixing The Postie Plugin for WordPress to use Categories in the Subject line

Fixing the Postie WordPress plugin problem with categories in the Subject line

I needed a way of sending posts to a WordPress site by email. I looked at WordPress's default POP3 email publishing first, but I soon realised it wasn't good enough because I wanted to add categories to my article at the same time. The default way didn't seem to offer a solution, so I looked for some plugins before writing my own.

I searched the web and came across a popular WordPress plugin called Postie that lots of people seemed to rave about. There were a good few articles on how to set it up and get it working, which were just what I needed, and before long I was sending emails to a unique address to post articles on my site.


However, when it came to getting the categories in the subject line working, the plugin fell over like a drunk Irishman on St Patrick's Day! I was not happy.

The plugin was supposed to support the following formats which all relied on passing the categories in the subject line. If no categories were passed then the default category specified in the admin page was used instead.

The plugin allowed you to pass in category IDs, partial category names or multiple categories, which all sounded wonderful. The formats were supposed to be:


  • Subject: This is my title
  • Subject: Rac: This is my title
  • Subject: -Racing- -Horse Racing- This is my title
  • Subject: [1] [Racing] [Horse Racing] This is my title


The first format does not pass in any category, so the post would use the default category set up for the plugin, e.g. "Horse Racing".

The second format would look for the first category in the system starting with the letters "Rac", e.g. Racing.

The last two formats would allow you to pass in multiple categories, e.g. Racing and Horse Racing, either wrapped in hyphens (-Racing-) or in square brackets ([Racing]).

In all instances the title for the blog posting would be the last part of the subject line, e.g. 'This is my title'.

However, when I started testing the category formats, whatever format I tried I always ended up with a post title exactly the same as the subject line minus the word "Subject". For example, from a subject line of "Subject: -Racing- -Horse Racing- This is my title" I would get a post title in my blog of "-Racing- -Horse Racing- This is my title". Not good, and not what is advertised on the tin!

From a cursory look online it seemed other people were having the same issue. I posted a message to the forum but got no response, so yet again I delved into the code to fix the issue myself.

I could have written a new plugin myself as the logic isn't that hard, but why bother if only one method was screwed? However I find myself more and more disliking WordPress for the pure reason that relying on someone else's code is a pain for multiple reasons.

Not only do you have to wait ages, or forever (if they have died or stopped supporting it), for a response when bugs are found, but as with all code that isn't your own you have no clue how it works unless you spend considerable time and effort finding out.

If I had searched Google a bit harder first I might have found another solution, which relies on totally rewriting the function (you can find it here: Fixing Postie), but alas I only spotted this after my own fix was in.

Therefore as I couldn't be arsed writing a whole new plugin I thought I would give half an hour over to debugging it on my local WAMP Server setup. Luckily it didn't take too long to find the problem.

The issue is in the following function within the file called postie-functions.php. This is the function which parses the subject line and returns any categories it can find in an array otherwise returning the default category.

function GetPostCategories(&$subject, $defaultCategory) {

Looking at the GetPostCategories function, which runs a number of regular expression tests to retrieve the categories from the subject line, I could see that the problem was down to the order of the tests. The first test, which looked for a single category before a colon, was always being matched whatever format was used.

This was down to the basic nature of the regular expression, which split everything into the content before and after the colon. As the subject line always contained a colon after the word "Subject", whether you used multiple, single or no categories at all, this first test always matched and the later tests were never reached.
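To see why a colon-based test swallows every format, here is a minimal sketch. The original Postie pattern is not reproduced in this post, so $colonPattern below is a hypothetical stand-in that simply splits on ": "; it is only meant to reproduce the symptom, not to quote the plugin.

```php
<?php
// Hypothetical stand-in for the original "category before a colon" test.
// The real Postie pattern may well differ; this one just splits on ": ".
$colonPattern = '/(.+): (.*)/';

$subjects = array(
    'Subject: This is my title',
    'Subject: Rac: This is my title',
    'Subject: -Racing- -Horse Racing- This is my title',
    'Subject: [1] [Racing] [Horse Racing] This is my title',
);

$results = array();
foreach ($subjects as $subject) {
    // Every subject contains "Subject: ", so this test matches every time
    // and any -cat- or [cat] tests placed after it would never be reached.
    $matched = preg_match($colonPattern, $subject, $m);
    $results[$subject] = $matched
        ? array('category' => $m[1], 'title' => $m[2])
        : null;

    echo $subject . ' => '
        . ($matched ? "category '{$m[1]}', title '{$m[2]}'" : 'no match')
        . PHP_EOL;
}
```

With this stand-in, the hyphen and bracket subjects come back with "Subject" as the category and the untouched category markers as the title, which is the behaviour I was seeing in my posts.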

So to fix this I did the following.

  • Change the order of the tests within the function so that the most complex regular expressions were first and matched multiple categories either as -category- or [category].
  • Fix the first (now last) regular expression so that it didn't match the word "subject".
  • Ensure the word "Subject" is never returned as part of the title or as a category.
  • Test the code to prove it worked.
  • Publish it and test it on my live site.


To prove the fault and test the fix we can extract the GetPostCategories function from the postie-functions.php file and make a simpler version of it, along with a test harness that calls it with all the various subject formats. We can run this test PHP page either on a web server or on a local machine. As I have a Windows 7 box I used WAMP Server to test it.

The following test page code calls the function GetPostCategories multiple times with all the different subject formats e.g no category, a single category, multiple categories using both hyphens and square bracket formats.

To test it just create a testpostie.php page and paste in this code:

<?php


// should be no categories - so use default
$subject = "Subject: This is my title";

echo "call GetPostCategories with $subject <br>";

$cats = GetPostCategories($subject,"Default Category");

print_r($cats);
echo "<br>";

// should use the category MyCategory
$subject = "Subject: MyCategory: This is my title";

echo "call GetPostCategories with $subject <br>";

$cats = GetPostCategories($subject,"Default Category");

print_r($cats);
echo "<br>";


// should use the category MyCategory1 and MyCategory2
$subject = "Subject: -MyCategory1- -MyCategory2- This is my title";

echo "call GetPostCategories with $subject <br>";

$cats = GetPostCategories($subject,"Default Category");

print_r($cats);
echo "<br>";

// should use the category MyCategory1 and MyCategory2
$subject = "Subject: [MyCategory1] [MyCategory2] This is my title";

echo "call GetPostCategories with $subject <br>";

$cats = GetPostCategories($subject,"Default Category");

print_r($cats);
echo "<br>";

function GetPostCategories(&$subject, $defaultCategory) {
    $post_categories = array();
    $matches = array();

    // try to determine the categories by running the most complicated tests first to look for multiple categories
    if (preg_match_all('/\[(.[^\[]*)\]/', $subject, $matches)) {
        echo "matched on first [cat]<br>";
        // the title is whatever follows the last closing bracket
        preg_match("/](.[^\[]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match_all('/-(.[^-]*)-/', $subject, $matches)) {
        echo "matched on second -cat-<br>";
        // the title is whatever follows the last hyphen
        preg_match("/-(.[^-]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match('/Subject\:\s*(.+): (.*)/i', $subject, $matches)) {
        echo "matched on third :cat<br>";
        $subject = trim($matches[2]);
        // wrap the single category in an array so the loop below can treat all formats the same
        $matches[1] = array($matches[1]);
    } else {
        // no category at all, so just strip the "Subject:" prefix
        $subject = preg_replace('/Subject\:\s*(.*)/i', '$1', $subject);
        echo "matched on last no cat<br>";
        $subject = trim($subject);
    }

    if (count($matches)) {
        foreach ($matches[1] as $match) {
            $match = trim($match);
            $category = $match;

            echo "Working on $match<br>";

            // we have removed the SQL that looks up the categories here

            // just add the category straight into the result for this test, so comment out the if statement as category is always set
            // if ($category) {
                $post_categories[] = $category;
            // }
        }
    }
    echo "we have " . count($post_categories) . " cats from our subject<br>";
    if (!count($post_categories)) {
        echo "use default <br>";
        $post_categories[] = $defaultCategory;
    }
    echo "subject is now '$subject'<br>";

    return($post_categories);
}

Things to notice in this test function.

  1. Debug statements to output what is being matched to the screen
  2. The Regular Expression that was the first test is now the third test.
  3. Any regular expression that mentions Subject has the i flag to make it case insensitive in case "subject:" is passed instead of "Subject:"
  4. Any WordPress-dependent code has been removed, including references to the $wpdb global object that runs database queries. This is not needed in our test function as:


  • a) We don't have the database object instantiated or any other WordPress code included
  • b) I am keeping the test page simple and.....
  • c) the SQL is not the problem.
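The effect of the i flag mentioned in point 3 can be checked in isolation. This small sketch reuses the "no category" pattern from the test function; the variable names ($upper, $lower, $strict) are just for illustration.

```php
<?php
// Same "no category" pattern as the test function, with the i flag.
$pattern = '/Subject\:\s*(.*)/i';

$upper = trim(preg_replace($pattern, '$1', 'Subject: This is my title'));
$lower = trim(preg_replace($pattern, '$1', 'subject: This is my title'));

// The same pattern without the i flag leaves a lower-case prefix in place.
$strict = trim(preg_replace('/Subject\:\s*(.*)/', '$1', 'subject: This is my title'));

echo $upper . PHP_EOL;   // prefix stripped
echo $lower . PHP_EOL;   // prefix stripped thanks to the i flag
echo $strict . PHP_EOL;  // prefix still there
```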


If you run this test page you should get the following output:

call GetPostCategories with Subject: This is my title
matched on last no cat
we have 0 cats from our subject
use default 
subject is now 'This is my title'
Array ( [0] => Default Category ) 
call GetPostCategories with Subject: MyCategory: This is my title 
matched on third :cat
Working on MyCategory
we have 1 cats from our subject
subject is now 'This is my title'
Array ( [0] => MyCategory ) 
call GetPostCategories with Subject: -MyCategory1- -MyCategory2- This is my title 
matched on second -cat-
Working on MyCategory1
Working on MyCategory2
we have 2 cats from our subject
subject is now 'This is my title'
Array ( [0] => MyCategory1 [1] => MyCategory2 )
call GetPostCategories with Subject: [MyCategory1] [MyCategory2] This is my title 
matched on first [cat]
Working on MyCategory1
Working on MyCategory2
we have 2 cats from our subject
subject is now 'This is my title'
Array ( [0] => MyCategory1 [1] => MyCategory2 ) 

As you can see this test function now works with all the Category formats that Postie supports.
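If you would rather not compare the output by eye, the same checks can be written as pass/fail tests. This is only a sketch: it uses a trimmed copy of the test function (debug output and database lookups removed) so that the file runs on its own, and the $cases array and $failures counter are names I have made up for the example.

```php
<?php
// Trimmed copy of the test version of GetPostCategories:
// no debug echoes and no database lookups.
function GetPostCategories(&$subject, $defaultCategory) {
    $post_categories = array();
    $matches = array();
    if (preg_match_all('/\[(.[^\[]*)\]/', $subject, $matches)) {
        preg_match("/](.[^\[]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match_all('/-(.[^-]*)-/', $subject, $matches)) {
        preg_match("/-(.[^-]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match('/Subject\:\s*(.+): (.*)/i', $subject, $matches)) {
        $subject = trim($matches[2]);
        $matches[1] = array($matches[1]);
    } else {
        $subject = trim(preg_replace('/Subject\:\s*(.*)/i', '$1', $subject));
    }
    if (count($matches)) {
        foreach ($matches[1] as $match) {
            $post_categories[] = trim($match);
        }
    }
    if (!count($post_categories)) {
        $post_categories[] = $defaultCategory;
    }
    return $post_categories;
}

// Subject line => expected categories. The title should always come
// back as "This is my title".
$cases = array(
    'Subject: This is my title' => array('Default Category'),
    'Subject: MyCategory: This is my title' => array('MyCategory'),
    'Subject: -MyCategory1- -MyCategory2- This is my title' => array('MyCategory1', 'MyCategory2'),
    'Subject: [MyCategory1] [MyCategory2] This is my title' => array('MyCategory1', 'MyCategory2'),
);

$failures = 0;
foreach ($cases as $input => $expectedCats) {
    $subject = $input; // work on a copy, the function changes it by reference
    $cats = GetPostCategories($subject, 'Default Category');
    $ok = ($cats === $expectedCats) && ($subject === 'This is my title');
    if (!$ok) {
        $failures++;
    }
    echo ($ok ? 'PASS' : 'FAIL') . ": $input" . PHP_EOL;
}
echo "$failures failure(s)" . PHP_EOL;
```

Run it from the command line or drop it into a page on your local server; any FAIL line tells you which subject format is not being parsed as expected.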

You can now just replace the regular expressions in the function with the correct code by copying the following function over the original and uploading to your server before running a test.


/**
  * This function determines categories for the post
  * @return array
  */
function GetPostCategories(&$subject, $defaultCategory) {
    global $wpdb;
    $post_categories = array();
    $matches = array();

    // try to determine the categories, most complicated formats first
    if (preg_match_all('/\[(.[^\[]*)\]/', $subject, $matches)) {
        // [cat] format: the title is whatever follows the last closing bracket
        preg_match("/](.[^\[]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match_all('/-(.[^-]*)-/', $subject, $matches)) {
        // -cat- format: the title is whatever follows the last hyphen
        preg_match("/-(.[^-]*)$/", $subject, $subject_matches);
        $subject = trim($subject_matches[1]);
    } else if (preg_match('/Subject\:\s*(.+): (.*)/i', $subject, $matches)) {
        // cat: format: a single category before the colon
        $subject = trim($matches[2]);
        $matches[1] = array($matches[1]);
    } else {
        // no category: just strip the "Subject:" prefix
        $subject = preg_replace('/Subject\:\s*(.*)/i', '$1', $subject);
        $subject = trim($subject);
    }

    if (count($matches)) {
        foreach ($matches[1] as $match) {
            $match = trim($match);
            $category = NULL;

            // this code is a bit ropey but I am not re-writing the whole plugin
            $sql_name = 'SELECT term_id
                         FROM ' . $wpdb->terms . '
                         WHERE name=\'' . addslashes($match) . '\'';

            $sql_id = 'SELECT term_id
                       FROM ' . $wpdb->terms . '
                       WHERE term_id=\'' . addslashes($match) . '\'';

            $sql_sub_name = 'SELECT term_id
                             FROM ' . $wpdb->terms . '
                             WHERE name LIKE \'' . addslashes($match) . '%\' limit 1';

            if ( $category = $wpdb->get_var($sql_name) ) {
                // then the category is a name and was found
            } elseif ( $category = $wpdb->get_var($sql_id) ) {
                // then the category was an ID and was found
            } elseif ( $category = $wpdb->get_var($sql_sub_name) ) {
                // then the category is the start of a name and was found
            }
            if ($category) {
                $post_categories[] = $category;
            }
        }
    }
    if (!count($post_categories)) {
        $post_categories[] = $defaultCategory;
    }
    return($post_categories);
}


And that should be that. We now have a Postie plugin that works and supports the category formats it said it did.

8 comments:

Dave Williams said...

Thanks for this code, it really was a pain not being able to get categories onto my WordPress blog but now with this fix I can!

Well done!

Anonymous said...

Thanks! I was so excited for this fix because I need the same thing and it wasn't working. BUT...the fix doesn't work for me. I have Postie 1.4.4 -- not sure if that's the issue. Has anyone tried this fix?

Thank you,
Mark

Rob Reid said...

Hi I wrote this for version 1.4.3 which I use on 2 sites and it works fine.

I don't know if the author of the plugin has tried fixing the problem in the latest version in a different way but if you are using the new version then I cannot be sure the code is the same AND the fix would work as I don't know what he has changed.

I am sticking with the old version and this fix as it works for me and at least 2 other people I know.

Maybe try the message board on Wordpress or email the author to see what he changed in the new version or revert back to the old one and try this fix.

Wayne said...

Thanks for the fix, I am including it in v1.4.5

Rob Reid said...

Are you the developer of the plugin then?

Wayne said...

Yes I am. See http://postieplugin.com/

Rob Reid said...

Oh right, I did check on http://wordpress.org/extend/plugins/postie/ but it showed a Rob Felty as being the programmer which is why I didn't think it was you.

You might also be interested in the fix I did for the XSS hack as it was preventing emails going onto my site due to the content-type: base64 and the regular expression you were using e.g only checking for that word not a function e.g base64( - the same would go for META and the others.

You can see the article and fix here >> http://blog.strictly-software.com/2012/10/fixing-postie-wordpress-plugin-for-xss.html

Hope it helps.

Thanks for your plugin by the way it has been useful.