Since I’m building my own homepage, I recently learned how to scrape an RSS feed in order to dynamically create content for a website. The idea is to have separate feeds from my Twitter, my Tumblr, and this blog all in one place. Twitter and Tumblr both expose their feeds through API calls, so using those two is straightforward. A self-hosted WordPress blog like mine does produce an RSS feed, but there is no readymade widget that I can just call to drop that feed into another page. So I had to make my own.
An RSS feed is essentially just an XML document that browsers and feed readers interpret and display as a list of entries. What I wanted to do with this document (which WordPress produces for me) is essentially what the browser does when you click an RSS link: parse the tags in the XML document and apply predefined visual styles to make the information readable by humans.
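To make that structure concrete, here is a minimal sketch in PHP, the language used in the rest of this post. The feed below is made up for illustration (it is not my real feed, and the example.com URLs are placeholders). It only shows that each post lives in an item tag inside the channel, and that the pieces of a post are child tags of that item.

```php
<?php
// A made-up, minimal RSS 2.0 document (hypothetical data, not a real feed).
$sampleFeed = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/?p=1</link>
      <pubDate>Mon, 06 Sep 2010 16:20:00 +0000</pubDate>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/?p=2</link>
      <pubDate>Tue, 07 Sep 2010 09:00:00 +0000</pubDate>
      <description>Another entry.</description>
    </item>
  </channel>
</rss>
XML;

// Parse the string into a SimpleXMLElement tree.
$xml = simplexml_load_string($sampleFeed);

// Each <item> under <channel> is one post; collect the titles as plain strings.
$titles = array();
foreach ($xml->channel->item as $item) {
    $titles[] = (string)$item->title;
}

print_r($titles);
```

A real WordPress feed carries more tags per item than this, but the nesting of rss, channel, and item is the part that matters for parsing.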
PHP can do this with very little code via the simplexml_load_file function, which loads an XML document into an object whose tags can be read like ordinary properties.
$feedUrl = 'http://emmettbutler.com/threestegosaurusmoon/?feed=rss2';
$ret = array();
// load and parse the feed; simplexml_load_file returns false on failure
if ($xml = simplexml_load_file($feedUrl)) {
    // select every item element in the channel as an array
    $items = $xml->xpath("/rss/channel/item");
    $i = 0;
    foreach ($items as $element) {
        if ($i >= 3) {
            break; // keep only the first three items in the feed
        }
        // assign the desired tags to array entries, cast to plain strings
        $ret[$i]['title'] = (string)$element->title;
        $ret[$i]['timestamp'] = (string)$element->pubDate;
        $ret[$i]['summary'] = (string)$element->description;
        $ret[$i]['link'] = (string)$element->guid; // guid holds the post URL in my feed
        $i++;
    }
}
After the initial call, this code walks through the first three items of the divided document and assigns the contents of certain tags to elements of the $ret array. For example, each item in the feed contains a pubDate tag, which holds the date that the post was published. The line $ret[$i]['timestamp'] = (string)$element->pubDate; reads that tag and stores its contents as a string in the $ret array. Once the loop is complete, you’ll have an array holding the title, date, summary, and link for each post. You can loop through the array and print each element between the appropriate HTML tags, style it with a bit of CSS, and you have yourself a homemade and very professional-looking RSS feed widget on your website.
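That last step, printing the array between HTML tags, might look something like the sketch below. The renderFeedItems function, the CSS class names, and the sample data are all hypothetical names chosen for illustration; they are not part of the code above. The sketch assumes the array has the same shape as $ret, and it escapes everything before printing, since feed content should be treated as untrusted input.

```php
<?php
// Hypothetical helper: turns an array shaped like $ret into an HTML list.
function renderFeedItems(array $items): string
{
    $html = "<ul class=\"feed\">\n";
    foreach ($items as $item) {
        // Escape text before printing it into the page.
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $link = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        // Descriptions may contain markup, so strip tags first, then escape.
        $summary = htmlspecialchars(strip_tags($item['summary']), ENT_QUOTES, 'UTF-8');
        // pubDate is a date string; fall back to the raw text if it cannot be parsed.
        $ts = strtotime($item['timestamp']);
        $date = ($ts !== false)
            ? gmdate('F j, Y', $ts)
            : htmlspecialchars($item['timestamp'], ENT_QUOTES, 'UTF-8');

        $html .= "  <li>\n"
            . "    <a href=\"$link\">$title</a>\n"
            . "    <span class=\"feed-date\">$date</span>\n"
            . "    <p class=\"feed-summary\">$summary</p>\n"
            . "  </li>\n";
    }
    $html .= "</ul>\n";
    return $html;
}

// Made-up sample data in the same shape as $ret.
$sample = array(
    array(
        'title' => 'Tom & Jerry',
        'timestamp' => 'Mon, 06 Sep 2010 16:20:00 +0000',
        'summary' => '<p>Hello <b>world</b></p>',
        'link' => 'http://example.com/?p=1',
    ),
);

$output = renderFeedItems($sample);
echo $output;
```

In the real page you would call renderFeedItems($ret) instead of passing the sample array. Stripping tags from the summary is a design choice for a compact widget; if you trust your own feed and want to keep its formatting, you could print the description as-is instead.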


