It turns out that my grand idea of pulling content from all over the web into one central page isn't new, fresh or unique. There's even a crappy term for it: lifestreaming. I hate that almost as much as I hate the word blog. Oh well, what can you do except whinge about it on your blog and update your lifestream with rants?
I'd been having some issues with the "merge multiple items from different sources into one post" setup that I had hacked into place on my lifestream page. The logic behind it seemed solid: if two consecutive items from different feeds had the same title, they'd get merged. This fell over if I added things to multiple feeds in a short period of time, because items from other feeds would land in between the ones that belonged together, so they were no longer consecutive. Since I use spurl to bookmark interesting things and picasa to grab photos / screenshots as I surf the web and drink my morning coffee, it broke very often.
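A minimal Python sketch of that old consecutive-only rule makes the failure easier to see. It assumes each feed item is a hypothetical `(source, title)` tuple, newest first; the real page was php working off the merged rss feed, so treat this as an illustration rather than the actual code.

```python
def merge_consecutive(items):
    """Merge an item into the previous post only when the two are adjacent,
    come from different feeds, and share a title (the old rule)."""
    posts = []
    for source, title in items:
        prev = posts[-1] if posts else None
        if prev and prev["title"] == title and source not in prev["sources"]:
            prev["sources"].append(source)
        else:
            posts.append({"title": title, "sources": [source]})
    return posts

# Hypothetical feed: another bookmark lands between the two "Cool article" items.
feed = [
    ("spurl", "Cool article"),
    ("spurl", "Another link"),
    ("picasa", "Cool article"),
]
posts = merge_consecutive(feed)
print(len(posts))  # 3: the two "Cool article" items aren't adjacent, so no merge
```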
Now I've changed the system to trawl through all current items in the merged rss feed and grab any with the same title, whether or not they sit next to each other. This works perfectly for the front page, but I'm going to have to do some more code hax to get it working for the archives (which are pulled from a database rather than an rss feed).
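Here's a rough sketch of the "grab any with the same title" approach, under the same assumptions as before (hypothetical `(source, title)` tuples, not the real php). It groups every item by title and keeps each group at the position of its first, newest item; Python dicts preserve insertion order, which is what gives that ordering.

```python
def merge_by_title(items):
    """Group all items sharing a title, regardless of where they appear."""
    groups = {}
    for source, title in items:
        groups.setdefault(title, []).append(source)
    return [{"title": t, "sources": s} for t, s in groups.items()]

# Same hypothetical feed as the consecutive-only case.
feed = [
    ("spurl", "Cool article"),
    ("spurl", "Another link"),
    ("picasa", "Cool article"),
]
for post in merge_by_title(feed):
    print(post["title"], post["sources"])
```

This only works because every current item is in memory at once, which is exactly what the archive pages don't give you.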
Update: I spent some time this afternoon trying to work out a solution and ended up ditching the old method and approaching it from a completely different angle. The sticking point was dealing with the archives: how do you tell how many merged items are newer than the content you're pulling from the archives to display on a certain page? What do you do when two items you want to be merged straddle either side of an archive page 'block'?
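The straddling problem is easy to reproduce. This hypothetical sketch splits raw items into fixed-size archive pages with a simple slice (the page size of 2 is made up to keep the example small); the two "Cool article" items end up on different pages, so merging within a single page can never bring them together.

```python
def paginate(items, page, per_page):
    """Return the slice of items shown on a zero-based archive page."""
    start = page * per_page
    return items[start:start + per_page]

# Hypothetical (source, title) items, newest first.
items = [
    ("spurl", "Cool article"),
    ("spurl", "Another link"),
    ("picasa", "Cool article"),
    ("spurl", "Third link"),
]

page0 = paginate(items, 0, 2)
page1 = paginate(items, 1, 2)
titles0 = {title for _, title in page0}
titles1 = {title for _, title in page1}
# "Cool article" shows up on both pages: the pair straddles the block boundary.
print(titles0 & titles1)  # {'Cool article'}
```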
I've set up a photobucket account and will now be using that to store images which I want to tack onto things that I spurl. I have my online rss aggregator sucking in the rss feed from photobucket and the main one from yahoo pipes. The php page runs through the items in my lifestream and looks for any items in the photobucket feed with matching titles, which then get tacked onto the matching post. The photobucket system also saves me from having to download images from web pages, resize them and upload them to picasa. I just use a nifty firefox plugin which lets me upload online images straight into my photobucket account.
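The photobucket matching can be sketched as a dictionary lookup keyed on title. The dict shapes (`title`, `url` keys) and the example.com URLs are hypothetical stand-ins for whatever the aggregator actually stores, and the match is an exact string comparison, as described above.

```python
def attach_images(lifestream, photobucket):
    """Return copies of lifestream posts with an "images" list added,
    filled from photobucket items whose title matches exactly."""
    images_by_title = {}
    for photo in photobucket:
        images_by_title.setdefault(photo["title"], []).append(photo["url"])
    return [
        {**post, "images": images_by_title.get(post["title"], [])}
        for post in lifestream
    ]

# Hypothetical data: one spurl bookmark has a matching photobucket upload.
lifestream = [
    {"source": "spurl", "title": "Cool article"},
    {"source": "spurl", "title": "Another link"},
]
photobucket = [
    {"title": "Cool article", "url": "http://example.com/cool.jpg"},
]
for post in attach_images(lifestream, photobucket):
    print(post["title"], post["images"])
```

Building the index once turns each lookup into a dict access instead of rescanning the photobucket feed for every post. If titles ever differ in case or trailing whitespace, normalising them (e.g. `title.strip().lower()`) on both sides would be a natural tweak.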
I decided to stop doing two different things for the front page and the archives; now all content is pulled out of my database. This kind of goes against the whole idea of removing reliance on me running my own host, but a mysql db is pretty standard with all simple hosting deals, so I don't really need a whole box or root access or anything. I just need a small amount of hard drive space, apache + php + mysql and a cron job to run my online rss aggregator and the lifestream page.
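With everything in the database, the front page and the archives can share one query: the front page is just page 0. Here's a hedged sketch using Python's built-in sqlite3 as a stand-in for mysql; the `items` table, its columns and the sample timestamps are all hypothetical. `LIMIT ... OFFSET ...` works the same way in mysql, though most Python mysql drivers use `%s` placeholders instead of `?`.

```python
import sqlite3

def fetch_page(conn, page, per_page=10):
    """Return one page of lifestream items, newest first (page 0 = front page)."""
    return conn.execute(
        "SELECT source, title, published FROM items "
        "ORDER BY published DESC, id DESC LIMIT ? OFFSET ?",
        (per_page, page * per_page),
    ).fetchall()

# In-memory database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, source TEXT, "
    "title TEXT, published TEXT)"
)
conn.executemany(
    "INSERT INTO items (source, title, published) VALUES (?, ?, ?)",
    [
        ("spurl", "Cool article", "2007-01-01 09:00"),
        ("photobucket", "Cool article", "2007-01-01 09:05"),
        ("spurl", "Another link", "2007-01-01 09:10"),
    ],
)

front_page = fetch_page(conn, 0, per_page=2)
archive_page = fetch_page(conn, 1, per_page=2)
```

The cron job then only has to keep the tables fresh; rendering any page is the same query with a different `page` value.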