Saturday, March 23, 2013

App for Markov Chaining Facebook statuses

As a small but cool programming project, I thought it'd be interesting to Markov chain my friends' FB statuses. The memoryless nature of Markov chains (the next word depends only on the current one) makes them pretty simple to implement with a dictionary and a random function in Python: http://en.wikipedia.org/wiki/Markov_chain.

This project was pretty straightforward but still interesting. It was mainly a way to experiment with Markov chains. I had originally wanted to use FFT (the fast Fourier transform) to identify frequencies in music and then Markov chain the different notes together. It would be extra cool if we could simply enter different YouTube links and have them parsed.


The source is here: https://github.com/deloschang/markov-chain-fb-statuses

Sample result: [screenshot]

In general, a larger corpus yields "higher-quality" results, so I set off to scrape as much as I could from my Facebook friends. Here's how I did it.

First, I implemented Facebook Auth with sufficient permissions to read friends' statuses. With the access token in hand, my objective was to iterate through each status and scrape its message, along with any comments the user made on those statuses.

Facebook limits the number of statuses returned per API call, so the offset parameter needs to be incremented in a loop like so:


    offset = 0  # begin with the first page of results
    full_data = graph.get(FB_DESIGNATED + '/statuses?limit=100&offset=' + str(offset))
    # ... open corpus ...
    while full_data['data']:  # an empty list means there are no more statuses
        # ... scrape the data here ...
        offset += 100

        # fetch the next page of up to 100 statuses
        full_data = graph.get(FB_DESIGNATED + '/statuses?limit=100&offset=' + str(offset))

    corpus.close()

The truthiness check in the while condition is quite Pythonic: an empty list is falsy, so the loop stops once we've paged past the last status update. At that point, the Facebook API returns empty data.
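The scrape step inside the loop could look roughly like the sketch below. It is not taken from the repo: the function names extract_messages and write_corpus are hypothetical, and the shape of each status (a 'message' string plus a 'comments' object holding a 'data' list, where each comment has a 'from' object with an 'id') is an assumption about what the API returned at the time. It is written for Python 3.

```python
def extract_messages(page, user_id):
    """Collect status messages and the user's own comments from one page.

    Assumed (not confirmed by the post) shape of each status:
    {'message': str, 'comments': {'data': [{'from': {'id': str}, 'message': str}]}}
    """
    lines = []
    for status in page.get('data', []):
        message = status.get('message')
        if message:
            lines.append(message)
        # Keep only the comments written by the user being scraped.
        for comment in status.get('comments', {}).get('data', []):
            author_id = comment.get('from', {}).get('id')
            if author_id == user_id and comment.get('message'):
                lines.append(comment['message'])
    return lines


def write_corpus(lines, path):
    """Append one message per line, collapsing embedded newlines and extra spaces."""
    with open(path, 'a', encoding='utf-8') as corpus:
        for line in lines:
            corpus.write(' '.join(line.split()) + '\n')
```

Keeping the extraction separate from the network call means it can be tried on a hand-written dictionary without touching Facebook at all.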

Once I had scraped the statuses from the API, I just needed to save them to an external file. Markov chaining them then involved iterating through each line, splitting it into words, and placing them in the dictionary as the correct key-value pairs.
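A minimal sketch of that chaining step follows, assuming a first-order chain where each word maps to the list of words seen right after it. The names build_chain and generate and the max_words cutoff are mine, for illustration only; the code in the repo may be organized differently.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words seen immediately after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, max_words=20, rng=random):
    """Walk the chain from a random start until a dead end or max_words."""
    if not chain:
        return ''
    word = rng.choice(list(chain))
    output = [word]
    # chain.get() avoids creating empty entries for words with no successor.
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return ' '.join(output)
```

Because repeated successors are stored as duplicates in the list, random.choice picks each next word in proportion to how often it followed the current word in the corpus. To try it, pass an open corpus file (or any iterable of lines) to build_chain and print the result of generate.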

In the near future, beyond uploading it to a server, I probably won't be updating this project further. But here are some interesting ideas for anybody who wants to fork this repo.

  • Integrating with Twitter. You could go into friends' About Me sections and look for Twitter IDs, then scrape their tweets for a larger corpus.
  • Virality. Give every person their own page for Markov chaining their statuses. Then when friends Markov chain each other, they can simply link their friend to that page via an automated FB post. And when those friends come check their page out, encourage them to Markov chain their own friends, hopefully leading to a viral coefficient > 1.
  • Integrating with photos. As a developer who worked on an Internet memes startup, I think it'd be sweet to add these Markov chained texts as captions to random friend photos. I don't think this has been done yet and it'd be really interesting to see the results.
Actually.... now that I've listed these ideas out, I'm a little tempted....
