Delos Chang's blog: March 2013

Sunday, March 31, 2013

Course Auto-scheduler Webapp

https://github.com/deloschang/auto_class_scheduler

This app uses Google's oAuth2 Calendar API to automatically schedule your classes onto calendar.

It does this by scraping the Timetable for all available courses and cross-references them with the Dartmouth Meeting Time diagram here: http://oracle-www.dartmouth.edu/dart/groucho/timetabl.diagram.

All this information is loaded into a Postgres database and referenced for Calendar API event insertions.

Thursday, March 28, 2013

API Improvements for FoCo Nutrition Scraper App

I built a FoCo (Dartmouth's dining hall) nutrition scraper API app that lets you query by year / month / day and returns a JSON listing of food offered in the dining hall that day, with associated nutrition facts for every single food item.

https://github.com/deloschang/foco-nutrition-api

Integrated this with a mobile app that lets me create my own "food diary" application. I can enter in my food portions and calculate exactly how much macronutrients I'm taking in.

Eventually, we plan to merge this in with Timely as a zero-action feature. Imagine walking into FoCo and Timely pulls up the food offered that day with the associated nutrition facts seamlessly and automatically.

Monday, March 25, 2013

App for watching Courses automatically

For all the Dartmouth students who are struggling to find their Spring courses, I wrote an app that will monitor full courses and tell you when there is an availability via email.

Here is the code on github:

https://github.com/deloschang/course-watch/blob/master/coursemonitor.py

Basically, given the class you want to search, it scrapes Dartmouth's Course timetable online via post requests and then looks for the appropriate cells. Then it will sleep for half a minute, or however long you want, before checking again. If there is an availability, it will notify you via email.

Feel free to fork and improve on it.

Saturday, March 23, 2013

App for Markov Chaining Facebook statuses

As a small but cool programming project, I thought it'd be interesting to Markov chain friend FB statuses. The memory-less nature of Markov chains make it pretty simple to implement with a dictionary and a random function in Python: http://en.wikipedia.org/wiki/Markov_chain.

This project was pretty straightforward but still interesting. It was mainly to experiment with Markov chains. I had originally wanted to do FFT or fast fourier transforms to identify frequencies in music and then Markov chain the different notes together. It would be extra cool if we could simply enter different YouTube links in and have them parsed.

The source is here: https://github.com/deloschang/markov-chain-fb-statuses

Sample result:

In general, a larger corpus yields "higher-quality" results. So I set off to scrape as much as I could from my Facebook friends. Here's how I did it

First, I implemented Facebook Auth with sufficient permissions to check friend statuses. With the access token in hand, my objective was to iterate through each of the statuses and scrape the message. Also, any comments from the user within those status too.

Facebook limits the number of statuses per API call so the offset parameter will need to be looped like so:

full_data = graph.get(FB_DESIGNATED+'/statuses?limit=100&offset='+str(offset))
....open corpus....

while not not full_data['data']:
...... scrape the data here .......
offset += 100

full_data = graph.get(FB_DESIGNATED+'/statuses?limit=100&offset='+str(offset))

corpus.close()

The 'not not' is quite pythonic and checks if we've reached the end of all the status update loops. If so, Facebook API will return with empty data.

Once I scraped the API calls, I just needed to save them to an external file. Then Markov chaining them involved iterating through each line, splitting the words and placing them in correct key value pairs.

In the near future, beyond uploading it onto a server, I probably won't be updating this project further. But here are some interesting ideas for anybody that wants to fork this repo.

Integrating with twitter. You could go into friend's about mes and look for Twitter IDs. Then scrape the twitter posts for a larger corpus.
Virality. Give every person their own page for Markov chaining their statuses. Then when friends Markov chain each other, they can simply link their friend to that page via automated FB post. And when those friends come check their page out, encourage them to Markov chain their own friends, hopefully leading to a viral coefficient > one.
Integrate with photos. As a developer who worked on an Internet memes startup, it'd be sweet to add these Markov chained texts as captions to random friend photos. I don't think this has been done yet and it'd be really interesting to see the results.

Actually.... now that I've listed these ideas out, I'm a little tempted....

Thursday, March 21, 2013

Concerning Privacy Concerns

I'm seeing many "data-learning" apps shy from speaking up about the data-collection portion of their application and how that affects user privacy. How much data are you collecting? Where is it all going? It's intuitive: people like privacy and if you collect more data about them, they'll protest and leave. Right?

This isn't a post about privacy concerns.
Rather, it's a post about how we should be asking for more data.

What?

Like Timely, these are the kinds of apps that gather data about you and process it to offer you some service, hence some value. To clarify, these are not apps that collect data they don't necessarily need (doesn't that remind you of spyware?). That's a big distinction that I think people conflate so often that in most people's minds, data collection = bad.

I believe that apps like Timely that can leverage more data to provide more value shouldn't fear. Of course, it needs to be a sufficient amount of value or people wouldn't risk their personal data to even use it.

The privacy status-quo is changing. Look at how people gave up their privacy for Facebook. Sure, there are privacy concerns about where our data is going and how people are using it, but overall, it seems that becoming more socially interconnected was worth the price. The common response now to privacy evangelists is "if you want to keep your data, don't use Facebook."

Leveraging more high-quality data spurs innovation. It unlocks doors. Big Data, though hyped, is real and it's powerful. From an application point of view, data is data is data. Higher quality data intelligently processed means more value, although this isn't always the case.

The problem for developers is that talking about how you want more user data isn't just eccentric, it's creepy by current social standards. So many app developers tiptoe around this by choosing not to talk about it or obfuscating the topic.

It's obvious they aren't following through with their convictions because they fear negative backlash. That's fair; people tend to be conservative with their personal data. But I'm curious.

What if they had followed through on this insight?

What's possible?

Tuesday, March 19, 2013

Playing with Bashrc

Recently discovered how time-saving aliases are. For git, the following are what I use the most:

alias gp='git push'
alias glog='git log'
alias gs='git status'
# for easy git committing
function gc() {
git commit -m "$*"
}

Because alias doesn't accept parameters, function gc() allows you to type in something like:

gc this is my commit

Or for a bash script that commits and pushes to server, you could write:


echo "Please type commit message"

read commit_message

git commit -am "$commit_message"

git push 

"Pushing onto repo"

Also, essential for aliasing is:

alias bashrc='mvim ~/.bashrc && source ~/.bashrc'

This will let you edit bashrc and then source it automatically afterwards.

As I commit pretty often, these aliases have saved me quite a bit of time.

Saturday, March 16, 2013

GDB Set disassembly

Recently saw that many linux books are referencing "set dis intel" or "set disassembly intel" when looking at breakpoints and registers. In OSX, this spits out the ambiguous response "Ambiguous set command."

This can be fixed using "set disassembly-flavor intel." And to preserve it on startup, simply set it in ~/.gdbinit via sudo echo "set disassembly-flavor intel" > ~/.gdbinit

Tuesday, March 12, 2013

Seed funding for Timely!

It's been a busy but great week so far. We spent about 1.5 weeks building a beta product and received $2000 in seed funding.

What is Timely?

At Timely, we believe that time management is broken.

When we think about time management, it means inputting events in the future and then trying to budget our time based on how long we think our tasks will take. But this is crazy! Why are we budgeting our most valuable asset based on guessed estimations? It'd be unwise to budget our money that way.
There's no easy way to look back and evaluate how you've spent your time. You could open your calendar, but you'd really only see fixed events. There are some applications like RescueTime (which is a great solution for tracking time on your computer) to other automatic time-tracking applications. But many applications we've used are pretty disconnected from our lives. We're changing that.

We're incredibly excited about Timely, not only because it's an idea that solves our own problems, but because it's also an interesting programming challenge. It seems to me that smartphones are still in their nascency -- location-tracking and memory capabilities are limited and have a long way to go. In fact, Paul Graham notes that getting to the edge of smartphone programming could take a year.