Wednesday, March 28, 2007

Wasting time at Microsoft

Ryan's Tech Blog (good name) recently had a nice ranting post about how Microsoft wasted 6 months of developer time re-creating a source control client for Codeplex.

"This problem is ingrained at Microsoft, which feels the need to brand everything, but it is in no way limited to them. A search on Sourceforge for “issue tracker” gives 585 results. Sifting through those to pick a winner is difficult."

It's hard to argue with his logic. He makes a strong argument about the need to work with extensible tools that can plug into each other. I think this exemplifies one of the basic differences between Windows and Unix tools that I've noticed: Windows tools are designed as "solutions", whereas Unix tools are designed to be extensible. It's almost unheard of to see Windows tools that are designed to be pluggable, but in the Unix world, the opposite is true.

Monday, March 26, 2007

Collaborative Filtering: An introduction

Recommendation algorithms are hot right now. NetFlix, Amazon, and StumbleUpon all have examples of working recommendation algorithms that are not only interesting, but useful. Collaborative filtering is the method of prediction that has been widely adopted across these sites.

The underlying assumption with collaborative filtering is that your preferences in the past will help predict your preferences in the future. Assuming that this is true, CF makes itself unique by finding users whose preferences are similar to yours. Once this neighborhood of users is found, it can be reasonably assumed that any items they like, you will also like, and hence an intelligent recommendation can be made.

Using the NetFlix analogy, if you and another user named "Jack", who is completely unknown to you, have rated the same movies in the same way, it can be assumed that you have similar tastes. Therefore, if Jack rates a new movie very highly, the likelihood that you will also like that movie is very high, and it can be recommended.
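To make the "Jack" example concrete, here is a minimal sketch of user-centric CF in Python. Everything in it is an assumption for illustration: the ratings are made up, the names `cosine_similarity` and `recommend` are hypothetical, and cosine similarity is just one common way to measure how alike two users are. I have no idea what NetFlix actually runs.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "you":  {"Alien": 5, "Heat": 4, "Big": 1},
    "jack": {"Alien": 5, "Heat": 4, "Big": 1, "Ronin": 5},
    "jill": {"Alien": 1, "Heat": 2, "Big": 5, "Clue": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=1):
    """Rank items the user hasn't rated, using the k most similar users."""
    mine = ratings[user]
    # The "neighborhood": the k users whose ratings look most like ours.
    neighbors = sorted(
        ((cosine_similarity(mine, theirs), name)
         for name, theirs in ratings.items() if name != user),
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, name in neighbors:
        for item, rating in ratings[name].items():
            if item not in mine:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average rating for each unseen item.
    scores = {i: totals[i] / weights[i] for i in totals if weights[i] > 0}
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("you", ratings))
```

Since "you" and "jack" rated the three shared movies identically, Jack ends up as the nearest neighbor, and the one movie he has rated that you haven't is what gets recommended.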

There are issues with this algorithm, of course. First, you have to build up a reasonable list of preferences before the algorithm can successfully match you with other users. If you only ever rate one item, you can't expect great recommendations to come your way. Second, if your tastes change, the algorithm will be able to pick it up, but only with time. There must be time for the new ratings to outweigh your past selections. Last, this algorithm doesn't scale very well. For massive sites like NetFlix and Amazon, I'm sure there is some sort of caching or iterative method involved so that the recommendations aren't recomputed every time you log in to the site. With millions of users and millions of products, it would simply take too long to load any sort of user-specific recommendation.

There is another way to implement CF besides the user-centric model described above. You can do an item-centric recommendation too. Rather than find a neighborhood of users with similar tastes, you can look at the specific items you've rated well and find a neighborhood of items that share similar traits. If you're in a world where the number of items doesn't grow as fast as the number of users, this algorithm would scale much better.
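Here is a rough Python sketch of the item-centric idea. It rests on some assumptions: the ratings are invented, the function names are hypothetical, and I'm treating two items as "similar" when the same users rate them alike, which is only one way to define shared traits.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "you":  {"Alien": 5, "Heat": 4, "Big": 1},
    "jack": {"Alien": 5, "Heat": 4, "Big": 1, "Ronin": 5},
    "jill": {"Alien": 1, "Heat": 2, "Big": 5, "Clue": 5},
    "joan": {"Alien": 4, "Ronin": 4, "Clue": 2},
}


def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def item_similarity(a, b):
    """Cosine similarity between two items over users who rated both."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def similar_items(item, ratings):
    """List every other item, most similar to `item` first."""
    by_item = invert(ratings)
    target = by_item[item]
    scored = [
        (item_similarity(target, other), name)
        for name, other in by_item.items() if name != item
    ]
    return [name for _, name in sorted(scored, reverse=True)]


print(similar_items("Alien", ratings))
```

The appeal for scaling is that item-to-item similarities don't depend on who is logged in, so in principle a table like this could be computed ahead of time and simply looked up when a page loads.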

Monday, March 19, 2007

Adobe Apollo Launches

Adobe Labs' project Apollo officially launched today. It's an interesting project that lies in the realm of creating "Rich Internet Applications", a buzzword I've been seeing a lot lately.

Apollo is essentially a runtime, similar to Flash. What makes this new and different is that the Apollo technologies are focused on bringing web functionality back to the desktop. It tries to solve the online-offline data problem that people currently have with mail, contacts, news, etc. Gmail is a great application, but when I'm offline (on a plane, perhaps), I can't write emails in Gmail to send later. If, however, there were a Desktop app that synced with Gmail when you were back online, that would be pretty sweet.

Apollo is a recognition that as wired as we are, and as popular as the web has become, some major functionality and synchronization issues were not addressed in the jump from the Desktop to the web. Some smooth transition apps that allow you to host your data online on central nodes, but then have that data available locally on distributed nodes, would be very nice to have.

Read more about it here.

Wednesday, March 14, 2007

Bash has a debugger

One of my biggest complaints with interpreted languages in general is that the debugging tools available are rather limited. When trying to find the root of a problem, having some sexy surgical tools available to cut into your program is nice. No, more than nice - sometimes they're necessary.

To my surprise, I discovered that Bash has a debugger! Bashdb, where have you been all my life? You can set breakpoints in your code, and step through the script line by line if desired. In terms of basic functionality, it matches gdb fairly closely. Amazing!

I've used the PHP debugger XDebug before, but I found it to be a bit intrusive into the source code. You have to add PHP commands to the source to enable XDebug. I've never used Python or JS debuggers. Anyone know if good debuggers exist for those languages?

Wednesday, March 7, 2007

WiX: Windows Installer XML Toolset

While reading "The Build Master" this week, I was introduced to a tool that I find quite interesting: WiX. It's an MSFT-produced and supported tool that has been open-sourced (gasp!). Open-source? Microsoft? Useful? My head hurts.

The WiX toolset allows you to create an MSI rather painlessly using an XML format that they've defined. It's actually quite intuitive and useful. The WISE and InstallShield guys have had the market on creating Windows Installers for the longest time, and the horror stories I've heard from people who have the strength to use the WISE / InstallShield tools really caused me to do a double-take when I saw this tool.

In addition, there is no "setup" to speak of with this toolset. It's literally a set of binaries that you can execute from the command-line. There is no installation overhead, making these tools easy to store and version for build escrow purposes.

There is another open-source tool called Votive that ties in to Visual Studio and allows you to create and control the MSI build using WiX from within the GUI. As Borat would say, Niiice.

Sunday, March 4, 2007

The Comfort Zone

Steve Yegge had a really interesting blog post last week about the average programmer. Take a looksey:

"But how do programmers compete? Generally, they just don't. Not in the way chess players or golfers compete, anyway. The reason? You can't compare programmers quantitatively, so you can't compute a score or a rank. Competitions and competitors have to be scored. Sure, you can set up scored programming competitions, but they're so tightly controlled that they don't resemble real-world software development anymore. Professional programmers basically just don't compete with each other.


Hence, you're probably not pushing yourself. Even if you're trying to improve your programming skills, you're probably just doing it in areas you're already comfortable in. And your improvements probably still aren't happening as fast as they would if you were competing to improve them."


Very true. He goes on to discuss how, since there is no quantitative method of ranking programmers, they end up being ranked relative to each other, which is itself completely subjective (and hence, his complaint). Programmers are ranked according to the context they are in.

His main complaint is about the comfort zone we all find ourselves in. He believes that since it is difficult to quantify the skill of a programmer, incentive is provided for pigeonholing yourself into an area you feel comfortable with.

I think this is true in most professions, not just programming, and I find it very true in testing as well. It is very easy to turn to the same tools and oracles for help and advice with your problems.

"See, I always thought I was a perfectly competent programmer: as good as you can get, basically. I was building cool stuff, doing seemingly complicated things, and I felt I knew a tremendous amount of lore about the art of programming. I had won or placed in programming competitions, could program in Java for weeks on end without referring to the API docs, and pretty much felt on top of things.

Every few years, I would read some critical book, or have some weighty flash of insight, and realize that I'd been operating all this time in what could only be termed "clueless mode", and that I hadn't really known what I was doing after all. Amusingly, I was always relieved that now I could consider myself to be a good programmer, since I now knew whatever it was I'd been missing before.


Last year it finally dawned on me, after 16 or 17 years of this, that I just might possibly still be clueless about something important that I really ought to know, something that would make me a much better programmer."


Word. Hopefully I'll be that inspired and insightful after 16 or 17 years of working.