Tuesday, August 7, 2007

Experiences and Observations in Network Visualization

I built a web app called g.licious that uses graph layouts to visualize relationships in del.icio.us data. I built the tool mainly to test our graphing package in Bridgeworks together with AJAX and web services. Visualizing relationships with graphs is certainly not new. In fact, the approach has drawn plenty of criticism, and the criticism is not without basis. Regardless, people keep doing it. We keep doing it because the approach itself is quite sound. Results happen. How people choose to use graph viz is another story.

So what is it that separates good graph viz from bad graph viz? Well, I have to say that I can't pretend to know even a tiny fraction of what Jeffrey Heer and Stuart Card know. They defined graph viz. They've studied it more than anyone I know of, and I learned from their work. Still, I think my experience has taught me a few things that are worth recording. In the spirit of visualization, I thought I'd start with a few observations from g.licious.

Starting with the basics...here is my del.icio.us network:
The graph layout is called Radial Tree. Radial Tree is a prototypical "degrees of separation" view shown as concentric circles about a center point of interest. Here I'm in the middle ("prestidigital"). The names in the first circle around me are the friends in my immediate network. They are connected to me by the green lines. Beyond that are friends of my friends. They are connected by blue lines. The gray lines are called the "back graph." They show which of my friends and friends' friends know each other.

My network has some quirks. At least one person has two del.icio.us identities. Another name was added as a case of mistaken identity. (Those are topics for another discussion.)

This view shows my extended network in one place. It shows me relationships among my friends. That's somewhat useful and interesting. It's more than the del.icio.us network view provides. But that kind of value added is a nice-to-have, not a must-have. The real value of a view like this comes when I use it as a foundation on which to add layers of information, when I use it to ask questions. Here is my network with "traffic":


In this view, the nodes representing my friends are sized according to how many links I have sent them using the del.icio.us for: tag. But then...why do I really care to see where traffic in my network is going? Well, this view answers the question "how much?" So maybe "I" and "my" are placeholders for distributor and supplier, producer and consumer, seller and buyer, caller and callee. Conveniently, one of the keys to success with this view happens to be the fact that I have a small network. What happens when I have a HUGE network? Hold that thought. Suffice it to say for now that what's important isn't necessarily how much I can see at once, but rather what I can know based on what I see. Navigability through levels of detail is also important. But for now, I want to keep it simple. There's a lot I can learn, pro and con, from this tiny little set of relationships.
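For anyone who wants to play with these two views, here is a rough Python sketch of the ideas: rings by degrees of separation, tree edges versus the "back graph," and node size driven by traffic. This is not the g.licious or Bridgeworks code. The function names, the even angular spacing, the square-root sizing, and every friend name other than "prestidigital" are my own assumptions for illustration. Real radial tree layouts usually give each subtree its own wedge of the circle, which this sketch skips.

```python
import math
from collections import deque


def radial_tree_layout(edges, center, ring_spacing=1.0):
    """Place nodes on concentric rings by hop distance from `center`."""
    # Build an undirected adjacency list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Breadth-first search assigns each reachable node its degree of
    # separation and remembers which node it was discovered from.
    depth = {center: 0}
    parent = {center: None}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for other in neighbors.get(node, []):
            if other not in depth:
                depth[other] = depth[node] + 1
                parent[other] = node
                queue.append(other)

    # Group nodes by ring, then spread each ring evenly around its circle.
    rings = {}
    for node, d in depth.items():
        rings.setdefault(d, []).append(node)
    positions = {}
    for d, members in rings.items():
        for i, node in enumerate(sorted(members)):
            angle = 2 * math.pi * i / len(members)
            positions[node] = (d * ring_spacing * math.cos(angle),
                               d * ring_spacing * math.sin(angle))
    return depth, parent, positions


def split_edges(edges, parent):
    """Separate tree edges (ring to ring) from the remaining 'back graph'."""
    tree, back = [], []
    for a, b in edges:
        if parent.get(a) == b or parent.get(b) == a:
            tree.append((a, b))
        else:
            back.append((a, b))
    return tree, back


def node_radius(link_count, min_radius=4.0, scale=2.0):
    """Square-root scaling keeps a few heavy nodes from dwarfing the rest."""
    return min_radius + scale * math.sqrt(link_count)


# Hypothetical network: only "prestidigital" comes from the post.
edges = [("prestidigital", "alice"), ("prestidigital", "bob"),
         ("alice", "carol"), ("alice", "bob")]
depth, parent, positions = radial_tree_layout(edges, "prestidigital")
tree_edges, back_edges = split_edges(edges, parent)
print(depth)
print(back_edges)
```

In the sample data, the alice-bob edge is the only one that lands in the back graph, because both of them are already reachable directly from the center.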


This view of my network shows communities of interest:



In the bottom half of the view are researchers and librarians I know who work in the same library. At the top, those who are closest to me are in fact office colleagues, two co-founders of my company, my wife, and one of my closest friends. That seems to say something important. (The graph layout is called a Force-Directed Graph. The general idea is that the edges act like springs.) Now, the thing with this bit of knowledge about my graph is that it seems hard to generalize beyond the specific set of circumstances that occur here. I know how to interpret this view because it's my network. People in my network might also know how to read it. But what if the relationship is not among people?
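The "edges act like springs" idea mentioned above can be sketched in a few lines of Python. This is a generic spring embedder, not the Bridgeworks implementation: the function name, the inverse-square repulsion between all node pairs, and every parameter value are assumptions I picked so that a small example settles down.

```python
import math


def force_directed_layout(nodes, edges, iterations=300, rest_length=1.0,
                          spring=1.0, repulsion=0.1, step=0.05):
    """Toy spring embedder: edges pull like springs, all nodes push apart."""
    # Start on a unit circle so the result is deterministic.
    n = len(nodes)
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}

    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}

        # Every pair of nodes repels with an inverse-square force.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / (dist * dist)
                force[u][0] += push * dx / dist
                force[u][1] += push * dy / dist
                force[v][0] -= push * dx / dist
                force[v][1] -= push * dy / dist

        # Each edge is a spring that pulls (or pushes) toward rest_length.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring * (dist - rest_length)
            force[u][0] += pull * dx / dist
            force[u][1] += pull * dy / dist
            force[v][0] -= pull * dx / dist
            force[v][1] -= pull * dy / dist

        # Take a small step along the net force on each node.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return pos
```

Nodes that share an edge end up near the spring's rest length, while unconnected nodes drift apart. That is why tightly connected groups, like the library colleagues in my view, tend to clump together.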

Here is a view of my del.icio.us tags:


Now, this isn't the greatest view in the world, but I have to keep in mind a few things: 1) I made it in a hurry, 2) the picture is really a screen capture of an interactive 3D visualization that is easily manipulated with a mouse, keyboard, or clever software, and most importantly 3) there are plenty of people who can easily identify clusters of keywords that belong together. WSP+music+band+setlist+Virginia locates the setlists from Widespread Panic concerts I attended (in my home state). The dates 2006 and 2007 describe "when" and each points to the show I saw here that year. DoD-.mil-software-certification points directly to information about Department of Defense policies and procedures for getting software certified on .mil networks. There are several other clear relationships in the larger spline to the right.
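I haven't said here how the tag graph gets its edges. One common way to get clusters like the ones above, and purely an assumption for this sketch, is to link two tags whenever they appear on the same bookmark and to count how often that happens. The function names and the sample bookmarks below are made up for illustration. Only the tag names come from this post.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(bookmarks):
    """Count how often each pair of tags appears on the same bookmark."""
    pairs = Counter()
    for tags in bookmarks:
        # Sort so each pair is always stored in one canonical order.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def weight(pairs, a, b):
    """Look up a pair's count regardless of argument order."""
    return pairs[tuple(sorted((a, b)))]


# Hypothetical bookmarks, each represented only by its list of tags.
bookmarks = [
    ["WSP", "music", "band", "setlist", "Virginia", "2006"],
    ["WSP", "music", "band", "setlist", "Virginia", "2007"],
    ["DoD", ".mil", "software", "certification"],
]
pairs = tag_cooccurrence(bookmarks)
print(weight(pairs, "setlist", "WSP"))
print(weight(pairs, "WSP", "DoD"))
```

Pairs with a nonzero count become edges, and the count can serve as an edge weight. Tags that never share a bookmark, like WSP and DoD here, stay unconnected, so they fall into separate clusters.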

Ok...so there's a lot to consider here and I'm getting tired. More later... The images here aren't great because they are small. I tried to link to my Picasa web albums. Ironically that didn't work in Blogger (both are owned by Google...again, another story). So here's a direct link.
