Monday, September 29, 2008

A Portable Language for Real Time, Interactive, Rich Media

Following is a draft section from an in-progress whitepaper on ETL-V:

The key to enabling visualization as a core capability is a robust description of functionality that is machine portable and can be parsed by a universal processor, one that serializes and deserializes any object, attribute, and behavior in a standard way. Such a description captures the language of visualization in a way that can be understood by any program or person.

Our machine portable language is not intended to express low-level artifacts like vertex points and surface normal vectors, although it can. Most often we follow the model of HTML, which uses URLs to reference images, movies, and other digital multimedia. A 3D Model is not unlike an image:

<img src="images/car.jpg" />

<Model url="objects/car.lwo"/>

Moreover, our language expresses high level artifacts that describe things a person without prior subject-matter expertise might easily recognize. Camera, Model, and Light are but a few examples of rudimentary artifacts in our language. Yet, to establish a robust platform for visualization we must be able to describe artifacts of greater complexity. We must describe the attributes of, for example, ColumnChart, AreaChart, FlowChart, TreeMap, and RadialTree.
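To make the HTML analogy concrete, here is a minimal sketch that reads the img and Model elements shown earlier through one generic code path. It assumes Python's standard xml.etree parser, not the Bridgeworks runtime, and the describe helper is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

def describe(element_text):
    """Parse one element and return (tag, attributes) without knowing its type."""
    element = ET.fromstring(element_text)
    return element.tag, dict(element.attrib)

# The two elements from the text, read by the same generic code.
image = describe('<img src="images/car.jpg" />')
model = describe('<Model url="objects/car.lwo"/>')

print(image)  # ('img', {'src': 'images/car.jpg'})
print(model)  # ('Model', {'url': 'objects/car.lwo'})
```

In both cases the document only names the resource; loading the referenced file is left to whatever engine interprets the element.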

Some objects have implied behaviors such as a Sequencer that iterates through a list of items or a Directive that traverses a graph.
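As a rough illustration of these two implied behaviors, the sketch below models a Sequencer and a Directive in plain Python. The class bodies, the wrap-around stepping, and the depth-first visiting order are all assumptions made for the example; they are not taken from the BML schemata.

```python
class Sequencer:
    """Steps through a list of items, one per call (wrap-around is an assumption)."""

    def __init__(self, items):
        self.items = list(items)
        self.position = 0

    def step(self):
        item = self.items[self.position]
        self.position = (self.position + 1) % len(self.items)
        return item


class Directive:
    """Visits every node reachable from a root, depth first, each node once."""

    def __init__(self, visit):
        self.visit = visit

    def traverse(self, graph, root):
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            self.visit(node)
            # Reverse so children are visited in their listed order.
            stack.extend(reversed(graph.get(node, [])))


# Hypothetical scene graph expressed as an adjacency list.
graph = {"Scene": ["Camera", "Model"], "Model": ["Light"]}
visited = []
Directive(visited.append).traverse(graph, "Scene")
print(visited)  # ['Scene', 'Camera', 'Model', 'Light']

sequencer = Sequencer(["front", "side"])
steps = [sequencer.step() for _ in range(3)]
print(steps)  # ['front', 'side', 'front']
```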

Our language must contain more than nouns; it must also contain verbs. Therefore we have command objects like Append, Remove, Interpolate, Morph, Notify, and many others (see also Command schema, below).
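One way to picture verbs in a document is a small interpreter that looks up each command by its element name at runtime. This is only a sketch: Append and Remove are named in the text, but the Commands wrapper, the name and url attributes, and the file paths are hypothetical and do not reflect the actual Command schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical command document: the element names Append and Remove come
# from the text; the wrapper element and attributes are illustrative only.
DOCUMENT = """
<Commands>
  <Append name="car" url="objects/car.lwo"/>
  <Append name="sun" url="objects/sun.lwo"/>
  <Remove name="sun"/>
</Commands>
"""

def append(scene, attrs):
    scene[attrs["name"]] = attrs["url"]

def remove(scene, attrs):
    scene.pop(attrs["name"], None)

# Verbs are resolved by element tag while the document is being read,
# so adding content needs no compilation step.
HANDLERS = {"Append": append, "Remove": remove}

def execute(document, scene):
    for command in ET.fromstring(document.strip()):
        handler = HANDLERS.get(command.tag)
        if handler is None:
            raise ValueError(f"unknown command: {command.tag}")
        handler(scene, command.attrib)
    return scene

scene = execute(DOCUMENT, {})
print(scene)  # {'car': 'objects/car.lwo'}
```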

Our language must be capable of describing attributes having multiple dimensions, for example a 3-dimensional position. We classify attributes as belonging to one of four basic categories: primitive, complex, compound, and collection. These are defined as follows:

  1. Primitive Attribute - an attribute that is expressed using a single value; ex., integer, floating point number, character. We also consider string to be primitive.
  2. Complex Attribute - an attribute that is expressed using multiple primitives; ex., a 3D vector of floats.
  3. Compound Attribute - an attribute that is itself an Object containing other attributes; ex., Animation has a Keyframe attribute, and Keyframe is an Object with its own attributes.
  4. Collection Attribute - an attribute that represents a data structure, such as a list, vector, or hash map. A Collection attribute can hold any other kind of attribute.
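The four categories above can be sketched with ordinary Python types. Animation and Keyframe are named in the text, but the specific fields (time, position, name, labels) and the classify helper are assumptions chosen for illustration, not the BML definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Keyframe:
    # Primitive attribute: a single value.
    time: float = 0.0
    # Complex attribute: several primitives treated as one value.
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)

@dataclass
class Animation:
    # Primitive attribute (strings count as primitive here).
    name: str = ""
    # Compound attribute: an Object with attributes of its own.
    keyframe: Keyframe = field(default_factory=Keyframe)
    # Collection attribute: a data structure holding other attributes.
    labels: List[str] = field(default_factory=list)

def classify(value):
    """Map a Python value onto one of the four attribute categories."""
    if isinstance(value, (int, float, str)):
        return "primitive"
    if isinstance(value, tuple):
        return "complex"
    if isinstance(value, (list, dict)):
        return "collection"
    return "compound"

animation = Animation(name="drive", labels=["intro"])
kinds = {name: classify(value) for name, value in vars(animation).items()}
print(kinds)
# {'name': 'primitive', 'keyframe': 'compound', 'labels': 'collection'}
print(classify(animation.keyframe.position))  # complex
```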

Given this language, we can express any visualization technique, behavior, or interaction style as a human and machine readable document capable of being interpreted at runtime. To demonstrate our concept for this machine portable language, we have created Bridgeworks Markup Language (BML). BML is defined using XML Schema Definition language (XSD). Documentation and source schemata are available online in the following locations:

Documentation of BML:

http://dev.bridgeborn.com/Bridgeworks/Schemata/doc/Bridgeworks.html

Schemata: (relative to http://dev.bridgeborn.com/Bridgeworks)
/Schemata/Bridgeworks/Bridgeworks.xsd
/Schemata/Common/Command/Command.xsd
/Schemata/Common/Directive/Directive.xsd
/Schemata/Common/Evaluator/Evaluator.xsd
/Schemata/Common/Node/Node.xsd



[Update 12.02.08: realized the schemata listed above do not contain all of the elements documented...not sure what happened...see BwSchools for full list. Schemata are best viewed in XMLSpy.]

We use Bridgeworks to provide our customers with the largest and most flexible set of visualization capabilities available to them on the market. We deliver these capabilities through a single lightweight runtime that scales between personal devices and large facilities. The runtime engine operates through a simple input/output mechanism for parsing text-based, well-formed data, making it easy to integrate without a lot of API programming.

Platform Functional Requirements:

  1. The capability must provide graph layouts (ex., radial tree, cone tree, force-directed graph)
  2. The capability must provide chart layouts (ex., bar charts, line charts)
  3. The capability must provide geospatial layouts (ex., maps, globe, terrains)
  4. The capability must provide high-fidelity 3D modeling and animation
  5. The capability must provide video and audio
  6. The capability must provide hybridized views (ex., Extrude Map: geospatial + chart)
  7. The capability must provide split views (ex., side-by-side comparison)
  8. The capability must provide layered views
  9. The capability must provide a temporal dimension
  10. The capability must provide a mechanism for user-defined attributes
  11. The capability must provide a mechanism for user-defined events and triggers

Platform Technical Requirements:

  1. The capability must provide multi-dimensional graphics (2D, 3D, 4D+)
  2. The capability must provide an engine that can change views and behaviors at runtime using only a text-based, interpreted language that does not require compilation in order to add, modify, or delete content in real time
  3. The capability must provide serialization and de-serialization of data and objects into and out of rich media views using:
    1. XML
    2. JSON
    3. CSV
  4. The capability must provide serialization and de-serialization of standard object notations (SONs, ex., JSON) into and out of views
  5. The capability must provide Object/Attribute Reflection
  6. The capability must provide storage of domain data as Objects’ attributes loaded in memory
  7. The capability must provide an extensible API
  8. The capability must provide ECMAScript language binding
  9. The capability must provide COM language binding
  10. The capability must provide Java language binding
  11. The capability must be able to render from multiple 2D and 3D file formats, including but not limited to:
    1. ESRI ArcView Shape
    2. LightWave Object
    3. Keyhole Markup Language (KML)
    4. CAD Models (Autodesk, SolidWorks)
    5. JPEG
    6. Bitmap
    7. Extensible to other formats as needed

  12. The capability must be able to render from data located in local and remote URLs
  13. The capability must be embeddable in the following software applications:
    1. Internet Explorer versions 6 & 7
    2. Microsoft Office 2003 Products: Word, Excel, PowerPoint
    3. Mozilla Firefox
    4. Win32 Graphical User Interfaces (GUIs)
    5. .Net/C# GUIs
    6. Java GUIs
  14. The capability must provide an XML schema and a document object model (DOM)
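Requirements 3 through 5 of this list (serialization to XML, JSON, and CSV, plus object/attribute reflection) can be illustrated with a minimal sketch. It uses only Python's standard library; the Light class, its attribute names, and the helper functions are hypothetical and are not part of Bridgeworks.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

class Light:
    """Hypothetical object; the attribute names are illustrative."""

    def __init__(self, name="", intensity=1.0):
        self.name = name
        self.intensity = intensity

def reflect(obj):
    """Discover attribute names and values at runtime (reflection)."""
    return dict(vars(obj))

def to_json(obj):
    return json.dumps({type(obj).__name__: reflect(obj)})

def to_xml(obj):
    attrs = {key: str(value) for key, value in reflect(obj).items()}
    return ET.tostring(ET.Element(type(obj).__name__, attrs), encoding="unicode")

def to_csv(obj):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(reflect(obj)))
    writer.writeheader()
    writer.writerow(reflect(obj))
    return buffer.getvalue()

def from_xml(text, cls):
    """Rebuild an object from XML, coercing each value to the default's type."""
    obj = cls()
    for key, value in ET.fromstring(text).attrib.items():
        setattr(obj, key, type(getattr(obj, key))(value))
    return obj

light = Light("key", 0.8)
print(to_json(light))  # {"Light": {"name": "key", "intensity": 0.8}}
print(to_xml(light))
print(to_csv(light))

# Round trip: object -> XML text -> object.
restored = from_xml(to_xml(light), Light)
print(reflect(restored) == reflect(light))  # True
```

Because the serializers work from reflected attributes rather than hand-written field lists, adding an attribute to the class changes all three outputs without touching the serializers.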

Visualization as a Service


Visualization as a service is an extension of our core capability concept of visualization as a platform. Because we can express views, behaviors, and interaction models without deviating from our standards for reasonable portability, we can publish and share them just like any other content. This means that web services and other data servers can produce human and machine readable documents that describe visualization, behavior, and interaction. Furthermore, these documents can be parsed to create or modify any visualization, behavior, or interaction technique at runtime without intermediate compilation. Depending on transfer frequency, volume of data, and security constraints, data describing visualizations, behavior, and interaction can also be cached for use in “offline-mode” with so-called “rich Internet applications” (RIAs).


[Update 03.14.2009 - Just watched Tim Berners-Lee's TED Talk on Linked Open Data and recently watched Vint Cerf's Feb 09 lecture to FAA. Want to add here that a document-based approach is the best way to ensure that visualization follows the important lessons taught by these great teachers. Binary objects can be linked using HTTP and metadata, but binary objects are not open. Binary objects also always require a specialized software reader to be interpreted. Documents, on the other hand, are what the Web was designed to link, and text will always be readable by readily available software on the web. Even if one rendering engine disappears it is always possible for another to come along; ex., if rendering an old visualization document as graphics ever becomes important hundreds of years in the future. I realize I am contradicting this principle by encouraging the use of existing binary formats for images and 3D models, but this is a compromise I am willing to make at this point. Check out BML Hello World. KML is important and wonderful, but it is just scratching the surface.]

Tuesday, September 23, 2008

Friday, September 5, 2008

Flare gets DDV

I have learned much from the work of Stuart Card and Jeffrey Heer. It was their research on graph layout algorithms that made possible the modularization of graphing capabilities in software. Heer's premier software implementation of this research is Prefuse. Computer Science taught us Graph Theory and the structure, manipulation, and traversal of nodes and edges. Prefuse taught us Radial Tree, Force-directed Graph, and Degree-of-Interest Tree. And Prefuse is excellent not just because it is good at graphics and layouts but because it is also document-driven, i.e., it runs on XML. Sure, there's a Java API, and that's probably what most developers focus on. But the fact that it parses XML, that it recognizes a vocabulary and language for visualization, and that this language can be expressed as documents is what I think makes Prefuse special.

After Prefuse came Flare. Flare is a wonderful application of prefuse ported to ActionScript.

Flare really gets document-driven visualization. I was just browsing through the source when I unfolded the data/converters package:


See also (emphases added by me):

Class                   Description
Converters              Factory class for looking up the appropriate IDataConverter for a given data format.
DelimitedTextConverter  Converts data between delimited text (e.g., tab delimited) and flare DataSet instances.
GraphMLConverter        Converts data between GraphML markup and flare DataSet instances.
JSONConverter           Converts data between JSON (JavaScript Object Notation) strings and flare DataSet instances.

This is certainly familiar.
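The pattern in that listing, a factory that looks up the right converter for a given data format, can be sketched in a few lines of Python. This is not Flare's ActionScript API: the function names, the format keys, and the use of plain row dictionaries in place of a DataSet are all assumptions for illustration.

```python
import csv
import io
import json

def from_json(text):
    """Parse a JSON array of objects into a list of row dictionaries."""
    return json.loads(text)

def from_tab(text):
    """Parse tab-delimited text with a header row into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Registry keyed by format name; this stands in for a converter factory.
CONVERTERS = {"json": from_json, "tab": from_tab}

def convert(format_name, text):
    try:
        converter = CONVERTERS[format_name]
    except KeyError:
        raise ValueError(f"no converter for format: {format_name}") from None
    return converter(text)

# The same data arriving as two different documents.
rows_a = convert("json", '[{"id": "1", "label": "root"}]')
rows_b = convert("tab", "id\tlabel\n1\troot\n")
print(rows_a == rows_b)  # True
```

The visualization code downstream sees one in-memory shape regardless of which document format delivered the data.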

Thursday, September 4, 2008

Tufte on document-based platforms

This just posted at the end of a new thread on Ask E.T. about Google Chrome:

(Emphasis added by me)

"How's the browser?

Thanks to the many contributors who pointed out Scott's kind link in the example of amazon search.

David Pogue's report in The New York Times suggests this browser leads to an integrated platform with non-proprietary formats (very good) but is still application-based rather than document-based (not very good)."

-- Edward Tufte, September 3, 2008

This is something I think about constantly: document-based platforms can serve end-to-end needs, from data extraction and transformation to portability and visualization. See also the flow on Document Driven Visualization.