Paper Review - Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases

November 22, 2020
Visualization

📖 Link to the Paper - Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
Stolte, Chris, Diane Tang, and Pat Hanrahan. "Polaris: A system for query, analysis, and visualization of multidimensional relational databases." IEEE Transactions on Visualization and Computer Graphics 8, no. 1 (2002): 52-65.

Main Contribution

The paper addresses data visualization for exploring large multidimensional relational databases. With the growth of commercial data warehousing and massive data collections, extracting meaning from data has become an unavoidable and challenging task in many fields, including business intelligence. Because exploratory analysis is unpredictable, a tool that supports it must let users switch visualizations rapidly as they cycle through hypothesis, experimentation, and discovery.

This paper introduces Polaris, an interactive system for exploring and analyzing large multidimensional databases. Polaris offers a unified way to specify visualizations: users build a visual specification through the UI, and the system generates both the graphical display and the relational queries from it, giving users fast and intuitive visual feedback.

Method

In the overview, the authors identify three key characteristics needed to support exploratory analysis of large multidimensional databases: data-dense displays, multiple display types, and an exploratory interface. Polaris addresses these with an interface that incrementally builds table-based displays consisting of rows, columns, and layers, since tables are multivariate, naturally comparative, and familiar to most users.

In Polaris, users drag and drop fields from the schema onto shelves; the resulting configuration of analysis and visualization operations is called a visual specification. The system interprets the state of the interface as this specification, which covers the table configuration, the type of graphic in each pane (table entry), and the visual encodings.
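
To make the "specification to query" idea concrete, here is a minimal sketch of my own, not Polaris's actual implementation. The `VisualSpec` class, the `to_sql` function, and the `sales` table with its fields are all hypothetical names invented for illustration; the real system handles far more (filters, sorting, layers, multiple aggregates).

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class VisualSpec:
    """Hypothetical, heavily simplified stand-in for a visual specification."""
    table: str          # relation to query
    columns: list       # ordinal fields dropped on the column shelf
    rows: list          # ordinal fields dropped on the row shelf
    measure: str        # quantitative field to aggregate
    aggregate: str = "SUM"


def to_sql(spec):
    """Build one GROUP BY query from the shelves.

    Toy code: identifiers are interpolated directly, so this is only safe
    for trusted, hard-coded field names, and it assumes at least one
    ordinal field is on a shelf.
    """
    keys = ", ".join(spec.columns + spec.rows)
    return (
        f"SELECT {keys}, {spec.aggregate}({spec.measure}) AS agg_value "
        f"FROM {spec.table} GROUP BY {keys} ORDER BY {keys}"
    )


# Made-up data, only to show the query running end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, product TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Q1", "Coffee", 10.0), ("Q1", "Coffee", 5.0),
     ("Q1", "Tea", 3.0), ("Q2", "Coffee", 7.0)],
)

spec = VisualSpec(table="sales", columns=["quarter"], rows=["product"],
                  measure="profit")
result = conn.execute(to_sql(spec)).fetchall()
```

Each row of `result` corresponds to one pane of the table, keyed by the ordinal fields on the shelves, with the aggregated measure as the value to draw.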

First, the table configuration is expressed formally in a table algebra; an expression is generated whenever a field is dropped onto a shelf. The x, y, and z axes of the table are specified by the column, row, and layer shelves. A valid expression is an ordered sequence of operands (ordinal and quantitative fields of the database) combined with operators (concatenation, cross, nest). Multiple data sources can be used in a single visualization by mapping them to layers.

Second, the type of graphic in each pane is specified. The system groups graphics into three families by the types of the axis variables and the dependency relationship between them: ordinal-ordinal graphics (e.g. tables), ordinal-quantitative graphics (e.g. bar charts), and quantitative-quantitative graphics (e.g. maps).
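
The three table algebra operators are easier to see on a toy example. The sketch below is my own reading of them, restricted to ordinal fields: concatenation as an ordered union, cross as a Cartesian product, and nest as a cross that keeps only combinations actually present in the data. The records and the helper names (`field`, `concat`, `cross`, `nest`) are assumptions for illustration, not the paper's notation or code.

```python
from itertools import product

# Hypothetical toy records; the field names are illustrative only.
RECORDS = [
    {"quarter": "Q1", "month": "Jan", "product": "Coffee"},
    {"quarter": "Q1", "month": "Feb", "product": "Tea"},
    {"quarter": "Q2", "month": "Apr", "product": "Coffee"},
]


def field(name, records):
    """Ordered domain of an ordinal field, as 1-tuples in first-seen order."""
    seen = []
    for record in records:
        value = (record[name],)
        if value not in seen:
            seen.append(value)
    return seen


def concat(a, b):
    """Concatenation: the entries of a followed by the entries of b."""
    return a + b


def cross(a, b):
    """Cross: every entry of a paired with every entry of b."""
    return [x + y for x, y in product(a, b)]


def nest(a_name, b_name, records):
    """Nest: like cross, but only pairs that occur together in the data."""
    seen = []
    for record in records:
        pair = (record[a_name], record[b_name])
        if pair not in seen:
            seen.append(pair)
    return seen


quarters = field("quarter", RECORDS)
products = field("product", RECORDS)

headers_concat = concat(quarters, products)
headers_cross = cross(quarters, products)
headers_nest = nest("quarter", "month", RECORDS)
```

Here `headers_cross` contains all four quarter/product combinations, while `headers_nest` contains only the three quarter/month pairs that exist, which is why nest suits hierarchies such as months within quarters.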

Finally, the visual (retinal) properties of the marks chosen in the previous step must be specified; the system then generates an effective mapping from the domain of the field to the range of the visual property. Building on this interface, the authors go on to discuss additional features that support a highly interactive process of data transformation and querying.
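
As a rough illustration of a domain-to-range mapping, the sketch below maps an ordinal field to colors and a quantitative field to mark sizes. The function names, the palette, and the size range are my own assumptions; the paper's system chooses its default encodings in its own way, which this does not reproduce.

```python
def ordinal_scale(domain, palette):
    """Map each distinct ordinal value to a palette entry, cycling if needed."""
    return {value: palette[i % len(palette)] for i, value in enumerate(domain)}


def linear_scale(lo, hi, out_lo, out_hi):
    """Return a function mapping [lo, hi] linearly onto [out_lo, out_hi]."""
    span = hi - lo

    def scale(x):
        if span == 0:
            # Degenerate domain: place every value at the middle of the range.
            return (out_lo + out_hi) / 2
        t = (x - lo) / span
        return out_lo + t * (out_hi - out_lo)

    return scale


# Hypothetical fields: product (ordinal) drives color, profit drives size.
color_of = ordinal_scale(["Coffee", "Tea"], ["blue", "orange"])
size_of = linear_scale(0, 100, 2, 12)   # profit 0..100 mapped to sizes 2..12

mark = {"color": color_of["Tea"], "size": size_of(50)}
```

The split mirrors the ordinal versus quantitative distinction used throughout the paper: ordinal values get discrete, distinguishable property values, while quantitative values are interpolated over a continuous range.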

What do I think?

This project banks on automating human tasks (compared with a Pivot Table): it expresses data visually by translating drag-and-drop actions into logic and queries. Some improvements could be made to further automate the generation of visual specifications, and perhaps to extend the formalism to 3D. The paper presents two case studies in which Polaris is used for exploratory data analysis.

I find it convincing that this system has a promising outlook in real-world scenarios, though the paper lacks user and performance evaluations. As the authors point out in the discussion, the exploratory interface was designed with effectiveness valued over efficiency, and a query could take even “several tens of seconds” to run.

However, this trade-off could be better balanced, as datasets keep growing rapidly and expectations for software interactivity keep rising. Nevertheless, this is clearly a research project that led to huge commercial success in data visualization and pushed the industry forward. Looking at the development of Tableau from 2003 to now, it’s quite amazing that a research project turned into a $15.7 billion business, as valued at the time of the Salesforce acquisition in 2019.

On the side: This is a few years old, but still really insightful: People, Data, and Analysis by Pat Hanrahan (an author of Polaris, and later a co-founder of Tableau). It’s not directly related to the paper, but he talks about some relevant big-picture ideas around data visualization. Also, this was from a public lecture series hosted by SFU!