POIViz: a fast interactive method for visualizing a large collection of Open datasets

  1. INTRODUCTION

    • Reason:

      • Large collections of information are generally accessed through standard search engines, without any specific or advanced visual interface. As a consequence, getting an overview of the content of a large collection of ODS (Open Datasets) is difficult for both citizens and data analysts.
      • With text mining techniques, we extracted 800 features to describe a collection of 300,000 ODS from the French governmental open data site (data.gouv.fr). We then built a proximity graph by connecting ODS that are similar in the feature space.
      • Unfortunately, we could not find graph visualization software able to represent this graph (about 300,000 nodes and 700,000 edges) with fast interactions. We could therefore visualize only half of the collection (i.e., a graph with about 150,000 nodes and 340,000 edges).
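
      To make the proximity-graph step concrete, here is a minimal sketch in Python. The notes do not say which similarity measure or neighborhood rule was used, so cosine similarity and a k-nearest-neighbor rule are assumptions, and the names `cosine_similarity` and `knn_proximity_graph` are hypothetical. The brute-force search is quadratic in the number of items, so it only illustrates the idea on toy data and would not scale to 300,000 ODS.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def knn_proximity_graph(features, k):
    """Link every item to its k most similar items (assumed rule, not from the source).

    Returns a set of undirected edges (i, j) with i < j.
    """
    edges = set()
    n = len(features)
    for i in range(n):
        sims = [
            (cosine_similarity(features[i], features[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by the smaller index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy data: four "datasets" described by three made-up features each.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
edges = knn_proximity_graph(features, k=1)
print(sorted(edges))
```

      On this toy input the two pairs of near-identical vectors end up connected, which is the kind of structure (clusters of similar ODS) the graph is meant to expose.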
    • Aim:

      • Reach similar objectives, i.e., provide the user with an overview of a large dataset and details about it, highlighting clusters, outliers and, more generally, the topological properties of the data.
      • Keep interactions fast even for such a large data matrix, i.e., user queries should be processed within a few seconds.