Decoding IUDX: Data Descriptors

Data silos present formidable roadblocks on our cities' journey towards becoming data smart. Data silos are isolated islands of data that make it prohibitively difficult to exchange data for use in other application areas. This prevents the active participation of application developers and solution providers, which is crucial for tackling the complex urban challenges cities face today. IUDX promises to break these innovation-stifling data constructs by enabling seamless exchange of data across silos. It neutralizes the data silo threat using the twin firepower of (a) standardized and open data access APIs and (b) Linked Data (LD) based data contextualization. In this article we look at the data contextualization part. In particular, we see the Data Descriptor, a JSON-LD object, in action, and the role it plays in improving the usability and interoperability of data provided by IUDX.

Data descriptors provide consistent and uniform descriptions of the attributes appearing in data objects. A data descriptor is a linked data object, represented in JSON-LD, that is associated with a given resource and is present in the IUDX catalogue item for that resource. Linked data allows one to attach semantics to data attributes in a machine-interpretable fashion by linking the attributes to data vocabularies, and thus provides a promising way out of isolation for the data in the silos.

Let us understand this with the help of an example. Consider a data resource with the ‘id’ ‘cityx-itms/bus-live-loc’, and let a data packet associated with this resource be:

{
    "route_id": "1345AD",
    "location": {
        "type": "Point",
        "coordinates": [77.5707078, 13.013814]
    },
    "speed": 28.00,
    "observationDateTime": "2020-09-16T13:30:00+05:30"
}

Associated with each resource, a meta-information object is present in the IUDX catalogue. It contains a data descriptor object that helps in understanding the various attributes of the above data. In short, this object provides the context in which the data can be understood and interpreted. For the above example, the resource item present in the catalogue for the resource with id ‘cityx-itms/bus-live-loc’ contains the following data descriptor, which is a JSON-LD object:

{
    "@context": [
        "https://voc.iudx.org.in/",
        {
            "qudt-unit": "http://qudt.org/vocab/unit/",
            "dataSchema": { "@type": "@id" }
        }
    ],
    "type": ["iudx:DataDescriptor", "iudx:TransitManagement"],
    "description": "Data descriptor for live vehicle position of the public buses in city X.",
    "route_id": {
        "type": ["ValueDescriptor"],
        "description": "Route ID assigned to the route for the bus.",
        "dataSchema": "iudx:Text"
    },
    "last_stop_id": {
        "type": ["ValueDescriptor"],
        "description": "Stop ID/Stop name of the previous bus stop corresponding to the bus in this observation.",
        "dataSchema": "iudx:Text"
    },
    "location": {
        "type": ["ValueDescriptor"],
        "description": "The coordinates for the current position of the bus corresponding to this observation.",
        "dataSchema": "iudx:Point"
    },
    "speed": {
        "type": ["ValueDescriptor"],
        "description": "The speed of the bus observed at the last tracked coordinates.",
        "dataSchema": "iudx:Number",
        "unitCode": "KMH",
        "unitText": "kilometre per hour"
    },
    "observationDateTime": {
        "type": ["ValueDescriptor"],
        "description": "The time at which the vehicle was last tracked.",
        "dataSchema": "iudx:DateTime"
    }
}
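
To make the idea of "context" concrete, here is a minimal Python sketch of how an application might pair each value in a data packet with its description and unit from the descriptor. The `annotate` helper and the abridged `descriptor` dictionary are illustrative assumptions, not part of any IUDX SDK; only the attribute names and values come from the example above.

```python
# Abridged copy of the data descriptor above (only the fields used here).
descriptor = {
    "route_id": {
        "description": "Route ID assigned to the route for the bus.",
        "dataSchema": "iudx:Text",
    },
    "speed": {
        "description": "The speed of the bus observed at the last tracked coordinates.",
        "dataSchema": "iudx:Number",
        "unitCode": "KMH",
        "unitText": "kilometre per hour",
    },
    "observationDateTime": {
        "description": "The time at which the vehicle was last tracked.",
        "dataSchema": "iudx:DateTime",
    },
}

# Abridged copy of the data packet above.
packet = {
    "route_id": "1345AD",
    "speed": 28.00,
    "observationDateTime": "2020-09-16T13:30:00+05:30",
}


def annotate(packet, descriptor):
    """Hypothetical helper: pair each packet value with its description and unit."""
    rows = []
    for name, value in packet.items():
        meta = descriptor.get(name, {})  # attributes without a descriptor entry get defaults
        rows.append({
            "attribute": name,
            "value": value,
            "description": meta.get("description", "(not described)"),
            "unit": meta.get("unitText"),  # None when no unit is declared
        })
    return rows


rows = annotate(packet, descriptor)
for row in rows:
    unit = f" {row['unit']}" if row["unit"] else ""
    print(f"{row['attribute']}: {row['value']}{unit} ({row['description']})")
```

A consumer who has never seen this feed before can thus render the speed value together with its unit without consulting any out-of-band documentation.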

We note that the above data descriptor object provides important meta-information about all the data attributes present in the data for the corresponding resource. In particular, it provides:

  •  Syntactic Information: Defines the structure and format of the values associated with each attribute in an unambiguous fashion. Using this information, one can easily write scripts that generate schemas to validate any data packet for this resource.
  •  Semantic Information: Provides the semantic meaning of attributes by linking them to concepts defined in one or more vocabularies. The linking is done through the JSON-LD ‘@context’ directive present in the data descriptor. In the data descriptor above, the attributes are linked to concepts defined in the default IUDX vocabulary and also to external vocabularies such as schema.org (see unitCode and unitText). This allows easy reuse of properties and classes defined in well-established vocabularies, which improves the understanding and usage of data.
  •  Enhanced Meta-information: Provides additional useful information such as units and ranges, which improves the usability of the data. For example, the above data descriptor states that speed is measured in kilometres per hour by providing a UN/CEFACT Common Code, KMH, to represent this unit. Other standard vocabularies for units, such as ‘qudt’, can also be used. One can also specify the ranges, resolutions and measurement accuracies associated with a data value in case such information is known a priori.
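
The schema-generation script mentioned under Syntactic Information could look roughly like the Python sketch below. The mapping from IUDX `dataSchema` values to JSON Schema fragments (`SCHEMA_FOR`) and the function name `descriptor_to_json_schema` are assumptions made for illustration; IUDX may define these types differently, so treat this as a starting point rather than an official mapping.

```python
import json

# Assumed mapping from IUDX dataSchema values to JSON Schema fragments.
SCHEMA_FOR = {
    "iudx:Text": {"type": "string"},
    "iudx:Number": {"type": "number"},
    "iudx:DateTime": {"type": "string", "format": "date-time"},
    "iudx:Point": {
        "type": "object",
        "properties": {
            "type": {"const": "Point"},
            "coordinates": {
                "type": "array",
                "items": {"type": "number"},
                "minItems": 2,
            },
        },
        "required": ["type", "coordinates"],
    },
}

# Descriptor-level keys that do not describe a data attribute.
NON_ATTRIBUTE_KEYS = {"@context", "type", "description"}


def descriptor_to_json_schema(descriptor):
    """Build a JSON Schema from the attribute entries of a data descriptor."""
    properties = {}
    for name, meta in descriptor.items():
        if name in NON_ATTRIBUTE_KEYS or not isinstance(meta, dict):
            continue
        # Raises KeyError for a dataSchema value missing from SCHEMA_FOR.
        fragment = dict(SCHEMA_FOR[meta["dataSchema"]])
        if "description" in meta:
            fragment["description"] = meta["description"]
        properties[name] = fragment
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
    }


# Abridged copy of the data descriptor above.
descriptor = {
    "@context": ["https://voc.iudx.org.in/"],
    "type": ["iudx:DataDescriptor", "iudx:TransitManagement"],
    "description": "Data descriptor for live vehicle position of the public buses in city X.",
    "route_id": {"type": ["ValueDescriptor"], "dataSchema": "iudx:Text"},
    "location": {"type": ["ValueDescriptor"], "dataSchema": "iudx:Point"},
    "speed": {"type": ["ValueDescriptor"], "dataSchema": "iudx:Number"},
    "observationDateTime": {"type": ["ValueDescriptor"], "dataSchema": "iudx:DateTime"},
}

schema = descriptor_to_json_schema(descriptor)
print(json.dumps(schema, indent=2))
```

The resulting dictionary can be handed to any JSON Schema validator, for example the third-party `jsonschema` package, to check incoming data packets for this resource.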

Apart from providing semantic linkages, data descriptors also help achieve interoperability.

JSON-LD features, such as expansion and compaction, enable easy remapping of data attributes from different sources into a common harmonized set that is easy for an application developer to work with. In our example above, if data from another source contains “routeIdentifier” instead of “route_id”, this can be handled by a small modification to the context of the second source's data descriptor. That context should additionally map “routeIdentifier” to “iudx:route_id”, which enables a JSON-LD processor to automatically translate “routeIdentifier” to “route_id”. Such remapping is essential for connecting data silos and improves data interoperability across independent data verticals that were not necessarily designed with that objective in mind.
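
The remapping described above can be illustrated with a deliberately simplified Python sketch. It is not a JSON-LD processor: the `expand` and `compact` functions below only mimic the term-to-IRI and IRI-to-term steps for flat packets, and the IRI base used for the “iudx:” prefix is an assumption. For real use, a conformant library such as PyLD, which offers `jsonld.expand` and `jsonld.compact`, would take their place.

```python
# Assumed expansion of the "iudx:" prefix; check the IUDX vocabulary for the real value.
IUDX = "https://voc.iudx.org.in/"

# Simplified contexts: each maps a local attribute name (term) to a full IRI.
source_a_context = {"route_id": IUDX + "route_id", "speed": IUDX + "speed"}
source_b_context = {"routeIdentifier": IUDX + "route_id", "speed": IUDX + "speed"}


def expand(packet, context):
    """Toy expansion: replace each known term with its IRI.

    Keys with no mapping in the context are dropped, as JSON-LD expansion
    also does for unmapped terms.
    """
    return {context[key]: value for key, value in packet.items() if key in context}


def compact(expanded, context):
    """Toy compaction: replace each IRI with the term the target context uses."""
    iri_to_term = {iri: term for term, iri in context.items()}
    return {iri_to_term.get(iri, iri): value for iri, value in expanded.items()}


# A packet from the second source, which uses "routeIdentifier".
packet_b = {"routeIdentifier": "1345AD", "speed": 28.0}

# Expand with the second source's context, then compact with the first's.
harmonized = compact(expand(packet_b, source_b_context), source_a_context)
print(harmonized)  # {'route_id': '1345AD', 'speed': 28.0}
```

Because both sources agree on the IRI behind the attribute, the application only ever needs to handle the harmonized name, regardless of which silo the packet came from.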

The reader is encouraged to paste the above example data descriptor object into the JSON-LD Playground and experiment with various JSON-LD tools and applications. The reader can also explore data descriptors for resources listed in the IUDX catalogue (using the IUDX Catalogue APIs), such as the data descriptor for the Pune AQM resource, the data descriptor for the Surat Transit Bus Position resource, etc.

Thus, we have seen how the Data Descriptor helps in making data more understandable, usable and interoperable. Harmonizing data models by adopting common vocabularies improves data usability further. In an upcoming article we will look into Smart Data Models, one such collaborative initiative to develop and adopt common data models.