Data Marketplace : A system design perspective

11 December Blog

IUDX
A data exchange platform such as IUDX or ADeX curates datasets from various vendors based on how reliable, relevant and accurate the datasets are. A byproduct of this process is that the data hosted on the platform ends up being quite valuable. One way to realise this value is to build a marketplace, which also encourages vendors to produce good-quality data. Let’s look at the technical aspects of building such a marketplace.

When designing a marketplace, the primary sources of inspiration are the system architectures of Amazon.com or Flipkart, as they fit the definition of a marketplace. There are, however, a few key differences between marketplaces that host physical products and those that host data, or a collection of data (dataset henceforth), as products. One major difference is that purchasing a physical product means having complete ownership of that product, i.e., buying a complete unit, whereas purchasing data as a ‘product’ can have different meanings. For instance, access to data can be temporal, with purchasing units in days or months. Another way to quantify access is in bytes of data consumed. A second key difference is that sharing purchased data may violate the purchasing policy, unlike a physical product, which can be shared or even resold.

But what is a marketplace after all? “A marketplace or market place is a location where people regularly gather for the purchase and sale of provisions, livestock, and other goods.” – Wikipedia

A marketplace is a catalogue of products hosted by various vendors (providers henceforth) to be discovered by buyers (consumers henceforth). A user should be able to create an inventory of products, search for products on the storefront and make a payment to purchase a product. Consumers should also be able to view their past purchases; providers should be able to define their products and the variants available for them, and to identify who bought their products.

Note: In the case of a data marketplace, the product could be a dataset or a combination of datasets bundled by the provider. Additionally, the purchase of a product implies that the consumer is getting consent from the provider to access the dataset(s) of the product. The consumer is NOT getting ownership of the data.

For example, let us assume that the bus transit data of a city is provided by its municipal corporation. A consumer, such as a company that aims to provide analytics on this data, would purchase the right to access it from the marketplace. This would allow them to obtain a token from the data exchange’s authorization server, which is ultimately used to fetch the intended data. The consumer is expected to perform the intended operations on this data and is bound not to share the raw data with anyone. However, enforcing this binding is outside the scope of the marketplace or the data exchange platform. Organisations providing the data will be expected to implement protocols that ensure right of ownership is maintained (such as digital rights management, i.e. DRM, for video streams).

Now that the requirements are clear, we can start designing the system. To keep things simple, let’s divide the flows into a consumer view and a provider view.

A provider should be able to create a product, delete a product they own and list the purchases made against their products. A product can also have multiple variants (e.g. a shirt listed on Amazon can come in various colours and sizes), so the provider should be able to create, update and delete variants within each product they host.

Things to note here:

Only the provider of the datasets should be allowed to host products built on them
A dataset can be part of different products
A product variant would need to be mapped to an existing product
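
The three rules above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the actual IUDX implementation: the names Catalogue, Product, Variant and the dataset_owner mapping are assumptions made for the example, and a real service would keep this state in the relational store.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Variant:
    name: str  # hypothetical label, e.g. "10-day"


@dataclass
class Product:
    product_id: str
    provider_id: str
    dataset_ids: Set[str]
    variants: Dict[str, Variant] = field(default_factory=dict)


class Catalogue:
    def __init__(self, dataset_owner: Dict[str, str]) -> None:
        # dataset_id -> provider_id; assumed to come from the data exchange
        self.dataset_owner = dataset_owner
        self.products: Dict[str, Product] = {}

    def create_product(self, product_id: str, provider_id: str,
                       dataset_ids: Iterable[str]) -> Product:
        ids = set(dataset_ids)
        # Rule 1: only the provider of every dataset may host the product.
        for ds in ids:
            if self.dataset_owner.get(ds) != provider_id:
                raise PermissionError(f"{provider_id} does not provide {ds}")
        # Rule 2: nothing stops a dataset from appearing in other products.
        product = Product(product_id, provider_id, ids)
        self.products[product_id] = product
        return product

    def add_variant(self, product_id: str, provider_id: str,
                    variant: Variant) -> None:
        # Rule 3: a variant must map to an existing product.
        if product_id not in self.products:
            raise KeyError(f"unknown product {product_id}")
        product = self.products[product_id]
        if product.provider_id != provider_id:
            raise PermissionError("only the hosting provider can add variants")
        product.variants[variant.name] = variant
```

Keeping the ownership check inside create_product means no caller can create a product that bypasses it.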

To put things into perspective, assume that a city municipality has various air quality sensor datasets from different suburbs. It can host these datasets as a product, with many variants based on the duration of access, the amount of data accessed, the type of data access and other parameters. One variant could allow access for 10 days, with a 10 GB limit and at most 100 API calls. Another could allow 100 GB of access, but only to archive files.

The same city municipality can host another product with air quality sensor datasets and traffic datasets combined, with its own variants.
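
To make the variant limits concrete, here is a minimal sketch of how a purchased variant could be checked on each data request. The class names VariantLimits and Entitlement are hypothetical, and the sketch assumes 1 GB means 10**9 bytes and that usage counters are tracked per purchase; the post itself does not specify either.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class VariantLimits:
    # None means "no limit on this dimension"
    duration_days: Optional[int] = None
    max_bytes: Optional[int] = None
    max_api_calls: Optional[int] = None


@dataclass
class Entitlement:
    limits: VariantLimits
    purchased_at: datetime
    bytes_used: int = 0
    calls_used: int = 0

    def allows(self, now: datetime, request_bytes: int) -> bool:
        """Return True if one more request of this size fits the variant."""
        lim = self.limits
        if lim.duration_days is not None:
            expiry = self.purchased_at + timedelta(days=lim.duration_days)
            if now >= expiry:
                return False
        if lim.max_api_calls is not None and self.calls_used + 1 > lim.max_api_calls:
            return False
        if lim.max_bytes is not None and self.bytes_used + request_bytes > lim.max_bytes:
            return False
        return True

    def record(self, request_bytes: int) -> None:
        """Account for a request that was served."""
        self.calls_used += 1
        self.bytes_used += request_bytes


# The 10-day, 10 GB, 100-call variant from the example above.
ten_day = VariantLimits(duration_days=10, max_bytes=10 * 10**9, max_api_calls=100)
```

Any limit left as None is simply not enforced, so a variant restricted only by volume needs no special handling.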

A consumer should be able to discover products by searching for a product name, a dataset name or a provider name. As an added UX feature, the marketplace should list popular products and datasets on the consumer landing page. Ultimately, the consumer should be able to add products to a cart and purchase the items in the cart. Refunds are another thing to consider, but they are out of the scope of this discussion for now.

With the design in place, we can start to think of data stores, communication channels, payment gateways and so on. Firstly, we can use a relational data store such as PostgreSQL (PSQL henceforth) or MySQL to store the product and dataset information, as the structure of these objects is going to be consistent. We can use the same data store for the purchase information and payment history, as the ACID properties of a relational DB come in handy there. To serve the popular datasets/products list, a cache layer such as Redis can be introduced.
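
As a rough illustration of the popular-products store, the sketch below keeps a purchase counter per product and returns the top entries. It is an in-process stand-in built on Python's Counter; the class name PopularityBoard is made up for this example. In Redis the same idea would typically map onto a sorted set, with ZINCRBY to bump a score and ZREVRANGE to read the leaders.

```python
from collections import Counter
from typing import List


class PopularityBoard:
    """In-memory stand-in for a cache of popular products."""

    def __init__(self) -> None:
        self._scores: Counter = Counter()

    def record_purchase(self, product_id: str, weight: int = 1) -> None:
        # Each purchase bumps the product's score.
        self._scores[product_id] += weight

    def top(self, n: int) -> List[str]:
        # Highest-scoring products first.
        return [pid for pid, _ in self._scores.most_common(n)]
```

Because the board is derived data, it can be rebuilt from the purchase history in the relational store if the cache is lost.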

Note that the purchase flow must be robust, and the system must be decoupled from any failure on the payment gateway. A cart and order service that interacts with the gateway can be introduced. When an order is placed and a payment is initiated, there can be two scenarios.

One, the payment succeeds and the order completes. This should trigger the creation of an invoice and the clearing of the cart.
Two, the payment fails and the order stays pending. This may trigger a retry on the gateway, or create an invoice indicating that the purchase failed, after which the payment can be retried manually.
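
The two scenarios can be sketched as a small order flow. This is a simplified, hedged example: the charge callable stands in for a real payment gateway client, and the names place_order, Order, Invoice and the max_attempts retry limit are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class OrderStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"


@dataclass
class Order:
    order_id: str
    items: List[str]
    status: OrderStatus = OrderStatus.PENDING
    attempts: int = 0


@dataclass
class Invoice:
    order_id: str
    paid: bool  # False marks a failed purchase that can be retried manually


def place_order(order_id: str, cart: List[str],
                charge: Callable[[Order], bool],
                max_attempts: int = 3) -> Tuple[Order, Invoice]:
    order = Order(order_id, list(cart))
    while order.attempts < max_attempts:
        order.attempts += 1
        try:
            paid = charge(order)
        except Exception:
            # A gateway outage is treated as a failed attempt, so it
            # cannot crash the order service.
            paid = False
        if paid:
            # Scenario one: complete the order, invoice it, clear the cart.
            order.status = OrderStatus.COMPLETED
            cart.clear()
            return order, Invoice(order_id, paid=True)
    # Scenario two: the order stays pending and the cart is left intact.
    return order, Invoice(order_id, paid=False)
```

In the failure case the cart is deliberately left untouched, so a manual retry can start from the same items.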

In both situations the state of a user session has to be stored.

We also need to set up an admin flow to verify the purchase of a product (equivalent to the consumer receiving consent to access the datasets in the product, constrained by the variant specifications). Note that the admin here is the DX platform’s authorization server. The verification will be used to issue tokens when data access is requested. A policy is written for the consumer against every dataset purchased, and it acts as the source of truth for issuing tokens. This can be achieved by writing each successful purchase to a message outbox that is published through a broker such as RabbitMQ (RMQ henceforth), to which the authorization server can subscribe.
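
One common way to realise this is the transactional outbox pattern, sketched below under several assumptions: sqlite3 stands in for PSQL, a plain callback stands in for publishing to RMQ, and the table names, the AuthServer class and its methods are hypothetical. The point of the sketch is that the purchase row and its outbox message are committed together, so a purchase cannot exist without a message for the authorization server.

```python
import json
import sqlite3
from typing import Callable, List, Set, Tuple


def init_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, consumer_id TEXT,
                               product_id TEXT, variant TEXT);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
                             published INTEGER NOT NULL DEFAULT 0);
    """)
    return conn


def record_purchase(conn: sqlite3.Connection, consumer_id: str,
                    product_id: str, variant: str,
                    dataset_ids: List[str]) -> None:
    payload = json.dumps({"consumer_id": consumer_id, "variant": variant,
                          "dataset_ids": dataset_ids})
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute("INSERT INTO purchase (consumer_id, product_id, variant) "
                     "VALUES (?, ?, ?)", (consumer_id, product_id, variant))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_outbox(conn: sqlite3.Connection,
                 publish: Callable[[str], None]) -> int:
    """Publish unpublished messages in order; returns how many were sent."""
    rows = conn.execute("SELECT id, payload FROM outbox "
                        "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(payload)  # in production this would publish to the broker
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)


class AuthServer:
    """Subscriber that turns purchase events into access policies."""

    def __init__(self) -> None:
        self.policies: Set[Tuple[str, str]] = set()

    def on_message(self, payload: str) -> None:
        event = json.loads(payload)
        for ds in event["dataset_ids"]:
            # Adding to a set is idempotent, so redelivery is harmless.
            self.policies.add((event["consumer_id"], ds))

    def may_issue_token(self, consumer_id: str, dataset_id: str) -> bool:
        return (consumer_id, dataset_id) in self.policies
```

If the relay crashes after publishing but before marking a row as published, the message is sent again on the next run, which is why the subscriber is written to tolerate duplicates.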

Fig. 2 shows the complete architecture with all the different players interacting with it. For simplicity, the product variant service has been folded into the product service itself.

With a basic design in place, a few open questions crop up. Will the marketplace platform be an aggregator of the funds flowing through the system (like Amazon.in), or will payments be made offsite and go directly to the vendor’s account (like Shopify)? Will the marketplace be able to handle multiple invoices on a single order? Will it be able to provide digital rights on the data hosted? How will the system scale when the number of users increases significantly?

No software system is designed to be completely optimised and highly scalable from the beginning. Scaling is an incremental process: the system should scale only when required, as the cost of over-optimisation would be far too high for the footfall it actually serves. The design should, however, have contingencies in place for when that growth happens.

Author:
Pranav Doshetty
Senior Software Engineer

