Saturday, December 10, 2011

Are You The 'Knight Rider' of Your Data?

This is a re-post of a co-authored blog post available at Infosys' Manufacturing Talk: http://goo.gl/wZDP8

Post by
Partha Pratim Dutta, Senior Technology Architect, Manufacturing, Infosys Limited
Ashutosh Agrawal, Senior Consultant, Manufacturing, Infosys Limited


Do you remember the American TV series "Knight Rider" or the Hindi movie "Taarzan: The Wonder Car"? KITT (the artificially intelligent Pontiac Trans Am in "Knight Rider") and Taarzan were both advanced, artificially intelligent and nearly indestructible cars. I always dreamt of having one of these.

What was a dream then is a reality today. More and more cars now ship with hundreds of on-board sensors, capable of measuring everything from tyre pressure to the driver's condition. They can alert us in an emergency and, in certain cases, take the necessary evasive action as well. We call them "Connected Vehicles". Using the on-board communication network, they can connect to a central server to upload their data or upgrade themselves with a newer processing algorithm. This data can be mashed up with other contextual and historical data (just as KITT did), such as infrastructure data (e.g. road conditions), local climate data and historical patterns, and can help us identify things that the naked eye can never detect.
[Figure: BigData_Services.jpg]
Insights like these present a whole new set of opportunities to value chain providers. For example, some car insurance companies monitor a driver's driving style and most frequently traveled routes, and customize insurance plans and premiums accordingly. Moreover, with the "connected vehicle" concept gaining ground, "vehicle to vehicle connectivity" and "vehicle to infrastructure connectivity" scenarios are bringing more safety and discipline to driving. In Japan, for example, cameras and sensors at many toll junctions monitor the traffic pile-up at each toll gate and guide incoming vehicles to the lane with the shortest queue.

Is This an Invasion of Your Privacy?
But unlike Michael Knight (KITT's owner), we as owners of these smart cars do not own or control the data that we generate. All these services use the data we generate and give the relevant portions of it back to us as services, at a price. You may be comfortable with these services, but how comfortable are you with compromising your privacy for them, and how far will you go? How comfortable would you be making your data public, such as the details of your last accident? Or letting people know exactly where you are at any point in time?
Every customer views privacy differently. Currently, there is no control or policy in the auto world that allows a customer to control this data. The customer doesn't know who has access to this data or how it is being processed. There may be situations where you have no option but to share your data, such as when your insurance company needs it to validate a claim. But in other situations, such as your movements being tracked, you should be able to choose who can see you and for how long. At the same time, since it is your data, you should get the desired service and benefit if you decide to share it. For example, if a customer lets the car company (or its competitors) collect this data to find out how a newly launched car is performing, the customer may ask for a special discount when buying the car.

Solution: Control your Data
The solution may lie in layering a "User Access Control" module on top of a Big Data analysis platform. The solution should be able to process a huge amount of data in a minimal amount of time, identify patterns, and give the user absolute control over the data and findings. The user decides what to share (and what not to) and makes sure to get the desired service and benefit when this data is shared. The solution needs to address a few key design dimensions:


1.  Aggregating data from a variety of sources, most of them unstructured
    o   Probable technology choices: EAI and EII techniques, search crawlers
2.  Storing and managing the unstructured data set along with structured data
    o   Probable technology choices: column-oriented datastores like Apache HBase or Apache Cassandra
3.  Using a parallel/distributed framework to process the huge volume of data
    o   Probable technology choices: a distributed file system like HDFS and a framework like Apache Hadoop
4.  Using clustering, classification and collaborative filtering techniques to extract the relevant portions of data
    o   Probable technology choices: machine learning libraries like Apache Mahout
5.  Allowing access to the right set of data to the right people for the right amount of time
    o   Probable technology choices: an access control framework like User Managed Access (UMA)
6.  An elastic infrastructure for storing and processing the huge volume of data (e.g. an IaaS platform)
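To make item 5 a little more concrete, here is a minimal Python sketch of owner-controlled, time-bound sharing. It is not UMA itself, nor any real library's API: the names `Grant` and `AccessPolicy`, the data categories and the requester labels are all hypothetical, chosen only to illustrate "the right data to the right people for the right amount of time".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    """One owner-issued permission (hypothetical model, not part of UMA)."""
    requester: str        # who may read, e.g. "insurer"
    category: str         # which data, e.g. "accident_report"
    expires_at: datetime  # the grant is invalid from this moment on


class AccessPolicy:
    """Holds the grants a vehicle owner has issued; anything else is denied."""

    def __init__(self):
        self._grants = []

    def share(self, requester, category, valid_for, now):
        """Let `requester` read `category` for the duration `valid_for`."""
        grant = Grant(requester, category, now + valid_for)
        self._grants.append(grant)
        return grant

    def revoke(self, requester, category):
        """Withdraw every grant for this requester/category pair."""
        self._grants = [
            g for g in self._grants
            if (g.requester, g.category) != (requester, category)
        ]

    def is_allowed(self, requester, category, now):
        """True only if a matching grant exists and has not yet expired."""
        return any(
            g.requester == requester
            and g.category == category
            and now < g.expires_at
            for g in self._grants
        )


# Example: the owner shares location with a friend for two hours only.
policy = AccessPolicy()
start = datetime(2011, 12, 10, 9, 0)
policy.share("friend", "location", timedelta(hours=2), now=start)
friend_can_see_now = policy.is_allowed("friend", "location", now=start)
insurer_can_see_now = policy.is_allowed("insurer", "location", now=start)
```

The default here is denial: a requester sees nothing unless the owner has explicitly shared that category, and every grant lapses on its own. A real platform would also need authentication, auditing and persistent storage, which this sketch leaves out.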

[Figure: BiGData_Platform.jpg]
The concept and some of the technology standards and frameworks (like UMA) are still in their infancy. But there are instances where applications use a similar setup to give control back to the user. Facebook is a good example, and the "Locker Project" is another initiative to help you manage your "Digital Exhaust" or "Digital Footprint". This kind of platform cannot be owned or controlled by a single company or organization. Rather, the need is for a consortium in which several such organizations come together and build an ecosystem, along the lines of CIBIL. Compared to the maturity of social data and its usage, this scenario may still take some time to reach a critical stage, but if we can put controls in place before it does, adoption of the connected vehicle concept will be much more widespread.