The current data ecosystem is built around the companies that collect and sell data, and users have neither a voice nor the right to control their data or demand transparent management. With the PIMS Development Kit (PDK), we simplify experimentation with new user-centric marketplaces and foster a new data economy where users are at the center and have full control over their data. The PDK includes tools for managing consent and personal data, and for creating marketplaces. Using the PDK building blocks, new companies and bodies can quickly enter the PIMS market and implement various use cases to deliver transparent business or social value.
We are now working on developing fully-fledged solutions on top of the PDK. We are developing two pilots to demonstrate the operation of the PDK. Our initiatives aim at disseminating our work to end users and enterprises willing to participate in the platform.

The first pilot will be the main demonstrator of the PIMCity project. We will bootstrap the EasyPIMS platform, feeding it with the data already available at the operators participating in the project (Telefonica and FastWeb). Moreover, the platform will be demonstrated and tested with users in a production-ready environment. The main goal of this pilot is to demonstrate the potential of EasyPIMS by testing it with a critical mass of users, thereby bootstrapping the personal data platform for its commercial use after the project. The final outcome of this pilot will be a clear business and exploitation plan for the EasyPIMS platform, as well as the outreach and engagement strategy to be followed after the project to keep growing the mass of users, using the operators as the catalyst for the success of the project.

The second pilot is devoted to checking the usability and versatility of the PDK and EasyPIMS components in the B2B operation of ERMES. To this end, FastWeb and ERMES will explore their customer portfolio to identify about 50 companies interested in products that combine the preservation of security and privacy with training and increased user awareness. The main goal of this pilot is to demonstrate the value of some components in the B2B segment. To this end, ERMES will develop solutions for companies that help them protect their data while increasing their awareness with augmented information about web services.
In conclusion, we believe our PDK offers the first fully-fledged tool set for building user-centric data platforms. Our challenge now is to disseminate our solution and our perspective on the problem, which we are promoting through our demonstration initiatives.


The PDK modules can be used individually or, better, in cooperation. They can be combined in multiple ways to implement different PIMS architectures or data processing pipelines in general. Here we discuss two possibilities that we consider to be the most common use cases for the PDK.

A Fully-Fledged PIMS
With a combination of our PDK modules, it is possible to start a new data business and build a new PIMS prototype without having to develop all the components from scratch. With the PDK, one can easily implement a fully-fledged PIMS that allows (i) users to control, store, and monetize their data, and (ii) stakeholders to buy personal information in a transparent way. In Figure 2, we show how the modules work together. Each user can store their data in the P-DS. This allows them to have structured, well-organized information about the data they provide to the system. With the help of the DPC, they can import/export their data from/to another PIMS company. Through the P-CM, the user can specify what types of data they are willing to share, to what class of data buyers, and in what form (raw, aggregated). The DP module can watermark the datasets before they are sold through the D-TE to keep the ownership of the data verifiable in a healthy data economy model. In this way, users have a single interface to manage their data in a clear and consistent framework. When a data buyer is interested in the users' data, the D-TE handles the request, calculates the data value with the D-VT, collects users' consent through the P-CM, and offers the user fair compensation. With this in place, any user can consult the easy-to-understand P-PM to learn about the purpose of the data purchase. This transforms the user from a passive actor in the data exchange ecosystem into a main actor with full control over their data and its use in the open marketplace.
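To make the flow above concrete, the sketch below shows how a PIMS built on the PDK could wire these modules together when a purchase request arrives. The class and method names are purely illustrative stand-ins and do not correspond to the actual PDK APIs.

```python
# Illustrative sketch only: the module interfaces below are hypothetical
# stand-ins for the PDK components described above, not the real PDK API.

class FullyFledgedPIMS:
    def __init__(self, pds, pcm, dvt, dp, dte, ppm):
        self.pds = pds    # Personal Data Safe: stores user data
        self.pcm = pcm    # Personal Consent Manager: records sharing preferences
        self.dvt = dvt    # Data Valuation Tools: estimates data value
        self.dp = dp      # Data Provenance: watermarks datasets
        self.dte = dte    # Data Trading Engine: handles buyer requests
        self.ppm = ppm    # Personal Privacy Metrics: explains purposes to users

    def handle_purchase_request(self, buyer, user_id, data_type, purpose):
        # 1. Check that the user consented to share this data type with this buyer class.
        if not self.pcm.has_consent(user_id, data_type, buyer.category):
            return None
        # 2. Fetch the requested data from the user's Personal Data Safe.
        dataset = self.pds.read(user_id, data_type)
        # 3. Estimate a fair compensation for the user.
        price = self.dvt.estimate_value(dataset, buyer.category)
        # 4. Watermark the dataset so its ownership stays verifiable after the sale.
        marked = self.dp.watermark(dataset, owner=user_id)
        # 5. Record the transaction and make the purpose visible to the user.
        self.ppm.record_purpose(user_id, buyer, purpose)
        return self.dte.trade(marked, buyer, price)
```

In a real deployment, each of these steps would be a call to the corresponding component's web API rather than an in-process method.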

A PIMS for Societal Benefit
We propose a second use case that can be covered by a different combination of the PDK modules. In this case, illustrated in Figure 3, a PIMS is deployed on the premises of a company that holds personal data as a consequence of its business. This is the case, for example, with telecommunications companies that have access to mobile customer location data, or online stores that have customer purchase history. It is common for companies to collect these data for technical or marketing reasons. However, there is a lack of standard means and tools to share them with third parties in a controlled and transparent way after customer consent. The use case outlined in Figure 3 encourages users to share their personal data in exchange for a reward from the company, which can be statically determined (e.g., a discount on the monthly subscription) or dynamically defined using the D-VT (not shown in the figure). In this scenario, customers can opt in using the P-CM, giving the company the right to share their data with third parties. Upon consent, the P-DS stores users' data, using the DA module to perform a bulk transfer from the company's systems. The P-PPA allows third parties to perform privacy-preserving queries that aggregate data from multiple customers and obtain an anonymized version of a portion of the dataset. The identity and personal information of individual customers are thus protected. Finally, the DKE can enrich the raw data by creating user profiles. Interested stakeholders, such as companies or research bodies, can access the system to collect anonymized data and perform their own analytics. The provisioning can be done in an offline mode (i.e., the data analysis is performed once and published) or in an online mode (i.e., the party interested in the data requests a specific data analysis).
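To make the P-PPA step above more concrete, here is a minimal sketch of a privacy-preserving aggregate query that releases a per-group average only when the group contains at least k customers. It is a didactic example, not the actual P-PPA code, and the record fields are invented.

```python
from collections import defaultdict

def aggregate_with_threshold(records, group_key, value_key, k=10):
    """Return per-group averages, suppressing groups with fewer than k customers.

    Simplified illustration of a privacy-preserving aggregate query: only
    aggregates computed over at least k individuals are released.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= k   # suppress small, potentially identifying groups
    }

# Example: average monthly data usage per city, released only for cities
# with at least 10 customers (the records themselves are made up).
records = [{"city": "Turin", "usage_gb": 12.5}, {"city": "Madrid", "usage_gb": 8.0}]
print(aggregate_with_threshold(records, "city", "usage_gb", k=10))  # {} -> both groups suppressed
```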

In the PDK, we design and develop basic and generic components that offer fundamental functionalities for PIMS. We release them as SDKs with the goal of streamlining PIMS development and integration. We describe them in Figure 1 and discuss them in the following sections.

Tools to Improve Users' Privacy
These PDK modules aim to improve user privacy from various points of view. They are designed to provide users with a simple and intuitive interface and enable transparent data management. Users can use the Personal Data Safe (P-DS) to securely store their personal data and, if they choose, allow data buyers to access it through the Personal Consent Manager (P-CM). Details about data buyers can be found in the Personal Privacy Metrics (P-PM) dashboard, along with details about the purpose of data buying campaigns. Finally, Personal Privacy-Preserving Analytics (P-PPA) provide data buyers access to aggregated and anonymized data by implementing anonymization via well-known approaches such as k-anonymity [10], differential privacy [11], or z-anonymity for streams [12].
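As a toy illustration of one of these anonymization approaches, the sketch below answers a count query under differential privacy [11] by adding Laplace noise to the true count. The example data and epsilon value are made up, and this is not the P-PPA implementation.

```python
import random

def dp_count(values, predicate, epsilon=1.0):
    """Differentially private count: true count plus Laplace noise.

    For a count query the sensitivity is 1, so noise drawn from
    Laplace(0, 1/epsilon) yields epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Example: how many users visited a sensitive category of websites,
# released with noise so no single user's presence can be inferred.
visits_per_user = [3, 0, 1, 7, 0, 2]
print(dp_count(visits_per_user, lambda v: v > 0, epsilon=0.5))
```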

Tools for a User-Centric Data Market
Currently, users are not part of the data market. Rather, they are external actors who merely provide the assets but have no influence or decision power. In this scenario, the value of end users is determined solely by the market, i.e., the price that data buyers are willing to pay for a given end user's data. However, in the human-centric data economy envisioned by the PDK, this one-sided vision is no longer valid. Instead, end users must have control over their data and have the last word on what they are willing to share and with which third parties. Hence, we arrive at a new scenario with two sides: the market and the users. To this end, we offer the Data Valuation Tools (D-VT), which derive the value of end-user data from the two perspectives mentioned above, the market and the end user, i.e., how much the data is worth to the buyer and to the end user, respectively. We also provide a Data Trading Engine (D-TE) that can be integrated as part of the PIMS infrastructure to trade end-user data within the ecosystem. It enables users to offer their data in a marketplace and data buyers to search for data and make offers that users can accept. Each transaction is recorded using blockchain technology, with eventual payment in the form of (virtual) coins.
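As a rough illustration of the transaction-recording idea mentioned above (and not the D-TE implementation), the sketch below appends trades to a hash-chained log so that tampering with any past record breaks the chain. The field names and values are invented.

```python
import hashlib, json, time

def append_trade(ledger, seller, buyer, dataset_id, price):
    """Append a trade record whose hash covers the previous record (hash chain)."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    record = {
        "seller": seller, "buyer": buyer, "dataset_id": dataset_id,
        "price": price, "timestamp": time.time(), "prev_hash": prev_hash,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(record)
    return record

def verify_chain(ledger):
    """Recompute every hash and check the links; returns False if any record was altered."""
    prev_hash = "0" * 64
    for record in ledger:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

ledger = []
append_trade(ledger, seller="user-42", buyer="acme-ads", dataset_id="browsing-2021-03", price=1.20)
print(verify_chain(ledger))  # True; altering any field above would make this False
```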

Tools for Data Management
Due to the variety of devices and data sources available, it is challenging to import, process and aggregate data in a standardised, scalable and privacy-preserving manner. To this end, we offer the Data Aggregation (DA) tools to bulk insert personal data into a PIMS and the Data Portability Control (DPC) to allow users to seamlessly migrate to a new PIMS, supporting direct import from Facebook or Google, for example. The Data Provenance tool (DP) allows hard-to-remove watermarks to be inserted into datasets, including textual data, to prove their ownership later. Finally, the Data Knowledge Extraction (DKE) engine builds privacy-preserving models from data, supporting, for example, the creation of user profiles that contain users' interests as extracted from their browsing history.

PIMS aim to give users back control over their data while creating transparency in the market. However, so far they have failed to gain business maturity and reach a large user base. PIMCity makes the PIMS idea feasible, scalable and flexible. To achieve this ambitious goal, we have carefully developed a bottom-up methodology that involves all stakeholders at all stages, from design to development to large-scale demonstration and going to market. We strongly believe that an open market for data will only thrive if we stop the arms race between users and services. For this reason, we have involved advertisers and end users throughout the process.
As a first tangible result, we offer the PIMS Development Kit (PDK) to commoditize the complexity of creating PIMS. This lowers the barriers for companies and SMEs to enter the web data market. The main challenges in designing and developing the PDK can be summarized as follows.

User-centric model.
The implementation of a user-centric data ecosystem is the biggest challenge of the PDK. PIMS users and data sellers do not know what a reasonable price for their data is. Sellers are usually in charge of setting a price for the data they share. A user-centric data economy requires that individuals be compensated by companies for their data in proportion to the benefits that such data produce for the overall economy. To this end, the PDK offers a data valuation framework backed by state-of-the-art research in the field.

PIMCity is aligned from its inception with the approach promoted by the MyData movement [8], which seeks to change the paradigm of personal data management and processing, moving from a model focused on companies that collect data (with little transparency and very little control) to a transparent system centered on people. Defining a new human-centric data economy delivers high-quality data for businesses while respecting the privacy of users [8].

Interoperability.
The PIMCity architecture allows users to integrate new data sources and connect them to new services. This is a fundamental property for building trust in any PIMS. Interoperability is the biggest advantage offered by the PDK and, at the same time, its biggest challenge, because it requires a process of standardization of consent mechanisms, formats and semantics. All PDK components provide web APIs, which we document using the OpenAPI specification to enable seamless integration. This enables communication and interaction between them and facilitates integration with existing PIMS as well as the design and development of new ones.
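Since every component is exposed through OpenAPI-documented web endpoints, integrating a PDK module into an existing PIMS boils down to issuing HTTP calls. The snippet below sketches such a call against a hypothetical consent-checking endpoint; the base URL, path and payload are invented for illustration and are not the documented PDK API.

```python
import json
import urllib.request

# Hypothetical endpoint and payload, used only to illustrate how an
# OpenAPI-documented PDK component could be called from another service.
PDK_BASE_URL = "https://pims.example.org/api/v1"

def check_consent(user_id: str, data_type: str, buyer_category: str) -> bool:
    payload = json.dumps({
        "user_id": user_id,
        "data_type": data_type,
        "buyer_category": buyer_category,
    }).encode()
    request = urllib.request.Request(
        f"{PDK_BASE_URL}/consent/check",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("granted", False)
```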

Open-Source Software.
We strongly believe in open-source software as a means to achieve transparency and users' trust. Although maintaining a (large) open-source project is a great challenge in terms of code maintenance and support, it allows us to collect feedback, bug reports and feature requests and, ultimately, to measure the success of the PDK. The PDK is open source and available online in the GitLab project of PIMCity [9]. We encourage its use and invite the community to test and support the project. We use the GitLab collaboration features as a forum for issue tracking, discussing bugs, requesting new features, and providing user support.

The first countermeasures against the collection of user data were solutions to block online advertisements and trackers, usually implemented via browser plugins. AdBlock Plus and Ghostery are notable examples that have become popular in recent years and count millions of users. They block ads and offer the ability to limit common tracking mechanisms and many privacy-invasive practices, such as browser fingerprinting. In response, services have attempted to circumvent blocking with a variety of more sophisticated tracking techniques. This has led to a continuing arms race that is detrimental to the positive potential of data-driven decision making and the Internet economy in general.

Recently, several technological solutions and business models have emerged to balance the above tensions, based on proposals and opinions maturing in the European policy scene and its instruments, such as the European Data Protection Supervisor (EDPS). Similarly, the concept of European Data Spaces was recently introduced by the European Commission to allow citizens to share their data; although its business model is still in its early days, it is potentially relevant for citizen data valuation and reward.

In this picture, Personal Information Management Systems (PIMS), also called personal data banks or personal data vaults, appear to be a promising alternative to the uncontrolled collection, processing and use of people's data, including personal and sensitive information. At a high level, a PIMS can be thought of as a software interoperability layer between end users and data services, responsible for ensuring that data is passed from the former to the latter in a controlled manner.
PIMS aim to empower individuals to take control of their personal data. For that purpose, they include capabilities such as: letting users collect their personal information from internet service providers; exercising their erasure and modification rights, as granted by data protection laws (GDPR, CCPA); helping users manage cookie settings and privacy permissions on their devices; providing fine-grained consent management for sharing personal data with services; allowing such permissions to be revoked; and letting users monetize their data by negotiating their consent and receiving payments for sharing their personal data. Currently, PIMS from academia and industry are attempting to rewrite the rules of the information economy on the Internet with various business models, technological solutions and marketing strategies.

Among the wide ecosystem of data platforms, we identified 19 systems that deal with personal data, and hence can be classified as PIMS, in a recent survey of entities trading data on the Internet. We summarize them in Table XX.

Despite the impressive number of such attempts, none of them has yet reached business or technological maturity nor managed to attract a sizable user base.
Our goal is to bridge this gap by offering a set of open-source building blocks to unlock the potential of data-driven decision making. As part of the EU-funded PIMCity project, we have designed, developed and validated a set of reusable, flexible, open and user-friendly components in the form of a PIMS Development Kit (hereafter PDK, effectively an SDK for PIMS). Being aware of the complex and non-standard definition of PIMS, our goal is not to develop a monolithic solution that cannot withstand the ever-changing requirements of business and regulations, but to provide a modular approach that can be flexibly improved and refined as needed. In short, the PDK provides the ability to rapidly develop new PIMS solutions and easily experiment with possible alternatives. We make it available to the community as open-source software, which can be found at https://easypims.pimcity-h2020.eu.


The outbreak of COVID-19 has accelerated digitalisation processes and forces us to rethink our future. Issues such as education, health, housing and urban planning, the environment, labour relations, the production model, privacy, culture and intellectual property are being reconsidered in light of the possibilities arising from the use of algorithms, in what we have come to call Artificial Intelligence.
We all know that the success of algorithms rests on the massive use of personal data, which can make it possible to predict the evolution of a pandemic, speed up the development of a vaccine, reduce the carbon footprint by avoiding unnecessary travel, prevent depopulation, and ease access to education and the labour market for groups that have so far been left unprotected. The need is even more evident now, when the current pandemic and climate change demand new mechanisms, instruments and infrastructures to harmonise and share coherent, useful information that allows institutions to give coordinated, global responses.
Until now, data has been in the hands either of service providers (energy, telecommunications and transport operators, public administrations) that did not share it with third parties, or of the large technology platforms (search engines, social networks, operating systems, …) that have become de facto monopolies and use it as the basis for personalised services sold to third parties. The underlying question is whether there are alternatives to these two models of data exploitation, the first inefficient and the second generating profits for a very small number of players. The answer is yes, and it basically consists of making all data available to whoever wants to use it, provided that an ecosystem of trust is created for the benefit of the citizens and companies who, after all, are the ones generating that data.

Towards European Digital Sovereignty
For several years, the European Commission has been radically redesigning the way public and private data is shared and managed on European soil, proposing a series of major initiatives to guarantee transparency and trust for citizens and companies, ensuring the neutrality of the large private technology monopolies, and creating a common European data space with enormous social and economic potential. This is a unique and necessary opportunity to equip ourselves and strengthen European digital sovereignty, which is now in the hands of technology giants based mainly in the USA and China.
The first big step was taken in May 2018 with the General Data Protection Regulation (GDPR), which introduced single, strict rules across the EU to guarantee greater control over personal data and a level playing field for citizens and companies; this regulation has served as the basis for the development of similar legislation in the rest of the world. The Commission is now proposing an ambitious strategy to improve the governance and exploitation of both public and private data.

New European Rules for Data Governance
The Commission has just published its proposal for data governance, known as the DGA (Data Governance Act), an important new milestone in promoting a data-driven economy in Europe and the first product of the EU strategy to create a European data space, adopted in February 2020 (just before the pandemic arrived).
The objective is to create the right conditions so that people, if they so wish, can share data and can do so reliably. In the years to come, the amount of data will grow exponentially. Data can enable new services and products, make production more efficient and help improve services in many different areas. Yet today only a very small fraction of the available data is used productively. Given all this potential, it is clear that data must be accessible and that its exchange must be secure and respect our fundamental values.

Sharing and Reusing Public Sector Data
Public administrations generate an enormous amount of information every day, much of it public data that in the past was destined to remain confined to the administrative or governmental procedures for which it was created. The 2019 EU Directive on Open Data and the re-use of public sector information, which every Member State must transpose before next summer, aims to promote the re-use of public information assets by encouraging the creation of innovative products and services based on open data.
In particular, it specifies which types of data public administrations must make easily accessible and free of charge. The directive clearly identifies certain categories of so-called "high socio-economic value" data that will be given priority: examples include company register data, mobility data and environmental observation data.
The Directive is therefore one of the fundamental pillars on which future European data governance will rest: a wealth of public information that today usually remains inaccessible and under lock and key, and that will instead be put in a position to feed a virtuous, shared ecosystem for the benefit of all.


Creating Trusted and Neutral "Intermediaries"
Another clear and important objective of the proposed regulation is to undermine the monopoly of the big technology platforms that control large amounts of the data generated by EU citizens. The Commission proposes an alternative model to preserve data transparency and neutrality through new sharing instruments and intermediation mechanisms that are neutral and respectful of European fundamental values and rights, in order to strengthen and renew the trust of citizens and companies.
The new regulation provides for the creation of a new actor, the "intermediaries": non-profit organisations listed in a special register at European level, which must notify the competent public authority of their intention to provide these services and must guarantee a clear separation between data-sharing services and any other service in order to avoid conflicts of interest. The activities of these new entities and their compliance with the requirements will be overseen by public authorities at both national and EU level.

Creating a Single European Data Space
Another important pillar is the definition of a "single European data space": the proposal, already anticipated in the February Data Strategy, provides for the creation of data spaces in strategic economic sectors and areas of public interest, from which data from different sectors can be drawn more easily to create new data-based products and services for companies, citizens, academia and public administration.
Common European data spaces, much as canals first and then the railway and motorway networks once did, should map out the network of the European single market for data, becoming the new European digital infrastructures for the circulation of data-based services and products. These spaces should, for example, contribute to better management of energy consumption and guide us towards a conscious ecological transition.
Some of the data proposed for this single European space include industrial, mobility, health, financial, energy, agricultural and public administration data, among others.
A common regulation across the European Union will ensure that companies can benefit from the scale of the single market, and that companies and research bodies can access data from the different Member States under similar conditions.

Encouraging New Forms of "Data Altruism"
The European proposal aims to facilitate and encourage the voluntary provision of data by citizens, companies and organisations for the common good (Data for Good) and on a non-profit basis. For example, people suffering from rare diseases can voluntarily share the results of their medical tests to be used in improving treatments for those diseases. The new personal data spaces will ensure that people can keep control over their own data. They will also guarantee that the data is used only for the agreed purposes, in this case medical research.
A specific European register and a set of tools will be created to reduce costs and optimise the portability and re-use of data. This could raise citizens' awareness, stimulate active citizenship and activism, and give more voice and new instruments to under-represented groups and to new social and public stakeholders.

Developing Common Tools, Infrastructures and Standards
Making tools, infrastructures and standards available to companies and entrepreneurs to implement new data-based business models can undoubtedly accelerate the process.
The European strategy involves supporting initiatives, projects and processes that accelerate this development, and seeking synergies with other movements that share the same philosophy, such as MyData, PIMS or SOLID, to name a few.

The Future Is in Our Hands
By way of conclusion, I would like to drive home the idea that, when it comes to data, the future can be decided by us, the citizens. It is we who generate our data, and therefore we hold in our hands both the possibility and the responsibility of helping to ensure that this future is designed with fairness and respect for fundamental rights.


The Data Knowledge Extraction (DKE) module is the means to extract knowledge from raw data. One of the biggest challenges in big data and machine learning is the creation of value out of raw data. When dealing with personal data, this must be coupled with privacy-preserving approaches, so that only the necessary data is disclosed and the data owner keeps control over it.

The DKE consists of machine learning approaches to aggregate data, build abstract models that predict future data (e.g., predicting a user's interests in recommendation systems), and fuse data coming from different sources to derive generic suggestions (e.g., supporting users' decisions by providing suggestions based on decisions taken by users with similar interests).
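As a toy illustration of this kind of processing (not the DKE's actual implementation), the sketch below turns a browsing history into an interest profile by mapping visited domains to categories and counting visits; the domain-to-category table is invented.

```python
from collections import Counter
from urllib.parse import urlparse

# Invented mapping from domains to interest categories, for illustration only.
DOMAIN_CATEGORIES = {
    "espn.com": "sports",
    "bbc.com": "news",
    "github.com": "technology",
}

def interest_profile(browsing_history):
    """Build a normalized interest profile (category -> share of visits)."""
    counts = Counter()
    for url in browsing_history:
        domain = urlparse(url).netloc.removeprefix("www.")
        category = DOMAIN_CATEGORIES.get(domain)
        if category:
            counts[category] += 1
    total = sum(counts.values()) or 1
    return {category: n / total for category, n in counts.items()}

history = ["https://www.espn.com/nba", "https://bbc.com/world", "https://github.com/pimcity"]
print(interest_profile(history))  # roughly one third sports, news and technology
```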


The objective of the Data Provenance tools, from an end-user perspective, is to provide stronger data ownership guarantees to data providers, since sharing their datasets with the PIMCity platform will also discourage the illegal copying or reselling of datasets. Our module provides a watermarking algorithm that allows a data buyer or data owner to verify data ownership offline, or a third-party verifier to do so online on behalf of the data owner by reading the data owner's secret information, subject to prior agreement with that owner.

The Data Provenance tool exposes an OpenAPI-documented interface for interoperable transactions with other components of the PIMCity platform. Responsibility for trading the datasets falls outside the scope of the Data Provenance tool, but in future releases we plan to provide metadata that is valuable to data traders in reassuring the operational exchange of such datasets with data buyers and the like.

In particular, we provide the following capabilities to the Trading Engine of PIMCity:

1. Insert a watermark into a dataset to assure data providers that their data ownership can be proven from their secret information, even if the data is not stored in the PIMCity platform (offline). Data buyers can additionally be given a hint that a particular dataset is legally sourced from the PIMCity platform and belongs to a specific data provider, without the buyer having to know who that provider is.
2. Verify the watermark of a dataset by receiving a data provider's secret information, to reassure the provider that a piece of data found in the wild outside PIMCity belongs to them, using secret input information that only the data provider (offline) and/or PIMCity (online) knows.
In the first case, the data provider or the PIMCity platform provides or generates the secret information from which a securely watermarked dataset is derived.
In the second, the secret information needed to verify a dataset can be held strictly by the data provider that owns the data, but it must at the very least be read by the Data Provenance component, which returns True or False as the result of the verification process for a watermarked dataset.
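The sketch below illustrates this keyed insert/verify idea in its simplest form, marking the least significant bit of records selected by a keyed hash, in the spirit of the VLDB '02 scheme cited below. It uses a numeric column for clarity, whereas the actual tool also handles strings (URLs); all names and parameters are illustrative, not the tool's API.

```python
import hmac, hashlib

GAMMA = 3  # roughly one in GAMMA records carries a mark

def _keyed_hash(secret: bytes, record_id: str) -> int:
    return int.from_bytes(hmac.new(secret, record_id.encode(), hashlib.sha256).digest()[:8], "big")

def insert_watermark(rows, secret):
    """Embed a keyed watermark into the least significant bit of selected integer values.

    rows: list of (record_id, int_value) pairs. Simplified numeric variant of the
    keyed-hash marking idea; the actual PDK tool also handles strings/URLs.
    """
    marked = []
    for record_id, value in rows:
        h = _keyed_hash(secret, record_id)
        if h % GAMMA == 0:                  # this record is selected for marking
            bit = (h >> 8) & 1              # the bit to embed, derived from the secret
            value = (value & ~1) | bit      # overwrite the least significant bit
        marked.append((record_id, value))
    return marked

def verify_watermark(rows, secret, threshold=0.9):
    """Return True if the expected bits are found in (almost) all selected records."""
    selected = matches = 0
    for record_id, value in rows:
        h = _keyed_hash(secret, record_id)
        if h % GAMMA == 0:
            selected += 1
            if (value & 1) == ((h >> 8) & 1):
                matches += 1
    return selected > 0 and matches / selected >= threshold

data = [(f"user{i}:record", 1000 + i) for i in range(50)]
secret = b"owner-secret"
watermarked = insert_watermark(data, secret)
print(verify_watermark(watermarked, secret))  # True with the right secret, False otherwise
```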

Benefits
The Data Provenance tools in WP4 provide greater reassurance to data providers about sharing their private data, while discouraging abusive reuse, reselling or simply copying of their datasets in the wild without permission. We have implemented a first watermarking algorithm for strings (browsing history URLs) that follows an approach similar to the state of the art presented at VLDB '02 [1].
Regarding permissions, together with the Data Trading Engine we plan to keep a record of the transactions generated in the data trading platform by recording metadata such as the source, the destination and the number of times each dataset has been shared. This will allow us to:

1. Identify if a dataset located in the ‘wild’ belongs to a given data owner.
2. Trace or fingerprint the data buyer of a dataset found in the 'wild' as well, without disclosing the identity of that buyer to the public Internet, serving only metadata to the PIMCity platform to be processed in a secure manner.
Moreover, through a data marketplace in PIMCity, the Data Trading Engine provides buyers and sellers with information about every dataset and, potentially, about a user's personal transactions, in order to bring transparency and trust to data transactions in PIMCity.


The Data Portability Control (DPC) tool implements the right to data portability, a novelty of the EU's General Data Protection Regulation (GDPR), which allows individuals to obtain and reuse personal data from one environment to another in a privacy-preserving fashion. More specifically, it incorporates the necessary tools to import data from multiple platforms (through the available Data Sources), process the data to remove sensitive information (through the Data Transformation Engine), and export it to other platforms (through the Data Export module). Since the tool does not have a dedicated UI for interacting with users, it provides an interface in the form of a generic Control API for controlling all operations from other systems.

Benefits

By using the DPC tool, an organisation can satisfy the right of data portability to its users. At this stage the tool provides support for the following features:
-  Data aggregation from Banking data through an Open Banking API (TrueLayer).
-  Generic data anonymization that hides specific data categories (or columns) of the imported data that are considered sensitive.
-  Data export in a common data interchange format (e.g., JavaScript Object Notation (JSON)).
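A minimal sketch of the transformation and export steps could look like the following; the records, column names and 'sensitive' list are invented for illustration, and the real DPC drives these operations through its Control API.

```python
import json

# Hypothetical imported banking records and a list of columns deemed sensitive.
records = [
    {"date": "2021-03-01", "amount": -24.90, "merchant": "GroceryMart", "iban": "IT60X0542811101000000123456"},
    {"date": "2021-03-02", "amount": -9.99, "merchant": "StreamFlix", "iban": "IT60X0542811101000000123456"},
]
SENSITIVE_COLUMNS = {"iban"}

def transform(rows, sensitive):
    """Remove sensitive columns before the data leaves the user's environment."""
    return [{k: v for k, v in row.items() if k not in sensitive} for row in rows]

def export_json(rows, path):
    """Export the transformed data in a common interchange format (JSON)."""
    with open(path, "w") as handle:
        json.dump(rows, handle, indent=2)

export_json(transform(records, SENSITIVE_COLUMNS), "portable_data.json")
```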


The Data Aggregation module allows users to anonymize any kind of dataset, using k-anonymity as the main algorithm: it deletes sensitive data and aggregates the rest, reducing the dataset's size without significant data loss. The anonymized data is stored in the component's database and is accessible at any moment. It is also possible to query all the anonymized datasets that are available from a specific user.
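The sketch below shows the basic idea of k-anonymization as used by this module: quasi-identifiers are generalized (ages into bands, ZIP codes truncated) and groups still smaller than k are suppressed. It is a didactic example with made-up records, not the module's actual algorithm.

```python
from collections import Counter

def k_anonymize(rows, k=3):
    """Generalize quasi-identifiers and suppress groups smaller than k.

    rows: list of dicts with 'age' and 'zip' quasi-identifiers plus other fields.
    """
    generalized = []
    for row in rows:
        g = dict(row)
        g["age"] = f"{(row['age'] // 10) * 10}-{(row['age'] // 10) * 10 + 9}"  # 10-year age bands
        g["zip"] = row["zip"][:3] + "**"                                        # truncate ZIP code
        generalized.append(g)
    group_sizes = Counter((r["age"], r["zip"]) for r in generalized)
    # Keep only rows whose (age band, ZIP prefix) group contains at least k records.
    return [r for r in generalized if group_sizes[(r["age"], r["zip"])] >= k]

rows = [
    {"age": 34, "zip": "10129", "calls": 17},
    {"age": 36, "zip": "10133", "calls": 5},
    {"age": 38, "zip": "10121", "calls": 9},
    {"age": 52, "zip": "28001", "calls": 2},  # unique group -> suppressed when k=3
]
print(k_anonymize(rows, k=3))
```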

Benefits
-  Data anonymization and aggregation.
-  Anonymized data can be used for evaluating applications, or for selling or sharing with other partners in a privacy-preserving manner.
-  Others' anonymized data can be used for comparison and correlation with one's own data without breaching privacy.

Example
A telco can use this component to anonymize location data or call records in order to share them with partners, collaborators or third parties under specific conditions or agreements. Typical uses include marketing, exploring possible new partnerships, and providing aggregated data to end users so they can compare it with their own profile.
