Raising Awareness of Data Sharing Consent Through Knowledge Graph Visualisation

Knowledge graphs facilitate systematic large-scale data analysis by providing both human and machine-readable structures, which can be shared across different domains and platforms. Nowadays, knowledge graphs can be used to standardise the collection and sharing of user information in many different sectors such as transport, insurance, smart cities and the internet of things. Regulations such as the GDPR make sure that users are not taken advantage of when they share data. From a legal standpoint it is necessary to have the user’s consent to collect information. This consent is only valid if the user is aware of the information collected at all times. To increase this awareness, we present a knowledge graph visualisation approach, which informs users about the activities linked to their data sharing agreements, especially after they have already given their consent. To visualise the graph, we introduce a user-centred application which showcases sensor data collection and distribution to different data processors. Finally, we present the results of a user study conducted to find out whether this visualisation leads to more legal awareness and trust. We show that with our visualisation tool data sharing consent rates increase from 48% to 81.5%.


Introduction
Collecting information about how users consume a service is crucial for understanding how the service is used and for improving its quality. Taking the automotive field as an example, the more drivers agree to send usage data from their cars, the more value the data analysis will generate and the better the services offered will be. However, the success of campaigns for collecting user data is highly dependent on requesting and receiving informed consent for sensor data sharing. It is debatable whether current consent gathering methods really make clear to users what happens in the background once consent is given. According to [1], users tend to agree to most consent requests they are confronted with. Reading through all the agreement specifications is time-consuming. Often such agreements are written in the complex language typical of legal documents. In most cases users try to get through the consent procedure as fast as possible and are not aware of the implications. For this reason most users who give their consent to data sharing agreements do so without understanding many details of the contract. Bechmann [2] defines this as a "culture of blind consent", implying that having one's consent is not equivalent to having awareness.
This paper presents an approach to visualise consent decisions (i.e. what happens to an individual's data after consent is given) in the domain of vehicle sensor data sharing. The work is part of the CampaNeo project, which focuses on vehicle sensor data sharing based on individuals' informed consent as defined by the General Data Protection Regulation (GDPR). Under GDPR, data processing can only start once an agreement, which GDPR defines as consent, has been put in place. Further, consent has to be freely given, specific, informed and unambiguous (Rec. 32). In order to have informed consent, individuals need to be presented with information about what data is requested, for what purposes it will be used, how it will be processed and by whom (Rec. 32). The GDPR has had a major impact on the way companies deal with personal data since it came into effect in May 2018. For many companies adhering to the regulation is important not only to prevent costly legal affairs but also to retain their reputation.
With GDPR compliance in mind, the CampaNeo project aims to create a system that collects and distributes sensor data generated by vehicles with the help of semantic technology, namely knowledge graphs, which can provide a transparent, traceable and centralised record of user data and contractual meta-data (e.g. consent status). On the technical side, knowledge graphs are a state-of-the-art solution for building versatile, explainable and machine-readable data models [3][4][5]. Through the use of semantic technologies one can ensure a standardised way of collecting and storing sensor data independently of the proprietary formats of different vehicle manufacturers. Knowledge graphs provide the needed level of interpretability due to their ability to represent data in both human and machine-readable formats, which makes them a suitable solution for representing data sharing between multiple entities.
We introduce a new visualisation which portrays certain aspects of the knowledge graph, mainly how data is processed after consent is given. The visualisation aims to raise individuals' awareness of what it means to consent and the implications that follow. With the help of the visualisation users can inspect data sharing activities at any time after consent has been given. We believe that visualisations help end users understand how their data is being shared with less effort than reading agreement specifications in textual format, which is the prevalent status quo [6]. Our research focuses on increasing the awareness of data sharing processes through visualisation and sets out to answer the following two questions: 1. Are users more willing to share their data if they are fully informed on what exactly they are sharing, when they are sharing it and with whom exactly they are sharing it? 2. Do data visualisations improve comprehension of consent?
The paper is structured as follows. Section 2 presents related work. Section 3 describes the methodology followed. The architecture and implementation details can be found in Section 4. Section 5 presents the evaluation methodology and the discussion of the results. Finally, conclusions and future work are addressed in Section 6.

Related Work
The main aims of imposing GDPR when dealing with the data of European citizens are to ensure transparency of data processing and to make individuals aware of their rights and the implications of giving consent. Individuals should be aware of both the upsides, such as cost benefits or optimised applications, and the downsides, such as possible discrimination, spam or even identity theft.
Several studies [2][7][8] have confirmed that having consent and making individuals aware of the actual meaning of the consent are two different tasks.
Bechmann [2] compares existing regulations and consent practices that are adopted by social media platforms such as Facebook and notes that the need for user convenience has turned the culture of consent into "a blind, non-informed consent culture". The qualitative study done among Danish students showed that none of the participants had read the privacy policies before directly providing consent for their data to be used and shared. The same issue is confirmed by Joergensen [7], who conducts a survey with 58 high school students aimed at understanding users' perception of data privacy and consent with regard to social media participation. The findings showed that the participants are unaware of their privacy rights and what happens to their data.
In their survey on the economics of privacy, Acquisti et al. [8] show that new technologies are needed to support users in making complex privacy choices, such as giving consent to data processing. Naeini et al. [9] conduct a large study with more than 1000 participants on privacy expectations and preferences in the internet of things domain. They conclude that participants want to be informed about various details of data collection, such as what the data is used for and how long it will be stored.
Information arising in different phases of a consent agreement can be described in a "consent life-cycle" consisting of the four stages request, comprehension, decision and use [10]. Kurteva et al. [10] argue that it is useful to have a semantic model (i.e. a machine-readable ontology) of the whole consent life-cycle. Further, the semantic model can be visualised at every stage to make the process more transparent to users. The visualisation of the "use" stage of consent in particular is something current consent visualisations are mostly lacking. Meanwhile, as pointed out by the surveys, users should be able to get information on their consent contract continually and not just at the time of giving consent. Only then can users become and stay aware of data processing and privacy implications.
A possible way to help users achieve more awareness of their data sharing activities is through different types of data visualisations. Visual elements such as images, graphs, icons and schemas have proven to be easier to comprehend than text [11][12][13][14]. In recent years there have been several attempts to design applications that implement a transparent visualisation of data sharing mechanisms [15][16][17]. Raschke et al. [15] built a dashboard to visualise data sharing activities and provide consent approval and withdrawal functionality. The dashboard is a single page application with a vertical timeline listing the different types of actions. The authors evaluated the tool with a set of tasks for participants to complete. The evaluation showed that the tool was still not making users completely aware, as even expert users found it hard to answer questions about their data privacy based on the information available from the dashboard.
The issue of consent awareness is also addressed by Drozd and Kirrane in the CoRe [16] and CURE [17] user interfaces (UIs), which present users with a graphical visualisation of a consent request. The CoRe UI displays a graph that shows what data is sent out, where it is stored, the type of processing that is done on it and which third-party companies it is to be shared with. The evaluation of both UIs showed that individuals found the graphical visualisation of a consent request and the provided personalisation of consent useful and helpful. The research of Drozd and Kirrane [16][17] showed that visualisations such as graphs can be used for easing individuals' comprehension of what it means to consent and the implications that follow. Angulo et al. [18] propose a data sharing transparency visualisation, which shows a user-centred network graph connecting shared data artifacts with websites and third-party companies. Similarly to the work in [16] and [17], Angulo et al. [18] showed that individuals find visualisations of data sharing processes helpful when trying to comprehend what is happening to their data. However, the research also showed that how information is organised on the screen can affect interactions (i.e. slower user interactions when data is not well organised) [18].
Many tools which try to bring GDPR awareness to data owners still have problems with user acceptance. Complex designs such as [16] and [17] can cause issues such as information overload [19], which can occur when one is presented with information written in formal legal language [20]. The current solutions for requesting consent in compliance with GDPR are somewhat effective in achieving their task of requesting informed consent. CoRe and CURE UIs present detailed consent requests but do not show what happens after consent is given. On the other hand, the visualisation of Angulo et al. [18] focuses on the actual process of data sharing but not in the context of consent.
Knowledge graph visualisation tools such as WebVOWL [21], Ontosphere, OntoGraph and Isaviz provide information visualisation with different graphs but are focused on users with expert knowledge in the field of the Semantic Web. Surveys such as [22] provide a detailed overview of existing graph designs and tools, which could be reused and adapted for data sharing consent.
In Bikakis et al. [23], the authors use a hierarchical type of visualisation as it could help prevent overloading the user with information that is not essential. The authors claim that this visualisation could further be used with large data sets and still perform well. The proposed model takes into consideration the needs of non-expert users and provides them with interactivity and flexibility. A hierarchical layout of graph visualisation is therefore also fitting for the purpose of visualising consent.
Considering the findings from [ and existing knowledge visualisation solutions such as the ones mentioned in this section, we propose an alternative visualisation approach, which focuses on raising legal awareness of the implications of giving consent by showing the data processing that takes place after consent is given.

Methodology
The aim of this work is to facilitate the handling of consent data by using data visualisation techniques in order to raise individuals' comprehension of consent. To achieve this, we (i) determined the different user requirements of the application in the vehicle sensor data domain, (ii) reviewed existing linked data visualisation solutions that focus on consent, (iii) produced wireframes and (iv) implemented the solution. The use case that we followed is that of vehicle owners who want to check details of their data sharing activities after having consented to them.
We designed the first prototype of the visualisation using wireframing techniques [28]. The UI design follows design principles such as the 'Gestalt' laws of grouping [29] and linked data visualisation techniques (e.g. using different visualisation elements such as icons, charts and graphs) as presented in [26]. Having in mind the findings in [16][17][18] regarding the usefulness of graphs for consent visualisation, we selected graphs as our main visualisation element. We reviewed different graph layouts that could be suitable for our work, namely those presented in [22] and those available in the D3.js Gallery. A star layout was selected due to its simplicity and intuitive representation of data flows. Additionally, we used particle flow animations to add a sense of direction to parts of the visualisation (e.g. to visualise data flow between the users and their campaigns).
Once the wireframing stage was complete the implementation of the solution began. The D3.js library was used for implementing the graph visualisation of data and the data flows. The data itself was queried from the CampaNeo knowledge graph, which is stored in the GraphDB graph database. Details about the design decisions and the implementation are presented in Section 4.

Implementation
The goal of the implementation is to visualise the flow of data from a user's car to third-party companies on small to medium displays (e.g. tablet, smartphone or the car's built-in infotainment system). The user can get an overview of what data is shared with organisations like governmental agencies, universities or data processing companies, which collect large amounts of data with the aim of solving problems related to mobility and transport. The visualisation focuses on highlighting the data streams, so the user can get information about the type of data that is shared, at what intervals it is sent out and who the receiving party is.
In CampaNeo, data is requested via specific campaigns, which must be approved by the user. Following the GDPR, campaigns must state exactly what the purpose of their data collection is and what type of processing they plan to do on it. For example, a campaign dedicated to enhancing traffic flow around a city can collect GPS location and speed data from a large number of cars. This data can then help optimise traffic guidance and speed constraints on the roads, which leads to less congested roads and time savings for drivers. We have built a first prototype as a web application, which allows users to access their data sharing activities (based on the given informed consent) from any device. The source code is publicly available at https://github.com/STIInnsbruck/CampaNeoViz.
Figure 1 shows an overview of the main components of the web application. The user interacts with the User Interface at the front-end of the application (step 1). The interface captures the user's intention and initiates the process for retrieving the required data. This is achieved by sending a SPARQL query (Listing 1) to the back-end (steps 2 and 3). As depicted in the figure, the main component of the back-end is the graph database, which is used to store the consent data as a knowledge graph.
For this specific implementation we have used an instance of GraphDB, which offers multiple APIs for querying the database, including RDF4J and SPARQL. Since we rely on SPARQL queries, the front-end is totally decoupled from the back-end, and we could replace the data store later if necessary.
Once the result of the SPARQL query is retrieved (step 4), it is passed to the Visualisation Module (step 5). This module is implemented using D3.js, a JavaScript library for manipulating data-driven documents. The Visualisation Module prepares the data for the User Interface depending on the specific screen selected by the user (step 6).
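The decoupling between front-end and data store can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the prototype itself is a JavaScript web application, and it only assumes a store that follows the SPARQL 1.1 Protocol. The endpoint URL, the repository name and the function name are hypothetical placeholders and are not taken from the CampaNeo code base.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: host and repository name are placeholders.
ENDPOINT = "http://localhost:7200/repositories/campaneo"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query request.

    The query travels as a URL-encoded form body via POST, and the
    response is requested in the SPARQL JSON results format.
    """
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
# Sending it would be urllib.request.urlopen(request), which needs a running store.
```

Because only the endpoint URL is specific to the store, pointing the same request at another SPARQL-capable database would in principle leave the rest of the pipeline unchanged.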
In the following, we describe the User Interface in further detail. We then show how the consent data has been modelled in the knowledge graph and how this is consumed by the web application.

User Interface
The goal of the CampaNeo project is to facilitate the collection of campaign-based data with a focus on the data ownership of vehicle owners and the traceability of data processing. Figure 2 shows the entry point of the web application. This screen represents a dashboard that helps to visualise all campaigns related to the user once consent is granted.
Campaigns are represented by a rounded rectangle with the corresponding campaign name. The round particles next to each campaign represent data that the user is sharing with the organisation behind the campaign. They are colour-coded to give additional information on the type of data that is sent, e.g. fuel consumption, speed or the GPS location of the car. To make this even clearer, shared data packets are visualised as moving particles between the nodes. This feature gives the visualisation a sense of directional flow and helps users to see at a glance what kind of data they are sharing and at what rate. The meaning of the different colours is encoded in a legend on the left side of the screen.
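The two encodings described here, colour for data type and particle frequency for sharing rate, can be sketched as follows. This is an illustration only: the colour values, the data type keys and both function names are assumptions, since the paper does not list the actual legend or how the rate is computed.

```python
from collections import Counter

# Hypothetical legend: the actual colours of the prototype are not given.
LEGEND = {
    "gps": "#1f77b4",
    "speed": "#ff7f0e",
    "fuel": "#2ca02c",
}
FALLBACK_COLOUR = "#7f7f7f"  # used for data types missing from the legend


def particle_colour(data_type: str) -> str:
    """Look up the legend colour for a data type."""
    return LEGEND.get(data_type, FALLBACK_COLOUR)


def particles_per_minute(packet_types: list[str], window_minutes: float) -> dict[str, float]:
    """Average number of particles per minute for each data type."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    counts = Counter(packet_types)
    return {data_type: n / window_minutes for data_type, n in counts.items()}


# Four packets observed within a two-minute window.
rates = particles_per_minute(["gps", "gps", "speed", "gps"], window_minutes=2.0)
```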
Further details about a specific campaign can be accessed after clicking on any campaign title (labelled as 2 in Figure 2). Figure 3 shows a screenshot of the campaign details interface. Here we rely on a time series to display the data sharing events according to their timestamp. Additional information about the sensor that retrieved the data and the final organisations is also shown.
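One way to prepare events for such a time series view is to bucket them by hour and sort them chronologically. The sketch below assumes that each event carries an ISO 8601 timestamp, a sensor name and a receiving organisation. The sample events, the hourly granularity and the function name are illustrative assumptions rather than details of the prototype.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (ISO timestamp, sensor, receiving organisation).
events = [
    ("2021-03-01T10:15:00", "gps", "University A"),
    ("2021-03-01T09:05:00", "speed", "City Agency B"),
    ("2021-03-01T10:45:00", "gps", "University A"),
]


def hourly_series(rows):
    """Group events into hour buckets, ordered chronologically."""
    buckets = defaultdict(list)
    for stamp, sensor, organisation in rows:
        moment = datetime.fromisoformat(stamp)
        hour = moment.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((moment, sensor, organisation))
    # Sort the buckets, and the events inside each bucket, by time.
    return [(hour, sorted(buckets[hour])) for hour in sorted(buckets)]


series = hourly_series(events)
```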

Knowledge Graph Data Model
We modelled the data shown in the visualisation using a knowledge graph (built with Protégé and stored in GraphDB), which represents informed consent as defined by the GDPR (Art. 4). The knowledge graph represents campaigns and their consent status (given, not given, withdrawn, expired) and purpose, specific vehicle sensor data (e.g. GPS location), data provider, data processor, data controller and third-party organisations involved in the data processing.
The knowledge graph reuses the following ontologies: GConsent, namely the classes Consent and Status, the Semantic Sensor Network Ontology (SSN) for defining sensor data types and the Financial Industry Business Ontology (FIBO) for representing contracts and agreements. Currently the knowledge graph consists of 425 axioms, 34 classes, 50 object properties and 23 data properties.
The knowledge graph models the data sharing campaigns and the consent relations to users. Associated with a campaign are also the data packets that were retrieved while consent was in place. Data packets consist of the retrieval time, the content and the third-party processors that requested them. Further, they are categorised into well-defined data types such as GPS coordinates, speed, or fuel consumption.
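The entities described in this section can be mirrored in a small in-memory model, which may help when reading the rest of the section. The sketch below is not the ontology itself: the class and field names are assumptions, and only the four consent states and the packet attributes (retrieval time, content, processors, data type) come from the description above. The rule that packets are only recorded while consent is given is likewise an illustrative reading of that description.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class ConsentStatus(Enum):
    GIVEN = "given"
    NOT_GIVEN = "not given"
    WITHDRAWN = "withdrawn"
    EXPIRED = "expired"


@dataclass(frozen=True)
class DataPacket:
    retrieved_at: datetime
    data_type: str               # e.g. "gps", "speed", "fuel consumption"
    content: str
    processors: tuple[str, ...]  # third-party processors that requested it


@dataclass
class Campaign:
    name: str
    purpose: str
    status: ConsentStatus
    packets: list[DataPacket] = field(default_factory=list)

    def add_packet(self, packet: DataPacket) -> None:
        """Record a packet only while consent is in place."""
        if self.status is not ConsentStatus.GIVEN:
            raise PermissionError(f"consent is {self.status.value}")
        self.packets.append(packet)


# Hypothetical campaign and packet, for illustration only.
campaign = Campaign("Traffic flow", "optimise traffic guidance", ConsentStatus.GIVEN)
packet = DataPacket(datetime(2021, 3, 1, 10, 15), "gps", "47.26,11.39", ("University A",))
campaign.add_packet(packet)
```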
Listing 1 shows the SPARQL query that is used for retrieving the visualisation data for the example user "user1". In the visualisation each of the data packets retrieved in the query will be represented by a small particle moving from the user to the data processor. Additional information about the data packets like the retrieval time and the data processor can be inspected upon interaction with the campaign node or the data stream.
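As a rough illustration of how query results could become particles, the sketch below pairs a query of the general shape described here with a parser for the standard SPARQL 1.1 JSON results format. It is not a reproduction of Listing 1: the `ex:` prefix and every class and property name in the query are hypothetical placeholders, as are the sample result and the function name.

```python
import json

# Hypothetical vocabulary: the ex: terms are placeholders, not the
# actual CampaNeo ontology terms.
QUERY = """
PREFIX ex: <http://example.org/campaneo#>
SELECT ?campaign ?dataType ?time ?processor WHERE {
  ex:user1 ex:hasConsentedTo ?campaign .
  ?packet ex:belongsTo ?campaign ;
          ex:hasDataType ?dataType ;
          ex:retrievedAt ?time ;
          ex:requestedBy ?processor .
}
ORDER BY ?time
"""


def bindings_to_particles(result_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one plain dict per data packet."""
    document = json.loads(result_json)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in document["results"]["bindings"]
    ]


# Hand-written sample response in the SPARQL 1.1 JSON results format.
sample = json.dumps({
    "head": {"vars": ["campaign", "dataType", "time", "processor"]},
    "results": {"bindings": [{
        "campaign": {"type": "uri", "value": "http://example.org/campaneo#TrafficFlow"},
        "dataType": {"type": "literal", "value": "gps"},
        "time": {"type": "literal", "value": "2021-03-01T10:15:00"},
        "processor": {"type": "uri", "value": "http://example.org/campaneo#UniversityA"},
    }]},
})
particles = bindings_to_particles(sample)
```

Each resulting dictionary corresponds to one data packet and carries the values a particle would need: its campaign, data type, retrieval time and data processor.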

Evaluation and Results
To assess the usefulness of our data sharing visualisation tool, we conducted a user test. The following section explains the methodology that we applied to find out whether we can increase the data sharing consent awareness of the test users. After that, we present the results of the user test. The questionnaire and results are available online at https://github.com/STIInnsbruck/CampaNeoViz/tree/main/evaluation.

Evaluation Methodology
For the user tests we provided the web application described in the previous section. Figure 4 gives an overview of the evaluation procedure. The tests could be performed on desktop computers, laptops or other mobile devices. The test subjects needed a thorough introduction to the scenario since the use case of the application is very specific. For example, the participants had to understand that this visualisation only shows campaigns which they had already given their consent to.
After the introduction and demographic survey, we asked questions related to the participants' comprehension of data privacy before presenting the visualisation. This allowed us to assess the prior knowledge of participants and to measure the hypothetical improvement of comprehension and trust through the visualisation by comparing these answers with the answers to the same questions after the test.
After the first questions, we revealed the visualisation to the test subjects. They were presented with a series of simple to slightly complex tasks, beginning with interface interaction and basic comprehension. Later we asked them questions which led them to aggregate different information from the visualisation. During the evaluation the test subjects were observed in real time through screen sharing. Using the "think aloud" method, they informed the tester regularly about their thoughts, i.e. what they wanted to achieve next and where they expected to find specific information in the application.
After the task solving period, the subjects were asked to fill out a questionnaire, in which they rated their experience and explained problems they had or gave suggestions for improvements. We tested the potential improvement in comprehension and awareness of GDPR rights by asking the users if they felt more confident in their knowledge of data sharing and observed the difference in their stance on data privacy before and after the test, as described above.

Results
The evaluation took place over the course of a week and comprised user tests with 27 participants. The age groups were evenly distributed between 16 and 50+ years. All participants assessed themselves to be competent with internet surfing and most stated that they spend more than 4 hours a day on the internet, which shows that they have probably already come into contact with data sharing agreements in the form of website cookies or similar requests. 80% of the test users claimed to have a valid driver's license. 40% of the participants drive daily and use the car as their main means of transport.
In the testing phase we noticed that the test users had no problems solving the tasks that were given to them. There were only a few exceptions, where tasks could not be solved without help. The users understood the interface by mostly relying on their intuitions and recognition of symbols.
After the practical tasks, the participants rated the overall design of the application with a mean score of 3.4 on a scale from 1 to 4. They were also asked to choose, from a set of adjectives, those that best described their interaction with the application. Among the most chosen ones were "organised" (74%), "innovative" (70.3%) and "effective" (44.3%). The more negatively associated adjectives like "complex", "hard to use" and "useless" were only selected by two people at most. The good general impression test users had of the application was further strengthened by the fact that 74.1% of the participants stated they were "very satisfied" with the user interface.
When looking at the rated understandability, we saw that 76.5% of the participants said that the application improved their understanding of what happens to their personal data, showing a clear increase in the users' awareness of the data sharing process. Before starting the test with the tool, the participants were asked if they trust companies in the European Union (EU) to respect their data privacy, which the majority answered with "yes" or "probably" (combined 59.2%). The remaining 40.8% of the participants were more sceptical about GDPR adherence of companies or were not even aware of the GDPR. After the test with the visualisation, 40% of the sceptics changed their mind on this particular question, gaining trust in the regulations. At the same time 28% of the people who trusted companies in the EU before the test felt less secure after the test.

Figure 5. Results of the user evaluation concerning vehicle data sharing. The questions were asked before (1) and after (2) the participants' interaction with the tool. Answer options: Yes; Possibly; Possibly, rather than without the tool; Probably not; Never.
After finishing the test we asked again if the participants were willing to share vehicle data like GPS location, speedometer data or fuel gauge readings, if they had the tested tool at their disposal. Before having seen the visualisation, 48% of the participants claimed they would share at least selected types of data with campaigns on CampaNeo. After the interaction with the tool, 85.2% of the participants stated that they would possibly share their vehicle data, if they had the tool to control the sharing activities (see Figure 5).
Further we asked the participants: "If such a tool were available to you in any service or application, would you share selected data with a company or institution?". To this 81.5% said "Yes", "possibly" or "possibly, rather than without the tool". Before the test we had asked: "Are you willing to send your user data to companies or institutions, so they can improve user experience more efficiently?". Only 48% of the users responded positively before they had interacted with the visualisation. This indicates that the general approach of our visualisation would also be desirable to users outside of the vehicle sensor data scenario.
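With 27 participants, each respondent accounts for roughly 3.7 percentage points, which is worth keeping in mind when comparing the figures above. The short check below back-calculates the respondent counts that would match some of the reported percentages. These counts are inferred from rounding and are an assumption: the text itself reports percentages only.

```python
PARTICIPANTS = 27  # sample size stated in the Results section


def nearest_count(percentage: float, total: int = PARTICIPANTS) -> int:
    """Whole number of respondents closest to a reported percentage."""
    return round(percentage / 100 * total)


def as_percentage(count: int, total: int = PARTICIPANTS) -> float:
    """Share of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


reported = [48.0, 81.5, 85.2]
# Inferred counts (an assumption, not reported data).
inferred = {p: nearest_count(p) for p in reported}
recomputed = {count: as_percentage(count) for count in inferred.values()}
```

Under this reading, the reported values of 48%, 81.5% and 85.2% would correspond to 13, 22 and 23 of the 27 participants.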

Conclusion
This paper proposes an application and associated design approaches to visualise data sharing activities after a data owner has given consent. We built a tool for the scenario of vehicle sensor data sharing. The visualisation shows an overview of the data sharing campaigns which have user consent. A particle stream animation lets users oversee data sharing activities at a glance. A more detailed overlay is designed to present to the user all the organisations behind the data sharing campaigns and the detailed data retrievals in a time series.
The conclusion was drawn from a user case study, which was designed to find out whether the visualisation is helpful to improve the comprehension of data privacy rights. The results showed that 40% of the test candidates who did not believe that companies in the EU were respecting an individual's data privacy changed their mind after the test with the visualisation tool. They were afterwards more convinced that user data of EU citizens cannot be gathered without consent.
The work shows that the availability of the presented tool can increase consent rates for data sharing campaigns in the automotive sector from 48% to 81%. In the user test, we asked test users whether they can imagine using this tool for any kind of data sharing. More than 40% of the test users stated that they would be more likely to share different kinds of user data in services and applications they use, if they had such a tool. We can therefore conclude that there is a clear need for better visualisation of data sharing streams. Furthermore, the case study shows that users feel more comfortable sharing data, if they can easily oversee the exact activities in a visualisation tool.
Since this is the first design iteration of this tool, the collected feedback ideas and comments will be implemented in future iterations to enhance the interface. Further, the visualisation will be integrated into a bigger application which presents data sharing campaigns and handles consent management. Once the next prototype is built, another user test with a bigger sample group will be scheduled.