Author: Denis Avetisyan
A comprehensive review explores the evolving landscape of privacy-preserving architectures for the increasingly connected worlds of the Internet of Things and vehicular communications.

This survey analyzes techniques like Federated Learning, Homomorphic Encryption, and Blockchain to address the trade-offs between privacy, efficiency, and trust in data sharing.
Achieving robust data privacy alongside computational efficiency and decentralized trust remains a fundamental challenge in the rapidly expanding realms of Internet of Things (IoT) and vehicular networks. This survey, ‘Systematic Survey on Privacy-Preserving Architectures for IoT and Vehicular Data Sharing: Techniques, Challenges, and Future Directions’, systematically analyzes 75 technical papers to reveal a persistent trilemma wherein current architectures excel in only one or two dimensions of privacy, efficiency, and trust. Our analysis, categorized by Decentralized Computation, Cryptography-based, and Distributed Ledger approaches, demonstrates a recent surge in research-with nearly half of all publications appearing in 2024-2025-and suggests that hybrid architectures offer the most promising path forward. Will the integration of these complementary paradigms unlock scalable and secure data sharing for next-generation intelligent transportation systems and IoT ecosystems?
The Delicate Balance of Data Utility
Data sharing is fundamental to deriving meaningful insights across numerous fields, from medical research to urban planning; however, conventional methods of data exchange frequently necessitate trade-offs. Often, maximizing analytical efficiency – the speed and accuracy of results – requires access to granular, identifiable data, which inherently jeopardizes individual privacy. Conversely, stringent privacy protections, such as anonymization or differential privacy, can introduce noise or distortion, diminishing the utility of the data and slowing down analysis. Furthermore, a lack of transparency regarding data handling practices and algorithmic biases erodes trust, potentially leading to reluctance to share data in the first place. This creates a critical tension, where improving one aspect of data analysis often comes at the expense of another, highlighting the need for innovative approaches that can simultaneously safeguard privacy, maintain efficiency, and foster trust.
Modern data science frequently encounters a core challenge: the Privacy-Efficiency-Trust Trilemma. This tension arises because maximizing all three elements simultaneously proves exceptionally difficult; improvements in one area often necessitate compromises in another. A comprehensive survey of 75 privacy-preserving architectures reveals this trade-off is consistently present; techniques prioritizing data privacy, such as differential privacy, can significantly reduce analytical efficiency. Conversely, architectures optimized for speed and accuracy frequently rely on data sharing practices that erode user trust and expose sensitive information. The survey demonstrates that no single architecture currently dominates, highlighting the need for innovative solutions capable of balancing these competing priorities to truly unlock the potential of collaborative data analysis.
Current approaches to data analysis frequently necessitate trade-offs, often excelling in only two aspects of the Privacy-Efficiency-Trust Trilemma. Many systems, for instance, prioritize data utility and computational speed – achieving high efficiency – at the cost of robust privacy safeguards or user confidence. Conversely, architectures focused on stringent privacy, such as those employing extensive data anonymization, often suffer from diminished analytical accuracy and increased computational burden, hindering overall efficiency. This pattern reveals a critical gap in the field: a need for data science frameworks capable of simultaneously upholding privacy, maintaining analytical performance, and fostering trust among data contributors – holistic architectures that move beyond these limited, pairwise optimizations.
Realizing the transformative power of collaborative data analysis hinges on overcoming the inherent challenges posed by the Privacy-Efficiency-Trust Trilemma. Current limitations in balancing these crucial elements restrict the scope and impact of shared datasets, hindering advancements in fields ranging from healthcare to urban planning. A truly robust data science ecosystem demands architectures that don’t force a compromise between safeguarding individual privacy, maintaining computational efficiency, and fostering user trust. Only by simultaneously addressing all three pillars can the full potential of pooled knowledge be harnessed, enabling breakthroughs currently obscured by practical and ethical constraints and unlocking novel insights previously inaccessible due to data silos or concerns over misuse.

Architectures for Protecting Data in Motion
Privacy-Preserving Architectures (PPAs) signify a fundamental change in data handling, moving away from centralized data collection towards distributed analysis. Traditionally, data analysis required aggregating sensitive information in a single location, creating substantial privacy risks and potential for breaches. PPAs address this by enabling analytical processes – such as machine learning and statistical computation – to occur directly on the data source, or on encrypted versions of the data, without requiring raw data to be transferred or exposed. This approach not only minimizes the attack surface for potential data compromises, but also facilitates compliance with increasingly stringent data protection regulations like GDPR and CCPA, allowing organizations to derive value from data while upholding individual privacy rights.
Federated Learning (FL) is a distributed machine learning approach that enables model training on a multitude of decentralized edge devices or servers holding local data samples, without exchanging those data samples. This contrasts with traditional centralized machine learning where all data is aggregated in a single location. In FL, a shared global model is iteratively refined through local computations performed on each participant's data. Only model updates – such as weight adjustments – are communicated to a central server, where they are aggregated to improve the global model. This minimizes data transfer and preserves data privacy, as raw data remains on the local devices. Communication efficiency is a key consideration in FL, often addressed through techniques like model compression and selective parameter updates.
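The local-update-then-aggregate loop described above can be sketched as a minimal FedAvg round. This is an illustrative single-process sketch with hypothetical client data and a plain linear model, not a networked implementation:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps on a
    linear model, using only that client's private (X, y)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """Server step: average the clients' returned models, weighted
    by local sample count. Raw data never leaves the clients; only
    the trained weights (the model update) are communicated."""
    total = sum(len(y) for _, y in clients)
    return sum(len(y) / total * local_update(global_w, X, y)
               for X, y in clients)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three hypothetical clients, each holding its own local dataset.
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(30):
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # converges toward true_w, roughly [ 2. -1.]
```

In a real deployment, `local_update` runs on each device and only `w` crosses the network, which is where the compression and selective-update techniques mentioned above apply.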
Homomorphic Encryption (HE) is a form of encryption that enables computation to be performed directly on ciphertext, meaning data remains encrypted throughout the processing lifecycle. This eliminates the need to decrypt data for analysis, significantly enhancing data privacy and security. However, performing computations on encrypted data introduces substantial computational overhead. Benchmarks indicate that HE operations can require between 5.7 and 28.4 times the computational resources compared to equivalent operations performed on plaintext data; this overhead varies depending on the specific HE scheme, data size, and computational complexity of the operations performed.
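The "compute on ciphertext" property can be demonstrated with a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes below are far too small to be secure; this is a sketch of the mechanism only:

```python
import math
import random

def keygen(p, q):
    """Paillier key generation from two primes (toy sizes here)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # modular inverse of lambda mod n
    return (n,), (n, lam, mu)       # (public key), (private key)

def encrypt(pub, m):
    """c = (1 + m*n) * r^n mod n^2, using the standard choice g = n + 1."""
    (n,) = pub
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(priv, c):
    """m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

pub, priv = keygen(1009, 1013)
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
# Multiplying ciphertexts adds the underlying plaintexts:
print(decrypt(priv, c1 * c2 % (pub[0] ** 2)))  # 42
```

Even this toy version hints at the overhead the benchmarks report: a single encrypted addition costs a modular multiplication over numbers twice the key size, and fully homomorphic schemes supporting arbitrary circuits are far more expensive still.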
Blockchain technologies contribute to data security by creating a distributed, immutable ledger. Each transaction or data modification is recorded as a "block" cryptographically linked to the previous block, forming a chain. This structure inherently resists tampering; altering any single block requires modifying all subsequent blocks and controlling a majority of the network nodes, a computationally prohibitive task. The resulting audit trail provides verifiable proof of data lineage and integrity, fostering trust among parties sharing or utilizing sensitive information. Specific implementations often employ consensus mechanisms, such as Proof-of-Work or Proof-of-Stake, to validate transactions and ensure the reliability of the blockchain.
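The cryptographic linking described above can be sketched as a minimal hash chain. This toy version covers only the tamper-evidence property; consensus, networking, and signatures are omitted:

```python
import hashlib
import json

def make_block(data, prev_hash):
    """A block commits to its payload and to the previous block's hash."""
    body = {"data": data, "prev": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(chain):
    """Recompute every hash and check each link to its predecessor."""
    for i, b in enumerate(chain):
        expected = hashlib.sha256(
            json.dumps({"data": b["data"], "prev": b["prev"]},
                       sort_keys=True).encode()).hexdigest()
        if b["hash"] != expected:
            return False
        if i > 0 and b["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block("genesis", "0" * 64)]
for tx in ["sensor-reading-1", "sensor-reading-2"]:
    chain.append(make_block(tx, chain[-1]["hash"]))

print(verify_chain(chain))     # True
chain[1]["data"] = "tampered"  # altering one past block...
print(verify_chain(chain))     # ...invalidates the chain: False
```

Because every block embeds its predecessor's hash, rewriting any past record forces recomputation of all subsequent hashes, which is exactly the property the ledger's audit trail relies on.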

The Foundations of Secure Computation
Trusted Execution Environments (TEEs) utilize hardware isolation to create secure enclaves for sensitive code and data processing. These environments, such as Intel SGX and ARM TrustZone, operate outside the normal operating system kernel, reducing the attack surface. However, TEEs incur performance overhead, with reported values ranging from 2.43% to 60% depending on the workload and specific implementation. Furthermore, despite hardware isolation, TEEs remain vulnerable to side-channel attacks, including timing attacks, power analysis, and electromagnetic radiation analysis, which can potentially leak information about the executed code and processed data. Mitigation strategies are actively being researched and implemented to address these vulnerabilities.
Remote attestation is a process by which a party can verify that a remote computing environment – typically a Trusted Execution Environment (TEE) – has been initialized correctly and is operating as expected. This verification relies on cryptographic proofs generated within the TEE, demonstrating the integrity of the loaded code and configuration. These proofs, often based on cryptographic hashes of the environment's initial state, are then sent to a verifier. The verifier compares this received proof against a known good configuration, establishing trust in the remote environment's authenticity and preventing unauthorized modifications or malicious code execution. Successful attestation provides assurance that computations performed within the TEE are conducted on a trustworthy platform, and is a prerequisite for many secure computation protocols.
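The measure-and-compare step at the heart of attestation can be illustrated as follows. In a real TEE the measurement is taken by hardware and signed with a hardware-rooted key; this hypothetical sketch keeps only the hashing and comparison logic:

```python
import hashlib
import hmac

def measure(code: bytes, config: bytes) -> str:
    """Prover side: hash the loaded code and its configuration,
    standing in for a TEE's hardware-taken measurement."""
    return hashlib.sha256(code + b"||" + config).hexdigest()

def verify(reported: str, known_good: str) -> bool:
    """Verifier side: constant-time compare against the expected
    'golden' measurement of a correctly initialized environment."""
    return hmac.compare_digest(reported, known_good)

# Hypothetical enclave contents and the verifier's golden value.
code, config = b"enclave-binary-v1", b"debug=off"
golden = measure(code, config)

print(verify(measure(code, config), golden))       # True
print(verify(measure(code, b"debug=on"), golden))  # False: config drift
```

Any change to the loaded code or configuration produces a different measurement, so the verifier rejects environments that were modified before launch.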
Secure Multi-Party Computation (SMPC) is a cryptographic protocol that allows multiple parties to jointly compute a function over their private inputs while keeping those inputs confidential. Rather than exchanging data directly, each party provides input to a distributed computation, and the result is revealed without disclosing the individual data used to generate it. This is achieved through techniques like secret sharing, where each input is divided into multiple shares distributed amongst the parties, or through garbled circuits, which allow computation on encrypted data. SMPC protocols are designed to guarantee that no party learns more than what can be inferred from the final output, even if some parties collude. Applications include privacy-preserving data mining, secure auctions, and federated learning where data remains distributed and local.
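The secret-sharing technique mentioned above can be shown with additive shares over a public modulus, here computing a joint sum. This is a semi-honest, single-process sketch with illustrative inputs; real SMPC protocols add communication and malicious-security machinery:

```python
import random

P = 2**61 - 1  # public modulus (a Mersenne prime)

def share(secret, n_parties):
    """Split a value into n random additive shares summing to it mod P.
    Any n-1 shares look uniformly random and reveal nothing."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Three parties with private inputs, e.g. local sensor totals.
inputs = [17, 25, 0]
all_shares = [share(x, 3) for x in inputs]

# Party j locally adds up the j-th share of every input...
partial = [sum(s[j] for s in all_shares) % P for j in range(3)]
# ...and only these partial sums are combined, revealing just the total.
print(sum(partial) % P)  # 42
```

No party ever sees another party's input, only random-looking shares, yet the final output equals the sum of all private values, which is the SMPC guarantee in miniature.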
Attribute-Based Encryption (ABE) is a public-key encryption technique where ciphertext is encrypted with a set of attributes, and a user's private key is associated with a set of attributes as well. Decryption is only possible if the attributes associated with the user's key match the attributes required by the ciphertext. Ciphertext-Policy ABE (CP-ABE) is an extension of ABE where the ciphertext creator defines the access policy – a Boolean expression over attributes – that determines who can decrypt the data. This policy dictates the specific combination of attributes a user must possess to gain access, offering fine-grained access control beyond traditional identity-based or role-based methods. CP-ABE is particularly useful in scenarios requiring data sharing with dynamic groups and complex access requirements, as access rights are determined by attribute possession rather than fixed identities.
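The Boolean access policies CP-ABE enforces can be illustrated by evaluating a policy tree against an attribute set. This sketch shows only the policy logic; in actual CP-ABE the check is enforced by pairing-based cryptography, not by code that can be bypassed:

```python
def satisfies(policy, attrs):
    """Evaluate a Boolean policy tree against a user's attribute set.
    Nodes are ('and'|'or', child, ...) tuples or bare attribute strings."""
    if isinstance(policy, str):
        return policy in attrs
    op, *children = policy
    results = (satisfies(c, attrs) for c in children)
    return all(results) if op == "and" else any(results)

# Hypothetical ciphertext policy: (doctor AND cardiology) OR auditor
policy = ("or", ("and", "doctor", "cardiology"), "auditor")

print(satisfies(policy, {"doctor", "cardiology"}))  # True
print(satisfies(policy, {"doctor", "oncology"}))    # False
print(satisfies(policy, {"auditor"}))               # True
```

Because the policy travels with the ciphertext, the data owner controls access by attribute combinations rather than by enumerating individual recipients, which is what makes CP-ABE suit dynamic groups.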
The Resilience of Distributed Consensus
At the core of blockchain technology lies a distributed ledger, a revolutionary approach to data recording that fundamentally alters traditional centralized systems. Unlike conventional databases managed by a single entity, a blockchain's ledger is replicated across numerous computers, creating a network of identical records. This distribution isn't merely about redundancy; it establishes inherent transparency, as any changes to the data require consensus across the network. Crucially, the ledger's immutability stems from cryptographic hashing; each block of data contains a hash of the previous block, forming a chain where altering any past record would necessitate recalculating all subsequent hashes – a computationally prohibitive task. This combination of distribution and cryptographic security ensures data integrity and builds trust by eliminating single points of failure and making tampering exceptionally difficult, fostering a verifiable and permanent record of transactions or information.
A blockchain's robustness hinges on its ability to maintain integrity even when faced with compromised components, a characteristic achieved through Byzantine Fault Tolerance (BFT). This sophisticated system doesn't require absolute trust in all participants; instead, it's designed to function correctly so long as a majority of nodes operate honestly. BFT algorithms allow the network to reach consensus despite the presence of malicious actors deliberately attempting to disrupt the process or faulty nodes simply failing. By employing techniques like redundant data verification and weighted voting, BFT ensures that erroneous or malicious information is identified and discarded, preserving the accuracy and reliability of the blockchain's ledger. This inherent resilience is paramount for applications demanding unwavering data integrity, such as financial transactions, supply chain management, and secure voting systems, effectively mitigating the risk of manipulation or failure.
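The quorum logic behind BFT can be sketched with a vote count. In classical BFT settings, n = 3f + 1 replicas tolerate up to f Byzantine ones, and a decision needs a quorum of 2f + 1 matching votes; this toy shows only that counting rule, not a full consensus protocol:

```python
from collections import Counter

def bft_decide(votes, f):
    """Accept a value only if at least 2f + 1 of the 3f + 1 replicas
    report it. Any such quorum contains at least f + 1 honest replicas,
    so two conflicting values can never both reach a quorum."""
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= 2 * f + 1 else None

# n = 4 replicas tolerate f = 1 Byzantine node.
print(bft_decide(["commit", "commit", "commit", "abort"], f=1))  # commit
print(bft_decide(["commit", "commit", "abort", "abort"], f=1))   # None
```

When no value reaches the quorum, the protocol simply does not decide, which is how BFT trades liveness for safety in the presence of faults.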
Blockchain security and transaction validation rely heavily on consensus mechanisms, most notably Proof-of-Work (PoW) and Proof-of-Stake (PoS). While both aim to establish agreement and prevent fraudulent activity, they differ significantly in practical application. Proof-of-Work, the original method, demands substantial computational effort to solve complex cryptographic puzzles, securing the network but inherently introducing latency; typical block times often exceed ten minutes, creating a bottleneck for applications requiring near-instantaneous confirmation. Proof-of-Stake offers an alternative, reducing energy consumption and potentially accelerating transaction speeds by selecting validators based on the quantity of cryptocurrency they hold and are willing to "stake" as collateral. This difference in processing time is crucial, as the limitations of PoW restrict blockchain's usability in scenarios like real-time financial transactions or time-critical data logging, pushing development towards more efficient consensus algorithms.
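The cryptographic puzzle in PoW amounts to searching for a nonce whose hash clears a difficulty target. The sketch below uses a toy difficulty so it finishes quickly; real networks tune the target so the whole network needs minutes per block, which is the latency discussed above:

```python
import hashlib

def mine(header: str, difficulty: int):
    """Search for a nonce whose SHA-256 digest starts with `difficulty`
    hex zeros. Expected attempts grow 16x per extra zero, which is
    where PoW's energy cost and block latency come from."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block-42|prev=abc123", difficulty=4)
print(nonce, digest[:12])  # the winning nonce and its leading zeros
```

Verification is asymmetric: finding the nonce takes many hash attempts, but any node can check the result with a single hash, which is what makes the puzzle useful for consensus.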
The InterPlanetary File System (IPFS) bolsters blockchain technology by offering a decentralized method of storing and accessing data, moving beyond simply recording transactions to preserving the underlying content itself. Unlike traditional storage which relies on location, IPFS uniquely identifies files by their content, ensuring data integrity – any alteration results in a different identifier. This content-addressed storage enhances availability, as data is distributed across multiple nodes, reducing the risk of single points of failure. However, utilizing IPFS isn't without cost; storing 10 megabytes of data currently incurs substantial gas fees, reaching approximately 1.57 million Gwei, representing a trade-off between enhanced security and financial expenditure for applications requiring large data storage.
A Future Built on Privacy-Enhancing Technologies
Differential Privacy (DP) represents a significant advancement in data analysis by intentionally introducing a carefully measured amount of random noise to datasets. This isn’t about obscuring information entirely; instead, DP aims to protect the privacy of individual contributors while still allowing researchers to extract valuable, statistically sound insights. The core principle involves a trade-off: the noise level is calibrated to ensure that the addition or removal of any single individual’s data has a limited impact on the overall analytical results. This prevents the re-identification of individuals and mitigates the risk of privacy breaches, even when sophisticated data mining techniques are employed. Consequently, DP enables analysis on sensitive data – such as medical records or personal financial information – without compromising the confidentiality of those whose data is being used, fostering trust and responsible data science practices.
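The calibrated-noise idea can be made concrete with the Laplace mechanism applied to a counting query. For a count, adding or removing one individual changes the result by at most 1, so the sensitivity is 1 and the noise scale is 1/epsilon. The dataset and query below are illustrative:

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a count with Laplace noise of scale sensitivity/epsilon.
    Smaller epsilon means stronger privacy and noisier answers."""
    true_count = sum(predicate(x) for x in data)
    rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical vehicle-speed records; query: how many exceed 100?
speeds = [92, 104, 88, 131, 99, 120, 77, 101]
noisy = laplace_count(speeds, lambda s: s > 100, epsilon=0.5)
print(round(noisy, 1))  # near the true count of 4, perturbed by noise
```

The released value stays statistically useful while no single record can be confidently inferred from it, which is precisely the trade-off DP formalizes through epsilon.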
Combining differential privacy with federated learning represents a powerful strategy for safeguarding sensitive data during the model training process. Federated learning allows algorithms to learn from decentralized datasets residing on individual devices or servers, eliminating the need to centralize data. However, even with this distributed approach, model updates themselves can inadvertently leak information about the underlying data. Integrating differential privacy introduces carefully calibrated noise to these updates, masking individual contributions and ensuring that the learning process doesn't reveal private details. This combined technique creates a robust system where valuable insights can be extracted from data without compromising the privacy of those who contribute to it, paving the way for responsible and ethical data science practices.
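Masking individual contributions is commonly done by clipping each client's update and adding Gaussian noise before averaging, as in DP-SGD-style aggregation. The sketch below shows only the server-side aggregation step; the clip norm and noise multiplier are illustrative, not calibrated to a specific privacy budget:

```python
import numpy as np

def dp_aggregate(updates, clip=1.0, noise_mult=0.5, rng=None):
    """Clip each client update to L2 norm `clip`, sum, add Gaussian
    noise scaled to the clip bound, then average. Clipping bounds any
    single client's influence; the noise hides which client sent what."""
    rng = rng or np.random.default_rng(0)
    clipped = [u * min(1.0, clip / max(np.linalg.norm(u), 1e-12))
               for u in updates]
    noisy_sum = sum(clipped) + rng.normal(0.0, noise_mult * clip,
                                          size=updates[0].shape)
    return noisy_sum / len(updates)

updates = [np.array([0.9, -0.4]),
           np.array([5.0, 0.0]),   # an outlier (or poisoned) update
           np.array([1.1, -0.6])]
avg = dp_aggregate(updates)
print(np.round(avg, 2))  # the outlier's pull is capped at the clip norm
```

A side benefit worth noting: the same clipping that bounds privacy leakage also limits how far any single malicious update can drag the global model.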
The convergence of differential privacy and federated learning represents a significant stride toward harnessing the power of data while upholding ethical standards. Recent analysis of seventy-five architectural designs demonstrates a marked increase in contributions centered around decentralized computation, comprising nearly half – 48% – of all published works between 2024 and 2025. This trend is further highlighted by the fact that ten out of seventeen papers published in 2025 specifically investigated this paradigm, indicating a growing focus on systems that distribute data processing and preserve individual privacy – a crucial development as data-driven insights become increasingly vital across numerous fields.
Continued research is critically needed to refine the delicate balance between data privacy, computational efficiency, and analytical utility in increasingly complex data systems. While techniques like Federated Learning offer promising avenues for collaborative analysis without direct data sharing, they remain vulnerable to attacks – specifically, Byzantine poisoning can degrade model accuracy significantly, with reported reductions ranging from a moderate 21.35% to as much as 90%. Addressing this necessitates exploring novel defense mechanisms, optimizing noise calibration strategies within Differential Privacy, and developing algorithms that maintain high performance even under adversarial conditions, ultimately striving for robust, trustworthy, and ethically sound data-driven solutions.
The survey meticulously details the escalating complexities within IoT and vehicular data sharing, highlighting the inherent tension between robust privacy and practical system efficiency. It posits that achieving true security isn't about layering on more features, but rather streamlining systems to their essential function. This aligns perfectly with the sentiment expressed by Ken Thompson: "Turn off everything you can." The article advocates for hybrid architectures – a pragmatic approach mirroring Thompson's principle. A system overburdened with unnecessary complexity, as often seen in attempts to achieve absolute privacy, ultimately diminishes its usability and trustworthiness. The core concept of minimizing unnecessary elements, thereby maximizing clarity and utility, resonates throughout both the study and Thompson's enduring philosophy.
What’s Next?
The pursuit of privacy in data sharing, particularly within the increasingly interconnected realms of IoT and vehicular networks, reveals a fundamental tension. Each proposed architecture – be it Federated Learning, Homomorphic Encryption, or Blockchain – functions as a reduction of a more complex problem, introducing its own constraints on efficiency and trust. The surveyed landscape suggests that a singular, elegant solution remains elusive, a testament to the inherent messiness of real-world systems. The field's trajectory will likely be defined not by the discovery of a perfect paradigm, but by the refinement of pragmatic compromise.
Future work must address the scalability challenges that plague many privacy-preserving techniques. Theoretical gains often diminish rapidly when confronted with the volume and velocity of data generated by connected devices. Hybrid architectures, intelligently combining the strengths of different approaches, appear most promising, though their optimal configuration remains an open question. The focus should shift from maximizing privacy in isolation to optimizing the trade-off between privacy, computational cost, and data utility.
Ultimately, the true measure of progress will not be the complexity of the algorithms employed, but the simplicity with which they integrate into existing infrastructure. A solution that demands a complete overhaul of current systems is destined to remain a thought experiment. The path forward demands a relentless pruning of unnecessary features, a commitment to clarity, and an acceptance that perfection, in this domain, is the disappearance of the architect.
Original article: https://arxiv.org/pdf/2603.01876.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-03 14:48