Contents

[Reading Notes] Blockchain management and machine learning adaptation for IoT environment in 5G and beyond networks: A systematic review
- Part -1. Q&A
- Part 0. Background
- Part 1. Contributions of the paper
- Part 2. How to write a survey (how this paper is written)
- Part 3. Other related surveys
- Part 4. Prerequisite knowledge
  - 4.1 Blockchain
  - 4.2 Machine Learning
- Part 5. BC + ML + IoT
  - 5.1 Blockchain for machine learning
    - 5.1.1 Trustless machine learning contracts
    - 5.1.2 Distributed trust in ML computations
    - 5.1.3 Verifiable open repository of ML models
    - 5.1.4 Privacy preservation
    - 5.1.5 Cryptographic security on ML data
  - 5.2 ML for blockchain
    - 5.2.1 Resource management and computational offloading
    - 5.2.2 Predicting cryptocurrency price
    - 5.2.3 ML for anomaly detection and attack prevention on blockchain
    - 5.2.4 Reducing the anonymity of the network
    - 5.2.5 Classification of blockchain-related data
- Part 6. Challenges of ML + BC + IoT
- Part 7. The authors' conclusions
- Part 8. My summary
- References
  - Article information
  - Cover information

[Reading Notes] Blockchain management and machine learning adaptation for IoT environment in 5G and beyond networks: A systematic review

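This is a CCF rank-C paper. The authors are from the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India.
🙋‍♂️ Zhang 📧 zhangruiyuan@zju.edu.cn Feel free to contact me with any questions.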

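Part -1. Q&A

What is the difference between 5G and B5G?
A: 5G mainly addresses the needs we are already familiar with, such as high-definition video and transmission rate. B5G (Beyond 5G) is about maturing the application scenarios and the technology, for example industry use in telemedicine, smart transportation, and Industry 4.0.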

Part 0. Background

Big data analytics + the need for security and privacy preservation in IoT applications = the integration of machine learning and blockchain

Keeping in view of the constraints and challenges with respect to big data analytics along with security and privacy preservation for 5G and B5G applications, the integration of machine learning and blockchain, two of the most promising technologies of the modern era is inevitable.

Introduction to IoT devices

What are IoT devices?
Over the last decade, Internet of Things (IoT) has revolutionized the whole world leading to various technological trends starting from Industry 1.0 to Industry 5.0, AR/VR/MR, smart factories, tactile Internet, smart transportation, smart plants, etc. It is an interconnection of various devices monitored and controlled using the Internet in order to provide ubiquitous computing services to the end-users.
Problems with IoT devices
Because of constraints such as heterogeneity of devices, resource constraints, power storage, security, and data management, constant revolutions are foreseen in IoT over the years. Among these, the security and privacy are most crucial keeping in view of the data access restrictions at various levels in different applications [1].
The rapid growth in the number of IoT devices
Moreover, with an increase in the number of IoT devices, the data generated by these devices is increasing exponentially in recent years. As per the report [2], the number of IoT devices connected to the Internet at the end of Nov. 2019 was 26.6 billion and is expected to reach 75 billion by the year 2025.
Privacy issues in IoT devices

Moreover, all IoT applications are having sensitive information, for which security and privacy preservation are of utmost important. Also, devices are reluctant to transfer their data for training purposes in an open environment such as the Internet because of privacy concerns [3].
Why use machine learning in IoT systems?
1. Also, IoT system needs to be autonomous (self-operating) so that it can learn from the gathered data and make context-based decisions [4]. In such an environment, machine learning (ML) can be an effective tool in understanding the patterns, analyzing, processing, and making intelligent decisions.
2. The ever-growing market for IoT demands the usage of ML-based models for accuracy and precision in the decision-making process. Implementing ML in IoT applications can significantly improve data analytics and real-time decision-making. Applications of ML in various IoT use-cases (e.g., smart transportation, smart grid, etc.) include network optimization, resource allocation, congestion avoidance [6].

Changes in the machine learning market

Fig. 1(c) shows the global ML market share from the year 2017 to 2024 [5]. Technology advancements in ML and deep learning (DL) have changed the way a computer can process information automatically.

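Applications of AI in IoT (reading this part, it is apparent that the authors cite too little; several statements that call for supporting examples have no citation)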

For example, autonomous controllers based upon Artificial Intelligence (AI) can be used to optimize energy usage [7].
predictive models for energy consumption including Markov’s decision process and NN’s can be incorporated with IoT-enabled devices [8].
To explain why I think the authors cite too little: the long passage below does not cite a single reference.

support vector machine (SVM) provides effective data classification for blockchain peers and other transactional entities. Moreover, the supervised ML algorithms such as — random forest, gradient boost, etc. are used to reduce anonymity in the blockchain network. Recently, NN’s are also exploited to predict the price of cryptocurrency. With various computing models, ML can ease data verification, validation process and helps in identification of anomalies and malicious attacks in the blockchain network. Resource management, classification of transactional entities, and managing offloading tasks are some other applications of ML for blockchain.

Introducing blockchain

With the centralized authority, threats of privacy preservation, false authentication, data tampering prevails. Also, the reliability of data is very important for ML algorithms in order to obtain accurate results. Even a small security loophole in the ML algorithm can generate high false rate for certain events. Moreover, the computations of ML models are dependent on the trusted third party (TTP) (e.g., a cloud service provider) for many security applications which may raise serious privacy concerns. Hence, there is a demand for decentralized framework based ML.

Growth of blockchain companies and the market for blockchain combined with IoT

Fig. 1(b) represents the percentage of startups in different industries focusing on blockchain in the year 2021 [9]. As per a report in [10], IoT blockchain based spending is expected to reach $573M by 2023 as compared to $174M in the year 2018 (Refer Fig. 1(a)).

Examples of using blockchain in IoT

Also, blockchain technology can provide many benefits to 5G IoT networks including secure authentication, secure communication, secure network coding, and resource configuration framework [11,12].

What does blockchain do for machine learning?

Moreover, blockchain can improve the performance of ML algorithms as it provides digitally signed data from reliable, trusted, and secure sources. The distributed computing powers can be utilized for developing a better and secure prediction model.

the adoption of ML in blockchain helps to analyze the existing issues in blockchain technology, enabling to enhance the security and privacy of the whole network.

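The figure above summarizes the paper's contribution. In my view: 1) if I were drawing it, I would add the cited references to the figure; 2) ML and blockchain are probably not in an upper-layer/lower-layer relationship, so they should be drawn separately.

When I draw figures like this in the future, I could also collect more base-station information and pictures of this kind. I think they make the figure look polished.

What does blockchain do for 5G and B5G?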

With blockchain, 5G and B5G services can be more scalable as they support efficient solutions for spectrum sharing and resource management [14].

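Part 1. Contributions of the paper

The paper presents a comprehensive taxonomy for the integration of blockchain and machine learning in an IoT environment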

Then, we presented a comprehensive taxonomy for integration of blockchain and machine learning in an IoT environment.

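The paper explores the use of federated learning, reinforcement learning, and deep learning algorithms in blockchain-based applications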

We also explored federated learning, reinforcement learning, deep learning algorithms usage in blockchain based applications.

Finally, it gives recommendations for future use cases of these technologies in 5G and B5G

Finally, we provide recommendations for future use cases of these emerging technologies in 5G and B5G technologies.

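Part 2. How to write a survey (how this paper is written)

Writing method

How to organize each paper

Organization of the paper

Section 1.1 of the paper presents the survey methodology; Section 2 discusses other surveys on ML and BC; Section 3 discusses ML and BC; Section 4 discusses ML + BC and classifies it into ML for blockchain and blockchain for ML; Section 5 presents the challenges; Section 6 gives the conclusion.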

Part 3. Other related surveys

Most existing surveys treat ML and BC separately:

Existing literature work reveals that blockchain and ML are surveyed mostly in isolation or with their applications in several vertical domains.

Other related surveys

ML

Specifically, the survey of ML models for big data analysis can be found in [16–18].
BC

Meanwhile, multiple notable works such as [19–21] provide the concepts, advantages, challenges, and future research directions of blockchain technology.
BC + IoT

The more recent survey articles in the context of blockchain applications for IoT have been presented in [22–25]
ML + IoT

whereas authors of [26–28] discuss the applications of ML models in various fields of IoT
BC + ML + IoT
- Several studies were put forward addressing the integration of Artificial Intelligence (AI) and blockchain.
  - For example, the authors of [29] presented a review article on the integration of AI and blockchain by discussing applications of blockchain for AI as well as AI for blockchain.
  - Likewise, Salah et al. [30] present the review on the literature and summarize the existing blockchain applications and protocols facilitating AI domain. Along with this, open research challenges of implementing blockchain for AI are also discussed by the authors.
- However, only a few research efforts have been made on the integration of ML and blockchain, in order to provide decision-making service in an intelligent way while assuring security and privacy.
  - For example, Vyas et al. [31] discussed the role of blockchain in improving the accuracy of ML results for healthcare applications. However, authors presented a short survey article and in-depth knowledge cannot be gained with this article.
  - In the same way, Acheampong [32] presented an overview of the basic concepts of blockchain and ML by discussing the impact of blockchain in ML community.
  - More recently, authors in [33] conducted an intensive survey that focuses on a specific application of ML for blockchain, i.e., anomaly detection. Also, this article reviews the application of blockchain for privacy preservation in learning process.
- Applying ML to BC
  - In contrast, authors of [15] presented a review to discuss the applications of ML in blockchain technology. Specifically, authors have reviewed ML for blockchain applications such as — transaction entity classification, Bitcoin price prediction, computing power allocations, cryptocurrency price prediction, and portfolio management
  - In another work, Nguyen et al. [34] presented a small section that discusses the efficiency of ML in improving blockchain cloud of things (BCOT) framework.
  - Very recently, Rane et al. [35] presented in-depth survey on available ML algorithms for predicting Bitcoin prices and concluded that existing schemes only achieve accuracy of 60%–70%.
  - Recently, Liu et al. [36] present a survey article that discusses overview, benefits, applications, open issues, and challenges while combining blockchain and ML. (By the authors' own description, this article is about combining ML and BC, so why is it listed in this paragraph? Baffling.)

The authors summarize the literature found above in the table below; I think this kind of summary is quite good.

Part 4. Prerequisite knowledge

4.1 Blockchain

Types of blockchain

private
public
consortium

Role and development of smart contracts in blockchain

The applications of smart contract are not only limited to cryptocurrency but can be extended to many applications including voting systems, inventory management, automation of payments, automation of claims and blind auctions, etc.

Solidity: Solidity [42] is the most popular high-level programming language used for implementing smart contracts on the Ethereum platform. This language is influenced by C++, Python, and JavaScript.
Serpent [43] is inspired from the Python language which focuses on delivering high productivity and automating tasks.
After Solidity, Vyper [44] is the next most popular language for Ethereum virtual machine (EVM) having syntax inspired from Python.
LLL (Lisp like language) is the first low-level language developed after the assembler for EVM and it is a tiny wrapper over coding around the assembler itself. LLL provides direct access to memory in an execution environment and can be easily optimized for speed.

Why must IoT use decentralization?

Moreover, with an over-increasing deployment of IoT objects, security is of prime concern. Cloud computing has been widely used to support IoT for management, processing, and storage.

However, its centralized nature raises security questions. Centralized servers managing sensitive IoT data can be shared with anybody without the user’s consent, thus leading to privacy breaches [45].
Also, the intermediaries decrease the efficiency of interactions among system components. Also, with an increase in the number of IoT devices, current centralized devices providing security services including authentication and authorization will turn into a bottleneck.
Moreover, the security vulnerability because of centralization is an easy target for Distributed denial-of-service (DDoS) attacks.
Additionally, to ensure data integrity presence of publicly verifiable audits without involving a TTP is desirable. In this context, blockchain can mitigate security and privacy risks with its capabilities such as transparency, immutability, anonymity, decentralization, and operational resilience [4].

How to address the compute and storage constraints in IoT scenarios?

However, to support resource-constrained nature of IoT devices blockchain provides the concept of Simplified payment verification, in which nodes need not to store complete blockchain data rather only block headers. In this context, Le and Mutka [46] proposed a lightweight method to validate blockchain data using bloom filter (probabilistic data structure).
Similarly, authors in [47] presented a proposal that integrates blockchain with constrained IoT devices. The evaluation of the proposal is carried out in terms of memory, processing time, and power consumption.
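
To make the Bloom filter idea concrete, here is a minimal sketch of the data structure itself. It is not the method of Le and Mutka [46]; the filter size, the number of hash functions, the salted SHA-256 hashing, and the `tx-00x` identifiers are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive one bit position per hash by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# A constrained node keeps this compact filter instead of every transaction.
tx_filter = BloomFilter()
for tx_id in (b"tx-001", b"tx-002", b"tx-003"):
    tx_filter.add(tx_id)
```

A `False` from `might_contain` is definitive, while a `True` only means "probably present", so a light node would still need a stronger check (for example against block headers) before trusting a positive answer.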

Problems to solve when applying blockchain in IoT scenarios (the authors' summary)

However, high computation, storage costs, high energy demands, communication hurdles, mobility of devices, and latency are some of the challenges faced while integrating blockchain with IoT. In an IoT network, devices generate gigabytes of data in real-time. Due to lack of storage blockchain might appear unsuitable for IoT networks. The limited resource IoT devices are also unsuitable for highly computational PoW consensus algorithm. Hence, the scalability issue of integrating blockchain and IoT needs an immediate effective solution. Also, different characteristics of IoT network such as heterogeneity, wireless communication and mobility complicates the security challenge. Moreover, the transparency supported by IoT can affect the privacy of data. Last but not the least, lack of regulations and standards can influence the future of blockchain and IoT.

4.2 Machine Learning

An introduction to machine learning; worth a quick look, it is the usual material.

ML is a branch of AI that makes programming machines to perform particular tasks by learning. With time, ML models have been able to exceed humans in various problems. Particularly, previous experience is used to execute assigned tasks. ML algorithms have proved their significance in various areas such as transportation, image processing, marketing, etc. ML includes various models to solve different types of problems. The most commonly used ML models involve SVM, Artificial Neural Networks (ANN), decision trees, etc. to name a few. Building a new ML model involves two steps, i.e., training and testing in order to perform tasks of prediction, classification, clustering, etc. on new dataset. Indeed, data is an important source in ML. The data is required in preprocessing and training any ML model. First, the ML model is trained with a training dataset. With the increase in size of training data, the efficiency of ML classifier also increases [48]. Next, after the training phase, the accuracy of the prediction is evaluated with a new dataset. In case of acceptable accuracy, the ML model is deployed otherwise it is trained again.

In recent, a popular subcategory of ML named deep learning (DL) has emerged to imitate the human thinking process. The fundamentals of DL have been originated from cognitive theories that are used to create NN structure. Popular applications of DL include object detection, face recognition, and traffic flow prediction to name a few [49].

Supervised learning, unsupervised learning, and reinforcement learning (RL) are three categorizations of learning styles in ML algorithms. In supervised learning, the machine is trained with well labeled data, i.e., the data is already mapped with the correct answer. Next, the machine is fed with a completely new set of data to generate correct results from analyzing the labeled data from training phase. Furthermore, supervised learning is divided into two categories that include classification and regression. SVM, decision trees, nearest neighbor, etc. are popular algorithms under this category.

In contrast, unsupervised learning is training the machine with input data that is not labelled or classified. Specifically, the aim is to group unsorted data as per similarity and difference such as pattern detection and descriptive modeling. Clustering and association are two categories of unsupervised learning [50]. K-means clustering and Principal Component Analysis (PCA) are popular algorithms under this category.

In RL, an agent is employed to interact with the environment in order to find best outcome by continuously learning from the environment. RL uses trial-and-error method to train itself when exposed to a certain environment. Markov’s decision process is a popular example of RL. Notably, there are vulnerabilities in ML models system with respect to privacy and security.
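
The train, test, then deploy-or-retrain loop described above can be sketched with a tiny nearest-neighbor classifier. The data points, the labels, and the `DEPLOY_THRESHOLD` value are made up for illustration; this is a sketch of the workflow only, not anything taken from the paper.

```python
import math


def nearest_neighbor_predict(train, point):
    """Return the label of the labelled training sample closest to `point`."""
    closest = min(train, key=lambda sample: math.dist(sample[0], point))
    return closest[1]


def accuracy(train, test):
    """Fraction of unseen test samples whose label is predicted correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


# Training phase: labelled data (supervised learning).
train_set = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
             ((8.0, 8.0), "high"), ((9.0, 7.5), "high")]
# Testing phase: new data the model has not seen.
test_set = [((2.0, 1.0), "low"), ((7.0, 9.0), "high")]

test_accuracy = accuracy(train_set, test_set)
DEPLOY_THRESHOLD = 0.9  # hypothetical acceptance level
decision = "deploy" if test_accuracy >= DEPLOY_THRESHOLD else "retrain"
```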

Security attacks on ML according to the authors

Security attack in ML mainly includes evasion and poisoning attacks. Evasion attacks disrupt the entire classification process using adversarial examples whereas the poisoning attack destroys the data while training phase, which can decrease model accuracy [51]. On other hand, the privacy attack on ML model comes from service providers and third-party entities. Clearly, the development of ML models empowers to launch new AI services including facial recognition and words suggestion. Nevertheless, the dataset provided to support these applications often includes sensitive and private information.

Part 5. BC + ML + IoT

This section follows the structure of the mind map below.

5.1 Blockchain for machine learning

Blockchain for ML can solve the problem of data acquisition

With blockchain, the ML algorithm can be fed with highly reliable data and thus accurate and trusted results can be achieved. Also, training ML models with real-data will enhance the accuracy and efficiency of ML algorithms. The built-in consensus mechanism and fundamentals of blockchain ensure secure and tamper-proof sharing of IoT data.
Moreover, the existing client-master type ML models rely on trusted central servers and consider only privacy issues in linear sharing and ignore privacy in non-linear learning models. In the client-master model, an enormous amount of data generated by IoT devices is collected and stored at one central location whereas, in the distributed multi-party model, data is generated by various parties and stored in a distributed manner. However, the decentralized model incurs high communication costs and raises security and privacy issues. The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data. As discussed, more amount of data available for training improves the overall throughput and produces a more effective and reliable system. Clearly, blockchain in ML can result in much safer data and better ML models.

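Why do the authors say that existing models consider "only privacy issues in linear sharing and ignore privacy in non-linear learning models"? I cannot answer this myself. What is the difference between linear and non-linear models? https://zhuanlan.zhihu.com/p/37866896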

The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data.

Hmm. As I understand it, blockchain transparency means that operations on the chain are public, which prevents data from being tampered with; but I do not see why that would guarantee confidentiality and data privacy for ML users.

Applications of blockchain for machine learning

5.1.1 Trustless machine learning contracts

Smart contracts on the blockchain are used to build an incentive mechanism for machine learning, i.e., making full use of the trustless nature of blockchain.

The proposal introduced by [56] implements the concept of trustless ML contract and it is defined in 3 phases. In the first phase, a dataset, an evaluation function, amount of reward, and request for best ML model is submitted by the reward giver/buyer. In the second phase, the provided dataset is downloaded by ML model providers/practitioners and each provider works independently in order to train the ML model. After training, the providers submit their model. In the last phase, the winner is selected. Moreover, such a proposal can be utilized for raising funds transparently for IoT applications such as medical research. In addition, it can achieve automated self-improvement for AI agents. Unfortunately, this proposal [56] does not require identity and reputation validation for creating a new transaction and hence raises security concerns. Also, this proposal works only for Ethereum blockchain. Fig. 8 represents an illustration of trustless ML contracts.
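
The three phases can be simulated off-chain in a few lines. This is a rough sketch, not the contract from [56]: the class `MLBountyContract`, the scoring function, the provider names, and the toy dataset are all hypothetical, and a real deployment would be written as an Ethereum smart contract rather than Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Dataset = List[Tuple[float, float]]  # (input, expected output) pairs
Model = Callable[[float], float]


@dataclass
class MLBountyContract:
    """Off-chain simulation of the three-phase flow; not real contract code."""
    dataset: Dataset
    evaluate: Callable[[Model, Dataset], float]  # higher score is better
    reward: int
    submissions: Dict[str, Model] = field(default_factory=dict)
    winner: Optional[str] = None

    def submit(self, provider: str, model: Model) -> None:
        # Phase 2: providers train independently, then submit their model.
        if self.winner is not None:
            raise RuntimeError("contract already settled")
        self.submissions[provider] = model

    def settle(self) -> Tuple[str, int]:
        # Phase 3: score every submission and select the best one.
        if not self.submissions:
            raise RuntimeError("no submissions")
        self.winner = max(
            self.submissions,
            key=lambda p: self.evaluate(self.submissions[p], self.dataset),
        )
        return self.winner, self.reward


def negative_squared_error(model: Model, data: Dataset) -> float:
    return -sum((model(x) - y) ** 2 for x, y in data)


# Phase 1: the buyer posts a dataset, an evaluation function and a reward.
contract = MLBountyContract(
    dataset=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    evaluate=negative_squared_error,
    reward=100,
)
contract.submit("alice", lambda x: 2.0 * x)
contract.submit("bob", lambda x: x + 1.0)
winner, payout = contract.settle()
```

Note that `submit` accepts any provider name without checking identity or reputation, which mirrors the security concern raised above.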

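5.1.2 Distributed trust in ML computations

This section emphasizes that blockchain can solve the centralization problem of traditional distributed machine learning; the point is blockchain's decentralization.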

Another matter to be considered in the context of ML is that these algorithms lack trustability and automation.

Notably, it is difficult to trust results from trained ML models having open source code and open data in an IoT environment.
In fact, multi agent socio-technical systems (which work collaboratively on some tasks, share models and data for local computations) due to the involvement of independent agents face trust issues in computations from other agents.

Centralized systems face the threat of data tampering

As ML algorithm relies on data that is mutable, so it is difficult to trust the results from these algorithms. The system administrator can manipulate the data source that in return changes the result.

Most current ML models are operated by hand and lack automation. So how do we build a trusted, transparent platform for collaborative computation? With cryptographic techniques!

Also, existing ML models are mostly controlled by human beings so it is difficult to automate the ML algorithms. Hence, there is a need for developing an environment having trust and transparency in computations for collaborative operations. To solve this problem, zero-knowledge proof, Elliptic-curve cryptography (ECC), etc. are some cryptographic techniques that are effective in the verification and validation of computations [73,74].

In this context, Raman et al. [57] proposed a model for verification and validation of computations in a permissioned blockchain network for multi-agent socio-technical system. Authors have demonstrated the usage of blockchain in developing trust for recording and validating audit at each step of computations.
- However, due to lack of scalability large scale computations for a multi agent network prove expensive.
  For this, the authors have used a lossy compression technique that reduces the communication and storage cost of the blockchain network. (This is a model-compression paper; I should read it later.)
Similarly, authors of [62] established a link between ML and blockchain technology in order to solve trustability and automation issues of ML by using association rule mining.
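
The idea of recording and validating an audit trail at each computation step can be illustrated with a hash-chained log. This is only a sketch of the general tamper-evidence principle, not the permissioned-blockchain protocol of Raman et al. [57]; the field names and the agent and step labels are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("agent", "step", "result", "prev_hash")


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def record_step(log, agent, step, result):
    """Append a computation step, chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"agent": agent, "step": step, "result": result, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify_log(log):
    """Recompute every hash; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in FIELDS}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
record_step(audit_log, "agent-A", "local_gradient", 0.42)
record_step(audit_log, "agent-B", "aggregate", 0.40)
intact = verify_log(audit_log)

# An administrator silently changes a recorded result.
audit_log[0]["result"] = 0.99
tamper_detected = not verify_log(audit_log)
```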

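5.1.3 Verifiable open repository of ML models

Using ML as the blockchain mining process (i.e., the "verifiable" part happens on the mining nodes)

For example, using the training process in place of the blockchain consensus algorithm. But this does not feel like something blockchain does for ML; it looks more like something ML does for blockchain. Does the paper mean using blockchain to build a chain of ML models? Is this passage about using blockchain to build an ML framework?

This section also covers the difficulties encountered when using blockchain (smart contracts) to do work for ML, along with the literature that addresses them.

Drawbacks of the PoW consensus algorithm?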

Among all research work on consensus algorithms, Proof-of-Work(PoW) is the widely accepted technical consensus algorithm use to settle among all participating nodes. However, the PoW consensus algorithm proves costly and environmental unfriendly due to the high computations involved in it. After PoW many other consensus algorithms such as — Proof-of-Stake (PoS), Proof-of-Activity (PoA) were introduced in order to reduce computations while mining blocks.

In this context, the authors of [58] introduced a cryptocurrency named ‘‘WekaCoin’’ that is based on Proof-of-Learning (PoL) consensus algorithm. PoL is inspired by open-source ML competitions (e.g. Kaggle and CodaLab). Among all network nodes, some nodes called trainers upload ML models on blockchain network for tasks that were submitted by other nodes called suppliers. (The model initiator may upload their model on a Interplanetary file system (IPFS) system and in return receives checksum hash.) The uploaded models are then tested for data that was not considered by trainers while training. The validator nodes which are selected randomly are then supposed to rank these models and add the information to the block. The trainer nodes having the best model are rewarded with WekaCoins by supplier nodes. This way blockchain can be used for generating verifiable ML models. The flowchart for the understanding of PoL algorithm is presented in Fig. 9. The main advantage of this protocol is that the computations involved in the validation process solve useful tasks as well as creates a validated open repository for ML models and datasets. However, the authors have not discussed the prevention of collusion among suppliers, trainers, and validators.
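
The validation step of PoL (randomly selected validators rank trainer models on held-out data and write the ranking into a block) might look roughly like the sketch below. It is not WekaCoin's actual protocol: seeding the selection with the previous block hash, the threshold "models", and the block layout are all assumptions made for illustration.

```python
import hashlib
import json
import random


def select_validators(nodes, prev_block_hash, count):
    # Assumption: seeding with the previous block hash lets every node
    # derive the same "random" validator set independently.
    rng = random.Random(prev_block_hash)
    return rng.sample(sorted(nodes), count)


def rank_models(models, held_out):
    """Order trainer names by accuracy on data the trainers never saw."""
    def accuracy(name):
        return sum(models[name](x) == y for x, y in held_out) / len(held_out)
    return sorted(models, key=accuracy, reverse=True)


held_out = [(1, 0), (2, 0), (6, 1), (9, 1)]
models = {
    "trainer-1": lambda x: int(x > 4),
    "trainer-2": lambda x: int(x > 8),
    "trainer-3": lambda x: 1,
}

prev_hash = hashlib.sha256(b"genesis").hexdigest()
validators = select_validators({"v1", "v2", "v3", "v4"}, prev_hash, count=2)
ranking = rank_models(models, held_out)

# The ranking is recorded in the new block; the top trainer earns the reward.
block = {"prev_hash": prev_hash, "validators": validators, "ranking": ranking}
block_hash = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
```

Nothing here stops a validator from colluding with a trainer, which is exactly the gap noted above.
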
In contrast to the permissionless blockchain, authors of [69] developed privacy preserving distributed ML model based on permissioned blockchain network. This is, however, a first attempt to propose a distributed ML model for a permissioned blockchain network. Decentralized ML allows machines to perform intelligent decision-making on data securely stored on the blockchain network without involving any TTP. The decentralized ML technique allows algorithms or ML models to run directly on connected mobile devices. This distributed technology is smart contract based marketplace that connects developers, clients, and data owners by facilitating all stakeholders in a way to create a middle-man free ML infrastructure. The authors demonstrated that the impact of proposed error based aggregation rule supports high resilience and mitigates collusion attack.
However, latency and bandwidth are the major drawback of distributed ML [75]. To improve network condition, 5G technology can be adopted as it enables high availability. In this direction, to ensure byzantine resilience for distributive learning in 5G networks, authors in [70] have proposed a blockchain based secure computing framework. By using a sharding based blockchain, authors have prevented arbitrary attacks on learning convergence.

Problems with smart contracts?

Smart contracts cannot execute heavy tasks

However, authors of [76] pointed out that ML programs cannot be stored with blockchain because of the certain limitations of smart contracts. The authors pointed out that smart contracts cannot process high computational tasks.

The idea in this passage is that the cost of computation affects mining (this is consistent with the idea in my own survey; I think I can cite it)

With the blockchain mining process, when output corresponding to any input is expected to be recorded via smart contracts, honest miners then execute the program to verify the correctness of results. In case of a computationally high process, adversarial nodes can skip and carry forward to verify the new block. This way adversarial nodes can get a chance of adding new blocks as honest participants are busy with the execution of smart contracts.

In addition, smart contracts cannot perform randomized computations.

Moreover, the smart contract cannot carry randomized computations as with randomization honest nodes can have inconsistent output. Besides, as ML computations are costly and randomized, so ML tasks are difficult to execute with blockchain. To address this challenge, the authors of [76] have used a game theory approach that empowers randomized computations on the top of blockchain. Here, a simple incentive mechanism is designed in order to execute the program with crowdsourcing in a blockchain environment.
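
The inconsistency problem is easy to reproduce: if each node draws its own random numbers, the result hashes differ and the nodes cannot agree on the output. The sketch below shows only the problem, not the game-theoretic mechanism of [76]; the function name and seeds are hypothetical.

```python
import hashlib
import random


def randomized_task(seed):
    """Stand-in for a randomized ML step, e.g. random weight initialisation."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return hashlib.sha256(repr(weights).encode()).hexdigest()


# Two honest nodes with independent randomness produce different outputs,
# so neither can verify the other's result by re-execution.
node_a = randomized_task(seed=1)
node_b = randomized_task(seed=2)
nodes_disagree = node_a != node_b

# Re-execution only matches when the randomness itself is identical.
same_seed_agrees = randomized_task(seed=7) == randomized_task(seed=7)
```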

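5.1.4 Privacy preservation

Blockchain is used to solve the privacy problems encountered in ML. The emphasis here is presumably on blockchain's cryptographic techniques and immutability? I am not sure.

For example: protecting privacy during upload, and using blockchain to secure federated learning (though this one feels to me like work ML does for blockchain. Blockchain can indeed protect federated learning, but is that really privacy preservation?)

Why does ML run into privacy problems?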

Another matter to be considered in the context of ML is the privacy preservation of data. For example, ML healthcare predictive modeling has proved beneficial in national healthcare research and biomedical discoveries. However, data disclosure of patients to these third-party cloud services leads to privacy attacks. The available distributed privacy preserving predictive models are dependent on the central server to execute the modeling process [77].

The following sentences probably do not support the point of this paragraph

Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.

The authors conclude: The state-of-art research has adopted blockchain technology in order to deal with the above-mentioned risks. The characteristics of blockchain technology make it suitable to deal with centralized privacy preservation models.

But personally, I do not think the above shows that blockchain can protect data privacy.

Single point of failure and other problems in ML?

Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.

A: Blockchain avoids a single point of failure, Byzantine General, and Sybil attack problem and preserves privacy while predicting the modeling process.

Examples the paper gives of blockchain protecting ML privacy

In this context, Kou et al. [59] have presented Modelchain, a private blockchain that enabled privacy preserving predictive modeling for the healthcare industry. Instead of relying on only PoW protocol, the authors have designed a new algorithm on the top of PoW named proof of information to increase the efficiency and accuracy of ML model. Unfortunately, the proof of information algorithm proves inefficient to deal with the scalability of the network. The result section demonstrated that Modelchain provides a secure and privacy preserving interoperability framework. Unfortunately, privacy preservation is provided but the authors of [59] did not consider the basic requirements for differential privacy as differential privacy based ML has to consider the fact that how many times a ML model can be trained without any privacy breach.

I did not really understand the problem the authors describe with the paper above!
Subsequently, Chen et al. [65] proposed another decentralized ML system called ‘‘Learningchain’’ that takes both linear and non-linear learning models in account without relying on the central server. Here, differential privacy based methods are also designed to preserve the privacy of data. Differential privacy or cryptographic solutions have proved to be efficient for preserving user’s data privacy [78–80]. This model is implemented on the Ethereum platform and a stochastic gradient descent algorithm is used to design a predictive model over blockchain.

The proposal works in 3 phases. In the first phase, a P2P network is initialized. In the second phase, data holders calculate their local gradients as per predefined common loss function and predictive model using differential privacy methods. Next, computed gradients are broadcasted in the network using differential privacy scheme for learning models. After reaching a consensus, local gradients are aggregated by the authority holder using Learningchain. Three different datasets were used for training and testing purposes, i.e., synthetic dataset, Wisconsin breast cancer dataset, and Modified National Institute of Standards and Technology database (MNIST) dataset. It is concluded in results that there is a trade-off between privacy and accuracy as lowering the privacy budget increases test errors.
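
The second and third phases (local gradient, privatized broadcast, aggregation) can be sketched with a plain Laplace mechanism. This is not Learningchain's actual scheme, whose details are not given here: the linear model, the coordinate-wise clipping bound `clip`, the privacy budget `epsilon`, the learning rate, and the toy data are all assumptions.

```python
import random


def local_gradient(weights, samples):
    """Mean gradient of squared error for a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * error * xi / len(samples)
    return grad


def privatize(grad, clip, epsilon, rng):
    """Clip each coordinate to [-clip, clip], then add Laplace noise."""
    clipped = [max(-clip, min(clip, g)) for g in grad]
    # L1 sensitivity of the clipped vector is at most 2 * clip * d.
    scale = 2.0 * clip * len(grad) / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return [g + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for g in clipped]


def aggregate(gradients):
    """Average the broadcast gradients coordinate by coordinate."""
    return [sum(column) / len(gradients) for column in zip(*gradients)]


rng = random.Random(0)
weights = [0.0, 0.0]
holders = [
    [((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0)],  # data holder 1 (private)
    [((1.0, 1.0), 3.0)],                     # data holder 2 (private)
]
noisy = [privatize(local_gradient(weights, data), clip=1.0, epsilon=1.0, rng=rng)
         for data in holders]
update = aggregate(noisy)
weights = [w - 0.1 * g for w, g in zip(weights, update)]
```

Because the noise scale is inversely proportional to `epsilon`, a smaller privacy budget means noisier gradients, which is the privacy/accuracy trade-off reported in the results.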

Protecting privacy when uploading data

With the growing trend of DL models, many DL models are designed to be run on client devices such as — IoT devices or smart devices. Although this technique demands enough memory and disk space to run the models in real-time. Also, because of privacy concerns, it is not recommended to upload client data on a centralized machine for processing and executing ML algorithms.

Along the same line of thought, to preserve privacy while uploading client ML data, authors of [61] proposed another work. Singla et al. [61] proposed a blockchain-based system that stores client device profiles in a shared household to predict user activity. Here, the main aim is to enable automatic customization of each client using blockchain decentralized security and privacy. The personalization feature of each device is computed using rule mining. However, this proposal is based on the assumption that client preferences are not changing.

Solving the collaborative data sharing problem

Similarly, to solve the challenge in collaborative data sharing among multiple parties in IoT applications, Lu et al. [66] proposed a privacy preserving data sharing model using differential privacy methods.

Introducing federated learning

However, rather than sharing raw data directly, the federated learning algorithm is integrated into a permissioned blockchain network, through which only the data model is shared among multiple decentralized parties. In a centralized ML model, participants upload their data to a central cloud server. The server performs all computational tasks for training on the data, as shown in Fig. 10(a).

In praise of federated learning

This model involves high risks of privacy attacks. Also, communication overhead is created between participants and the cloud server. In contrast, federated learning enables ML models to be computed on distributed mobile devices. This technique helps ML models to be trained on the devices where data is produced. This way the privacy of data is ensured as data of a particular device does not leave its data production place. This technique is disrupting the centralized way of data training.

The federated learning process

In federated learning, each device has its local training dataset that is never seen by the server and each device generates an update to the existing global model located at the server. Next, the server combines these models by aggregating them and the whole process is repeated until global model training is completed. The primary benefit of federated learning is the decoupling of the training phase from the requirement of direct access to raw training data. The process of federated learning based model is represented in Fig. 10(b).
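The loop described above (local training, server-side aggregation, repeat) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy linear model, the learning rate, and the choice to weight each update by its sample count (the common federated averaging rule) are all assumptions, and `local_train`, `federated_average`, and `run_rounds` are hypothetical names.

```python
def local_train(global_weights, local_data, lr=0.05, epochs=20):
    """Refine the global model y = w0 + w1 * x using only this device's data."""
    w0, w1 = global_weights
    n = len(local_data)
    for _ in range(epochs):
        # Gradient of half the mean squared error over the local dataset.
        g0 = sum((w0 + w1 * x - y) for x, y in local_data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in local_data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]

def federated_average(client_updates):
    """Aggregate (weights, num_samples) pairs, weighting by sample count."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]

def run_rounds(device_datasets, rounds=50):
    """Repeat: devices train locally, the server averages the updates."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only model weights leave each device; raw data never does.
        updates = [(local_train(global_weights, d), len(d)) for d in device_datasets]
        global_weights = federated_average(updates)
    return global_weights

if __name__ == "__main__":
    # Two devices holding points from the same line y = 2x + 1.
    devices = [[(0, 1), (1, 3)], [(2, 5), (3, 7), (4, 9)]]
    print(run_rounds(devices))
```

The server function only ever sees weight vectors and sample counts, which is the decoupling from raw training data that the paragraph above describes.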

Why federated learning should be combined with blockchain

Therefore, it minimizes training and privacy risk. However, relying on a single central server creates a single point of failure. Moreover, there is no reward service for distributed devices. Notably, devices with more data samples should be rewarded more, as they contribute more to global training. With blockchain, verified local updates and exchanges can be enabled, along with corresponding rewards proportional to the training sample size. Blockchain based federated learning is illustrated in Fig. 11.
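As a small worked example of rewards proportional to training sample size, the sketch below splits a fixed reward pool among devices. The pool size, device names, and the function `split_rewards` are hypothetical; the paper excerpt only states the proportionality, not a concrete formula.

```python
def split_rewards(sample_counts, reward_pool):
    """Give each device a share of reward_pool proportional to its sample count."""
    total = sum(sample_counts.values())
    return {device: reward_pool * n / total for device, n in sample_counts.items()}

if __name__ == "__main__":
    # A device with three times the samples receives three times the reward.
    print(split_rewards({"device_a": 100, "device_b": 300}, 40.0))
```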

The "pretending to hold data" attack when federated learning is combined with blockchain, and example solutions

Unfortunately, the federated learning technique fails to provide security in the presence of Byzantine nodes. If an attacker pretends to be a real data holder and breaks the security of the system, such an attacker is called a Byzantine attacker.

In another work, Zhu et al. [67] also presented a blockchain based privacy preserving method for securing updates and achieving consensus in federated learning. Here, blockchain technology is adopted to deal with Byzantine devices in the network. In particular, only updates are added to blockchain transaction records. Along with a node's digital signature, other information such as hyper-parameters, differences in weights, and public IDs is also broadcast. The other participants of the network validate the broadcast transactions against their local datasets. If a majority of the participants confirm that the performance score of the updated model is greater than that of the existing model, the updates are added to the model.
Similarly, Doku et al. [63] also integrated blockchain technology and federated learning to improve the quality of data. Here, the hash of mobile device data is stored on blockchain whereas data still remains on the user’s device, and only the locally analyzed results will be shared with ML practitioners via a secure network. In addition to this, incentives will be provided to data owners.
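The majority-approval rule described for [67] can be sketched as below. This is only an illustration under stated assumptions: the "model" is a one-parameter threshold classifier, the score is plain accuracy on each validator's local data, and the names `score` and `majority_approves` are hypothetical. The actual scoring function and consensus details of [67] are not given in these notes.

```python
def score(threshold, dataset):
    """Accuracy of the rule 'predict 1 when x >= threshold' on (x, label) pairs."""
    correct = sum(1 for x, label in dataset if (x >= threshold) == bool(label))
    return correct / len(dataset)

def majority_approves(current_model, candidate_model, validator_datasets):
    """Accept the candidate only if a strict majority of validators see a
    higher score for it than for the current model on their own local data."""
    votes = sum(
        1
        for data in validator_datasets
        if score(candidate_model, data) > score(current_model, data)
    )
    return votes * 2 > len(validator_datasets)

if __name__ == "__main__":
    validators = [
        [(1, 0), (2, 0), (6, 1), (7, 1)],
        [(0, 0), (3, 0), (5, 1), (8, 1)],
        [(2, 0), (4, 0), (6, 1), (9, 1)],
    ]
    print(majority_approves(0.5, 4.5, validators))    # honest update
    print(majority_approves(0.5, 100.0, validators))  # useless (Byzantine) update
```

An update submitted by a node that holds no real data tends to score poorly on the validators' datasets, so it fails to collect a majority and is not added.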

Using blockchain to strengthen the security of federated learning

Additionally, to enhance the security of federated learning, the authors of [71] proposed a blockchain based framework to verify and exchange local learning models. This scheme aims to enable on-device ML without involving any centralized server. A reward mechanism is also proposed for user and miner node participation. The authors have also evaluated the end-to-end average learning completion latency.
In a closely related work, the authors of [72] proposed federated learning with multi-access edge computing and blockchain technology. Here, edge devices provide resources to mobile devices and also act as blockchain nodes, and a separate channel is dedicated to the learning of every global model in the blockchain network.
- Unfortunately, in this proposal user devices are dependent on the integrity of corresponding edge nodes for sending transactions to blockchain networks. Additionally, no reward mechanism for user and miner nodes is designed by authors.

Protecting data security in the ML process

Also, the authors of [81] leveraged a suite of ML techniques to support data exchange on the blockchain via smart contracts for a distributed data vending architecture. In particular, data embedding and distance metric learning approaches from ML research are used to enable retrieval of smart contracts without affecting the integrity of private data. Here, the signature of a data entry is generated using a privacy preserving data embedding procedure, and the signatures are then used to measure similarity among data entries.
In an alternative work, the authors of [64] also proposed a blockchain based model named "secureSVM" for privacy preserving data sharing while training ML algorithms. Here, the IoT data generator encrypts data on the local device with its private key, and this encrypted data is stored on the blockchain. The experimental results indicate that incorporating blockchain with the SVM classifier improves the accuracy of the system model.

I did not understand the two examples above!!!

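5.1.5 Cryptographic security on ML data

Using blockchain to secure access to the data that ML uses.

But I get the feeling that this section's content was already covered in the previous section, hmm

Using a blockchain based access control manager to securely access, in real time, data stored in different places

Can this be understood as secure access control for the data ML uses???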

Classifying IoT data under a black-box concept raises questions about the type of data being collected. Hence, the system needs to attain confidentiality, integrity, anonymity, and secure access to data. The authors of [60] have used blockchain in retraining a stacked denoising autoencoder (SDA) algorithm for arrhythmia classification. Retraining is used to address the non-stationary nature of ECG data because it enables the deep network to learn any new distribution at specific time intervals, whereas SDA has the feature of extracting different relevant features from data samples. Here, patient data that is kept on external storage and collected by the retraining SDA algorithm is securely accessed in real time using a blockchain based access control manager. A scenario of blockchain based secure access control on ML data is represented in Fig. 12.

Is this studying the role of blockchain inside a CNN architecture? I can't quite follow it

More recently, Goel et al. [68] experimentally investigated the role of blockchain in providing authenticity to each block of a Convolutional Neural Network (CNN) model. In a CNN, each convolution layer is referred to as a block, and the authors pinpointed the accountability of each block for correct output. To this end, the blocks of the CNN are kept in random order, and neighbor blocks hold the information about the next legitimate block. Hiding the architecture of the network from an attacker mitigates the threat of white-box adversarial attacks. Also, this scheme enhances transparency between blocks and the entire network. Unfortunately, the complexity of the system is quite high.

Using blockchain to provide anonymity for ML

Another potential application of blockchain for ML is in providing anonymity. As discussed earlier, if the data is stored anonymously, it is hard to link it to the true identity of the person. The authors of [82] pointed out that the pseudo-anonymity provided by blockchain can encourage the use of ML on anonymous datasets. Researchers can now use massive datasets for their research in order to improve the prediction results of healthcare systems. However, along with anonymity, encrypting data could enhance the security of the system. To address this challenge, homomorphic encryption was introduced, which can execute ML operations on encrypted data [83].
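To make "ML operations on encrypted data" concrete, here is a toy sketch of the Paillier cryptosystem, which is additively homomorphic. It is not necessarily the scheme used in [83]; the choice of Paillier, the tiny primes (insecure, for readability only), and all function names are assumptions for illustration. The sketch computes a linear model score on encrypted features without decrypting them.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, r=None):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while r is None or math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to k multiplies the plaintext by k."""
    return pow(c, k, n * n)

def encrypted_linear_score(n, encrypted_features, weights):
    """Compute Enc(sum(w_i * x_i)) from Enc(x_i) and plaintext weights w_i."""
    acc = encrypt(n, 0)
    for c, w in zip(encrypted_features, weights):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, w))
    return acc

if __name__ == "__main__":
    n, private_key = keygen(1009, 1013)  # toy primes, far too small for real use
    features = [encrypt(n, 3), encrypt(n, 5)]
    score = encrypted_linear_score(n, features, [2, 4])
    print(decrypt(n, private_key, score))  # 2*3 + 4*5
```

The party that evaluates `encrypted_linear_score` never sees the feature values; only the private key holder can read the result.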

Summary of this section

Summary and insights: Section 4.1 focuses on various applications of blockchain technology targeting ML areas for the IoT environment. Incorporating blockchain technology in ML provides reliable sharing of data for different ML tasks, including prediction, forecasting, and voice and speech recognition, to name a few. However, we have made several observations after reviewing and tabulating the literature. Clearly, with trustless ML contracts, trustless rewards can be provided to the best ML model. However, there are some risks in the proposal that need to be dealt with. For example, the organizer may refuse to reveal the testing dataset, which may discourage submitters from contributing their work, as no evaluation function would then be available. Moreover, based on the selection criteria, the reward money can be claimed by the first submitter fulfilling the evaluation criteria. Hence, the reward mechanism could be distributed more evenly in order to incentivize more participation. Also, it has been observed that nodes are still reluctant to host data on IPFS blockchain data storage. So, future work should consider these problems before designing revised trustless ML contracts.

Additionally, it has been observed that most of the proposals have leveraged public blockchain which makes data generation speed slow. Hence, a fast data stream situation in blockchain is another important topic of research. Moreover, it has been observed that researchers have not considered the confidentiality of ML data in the blockchain network.

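I have a question: if the confidentiality of ML data in the blockchain network has not been protected, then what were the data privacy protection methods introduced above? I don't get it, hmm. Also, I feel the authors' summary here is not very good, hmm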

5.2 ML for blockchain

For blockchain, ML can solve issues of uncertain and complex features.

The authors' view: in a blockchain environment, IoT sensors produce data, and this data can be analyzed with ML algorithms.

In a blockchain environment, the data gathered from IoT sensors can be analyzed and monitored at multiple points by ML models for efficient decision-making [98].
The literature proposes "blockchain thinking": the main aim is to utilize the framework of blockchain for initiating thinking machines.

Following an emerging trend rendered by the adoption of ML in blockchain, Swan [99] introduced a new term called "blockchain thinking" that enables accommodating thinking on a blockchain network. The main aim is to utilize the framework of blockchain for initiating thinking machines. In this type of framework, the input involves sensor data. The input data is then processed at a specific location to generate output, which includes storing information to memory or taking a specific action. This process involves "personal thinking chains" that signify a backup of full human mind files.
- To implement blockchain thinking, IPFS can be brought in (but why is this placed in the first paragraph?)
  
  To implement the blockchain thinking process IPFS could be relevant as it eases P2P file serving system [100]. Notably, the research work of ML is entirely data-driven. This data can be shared via a central resource or a distributed file system. Using central repository will be inefficient with the increase in the number of users. On the other hand, IPFS is a distributed file system to store data files in a decentralized manner. Also, each file in an IPFS is assigned a unique fingerprint called cryptographic hash. IPFS will disseminate data files with a list of trusted nodes and the data will be available to other users using content identifiers.
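The idea that a file is identified by a cryptographic hash of its content can be sketched with a tiny in-memory store. This is a simplification: real IPFS content identifiers wrap the hash in additional encoding layers, and the `ContentStore` class here is a hypothetical stand-in, not an IPFS API.

```python
import hashlib

class ContentStore:
    """A minimal content-addressed store: the key is the SHA-256 of the data."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        """Store data and return its content identifier (hex digest)."""
        cid = hashlib.sha256(data).hexdigest()
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        """Fetch data by identifier and verify it still matches its hash."""
        data = self._blocks[cid]
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("stored content does not match its identifier")
        return data

if __name__ == "__main__":
    store = ContentStore()
    cid = store.add(b"training-set-v1")
    print(cid, store.get(cid))
```

Because the identifier is derived from the content, anyone who retrieves a dataset from an untrusted node can recompute the hash and detect tampering.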

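Tables 6, 7, 8, and 9 compare the related work on ML for blockchain

Comparison of deep reinforcement learning and blockchain work in resource management and computation offloading (transaction based)

Comparison of related ML models (price prediction)

Comparison of blockchain and federated learning based price prediction (transaction based)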

5.2.1 Resource management and computational offloading

Main background of this section: IoT systems run into problems such as wasted resources, and to address this challenge some flexible resource management frameworks combine blockchain and ML.

Resource management is the process of scheduling and allocating resources in order to maximize the efficiency of the IoT system. Energy consumption, transparency, operational expenditure, request scheduling, latency, content caching, and security are some of the issues involved in realizing the resource management process [101]. To address this challenge, a few secure and flexible resource management frameworks have been developed in the literature by integrating blockchain and ML.

Introducing deep reinforcement learning

A blockchain based platform can store all records of transactions related to resource management in distributed and transparent data structures. However, to increase the efficiency of the network, ML models can be combined with blockchain. In particular, deep reinforcement learning (DRL) has been extensively used with blockchain to achieve resource management tasks. The DRL technique can handle the dynamic and high-dimensional features of IoT. The main concept behind DRL is that, similar to a biological agent, an artificial agent may learn from interaction with its surroundings to take further decisions. By interacting with the environment, the agent gathers experience to optimize objectives expressed in the form of cumulative rewards.

For example, the authors in [86] have used a DRL method to maximize the transactional throughput of the blockchain network. In particular, DRL selects block producers, block size, and block interval to adapt to the dynamic features of the Internet of Vehicles (IoV) scenario.
Also, to achieve resource management for tasks such as content caching, computation offloading, and spectrum sharing, the authors in [85] have utilized DRL. Specifically, this scheme uses DRL for a Device-to-Device (D2D) caching scheme that matches caching supply and demand pairs to maximize the network utility of a consortium blockchain enabled framework. Notably, the DRL based caching scheme optimizes bandwidth between the caching requester and provider. The results demonstrate that the cumulative average system utility is improved. However, this proposal does not discuss the mining procedure.
Meanwhile, when embedded with smart contracts, ML helps to minimize the energy expense in cloud data centers (DCs), as discussed by the authors of [84]. Here, the smart contract facility of blockchain migrates requests and virtual machines to the cloud DCs with minimum load, and RL based request migration is used for energy cost minimization, as this method does not require any prior knowledge. Fig. 14 represents the blockchain and ML empowered resource management scenario for smart grid networks. Here, all computation intensive tasks, including caching, billing, and demand-response management, are implemented at the edge layer of the network due to resource constraints. Notably, learning capable ML agents deployed on edge devices are responsible for implementing effective caching, computational offloading, scheduling, and real-time decisions on the edge devices. Moreover, the mobile base stations used to transfer data to edge devices also have ML models running on them for scheduling computational or storage requests.

Another application of ML for blockchain is offloading in mobile blockchain networks (that is, mobile devices have limited computing power)

Another prospective application of ML for blockchain is in offloading approaches for mobile blockchain networks. With the introduction of mobile technology, the blockchain network can now be easily used with mobile devices, so that more flexible blockchain applications for IoT can be developed. However, with mobile systems, resource-constrained IoT devices face difficulty while mining blocks. In this context, mobile edge computing takes on computationally heavy tasks for mobile devices. However, there is a challenge in effectively allocating the available edge computing resources to miners. Mobile devices can offload their computationally heavy tasks to the assigned mobile edge/cloud server. To enhance the performance of the system, the literature contains multiple offloading approaches.

For example, convex optimization models and game theory approaches have been used by the authors of [119–123] to minimize task execution latency. Nevertheless, these methods fail for highly complex online models, and they also demand prior knowledge about the system. To solve this issue, RL can be used, where a learning agent is employed to derive an optimal solution for computational offloading via a trial-and-error method. Moreover, this solution does not require prior knowledge of system statistics.
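The trial-and-error idea can be sketched with tabular Q-learning on a toy offloading problem. Everything concrete here is an assumption for illustration: two task sizes as states, two actions (run locally or offload), the hypothetical latency table `LATENCY`, and the hyper-parameters. It is not the formulation used in any of the cited works.

```python
import random

LOCAL, OFFLOAD = 0, 1
SMALL, LARGE = 0, 1

# Hypothetical latencies in arbitrary units: LATENCY[task_size][action].
LATENCY = {
    SMALL: {LOCAL: 1.0, OFFLOAD: 3.0},
    LARGE: {LOCAL: 10.0, OFFLOAD: 3.0},
}

def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Learn Q-values by interacting with the toy environment."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in (LOCAL, OFFLOAD)} for s in (SMALL, LARGE)}
    state = SMALL
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit, occasionally explore.
        if rng.random() < epsilon:
            action = rng.choice((LOCAL, OFFLOAD))
        else:
            action = max(q[state], key=q[state].get)
        reward = -LATENCY[state][action]          # lower latency is better
        next_state = rng.choice((SMALL, LARGE))   # next task arrives at random
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

def policy(q):
    """Greedy action for each state."""
    return {s: max(q[s], key=q[s].get) for s in q}

if __name__ == "__main__":
    learned = policy(train())
    print("small task ->", "local" if learned[SMALL] == LOCAL else "offload")
    print("large task ->", "local" if learned[LARGE] == LOCAL else "offload")
```

The agent is never told the latency table; it discovers from observed rewards that small tasks are best run locally and large ones offloaded. A table like `q` stops being practical once the state and action spaces grow large.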
However, for high dimensional computational offloading problems, the RL solution also fails because of the high dimensionality of the state and action space, as pointed out in [124,125]. To deal with high dimensional data, the use of DRL is beneficial, and some work in the literature has demonstrated the scalability and offloading efficiency of DRL in blockchain based edge computing applications. DRL can achieve an optimal offloading strategy based on past offloading experience. Both of the proposals in [87,88] were designed to preserve users' privacy and to achieve security as an optimization problem. Using the DRL method, performance metrics including computational latency, energy consumed, and privacy level were analyzed, showing the feasibility of the proposed scheme with reduced offloading latency and minimum energy consumption.
The examples above only address computation offloading for the mining process

The above-discussed offloading approaches are designed only for mining tasks whereas data processing tasks are ignored. In contrast, the work in [89] has discussed computational offloading for both mining and data processing tasks combining DRL and genetic algorithms. Additionally, Markov decision process has been used to handle the dynamic environment.
However, to implement the DRL method for offloading decisions, the major challenge is to achieve convergence and accuracy of the deep NN. Also, there is a need to develop effective resource allocation on mobile blockchain. To address this challenge, the authors of [102] designed a multilayer NN supported auction mechanism for resource allocation in a mobile edge computing environment. The auction mechanism ensures that edge resources are allocated to the miners who value them the most. Simulation results demonstrated that the proposal converges quickly to a solution where the profit of the service provider is higher than with the proposal of the authors of [126].
Recently, Asheralieva and Niyato [90] proposed a Bayesian RL and DL based approach to model the interactions among miners in a blockchain network with mobile edge computing. In particular, a game theory based approach is used by each miner to offload its block operations to any of the base stations with a mobile edge computing server.
In contrast, the authors in [103] have used federated learning to deal with user equipment privacy issues, as edge node transactions are mostly based on a centralized approach. Federated learning builds ML models without centralizing the training data on a central server. Here, federated learning enables user equipment to train on its data locally, without exposing the data, to optimize the system model. Meanwhile, blockchain and smart contracts are used to secure transactions in cross-silo FL in the B5G network.

5.2.2 Predicting cryptocurrency price

Bitcoin's openness creates an opportunity for price prediction

Bitcoin [127], introduced by Satoshi Nakamoto, is the world's first and most popular cryptocurrency and is accepted by 111 countries worldwide. As a valuable cryptocurrency, Bitcoin provides an opportunity for price prediction because of its volatility and open nature [128].

Bitcoin's price swings have attracted researchers' interest

The price of Bitcoin was around $7202 in late 2019, compared to about $3468 in January 2019 [129]. Researchers and stakeholders of the financial sector are trying to figure out the reasons for the changing trends in the cryptocurrency market. Similar to stock market prediction, Bitcoin price prediction can be represented as a time series prediction model.

Because of the lack of seasonality and the high volatility of the Bitcoin blockchain network, traditional time series models are not suitable for Bitcoin price prediction.

However, conventional time series approaches are based on linear assumptions and are effective in the case of seasonal and noisy data [130]. The absence of seasonality and the highly volatile nature of the Bitcoin blockchain network make these traditional time series models unsuitable for Bitcoin price prediction. Nevertheless, for time-series prediction of uncertain data, some non-linear methods such as ANN, Bayesian Neural Network (BNN), and SVM have gathered interest from researchers. Generally, ML based price prediction models have been evaluated on the following evaluation metrics:

Relatively few studies have been conducted on estimating time-series of Bitcoin price using ML model. In this context, to deal with uncertain and non-linear data, DL has been proved to be an effective solution.

For example, the authors of [96] were the first to use DL for cryptocurrency price prediction. Other than Bitcoin, DL techniques are applied to predict the price of Ethereum, Ripple, and digital cash cryptocurrency. For result analysis, the long short-term memory (LSTM) model is compared with the generalized regression neural network (GRNN) model. LSTM is a subtype of recurrent neural network (RNN) and is designed to deal with long-term dependency problems. LSTM follows a recurrent topology, whereas GRNN has a parallel, memory based structure and attains fast learning with a large sample size. However, LSTM achieves better prediction results than GRNN in terms of RMSE. Rather than just presenting a predictive model, the authors have also conducted a chaotic time series analysis.
Similarly, McNally et al. [91] predicted the Bitcoin price using both LSTM and RNN methods, reporting the price prediction accuracy of LSTM to be better than that of RNN. Here, both NN models, i.e., RNN and LSTM, are tested with two hidden layers having 20 nodes per layer. The dataset used for training covers Aug. 19, 2013 to July 19, 2016. The results show that RNN, LSTM, and Autoregressive Integrated Moving Average (ARIMA) all have similar accuracy, i.e., 50.25, 52.78, and 50.05 respectively. The ARIMA model, however, assumes time series data of a linear nature. As Bitcoin data is volatile, ARIMA cannot generate results as accurate as those of RNN and LSTM. Here, the DL models are trained considering only the Bitcoin price index.
Likewise, the authors of [131] demonstrated the impact of LSTM for Bitcoin price prediction by opting for 10 neurons in the hidden layer.
In contrast, Jang and Lee [92] conducted a study on predicting the Bitcoin price by using a BNN. BNN is based on Bayesian theory for neural networks. BNNs have applications in various fields such as pattern recognition, Natural Language Processing (NLP), image recognition, and traffic flow prediction [132]. Similar to a Multilayer Perceptron (MLP), a BNN consists of an input layer, an output layer, and one or more hidden layers. While training the model, the backpropagation method updates the weights of the neurons at each layer with the current error propagated backward from the output layer to the previous layer. In addition to backpropagation, the delta rule is used to minimize the sum of errors. By utilizing the backpropagation method, a BNN can handle exclusive OR (XOR). Also, the regularization term of the BNN prevents overfitting on the training data.

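Earlier articles focused on analyzing the Bitcoin price while ignoring its non-linear relationship with blockchain variables.

Notably, previous work in the literature has focused on analyzing Bitcoin prices **without taking into account its non-linear relation with blockchain variables.** Further, the authors of [92] have concluded that an ML model trained only with the Bitcoin price index results in poor predictive performance. (In everyday research, when no citable reference can be found for a claim, or when a more convincing argument is needed, consider supporting the claim by showing experimental results.)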

Differently, Barro's Bitcoin pricing model [133] has been considered by the authors for the empirical study. In this proposal, blockchain variables such as average block size, transactions per block, median confirmation time, hash rate, difficulty, miners' revenue, and the number of confirmed transactions are used to train a model that analyzes the Bitcoin price using BNNs, and the results are compared with those obtained using Support Vector Regression (SVR) and a linear regression model. It is observed that both the training and testing phases show poor performance with the SVR model. Notably, rather than training the model with only the Bitcoin price index, the BNN considers the non-linear effect of blockchain information and other macroeconomic factors affecting the price of Bitcoin, whereas the regression model can only handle linear relationships. As an advantage, though, the feature extraction procedure of the regression model removes incorrect values, which results in a better prediction model.
Similarly, Madan et al. [93] chose 26 features related to the Bitcoin network along with daily Bitcoin prices. Some of these features include average confirmation time, block size, difficulty, estimated transaction volume, and number of transactions. To predict the Bitcoin price, the authors leveraged SVM, random forest, and binomial generalized linear model (GLM) algorithms and achieved a prediction accuracy of around 97%, although without cross-validation, which limits the generalizability of the results. The results demonstrate that the random forest algorithm performs best, as it is based on the non-parametric decision tree. However, the precision value for random forest is lower than that of binomial GLM, as the latter can also solve linearization problems for the Bitcoin dataset.
In addition, Greaves and Au [94] developed another Bitcoin price prediction model by leveraging SVM and ANN and concluded that accuracy is best with ANN, i.e., 55%. The authors used historical time deltas of 1 hour, 1 day, 1 week, and 1 month to develop features for supervised learning. Total Bitcoin passing through, net Bitcoin flow, number of transactions, and closeness centrality are the features collected for predicting the price. They also concluded that net Bitcoin flow and number of transactions are the most informative Bitcoin features.
Another effort to analyze features that are highly related to Bitcoin price changes is carried out in [95] using linear regression, random forest, and gradient descent models. Here, the authors have taken features from the dataset such as the number of wallets, unspent transaction outputs, block size, and some others. The performance of the proposal has been evaluated using RMSE and MAE.
Likewise, Velankar et al. [134] predicted Bitcoin price using Bayesian regression, and random forest method. Block size, total Bitcoins, day high, day low, number of transactions, and trade volume are the set of selected parameters to be fed to the predictive network.
Along the same line of thought, Mangal et al. [97] experimented with logistic regression, SVM, ARIMA, and RNN and concluded that RNN has the highest accuracy of all.
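Two of the evaluation metrics named in this section, RMSE and MAE, have standard definitions: RMSE is the square root of the mean squared error, and MAE is the mean of the absolute errors. The sketch below computes both on made-up price values; the numbers are hypothetical and not taken from any of the cited studies.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in price units."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

if __name__ == "__main__":
    actual = [100.0, 102.0, 101.0, 105.0]     # hypothetical closing prices
    predicted = [101.0, 100.0, 101.0, 108.0]  # hypothetical model output
    print("RMSE:", rmse(actual, predicted))
    print("MAE :", mae(actual, predicted))
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few predictions are far off, which matters for a volatile series like the Bitcoin price.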

The authors explain why cryptocurrency price prediction is discussed in this paper.

Notably, the existing literature on cryptocurrency price prediction is not designed for the IoT environment. However, applications of IoT networks include payment transactions between nodes. In a blockchain based IoT network, payments are made with digital cryptocurrency, and hence the discussed studies on cryptocurrency price prediction could be applied to IoT networks as well.

5.2.3 ML for anomaly detection and attack prevention on blockchain

Blockchains may face 51% attacks, double-spending attacks, and so on

With the popularity of blockchain, the risk of security issues such as 51% attack (majority attack), double spending attack, etc. also increases as discussed in [135,136].

Explanation of these two attacks

Due to propagation delay in the blockchain network, a double spending attack might happen when a participant tries to spend the same cryptocoins in more than one transaction. On the other hand, a majority attack happens when participants holding more than 50% of the network conspire to take control of the ledger.

What can ML be used for in blockchain?

Moreover, the open nature and public design of the Bitcoin system allow any user to be a participant. The goal of ML models is to extract insights, find outliers, classify, and detect patterns in large data repositories, so they can be used for blockchain attack detection.

Moreover, with blockchain technology, ML algorithms can train, learn, and take decisions on the local system in a decentralized network. Hence, processing data locally can prevent security and privacy issues to some extent. Various authors have used ML models for anomaly detection in blockchain networks. Both supervised and unsupervised ML algorithms have been employed to design intrusion detection and prevention systems. To detect and isolate malicious nodes in the network, studies in the literature use various ML models such as SVM and k-means clustering.

For example, Dey [110] has discussed the issue of majority attack in the blockchain network. Specifically, the majority attack is a concern in consortium blockchain (e.g., Hyperledger) as it involves business parties collaboration.
To solve the problem faced by majority attack, authors of [110] have proposed an approach based on supervised ML model and algorithmic game theory. Supervised ML algorithms are leveraged to classify whether the attack will take place or not. However, this work is still in progress, and simulation results or any proof have not been demonstrated by the authors.
In contrast to the supervised ML approach, another effort to detect anomalies in the Bitcoin network is made by Pham and Lee [112] using three unsupervised ML methods: k-means clustering, a Mahalanobis distance based method, and SVM (on two Bitcoin transaction graphs). The dataset used for training includes 6,336,769 users with 37,450,461 transactions, and 12 features (including in-degree, out-degree, average in-transactions, balance, etc.) are extracted.
On the other hand, the same authors, in their research in [112], use power degree laws and the local outlier factor on the two graphs produced from the Bitcoin network to detect anomalies.
In a closely related work, authors of [114] proposed an unsupervised statistical ML approach to detect anomalies on blockchain based sensor data belonging to condition management of the industrial asset.
Following a trend rendered by the adoption of unsupervised ML for anomaly detection, authors of [108] used trimmed k-means clustering for cybercrime detection in Bitcoin network. Compared to other approaches on fraud detection, k-means clustering provides better results in terms of detection rate.
Similarly, Scicchitano et al. [137] proposed an anomaly detection system using an unsupervised encoder decoder DL model which is trained with aggregated information extracted by analyzing blockchain network activities.
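Of the unsupervised methods mentioned above, the Mahalanobis distance approach is easy to sketch: a point is suspicious when it lies far from the mean after accounting for the spread and correlation of the features. The sketch below uses only two features so the covariance matrix can be inverted by hand; the feature choice (in-degree, out-degree), the toy values, and the threshold of 2.0 are assumptions, not the setup of [112].

```python
import math

def mahalanobis_distances(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = v^T * inverse(cov) * v, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

def flag_anomalies(points, threshold=2.0):
    """Indices of points whose distance exceeds the threshold."""
    return [i for i, d in enumerate(mahalanobis_distances(points)) if d > threshold]

if __name__ == "__main__":
    # Hypothetical (in-degree, out-degree) pairs; the last user is unusual.
    users = [(1, 1), (2, 2), (3, 2), (2, 3), (3, 3), (1, 2), (2, 1), (40, 1)]
    print(flag_anomalies(users))
```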

Preventing the use of Bitcoin for illegal trade: human trafficking and drug sales

Besides, in order to prevent human trafficking and drug sales involving Bitcoin, Portnoff et al. [138] proposed another ML based classifier that categorizes ads by the person who paid for them. The ML classifier utilizes stylometry: it takes two ads as input and determines whether they were published by the same user or by different users. The flowchart for ML based anomaly detection in a blockchain network is presented in Fig. 15. First, the IoT data provider collects the data from IoT sensors and sends it to the data preparation phase, which involves data preprocessing (transforming the dataset into a machine readable format) and feature extraction. Next, the data analysis phase is carried out, which involves training on the data with the selected ML algorithms. Here, the weights and biases are adjusted in order to get more correct predictions. Finally, the trained model is tested against a previously unseen dataset for anomaly detection.

This sharding case does not really work as an example here

Differently, in the research [63], authors have leveraged the concept of sharding to solve scalability issues. While implementing the concept of sharding, the blockchain network is divided into interest groups and each group has its own ledger to verify transactions. Dividing the network improves network efficiency by empowering parallelism. Proof of Common Interest consensus algorithm is used to validate data that is directed to the relevant interest group. The proposal mitigates DDoS, MITM, and data leakage attacks.

Notably, an online ML security system that detects abnormal clients in the network appears to be a topic that is understudied. To this end, Bogner [113] proposed an online unsupervised ML method for fraud detection that is optimized for interoperability. Different from other approaches, research of Bonger involves visualization techniques along with an interactive querying system meant for manual expert analysis. The proposal is evaluated using public Ethereum blockchain network.

Smart contracts may contain vulnerabilities

On the other hand, the authors of [115] focused on the security of Ethereum smart contracts. As smart contracts are open in nature, any vulnerability present in a contract is visible to anybody on the network. For example, the decentralized autonomous organization (DAO) is a smart contract, and due to some vulnerabilities in its code it was hacked, losing $150 million [139].

Here, in [115], the authors have utilized a CNN model for automatic feature extraction along with learning and detecting compiler bugs in smart contracts. They translated the Solidity bytecode into RGB color codes, which are further transformed into a fixed-size encoded image. Next, the encoded image is fed to the CNN for detecting bugs.
In a similar direction, Tann et al. [116] utilized the LSTM model to detect new attack trends for smart contracts. The LSTM performs a two-class classification and minimizes the detection loss function to maximize classification accuracy and to detect security threats in smart contracts. The authors have leveraged the fact that smart contracts are sequential in nature, so they can easily be used to update the LSTM model for future contracts.
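A bytecode-to-image encoding of this kind can be sketched as follows. The exact mapping used in [115] is not given in these notes, so the scheme here is an assumption: every three consecutive bytes become one RGB pixel, and the result is truncated or zero-padded to a fixed grid. The function name and the sample bytecode prefix are illustrative only.

```python
def bytecode_to_rgb_image(bytecode_hex, width=4, height=4):
    """Map each 3 bytes of bytecode to one RGB pixel, then pad or truncate
    to a fixed width x height grid (assumed encoding, not that of [115])."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    raw = bytes.fromhex(bytecode_hex)

    n_bytes = width * height * 3
    # Truncate long contracts and zero-pad short ones so every image has the same size.
    raw = raw[:n_bytes].ljust(n_bytes, b"\x00")

    pixels = [tuple(raw[i:i + 3]) for i in range(0, n_bytes, 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

# Sample input: an arbitrary 17-byte hex string standing in for contract bytecode.
image = bytecode_to_rgb_image("0x6080604052348015600f57600080fd5b50", width=3, height=3)
for row in image:
    print(row)
```

The fixed image size matters because a CNN expects inputs of a constant shape regardless of contract length.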

5.2.4 Reducing the anonymity of the network

Another potential application of ML for blockchain is to reduce the anonymity of the network. Notably, a blockchain network is assumed to attain a high degree of anonymity because each participant is referred to only by its public key address. However, the authors of [140] claim that it is possible to cluster Bitcoin addresses and map them to real-world identities.

In the same context, Harlev et al. [109] conducted a study to probe the true depth of participants' anonymity using a supervised ML approach. First, addresses are clustered according to whether they are controlled by a single entity, using behavioral intelligence-based clustering and co-spend clustering, with the aim of predicting the category of as yet unidentified Bitcoin addresses. Next, the identified clusters are categorized into one of the predefined categories, such as exchange, gambling, hosted wallet, merchant services, mining pool, mixing, ransomware, and scam. The primary dataset used for simulation consists of transactional data with details about each transaction. Here, seven different ML algorithms are used to analyze the transactional data: k-nearest neighbor, random forests, extra trees, AdaBoost, decision trees, gradient boosting, and bagging classifier. The results conclude that the gradient boosting method performs best among all.
In contrast, Jourdan et al. [107] experimentally obtained a lower F1-score using the gradient boosting method. Also, their methodology involves a complex hyper-parameter optimization step.
In a closely related work, the authors of [141] presented a method to break Bitcoin anonymity via entity characterization. Here, a cascade of classifiers is used: the first step classifies entities using addresses and motifs, and its output is used as the input of the next classification step. Experiments are conducted and compared using AdaBoost, random forest, and gradient boosting models. A disadvantage is that this approach is not able to characterize entities with normal user behavior. The proposal is nonetheless able to detect six entity classes, i.e., Exchange, Gambling, Market, Mining Pool, Mixer, and Service.
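The co-spend clustering step mentioned for [109] rests on a simple heuristic: addresses that appear together as inputs of one transaction are assumed to be controlled by the same entity. A minimal sketch of that heuristic with a union-find structure is shown below; the address labels and the input format (one list of input addresses per transaction) are assumptions, and the behavioral clustering and the supervised classifiers are not covered.

```python
def cluster_addresses(transactions):
    """Group addresses that are co-spent, i.e. used as inputs of the same transaction.

    `transactions` is a list of input-address lists, one list per transaction.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register single-input transactions too
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical transactions: A and B are co-spent, then B and C, so A, B, C merge.
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
print(cluster_addresses(txs))
```

The resulting clusters are what a downstream classifier would label as exchange, gambling, mining pool, and so on.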

The general procedure for Bitcoin entity characterization is shown in Fig. 16.

5.2.5 Classification of blockchain-related data

Classification of data is very important for decision-making tasks [142]. Popular classification algorithms include k-nearest neighbor based methods, decision tree methods, NN-based networks, multivariate discriminant analysis, and SVM [143]. ML classification has been used with blockchain for data quality classification and transaction entity classification, which are discussed as follows:

Blockchain data quality classification: With the rapid growth of IoT technology, the use of health-specific applications such as smart bands and smart watches has also increased. However, the presence of malicious nodes can sometimes lead to slow degradation of the system. Many researchers secure this personal health data with a blockchain network [144]. Moreover, the authors of [117] have used ML techniques to check the validity of the continuous, dynamically generated sensor data. In addition to the existing roles in the blockchain, a new role named data validator is introduced, which is responsible for validating and certifying the quality of the data generated by sensors. Here, the quality classifier for health data classifies the input data using predefined features and removes meaningless data and noise. As an example, take the case of smart watch readings over 24 h. The data validation algorithm can differentiate sleep-related data from workout data. However, predefined rules chosen by the owner decide whether sleep-related data is classified as high quality or as noise. Fig. 17 illustrates the ML-based blockchain data quality validation process in a healthcare network. The serving data is validated before it is piped to the blockchain network. The data analyzer is responsible for computing a predefined set of statistics sufficient to describe the data. The ML agent works in the data validator module and trains the system using schema and constraint APIs. The system can also classify the data into categories using ML classifiers. In another work, the authors of [145] utilized ML to analyze the data of a blockchain-based credit card scoring system. The blockchain-based transaction data is sent to an ML agent that extracts features from the data and then applies a binary classification model to identify customers who would not be able to pay the requisite amount within the allotted time. These classification results are then considered by the bank to decide whether a credit request is to be initiated for a particular customer. Another matter to be considered in the context of centralization is trading IoT-generated data with a TTP. To this end, the authors of [106] presented a data trading model utilizing blockchain, smart contracts, and similarity learning. Here, the arbitration institution responsible for maintaining the smart contract uses ML services, based on classification and clustering, to resolve any dispute over the availability of data for data purchasers. In particular, similarity learning (distance metric learning) is adopted to validate the distance between the features of the actual data and those of the declared data. Distance metric learning has been used extensively for classification and clustering problems.
Classification of blockchain peers/transaction entities: Public blockchains are open and can be joined by anybody in the network. In such a case, some participants may misbehave for personal gain while the majority behave legally. Clearly, it is hard to study the behavior of participants manually. To address this problem, the authors of [118] present an approach that classifies the behavior patterns of participants into predefined categories using an LSTM-based DL approach. The transaction amount is extracted as the feature used to classify participants. Based on the transaction amount, participants are classified into three categories, i.e., stable transaction history, medium jitter history, and high jitter transaction history. With a similar motive, the authors of [104] classified transaction entities into four categories, i.e., exchange, service, gambling, and mining pool. Here, a gradient boosted decision tree algorithm with Gaussian process based optimization is used as the classification method. The results show that the classification accuracy for the exchange, gambling, and service categories is high compared to the mining pool category. Additionally, the authors of [105] presented a supervised ML approach to classify transaction entities engaged in cybercriminal activity. To train the classification model, 854 categorical observations with 12 classes and 10000 non-categorical identifiers are considered. The results conclude that random forest, extremely randomized forest, bagging, and gradient boosting are the best four classifiers.
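To make the three jitter categories from [118] concrete, here is a deliberately simplified stand-in. It does not use an LSTM: it thresholds the coefficient of variation (standard deviation divided by mean) of a participant's transaction amounts. The thresholds 0.1 and 0.5, the function name, and the sample histories are assumptions chosen for illustration only.

```python
import statistics

def jitter_category(amounts, low=0.1, high=0.5):
    """Label a transaction-amount history by its coefficient of variation.

    Swapped-in heuristic, not the LSTM classifier of [118]; thresholds are assumed.
    """
    mean = statistics.fmean(amounts)
    if mean == 0:
        return "stable"
    cv = statistics.pstdev(amounts) / mean
    if cv < low:
        return "stable"
    if cv < high:
        return "medium jitter"
    return "high jitter"

# Hypothetical per-participant transaction amounts.
histories = {
    "alice": [10.0, 10.0, 10.5, 9.8],
    "bob": [10.0, 14.0, 8.0, 12.0],
    "carol": [1.0, 50.0, 2.0, 100.0],
}
labels = {name: jitter_category(a) for name, a in histories.items()}
print(labels)
```

A learned sequence model would pick up temporal patterns that this single statistic ignores, which is presumably why [118] uses an LSTM instead.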

The authors' summary of this section

Summary and insights: Section 4.2 reviews various ML applications for blockchain networks, such as optimizing resource allocation, improving cryptocurrency price prediction, detecting network anomalies, and classifying blockchain-related data. The increasing storage size on the blockchain demands more resources. With data sharding and pruning solutions, ML can help blockchain networks make better decisions for data storage. Also, ML models can be developed and trained to identify malicious activities in blockchain networks. However, it has been observed that learning-based examination of blockchain systems has not been exploited much in the literature. Moreover, for protecting wallet privacy, the clustering techniques proposed for a broad range of blockchain applications have not utilized any string search mechanism such as a Bloom filter. By using string search mechanisms, the storage complexity and search time complexity of various validation and verification operations can be reduced significantly. Also, the proposed clustering approaches can be extended to increase the relatively low sample size of clusters, along with adding more cluster categories to differentiate effectively between clusters.
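For readers unfamiliar with Bloom filters, a minimal sketch shows why they reduce storage and lookup cost: membership of a string (for example a wallet address) is tested against a fixed-size bit array instead of a stored list. The class below is illustrative; the bit-array size, the number of hash functions, and the way SHA-256 is salted to derive several hashes are assumptions, not something prescribed by the paper.

```python
import hashlib

class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical address strings.
seen = BloomFilter()
for address in ["addr-001", "addr-002", "addr-003"]:
    seen.add(address)

print(seen.might_contain("addr-002"))
```

Storage stays at 128 bytes here no matter how long the addresses are, at the price of a small false-positive probability that grows as more items are added.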

6. Challenges of ML + BC + IoT

While the previous sections have presented a study on the integration of blockchain and ML, this section discusses challenges that need to be considered in future research.

Confidentiality is still not fully preserved with blockchain, as any node can trace transactions, and only a few research studies have focused on blockchain's lack of confidentiality for ML data. Moreover, blockchain standards and regulations are yet to be finalized.
Here, it is worth mentioning the problem of data storage, as the nodes of a blockchain network keep a copy of every transaction in the network. This growing database could be difficult to handle in the future. Hence, the scalability of blockchain platforms should be addressed in order to popularize the applications of blockchain for ML. As a solution, the use of emerging mechanisms such as sidechains or childchains should be encouraged in research. Moreover, PoW computations can prove costly in terms of resource utilization and transaction time, so models should be developed that do not consume unnecessary computational power. Also, the existing ML models demand the creation of custom datasets with specific variables, and they are not able to satisfy the various service requirements of complex networks. Hence, it is challenging to scale the development of ML models with the ever-increasing IoT data.
It is also observed from the tabulated comparison of the available literature that most of the research work is based on permissioned blockchains. However, a 51% attack is easy to launch in a permissioned blockchain, a vulnerability that none of the studies have considered. Also, the use of a permissioned blockchain limits access to the enormous amount of data that an ML system can require for accurate decision-making. To address this problem, blockchain platforms and IoT resources should be equipped with a Trusted Execution Environment (TEE) [146].
Federated learning, adopted by many researchers, faces the issue of communication bandwidth. Undoubtedly, a mobile device has enough computing resources to implement federated learning. Unfortunately, the bandwidth of wireless communication is not adequate. So, the research focus has been shifting gradually from computational resources towards wireless communication. Methods such as deep gradient compression should be used to reduce the required communication bandwidth [147].
Also, in a public blockchain, data is publicly available and accessible to all readers, which is indeed a privacy concern. However, using a private blockchain can limit exposure to the large amount of data that an ML model needs for accurate decision-making. Along with privacy, security is another concern, as this technology suffers attacks at the application layer. Also, the consensus mechanisms can be compromised depending on the hashing power of the miner. ML algorithms can detect various attacks in blockchain networks, but challenges remain in using them to detect malicious threats. For example, for a large dataset containing malicious data, the security solution for detecting malicious behavior has to deal with the high dimensionality of the data during pre-processing. In such a case, the ML model first has to perform a dimensionality reduction step. Moreover, it is impossible to train an ML model on a large dataset in real time, so it is challenging to detect online attacks in dynamic networks.
5G and B5G are examples of heterogeneous networks designed for a wide range of IoT devices. The enormous amount of data generated by these devices can put a heavy load on the ML model used for decision-making, leading to limited performance. In this context, blockchain can solve the security issues to some extent, but network performance will still be a problem.
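The federated learning item above mentions gradient compression as a way to save bandwidth. Its core idea can be sketched as top-k sparsification with local residual accumulation: each device sends only the k largest-magnitude gradient entries and keeps the rest locally until they grow large enough to be sent. This is a simplified sketch, not the full deep gradient compression method cited as [147], which includes further refinements; the function name and the sample gradients are assumptions.

```python
def sparsify(gradient, residual, k):
    """Return (sparse_update, new_residual).

    Only the k largest-magnitude accumulated entries are transmitted;
    everything else stays in the local residual for later rounds.
    """
    accumulated = [g + r for g, r in zip(gradient, residual)]
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]), reverse=True)[:k]
    sparse_update = {i: accumulated[i] for i in top}
    new_residual = [0.0 if i in sparse_update else v
                    for i, v in enumerate(accumulated)]
    return sparse_update, new_residual

# Round 1: hypothetical local gradient with 5 entries, only 2 are transmitted.
residual = [0.0] * 5
update1, residual = sparsify([0.5, -0.01, 0.02, -0.9, 0.03], residual, k=2)

# Round 2: small entries that were held back accumulate and eventually get sent.
update2, residual = sparsify([0.0, -0.01, 0.02, 0.0, 0.03], residual, k=1)
print(update1, update2)
```

Sending index and value pairs for k entries instead of the dense vector is what reduces the traffic per round; the residual ensures that small gradients are delayed rather than discarded.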

7. The authors' conclusion

In this paper, we reviewed the current state of the art related to the collaboration of ML and blockchain. We presented an overview of blockchain technology and how this decentralized technology can solve the privacy issues related to ML. Moreover, we provided an overview of ML technology and discussed key applications and the applicability of blockchain features for ML. The literature review shows that applications combining blockchain and ML are still in their infancy and that many research challenges need to be addressed. However, the current research is a foundation for an interdisciplinary perspective. In the future, we will implement one of these techniques in IoT applications to check its performance with respect to other applications using various performance evaluation metrics.

The authors' declaration

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

8. My summary

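Combining blockchain with distributed machine learning can bring huge benefits, and it is a research direction that could serve as a PhD topic. Overall, this paper is a survey on BC + ML. I do not think it synthesizes its material very well: it focuses mainly on what the currently published papers have studied, and, hmm, it was not a comfortable read. I think this is partly because there are still too few papers in this field.
Feel free to add me on WeChat (see my CSDN profile) 🤠. If we share research interests, we can exchange ideas.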

References

Article information

URL: https://www.sciencedirect.com/science/article/abs/pii/S0140366421002632

Cover image information

URL: https://www.gracg.com/works/view/1495553
Author: 刘翔ART http://gracg.com/user/user93912SfTIdM
