Abstract—Electrical load profiling supports retailers in identifying consumer categories for customizing tariff design. However, each retailer only has access to the data of the customers it serves. Centralized joint clustering on the retailers’ union load dataset either enables the identification of more types of users, which allows retailers to design more customized retail plans, or informs each retailer whether it already has a sufficiently broad customer base. However, centralized clustering requires access to the confidential data of retailers. This may cause privacy issues among retailers, because retailers cannot or do not want to share their confidential information with others. To tackle this issue, we propose a privacy-preserving distributed clustering framework by developing a privacy-preserving accelerated average consensus (PP-AAC) algorithm. Using the proposed framework, we modify several commonly used clustering methods, including k-means, fuzzy C-means, and the Gaussian mixture model, to obtain privacy-preserving distributed clustering methods. In this way, clustering on the retailers’ union dataset can be achieved solely by local calculations and information sharing between neighboring retailers, without sacrificing privacy. The correctness, privacy-preserving property, time-saving feature, and robustness to random communication failures of the proposed methods are verified using a real-world Irish residential dataset.
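To make concrete how clustering on the union dataset can rely only on local calculations and neighbor-to-neighbor exchange, the sketch below shows a k-means loop in which each retailer computes only per-cluster coordinate sums and counts, and a plain linear average consensus turns those local statistics into union-dataset centroids. It is a hedged illustration under stated assumptions, not the paper's PP-AAC: the consensus here is neither accelerated nor privacy-preserving, and the ring topology, step size `eps = 0.3`, toy data, and all function names (`consensus_average`, `distributed_kmeans`, etc.) are hypothetical.

```python
from typing import Dict, List

Vec = List[float]


def consensus_average(values: List[Vec], neighbors: Dict[int, List[int]],
                      eps: float = 0.3, iters: int = 100) -> List[Vec]:
    """Plain (non-private, non-accelerated) linear average consensus, assumed here.

    Each retailer i updates x_i <- x_i + eps * sum_{j in N(i)} (x_j - x_i) using only
    its neighbors' states. On a connected undirected graph with eps < 1 / max_degree,
    every state converges to the network-wide average.
    """
    x = [list(v) for v in values]
    for _ in range(iters):
        new_x = []
        for i, xi in enumerate(x):
            upd = list(xi)
            for j in neighbors[i]:
                for m in range(len(xi)):
                    upd[m] += eps * (x[j][m] - xi[m])
            new_x.append(upd)
        x = new_x
    return x


def local_stats(points: List[Vec], cents: List[Vec]) -> Vec:
    """Per-cluster coordinate sums and counts, flattened as [S_1, N_1, S_2, N_2, ...]."""
    k, d = len(cents), len(cents[0])
    stats = [0.0] * (k * (d + 1))
    for p in points:
        # Hard (k-means) assignment to the nearest centroid by squared distance.
        c = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, cents[j])))
        base = c * (d + 1)
        for m in range(d):
            stats[base + m] += p[m]
        stats[base + d] += 1.0
    return stats


def centroids_from_stats(stats: Vec, old: List[Vec]) -> List[Vec]:
    """Centroid = sum / count; averaged sums over averaged counts give the same ratio."""
    d = len(old[0])
    new = []
    for c, prev in enumerate(old):
        base = c * (d + 1)
        count = stats[base + d]
        # Keep the previous centroid if the cluster is empty in the union dataset.
        new.append(list(prev) if count < 1e-12 else
                   [stats[base + m] / count for m in range(d)])
    return new


def distributed_kmeans(local_data, neighbors, init, rounds=10):
    cents = [[list(c) for c in init] for _ in local_data]  # one centroid copy per retailer
    for _ in range(rounds):
        stats = [local_stats(pts, cents[i]) for i, pts in enumerate(local_data)]
        avg = consensus_average(stats, neighbors)  # only neighbor exchanges happen here
        cents = [centroids_from_stats(avg[i], cents[i]) for i in range(len(local_data))]
    return cents


def centralized_kmeans(points, init, rounds=10):
    cents = [list(c) for c in init]
    for _ in range(rounds):
        cents = centroids_from_stats(local_stats(points, cents), cents)
    return cents


# Four hypothetical retailers on a ring, each holding 2-D toy "profiles".
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
data = [
    [[0.0, 0.1], [0.2, 0.0]],
    [[5.0, 5.1], [4.9, 5.0]],
    [[0.1, 0.2], [5.2, 4.8]],
    [[9.8, 0.1], [10.0, 0.0]],
]
init = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
dist = distributed_kmeans(data, ring, init)
cent = centralized_kmeans([p for pts in data for p in pts], init)
```

With a connected graph and enough consensus iterations, every retailer's centroid copy matches the centralized result on the pooled data. Averaging sums and counts separately is the key design choice: the unknown factor 1/(number of retailers) cancels in the ratio, so no retailer needs to know the network size.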
Index Terms—Load pattern recognition, residential load profiling, clustering, privacy-preserving, distributed, consensus.
I. INTRODUCTION
A massive amount of fine-grained electricity consumption data is being collected by smart meters. Identifying the load patterns in these smart meter data, i.e., residential load profiling, helps retailers and distribution system operators (DSOs) better understand the consumption behavior of consumers.
Technically, residential load profiling aims to capture different types of customers and behaviors. For retailers in particular, knowing the types of consumers is important, as this is the prerequisite for designing customized retail plans [1]. Retailers can improve their commercial attractiveness by formulating different competitive retail plans for different classes of consumers [2]. Therefore, knowing more, and particularly more diverse, user types is of strong practical significance for retailers.
Manuscript received February 26, 2020; revised July 28, 2020; accepted October 11, 2020. Date of publication October 14, 2020; date of current version February 26, 2021. This work was supported by the National Natural Science Foundation of China under Grant U1766206. Paper no. TSG-00285-2020. (Corresponding author: Chen Shen.)
Mengshuo Jia and Chen Shen are with the State Key Laboratory of Power Systems, Tsinghua University, Beijing 100084, China (e-mail:
Yi Wang and Gabriela Hug are with the Power Systems Laboratory, ETH Zürich, 8092 Zürich, Switzerland.
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2020.3031007
However, each retailer only has access to the data of the consumers it serves (the so-called “horizontally partitioned mode” of data [3]). Hence, a retailer is likely to be aware of only a few consumer types. For example, emerging retailers who have just joined the retail market might have little knowledge of user types; likewise, existing retailers who adopt fixed retail strategies might attract only specific classes of consumers. Joint clustering on the union dataset of multiple retailers makes it possible to identify all customer types present in the joint dataset.
Overall, the amount of information provided by joint clustering constitutes an upper bound on what a retailer can obtain using only its own dataset. Therefore, the practical significance of realizing joint clustering is twofold: joint clustering either (1) helps retailers identify more types of users, or (2) tells retailers whether their own user types are diverse enough to capture all the types in the union dataset. The former could help retailers design more diversified retail plans, and the latter enables retailers to make an informed decision on whether to stop seeking cooperation aimed at identifying more user types. Both outcomes are of practical use to retailers.
Nevertheless, joint clustering requires retailers to share data with others. Because these data are confidential, retailers are prohibited from sharing this information either directly or indirectly. The former refers to directly sharing the raw data, and the latter refers to sharing statistical information, e.g., a retailer’s number of consumers in any category.
Thus, a privacy-preserving distributed clustering scheme is required, in which retailers cooperate with others to jointly obtain the clustering results on their union consumption dataset via local calculation and communication. During the cooperation, the confidential information of each retailer, e.g., the raw residential load data or the number of consumers in a category, cannot be deduced by others. Note that in this article, we use the term “privacy” to refer to the confidentiality of retailers’ information. A specific definition of “privacy” is given in the next section.
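In plain average consensus, a retailer's first message to its neighbors is its raw local statistic (e.g., a category count), which is exactly the indirect sharing ruled out above. As an illustration only, the sketch below uses a different, well-known and much simpler device, pairwise zero-sum masking of the initial states, to show how a count can be hidden from a neighbor while the network average is still recovered exactly. This is not the paper's PP-AAC; the ring, mask range, counts, and function names are hypothetical, and the protection assumes neighbors do not collude (two neighbors of a node on a ring who pool their masks can unmask it).

```python
import random
from typing import Dict, List


def mask_initial_states(values: List[float], neighbors: Dict[int, List[int]],
                        scale: float = 100.0, seed: int = 0) -> List[float]:
    """Pairwise zero-sum masking (illustrative; NOT the paper's PP-AAC).

    For every undirected edge (i, j), the endpoints share a random r_ij; i adds it
    and j subtracts it. Individual states are hidden, but their sum, and hence the
    network average, is unchanged.
    """
    rng = random.Random(seed)
    masked = list(values)
    for i in sorted(neighbors):
        for j in neighbors[i]:
            if i < j:  # handle each undirected edge exactly once
                r = rng.uniform(-scale, scale)
                masked[i] += r
                masked[j] -= r
    return masked


def consensus(values: List[float], neighbors: Dict[int, List[int]],
              eps: float = 0.3, iters: int = 200) -> List[float]:
    """Plain linear average consensus on scalar states (sum-preserving on undirected graphs)."""
    x = list(values)
    for _ in range(iters):
        x = [xi + eps * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]
    return x


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
counts = [2.0, 0.0, 1.0, 0.0]  # hypothetical private counts for one consumer category
masked = mask_initial_states(counts, ring)  # what neighbors actually receive first
estimates = consensus(masked, ring)
```

The values sent to neighbors differ from the true counts, yet every retailer converges to 0.75, the true average count, because the masks cancel in the network sum.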
So far, various clustering algorithms have been applied to load profiling, such as hierarchical clustering with different linkages [4], CFSFDP [5], k-means [6], the fuzzy C-means algorithm (FCA) [7], the Gaussian mixture model (GMM) [8], and the self-organizing map [9]. However, to the best of our knowledge, there is no prior research on privacy-preserving distributed clustering for load profiling.
To bridge this gap, this article proposes a privacy-preserving distributed clustering framework for load profiling. This framework can be used to transform three commonly used clustering methods, i.e., k-means, FCA, and GMM, into distributed clustering algorithms for privacy-preserving load profiling. We chose k-means, FCA, and GMM for four reasons: (1) they are commonly used algorithms for electrical load clustering [10]–[12]; (2) they include both ‘hard’ and ‘soft’ clustering methods, that is, k-means is a ‘hard’ clustering method that delivers deterministic clustering results, i.e., each sample belongs to exactly one cluster [13], whereas FCA and GMM are ‘soft’ clustering methods that provide a degree or a probability of membership to describe how samples belong to clusters, which can be leveraged to evaluate overlapping clusters or uncertain cluster memberships [14]; (3) they are distance-matrix-free techniques, which means that they do not require a complete communication network, i.e., communication links among neighboring retailers are sufficient; (4) they have commonalities regarding their implementation, which will be further discussed in Section II.