12

Application of cluster analysis for enhancing power consumption awareness in smart grids

Guido Coletta; Alfredo Vaccaro; Domenico Villacci; Ahmed F. Zobaa
University of Sannio, Benevento, Italy
Brunel University, Uxbridge, United Kingdom

Abstract

The conceptualization of computing paradigms aimed at converting the power demand data into actionable information, allowing the prosumer to have a full understanding of the available information, represents a timely and relevant issue to address in the context of future smart grids. In the light of this need, this chapter outlines the potential role of self-organizing models based on clustering analysis for classifying the load profiles, correlating them with the endogenous measured variables, and identifying irregularities in energy consumption. The benefits deriving from the application of the proposed framework on complex load patterns have been assessed by detailed experimental results obtained on a real case study.

Keywords

Load monitoring; Situational awareness; Smart grids computing; Data clustering; Data-driven techniques

1 Introduction

In recent years, transmission and distribution grids have undergone radical changes, which have mainly affected their governance and operation criteria. The need to increase the penetration of renewable power generators, the large-scale deployment of liberalized electricity markets, and the difficulties in upgrading the grid infrastructure are some of the main challenges that system operators must face. In this complex scenario, the role of grid users is changing drastically: they are evolving from passive customers, characterized by inelastic power demand profiles, into so-called prosumers, which can play an active role by properly dispatching their generation units, modulating their power demand in response to price signals, and offering high-value services to grid operators. The increasing number of these new entities is expected to significantly improve the efficiency and security of existing electrical grids by supporting their evolution into active, flexible, and self-healing systems composed of distributed and interactive resources. For these reasons, the development of new tools that allow prosumers to optimally coordinate their consumption/production profiles as a function of the grid state and the spot energy price has been recognized as one of the most important foundations of future smart grids.

In this context, recent studies reported in the power system literature have demonstrated that, in a smart grid domain, prosumers supported by accurate and reliable information about their actual energy consumption can achieve electric energy savings of up to 5–10%, which mainly derive from the detection of load anomalies and the adaptation of power demand to electricity price dynamics. These advanced functions, usually referred to as demand-side management (DSM) and demand response (DR) techniques, allow smart grid operators to interact with prosumers in order to modify the load profiles during severe grid contingencies, avoiding the need to activate large-scale load shedding plans, while also allowing prosumers to dynamically change their demand profiles according to optimal energy sourcing policies. Despite these benefits, the adoption of these functions in existing power systems is still in its infancy, and several open problems need to be solved in order to support their large-scale deployment. In particular, DSM and DR techniques frequently operate in data-rich but information-limited domains, because the decisions to be identified derive from the solution of complex optimization problems whose cost and constraint functions depend on a large quantity of uncorrelated and heterogeneous data (e.g., historical load profiles, spot prices, and environmental variables). At the same time, these decisions must be computed fast enough for the information to be useful over a short-term operation horizon.

Hence, the development of computing paradigms aimed at converting the power demand data into actionable information, allowing the prosumer to have a full understanding of the available information, represents a timely and relevant issue to address. In light of this need, a promising research direction is the development of smart techniques for load profile analysis, which allow prosumers to detect irregular demand patterns such as unusual energy usage, decreasing efficiency, erroneous regulations, and load malfunctioning. This strategic information could enhance energy efficiency, reduce supply costs, and increase the situational awareness of prosumers about their actual power demand. These features are extremely useful in complex energy systems, characterized by large and spatially distributed loads, where the raw sensor data streams may not allow a full understanding of the actual energy usage [1, 2].

To address this issue, the adoption of principal component analysis (PCA) and data clustering techniques has been recognized as the most promising enabling methodology for load pattern classification. In particular, in [3–5] the authors proposed a methodology based on cluster analysis for identifying the proper classes of homogeneous users and designing the most effective tariff paradigm for optimal energy sourcing. In [6] a similar technique based on PCA and hierarchical clustering has been proposed to solve the same problem. Also, Ozgonenel et al. [7] compare the performance of Euclidean and Mahalanobis distances in k-means-based clustering in the task of classifying electricity customers for fault detection, and in Almutairi et al. [8] a repeated k-means clustering methodology, which improves consumer classification ability, is presented.

According to these arguments, this chapter proposes the use of self-organizing models based on clustering analysis for power consumption awareness in a smart grid domain. The main idea is to process the sensor data streams in order to classify the load profiles, correlate them with the endogenous measured variables, and identify irregularities in energy consumption. The analysis of the statistical correlations between the power demand and the endogenous variables is performed by a fuzzy inference system, which computes the key performance indexes for each load profile class. These indexes are processed by an outlier detection technique, which identifies irregular load patterns originating from device malfunctions, system faults, or incorrect control settings. Experimental results obtained on a real case study are presented and discussed in order to emphasize the benefits deriving from the application of the proposed framework to complex load patterns.

2 Mathematical preliminaries

2.1 Elements of cluster analysis

The self-organizing models for load anomaly detection proposed in this chapter are based on a class of data-clustering techniques that aim to organize data into homogeneous clusters characterized by a high degree of similarity. In this context, the main problem to solve is how to classify objects into homogeneous groups starting from multivariate observations, maximizing the similarity of the objects in the same cluster and, at the same time, maximizing the difference between different clusters.

2.1.1 Clustering techniques

Among the different approaches that can be adopted for cluster formation, the top-down and the bottom-up paradigms are the most frequently used. The first one starts from a single group containing all objects and then divides it into smaller and more homogeneous groups until a termination criterion is satisfied. On the contrary, bottom-up processes start from the situation in which every object represents a distinct cluster and then progressively merge them.

Another very common classification of data-clustering processes is between:

  •  Partitioning clustering
  •  Hierarchical clustering

Partitioning clustering techniques divide the data set into nonoverlapping sets. The cluster shapes are the product of predetermined criteria for group development, and a cluster is evaluated by comparing each object with a point representative of the cluster, namely the centroid. One of the most widespread partitioning clustering algorithms is the k-means methodology. The mathematical formulation of partitioning clustering can be expressed through the following definition:

Definition 2.1

(Partitioning clustering). Let X = {X1, X2, …, XN} be a set of data. Partitioning clustering provides a collection of subsets C = {C1, C2, …, Ck} such that:

$$X = \bigcup_{i=1}^{k} C_i, \qquad C_i \cap C_j = \emptyset \quad \text{for } i \neq j$$
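The two conditions of Definition 2.1 (the clusters cover X and are pairwise disjoint) can be checked mechanically. The following is a minimal sketch; the function name `is_partition` and the toy data are illustrative assumptions, not part of the chapter.

```python
from itertools import combinations


def is_partition(data, clusters):
    """Return True if `clusters` covers `data` and its members are pairwise disjoint."""
    # Condition 1: the union of all clusters equals the data set.
    covers = set().union(*clusters) == set(data)
    # Condition 2: no object appears in two different clusters.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(clusters, 2))
    return covers and disjoint


data = {1, 2, 3, 4, 5, 6}
print(is_partition(data, [{1, 2}, {3, 4}, {5, 6}]))     # valid partition
print(is_partition(data, [{1, 2, 3}, {3, 4}, {5, 6}]))  # object 3 is shared
print(is_partition(data, [{1, 2}, {3, 4}]))             # objects 5 and 6 are missing
```

A hierarchical or overlapping clustering would fail the second condition by construction, which is the distinction the definition is drawing.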

By contrast, hierarchical clustering is based on the inclusion of smaller clusters in bigger ones, hence allowing the presence of subclusters. The outcome of this algorithm is a hierarchy tree, namely the dendrogram, depicting the correlations between different cluster levels. It can be noted that hierarchical clustering can be viewed as a sequence of partitioning clusterings, and that one can obtain the latter by considering one of the levels of the dendrogram. The objective function pursued by clustering algorithms aims to highlight the local structure of the data and to maximize the cluster dissimilarity.

A further categorization concerns whether or not an object belongs exclusively to a specific cluster. Three possibilities can be distinguished, leading to different clustering types:

  •  Exclusive clustering
  •  Overlapping (or nonexclusive) clustering
  •  Fuzzy clustering

It is worth observing that, in the first category, objects belong exclusively to a specific cluster while, in the second one, objects can belong simultaneously to several clusters. In many situations, however, membership in a cluster is not a crisp true/false property; rather, an object has a certain probability of belonging to each of several clusters. In these cases, fuzzy or probabilistic clustering algorithms are very useful: they consider each object as belonging to each cluster with a membership degree between 0 and 1, subject to the constraint that the membership degrees of each object sum to 1. In a probabilistic sense, this is equivalent to saying that the sum of the probabilities for each object must be 1.

The last classification is between

  •  Partial clustering
  •  Complete clustering

A complete clustering assigns all objects to clusters while a partial one does not, which makes it possible to handle data issues such as outliers, noise, or data of no interest.

2.1.2 Cluster classification

After giving a classification of the clustering algorithms, let us provide now a classification of cluster types. It is possible to distinguish among five cluster typologies (i.e., well-separated, prototype-based, graph-based, density-based, and conceptual clusters).

  • Well-separated is characterized by the property that the distance between any two points in different clusters is greater than the distance between any two points in the same cluster.
  • Prototype-based is characterized by the property that each element in the cluster is closer to the prototype of its own cluster than to the prototype of any other cluster. The prototype of the cluster is often a centroid (for data with continuous attributes) or a medoid (for data with categorical attributes).
  • Graph-based is useful when data are represented by graphs. In this case a cluster is defined by a connected component, that is, a group of objects that are interconnected with each other but have no connection with elements outside the group.
  • Density-based is defined by a high-density data region surrounded by a low-density data region.
  • Conceptual is the most general cluster definition and includes all the previous ones. From this point of view, a cluster is simply defined as a group of objects that share some properties. For example, a prototype-based cluster can be seen as the group of objects that share the property of being closest to the same centroid.

2.1.3 k-Means

k-Means is an exclusive partitioning clustering technique based on prototype-based clusters and represents one of the simplest and most popular approaches to cluster analysis. The k-means technique groups a data set into a predetermined number k of clusters. The basic k-means algorithm works as follows: the first step is to fix an initial set of k centroids and to assign each object of the data set to the closest centroid. Then the centroids are updated by considering the objects of their respective clusters. The procedure repeats until the centroids no longer change. The algorithm can be summarized as follows:

    Initialize the number of clusters k and an initial centroid set C0
    while the centroids change by more than a given tolerance do
        Assign each element to the closest cluster w.r.t. the (i−1)th centroid set
        Compute the ith centroid set on the basis of the associated objects
    end
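The loop summarized above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the chapter: the function names, the choice of the Euclidean distance, the `tol` and `max_iter` parameters, the rule of keeping the old centroid when a cluster is empty, and the toy data are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, centroids, tol=1e-9, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each object goes to the closest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute centroids (an empty cluster keeps its old one).
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, clusters


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = k_means(points, [(0.0, 0.0), (6.0, 6.0)])
print(centroids)
print([len(c) for c in clusters])
```

The result depends on the initial centroid set, which is why the initialization appears as an explicit input rather than being hidden inside the function.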

The first question is how to compute the distance between each object and the corresponding centroid. To answer it, it is necessary to define appropriate metrics, that is, appropriate distances d(xi, xj) quantifying the degree of affinity of pairs of objects (xi, xj). Formally, a generic distance or metric is defined as:

Definition 2.2

(Metric). A metric (or distance) is defined as:

$$d(x, y) = \left( \sum_{l} |x_l - y_l|^{p} \right)^{1/p}$$

abiding by the following properties:

  1. Nonnegativity
    d(x, y) ≥ 0
  2. Symmetry
    d(x, y) = d(y, x)
  3. Triangle inequality
    d(x, y) ≤ d(x, z) + d(z, y)
  4. Identity of indiscernibles
    d(x, y) = 0 ⟺ x = y

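Several distance definitions exist, for example:

  •  Manhattan distance (L1): $d(x, y) = \sum_{l} |x_l - y_l|$
  •  Euclidean distance (L2): $d(x, y) = \sqrt{\sum_{l} (x_l - y_l)^2}$
  •  Mahalanobis distance: $d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$, where S here denotes the covariance matrix of the data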
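The distances listed above can be written compactly in Python. The sketch below is illustrative only: the function names are assumptions, and the Mahalanobis distance takes an already inverted covariance matrix `s_inv` (a list of rows) as input, so that no matrix inversion routine is needed.

```python
import math


def minkowski(x, y, p):
    """General L_p distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    return minkowski(x, y, 1)


def euclidean(x, y):
    return minkowski(x, y, 2)


def mahalanobis(x, y, s_inv):
    """Mahalanobis distance given the inverse covariance matrix `s_inv`."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Quadratic form diff^T * s_inv * diff.
    q = sum(diff[i] * s_inv[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


x, y = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[0.25, 0.0], [0.0, 1.0]]   # first component has variance 4

print(manhattan(x, y))               # sum of absolute differences
print(euclidean(x, y))               # straight-line distance
print(mahalanobis(x, y, identity))   # coincides with the Euclidean distance
print(mahalanobis(x, y, scaled))     # first component is down-weighted
```

With the identity matrix the Mahalanobis distance reduces to the Euclidean one; a larger variance along one component reduces the weight of differences along that component.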

Formally, once the distance d is defined, the k-means procedure is formalized as:

$$S = \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \operatorname{dist}(c_i, x)$$  (1)

where k is the number of clusters, x is an object, Si is the ith cluster and ci is the centroid of the ith cluster.

For the application domain addressed in this chapter, the most commonly used distances are L1 and L2. It can be shown that, for the Manhattan metric, the optimal ith centroid is the component-wise median of the objects in the cluster. For the particular case of the L2 metric:

$$S = \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \| x - c_i \|_2^2$$  (2)

with

$$c_i = \frac{1}{\dim(S_i)} \sum_{x \in S_i} x$$  (3)
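As a brief justification of Eq. (3), offered as a sketch under the assumption that the assignment of objects to the cluster Si is held fixed, the contribution of the ith cluster to the objective in Eq. (2) can be minimized with respect to its centroid by setting the gradient to zero (the symbol J_i is introduced only for this sketch):

$$
J_i(c) = \sum_{x \in S_i} \| x - c \|_2^2, \qquad
\nabla_c J_i(c) = -2 \sum_{x \in S_i} (x - c) = 0
\;\Longrightarrow\;
c_i = \frac{1}{\dim(S_i)} \sum_{x \in S_i} x
$$

Since J_i is convex in c, this stationary point is the minimizer, which is why the update step of k-means with the L2 metric reduces to averaging the objects of each cluster.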

2.1.4 c-Means

c-Means is a fuzzy-based clustering technique extending the concepts introduced by the k-means algorithm. This approach consists of considering each object belonging to all the clusters with a certain confidence degree.

    Initialize the number of clusters k and an initial centroid set C0
    while the centroids change by more than a given tolerance do
        Compute the fuzzy membership matrix ω w.r.t. the current centroids
        Compute the ith centroid set
    end

Formally, the c-means procedure is formalized as:

$$S = \arg\min_{S} \sum_{i=1}^{k} \sum_{j=1}^{n} \omega_{i,j}^{m} \operatorname{dist}(c_i, x_j)$$  (4)

where n is the number of objects in the dataset and m > 1 is the fuzzifier coefficient, which sets the fuzziness of the clusters by determining the degree to which each object belongs to each cluster. In the limit m → 1, the weights ω tend to 0 or 1, which recovers the k-means algorithm as a particular case. Specializing to the Euclidean metric, as in the previous section, the c-means algorithm can be formalized as:

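$$S = \arg\min_{S} \sum_{i=1}^{k} \sum_{j=1}^{n} \omega_{i,j}^{m} \| x_j - c_i \|_2^2$$  (5)

with

$$c_i = \frac{\sum_{j=1}^{n} \omega_{i,j}^{m} x_j}{\sum_{j=1}^{n} \omega_{i,j}^{m}}$$  (6)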

and

$$\omega_{i,j} = \frac{1}{\sum_{l=1}^{k} \left( \frac{\| x_j - c_i \|_2}{\| x_j - c_l \|_2} \right)^{\frac{2}{m-1}}}$$  (7)
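The alternation between the membership update and the centroid update in Eqs. (6) and (7) can be sketched as follows. This is an illustrative sketch rather than the chapter's implementation: the function names, the default `m=2.0`, the stopping parameters, the handling of an object that coincides with a centroid, and the toy data are assumptions, and the membership formula is the standard fuzzy c-means one with exponent 2/(m−1).

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def memberships(points, centroids, m):
    """Membership update: w[i][j] is the degree of object j in cluster i."""
    exponent = 1.0 / (m - 1.0)   # acts on squared distances, i.e. 2/(m-1) on distances
    k = len(centroids)
    w = [[0.0] * len(points) for _ in range(k)]
    for j, x in enumerate(points):
        d2 = [sq_dist(x, c) for c in centroids]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            # Object sits exactly on a centroid: give it full membership there.
            for i in zeros:
                w[i][j] = 1.0 / len(zeros)
            continue
        for i in range(k):
            w[i][j] = 1.0 / sum((d2[i] / d2[l]) ** exponent for l in range(k))
    return w


def update_centroids(points, w, m):
    """Centroid update: mean of the objects weighted by w[i][j] ** m."""
    dim = len(points[0])
    new = []
    for wi in w:
        weights = [wij ** m for wij in wi]
        total = sum(weights)  # assumed nonzero: some object has membership > 0
        new.append(tuple(sum(wt * x[d] for wt, x in zip(weights, points)) / total
                         for d in range(dim)))
    return new


def c_means(points, centroids, m=2.0, tol=1e-6, max_iter=200):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        w = memberships(points, centroids, m)
        new = update_centroids(points, w, m)
        shift = max(sq_dist(a, b) ** 0.5 for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    # w was computed from the centroids of the last-but-one update.
    return centroids, w


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, w = c_means(points, [(0.0, 0.0), (6.0, 6.0)])
print(centroids)
print([round(w[0][j] + w[1][j], 6) for j in range(len(points))])
```

Each column of the membership matrix sums to 1, which is the constraint on membership degrees stated in Section 2.1.1.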

3 Detecting load outliers by clustering analysis

The described techniques can play an important role in power consumption awareness by classifying load profiles, identifying the statistical correlations between these and the endogenous variables, and detecting irregular energy usage. In this context, the main idea is to properly classify the hourly demand and the environmental temperature profiles, and to identify their correlation by applying a fuzzy inference system. Fuzzy logic revises classical set theory and the semantic concept of truth: it allows the definition of a degree of membership of an object in a set, that is, an object is considered to belong to a set to a certain extent. According to this paradigm, an object can belong to more than one set. The concept of truth changes in the same way: in the classical sense, if A is true, it cannot be false, and vice versa, whereas in fuzzy logic a statement is associated with a certain degree of truth. These features are particularly useful for the application under study, because they allow a reliable classification of the load and temperature profiles and an effective assessment of their correlations. A generic fuzzy inference system can be organized into three conceptual steps: (i) fuzzification, (ii) inference, and (iii) defuzzification. A graphic scheme of these steps is provided in Fig. 2, and the overall architecture of the proposed framework is shown in Fig. 1.

Fig. 1 Proposed algorithm.
Fig. 2 Fuzzy inference system.

Analyzing these figures, it is worth noting that the first step in applying the proposed framework is to obtain reference patterns for both the electrical and the environmental daily profiles by classifying the sensor data streams generated by the power meter and the meteorological station. The distances between the measured and the reference profiles are then processed by a fuzzy inference system to assess the pattern regularity, namely to detect the so-called load and temperature anomalies. Starting from this estimation, and on the basis of heuristic experience with thermostatic loads, it can be argued that a load anomaly induced by a temperature anomaly can be considered a regular load pattern; for example, if the environmental temperature is extremely high during summer, the load is expected to increase significantly. On the contrary, a load anomaly that does not correspond to a temperature anomaly is a clear indication of an irregular load pattern. These straightforward heuristic rules have been translated into proper fuzzy rules, which represent the knowledge base of the proposed fuzzy inference system.
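To make the heuristic concrete, the following sketch shows one possible way to encode the rule "a load anomaly without a temperature anomaly is irregular" with fuzzy degrees of truth. The chapter does not specify the membership functions or operators, so everything here is a hypothetical choice: the piecewise-linear `ramp` membership, its breakpoints, the use of `min` for AND and `1 - x` for NOT, and the function names. Defuzzification is omitted.

```python
def ramp(value, low, high):
    """Degree of membership in 'anomalous': 0 below `low`, 1 above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def irregularity(load_distance, temp_distance,
                 load_range=(1.0, 2.0), temp_range=(1.0, 2.0)):
    """Degree to which a day shows an irregular load pattern.

    Inputs are the distances of the measured load and temperature
    profiles from their reference profiles (hypothetical units).
    """
    load_anom = ramp(load_distance, *load_range)   # fuzzification
    temp_anom = ramp(temp_distance, *temp_range)
    # Rule: IF load is anomalous AND temperature is NOT anomalous
    #       THEN the load pattern is irregular.
    return min(load_anom, 1.0 - temp_anom)


print(irregularity(3.0, 0.5))  # load anomaly, normal temperature
print(irregularity(3.0, 3.0))  # load anomaly explained by temperature
print(irregularity(1.5, 0.5))  # borderline load deviation
```

Unlike a crisp threshold, the output varies continuously between 0 and 1, so borderline days receive an intermediate degree of irregularity instead of an abrupt yes/no label.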

4 Case study

In this section, the proposed framework is applied to the task of detecting power demand irregularities for a large commercial user located in the south of Italy. The analyzed domain is characterized by complex demand patterns, mainly due to air conditioning, lighting, and medical and technological services, which change considerably with the seasons and environmental conditions. The corresponding distribution of the yearly average power demands is reported in Fig. 3.

Fig. 3 User energy consumptions.

As shown in Fig. 3, air conditioning accounts for 42% of the total energy consumption. Lighting is another relevant item in the energy balance because many wards need lighting for more than 12–14 h/day. Laundry and kitchen represent another large portion of the energy loads, even though they consist entirely of thermal energy requirements.

The electric system under analysis consists of a medium-voltage distribution network connected, by three power lines, to the user substation.

In order to detect the demand irregularities for this complex user, the load profiles and the temperature measurements for 1 year of operation, reported in Figs. 4 and 5, respectively, have been considered. Analyzing these figures, it is worth noting that the considered yearly profiles show complex and heterogeneous patterns, with extremely variable electrical and environmental dynamics.

Fig. 4 One-year load profiles.
Fig. 5 One-year temperature profiles.

The proposed methodology can be conceptually divided into two main parts: the clustering-based classification of the load and temperature profiles, and the fuzzy-based processing of the computed quantities. The first step consists of classifying the power meter and sensor acquisitions within predetermined clusters through k-means- and c-means-based algorithms. The load profiles are classified into 16 different clusters on the basis of the day type (business day/holiday), the loading level (i.e., low and high load), and the season.
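The count of 16 load clusters follows from combining the three criteria, assuming the four seasons used in the figure captions (winter, spring, summer, fall). A short enumeration, with illustrative label strings, makes this explicit:

```python
from itertools import product

seasons = ["winter", "spring", "summer", "fall"]
day_types = ["business day", "holiday"]
load_levels = ["low", "high"]

# One load-profile class per (season, day type, loading level) combination.
load_classes = list(product(seasons, day_types, load_levels))

print(len(load_classes))  # 4 * 2 * 2 = 16
for season, day_type, level in load_classes[:4]:
    print(season, day_type, level)
```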

The results of this classification process are shown in Figs. 6–13, which depict the centroids identified by both the k-means and c-means clustering techniques.

Fig. 6 Wintry load profiles: k-means. (A) Business day; (B) holiday.
Fig. 7 Spring load profiles: k-means. (A) Business day; (B) holiday.
Fig. 8 Summer load profiles: k-means. (A) Business day; (B) holiday.
Fig. 9 Fall load profiles: k-means. (A) Business day; (B) holiday.
Fig. 10 Wintry load profiles: c-means. (A) Business day; (B) holiday.
Fig. 11 Spring load profiles: c-means. (A) Business day; (B) holiday.
Fig. 12 Summer load profiles: c-means. (A) Business day; (B) holiday.
Fig. 13 Fall load profiles: c-means. (A) Business day; (B) holiday.

Moreover, the temperature profiles are classified on the basis of the season and the type of day (hot, middle, and cold), as shown in Figs. 14–21, where, as for the active power load profiles, the cluster centroids for both the k-means and c-means techniques are represented.

Fig. 14 Wintry temperature profiles: k-means. (A) Business day; (B) holiday.
Fig. 15 Spring temperature profiles: k-means. (A) Business day; (B) holiday.
Fig. 16 Summer temperature profiles: k-means. (A) Business day; (B) holiday.
Fig. 17 Fall temperature profiles: k-means. (A) Business day; (B) holiday.
Fig. 18 Wintry temperature profiles: c-means. (A) Business day; (B) holiday.
Fig. 19 Spring temperature profiles: c-means. (A) Business day; (B) holiday.
Fig. 20 Summer temperature profiles: c-means. (A) Business day; (B) holiday.
Fig. 21 Fall temperature profiles: c-means. (A) Business day; (B) holiday.

The second conceptual step (i.e., the fuzzy-based outlier detection) computes the distance between each load and temperature profile and the corresponding centroid. When this distance is greater than a threshold value, the corresponding profile is classified as anomalous. In order to distinguish between endogenous and exogenous effects, the framework correlates demand and temperature profiles: if the temperature and demand profiles are both anomalous, the anomaly is classified as exogenous, whereas if only the absorption profile is anomalous, the anomaly is classified as endogenous and the considered profile is flagged as an outlier.
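A crisp version of this decision logic can be sketched as follows. The Euclidean distance, the threshold values, the label strings, and the four-sample toy profiles are all assumptions made for illustration; the chapter does not report the thresholds it used.

```python
import math


def distance(profile, centroid):
    """Euclidean distance between a daily profile and its cluster centroid."""
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(profile, centroid)))


def classify_day(load, load_centroid, temp, temp_centroid,
                 load_threshold, temp_threshold):
    """Label a day as 'regular', 'exogenous', or 'endogenous'."""
    load_anomalous = distance(load, load_centroid) > load_threshold
    temp_anomalous = distance(temp, temp_centroid) > temp_threshold
    if not load_anomalous:
        return "regular"
    # A load anomaly that coincides with a temperature anomaly is attributed
    # to the weather (exogenous); otherwise the day is flagged as an outlier.
    return "exogenous" if temp_anomalous else "endogenous"


load_ref = [10.0, 12.0, 15.0, 11.0]   # hypothetical load centroid
temp_ref = [20.0, 22.0, 25.0, 21.0]   # hypothetical temperature centroid
high_load = [14.0, 16.0, 19.0, 15.0]
hot_day = [25.0, 27.0, 30.0, 26.0]

print(classify_day(load_ref, load_ref, temp_ref, temp_ref, 2.0, 2.0))
print(classify_day(high_load, load_ref, hot_day, temp_ref, 2.0, 2.0))
print(classify_day(high_load, load_ref, temp_ref, temp_ref, 2.0, 2.0))
```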

The results of this process are summarized in Figs. 22–29, where, for each minimum and maximum temperature profile, the temperature and absorption anomalies are highlighted with circles and stars, respectively. Analyzing these results, it is possible to identify the days in which temperature and absorption outliers coexist and, hence, to single out the absorption anomalies that cannot be related to climatic events and that can reasonably be associated with other factors influencing system operation.

Fig. 22 Wintry anomalies: k-means. (A) Business day; (B) holiday.
Fig. 23 Spring anomalies: k-means. (A) Business day; (B) holiday.
Fig. 24 Summer anomalies: k-means. (A) Business day; (B) holiday.
Fig. 25 Fall anomalies: k-means. (A) Business day; (B) holiday.
Fig. 26 Wintry anomalies: c-means. (A) Business day; (B) holiday.
Fig. 27 Spring anomalies: c-means. (A) Business day; (B) holiday.
Fig. 28 Summer anomalies: c-means. (A) Business day; (B) holiday.
Fig. 29 Fall anomalies: c-means. (A) Business day; (B) holiday.

Finally, the results of this application are summarized in tabular form, allowing a performance comparison between the k-means and c-means data clustering approaches (Tables 1–3).

Table 1

Load anomalies
Season | k-Means | c-Means
Winter | 6 36 77 78 356 357 359 360 361 362 364 | 2 6 9 13 48 58 68 72 77 358 359 360 361 362 363 365
Spring | 85 100 116 122 159 161 162 163 165 173 | 85 90 98 100 116 117 134 150 154 156 158 162
Summer | 182 184 200 204 207 220 225 228 240 241 242 265 | 200 201 207 225 233 234 237
Fall | 268 272 273 286 292 298 302 303 306 333 348 350 | 268 269 272 273 275 277 280 281 302 306 310 333 335 352

Table 2

Temperature anomalies
Season | k-Means | c-Means
Winter | 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | 2 4 5 6 9 10 11 12 13 16 17 18 19 20 23 24 25 26 27 30 31 51 52 55 57 58 59 63 64 67 68 70 71 77 78 356 359 360 361 364
Spring | 118 119 120 121 122 123 130 131 132 133 134 141 142 145 146 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | 96 97 100 102 103 104 107 108 109 110 111 114 115 116 119 120 128 133 134 135 137 138 141 143 149 154 155 161 162 168 169
Summer | 174 175 176 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 196 197 201 202 203 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 | 174 175 176 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 196 197 201 202 203 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244
Fall | 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 310 311 317 318 319 | 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 310 311 317 318 319

Table 3

Endogenous load anomalies
Season | k-Means | c-Means
Winter | 6 77 78 356 357 359 360 361 362 364 | 48 72 358 362 363 365
Spring | 85 100 116 | 85 90 98 117 150 156 158
Summer | 200 204 207 225 265 | 200 207 225
Fall | 302 303 306 333 348 350 | 302 306 333 335 352
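The entries of Table 3 are consistent with the rule stated above: the endogenous anomalies are the load anomalies of Table 1 that do not also appear as temperature anomalies in Table 2. The short check below reproduces the spring k-means row from the tabulated values; the variable names are illustrative, and the same set difference can be applied to the other rows.

```python
# Spring, k-means column of Table 1 (load anomalies).
load_anomalies = {85, 100, 116, 122, 159, 161, 162, 163, 165, 173}

# Spring, k-means column of Table 2 (temperature anomalies).
temperature_anomalies = {118, 119, 120, 121, 122, 123,
                         130, 131, 132, 133, 134,
                         141, 142, 145, 146,
                         *range(151, 174)}

# Load anomalies with no matching temperature anomaly are endogenous.
endogenous = sorted(load_anomalies - temperature_anomalies)
print(endogenous)  # [85, 100, 116], the spring k-means row of Table 3
```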

5 Conclusion

The development of computing paradigms aimed at converting the power demand data into actionable information, allowing the prosumer to have a full understanding of the available information, represents a timely and relevant issue to address. In light of this need, this chapter analyzed the potential role of cluster analysis and fuzzy-based programming for detecting irregular demand patterns, such as unusual energy usage, decreasing efficiency, erroneous regulations, and load malfunctioning. This strategic information could enhance energy efficiency, reduce supply costs, and increase the situational awareness of prosumers about their actual power demand. Experimental results obtained on a real case study have been presented and discussed in order to assess the benefits deriving from the application of the proposed framework to the task of analyzing complex load patterns.

References

[1] Kim Y.-I., Ko J.-M., Song J.-J., Choi H. Repeated clustering to improve the discrimination of typical daily load profile. J. Electr. Eng. Technol. 2012;7(3):281–287.

[2] Yang S.-L., Shen C., et al. A review of electric load classification in smart grid environment. Renew. Sustain. Energy Rev. 2013;24:103–110.

[3] Chicco G., Napoli R., Piglione F. Application of clustering algorithms and self organising maps to classify electricity customers. In: IEEE; 7. 2003 IEEE Bologna Power Tech Conference Proceedings. 2003;vol. 1.

[4] Chicco G., Napoli R., Piglione F. Comparisons among clustering techniques for electricity customer classification. IEEE Trans. Power Syst. 2006;21(2):933–940.

[5] Chicco G., Napoli R., Piglione F. Load pattern clustering for short-term load forecasting of anomalous days. In: IEEE; 6. 2001 IEEE Porto Power Tech Proceedings. 2001;vol. 2.

[6] Li Y., Cheng Y., Zhang L., Huang H. Application of multivariate statistical analysis to classify electricity customers. In: IEEE; 2008:1–6. China International Conference on Electricity Distribution, 2008. CICED 2008.

[7] Ozgonenel O., Thomas D.W.P., Yalcin T., Bertizlioglu I.N. Detection of blackouts by using K-means clustering in a power system. In: IET; 2012:1–6. 11th International Conference on Developments in Power Systems Protection, 2012. DPSP 2012.

[8] Anaparthi K., Chaudhuri B., Thornhill N.F., Pal B.C. Coherency identification in power systems through principal component analysis. IEEE Trans. Power Syst. 2005;20(3):1658–1660.
