Skip to content

Commit 5d24eb3

Browse files
Merge branch 'master' into paper
2 parents 85a41a8 + feea614 commit 5d24eb3

File tree

1 file changed

+11
-11
lines changed

1 file changed

+11
-11
lines changed

paper/paper.md

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ bibliography: paper.bib
2828

2929
``TimeSeriesClustering`` is a Julia implementation of unsupervised learning methods for time series datasets. It provides functionality for clustering and aggregating, detecting motifs, and quantifying similarity between time series datasets.
3030
The software provides a type system for temporal data, and provides an implementation of the most commonly used clustering methods and extreme value selection methods for temporal data.
31-
``TimeSeriesClustering`` provides simple integration of multi-dimensional time series data (e.g. multiple attributes such as wind availability, solar availability, and electricity demand) in a single aggregation process.
31+
``TimeSeriesClustering`` provides simple integration of multi-dimensional time-series data (e.g., multiple attributes such as wind availability, solar availability, and electricity demand) in a single aggregation process.
3232
The software is applicable to general time series datasets and lends itself well to a multitude of application areas within the field of time series data mining.
3333
``TimeSeriesClustering`` was originally developed to perform time series aggregation for energy systems optimization problems. Because of the software's origin, many of the examples in this work stem from the field of energy systems optimization.
3434

@@ -39,11 +39,11 @@ The clustering methods that are implemented in ``TimeSeriesClustering`` follow t
3939

4040
The following are the key features that ``TimeSeriesClustering`` provides. Implementation details can be found in the software's documentation.
4141

42-
- *The type system*: The data type (called struct in Julia) ``ClustData`` stores all time-series data in a common format. Besides the data itself, it automatically processes and stores information which are relevant for later use in the application for which the time-series data will be used. The data type ``ClustResult`` additionally stores information relevant for evaluating clustering performance. These data types make ``TimeSeriesClustering`` to be easily integrated with any analysis that relies on iterative evaluation of the clustering and aggregation methods.
42+
- *The type system*: The data type (called struct in Julia) ``ClustData`` stores all time-series data in a common format. Besides the data itself, it automatically processes and stores information that is relevant for later use in the application for which the time-series data will be used. The data type ``ClustResult`` additionally stores information relevant for evaluating clustering performance. These data types make ``TimeSeriesClustering`` easy to integrate with any analysis that relies on iterative evaluation of the clustering and aggregation methods.
4343

4444
- *The aggregation methods*: The most commonly used clustering methods and extreme value selection methods are implemented with a common interface, allowing for simple comparison of these methods on a given data set and optimization problem.
4545

46-
- *The generalized import of time series in csv format*: Time series can be loaded through csv files in a pre-defined format. From this, variable names, which we call attributes, and node names are automatically loaded and stored. The original time series can be sliced into periods of user-defined length. This information can then be used in the definition of the sets of the optimization problem later.
46+
- *The generalized import of time series in csv format*: Time series can be loaded through csv files in a pre-defined format. From this, variable names, which we call attributes, and node names are automatically loaded and stored. The original time series can be sliced into periods of user-defined length. This information can later be used in the definition of the sets of the optimization problem.
4747

4848
- *Multiple attributes and nodes*: Multiple time series, one for each attribute (and node, if the data has a spatial component), are automatically combined and aggregated simultaneously.
4949

@@ -55,7 +55,7 @@ In energy systems optimization, the choice of temporal modeling, especially of t
5555
It is thus important to not view time-series aggregation and optimization model formulation as two seperate, consecutive steps, but to integrate time-series aggregation into the overall process of building an optimization model in an iterative manner. Because the most commonly used clustering methods and extreme value selection methods are implemented with a common interface, ``TimeSeriesClustering`` allows for this iterative integration in a simple way.
5656

5757
The type system for temporal data provided by ``TimeSeriesClustering`` allows for easy integration with the formulation of optimization problems.
58-
The information stored in the datatype ``ClustData`` such as the number of periods, the number of time steps per period, and the chronology of the periods can be used to formulate the sets of an optimization problem.
58+
The information stored in the datatype ``ClustData``, such as the number of periods, the number of time steps per period, and the chronology of the periods, can be used to formulate the sets of an optimization problem.
5959

6060
``TimeSeriesClustering`` provides two sample optimization problems to illustrate the integration of time-series aggregation and optimization problem formulation through our type system.
6161
However, it is generally thought to be independent of the application at hand, and others are encouraged to use the package as a base for their own optimization problem formulation.
@@ -83,31 +83,31 @@ With specific focus on energy systems optimization, time-series aggregation has
8383
[``Calliope``](https://github.com/calliope-project/calliope) [@Pfenninger:2018] is a capacity expansion modeling software in Python that includes time-series aggregation for the use case of generation and transmission capacity expansion modeling.
8484
``TimeSeriesClustering`` is the first package to provide time-series aggregation in Julia [@Bezanson:2017].
8585
For energy systems optimization, this is advantageous because it can be used in conjunction with the [``JuMP``](https://github.com/JuliaOpt/JuMP.jl) package [@Dunning:2017] in Julia, which provides an excellent modeling language for optimization problems.
86-
Furthermore, ``TimeSeriesClustering`` includes both clustering and extreme value selection and integrates them into the same output type. This is important in order to retain the characteristics of the time-series that are relevant to many optimization problems.
86+
Furthermore, ``TimeSeriesClustering`` includes both clustering and extreme value selection, and integrates them into the same output type. This is important in order to retain the characteristics of the time-series that are relevant to many optimization problems.
8787

8888
# Application areas
8989
``TimeSeriesClustering`` is broadly applicable to many fields where time series analysis occurs.
9090
Time-series clustering and aggregation methods alone have applications in the fields of aviation, astronomy, biology, climate, energy, environment, finance, medicine, psychology, robotics, speech recognition, and user analysis [@Liao:2005, @Aghabozorgi:2015].
91-
These methods can be used for time-series representation and indexing, which helps reduce the dimension (i.e. the number of data points) of the original data [@Fu:2011].
91+
These methods can be used for time-series representation and indexing, which helps reduce the dimension (i.e., the number of data points) of the original data [@Fu:2011].
9292

9393
Many tasks in time series data mining also fall into the application area of our software [@Fu:2011, @Hebert:2014].
9494
Here, our software can be used to measure similarity between time-series datasets [@Serra:2014].
9595
Closely related is the task of finding time-series motifs [@Lin:2002, @Yankov:2007, @Mueen:2014]. Time-series motifs are pairs of individual time series that are very similar to each other.
9696
This task occurs in many disciplines, for example in finding repeated animal behavior [@Mueen:2013], finding regulatory elements in DNA [@Das:2007], and finding patterns in EEG signals [@Castro:2010].
9797
Another application area of our software is segmentation and clustering of audio datasets [@Siegler:1997, @Lefevre:2011, @Kamper:2017].
9898

99-
In the remainder of this section, we provide an overview of how time-series aggregation applies to input data of optimization problems.
99+
In the remainder of this section, we provide an overview of how time-series aggregation applies to the input data of optimization problems.
100100

101101
Generally, optimization is concerned with the maximization or minimization of a certain objective subject to a number of constraints. The range of optimization problems ``TimeSeriesClustering`` is applicable to is broad.
102-
They generally fall into the class of design and operations problems, also called planning problems or two-stage optimization problems. In these problems, decisions on two time horizons have to be made: Long-term design decisions, as to what equipment to buy, and short-term operating decisions, as to when to operate that equipment. Because the two time horizons are intertwined, operating decisions impact the system design, and vice versa. Operating decisions are of temporal nature, and the amount of temporal input data for these optimization problems often makes them computationally intractable.
103-
Usually, time series of length $N$ (e.g. hourly electricity demand for one year, where $N=8760$) are split into $\hat{K}$ periods of length $T=\frac{N}{\hat{K}}$ (e.g. $\hat{K}=365$ daily periods, with $T=24$), and each of the $\hat{K}$ periods is treated independently in the operations stage of the optimization problem. Using time-series aggregation methods, we can represent the data with $K < \hat{K}$ periods, which results in reduced computational complexity and improved modeling performance.
102+
They generally fall into the class of design and operations problems, also called planning problems or two-stage optimization problems. In these problems, decisions on two time horizons have to be made: long-term design decisions, as to what equipment to buy, and short-term operating decisions, as to when to operate that equipment. Because the two time horizons are intertwined, operating decisions impact the system design, and vice versa. Operating decisions are of a temporal nature, and the amount of temporal input data for these optimization problems often makes them computationally intractable.
103+
Usually, time series of length $N$ (e.g., hourly electricity demand for one year, where $N=8760$) are split into $\hat{K}$ periods of length $T=\frac{N}{\hat{K}}$ (e.g., $\hat{K}=365$ daily periods, with $T=24$), and each of the $\hat{K}$ periods is treated independently in the operations stage of the optimization problem. Using time-series aggregation methods, we can represent the data with $K < \hat{K}$ periods, which results in reduced computational complexity and improved modeling performance.
104104

105-
Many of the design and operations optimization problems that time-series aggregation has been applied to are in the general domain of energy systems optimization. These problems include generation and transmission capacity expansion problems [@Nahmmacher:2016; @Pfenninger:2017], local energy supply system design problems [@Bahl:2017; @Kotzur:2018], and individual technology design problems [@Brodrick:2017; @Teichgraeber:2017].
105+
Many of the design and operations optimization problems to which time-series aggregation has been applied are in the general domain of energy systems optimization. These problems include generation and transmission capacity expansion problems [@Nahmmacher:2016; @Pfenninger:2017], local energy supply system design problems [@Bahl:2017; @Kotzur:2018], and individual technology design problems [@Brodrick:2017; @Teichgraeber:2017].
106106
Time series of interest in these problems include energy demands (electricity, heating, cooling), electricity prices, wind and solar availability factors, and temperatures.
107107

108108
Many other planning problems in operations research that involve time-varying operations have similar characteristics that make them suitable for time-series aggregation. Some examples are aggregate and detailed production scheduling, job shop design and scheduling, distribution system (warehouse) design and control [@Dempster:1981], and electric vehicle charging station sizing [@Jia:2012].
109109
Time series of interest in these problems include product demands, electricity prices, and electricity demands.
110-
A related class of problems that ``TimeSeriesClustering`` can be useful to is scenario reduction for stochastic programming [@Karuppiah:2010]. Two-stage stochastic programs have similar characteristics to the previously described two-stage problems, and are often computationally intractable due to a large number of scenarios. ``TimeSeriesClustering`` can be used to reduce a large number of scenarios $\hat{K}$ into a computationally tractable number of scenarios $K < \hat{K}$.
110+
A related class of problems to which ``TimeSeriesClustering`` can be useful is scenario reduction for stochastic programming [@Karuppiah:2010]. Two-stage stochastic programs have similar characteristics to the previously described two-stage problems, and are often computationally intractable due to a large number of scenarios. ``TimeSeriesClustering`` can be used to reduce a large number of scenarios $\hat{K}$ into a computationally tractable number of scenarios $K < \hat{K}$.
111111
Furthermore, ``TimeSeriesClustering`` could be used in operational contexts such as developing operational strategies for typical days, or aggregating repetitive operating conditions for use in model predictive control.
112112
Because it keeps track of the chronology of the periods, it can also be used to calculate transition probabilities between clustered periods for Markov chain modeling.
113113

0 commit comments

Comments
 (0)