Merge branch 'master' into paper

holgerteichgraeber · web-flow · commit 5d24eb309544 · 2019-09-01T19:39:35.000-07:00
diff --git a/paper/paper.md b/paper/paper.md
@@ -28,7 +28,7 @@ bibliography: paper.bib
 
 ``TimeSeriesClustering`` is a Julia implementation of unsupervised learning methods for time series datasets. It provides functionality for clustering and aggregating, detecting motifs, and quantifying similarity between time series datasets.
 The software provides a type system for temporal data, and provides an implementation of the most commonly used clustering methods and extreme value selection methods for temporal data.
-``TimeSeriesClustering`` provides simple integration of multi-dimensional time series data (e.g. multiple attributes such as wind availability, solar availability, and electricity demand) in a single aggregation process.
+``TimeSeriesClustering`` provides simple integration of multi-dimensional time-series data (e.g., multiple attributes such as wind availability, solar availability, and electricity demand) in a single aggregation process.
 The software is applicable to general time series datasets and lends itself well to a multitude of application areas within the field of time series data mining.
 ``TimeSeriesClustering`` was originally developed to perform time series aggregation for energy systems optimization problems. Because of the software's origin, many of the examples in this work stem from the field of energy systems optimization.
 
@@ -39,11 +39,11 @@ The clustering methods that are implemented in ``TimeSeriesClustering`` follow t
 
 The following are the key features that ``TimeSeriesClustering`` provides. Implementation details can be found in the software's documentation.
 
-- *The type system*: The data type (called struct in Julia) ``ClustData`` stores all time-series data in a common format. Besides the data itself, it automatically processes and stores information which are relevant for later use in the application for which the time-series data will be used. The data type ``ClustResult`` additionally stores information relevant for evaluating clustering performance. These data types make ``TimeSeriesClustering`` to be easily integrated with any analysis that relies on iterative evaluation of the clustering and aggregation methods.
+- *The type system*: The data type (called struct in Julia) ``ClustData`` stores all time-series data in a common format. Besides the data itself, it automatically processes and stores information that is relevant for later use in the application for which the time-series data will be used. The data type ``ClustResult`` additionally stores information relevant for evaluating clustering performance. These data types make ``TimeSeriesClustering`` easy to integrate with any analysis that relies on iterative evaluation of the clustering and aggregation methods.
 
 - *The aggregation methods*: The most commonly used clustering methods and extreme value selection methods are implemented with a common interface, allowing for simple comparison of these methods on a given data set and optimization problem.
 
-- *The generalized import of time series in csv format*: Time series can be loaded through csv files in a pre-defined format. From this, variable names, which we call attributes, and node names are automatically loaded and stored. The original time series can be sliced into periods of user-defined length. This information can then be used in the definition of the sets of the optimization problem later.
+- *The generalized import of time series in csv format*: Time series can be loaded through csv files in a pre-defined format. From this, variable names, which we call attributes, and node names are automatically loaded and stored. The original time series can be sliced into periods of user-defined length. This information can later be used in the definition of the sets of the optimization problem.
 
 - *Multiple attributes and nodes*: Multiple time series, one for each attribute (and node, if the data has a spatial component), are automatically combined and aggregated simultaneously.
 
@@ -55,7 +55,7 @@ In energy systems optimization, the choice of temporal modeling, especially of t
 It is thus important to not view time-series aggregation and optimization model formulation as two separate, consecutive steps, but to integrate time-series aggregation into the overall process of building an optimization model in an iterative manner. Because the most commonly used clustering methods and extreme value selection methods are implemented with a common interface, ``TimeSeriesClustering`` allows for this iterative integration in a simple way.
 
 The type system for temporal data provided by ``TimeSeriesClustering`` allows for easy integration with the formulation of optimization problems.
-The information stored in the datatype ``ClustData`` such as the number of periods, the number of time steps per period, and the chronology of the periods can be used to formulate the sets of an optimization problem.
+The information stored in the datatype ``ClustData``, such as the number of periods, the number of time steps per period, and the chronology of the periods, can be used to formulate the sets of an optimization problem.
 
 ``TimeSeriesClustering`` provides two sample optimization problems to illustrate the integration of time-series aggregation and optimization problem formulation through our type system.
 However, the package is intended to be independent of the application at hand, and others are encouraged to use it as a base for their own optimization problem formulation.
@@ -83,31 +83,31 @@ With specific focus on energy systems optimization, time-series aggregation has
 [``Calliope``](https://github.com/calliope-project/calliope) [@Pfenninger:2018] is a capacity expansion modeling software in Python that includes time-series aggregation for the use case of generation and transmission capacity expansion modeling.
 ``TimeSeriesClustering`` is the first package to provide time-series aggregation in Julia [@Bezanson:2017].
 For energy systems optimization, this is advantageous because it can be used in conjunction with the [``JuMP``](https://github.com/JuliaOpt/JuMP.jl) package [@Dunning:2017] in Julia, which provides an excellent modeling language for optimization problems.
-Furthermore, ``TimeSeriesClustering`` includes both clustering and extreme value selection and integrates them into the same output type. This is important in order to retain the characteristics of the time-series that are relevant to many optimization problems.
+Furthermore, ``TimeSeriesClustering`` includes both clustering and extreme value selection, and integrates them into the same output type. This is important in order to retain the characteristics of the time series that are relevant to many optimization problems.
 
 # Application areas
 ``TimeSeriesClustering`` is broadly applicable to many fields where time series analysis occurs.
 Time-series clustering and aggregation methods alone have applications in the fields of aviation, astronomy, biology, climate, energy, environment, finance, medicine, psychology, robotics, speech recognition, and user analysis [@Liao:2005; @Aghabozorgi:2015].
-These methods can be used for time-series representation and indexing, which helps reduce the dimension (i.e. the number of data points) of the original data [@Fu:2011].
+These methods can be used for time-series representation and indexing, which helps reduce the dimension (i.e., the number of data points) of the original data [@Fu:2011].
 
 Many tasks in time series data mining also fall into the application area of our software [@Fu:2011; @Hebert:2014].
 Here, our software can be used to measure similarity between time-series datasets [@Serra:2014].
 Closely related is the task of finding time-series motifs [@Lin:2002; @Yankov:2007; @Mueen:2014]. Time-series motifs are pairs of individual time series that are very similar to each other.
 This task occurs in many disciplines, for example in finding repeated animal behavior [@Mueen:2013], finding regulatory elements in DNA [@Das:2007], and finding patterns in EEG signals [@Castro:2010].
 Another application area of our software is segmentation and clustering of audio datasets [@Siegler:1997; @Lefevre:2011; @Kamper:2017].
 
-In the remainder of this section, we provide an overview of how time-series aggregation applies to input data of optimization problems.
+In the remainder of this section, we provide an overview of how time-series aggregation applies to the input data of optimization problems.
 
 Generally, optimization is concerned with the maximization or minimization of a certain objective subject to a number of constraints. ``TimeSeriesClustering`` is applicable to a broad range of optimization problems.
-They generally fall into the class of design and operations problems, also called planning problems or two-stage optimization problems. In these problems, decisions on two time horizons have to be made: Long-term design decisions, as to what equipment to buy, and short-term operating decisions, as to when to operate that equipment. Because the two time horizons are intertwined, operating decisions impact the system design, and vice versa. Operating decisions are of temporal nature, and the amount of temporal input data for these optimization problems often makes them computationally intractable.
-Usually, time series of length $N$ (e.g. hourly electricity demand for one year, where $N=8760$) are split into $\hat{K}$ periods of length $T=\frac{N}{\hat{K}}$ (e.g. $\hat{K}=365$ daily periods, with $T=24$), and each of the $\hat{K}$ periods is treated independently in the operations stage of the optimization problem. Using time-series aggregation methods, we can represent the data with $K < \hat{K}$ periods, which results in reduced computational complexity and improved modeling performance.
+They generally fall into the class of design and operations problems, also called planning problems or two-stage optimization problems. In these problems, decisions on two time horizons have to be made: long-term design decisions, as to what equipment to buy, and short-term operating decisions, as to when to operate that equipment. Because the two time horizons are intertwined, operating decisions impact the system design, and vice versa. Operating decisions are of a temporal nature, and the amount of temporal input data for these optimization problems often makes them computationally intractable.
+Usually, time series of length $N$ (e.g., hourly electricity demand for one year, where $N=8760$) are split into $\hat{K}$ periods of length $T=\frac{N}{\hat{K}}$ (e.g., $\hat{K}=365$ daily periods, with $T=24$), and each of the $\hat{K}$ periods is treated independently in the operations stage of the optimization problem. Using time-series aggregation methods, we can represent the data with $K < \hat{K}$ periods, which results in reduced computational complexity and improved modeling performance.
 
-Many of the design and operations optimization problems that time-series aggregation has been applied to are in the general domain of energy systems optimization. These problems include generation and transmission capacity expansion problems [@Nahmmacher:2016; @Pfenninger:2017], local energy supply system design problems [@Bahl:2017; @Kotzur:2018], and individual technology design problems [@Brodrick:2017; @Teichgraeber:2017].
+Many of the design and operations optimization problems to which time-series aggregation has been applied are in the general domain of energy systems optimization. These problems include generation and transmission capacity expansion problems [@Nahmmacher:2016; @Pfenninger:2017], local energy supply system design problems [@Bahl:2017; @Kotzur:2018], and individual technology design problems [@Brodrick:2017; @Teichgraeber:2017].
 Time series of interest in these problems include energy demands (electricity, heating, cooling), electricity prices, wind and solar availability factors, and temperatures.
 
 Many other planning problems in operations research that involve time-varying operations have similar characteristics that make them suitable for time-series aggregation. Some examples are aggregate and detailed production scheduling, job shop design and scheduling, distribution system (warehouse) design and control [@Dempster:1981], and electric vehicle charging station sizing [@Jia:2012].
 Time series of interest in these problems include product demands, electricity prices, and electricity demands.
-A related class of problems that ``TimeSeriesClustering`` can be useful to is scenario reduction for stochastic programming [@Karuppiah:2010]. Two-stage stochastic programs have similar characteristics to the previously described two-stage problems, and are often computationally intractable due to a large number of scenarios. ``TimeSeriesClustering`` can be used to reduce a large number of scenarios $\hat{K}$ into a computationally tractable number of scenarios $K < \hat{K}$.
+A related class of problems for which ``TimeSeriesClustering`` can be useful is scenario reduction for stochastic programming [@Karuppiah:2010]. Two-stage stochastic programs have similar characteristics to the previously described two-stage problems, and are often computationally intractable due to a large number of scenarios. ``TimeSeriesClustering`` can be used to reduce a large number of scenarios $\hat{K}$ into a computationally tractable number of scenarios $K < \hat{K}$.
 Furthermore, ``TimeSeriesClustering`` could be used in operational contexts such as developing operational strategies for typical days, or aggregating repetitive operating conditions for use in model predictive control.
 Because it keeps track of the chronology of the periods, it can also be used to calculate transition probabilities between clustered periods for Markov chain modeling.
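
The Markov chain use case mentioned above can be illustrated with a minimal sketch. It is written in plain Python and does not use the ``TimeSeriesClustering`` API; the function name `transition_probabilities` and the example chronology are hypothetical. It assumes only that the chronology is available as a list giving, for each original period in order, the index of the cluster it was assigned to.

```python
def transition_probabilities(assignments, n_clusters):
    """Estimate first-order Markov transition probabilities between clusters.

    assignments[i] is the 0-based cluster index of the i-th original period,
    listed in chronological order (an assumed input format).
    """
    # Count how often cluster a is directly followed by cluster b.
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for current, following in zip(assignments, assignments[1:]):
        counts[current][following] += 1

    # Normalize each row; a cluster with no observed successor keeps a zero row.
    probabilities = []
    for row in counts:
        total = sum(row)
        probabilities.append([c / total if total else 0.0 for c in row])
    return probabilities


# Hypothetical chronology: 8 consecutive periods assigned to K = 3 clusters.
chronology = [0, 0, 1, 2, 1, 0, 0, 2]
P = transition_probabilities(chronology, 3)
```

Here `P[a][b]` is the empirical fraction of periods in cluster `a` that are directly followed by a period in cluster `b`. With real data the chronology would come from the clustering result rather than a hard-coded list.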