|
| 1 | +# To use this package |
| 2 | +using ClustForOpt |
| 3 | +using Test |
| 4 | + |
| 5 | +@testset "workflow_intro" begin |
| 6 | + |
| 7 | + ######################### |
| 8 | + #= Load Time Series Data |
| 9 | + ######################### |
| 10 | + How to load data provided with the package: |
| 11 | + The data is for a Capacity Expansion Problem "CEP" |
| 12 | + and for the single node representation of Germany "GER_1" |
| 13 | + The original timeseries has 8760 entries (one for each hour of the year) |
| 14 | + It should be cut into K=365 periods (365 days) with T=24 timesteps per period (24h per day) =# |
| 15 | + data_path=normpath(joinpath(dirname(@__FILE__),"..","data","TS_GER_1")) |
| 16 | + ts_input_data = load_timeseries_data(data_path; T=24, years=[2016]) |
| 17 | + |
| 18 | + #= ClustData |
| 19 | + How the struct is set up: |
| 20 | + ClustData{region::String,K::Int,T::Int,data::Dict{String,Array},weights::Array{Float64},mean::Dict{String,Array},sdv::Dict{String,Array}} <: TSData |
| 21 | + -region: specifies region data belongs to |
| 22 | + -K: number of periods |
| 23 | + -T: time steps per period |
| 24 | + -data: Data in form of a dictionary for each attribute `"[file name]-[column name]"` |
| 25 | + -weights: this is the absolute weight. E.g. for a year of 365 days, sum(weights)=365 |
| 26 | + -mean: For normalized data: The shift of the mean as a dictionary for each attribute |
| 27 | + -sdv: For normalized data: Standard deviation as a dictionary for each attribute |
| 28 | + |
| 29 | + How to access a struct: |
| 30 | + [object].[fieldname] =# |
| 31 | + number_of_periods=ts_input_data.K |
| 32 | + # How to access a dictionary: |
| 33 | + data_solar_germany=ts_input_data.data["solar-germany"] |
| 34 | + # How to plot data |
| 35 | + #using Plots |
| 36 | + # plot(Array of our data, no legend, dotted lines, label on the x-Axis, label on the y-Axis) |
| 37 | + #plot_input_solar=plot(ts_input_data.data["solar-germany"], legend=false, linestyle=:dot, xlabel="Time [h]", ylabel="Solar availability factor [%]") |
| 38 | + |
| 39 | + # How to load your own data: |
| 40 | + # put your data into your home directory, in a folder called tutorial |
| 41 | + # The data should have the following structure: see ClustForOpt/data folder |
| 42 | + #= |
| 43 | + - Loading all `*.csv` files in the folder or the file `data_path` |
| 44 | + The `*.csv` files shall have the following structure and must have the same length: |
| 45 | + |Timestamp |[column names...]| |
| 46 | + |[iterator]|[values] | |
| 47 | + The first column should be called `Timestamp` if it contains a time iterator |
| 48 | + Each remaining column holds a single time series, e.g. for a specific geolocation. |
| 49 | + Each column of a `[file name].csv` file is added to `ClustData.data` under the key `"[file name]-[column name]"` |
| 50 | + - region is an additional String to specify the loaded time series data |
| 51 | + - K describes the number of periods in the input data |
| 52 | + - T describes the length of each period =# |
| 53 | + load_your_own_data=false |
| 54 | + if load_your_own_data |
| 55 | + # Single file at the path e.g. homedir/tutorial/solar.csv |
| 56 | + # The data is automatically named after the file ('solar') within the data struct |
| 57 | + my_path=joinpath(homedir(),"tutorial","solar.csv") |
| 58 | + your_data_1=load_timeseries_data(my_path; region="none", T=24) |
| 59 | + # Multiple files in the folder e.g. homedir/tutorial/ |
| 60 | + # Within the data struct, each time series is automatically named after its csv file |
| 61 | + my_path=joinpath(homedir(),"tutorial") |
| 62 | + # region is an additional String to label the loaded time series data |
| 63 | + your_data_2 = load_timeseries_data(my_path; region="none", T=24) |
| 64 | + end |
| 65 | + |
| 66 | + |
| 67 | + ############# |
| 68 | + # Clustering |
| 69 | + ############# |
| 70 | + # Quick example and investigation of the best result: |
| 71 | + ts_clust_result = run_clust(ts_input_data; method="kmeans", representation="centroid", n_init=5, n_clust=5) # n_init=5 keeps this example fast; for kmeans, use at least n_init=1000 in practice. |
| 72 | + ts_clust_data = ts_clust_result.clust_data |
| 73 | + # And some plotting: |
| 74 | + #plot_comb_solar=plot!(plot_input_solar, ts_clust_data.data["solar-germany"], linestyle=:solid, width=3) |
| 75 | + #plot_clust_solar=plot(ts_clust_data.data["solar-germany"], legend=false, linestyle=:solid, width=3, xlabel="Time [h]", ylabel="Solar availability factor [%]") |
| 76 | + |
| 77 | + |
| 78 | + #= Clustering options: |
| 79 | + `run_clust()` takes the full `data` and gives a struct with the clustered data as the output. |
| 80 | + |
| 81 | + |
| 82 | + ## Supported clustering methods |
| 83 | + The following combinations of clustering method and representations are supported by `run_clust`: |
| 84 | + |
| 85 | + Name | method | representation |
| 86 | + ----------------------------------------------------|-------------------|---------------- |
| 87 | + k-means clustering | `<kmeans>` | `<centroid>` |
| 88 | + k-means clustering with medoid representation | `<kmeans>` | `<medoid>` |
| 89 | + k-medoids clustering (partitional) | `<kmedoids>` | `<medoid>` |
| 90 | + k-medoids clustering (exact) [requires Gurobi] | `<kmedoids_exact>`| `<medoid>` |
| 91 | + hierarchical clustering with centroid representation| `<hierarchical>` | `<centroid>` |
| 92 | + hierarchical clustering with medoid representation | `<hierarchical>` | `<medoid>` |
| 93 | + |
| 94 | + ## Other input parameters |
| 95 | + |
| 96 | + The input parameter `n_clust` determines the number of clusters, i.e., representative periods. |
| 97 | + |
| 98 | + `n_init` determines the number of random starting points. As a rule of thumb: |
| 99 | + - use `n_init` of 1000 or 10000 for k-means or k-medoids |
| 100 | + - use `n_init` of 1 for k-medoids exact or hierarchical clustering |
| 101 | + |
| 102 | + `iterations` defaults to 300, which in our experience is a good value for k-means and k-medoids. It has no effect when you use k-medoids exact or hierarchical clustering. |
| 103 | + |
| 104 | + =# |
| 105 | + |
| 106 | + # A clustering run with different options chosen as an example |
| 107 | + ts_clust_result_2 = run_clust(ts_input_data; method="kmedoids", representation="medoid", n_init=100, n_clust=4, iterations=500) |
| 108 | + |
| 109 | + @test 0==0 |
| 110 | +end |
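
The period layout described in the "Load Time Series Data" comment (8760 hourly values cut into K=365 periods of T=24 time steps) can be sketched without the package. This is a minimal illustration only: the synthetic `hourly` vector is a hypothetical stand-in for an attribute such as `"solar-germany"`, and storing each attribute as a T-by-K array is an assumption here, not something the diff confirms about `ClustData.data`.

```julia
# Hypothetical stand-in for one hourly attribute: 8760 synthetic values,
# one per hour of a non-leap year (zero at night, positive during the day).
hourly = [max(0.0, sin(2pi * (h % 24 - 6) / 24)) for h in 0:8759]

T = 24                      # time steps per period (hours per day)
K = length(hourly) ÷ T      # number of periods (days)

# Assumed layout: column k holds the T hourly values of period k.
periods = reshape(hourly, T, K)

# Before clustering, every period represents only itself, so each absolute
# weight is 1 and the weights sum to the number of days.
weights = ones(K)

println("size(periods) = ", size(periods))
println("sum(weights)  = ", sum(weights))
```

After clustering, the same bookkeeping would apply with fewer columns: `n_clust` representative periods whose absolute weights still sum to 365.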