
Commit 4e7ac0c

committed
workshop tested and enhanced
Added output and figures to lessons
Tested the code in all lessons, corrected typos
Added actual code to code folder
Added detail to some lessons, like the tidy data callout
1 parent c41c828 commit 4e7ac0c

24 files changed

+12261
-16
lines changed

_episodes/01-create-new-environment.md

Lines changed: 43 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,17 +20,35 @@ This workshop utilizes some Python packages (such as Plotly) that cannot be inst
2020
* plotly-geo
2121
* jupyterlab
2222

23+
### A note about anaconda
24+
25+
![XKCD 1987: Python Environment](../fig/xkcd_python_environment.png)
26+
27+
Python can live in many different places on your computer, and each source may have different packages already installed.
28+
By using an Anaconda environment that we create, and by explicitly using only that environment, we can avoid conflicts
29+
and know exactly which environment is being used to run our Python code. We also avoid the mess shown in the comic above!
30+
2331
## Create an environment from the `environment.yml` file
2432

2533
The necessary packages are specified in the `environment.yml` file.
2634
Open your terminal, and navigate to the project directory. Then, take a look at the contents.
2735

2836
~~~
2937
cd ~/Desktop/data_viz_workshop
30-
ls
38+
ls -F
3139
~~~
3240
{: .language-bash}
3341

42+
> ## ls -F
43+
> For a refresher on bash commands, refer to the [Unix Shell](http://swcarpentry.github.io/shell-novice/) lesson.
44+
> `ls` lists the contents of a directory, and the `-F` flag will add a `/` to directories to more clearly distinguish between directories and files.
45+
{: .callout}
46+
47+
~~~
48+
Data/ environment.yml
49+
~~~
50+
{: .output}
51+
3452
You should now see an `environment.yml` file and a `Data` directory.
3553

3654
Make sure that conda is working on your machine. You can verify this with:
@@ -40,6 +58,16 @@ conda env list
4058
~~~
4159
{: .language-bash}
4260

61+
~~~
62+
# conda environments:
63+
#
64+
base * /opt/anaconda3
65+
66+
# other environments you have already created will be listed here.
67+
# the * indicates the currently active environment
68+
~~~
69+
{: .output}
70+
4371
This will list all of your conda environments. You should make sure that you do not already have an environment called `dataviz`, or it will be overwritten. If you do already have an environment called `dataviz`, you can change the environment name by editing the first line in the `environment.yml` file.
4472

4573
Now, you need to create a new environment using this `environment.yml` file. To do this, type in the command line:
@@ -59,6 +87,20 @@ conda list
5987
~~~
6088
{: .language-bash}
6189

90+
~~~
91+
# packages in environment at /opt/anaconda3/envs/dataviz:
92+
#
93+
# Name Version Build Channel
94+
abseil-cpp 20210324.2 he49afe7_0 conda-forge
95+
altair 4.1.0 py_1 conda-forge
96+
anyio 3.3.0 py39h6e9494a_0 conda-forge
97+
...
98+
zipp 3.5.0 pyhd8ed1ab_0 conda-forge
99+
zlib 1.2.11 h7795811_1010 conda-forge
100+
zstd 1.5.0 h582d3a0_0 conda-forge
101+
~~~
102+
{: .output}
103+
62104
Now we will need to tell Jupyter that this environment exists and should be made available as a kernel in Jupyter Lab.
63105

64106
~~~

_episodes/02-data-wrangling.md

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,8 @@ We are going take this very wide dataset and make it very long, so the unit of o
3030
Let's go ahead and get started by opening a Jupyter Notebook with the `dataviz` kernel. If you navigated to the `Data` folder to look at the CSV file, navigate back to the root before opening the new notebook.
3131
We are also going to rename this new notebook to `data_wrangling.ipynb`.
3232

33+
![Jupyter Lab - Notebooks - dataviz kernel](../fig/jupyter_lab_dataviz_notebook.png)
34+
3335
Jupyter Notebooks are very handy because we can combine documentation (markdown cells) with our program (code cells) in a reader-friendly way.
3436
Let's make our first cell into a markdown cell, and give this notebook a title:
3537

@@ -84,7 +86,7 @@ cols
8486
Now, we can call `pd.melt()` and pass `cols` rather than typing out the whole list.
8587

8688
~~~
87-
df_melted = pd.melt(df, id_vars=['country', 'continent'], value_vars = cols
89+
df_melted = pd.melt(df, id_vars=['country', 'continent'], value_vars = cols)
8890
df_melted
8991
~~~
9092
{: .language-python}
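
To make the reshaping concrete, here is a minimal sketch of what `pd.melt()` does to a wide table. The miniature dataframe, its made-up values, and the way `cols` is built are assumptions for illustration only; the lesson's real dataframe has many more columns.

```python
import pandas as pd

# Hypothetical miniature of the wide table (values are made up):
# one column per metric/year pair.
df = pd.DataFrame({
    "country": ["Australia", "New Zealand"],
    "continent": ["Oceania", "Oceania"],
    "pop_1952": [10, 2],
    "pop_1957": [11, 3],
})

# Assumption: every column that is not an identifier should be melted.
cols = [c for c in df.columns if c not in ("country", "continent")]

# Each (row, value column) pair becomes its own row in the long table.
df_melted = pd.melt(df, id_vars=["country", "continent"], value_vars=cols)
print(df_melted)
```

The result has one row per country and original column, with the old column name stored in `variable` and its contents in `value`.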
@@ -107,6 +109,25 @@ df_melted
107109
~~~
108110
{: .language-python}
109111

112+
Take a moment to compare this dataframe to the one we started with. What are some advantages to having the data in this format?
113+
114+
> ## Tidy Data
115+
> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, no matter what programming language you are using to wrangle your data.
116+
> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
117+
>
118+
> Tidy data follows 3 rules:
119+
> 1. Each variable forms a column
120+
> 2. Each observation forms a row
121+
> 3. Each type of observational unit forms a table
122+
>
123+
> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html).
124+
>
125+
> The revised rules are:
126+
> 1. Each variable must have its own column
127+
> 2. Each observation must have its own row
128+
> 3. Each value must have its own cell
129+
{: .callout}
130+
110131
## Saving the final dataframe
111132

112133
Now that all of our columns contain the appropriate information, in a tidy/long format, it's time to save our dataframe back to a CSV file. But first, we're going to re-order our columns (and remove the now extra `variable` column) and sort the rows.
@@ -125,6 +146,7 @@ df_final.to_csv("Data/gapminder_tidy.csv", index=False)
125146
~~~
126147
{: .language-python}
127148

149+
We set `index=False` so that the dataframe's row index is not written to the CSV file as an extra column.
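
The effect of `index=False` is easiest to see by comparing the header line written with and without it. This is a small sketch using a made-up two-row dataframe and in-memory buffers instead of a real file; only `to_csv()` and its `index` parameter come from the lesson.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_final (values are made up).
df_final = pd.DataFrame({"country": ["Australia", "New Zealand"], "value": [1.0, 2.0]})

# Write to in-memory text buffers so nothing touches the disk.
with_index = io.StringIO()
df_final.to_csv(with_index)  # default: index=True

without_index = io.StringIO()
df_final.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]

print(repr(header_with))     # begins with an unnamed column for the index
print(repr(header_without))  # only the real columns
```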
128150

129151
{% include links.md %}
130152

_episodes/03-create-visualizations.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@ Let's make our first cell into a markdown cell, and give this notebook a title:
2222
~~~
2323
{: .source}
2424

25+
Remember to also add some metadata and describe what this notebook does.
26+
2527
## Import our newly tidy data
2628

2729
First, we need to import pandas and Plotly Express, and then read in our dataframe.
@@ -54,7 +56,7 @@ df.query("country=='New Zealand'")
5456
This will select all of the rows where `country` is "New Zealand". We can add our second condition by either chaining another `query()` function or specifying the additional condition in the same `query()` function.
5557

5658
~~~
57-
df.query("country=='New Zealand'").query("metric=='gdpPercap")
59+
df.query("country=='New Zealand'").query("metric=='gdpPercap'")
5860
df.query("country=='New Zealand' & metric=='gdpPercap'")
5961
~~~
6062
{: .language-python}
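
Both forms should select the same rows. A quick way to convince yourself is to compare them on a small table; the dataframe below is a made-up stand-in for the tidy data, not the lesson's actual file.

```python
import pandas as pd

# Hypothetical miniature of the tidy dataframe (values are made up).
df = pd.DataFrame({
    "country": ["New Zealand", "New Zealand", "Australia"],
    "metric": ["gdpPercap", "pop", "gdpPercap"],
    "value": [1.0, 2.0, 3.0],
})

# Two chained filters ...
chained = df.query("country=='New Zealand'").query("metric=='gdpPercap'")
# ... versus both conditions in a single query string.
combined = df.query("country=='New Zealand' & metric=='gdpPercap'")

print(chained.equals(combined))
```

The single-string form is the one reused later in the workshop, since it can be assembled from variables.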
@@ -74,9 +76,12 @@ fig.show()
7476
~~~
7577
{: .language-python}
7678

79+
![Plot of New Zealand's GDP over time](../fig/L3_firstplot.png)
80+
7781
There it is! Our first line plot.
7882

79-
## When you want multiple lines
83+
84+
## When you want to compare - adding more lines and labels
8085

8186
By itself, this plot of New Zealand's GDP isn't especially interesting. Let's add another line, to compare it to Australia.
8287

@@ -96,6 +101,9 @@ fig.show()
96101
~~~
97102
{: .language-python}
98103

104+
![Plot of Oceania's GDP over time](../fig/L3_secondplot.png)
105+
106+
99107
Great! This is already looking better. But we should fix that y-axis label and add a title.
100108

101109
~~~
@@ -105,6 +113,8 @@ fig.show()
105113
~~~
106114
{: .language-python}
107115

116+
![Plot of Oceania's GDP over time with correct labels](../fig/L3_thirdplot.png)
117+
108118
You can go ahead and experiment with creating different plots for the different continents and metrics.
109119

110120
> ## Interactivity is baked in to Plotly charts

_episodes/04-create-streamlit-app.md

Lines changed: 56 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -19,11 +19,15 @@ keypoints:
1919

2020
Now that our data and visualizations are prepped, it's finally time to create our Streamlit app.
2121

22-
## Creating app.py
22+
## Creating and starting the app
2323

2424
While you will usually create Jupyter Notebooks in Jupyter Lab, you can also create other file types and open a terminal. We are going to use both of these capabilities.
2525

26-
From the Launcher, click on "Text File" under "Other" (make sure you are currently in your project root directory, and not the `data` folder). This will open a new file. By default, this will be a text file, but you can change this. Go ahead and save this empty file as `app.py`. Then we can add some import statements, and save the file again.
26+
From the Launcher, click on "Text File" under "Other" (make sure you are currently in your project root directory, and not the `Data` folder). This will open a new file.
27+
28+
![Open a Text File](../fig/open_text_file.png)
29+
30+
By default, this will be a text file, but you can change this. Go ahead and save this empty file as `app.py` ("File" > "Save Text As..." > "app.py"). Then we can add some import statements, and save the file again.
2731

2832
~~~
2933
import streamlit as st
@@ -33,9 +37,55 @@ import plotly.express as px
3337
{: .language-python}
3438

3539
Next, go back to the Launcher and click on "Terminal" under "Other". This will launch a terminal window within Jupyter Lab.
40+
41+
![Open a Terminal](../fig/open_terminal.png)
42+
3643
If you type `pwd` and enter, you will see that you are currently in your project root.
44+
45+
~~~
46+
pwd
47+
~~~
48+
{: .language-bash}
49+
50+
~~~
51+
/Users/<you>/Desktop/data_viz_workshop
52+
~~~
53+
{: .output}
54+
3755
If you type `ls` and enter, you will see all of your files and directories.
38-
Make sure that you see `app.py`. We can also see what environment we are currently in with `conda env list`. There should be a * next to `dataviz`. If not, go ahead and type `conda activate dataviz`.
56+
57+
~~~
58+
ls
59+
~~~
60+
{: .language-bash}
61+
62+
~~~
63+
Data app.py data_visualizations.ipynb data_wrangling.ipynb environment.yml
64+
~~~
65+
{: .output}
66+
67+
Make sure that you see `app.py`. We can also see what environment we are currently in with `conda env list`. There should be a * next to `dataviz`.
68+
69+
~~~
70+
conda env list
71+
~~~
72+
{: .language-bash}
73+
74+
~~~
75+
# conda environments:
76+
#
77+
base /opt/anaconda3
78+
dataviz * /opt/anaconda3/envs/dataviz
79+
~~~
80+
{: .output}
81+
82+
If not, go ahead and type `conda activate dataviz`.
83+
84+
~~~
85+
conda activate dataviz
86+
~~~
87+
{: .language-bash}
88+
3989
Now that we know we are in the right place and have the right environment (the one with streamlit installed), we are going to start the Streamlit app.
4090

4191
~~~
@@ -98,14 +148,16 @@ df_gdp_o = df.query("continent=='Oceania' & metric=='gdpPercap'")
98148
99149
title = "GDP for countries in Oceania"
100150
fig = px.line(df_gdp_o, x = "year", y = "value", color = "country", title = title, labels={"value": "GDP Percap"})
101-
st.plotly_chart(fig)
151+
st.plotly_chart(fig, use_container_width=True)
102152
~~~
103153
{: .language-python}
104154

105155
You know the drill! Save, switch over to the Streamlit app, and click "Rerun".
106156

107157
We now have a web application that can allow you to share your interactive visualizations.
108158

159+
![Streamlit app after this lesson](../fig/streamlit_app_lesson4fin.png)
160+
109161
> ## Share your app online
110162
> Right now, our app only lives on our computer. Like Jupyter Lab, the app is displaying in a web browser but has the URL `localhost:####` (where #### represents the port number).
111163
> To easily make this app public and shared online, you can sign up for a [Streamlit Sharing](https://streamlit.io/sharing-sign-up) account. This will let you share up to 3 apps.

_episodes/05-refactoring-for-flexibility.md

Lines changed: 46 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,12 @@ print(new_query)
6565
~~~
6666
{: .language-python}
6767

68+
~~~
69+
continent=='Oceania' & metric=='gdpPercap'
70+
continent=='Oceania' & metric=='gdpPercap'
71+
~~~
72+
{: .output}
73+
6874
Notice how the two strings are identical?
6975

7076
The `new_query` variable is more flexible, because we can redefine the `continent` and `metric` variables. Go ahead and try it!
@@ -79,6 +85,11 @@ print(query)
7985
~~~
8086
{: .language-python}
8187

88+
~~~
89+
continent=='Europe' & metric=='pop'
90+
~~~
91+
{: .output}
92+
8293
It's important to isolate these `continent` and `metric` values because we can adjust them with our widgets.
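
One way to keep this logic in a single place is to wrap the f-string in a small helper. The function name `build_query` is hypothetical and not part of the lesson; this is only a sketch of the same f-string pattern.

```python
def build_query(continent, metric):
    """Return a pandas query string for one continent and one metric."""
    # The single quotes inside the f-string delimit the string values
    # within the query expression itself.
    return f"continent=='{continent}' & metric=='{metric}'"


print(build_query("Oceania", "gdpPercap"))
print(build_query("Europe", "pop"))
```

The returned string can be passed straight to `df.query()`, so changing the selection only means changing the two arguments.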
8394

8495
Let's go ahead and try incorporating this into our plot (still in the Jupyter Notebook)
@@ -96,17 +107,30 @@ fig.show()
96107
~~~
97108
{: .language-python}
98109

99-
Do you notice any other places that we need to incorporate f-Strings? The title and axis lables!
110+
![Plot of Europe's population over time with wrong labels](../fig/L5_firstplot.png)
111+
112+
Something about this plot is funky... do you notice any other places where we need to incorporate f-strings?
113+
114+
The title and axis labels!
100115

101116
~~~
102117
continent = "Europe"
103118
metric = "pop"
104119
105120
title = f"{metric} for countries in {continent}"
106121
labels = {"value": f"{metric}"}
122+
123+
print(title)
124+
print(labels["value"])
107125
~~~
108126
{: .language-python}
109127

128+
~~~
129+
pop for countries in Europe
130+
pop
131+
~~~
132+
{: .output}
133+
110134
Let's show that plot again, with our updated code:
111135

112136
~~~
@@ -122,7 +146,9 @@ fig.show()
122146
~~~
123147
{: .language-python}
124148

125-
There's just one more thing to tweak. "gdpPercap", "lifeExp", and "pop" aren't the prettiest labels. Let's map them to more display-friendly labels with a dictionary. Then we can call on this dictionary within our f-strings
149+
![Plot of Europe's population over time with correct labels](../fig/L5_secondplot.png)
150+
151+
There's just one more thing to tweak. "gdpPercap", "lifeExp", and "pop" aren't the prettiest labels. Let's map them to more display-friendly labels with a dictionary. Then we can call on this dictionary within our f-strings:
126152

127153
~~~
128154
metric_labels = {"gdpPercap": "GDP Per Capita", "lifeExp": "Average Life Expectancy", "pop": "Population"}
@@ -146,23 +172,39 @@ fig.show()
146172
~~~
147173
{: .language-python}
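
The dictionary lookup can also be made tolerant of metrics that have no display label yet. The sketch below reuses the `metric_labels` dictionary from the lesson; the helper `make_title` and the fallback behaviour via `dict.get()` are assumptions added for illustration.

```python
metric_labels = {
    "gdpPercap": "GDP Per Capita",
    "lifeExp": "Average Life Expectancy",
    "pop": "Population",
}


def make_title(metric, continent):
    """Build a plot title, falling back to the raw metric name if unmapped."""
    label = metric_labels.get(metric, metric)
    return f"{label} for countries in {continent}"


print(make_title("pop", "Europe"))
print(make_title("unmapped_metric", "Asia"))
```

With plain indexing (`metric_labels[metric]`), an unmapped metric would raise a `KeyError` instead; which behaviour is preferable depends on whether a missing label should be treated as an error.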
148174

175+
![Plot of Europe's population over time with better labels](../fig/L5_thirdplot.png)
176+
149177
## Getting lists of possible values
150178

151179
There is one last step before we can be ready to create our widgets. We need a list of all continents and all metrics, so that users can select from valid options. To do this, we will use pandas' `unique()` function.
152180

153181
~~~
154-
df['continents'].unique()
182+
df['continent'].unique()
155183
~~~
156184
{: .language-python}
157185

186+
~~~
187+
array(['Africa', 'Americas', 'Asia', 'Europe', 'Oceania'], dtype=object)
188+
~~~
189+
{: .output}
190+
158191
See how we get every possible value in the `continent` column exactly once? Let's define this as a list, assign it to a variable, and do the same thing for `metric`.
159192

160193
~~~
161194
continent_list = list(df['continent'].unique())
162195
metric_list = list(df['metric'].unique())
196+
197+
print(continent_list)
198+
print(metric_list)
163199
~~~
164200
{: .language-python}
165201

202+
~~~
203+
['Africa', 'Americas', 'Asia', 'Europe', 'Oceania']
204+
['gdpPercap', 'lifeExp', 'pop']
205+
~~~
206+
{: .output}
207+
166208
These lists will be used when defining our widgets.
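
Note that `unique()` returns values in order of first appearance, not alphabetically. The lesson's data happens to come out in alphabetical order; if yours does not, sorting the result is one option. The dataframe below is a made-up example, and sorting is a suggestion rather than something the lesson requires.

```python
import pandas as pd

# Hypothetical data where the continents are not in alphabetical order.
df = pd.DataFrame({
    "continent": ["Oceania", "Europe", "Oceania", "Africa"],
    "metric": ["pop", "pop", "gdpPercap", "pop"],
})

# unique() keeps order of first appearance ...
continent_list = list(df["continent"].unique())
# ... so sort explicitly if the widget options should be alphabetical.
continent_list_sorted = sorted(df["continent"].unique())

print(continent_list)
print(continent_list_sorted)
```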
167209

168210
## Update app.py with our refactored code
@@ -196,6 +238,7 @@ st.plotly_chart(fig, use_container_width=True)
196238
~~~
197239
{: .language-python}
198240

241+
![Streamlit app after this lesson](../fig/streamlit_app_lesson5fin.png)
199242

200243
{% include links.md %}
201244
