carpentries-incubator
diff --git a/‎_episodes/01-create-new-environment.md‎
Lines changed: 43 additions & 1 deletion b/‎_episodes/01-create-new-environment.md‎
diff --git a/‎_episodes/02-data-wrangling.md‎
Lines changed: 23 additions & 1 deletion b/‎_episodes/02-data-wrangling.md‎
diff --git a/‎_episodes/03-create-visualizations.md‎
Lines changed: 12 additions & 2 deletions b/‎_episodes/03-create-visualizations.md‎
diff --git a/‎_episodes/04-create-streamlit-app.md‎
Lines changed: 56 additions & 4 deletions b/‎_episodes/04-create-streamlit-app.md‎
diff --git a/‎_episodes/05-refactoring-for-flexibility.md‎
Lines changed: 46 additions & 3 deletions b/‎_episodes/05-refactoring-for-flexibility.md‎
@@ -20,17 +20,35 @@ This workshop utilizes some Python packages (such as Plotly) that cannot be inst
 * plotly-geo
 * jupyterlab
 
+### A note about Anaconda
+
+![XKCD 1987: Python Environment](../fig/xkcd_python_environment.png)
+
+Python can live in many different places on your computer, and each installation may have different packages already installed. 
+By creating our own conda environment and explicitly using only that environment, we avoid package conflicts
+and know exactly which environment is running our Python code. We also avoid the mess shown in the comic above!
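
To make "know exactly which environment is being used" concrete, here is a minimal sketch that reports where the running interpreter lives. The helper name `describe_python` is hypothetical. `CONDA_DEFAULT_ENV` is an environment variable that conda sets when an environment is activated, so it may be missing if Python was started some other way.

```python
import os
import sys


def describe_python():
    """Return where the running interpreter lives and which conda env is active."""
    return {
        "executable": sys.executable,  # full path to the python binary in use
        "prefix": sys.prefix,  # root directory of the installation or environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if conda did not set it
    }


info = describe_python()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the `dataviz` environment is active, the executable path would be expected to sit inside that environment's directory rather than in a system-wide Python location.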
+
 ## Create an environment from the `environment.yml` file
 
 The necessary packages are specified in the `environment.yml` file. 
 Open your terminal, and navigate to the project directory. Then, take a look at the contents.
 
 ~~~
 cd ~/Desktop/data_viz_workshop
-ls
+ls -F
 ~~~
 {: .language-bash}
 
+> ## ls -F
+> For a refresher on bash commands, refer to the [Unix Shell](http://swcarpentry.github.io/shell-novice/) lesson. 
+> `ls` lists the contents of a directory, and the `-F` flag will add a `/` to directories to more clearly distinguish between directories and files.
+{: .callout}
+
+~~~
+Data/    environment.yml
+~~~
+{: .output}
+
 You should now see an `environment.yml` file and a `Data` directory.
 
 Make sure that conda is working on your machine. You can verify this with: 
@@ -40,6 +58,16 @@ conda env list
 ~~~
 {: .language-bash}
 
+~~~
+# conda environments:
+#
+base                  *   /opt/anaconda3
+
+# other environments you have already created will be listed here.
+# the * indicates the currently active environment
+~~~
+{: .output}
+
 This will list all of your conda environments. You should make sure that you do not already have an environment called `dataviz`, or it will be overwritten. If you do already have an environment called `dataviz`, you can change the environment name by editing the first line in the `environment.yml` file.
 
 Now, you need to create a new environment using this `environment.yml` file. To do this, type in the command line:
@@ -59,6 +87,20 @@ conda list
 ~~~
 {: .language-bash}
 
+~~~
+# packages in environment at /opt/anaconda3/envs/dataviz:
+#
+# Name                    Version                   Build  Channel
+abseil-cpp                20210324.2           he49afe7_0    conda-forge
+altair                    4.1.0                      py_1    conda-forge
+anyio                     3.3.0            py39h6e9494a_0    conda-forge
+...
+zipp                      3.5.0              pyhd8ed1ab_0    conda-forge
+zlib                      1.2.11            h7795811_1010    conda-forge
+zstd                      1.5.0                h582d3a0_0    conda-forge
+~~~
+{: .output}
+
 Now we will need to tell Jupyter that this environment exists and should be made available as a kernel in Jupyter Lab.
 
 ~~~
 
@@ -30,6 +30,8 @@ We are going take this very wide dataset and make it very long, so the unit of o
 Let's go ahead and get started by opening a Jupyter Notebook with the `dataviz` kernel. If you navigated to the `Data` folder to look at the CSV file, navigate back to the root before opening the new notebook. 
 We are also going to rename this new notebook to `data_wrangling.ipynb`.
 
+![Jupyter Lab - Notebooks - dataviz kernel](../fig/jupyter_lab_dataviz_notebook.png)
+
 Jupyter Notebooks are very handy because we can combine documentation (markdown cells) with our program (code cells) in a reader-friendly way.
 Let's make our first cell into a markdown cell, and give this notebook a title:
 
@@ -84,7 +86,7 @@ cols
 Now, we can call `pd.melt()` and pass `cols` rather than typing out the whole list.
 
 ~~~
-df_melted = pd.melt(df, id_vars=['country', 'continent'], value_vars = cols
+df_melted = pd.melt(df, id_vars=['country', 'continent'], value_vars = cols)
 df_melted
 ~~~
 {: .language-python}
@@ -107,6 +109,25 @@ df_melted
 ~~~
 {: .language-python}
 
+Take a moment to compare this dataframe to the one we started with. What are some advantages to having the data in this format?
+
+> ## Tidy Data
+> The term "tidy data" may be most popular in the R ecosystem (the "tidyverse" is a collection of R packages designed around the tidy data philosophy), but it is applicable to all tabular datasets, no matter what programming language you are using to wrangle your data.
+> You can read more about the tidy data philosophy in Hadley Wickham's 2014 paper, "Tidy Data", available [here](https://vita.had.co.nz/papers/tidy-data.pdf).
+>
+> Tidy data follows 3 rules:
+> 1. Each variable forms a column
+> 2. Each observation forms a row
+> 3. Each type of observational unit forms a table
+>
+> Wickham later refined and revised the tidy data philosophy, and published it in the 12th chapter of his open access textbook "R for Data Science" - available [here](https://r4ds.had.co.nz/tidy-data.html). 
+>
+> The revised rules are:
+> 1. Each variable must have its own column
+> 2. Each observation must have its own row
+> 3. Each value must have its own cell
+{: .callout}
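
As a small illustration of the wide-to-long reshape behind these rules, the sketch below melts a toy table. The dataframe, its column names, and its values are made up for illustration and are not the workshop data.

```python
import pandas as pd

# Toy wide table (made-up values): one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "continent": ["X", "X"],
    "pop_2002": [10, 20],
    "pop_2007": [30, 40],
})

# Every column that is not an identifier holds measurements.
value_cols = [c for c in wide.columns if c not in ("country", "continent")]

# After melting, each row is a single observation: one country, one variable, one value.
tidy = pd.melt(wide, id_vars=["country", "continent"], value_vars=value_cols)
print(tidy)
```

The two measurement columns become two new columns, `variable` and `value`, so adding another year to the data adds rows rather than columns.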
+
 ## Saving the final dataframe
 
 Now that all of our columns contain the appropriate information, in a tidy/long format, it's time to save our dataframe back to a CSV file. But first, we're going to re-order our columns (and remove the now extra `variable` column) and sort the rows.
@@ -125,6 +146,7 @@ df_final.to_csv("Data/gapminder_tidy.csv", index=False)
 ~~~
 {: .language-python}
 
+We set `index=False` so that the dataframe's index is not written to the CSV file as an extra column.
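
To see what the `index` argument changes, this sketch writes a toy dataframe (made-up values) to a string instead of a file and compares the header rows. Calling `to_csv()` without a path returns the CSV text.

```python
import pandas as pd

# Toy dataframe with made-up values, standing in for df_final.
toy = pd.DataFrame({"country": ["A", "B"], "value": [1, 2]})

with_index = toy.to_csv()  # default: the index is written as an unnamed first column
without_index = toy.to_csv(index=False)  # only the real columns are written

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

The first header row starts with an empty field for the index, while the second contains only the named columns.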
 
 {% include links.md %}
 
@@ -22,6 +22,8 @@ Let's make our first cell into a markdown cell, and give this notebook a title:
 ~~~
 {: .source}
 
+Remember to also add some metadata and describe what this notebook does.
+
 ## Import our newly tidy data
 
 First, we need to import pandas and Plotly Express, and then read in our dataframe.
@@ -54,7 +56,7 @@ df.query("country=='New Zealand'")
 This will select all of the rows where `country` is "New Zealand". We can add our second condition by either chaining another `query()` function or specifying the additional condition in the same `query()` function.
 
 ~~~
-df.query("country=='New Zealand'").query("metric=='gdpPercap")
+df.query("country=='New Zealand'").query("metric=='gdpPercap'")
 df.query("country=='New Zealand' & metric=='gdpPercap'")
 ~~~
 {: .language-python}
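
The two forms select the same rows. A quick way to convince yourself is to compare them on a toy dataframe; the values below are made up for illustration.

```python
import pandas as pd

# Toy long-format data (made-up values).
toy = pd.DataFrame({
    "country": ["New Zealand", "New Zealand", "Australia"],
    "metric": ["gdpPercap", "pop", "gdpPercap"],
    "value": [1.0, 2.0, 3.0],
})

# Two chained calls, each applying one condition.
chained = toy.query("country=='New Zealand'").query("metric=='gdpPercap'")

# One call with both conditions joined by &.
combined = toy.query("country=='New Zealand' & metric=='gdpPercap'")

# equals() compares values, index, and column types.
print(chained.equals(combined))
```

Which form to use is mostly a matter of readability; the single call keeps the whole filter in one string.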
@@ -74,9 +76,12 @@ fig.show()
 ~~~
 {: .language-python}
 
+![Plot of New Zealand's GDP over time](../fig/L3_firstplot.png)
+
 There it is! Our first line plot.
 
-## When you want multiple lines
+
+## When you want to compare - adding more lines and labels
 
 By itself, this plot of New Zealand's GDP isn't especially interesting. Let's add another line, to compare it to Australia.
 
@@ -96,6 +101,9 @@ fig.show()
 ~~~
 {: .language-python}
 
+![Plot of Oceania's GDP over time](../fig/L3_secondplot.png)
+
+
 Great! This is already looking better. But we should fix that y-axis label and add a title.
 
 ~~~
@@ -105,6 +113,8 @@ fig.show()
 ~~~
 {: .language-python}
 
+![Plot of Oceania's GDP over time with correct labels](../fig/L3_thirdplot.png)
+
 You can go ahead and experiment with creating different plots for the different continents and metrics.
 
 > ## Interactivity is baked in to Plotly charts
 
@@ -19,11 +19,15 @@ keypoints:
 
 Now that our data and visualizations are prepped, it's finally time to create our Streamlit app.
 
-## Creating app.py
+## Creating and starting the app
 
 While you will usually create Jupyter Notebooks in Jupyter Lab, you can also create other file types and open a terminal there. We are going to use both of these capabilities.
 
-From the Launcher, click on "Text File" under "Other" (make sure you are currently in your project root directory, and not the `data` folder). This will open a new file. By default, this will be a text file, but you can change this. Go ahead and save this empty file as `app.py`. Then we can add some import statements, and save the file again.
+From the Launcher, click on "Text File" under "Other" (make sure you are currently in your project root directory, and not the `Data` folder). This will open a new file. 
+
+![Open a Text File](../fig/open_text_file.png)
+
+By default, this will be a text file, but you can change this. Go ahead and save this empty file as `app.py` ("File" > "Save Text As..." > "app.py"). Then we can add some import statements, and save the file again.
 
 ~~~
 import streamlit as st
@@ -33,9 +37,55 @@ import plotly.express as px
 {: .language-python}
 
 Next, go back to the Launcher and click on "Terminal" under "Other". This will launch a terminal window within Jupyter Lab. 
+
+![Open a Terminal](../fig/open_terminal.png)
+
 If you type `pwd` and press Enter, you will see that you are currently in your project root. 
+
+~~~
+pwd
+~~~
+{: .language-bash}
+
+~~~
+/Users/<you>/Desktop/data_viz_workshop
+~~~
+{: .output}
+
 If you type `ls` and press Enter, you will see all of your files and directories. 
-Make sure that you see `app.py`. We can also see what environment we are currently in with `conda env list`. There should be a * next to `dataviz`. If not, go ahead and type `conda activate dataviz`. 
+
+~~~
+ls
+~~~
+{: .language-bash}
+
+~~~
+Data                      app.py                    data_visualizations.ipynb data_wrangling.ipynb      environment.yml
+~~~
+{: .output}
+
+Make sure that you see `app.py`. We can also see what environment we are currently in with `conda env list`. There should be a * next to `dataviz`. 
+
+~~~
+conda env list
+~~~
+{: .language-bash}
+
+~~~
+# conda environments:
+#
+base                     /opt/anaconda3
+dataviz               *  /opt/anaconda3/envs/dataviz
+~~~
+{: .output}
+
+If not, go ahead and type `conda activate dataviz`. 
+
+~~~
+conda activate dataviz
+~~~
+{: .language-bash}
+
 Now that we know we are in the right place and have the right environment (the one with streamlit installed), we are going to start the Streamlit app.
 
 ~~~
@@ -98,14 +148,16 @@ df_gdp_o = df.query("continent=='Oceania' & metric=='gdpPercap'")
 
 title = "GDP for countries in Oceania"
 fig = px.line(df_gdp_o, x = "year", y = "value", color = "country", title = title, labels={"value": "GDP Percap"})
-st.plotly_chart(fig)
+st.plotly_chart(fig, use_container_width=True)
 ~~~
 {: .language-python}
 
 You know the drill! Save, switch over to the Streamlit app, and click "Rerun".
 
 We now have a web application that can allow you to share your interactive visualizations.
 
+![Streamlit app after this lesson](../fig/streamlit_app_lesson4fin.png)
+
 > ## Share your app online
 > Right now, our app only lives on our computer. Like Jupyter Lab, the app is displaying in a web browser but has the URL `localhost:####` (where #### represents the port number).
 > To easily make this app public and shared online, you can sign up for a [Streamlit Sharing](https://streamlit.io/sharing-sign-up) account. This will let you share up to 3 apps. 
 
@@ -65,6 +65,12 @@ print(new_query)
 ~~~
 {: .language-python}
 
+~~~
+continent=='Oceania' & metric=='gdpPercap'
+continent=='Oceania' & metric=='gdpPercap'
+~~~
+{: .output}
+
 Notice how the two strings are identical?
 
 The `new_query` variable is more flexible, because we can redefine the `continent` and `metric` variables. Go ahead and try it!
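
One way to package this idea is a small function that builds the query string from its two inputs. This is only a sketch; `build_query` is a hypothetical helper name and is not part of the lesson's code.

```python
def build_query(continent, metric):
    """Build the filter string that would be passed to DataFrame.query()."""
    return f"continent=='{continent}' & metric=='{metric}'"


# The same template produces a different filter for each pair of values.
print(build_query("Oceania", "gdpPercap"))
print(build_query("Europe", "pop"))
```

Because the continent and metric are now plain arguments, anything that can supply two strings (a widget, for example) can drive the filter.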
@@ -79,6 +85,11 @@ print(query)
 ~~~
 {: .language-python}
 
+~~~
+continent=='Europe' & metric=='pop'
+~~~
+{: .output}
+
 It's important to isolate these `continent` and `metric` values because we can adjust them with our widgets.
 
 Let's go ahead and try incorporating this into our plot (still in the Jupyter Notebook)
@@ -96,17 +107,30 @@ fig.show()
 ~~~
 {: .language-python}
 
-Do you notice any other places that we need to incorporate f-Strings? The title and axis lables!
+![Plot of Europe's population over time with wrong labels](../fig/L5_firstplot.png)
+
+Something about this plot is off... do you notice any other places where we need to incorporate f-strings? 
+
+The title and axis labels!
 
 ~~~
 continent = "Europe"
 metric = "pop"
 
 title = f"{metric} for countries in {continent}"
 labels = {"value": f"{metric}"}
+
+print(title)
+print(labels["value"])
 ~~~
 {: .language-python}
 
+~~~
+pop for countries in Europe
+pop
+~~~
+{: .output}
+
 Let's show that plot again, with our updated code:
 
 ~~~
@@ -122,7 +146,9 @@ fig.show()
 ~~~
 {: .language-python}
 
-There's just one more thing to tweak. "gdpPercap", "lifeExp", and "pop" aren't the prettiest labels. Let's map them to more display-friendly labels with a dictionary. Then we can call on this dictionary within our f-strings
+![Plot of Europe's population over time with correct labels](../fig/L5_secondplot.png)
+
+There's just one more thing to tweak. "gdpPercap", "lifeExp", and "pop" aren't the prettiest labels. Let's map them to more display-friendly labels with a dictionary. Then we can call on this dictionary within our f-strings:
 
 ~~~
 metric_labels = {"gdpPercap": "GDP Per Capita", "lifeExp": "Average Life Expectancy", "pop": "Population"}
@@ -146,23 +172,39 @@ fig.show()
 ~~~
 {: .language-python}
 
+![Plot of Europe's population over time with better labels](../fig/L5_thirdplot.png)
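
The lookup-then-format pattern can be sketched on its own, without a plot. The helper name `make_title_and_labels` is hypothetical, and falling back to the raw metric name for unmapped metrics is an assumption of this sketch rather than something the lesson specifies.

```python
metric_labels = {
    "gdpPercap": "GDP Per Capita",
    "lifeExp": "Average Life Expectancy",
    "pop": "Population",
}


def make_title_and_labels(continent, metric):
    """Return a plot title and a labels dict using display-friendly metric names."""
    # Assumption: if a metric has no friendly label, use its raw name instead.
    label = metric_labels.get(metric, metric)
    title = f"{label} for countries in {continent}"
    return title, {"value": label}


title, labels = make_title_and_labels("Europe", "pop")
print(title)
print(labels)
```

Using `.get()` with a default avoids a `KeyError` if a new metric is added to the data before the dictionary is updated.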
+
 ## Getting lists of possible values
 
 There is one last step before we can be ready to create our widgets. We need a list of all continents and all metrics, so that users can select from valid options. To do this, we will use pandas' `unique()` function.
 
 ~~~
-df['continents'].unique()
+df['continent'].unique()
 ~~~
 {: .language-python}
 
+~~~
+array(['Africa', 'Americas', 'Asia', 'Europe', 'Oceania'], dtype=object)
+~~~
+{: .output}
+
 See how we get every possible value in the `continent` column exactly once? Let's define this as a list, assign it to a variable, and do the same thing for `metric`.
 
 ~~~
 continent_list = list(df['continent'].unique())
 metric_list = list(df['metric'].unique())
+
+print(continent_list)
+print(metric_list)
 ~~~
 {: .language-python}
 
+~~~
+['Africa', 'Americas', 'Asia', 'Europe', 'Oceania']
+['gdpPercap', 'lifeExp', 'pop']
+~~~
+{: .output}
+
 These lists will be used when defining our widgets.
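
For intuition, `unique()` returns each value once, in order of first appearance. The plain-Python sketch below mimics that behaviour on a made-up list. It is not how pandas implements it, and `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(values):
    """Return each distinct value once, in order of first appearance."""
    # dict keys are unique and keep insertion order, so duplicates collapse.
    return list(dict.fromkeys(values))


# Made-up column values, standing in for df['continent'].
continent_column = ["Asia", "Europe", "Asia", "Africa", "Europe"]
continent_list = unique_in_order(continent_column)
print(continent_list)
```

If the options should appear alphabetically regardless of row order, wrapping the result in `sorted()` is one option.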
 
 ## Update app.py with our refactored code
@@ -196,6 +238,7 @@ st.plotly_chart(fig, use_container_width=True)
 ~~~
 {: .language-python}
 
+![Streamlit app after this lesson](../fig/streamlit_app_lesson5fin.png)
 
 {% include links.md %}