These are my personal study notes on data visualization with Python. The content is fairly basic; you are welcome to look through it and share your feedback. Thanks.
Week 2
Basic Visualization Tools
Area Plots
- Also known as an area chart or area graph
- Represents cumulative totals using numbers or percentages
- Based on the line plot
Say this is our dataset:
Sort the dataframe by the Total column:
df_can.sort_values(['Total'], ascending=False, axis=0, inplace=True)
Next, let's see how to get a dataframe like the following:
To get there, we keep only the year columns of the top rows and transpose the result:
years = list(map(str, range(1980, 2014)))
df_can.sort_values(['Total'], ascending = False, axis = 0, inplace = True)
df_top5 = df_can.head()
df_top5 = df_top5[years].transpose()
Plot:
df_top5.plot(kind='area')  # we have seen kind='line' before
plt.title("...")
plt.ylabel("...")
plt.xlabel("...")
plt.show()
Output:
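To try the sort -> head -> transpose -> plot steps without the real dataset, here is a minimal sketch. The df_demo table, its country labels A/B/C, and all of its numbers are made up for illustration; only the pandas calls mirror the ones used above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df_can: countries as the index, one column per year.
years = list(map(str, range(1980, 1984)))
df_demo = pd.DataFrame(
    {
        "1980": [10, 40, 25],
        "1981": [12, 38, 30],
        "1982": [15, 35, 33],
        "1983": [18, 30, 40],
    },
    index=["A", "B", "C"],
)
df_demo["Total"] = df_demo[years].sum(axis=1)

# Sort by Total (largest first), keep the top 2 rows, then transpose so that
# years become the index (x-axis) and each country becomes a column (one area).
df_top = df_demo.sort_values("Total", ascending=False).head(2)
df_top = df_top[years].transpose()

df_top.plot(kind="area")  # areas are stacked by default
plt.title("Demo area plot")
plt.xlabel("Year")
plt.ylabel("Count")
```

After the transpose, each year is a row and each country is a column, which is the shape plot(kind='area') works with. Call plt.show() afterwards to display the figure.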
Histograms
A histogram shows the frequency distribution of a numeric variable.
Again, this is our dataset: (each row stands for a country and its relevant immigration info to Canada)
Say we are interested in the immigration situation in 2013:
df_can['2013'].plot(kind='hist')
plt.title("...")
plt.ylabel("...")
plt.xlabel("...")
plt.show()
Output:
Note: the bins are not aligned with the tick marks on the horizontal axis.
To fix this, use the bin edges as the x-axis ticks:
import numpy as np
count, bin_edges = np.histogram(df_can['2013']) # divide/partition into 10 bins and save the frequency into `count`, and the bin edges into `bin_edges`
df_can['2013'].plot(kind='hist', xticks = bin_edges) # use the bin edges as the x-axis ticks
plt.title("...")
... ...
plt.show()
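To see what np.histogram actually returns, here is a small sketch on made-up numbers (the values array is an assumption for illustration, not the immigration data). With the default of 10 bins there is always one more bin edge than there are bins.

```python
import numpy as np

# Made-up sample: the integers 0..10
values = np.arange(11)

# Default is 10 equal-width bins spanning min(values)..max(values)
count, bin_edges = np.histogram(values)

print(count)      # number of values falling into each bin
print(bin_edges)  # 11 edges delimiting the 10 bins
```

Passing these bin_edges as xticks is what lines the tick marks up with the bar boundaries.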
Bar Charts
Unlike a histogram, a bar chart compares the values of a variable across categories or points in time. E.g., immigration from Iceland to Canada in each year from 1980 to 2013.
Again, this is our dataset:
So how do we draw the figure above?
years = list(map(str, range(1980, 2014)))
df_iceland = df_can.loc['Iceland', years]  # label-based selection; df.iloc[row, col] is the position-based version
df_iceland.plot(kind = 'bar')
plt.title("...")
...
plt.show()
Specialized Visualization Tools
Pie Charts
Our dataset: (Country is the index rather than a column, and the Total column is derived by taking a sum.)
We can use df = df.set_index('Col1') to make a column the index, and then df.index.names = [None] to remove the index name. For the table below, this removes the word Country and the otherwise blank row it sits on.
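A minimal sketch of those two index steps, on a tiny made-up table (the column names and numbers here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical two-country table with Country as an ordinary column
df = pd.DataFrame({"Country": ["A", "B"], "1980": [1, 2], "1981": [3, 4]})

# Make Country the index instead of a column
df = df.set_index("Country")

# Derived column: row-wise sum over the year columns
df["Total"] = df[["1980", "1981"]].sum(axis=1)

# Drop the index name so the header row no longer shows "Country"
df.index.names = [None]

print(df)
```

After this, rows can be selected by label, e.g. df.loc['A'].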
We can use the following code to generate the table below:
df_continents = df_can.groupby('Continent', axis=0).sum()
Then plot:
df_continents['Total'].plot(kind='pie')
plt.title('...')
plt.show()
Output:
Note: in many cases a pie chart is a poorer choice than a bar chart, even though it may look more attractive.
Box plots
Say we want a box plot of immigration from Japan:
df_japan = df_can.loc['Japan', years].transpose()
df_japan.plot(kind='box')
...
plt.show()
Output:
Scatter Plots
Our data:
Say we want a scatter plot of total from 1980 to 2013:
# we need to specify the x and y axis variables
df_total.plot(
    kind='scatter',
    x='year',
    y='total',
)
plt.title("...")
... ...
plt.show()
Output:
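These notes do not show how df_total was built. One way it could be derived, assuming it holds the sum over all countries for each year, is sketched below. The df_demo table and its numbers are made up; the column names year and total match the plot call above.

```python
import pandas as pd

# Hypothetical stand-in for df_can: two countries, four year columns
years = list(map(str, range(1980, 1984)))
df_demo = pd.DataFrame(
    [[1, 2, 3, 4], [10, 20, 30, 40]],
    index=["A", "B"],
    columns=years,
)

# Sum down each year column, then turn the resulting Series into a
# two-column dataframe: one column for the year, one for the total.
df_total = df_demo[years].sum(axis=0).reset_index()
df_total.columns = ["year", "total"]

# The year labels are strings; convert them so the x-axis is numeric
df_total["year"] = df_total["year"].astype(int)

df_total.plot(kind="scatter", x="year", y="total")
```

The scatter plot takes x and y as column names, which is why the totals are reshaped into a dataframe with explicit year and total columns.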
Week 3
Advanced Visualization Tools
Waffle Charts
Not supported by matplotlib. Check PyWaffle.
Word Clouds
Not supported by matplotlib.
Seaborn and Regression Plots
- Based on Matplotlib
- Efficient: about 20 lines of Matplotlib code can be replaced by roughly five times fewer lines of seaborn
Let’s look at regression plots. Say this is our data:
Using regplot:
import seaborn as sns
ax = sns.regplot(x='year', y='total', data=df_tot)
Output:
We can change the color and marker by passing color='green', marker='+' to regplot().
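By default regplot draws the scatter points plus a fitted linear regression line. To see the numbers behind such a line, here is a sketch that fits a straight line with np.polyfit on made-up, exactly linear data. This is a swapped-in NumPy calculation for illustration, not seaborn's own code path.

```python
import numpy as np

# Made-up data: total grows by exactly 20 per year
year = np.array([1980, 1981, 1982, 1983, 1984], dtype=float)
total = np.array([100.0, 120.0, 140.0, 160.0, 180.0])

# Least-squares fit of a degree-1 polynomial: total ~ slope * year + intercept
slope, intercept = np.polyfit(year, total, deg=1)

# Use the fitted line to extrapolate one year ahead
predicted_1985 = slope * 1985 + intercept
print(slope, intercept, predicted_1985)
```

With real data the points will not sit exactly on the line, which is what the shaded band around the regplot line is meant to convey.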
Visualizing Geospatial Data
Intro to Folium
- Creates several types of Leaflet maps
- Binds data to a map for choropleth visualizations, and can pass visualizations as markers on the map
- Has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox API keys
Say we want to center the map on Canada: (Try changing the zoom_start value.)
Change the map style by passing tiles=:
Maps with Markers
We have seen how to create a map centered on Canada:
import folium
canada_map = folium.Map(location=[56.130, -106.35], zoom_start=4)
canada_map
If we want to add something new to the map, we first need to create a feature group. At first this group is empty.
# create a feature group
ontario = folium.map.FeatureGroup()
Then we add a styled marker to this feature group using add_child():
# style the feature group
ontario.add_child(
    folium.features.CircleMarker(
        [51.25, -85.32], radius = 5,  # longitude is negative (west), matching the Marker below
        color = 'red', fill_color = 'Red'
    )
)
Now that the feature group contains a marker, let's add the feature group to the map:
canada_map.add_child(ontario)
Output:
To label the marker, add a Marker with a popup:
folium.Marker([51.25, -85.32], popup='Ontario').add_to(canada_map)
Output:
Putting it all together:
canada_map = folium.Map(location=[56.130, -106.35], zoom_start=4)
# create a feature group
ontario = folium.map.FeatureGroup()
# style the feature group
ontario.add_child(
    folium.features.CircleMarker(
        [51.25, -85.32], radius = 5,  # longitude is negative (west), matching the Marker below
        color = 'red', fill_color = 'Red'
    )
)
# add the feature group to the map
canada_map.add_child(ontario)
# label the marker
folium.Marker([51.25, -85.32], popup='Ontario').add_to(canada_map)
# generate the map
canada_map
Choropleth Maps
Geojson File
Say this is our data:
Let’s create a world map:
Then add our data:
Week 4
Creating Dashboards with Plotly and Dash
Overview
- How a dashboard can be used to answer critical business questions.
- A high-level overview of popular dashboarding tools available in Python.
- How to use basic Plotly, plotly.graph_objects, and Plotly Express.
- How to use Dash, with a basic overview of Dash components (core and HTML).
- How to add different elements (text boxes, dropdowns, graphs, etc.) to the dashboard.
- How to add interactivity to Dash core and HTML components.
Dashboarding Overview
As a data scientist, you can use dashboards to tell your story.
Web-based Dashboarding
- Dash from Plotly
- Panel
- Voila
- Streamlit
Dashboard Tools
Here is a link: Python dashboarding tools.
Intro to Plotly
- Interactive, open-source
- Supports 40+ chart types
- Includes statistical, financial, map, scientific, and 3-dimensional chart types
- Can be displayed in a Jupyter notebook, saved to HTML, or used in Python-built web applications
Plotly sub-modules
Plotly graph objects (plotly.graph_objects, a low-level interface to figures, traces and layout; the main class is plotly.graph_objects.Figure) and Plotly Express (a high-level wrapper).
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
# using numpy to generate samples
x = np.arange(12)
y = np.random.randint(50, 500, size=12)
Then use the following lines to create the figure.
Note: a plotly.graph_objects figure is a JSON-like object with a dict structure. Here, go is the alias for the plotly.graph_objects module. A chart is drawn by setting or updating the values of the figure's keywords.
fig = go.Figure(data=go.Scatter(x=x, y=y))
fig.update_layout(title='...', xaxis_title='Month', yaxis_title='Sales')
fig.show()
Output:
We can also use plotly.express to create the figure:
fig = px.line(x=x, y=y, title='...', labels=dict(x='Month', y='Sales'))
fig.show()
Side Notes
df.sample(n= , random_state= ): randomly select n rows (observations) from the dataframe using the random seed random_state
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()
Explanation: divide data into groups based on Month -> extract ArrDelay from each group -> take the mean -> generate the index 0, 1, 2, 3, ... for the rows
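A small sketch of both side notes on made-up data (the five rows below are an assumption for illustration; only the column names Month and ArrDelay come from the notes):

```python
import pandas as pd

# Hypothetical flight data: two flights in month 1, three in month 2
data = pd.DataFrame(
    {"Month": [1, 1, 2, 2, 2], "ArrDelay": [10.0, 20.0, 0.0, 3.0, 6.0]}
)

# Group by Month, average ArrDelay within each group, then reset the index
# so Month is an ordinary column again and rows are numbered 0, 1, ...
line_data = data.groupby("Month")["ArrDelay"].mean().reset_index()
print(line_data)

# Draw 3 rows at random; the same random_state gives the same rows every run
sampled = data.sample(n=3, random_state=42)
print(sampled)
```

Without reset_index(), Month would stay as the index of the result rather than a column, which is less convenient when passing line_data to a plotting function that expects column names.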
Intro to Dash
Dash is:
- an open-source Python UI library from Plotly
- makes it easy to build GUIs
- declarative and reactive
- rendered in web browser and can be deployed to servers
- inherently cross-platform and mobile ready
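Two sets of components: Core and HTML
from dash import dcc   # Dash 2.0+; older releases: import dash_core_components as dcc
from dash import html  # Dash 2.0+; older releases: import dash_html_components as html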
For html: layout, keyword arguments …
Then we create the layout -> add components to it.
Core components are higher-level components that are interactive and are generated with JavaScript, HTML, and CSS through the React.js library.
Eg,
Make dashboards interactive
A callback is a Python function that is automatically called by Dash whenever an input component's property changes. It is registered with the @app.callback decorator:
@app.callback(Output, Input)
def callback_function(input_value):
    ... ...
    ... ...
    ... ...
    return some_result
Eg, callback with one input:
Output: (The Input value can be changed.)
Eg, callback with two inputs:
Output: