These are my personal study notes on data visualization with Python. The content is fairly basic; you are welcome to read through it and share your feedback. Thanks.

Week 2

Basic Visualization Tools

Area Plots

  • Also known as an area chart or area graph
  • Represents cumulative totals using numbers or percentages
  • Based on the line plot

Say this is our dataset:

image

Sort the dataframe using total:

df_can.sort_values(['Total'], ascending = False, axis = 0, inplace = True)

Next, let’s see how to get a dataframe like the following:

image

To get there, we select the year columns of the top five rows and transpose the result:

years = list(map(str, range(1980, 2014)))
df_can.sort_values(['Total'], ascending = False, axis = 0, inplace = True)
df_top5 = df_can.head()
df_top5 = df_top5[years].transpose()
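The steps above can be tried on a small made-up table. The country labels and numbers here are invented for illustration, and only three year columns are used instead of 1980 to 2013:

```python
import pandas as pd

# Made-up counts; the real df_can has one column per year from 1980 to 2013.
df_can = pd.DataFrame(
    {
        "1980": [10, 40, 30, 5, 20, 1],
        "1981": [12, 45, 28, 6, 22, 2],
        "1982": [15, 50, 25, 7, 24, 3],
    },
    index=["A", "B", "C", "D", "E", "F"],
)
years = ["1980", "1981", "1982"]
df_can["Total"] = df_can[years].sum(axis=1)

# Sort by Total (largest first), keep the top five rows, then transpose
df_can.sort_values(["Total"], ascending=False, axis=0, inplace=True)
df_top5 = df_can.head()
df_top5 = df_top5[years].transpose()

# Rows are now years and columns are the top five countries,
# which is the shape that plot(kind='area') expects.
print(df_top5)
```

After the transpose each country is a column, so each one becomes its own stacked area in the plot.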

Plot:

df_top5.plot(kind='area') # we have seen kind='line' before

plt.title("...")
plt.ylabel("...")
plt.xlabel("...")

plt.show()

Output:
image

Histograms

A histogram shows the frequency distribution of a numeric variable.

Again, this is our dataset: (each row stands for a country and its relevant immigration info to Canada)
image

Say we are interested in the immigration situation in 2013:

df_can['2013'].plot(kind='hist')

plt.title("...")
plt.ylabel("...")
plt.xlabel("...")

plt.show()

Output:
image

Note: the bins are not aligned with the tick marks on the horizontal axis.

To make the chart easier to read, use the bin edges as the tick marks:

import numpy as np

count, bin_edges = np.histogram(df_can['2013']) # divide/partition into 10 bins and save the frequency into `count`, and the bin edges into `bin_edges`

df_can['2013'].plot(kind='hist', xticks = bin_edges) # use the bin edges as the x-axis ticks

plt.title("...")
... ...

plt.show()
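To see what np.histogram actually returns, here is a small sketch with made-up values standing in for df_can['2013']:

```python
import numpy as np

# Made-up values standing in for df_can['2013']
values = np.array([0, 5, 12, 18, 25, 33, 47, 58, 76, 100])

# By default np.histogram splits the range [min, max] into 10 equal-width bins
count, bin_edges = np.histogram(values)

print(count)      # number of values that fall into each bin
print(bin_edges)  # 11 edges are needed to delimit 10 bins
```

Passing bin_edges as xticks puts a tick on every bin boundary, which is why the bars and ticks line up.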

Bar Charts

Unlike a histogram, a bar chart compares the values of a variable at given points in time, e.g. the immigration from Iceland to Canada in each year from 1980 to 2013.

image

Again, this is our dataset:

image

So how do we draw the figure above?

years = list(map(str, range(1980, 2014)))

df_iceland = df_can.loc['Iceland', years] # select by label; df.iloc[row, col] would select by position

df_iceland.plot(kind = 'bar')

plt.title("...")
...
plt.show()
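The difference between loc and iloc mentioned in the comment above can be checked on a tiny made-up table (the numbers are invented, and the row order is an assumption of this sketch):

```python
import pandas as pd

# Made-up table with the country as the index
df = pd.DataFrame(
    {"1980": [17, 701], "1981": [33, 756]},
    index=["Iceland", "Japan"],
)

# loc selects by label: the row named 'Iceland' and the listed columns
by_label = df.loc["Iceland", ["1980", "1981"]]

# iloc selects by position: first row, first and second columns
by_position = df.iloc[0, [0, 1]]

print(by_label.equals(by_position))  # True
```

Selecting by label is usually safer here, because it keeps working after the dataframe has been sorted.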

Specialized Visualization Tools

Pie Charts

Our dataset: (Country is the index rather than a column, and the Total column is derived by summing the yearly values.)

We can use df = df.set_index('Col1') to turn a column into the index, and then df.index.names = [None] to remove the index name. For the table below, this removes the word Country together with the otherwise blank row it sits in.

image
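A minimal sketch of those two index operations, using a made-up two-row table:

```python
import pandas as pd

# Made-up data; 'Country' starts out as an ordinary column
df = pd.DataFrame({"Country": ["Iceland", "Japan"], "Total": [50, 1457]})

# Use the Country column as the index
df = df.set_index("Country")
print(df.index.name)  # Country

# Remove the index name so the header row no longer shows 'Country'
df.index.names = [None]
print(df)
```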

The following code generates the table below:

df_continents = df_can.groupby('Continent', axis=0).sum()

image
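Here is the same grouping on a made-up three-row table. The axis argument is left out in this sketch because grouping along the rows is the default:

```python
import pandas as pd

# Made-up data: each row is a country with its continent and total
df_can = pd.DataFrame(
    {"Continent": ["Asia", "Europe", "Asia"], "Total": [100, 40, 60]},
    index=["Japan", "Iceland", "India"],
)

# One row per continent, with the numeric columns summed within each group
df_continents = df_can.groupby("Continent").sum()
print(df_continents)
```

The result is indexed by continent, so df_continents['Total'] gives one value per pie slice.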

Then plot:

df_continents['Total'].plot(kind='pie') 

plt.title('...')

plt.show()

Output:
image

Note: in many cases a bar chart is a better choice than a pie chart, even though pie charts may look more attractive.

Box plots

image

Say we want a box plot of Japan’s immigration:

df_japan = df_can.loc[['Japan'], years].transpose() # double brackets keep a dataframe, so the transpose gives one 'Japan' column

df_japan.plot(kind='box')

...

plt.show()

Output:
image
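As a rough sketch of the numbers behind a box plot, the quartiles and outlier limits can be computed by hand. The values are made up, and the sketch assumes Matplotlib's default rule of placing the whiskers within 1.5 times the interquartile range:

```python
import numpy as np

# Made-up yearly values for a single country
values = np.array([2, 4, 4, 5, 7, 9, 10, 12, 30])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)  # 4.0 7.0 10.0
print(outliers)        # [30]
```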

Scatter Plots

image

Our data: image

image

Say we want a scatter plot of the total immigration for each year from 1980 to 2013:

# we need to specify the x and y axis variables
df_total.plot(
    kind='scatter',
    x='year',
    y='total',
)

plt.title("...")
... ...
plt.show()

Output: image
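The notes do not show how df_total was built, so the following is only one possible way, sketched on a made-up table: sum each year column, then turn the result into a dataframe with year and total columns.

```python
import pandas as pd

# Made-up data: rows are countries, columns are years
df_can = pd.DataFrame(
    {"1980": [10, 20], "1981": [15, 25], "1982": [30, 5]},
    index=["A", "B"],
)
years = ["1980", "1981", "1982"]

# Sum over all countries for each year, then move the years out of the index
df_total = df_can[years].sum(axis=0).reset_index()
df_total.columns = ["year", "total"]

# A scatter plot needs numeric x values, so convert the year strings
df_total["year"] = df_total["year"].astype(int)
print(df_total)
```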

Week 3

Advanced Visualization Tools

Waffle Charts

image

Matplotlib has no built-in waffle chart. See the PyWaffle package.
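PyWaffle takes care of the drawing, but the idea behind a waffle chart can be sketched with numpy alone: each category gets a share of the tiles proportional to its value. The categories, numbers and grid size here are made up for illustration:

```python
import numpy as np

# Made-up totals per category
totals = {"Denmark": 30, "Norway": 20, "Sweden": 50}

# Grid size chosen for this sketch: 4 rows of 10 tiles
height, width = 4, 10
total_tiles = height * width

# Number of tiles per category, proportional to its share of the grand total
grand_total = sum(totals.values())
tiles = [round(v / grand_total * total_tiles) for v in totals.values()]

# Fill the grid with category indices (0, 1, 2); this reshape only works
# when the rounded tile counts add up exactly to total_tiles, as they do here
grid = np.repeat(np.arange(len(tiles)), tiles).reshape(height, width)
print(tiles)
print(grid)
```

A grid like this could then be colored by category, for example with Matplotlib's matshow.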

Word Clouds

image

Not supported by matplotlib.

Seaborn and Regression Plots

  • Based on Matplotlib
  • Efficient: roughly 20 lines of Matplotlib code can often be cut about five-fold with seaborn

Let’s look at regression plots. Say this is our data:

image

Using regplot:

import seaborn as sns
ax = sns.regplot(x='year', y='total', data=df_tot)

Output:
image

We can change the color and marker by adding color='green', marker='+' into regplot().
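By default regplot draws the scatter points together with a fitted straight line. As a sketch of what that line represents, the same kind of linear fit can be computed with numpy on made-up data (this uses np.polyfit, not seaborn's internal code):

```python
import numpy as np

# Made-up data that lies exactly on a straight line
year = np.array([1980, 1981, 1982, 1983, 1984], dtype=float)
total = np.array([100, 120, 140, 160, 180], dtype=float)

# Fit total = slope * year + intercept by least squares
slope, intercept = np.polyfit(year, total, deg=1)

# Use the fitted line to estimate the next year
predicted_1985 = slope * 1985 + intercept
print(round(slope, 3), round(predicted_1985, 3))
```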

Visualizing Geospatial Data

Intro to Folium

  • Creates several types of Leaflet maps
  • Binds data to a map for choropleth visualizations, and can also pass visualizations as markers on the map
  • Has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox API keys

image

Say we want to center the map on Canada (try changing the zoom_start value):
image

Change the map style by passing the tiles= argument:
image

image

Maps with Markers

We have seen how to create a map centered on Canada:

canada_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

canada_map

To add something new to the map, we first create a feature group. At first this group is empty.

# create a feature group
ontario = folium.map.FeatureGroup()

Then we add a styled marker to this feature group using add_child():

# add a circle marker for Ontario to the feature group
ontario.add_child(
    folium.features.CircleMarker(
        [51.25, -85.32], radius = 5,  # longitude is negative (west), matching the Marker used later
        color = 'red', fill_color = 'red'
    )
)

Now that the feature group contains a marker, let’s add the feature group to the map:

canada_map.add_child(ontario)

Output: image

To label the location with a popup, add the following code:

folium.Marker([51.25, -85.32], popup='Ontario').add_to(canada_map)

Output:
image

Putting it all together:

import folium

canada_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

# create a feature group
ontario = folium.map.FeatureGroup()

# add a circle marker for Ontario to the feature group
ontario.add_child(
    folium.features.CircleMarker(
        [51.25, -85.32], radius = 5,  # longitude is negative (west), matching the Marker below
        color = 'red', fill_color = 'red'
    )
)

# add the feature group to the map
canada_map.add_child(ontario)

# label the marker
folium.Marker([51.25, -85.32], popup='Ontario').add_to(canada_map)

# generate the map
canada_map

Choropleth Maps

image

Geojson File

image

Say this is our data:

image

Let’s create a world map:

image

Then add our data:

image

Week 4

Creating Dashboards with Plotly and Dash

Overview

  • How a dashboard can be used to answer critical business questions. 

  • A high-level overview of the popular dashboarding tools available in Python.

  • How to use basic Plotly, plotly.graph_objects, and plotly express.

  • How to use Dash and basic overview of dash components (core and HTML).

  • How to add different elements (like text box, dropdown, graphs, etc) to the dashboard.

  • How to add interactivity to dash core and HTML components.

Dashboarding Overview

Dashboards let data scientists tell their story with data.

Web-based Dashboarding

  • Dash from Plotly
  • Panel
  • Voila
  • Streamlit

Dashboard Tools

image

Here is a link: Python dashboarding tools.

Intro to Plotly

  • Interactive and open-source
  • Supports 40+ chart types
  • Includes statistical, financial, map, scientific, and 3-dimensional chart types
  • Figures can be displayed in a Jupyter notebook, saved to HTML, or used in Python-built web applications

Plotly sub-modules

Plotly has two main sub-modules: plotly.graph_objects, a low-level interface to figures, traces, and layout (its main class is plotly.graph_objects.Figure), and Plotly Express, a high-level wrapper.

import plotly.graph_objects as go
import plotly.express as px
import numpy as np

# using numpy to generate samples
x = np.arange(12)
y = np.random.randint(50, 500, size=12)

The following lines then create the figure.
Note: a plotly.graph_objects figure is essentially a JSON-like object with a dict structure. Here, go is the alias for plotly.graph_objects; the chart is plotted by setting values for the keywords of its objects.

fig = go.Figure(data=go.Scatter(x=x, y=y))
fig.update_layout(title='...', xaxis_title='Month', yaxis_title='Sales')
fig.show()

Output: image

We can also use plotly.express to create the figure:

fig = px.line(x=x, y=y, title='...', labels=dict(x='Month', y='Sales'))
fig.show()

Side Notes

  • df.sample(n= , random_state= ): randomly selects n rows (observations) from the dataframe, using random_state as the random seed
  • line_data = data.groupby('Month')['ArrDelay'].mean().reset_index(): groups data by Month -> extracts the ArrDelay column -> takes the mean of each group -> resets the index to 0, 1, 2, 3, ... so that Month becomes a regular column
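Both side notes can be tried on a tiny made-up table (the delay values are invented):

```python
import pandas as pd

# Made-up flight data: arrival delay in minutes per month
data = pd.DataFrame(
    {"Month": [1, 1, 2, 2, 3], "ArrDelay": [10.0, 20.0, 5.0, 15.0, 30.0]}
)

# Pick 3 random rows; the same random_state gives the same rows every time
sampled = data.sample(n=3, random_state=42)

# Mean delay per month, with Month turned back into a regular column
line_data = data.groupby("Month")["ArrDelay"].mean().reset_index()
print(line_data)
```

Without reset_index() the result would be a Series indexed by Month, which is less convenient to pass to Plotly as x and y columns.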

Intro to Dash

Dash is:

  • an open-source Python UI library from Plotly
  • an easy way to build a GUI
  • declarative and reactive
  • rendered in web browser and can be deployed to servers
  • inherently cross-platform and mobile ready

Two components: Core and HTML

# Dash 2.0+ import style; older versions used
# `import dash_core_components as dcc` and `import dash_html_components as html`
from dash import dcc, html

For html: layout, keyword arguments …

image

Then we create the layout -> add components to it.

image

image

Core components are higher-level components that are interactive and are generated with JavaScript, HTML, and CSS through the React.js library.

Eg,

image
image
image

Make dashboards interactive

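A callback is a Python function that Dash calls automatically whenever an input component’s property changes. It is registered with the @app.callback decorator:

@app.callback(Output(...), Input(...))  # the property to update, and the property to watch
def callback_function(input_value):
    ...  # compute the result from input_value
    return some_result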

Eg, callback with one input:

image

image

image

image

Output: (Input is changeable.) image

Eg, callback with two inputs:

image

image

Output:
image