Shey Jean-Philippe Data Analytics Portfolio

Project 1: An Exploration of LA Metro Bike Share Data

In this article, I will be exploring the data from the Metro Bike Share system in Los Angeles. From this page I downloaded the trip data from 2021 Q4 (October -December) and bike station information from Station Table.

The data containes the following fields:

Trip Data   Station Data
trip_id end_station station_id
duration end_lat station_name
start_time end_lon go_live_date
end_time bike_id region
start_station plan_duration status
start_lat trip_route_category  
start_lon passholder_type  
bike_type    

Data Cleaning

Before digging in, I always spend some time understanding how the data is laid out and look out for any duplicate or irrelevant data. I cleaned and filtered the data using EXCEL before importing it into PostgreSQL. I made the following changes in EXCEL:

This left me with the following summary values:

To view the SQL code and other images associated with this report visit: sjphilippe github

Stakeholder Questions

I structured my analysis to describe the data from 3 perspectives that bike share stakeholders would be interested in: user data, station data, and bike data. Visualizations were created using Tableau.

User Questions:

  1. How many active users were their between October- December?
  2. What is the ratio of rider pass type?
  3. What is the average duration of trips per rider pass type?

Station Questions:

  1. How many active stations are in each region?
  2. What are the 10 top most popular stations?

Bike Questions:

  1. How many of each bike type are currently in rotation?
  2. How many trips were taken by each type of type?
  3. What was the number of trips taken by the most ridden bike? How long was it used for?
  4. Which station did the most popular bike frequently dock at?

User Analysis

For user analysis I wanted to understand how different rider passes used the Metro service. This included: ratios of pass type, the average durations bikes were used for and how bike usage fluctuated over time. The SQL code to extract this info is seen below:

Select pass_holder, 
	Count(Distinct trip_id) As pass_holder_trips,
	Avg(duration) avg_duration
From metro_trips
Group by pass_holder
Order by pass_holder_trips

LA Metro Bike Share categorizes its riders via passes. There are four distinct pass types:

At the time this data was pulled, Metro Bike Share hosted a total of 62,644 riders. Monthly riders dominated bike usage with a total of 30,993 riders, followed by walkups, annual pass and one day passes. One day pass riders accounted for the greatest duration of bike usage with an average ride time of 75 min for one-way trips and 112 minutes for round trips.

A notable increase in riders was seen on October 10, 2021 which correlated with the CICLAVIA Heart of LA event. This event closed several miles of street in the center of LA to allow for pedestrian traffic on roadways. The jump in riders was most noticeable in walkup passes and one day passes. Walkup passes increased by 344 rides (112% increase) and one day pass increased by 312 (300% increase) compared to the previous day.

Station Analysis

For this analysis I focused on station popularity. Station popularity was determined by how many riders ended their rides at a particular location. To do this I joined both the metro trips and station tables. I excluded virtual stations from the query to better compare physical locations only. See query below:

Select Distinct ms.station_name, 
	region, 
	Count(trip_id) As trips_traveled
From metro_trips As mt
Join metro_stations As ms
On mt.end_station = ms.station_id
Where station_name <> 'Virtual Station'
Group by ms.station_name, region
Order by trips_traveled DESC
Limit 10

There are 217 active stations spanning 3 regions in the Los Angeles area:
139 in Down Town Los Angeles (DTLA)
23 in North Hollywood
55 in Westside

Out of these 217 stations, the 3 most popular stations were:

  1. Downtown Santa Monica E Line station in the Westside region with 2,832
  2. Ocean Front Walk in the Westside region with 2,306.
  3. 7th and Flower in the DTLA region with 2,090 trips

The below images show the density of trips to each station in their respected region. The larger and darker stations had the most riders

Bike Analysis

For bike analysis, I focused on frequency of use. This included which bike type was used the most and which bike had the most trips. See query below:

Select bike_type, Count(distinct trip_id) As num_trips_taken
From metro_trips
Group by bike_type
Order by num_trips_taken DESC;

Select bike_id, 
	bike_type, 
	Max(duration), 
	Count(trip_id) As num_of_trips
From metro_trips
Group by bike_id, bike_type
Order by num_of_trips DESC

I also went one step further to see which station the most popular bike visited. See query below:

Select station_name, Count(trip_id) As num_visits
From metro_trips As mt
Join metro_stations As ms
On mt.end_station = ms.station_id
Where bike_id = 16837
Group by station_name
Order by num_visits DESC

There are 1,309 bikes in rotation:

Although standard bikes account for the majority of trips taken (49,654 of 62,647 trips), electric bikes are by far the most favored bike type. The **top 20 most popular bikes are all listed as electric, averaging over 100+ rides.

Bike id 16837 was the most used bike, with 343 trips total. For comparison, the most popular standard bike was used for 149 trips, and the most popular smart bike was used just 9 times. Bike id 16837’s most frequented end station was 7th and Flower in the DTLA region for a total of 32 trips.

Conclusion

This is my first analytic project and I truly enjoyed the challenge of extracting meaningful information from the dataset. I plan to revisit this project in the future as a comparison to showcase my increased knowledge in visualizations, analysis and SQL. Some things I would like to add:


To view the SQL code and other images associated with this report visit:

sjphilippe github