
Coffee Chain Analysis
Excel is as popular as they come in the business world. So, let's play data analyst consultant and help a moderately sized coffee sales chain make decisions to improve their bottom line.
Learn moreSQL - Data Cleansing and Exploratory Analysis
Tableau - Visual Presentation via Interactive Dashboard
In my fictional role as data analyst for a consulting agency, I’ve been asked to review three files detailing the sales and inventory performance of competitor supermarkets in the Baltimore, Maryland, and surrounding areas. My company hopes to gain some insights they can use to make recommendations to our new client, who has their sights set on opening a chain of their own in the region.
My three datasets contain product information, inventory statistics, and sales metrics by location and store. With over three years of data, I shouldn’t have any problem uncovering some hidden gems.
Glancing over the data provided, I form a few basic exploratory questions I believe will help my goal.
With the foundations of my methodology nailed down, I load my tables into SQL and begin the cleaning process. After a few checks on distinct values and formatting, I update the tables with new field names, and other necessary modifications.
Using a compilation of aggregations, JOINs, subqueries, and CTE’s, I dive deep into the supermarket data tables with my exploratory questions and an open mind to let the data tell me its story.
I create many different queries to unravel any mysteries and answer new questions that come up through my exploration. However, some of the key queries I found relevant, I store into separate VIEWs for later analysis and reporting. These include:
Before presenting my findings, I load my VIEWs into Tableau. I don’t want to overwhelm the stakeholder but preparing for questions I haven’t already thought about is at the top of my mind. I could have loaded the entire database but I’m confident that my subsets of data bring to light the key elements for conclusive action.
One final step before creating my individual charts and dashboard is to transform the location data. Unfortunately, I was only provided with the neighborhood of each store location. Thanks to a quick Google search, I gathered the corresponding zip codes for each neighborhood and integrated them into Tableau as a location alias.
Since the consultants working the account haven’t asked or given any direction on visual aids, I whip up a simple dashboard to highlight the KPIs I found relevant.
Luckily, I had some spare time!Let’s look back at my initial questions and explore the results. First, which stores performed the best over the time period?
Whether looking at sales as a whole, or by each individual year, there were three stores that faired the best. These were Ben Franklin, Shopko, and Family Dollar. However, Burlington Coat Factory, Renys, T.J. Maxx, and Kmart, all had high performing years that should be considered for any actionable recommendations.
Store | Year | Total Sales |
---|---|---|
Ben Franklin | 2017 | $383,764.05 |
Ben Franklin | 2020 | $381,020.85 |
Ben Franklin | 2018 | $374,340.80 |
Ben Franklin | 2019 | $373,399.83 |
Family Dollar | 2020 | $357,342.37 |
Shopko | 2020 | $356,432.46 |
Shopko | 2017 | $355,659.04 |
Shopko | 2019 | $354,607.35 |
Family Dollar | 2018 | $346,013.81 |
Family Dollar | 2017 | $342,923.02 |
$3,625,503.58 |
As for the second question, regarding location, I do believe that location plays a unique role in potential success.
By examining the three top stores (Ben Franklin, Shopko, and Family Dollar), all three are located just west of downtown Baltimore. If I expand the search based on total sales above 1.3 million, I find a similar result include locations to the west of Baltimore.
However, there is one outlier, located in Charles Village, just north of Baltimore. This location accounted for the highest volume of sales, just over 1.7 million. The two stores located in this area are Ollie’s Bargain Outlet and Fred Meyer. The locations nearest Charles Village did not fare as well, which could be indicative of the population segment.
The third question initially posed is related to individual product purchases. Were there any correlations between store success and inventory items stocked? Based on the stats, no, there is no direct correlation between top selling products and inventory levels.
What does this mean? Maybe nothing. Maybe something. It would certainly take more data to say definitively. But for now, it doesn’t appear that stores are stocking up on their best-selling products.
For example, when I compare all products sold and their stocked quantities against the top ten products sold (by the three best stores), it’s clear that these items are ranked in the lower end of quantity percentages.
# Create a temporary table showing the top ten products sold and their key metrics, by the 3 best stores
CREATE temporary table temp_table AS
SELECT product_name,
SUM(quantity_available) AS quantity_avilable,
SUM(quantity) AS total_quantity,
ROUND(AVG(unit_price),2) AS avg_price,
ROUND(SUM(unit_price * quantity),2) AS total_sales
FROM supermarket_products sp
JOIN supermarket_sales ss
ON sp.product_id = ss.product_id
JOIN supermarket_inventory si
ON ss.product_id = si.product_id
WHERE supplier IN ('Ben Franklin','Shopko','Family Dollar')
GROUP BY product_name, supplier, neighborhood
ORDER BY total_sales DESC, total_quantity DESC
LIMIT 10;
# Create a secondary temp table containing a single column of product names from the primary temp table
CREATE temporary table t2 AS
SELECT product_name
FROM temp_table;
# Querying all products from each store, then filtering on top products from the 3 best stores.
# The lower the ranking (1-4), the less inventory they stocked for that product
SELECT product_name,
SUM(quantity_available) AS quantity_available,
SUM(quantity) AS total_quantity,
ROUND(AVG(unit_price),2) AS avg_price,
ROUND(SUM(unit_price * quantity),2) AS total_sales,
DENSE_RANK() OVER(PARTITION BY product_name ORDER BY SUM(quantity_available) DESC) AS ranking
FROM supermarket_products sp
JOIN supermarket_sales ss
ON sp.product_id = ss.product_id
JOIN supermarket_inventory si
ON ss.product_id = si.product_id
WHERE product_name IN (SELECT * FROM t2)
GROUP BY product_name, supplier, neighborhood
ORDER BY total_sales DESC, total_quantity DESC;
In addition, strictly looking at total sales, the top items inventoried are vastly different when we compare the best sellers for Ben Franklin, Shopko, and Family Dollar. However, you have to factor in that these alternative product sales numbers have more to do with individual cost in most cases.
With the data we have available, there are two clear certainties we can pass on to the consultant lead for the client.
Telling you my thought process and all the steps I took with this analysis is all fine and dandy. But nothing beats hands-on experience. So, to have a peek at the original data, the SQL queries I used, and the Tableau workbook, click the link below!
My GITHUBExcel is as popular as they come in the business world. So, let's play data analyst consultant and help a moderately sized coffee sales chain make decisions to improve their bottom line.
Learn moreLet's analyze the desired skills obtained from a Google search and see where you stand. Using Tableau, filter by company or job board and get an idea of what the your salary would be for your skill set.
Learn more