Regional Supermarket Sales

SQL - Data Cleansing and Exploratory Analysis

Tableau - Visual Presentation via Interactive Dashboard

Backstory

In my fictional role as data analyst for a consulting agency, I’ve been asked to review three files detailing the sales and inventory performance of competitor supermarkets in the Baltimore, Maryland, and surrounding areas. My company hopes to gain some insights they can use to make recommendations to our new client, who has their sights set on opening a chain of their own in the region.

My three datasets contain product information, inventory statistics, and sales metrics by location and store. With over three years of data, I shouldn’t have any problem uncovering some hidden gems.

Process

Glancing over the data provided, I form a few basic exploratory questions I believe will help my goal.

  1. Which stores have performed the best in sales (in total, by year)?
  2. Should location be a determining factor for the client?
  3. Since we have inventory data, can I find correlation between the inventory management approach and overall profitability?

With the foundations of my methodology nailed down, I load my tables into SQL and begin the cleaning process. After a few checks on distinct values and formatting, I update the tables with new field names, and other necessary modifications.

Using a compilation of aggregations, JOINs, subqueries, and CTE’s, I dive deep into the supermarket data tables with my exploratory questions and an open mind to let the data tell me its story.

I create many different queries to unravel any mysteries and answer new questions that come up through my exploration. However, some of the key queries I found relevant, I store into separate VIEWs for later analysis and reporting. These include:

  • Top sales totals by store over the full length of time
  • Top sales totals by year
  • Top potential sales based on inventory levels
  • Total sales by location
  • Top unique products based on total sales value
  • Product breakdown of quantity sold, average price, and total sales (by location and store)

Visualization

Before presenting my findings, I load my VIEWs into Tableau. I don’t want to overwhelm the stakeholder but preparing for questions I haven’t already thought about is at the top of my mind. I could have loaded the entire database but I’m confident that my subsets of data bring to light the key elements for conclusive action.

One final step before creating my individual charts and dashboard is to transform the location data. Unfortunately, I was only provided with the neighborhood of each store location. Thanks to a quick Google search, I gathered the corresponding zip codes for each neighborhood and integrated them into Tableau as a location alias.

Since the consultants working the account haven’t asked or given any direction on visual aids, I whip up a simple dashboard to highlight the KPIs I found relevant.

Luckily, I had some spare time!

Conclusions

Let’s look back at my initial questions and explore the results. First, which stores performed the best over the time period?

Whether looking at sales as a whole, or by each individual year, there were three stores that faired the best. These were Ben Franklin, Shopko, and Family Dollar. However, Burlington Coat Factory, Renys, T.J. Maxx, and Kmart, all had high performing years that should be considered for any actionable recommendations.

Top 10 Sales by Store and Year

Store Year Total Sales
Ben Franklin 2017 $383,764.05
Ben Franklin 2020 $381,020.85
Ben Franklin 2018 $374,340.80
Ben Franklin 2019 $373,399.83
Family Dollar 2020 $357,342.37
Shopko 2020 $356,432.46
Shopko 2017 $355,659.04
Shopko 2019 $354,607.35
Family Dollar 2018 $346,013.81
Family Dollar 2017 $342,923.02
$3,625,503.58

As for the second question, regarding location, I do believe that location plays a unique role in potential success.

By examining the three top stores (Ben Franklin, Shopko, and Family Dollar), all three are located just west of downtown Baltimore. If I expand the search based on total sales above 1.3 million, I find a similar result include locations to the west of Baltimore.

However, there is one outlier, located in Charles Village, just north of Baltimore. This location accounted for the highest volume of sales, just over 1.7 million. The two stores located in this area are Ollie’s Bargain Outlet and Fred Meyer. The locations nearest Charles Village did not fare as well, which could be indicative of the population segment.

The third question initially posed is related to individual product purchases. Were there any correlations between store success and inventory items stocked? Based on the stats, no, there is no direct correlation between top selling products and inventory levels.

What does this mean? Maybe nothing. Maybe something. It would certainly take more data to say definitively. But for now, it doesn’t appear that stores are stocking up on their best-selling products.

For example, when I compare all products sold and their stocked quantities against the top ten products sold (by the three best stores), it’s clear that these items are ranked in the lower end of quantity percentages.

SQL Query Snippet

# Create a temporary table showing the top ten products sold and their key metrics, by the 3 best stores

CREATE temporary table temp_table AS 
	SELECT product_name,
		SUM(quantity_available) AS quantity_avilable,
		SUM(quantity) AS total_quantity, 
		ROUND(AVG(unit_price),2) AS avg_price, 
		ROUND(SUM(unit_price * quantity),2) AS total_sales
	FROM supermarket_products sp
	JOIN supermarket_sales ss
		ON sp.product_id = ss.product_id
	JOIN supermarket_inventory si
		ON ss.product_id = si.product_id
	WHERE supplier IN ('Ben Franklin','Shopko','Family Dollar')
	GROUP BY product_name, supplier, neighborhood
	ORDER BY total_sales DESC, total_quantity DESC
	LIMIT 10;

# Create a secondary temp table containing a single column of product names from the primary temp table

CREATE temporary table t2 AS
	SELECT product_name 
	FROM temp_table;

# Querying all products from each store, then filtering on top products from the 3 best stores.
# The lower the ranking (1-4), the less inventory they stocked for that product

SELECT product_name,
	SUM(quantity_available) AS quantity_available,
	SUM(quantity) AS total_quantity, 
	ROUND(AVG(unit_price),2) AS avg_price, 
	ROUND(SUM(unit_price * quantity),2) AS total_sales,
	DENSE_RANK() OVER(PARTITION BY product_name ORDER BY SUM(quantity_available) DESC) AS ranking
FROM supermarket_products sp
JOIN supermarket_sales ss
	ON sp.product_id = ss.product_id
JOIN supermarket_inventory si
	ON ss.product_id = si.product_id
WHERE product_name IN (SELECT * FROM t2)
GROUP BY product_name, supplier, neighborhood
ORDER BY total_sales DESC, total_quantity DESC;

In addition, strictly looking at total sales, the top items inventoried are vastly different when we compare the best sellers for Ben Franklin, Shopko, and Family Dollar. However, you have to factor in that these alternative product sales numbers have more to do with individual cost in most cases.

Recommendations

With the data we have available, there are two clear certainties we can pass on to the consultant lead for the client.

  1. Location matters! With an expansive dataset, we could factor in customer demographics and weigh things like income and household capacity to find more profitable areas. But my recommendation would be to scout areas west of downtown Baltimore for the first chain location. The data shows a high confidence that successful sales numbers will be found there.


  2. Manage inventory with a conservative mindset. Not only will it ease the financial stress many new businesses experience, but mimicking the success of current competitors will bare fruit in the long run. Not only will the new store have time to analyze sales for proper course corrections, but the additional funds provided can go toward marketing initiatives.

    Furthermore, we have a list of products and average prices that do well in each location. This will help the client remain competitive in their market and plan initial stock purchases judiciously.

Behind the Scenes

Telling you my thought process and all the steps I took with this analysis is all fine and dandy. But nothing beats hands-on experience. So, to have a peek at the original data, the SQL queries I used, and the Tableau workbook, click the link below!

My GITHUB

Coffee Chain Analysis

Excel is as popular as they come in the business world. So, let's play data analyst consultant and help a moderately sized coffee sales chain make decisions to improve their bottom line.

Learn more

Data Analyst Skills Job Search

Let's analyze the desired skills obtained from a Google search and see where you stand. Using Tableau, filter by company or job board and get an idea of what the your salary would be for your skill set.

Learn more