Here is a simple method of finding the median value using SQL. The first will be a simple example that finds the median number of units sold for a given book (ISBN) over a period of time. There will be 3 sections of code (that will be combined using CTEs). First we need to find the median number, that is for a given book, how many times was it sold. The sales_data table contains 3 columns: ISBN, SaleDate, UnitsSold
select ISBN, ceiling(count(*)/2.0) as median from sales_data group by ISBN
The use of ceiling in the above equation accounts for the fact that if we have only 1 sale for an ISBN, then 1/2=0, whereas ceiling(1/2.0)=1. The need for the median to start at 1 and not 0 will become evident in the text below.
select ISBN, Units_Sold, ROW_NUMBER() OVER(Partition by ISBN Order by Units_Sold) as r_num from sales_data
Since we want to know the median of UnitsSold we must Order by UnitsSold. Using the rownumber() function we get a set of numbers, starting at 1 (here is the need for ceiling above) and continuing until we get a new ISBN. Once we get a new ISBN, the numbering restarts at 1. Let’s say that our test data looks like this:
ISBN Units_Sold r_num
9780000000001 10 1
9780000000001 13 2
9780000000001 20 3
Our median will be ceiling(3/2.0)=2. That means the median value for Units_Sold for ISBN=9780000000001 is 13. The full code is below. Please note I create this as a view so there is no need to be a ; before the first with.
Create view MedianUnitsSold as with base as ( select ISBN, ceiling(count(*)/2.0) as median from sales_data group by ISBN ), population as ( select ISBN, Units_Sold, ROW_NUMBER() OVER(Partition by ISBN Order by Units_Sold) as r_num from sales_data ) select p.ISBN, p.Units_Sold from population p join base b on (b.ISBN=p.ISBN and b.median=p.r_num)
As always, I hope this post has been able to help you solve a problem that you’ve had or provided you with some new insight. Feel free to post comments below.