# Improving file load speeds in Azure Data Factories

Here’s my latest tutorial on Azure: Improving file load speeds.

It’s been a while since I’ve posted and quite a bit has happened. I’ve started working on providing user experiences with Azure Analysis Services and Power BI. With that in mind I’ve rebuilt our SSIS packages in Logic Apps/Pipelines (best combination ever!), created Azure Hyperscale and Data Warehouse servers, built Azure Analysis Services tabular models, and even toyed with Azure Cosmos DB instances. I’ve learned a few tips and tricks that were hard to find, so I decided to pass this information on, but this time: in video!

The link below is my first YouTube tutorial. For the foreseeable future I’ll be doing videos, as I think they lend themselves to technical tutorials much more than blogs.

In publishing, and I’m sure many other industries, we get data at both the daily level and the weekly level. To properly tie these two pieces of data together you need to aggregate the daily data and adjust the date to align with the weekly date. Our weekly data is set to the Saturday of the week.

As an example, all daily data acquired between 10/21/2018 and 10/27/2018 must have a WeekEndingDate of 10/27/2018. This can be done with a case statement, but there is an inline formula that will come in handy when doing projections (YTD, previous 6 months, etc.). I’ll present the formula and then show modifications to handle variations.

```sql
select dateadd(dd, 7 - datepart(dw, '10/21/2018'), '10/21/2018')
```

The above formula determines how many days away from day 7 the current date is and simply adds that many days. We use 7 because, with SQL Server’s default DATEFIRST setting, Saturday is the 7th day of the week. I’ll break it down.

```sql
datepart(dw, '10/21/2018')
```

The above snippet returns a value of 1, because 10/21/2018 is a Sunday (with the default DATEFIRST setting, Sunday is day 1 and Saturday is day 7). You subtract that from 7 to determine how many days you have to move forward. In the example, you need to move 6 days from Sunday to make it to Saturday. This gives you the number of days to add. Reducing the example:

```sql
select dateadd(dd, 6, '10/21/2018')
```
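To sanity-check the date arithmetic outside SQL, here is a minimal Python sketch of the same logic. The function name and the mapping from Python weekdays to SQL Server’s default day-of-week numbering (Sunday = 1 … Saturday = 7) are mine, not from the original post:

```python
from datetime import date, timedelta

def week_ending(d: date) -> date:
    """Return the Saturday that ends the week containing d."""
    # Python's weekday(): Monday=0 .. Sunday=6.
    # Convert to SQL Server's default dw numbering: Sunday=1 .. Saturday=7.
    dw = (d.weekday() + 1) % 7 + 1
    # Same idea as dateadd(dd, 7 - datepart(dw, d), d)
    return d + timedelta(days=7 - dw)

print(week_ending(date(2018, 10, 21)))  # 2018-10-27
```

Every date from 10/21/2018 through 10/27/2018 maps to the same Saturday, which is exactly the WeekEndingDate behavior described above.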

This comes in very handy when you are looking to move backward in time a certain number of days, weeks, months, or years but align to a Saturday. An example is below, which goes back 1 month.

```sql
select dateadd(dd, 7 - datepart(dw, dateadd(mm, -1, '10/21/2018')), dateadd(mm, -1, '10/21/2018'))
```

Here is a simple way to remember how to use the formula. Remember you have to move your date first, then push it to Saturday.

```sql
declare @var date
select @var = dateadd(mm, -1, '10/2/2018')
select dateadd(dd, 7 - datepart(dw, @var), @var)
```
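The same move-then-align order can be checked in Python. This is only a sketch: the naive month subtraction with `replace()` works here solely because day 2 exists in September, and the SQL-style day-of-week conversion is my own mapping:

```python
from datetime import date, timedelta

# Step 1: move the date back one month (naive: valid because September has a day 2)
d = date(2018, 10, 2).replace(month=9)
# Step 2: push it forward to the Saturday of its week
dw = (d.weekday() + 1) % 7 + 1      # SQL-style day of week: Sunday=1 .. Saturday=7
saturday = d + timedelta(days=7 - dw)
print(saturday)  # 2018-09-08
```

Doing the two steps in the other order (align first, then shift a month) can land you on a non-Saturday, which is why the move must come first.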

Here is a simple method of finding the median value using SQL. The example finds the median number of units sold for a given book (ISBN) over a period of time. There will be 3 sections of code (combined using CTEs). First we need to find the median position: for a given book, how many times was it sold. The *sales_data* table contains 3 columns: *ISBN, SaleDate, Units_Sold*.

```sql
select ISBN, ceiling(count(*)/2.0) as median
from sales_data
group by ISBN
```

The use of **ceiling** in the above query accounts for the fact that if we have only 1 sale for an ISBN, then integer division gives 1/2 = 0, whereas **ceiling**(1/2.0) = 1. The need for the median position to start at 1 and not 0 will become evident in the text below.
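The integer-division behavior can be mirrored in Python, where `//` truncates the way SQL Server’s integer `/` does:

```python
from math import ceil

# Integer division truncates: a single sale would land at "position 0"
assert 1 // 2 == 0
# Dividing by 2.0 first, then taking the ceiling, gives position 1
assert ceil(1 / 2.0) == 1
# With 3 rows the median position is 2 (the middle row)
assert ceil(3 / 2.0) == 2
```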

```sql
select ISBN, Units_Sold,
       ROW_NUMBER() OVER (Partition by ISBN Order by Units_Sold) as r_num
from sales_data
```

Since we want the median of *Units_Sold*, we must **Order by** Units_Sold. Using **ROW_NUMBER()** on the sample data gives:

| ISBN | Units_Sold | r_num |
| --- | --- | --- |
| 9780000000001 | 10 | 1 |
| 9780000000001 | 13 | 2 |
| 9780000000001 | 20 | 3 |

Our *median* position will be **ceiling**(3/2.0) = 2. That means the *median* value of Units_Sold for ISBN = 9780000000001 is 13. The full code is below. Please note I create this as a view, so there is no need for a **;** before the first **with**.

```sql
Create view MedianUnitsSold as
with base as (
    select ISBN, ceiling(count(*)/2.0) as median
    from sales_data
    group by ISBN
),
population as (
    select ISBN, Units_Sold,
           ROW_NUMBER() OVER (Partition by ISBN Order by Units_Sold) as r_num
    from sales_data
)
select p.ISBN, p.Units_Sold
from population p
join base b on (b.ISBN = p.ISBN and b.median = p.r_num)
```
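As a cross-check, the same logic can be sketched in Python (the function and variable names here are mine): sort each ISBN’s sales and pick the row at 1-based position ceiling(n/2). Note that, like the SQL, this picks the lower of the two middle rows when the row count is even:

```python
from collections import defaultdict
from math import ceil

def median_units_sold(rows):
    """rows: iterable of (ISBN, Units_Sold) pairs."""
    by_isbn = defaultdict(list)
    for isbn, units in rows:
        by_isbn[isbn].append(units)
    # ceil(count / 2) is the 1-based position of the median row
    # (r_num in the SQL), so subtract 1 for 0-based indexing
    return {isbn: sorted(u)[ceil(len(u) / 2) - 1]
            for isbn, u in by_isbn.items()}

sales = [("9780000000001", 10), ("9780000000001", 13), ("9780000000001", 20)]
print(median_units_sold(sales))  # {'9780000000001': 13}
```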

As always, I hope this post has been able to help you solve a problem that you’ve had or provided you with some new insight. Feel free to post comments below.

I came across a situation where I needed unique numbers for both a large and small set of data. The standard approach is to create a table with an identity column and do your inserts. For a table with tens of thousands of records, this might be the best approach, but there is another one: **ROW_NUMBER()**.

The key to using the **ROW_NUMBER()** function is to partition by a constant so you will get a single contiguous sequence. Let's look at sample data.

| Format |
| --- |
| AB |
| AL |
| CD |
| CE |
| GG |
| LI |

Let's look at the function:

```sql
ROW_NUMBER() OVER (Partition by <group> order by <columns>)
```

I can first construct the function as follows

```sql
select Format,
       ROW_NUMBER() OVER (PARTITION BY Format ORDER BY Format) as id
```

However this will give us the following results:

| id | Format |
| --- | --- |
| 1 | AB |
| 1 | AL |
| 1 | CD |
| 1 | CE |
| 1 | GG |
| 1 | LI |

The reason is that **PARTITION BY Format** restarts the numbering for each format, so every format gets its own sequence starting at 1. To solve this problem you need to partition by a value that will never change over the life of the select. I use **1**.

```sql
select Format,
       ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY Format) as id
```

The new result set is:

| id | Format |
| --- | --- |
| 1 | AB |
| 2 | AL |
| 3 | CD |
| 4 | CE |
| 5 | GG |
| 6 | LI |
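The constant-partition trick is equivalent to numbering one ordered list starting at 1, which is easy to see in Python (a sketch using the sample formats above, shuffled to show the ordering):

```python
formats = ["CD", "AB", "GG", "AL", "LI", "CE"]

# PARTITION BY a constant means one running sequence over the whole
# ordered set, like ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY Format)
numbered = [(i, f) for i, f in enumerate(sorted(formats), start=1)]
print(numbered)  # [(1, 'AB'), (2, 'AL'), (3, 'CD'), (4, 'CE'), (5, 'GG'), (6, 'LI')]
```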

Hope this helps improve your SQL. As always, leave comments below.