Moving average formula

How to Calculate Rolling Average in Power BI

Do you need to calculate a rolling average in Power BI?

Rolling averages (or moving averages) allow the smoothing of data fluctuations over a specified period. While calculating it may seem complex, it’s a pretty straightforward process.

All you need to do is determine a fixed window or interval, take the average of the values within it, and then shift the window through the dataset.
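The three steps above (fix a window, average the values inside it, shift the window) can be sketched in a few lines of Python. This is only an illustration of the idea, not Power BI code; the function name and the sample sales figures are made up for the example.

```python
def rolling_average(values, window):
    """Return the average of each full window as it slides through `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


daily_sales = [10, 20, 30, 40, 50]  # hypothetical data
print(rolling_average(daily_sales, 3))  # [20.0, 30.0, 40.0]
```

This sketch only produces a value once a full window is available. How to treat the first few rows is a design choice that both the DAX and Excel approaches below handle differently.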

The rolling average is useful because it highlights trends and shows data patterns and trajectories more clearly than the raw values do. It also simplifies monitoring and tracking trends across different areas.

In this article, you will learn how you can calculate rolling averages in Power BI.

Creating Rolling Average with DAX

A rolling average averages a measure over a defined period of time. Therefore, to create a rolling average measure in Power BI, it is essential to have a date table in place.

When creating a date table for any time-based analysis, you should take certain requirements into account.

The date table must:

  1. contain every day for all years within your fact table.
  2. have at least one field set as a Date or DateTime datatype.
  3. only contain unique date or datetime values, without repetition.
  4. be marked as a date table (best practice).
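Requirements 1 and 3 are easy to check mechanically. The sketch below does so in Python for a plain list of dates; it is an illustration only, the function name is hypothetical, and it does not cover the data type or the "mark as date table" setting, which are Power BI model properties.

```python
from datetime import date, timedelta


def check_date_table(dates, fact_dates):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Requirement 3: no repeated dates.
    if len(set(dates)) != len(dates):
        problems.append("duplicate dates")
    # Requirement 1: every day of every year covered by the fact table.
    first = date(min(fact_dates).year, 1, 1)
    last = date(max(fact_dates).year, 12, 31)
    expected = {first + timedelta(days=i) for i in range((last - first).days + 1)}
    if not expected.issubset(set(dates)):
        problems.append("missing days")
    return problems
```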

You can create a date table in Power BI using the Power Query editor or DAX’s CALENDAR or CALENDARAUTO function.

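Date Table =
VAR MinYear = YEAR( MIN( 'Sales Table'[transaction_date] ) )
VAR MaxYear = YEAR( MAX( 'Sales Table'[transaction_date] ) )
RETURN
ADDCOLUMNS(
    FILTER( CALENDARAUTO(), AND( YEAR( [Date] ) >= MinYear, YEAR( [Date] ) <= MaxYear ) ),
    -- The source listing is truncated here; the columns below are an assumed reconstruction.
    "Year", INT( FORMAT( [Date], "yyyy" ) ),
    "Month Number", INT( FORMAT( [Date], "MM" ) ),
    "Month Name", FORMAT( [Date], "MMMM" ),
    "Quarter", "Q" & FORMAT( [Date], "q" )
)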

If your data doesn’t come with a date table, you can use this formula to create one. It’s dynamic and reusable: you only need to change the 'Sales Table'[transaction_date] column name to reflect the date column in your data set.

The syntax not only removes the need to extract date components into separate columns manually, as you would in Power Query, but it also streamlines the entire process.

Let’s break down the syntax step by step:

1. VAR MinYear = YEAR(MIN('Sales Table'[transaction_date])) – The MinYear variable is assigned the earliest year found in the 'Sales Table'[transaction_date] column. It uses the YEAR function to extract the year component from the minimum date.

2. VAR MaxYear = YEAR(MAX('Sales Table'[transaction_date])) – The MaxYear variable is assigned the latest year found in the 'Sales Table'[transaction_date] column. It also uses the YEAR function to extract the year component from the maximum date.

3. RETURN – The RETURN statement includes the main calculation logic. It uses the ADDCOLUMNS function to create a date table whose values are based on the filtered years between MinYear and MaxYear.

Using the FILTER, CALENDARAUTO, and AND functions, ADDCOLUMNS creates a table with dates that fall within the year range specified by the MinYear and MaxYear variables.

Next, the Year, Month Number, Month Name, and Quarter columns are populated with their respective values with the help of the INT and FORMAT functions. These functions extract the desired values from the [Date] column created by the CALENDARAUTO function.
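To make the shape of the resulting table concrete, here is a rough Python equivalent of the logic described above: one row per day across the full years spanned by the transactions, plus the four derived columns. It is a sketch under assumptions; the function name is hypothetical and the "Q1" style quarter label is a guess at the intended format.

```python
from datetime import date, timedelta


def build_date_table(transaction_dates):
    """One row per day from 1 Jan of the first year to 31 Dec of the last year."""
    first = date(min(transaction_dates).year, 1, 1)
    last = date(max(transaction_dates).year, 12, 31)
    rows = []
    day = first
    while day <= last:
        rows.append({
            "Date": day,
            "Year": day.year,
            "Month Number": day.month,
            "Month Name": day.strftime("%B"),  # locale-dependent month name
            "Quarter": f"Q{(day.month - 1) // 3 + 1}",  # assumed label format
        })
        day += timedelta(days=1)
    return rows
```

Called with transaction dates that all fall in 2023, this returns 365 rows starting on 1 January 2023.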

Now that you have created a date table, you can proceed with creating the moving average measure. You can do this in two ways – by using the AVERAGEX or DATESINPERIOD function.

Create a Rolling Average with the AVERAGEX Function

The AVERAGEX function is designed to calculate the arithmetic mean of an expression evaluated across each row in a specified table. This characteristic makes it an ideal tool for computing the rolling average.

Used on its own, AVERAGEX returns the arithmetic mean over whatever rows are in the current filter context, so in a daily visual it effectively averages one day at a time.

To use it to calculate the moving average over longer periods, such as 7 days, 30 days, or a year, the AVERAGEX function would require modifications.

The syntax for utilizing the AVERAGEX function to calculate a 30-day moving average is as follows.

Moving Average (AVERAGEX) =
VAR LastTransactionDate = MAX( 'Dates'[Transaction_Date] )
VAR AverageDay = 30
VAR PeriodInVisual =
    FILTER(
        ALL( 'Dates'[Transaction_Date] ),
        AND(
            'Dates'[Transaction_Date] > LastTransactionDate - AverageDay,
            'Dates'[Transaction_Date] <= LastTransactionDate
        )
    )
VAR OutPut = CALCULATE( AVERAGEX( 'Dates', [Total Sales] ), PeriodInVisual )
RETURN
    OutPut

This syntax calculates a 30-day moving average using the AVERAGEX function. Let’s break it down step by step:

1. VAR LastTransactionDate = MAX('Dates'[Transaction_Date]) – This variable returns the maximum (latest) value from the 'Dates'[Transaction_Date] column. It represents the reference point for calculating the moving average.

2. VAR AverageDay = 30 – This variable, AverageDay, sets the number of days used as the moving average window. In this case, it is set to 30 days.

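3. VAR PeriodInVisual = FILTER(ALL('Dates'[Transaction_Date]), AND(...)) – This variable, PeriodInVisual, builds the table of dates that make up the window. ALL removes the current filter on the date column, and FILTER then keeps only the dates that are later than LastTransactionDate minus AverageDay and no later than LastTransactionDate.

4. VAR OutPut = CALCULATE(AVERAGEX('Dates', [Total Sales]), PeriodInVisual) – This variable, OutPut, uses the CALCULATE function along with the AVERAGEX function. It evaluates the [Total Sales] measure for each date in the Dates table and averages the results, but considers only the dates included in the PeriodInVisual filtered table.

5. RETURN OutPut – This line returns the value of the OutPut variable, which represents the calculated moving average.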

The syntax calculates the moving average by defining a window of the last 30 days from the maximum transaction date as defined by the current filter context. It then calculates the average of the [Total Sales] column within that window and returns the result as the moving average.
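The window logic can be mirrored outside Power BI to check your understanding. The Python sketch below is not how DAX evaluates the measure; it simply applies the same date condition to a dictionary of daily totals and averages the days that have a value. The function name and data layout are assumptions made for the example.

```python
from datetime import date, timedelta


def moving_average_averagex(sales_by_date, last_date, average_day=30):
    """Average daily sales over dates d with last_date - average_day < d <= last_date.

    Days without a value are skipped, in the way AVERAGEX skips blank results.
    """
    window_start = last_date - timedelta(days=average_day)
    values = [
        amount
        for day, amount in sales_by_date.items()
        if window_start < day <= last_date and amount is not None
    ]
    return sum(values) / len(values) if values else None


# Hypothetical data: sales of 1, 2, ..., 31 on the days of January 2024.
january = {date(2024, 1, d): float(d) for d in range(1, 32)}
print(moving_average_averagex(january, date(2024, 1, 5)))   # 3.0
print(moving_average_averagex(january, date(2024, 1, 31)))  # 16.5
```

On 5 January only five days of data fall inside the window, so the result is the mean of those five days. On 31 January the window covers 2 to 31 January, a full 30 days.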

When you use the measure in a visual, it looks at first glance as if it only returns the plain arithmetic mean of the days so far. That is indeed what it does for the first 29 days, because fewer than 30 days of data are available. From the 30th day on, the Moving Average (AVERAGEX) measure returns the 30-day average of the [Total Sales] measure.

An alternative approach to understanding how the calculation works is by creating a moving total measure. You can do this by replacing the AVERAGEX with the SUMX function in the Moving Average (AVERAGEX) syntax.

When you place this measure in the visual, you will notice a discrepancy between the total value for January in the [Moving TOTAL (SUMX)] column and the [Total Sales] column.

This difference arises because the sum in the [Moving TOTAL (SUMX)] column is based on a 30-day period, whereas the [Total Sales] column considers the total for the entire month of January, which has 31 days.

If you divide the moving total value in the January 5 row by 5 (since it’s the total sales value over 5 days), you will get the moving average value in the [Moving Average (AVERAGEX)] column.

In the same way, when you divide the January 30 value in the moving total column by 30, you get the moving average value in the [Moving Average (AVERAGEX)] column. From this point on, every moving total value is divided by 30 to return the 30-day moving average.
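The relationship between the moving total and the moving average can be checked with a small Python sketch. It assumes every day has a sales value and uses made-up numbers; the function names are hypothetical.

```python
def moving_total(daily_values, window):
    """Sum of the current value and up to window - 1 preceding values, per day."""
    return [
        sum(daily_values[max(0, i - window + 1) : i + 1])
        for i in range(len(daily_values))
    ]


def moving_average_from_total(daily_values, window):
    """Divide each moving total by the number of days it actually covers."""
    totals = moving_total(daily_values, window)
    return [total / min(i + 1, window) for i, total in enumerate(totals)]


january = list(range(1, 32))  # hypothetical sales of 1..31 for 1-31 January
totals = moving_total(january, 30)
averages = moving_average_from_total(january, 30)
print(totals[4], averages[4])    # 15 3.0   (5 January: total over 5 days)
print(totals[29], averages[29])  # 465 15.5 (30 January: total over 30 days)
print(totals[30], averages[30])  # 495 16.5 (31 January: still 30 days)
```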

If you want to use a different period for your running average calculation, you can change the value of the AverageDay variable to 7 or 365. This will return the weekly or yearly moving average values.

Create a Rolling Average with the DATESINPERIOD Function

Power BI has several time intelligence functions that help you evaluate measures over a predetermined period. One such function is DATESINPERIOD. It returns a single-column date table that starts on the chosen start date and continues backward or forward for the specified interval duration.

The DATESINPERIOD function has the following parameters:

  • dates – a column containing dates.
  • start_date – the start or end date of the period, depending on whether the interval value is positive or negative.
  • number_of_intervals – a positive or negative whole number that specifies how many intervals the period spans. When number_of_intervals is negative, the dates are counted backward from start_date. Conversely, if number_of_intervals is positive, the dates are counted forward from start_date.
  • interval – the type of interval. This can be DAY, MONTH, QUARTER, or YEAR.

Consider this case: if number_of_intervals is -7 and interval is DAY, DATESINPERIOD returns a one-column date table containing the seven dates that end on the designated start_date.
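The direction rule is easy to get wrong, so here is a small Python sketch of it for DAY intervals only. It is a simplification: the real function also restricts the result to dates that exist in the date column, and the helper name is invented for this example.

```python
from datetime import date, timedelta


def dates_in_period_days(start_date, number_of_intervals):
    """Dates covered by a DAY interval, counted from start_date (inclusive)."""
    step = 1 if number_of_intervals > 0 else -1
    days = [
        start_date + timedelta(days=step * i)
        for i in range(abs(number_of_intervals))
    ]
    return sorted(days)


week = dates_in_period_days(date(2024, 1, 10), -7)
print(week[0], week[-1])  # 2024-01-04 2024-01-10
```

A negative count walks backward and includes the start date itself, which is why -7 yields 4 to 10 January rather than 3 to 9 January.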

Moving AVERAGE (DATESINPERIOD) =
CALCULATE(
    [Total Sales],
    DATESINPERIOD( Dates[Transaction_Date], MAX( Dates[Transaction_Date] ), -30, DAY )
) / 30

Use this DATESINPERIOD syntax to calculate the running average.

Let’s break it down step by step:

1. DATESINPERIOD(Dates[Transaction_Date], MAX(Dates[Transaction_Date]), -30, DAY) – This function creates a table of dates within a specified period. It takes four arguments:

  • the column that contains the dates (Dates[Transaction_Date]),
  • the end date of the period (MAX(Dates[Transaction_Date])),
  • the number of periods to go back (-30),
  • and the granularity of the period (DAY).

It returns a table of dates starting from the end date and going back by 30 days at a daily granularity.

2. CALCULATE([Total Sales], DATESINPERIOD(Dates[Transaction_Date], MAX(Dates[Transaction_Date]), -30, DAY)) – The CALCULATE function modifies the context in which the [Total Sales] measure is evaluated. It takes two arguments: the measure to calculate, [Total Sales], and the filter to apply, which here is the DATESINPERIOD expression. As a result, [Total Sales] only includes the dates within the period defined by DATESINPERIOD.

3. /30 – This division produces the average. Without it, the formula only returns the running total. Since the period is set to 30 days, dividing the result of CALCULATE by 30 gives the average sales per day within that period.
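The same three steps can be imitated in Python to see what the fixed divisor does. This is an illustrative sketch with invented names and data, not a translation of the DAX engine's behavior.

```python
from datetime import date, timedelta


def moving_average_fixed_divisor(sales_by_date, last_date, period=30):
    """Total sales in the `period` days ending on last_date, divided by `period`."""
    window_start = last_date - timedelta(days=period)
    total = sum(
        amount
        for day, amount in sales_by_date.items()
        if window_start < day <= last_date and amount is not None
    )
    return total / period


# Hypothetical data: sales of 1, 2, ..., 31 on the days of January 2024.
january = {date(2024, 1, d): float(d) for d in range(1, 32)}
print(moving_average_fixed_divisor(january, date(2024, 1, 5)))   # 0.5
print(moving_average_fixed_divisor(january, date(2024, 1, 31)))  # 16.5
```

Because the divisor is always 30, the value on 5 January is the five-day total divided by 30, which understates the daily average until a full window of data exists.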

When you add the measure to a visual, you’ll get the 30-day moving average for each date in the visual, that is, the running 30-day total divided by 30.

In general, the DATESINPERIOD and AVERAGEX moving average measures operate similarly. However, there is a subtle distinction in the values observed for the first 29 days.

The DATESINPERIOD measure always divides the running total by 30, while the AVERAGEX measure divides it by the number of dates in the window that actually have a value, which is fewer than 30 at the start of the data.

Nevertheless, as more days of data fall inside the window, the values for both measures converge. Notably, on the 30th of January, the two measures return the same value.

Another notable distinction between the DATESINPERIOD and AVERAGEX measures lies in their Total values. Specifically, the two measures yield different totals due to the way they handle blank rows within the data.

The AVERAGEX function excludes dates with blank rows from the calculation, leading to a discrepancy in the total values. On the other hand, the DATESINPERIOD measure does not exclude dates with blank rows.

In this table, the AVERAGEX function excludes April 30 from the calculation due to the presence of a blank row. As a result, the running total for that date is divided by 29 days instead of the intended 30-day period.
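The effect of a blank day on the two divisors can be shown with a short Python sketch. The numbers are hypothetical and chosen so the arithmetic is easy to follow; the function and key names are made up for the example.

```python
def compare_divisors(window_values):
    """window_values holds one entry per day in the window; None marks a blank day."""
    present = [v for v in window_values if v is not None]
    if not present:
        return None
    total = sum(present)
    return {
        "fixed_divisor": total / len(window_values),  # DATESINPERIOD-style: always divide by the window length
        "non_blank_divisor": total / len(present),    # AVERAGEX-style: divide by days that have a value
    }


window = [30.0] * 29 + [None]  # hypothetical 30-day window with one blank day
print(compare_divisors(window))
# {'fixed_divisor': 29.0, 'non_blank_divisor': 30.0}
```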

When selecting a syntax for calculating your rolling average measure, it is essential to consider this factor. While both options are viable, opting for the DATESINPERIOD syntax is the recommended choice if uncertainty exists regarding dates with blank values.

This ensures a more accurate and reliable calculation, considering any potential gaps in the data.

Conclusions

Power BI offers a comprehensive set of tools and techniques that are vital for analyzing data trends and patterns effectively.

By leveraging the functionalities of the DATESINPERIOD and AVERAGEX functions, you gain the ability to create insightful time-based calculations that greatly aid in identifying and understanding trends.

While the DATESINPERIOD and AVERAGEX measures operate similarly in many aspects, they do exhibit differences in handling the initial days and total values.

The DATESINPERIOD measure divides the running total by the fixed length of the period, whereas the AVERAGEX measure divides it by the number of dates that have a value, so dates with blank rows are excluded.

Understanding these nuances is crucial for accurately interpreting and comparing results. It is also important to consider the specific requirements of the analysis and choose the appropriate measure accordingly.

Moving average formula


Summary

To calculate a moving or rolling average, you can use a simple formula based on the AVERAGE function with relative references. In the example shown, the formula in E7 is:

=AVERAGE(C5:C7) 

As the formula is copied down, it calculates a 3-day moving average based on the sales value for the current day and the two previous days.

Below is a more flexible option based on the OFFSET function which handles variable periods.

About moving averages

A moving average (also called a rolling average) is an average based on subsets of data at given intervals. Calculating an average at specific intervals smooths out the data by reducing the impact of random fluctuations. This makes it easier to see overall trends, especially in a chart. The larger the interval used to calculate a moving average, the more smoothing that occurs, since more data points are included in each calculated average.

Explanation

The formulas shown in the example all use the AVERAGE function with a relative reference set up for each specific interval. The 3-day moving average in E7 is calculated by feeding AVERAGE a range that includes the current day and the two previous days like this:

=AVERAGE(C5:C7) // 3-day average 

The 5-day and 7-day averages are calculated in the same way. In each case, the range provided to AVERAGE is enlarged to include the required number of days:

=AVERAGE(C5:C9) // 5-day average
=AVERAGE(C5:C11) // 7-day average

All formulas use a relative reference for the range supplied to the AVERAGE function. As the formulas are copied down the column, the range changes at each row to include the values needed for each average.

When the values are plotted in a line chart, the smoothing effect is clear:

Moving average chart example

Insufficient data

If you start the formulas in the first row of the table, the first few formulas won't have enough data to calculate a complete average, because the range will extend above the first row of data:

Moving average range problem

This may or may not be an issue, depending on the structure of the worksheet, and whether it's important that all averages are based on the same number of values. The AVERAGE function will automatically ignore text values and empty cells, so it will continue to calculate an average with fewer values. This is why it "works" in E5 and E6.
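The way AVERAGE skips text and empty cells can be imitated in Python as follows. This is a rough sketch for ranges only, with a hypothetical function name; None stands for an empty cell.

```python
def excel_style_average(cells):
    """Average the numeric cells, skipping text, empty cells, and booleans."""
    numbers = [
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    ]
    if not numbers:
        return None  # Excel would show #DIV/0! here
    return sum(numbers) / len(numbers)


# A range that reaches above the data: empty cell, header text, first value.
print(excel_style_average([None, "Sales", 100]))  # 100.0
```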

One way to clearly indicate insufficient data is to check the current row number and abort with #N/A when there are fewer than n values. For example, for the 3-day average, you could use:

=IF(ROW()-ROW($C$5)+1<3,NA(),AVERAGE(C3:C5)) 

The first part of the formula simply generates a "normalized" row number, starting with 1:

ROW()-ROW($C$5)+1 // relative row number 

In row 5, the result is 1, in row 6 the result is 2, and so on.

When the current row number is less than 3, the formula returns #N/A. Otherwise, the formula returns a moving average as before. This mimics the behavior of the Analysis Toolpak version of Moving Average, which outputs #N/A until the first complete period is reached.
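The same guard can be written in Python, using None in place of #N/A. This is a sketch for illustration; the function name and sample values are assumptions, and the list index plays the role of the relative row number.

```python
def moving_average_or_na(values, n):
    """Trailing n-value average per row; None stands in for #N/A."""
    result = []
    for i in range(len(values)):
        relative_row = i + 1  # same role as ROW()-ROW($C$5)+1
        if relative_row < n:
            result.append(None)
        else:
            result.append(sum(values[i - n + 1 : i + 1]) / n)
    return result


print(moving_average_or_na([100, 110, 120, 130], 3))
# [None, None, 110.0, 120.0]
```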

Moving average with #n/a for insufficient data

However, as the number of periods increases, you will eventually run out of rows above the data and won't be able to enter the required range inside AVERAGE. For example, you can't set up a moving 7-day average with the worksheet as shown, since you can't enter a range that extends 6 rows above C5.

Variable periods with OFFSET

A more flexible way to calculate a moving average is with the OFFSET function. OFFSET can create a dynamic range, which means we can set up a formula where the number of periods is variable. The general form is:

=AVERAGE(OFFSET(A1,0,0,-n,1)) 

where n is the number of periods to include in each average. As above, OFFSET returns a range that is passed into the AVERAGE function. Below you can see this formula in action, where n is the named range E2. Starting at cell C5, OFFSET constructs a range that extends back to previous rows. This is accomplished by using a height equal to negative n. When E2 is changed to another number, the moving average recalculates on all rows:

Moving average with OFFSET function

The formula in E5, copied down, is:

=AVERAGE(OFFSET(C5,0,0,-n,1)) 

Like the original formula above, the version with OFFSET will also have the problem of insufficient data in the first few rows, depending on how many periods are given in E2.

In the example shown, the averages calculate successfully because the AVERAGE function automatically ignores text values and blank cells, and there are no other numeric values above C5. So, while the range passed into AVERAGE in E5 is C1:C5, there is only one value to average, 100. However, as periods increase, OFFSET will continue to create a range that extends above the start of the data, eventually running into the top of the worksheet and returning a #REF! error.

One solution is to "cap" the size of the range to the number of data points available. This can be done by using the MIN function to restrict the number used for height as seen below:

Moving average with OFFSET function and capped range

=AVERAGE(OFFSET(C5,0,0,-(MIN(ROW()-ROW($C$5)+1,n)),1)) 

This looks pretty scary but is actually quite simple. We are limiting the height passed into OFFSET with the MIN function:

MIN(ROW()-ROW($C$5)+1,n) 

Inside MIN, the first value is a relative row number, calculated with:

ROW()-ROW($C$5)+1 // relative row number..1,2,3, etc. 

The second value given to MIN is the number of periods, n. When the relative row number is less than n, MIN returns the current row number to OFFSET for height. When the row number is greater than n, MIN returns n. In other words, MIN simply returns the smaller of the two values.
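The capping logic reads the same way in Python, where min() limits the window height to the number of rows available so far. The sketch assumes every value is numeric; the function name and sample data are made up.

```python
def capped_moving_average(values, n):
    """Trailing average whose window never reaches above the first value."""
    result = []
    for i in range(len(values)):
        height = min(i + 1, n)  # same role as MIN(ROW()-ROW($C$5)+1, n)
        window = values[i - height + 1 : i + 1]
        result.append(sum(window) / height)
    return result


print(capped_moving_average([100, 110, 120, 130], 3))
# [100.0, 105.0, 110.0, 120.0]
```

The first two results are averages of one and two values respectively, so they are not true 3-period averages. Whether that is acceptable depends on the analysis.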

A nice feature of the OFFSET option is that n can be easily changed. If we change n to 7 and plot the results, we get a chart like this:

Moving average chart with OFFSET function

Note: A quirk with the OFFSET formulas above is that they won't work in Google Sheets, because the OFFSET function in Sheets won't allow a negative value for height or width. The attached spreadsheet has workaround formulas for Google Sheets.


https://www.powertechtips.com/rolling-average-power-bi/
