Heat map technique with data smoothing

More than a year ago, a question was asked on the Power BI community about showing a heat map in Power BI. Not just a simple heat map in the matrix visual, but a heat map based on data smoothing. I cannot find the link to that post (sorry about that!), but I want to share what I learned from it and how I solved it.

The business logic was to smooth the data by summing the values from the surrounding two rows and two columns in each direction, and then averaging that total based on the bin size of the columns. In our case, we are assuming that the bin size value for the columns is set to 5.

To explain this further, assume we have the following data in Excel and we want to calculate the smoothed average for cell C3; we need to look at the two rows and two columns on either side of it.

Smoothing Avg Data Example

In this case, the total of the values is 243 (SUM A1:E5). For C3, which is in row 3, we are adding the values from row 1 to row 5, and similarly for column C we are adding the values from column A to column E; that is what translates to A1 to E5 (SUM A1:E5). The smoothed value for this cell (C3) is 243 divided by 10, where 10 comes from multiplying the column bin size (5) by 2: 243 / 10 = 24.3, which rounds to 24.

I hope this gives you an idea on how we are calculating the average of each cell to smooth our data. Let’s see how it all fits into Power BI and how the heat map looks after data smoothing.

Here is some sample data with three columns:

  • Length
  • Weight
  • Qty
Raw Data

In this example, Length will be on rows with bin size of 150 and Weight will be on columns with bin size of 5.

Let’s first add a bins column to our table in order to add the bin for the Length column.

  • Right click on the Length column
  • Select New Group
How to add bin?
  • In the Groups window, give the bin a name – in our case it is called Length (bins) – and set the Bin Size to 150
Bin Settings

Similarly, we are going to add Weight (bins) with the Bin Size value set to 5. Once the bins are added, we will have two new bin columns in our model: Weight (bins) and Length (bins).
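As a side note, if you would rather create the bins yourself (or just want to see roughly what the built-in binning does), a pair of calculated columns along these lines produces equivalent numeric bins. This is a minimal sketch assuming the bins align to multiples of the bin size; FlatData is the table used by the measure below:

Length (bins) = FLOOR ( FlatData[Length], 150 )

Weight (bins) = FLOOR ( FlatData[Weight], 5 )

Numeric bins are what allow the smoothing measure below to do simple arithmetic on the bin values to find the neighbouring bins.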

Learn more about Grouping and Binning here.

Now it’s time to add a measure to calculate the average for each cell as explained in the beginning of this post, using the Excel sheet as an example.

Smoothing Avg =
VAR __weightBinSize = 5   --bin size of our columns (Weight)
VAR __totalWeightBin = __weightBinSize * 2
VAR __myWeightBin = SELECTEDVALUE ( FlatData[Weight (bins)] )   --get current weight bin
--establish the range for the weight bin: go back two columns (bins) and forward two columns (bins)
VAR __minWeightBin = __myWeightBin - __totalWeightBin   --go back 2 columns
VAR __maxWeightBin = __myWeightBin + __totalWeightBin   --go forward 2 columns
VAR __lengthBinSize = 150   --bin size of our rows (Length)
VAR __totalLengthBin = __lengthBinSize * 2
VAR __myLengthBin = SELECTEDVALUE ( FlatData[Length (bins)] )   --get current length bin
--establish the range for the length bin: go back two rows and forward two rows
VAR __minLengthBin = __myLengthBin - __totalLengthBin   --go back 2 rows
VAR __maxLengthBin = __myLengthBin + __totalLengthBin   --go forward 2 rows
--sum the quantity over the range; this is the equivalent of SUM A1:E5 in our Excel example above
VAR __totalQty =
    CALCULATE (
        SUM ( FlatData[Qty] ),
        FlatData[Weight (bins)] >= __minWeightBin,
        FlatData[Weight (bins)] <= __maxWeightBin,
        FlatData[Length (bins)] >= __minLengthBin,
        FlatData[Length (bins)] <= __maxLengthBin
    )
RETURN
    DIVIDE ( __totalQty, __totalWeightBin )   --calculate the average
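A couple of notes on this measure: the bin sizes (5 and 150) are hard-coded in the first VARs, so if you change a bin size in the group settings, remember to update the measure to match. Also, at the edges of the matrix the range extends past the first or last bin, so fewer cells contribute to the sum while we still divide by the same amount – this slightly dampens the smoothed values along the edges, which is usually fine for a heat map.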

Alright, now that our measure is in place we can create a matrix with a heat map. To do so, select the Matrix visual and add Length (bins) to Rows, Weight (bins) to Columns, and our average measure (Smoothing Avg) to Values. This is what it will look like:

Matrix

Once all of the above is done, this is what the matrix will look like. I love numbers, but this looks super boring:

Raw Data

So, now it’s time to create a heat map, and for this we will use conditional formatting. Before that, however, let’s see what a heat map without data smoothing would look like:

Raw Data Heat Map

Looks pretty good, but let’s check out how it looks after we use the data smoothing average measure.

Heat Map with Smoothing Avg Measure

Looks better – but let’s take it even further. We really don’t need to see those numbers; in the conditional formatting window, let’s change the foreground and background to the same colour. Here is the output:

Heat Map with Smoothing Avg no number

I find the above heat map to be much more useful than a heat map without data smoothing.

And this is what we are doing in the conditional formatting window for both the foreground and background colour:

Conditional Formatting for Heat Map

Read more about conditional formatting here.

We can do a lot of amazing things in Power BI to visualize our data and this is just one example. How will you use it? As always, I’m looking forward to your feedback and learning from you all!


Self-serve data prep at lightning-fast speed with Power BI dataflows and Azure Data Lake

Power BI is an amazing tool as it fulfills all levels of BI needs, whether you are using it for data exploration, as self-service BI to analyze the data, as an Enterprise BI solution, or embedding it into your internal or external portal.

I have been working with Power BI since it was made available back in 2015 and have worked with many kinds of organizations and users. Power BI has come a long way since then and I am wholeheartedly enjoying this journey. Although I have seen different businesses and users using Power BI in different ways, I have noticed one thing they all have in common: users need to wait on the IT/technical team to prepare the data and store it somewhere before they can connect to it and start analyzing/visualizing it. As we all know, data is the fuel of any BI tool, and having data available on time, in the correct shape/structure, makes analysis much easier. Without proper data, a BI tool is like a Ferrari without the proper fuel to drive it. Sometimes data prep takes longer than expected and the business misses the opportunity to make smart and timely decisions. This is where Power BI dataflows come into the picture.

I have been working with Power BI dataflows for the last 9 months or so and have used them on two major projects where dataflows helped with data preparation – including complex data transformations that used to take hours and can now be completed within minutes. That saved the business a lot of time in getting value out of their data, and the solutions were delivered on time – leading to happy clients 😊. Of course, there were a few challenges along the way since this is a comparatively new technology, but at the end of the day everything worked out great.

In an upcoming meetup here in Vancouver, Canada, I will be speaking about my experience and journey using dataflows:

  • How has it helped organizations prepare data at lightning speed?
  • What challenges have I faced?
  • What are some things to avoid?
  • What are the best practices?
  • And what does the future hold?

The presentation I have prepared for this meetup is available to download at the link below. If you have any further questions, don’t hesitate to reach out by email or leave your comments below and I will get back to you ASAP!

Implement Row Level Security based on business rules in Power BI

Here is another interesting use case of Row Level Security (RLS). It’s funny that my first post was also about RLS – when you use RLS, a lot of different challenges show up that simply do not exist without it.

In this post we are going to look at how we can implement RLS based on a certain business logic/condition.

Basics of Row Level Security (aka RLS)

Just a quick note on Row Level Security: it is essentially a technique to filter rows based on the user who is viewing the report. Here is a typical example: a Sales table contains data for all company sales with salesperson information. A single Power BI report is viewed by all the salespeople but we want each salesperson to see only their own sales. This is where RLS comes into the picture.

Based on the salesperson who is currently viewing the report, the data for other salespeople gets filtered out even though the dataset contains data for all the salespeople.

Parts of RLS

Who is the current user: Identify which user is currently logged in. Luckily, we have a DAX function called USERNAME() which returns the current user’s login name. Details of this function are provided in the link at the end of this post.

RLS role: It is basically a DAX expression that applies business logic to filter rows for the user; for each row it returns either TRUE() or FALSE(). The rows that return FALSE() are hidden from the user.
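For the salesperson example above, the role expression can be a simple one-liner – a minimal sketch, assuming a Sales table with a Salesperson column that stores login names (in the Power BI service you would typically compare against USERPRINCIPALNAME() instead):

'Sales'[Salesperson] = USERNAME()

Every row where this evaluates to FALSE() is filtered out for the current user.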

Problem Statement

So now we know the basics of RLS, how it works, and what the key components are to successfully implement RLS.

We have the following table that contains a Name and a Room Size for each user. The problem we are trying to solve here is that, based on the currently logged-in user, we want to take that user’s room size and show all the rows that fall between -20% and +20% of it.

Here is the sample data:

Sample Data

Based on the above table, if user “A” is logged in we want to show all the rows where the room size is within +/- 20% of user “A’s” room size. User “A’s” room size is 20, so the range is going to be 16 to 24 (the lower bound is 20 × 80% and the upper bound is 20 × 120%). This means that we want to show all the rows where the room size is between 16 and 24, and as per our sample table we expect to see the following rows when user “A” is logged in.

Data based on user “A”

In the case where user “C” is looking at the report, we will see the following rows (range 80 to 120):

Data based on user “C”

Solution

To implement this, we need to calculate the following inside the role expression:

  • Current user’s room size
  • Lower Range
  • Upper Range

The RLS role then returns TRUE() if a row’s room size falls between the lower and upper range – otherwise it returns FALSE() and the row is hidden from the user.

Quick list of steps to create a new role:

  1. Go to the “Modeling” tab
  2. Click “Manage roles”
  3. Click “Create”
  4. Enter role name
  5. Select the table on which the business rule is going to be applied
  6. Enter role formula
  7. Click “Save”

Here is the DAX expression which will do the magic for us:

--get the current user name
VAR __user = USERNAME()
--get the current user's room size
--note: [Room Size] is assumed to be a measure defined over 'Table'[Room Size]
VAR __roomSize =
    CALCULATE (
        [Room Size],
        ALL ( 'Table' ),
        'Table'[Name] = __user
    )
--get the range: +/- 20% of the current user's room size
VAR __upperRange = __roomSize * 1.20
VAR __lowerRange = __roomSize * 0.80
--check whether the row's room size is in the range; return TRUE or FALSE
VAR __rls =
    [Room Size] >= __lowerRange &&
    [Room Size] <= __upperRange
--final return
RETURN
    __rls
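As a side note, if you would rather not rely on a separate [Room Size] measure, LOOKUPVALUE can fetch the current user’s room size straight from the column. Here is a minimal sketch of the same role expression written that way (same 'Table' and column names as above):

--look up the logged-in user's room size directly from the column
VAR __roomSize =
    LOOKUPVALUE ( 'Table'[Room Size], 'Table'[Name], USERNAME() )
--same +/- 20% range as before
VAR __upperRange = __roomSize * 1.20
VAR __lowerRange = __roomSize * 0.80
RETURN
    'Table'[Room Size] >= __lowerRange &&
    'Table'[Room Size] <= __upperRange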


Let’s see how it works when we are viewing as user “A” or “C.”

Role Testing

I hope this post is helpful and provides some insight on how you can implement your business logic in RLS.

Your valuable and constructive feedback is always welcome, and I am always open to learning from your interactions with my posts!

Credit

I would like to thank my daughter for helping me with this post. She is currently a third-year BCom student at UBC Sauder.