Compare the same data from two columns in Power Query
Sometimes, it is useful to compare the same data from two columns. So, here is a simple function to do it in Power Query. The columns may be in the same table or in different tables. It provides data profiling in a very simple way.
How to compare the same data from two columns in Power Query
- Open Power Query and create a new blank query
- Copy the code down below and paste it in the ‘Advanced editor’
- Then select tables and columns and click on ‘Invoke’ button
After you executed the function, the result of the analysis for each value will be one of the following statuses :
- Duplicated value in both tables, in first table only, in second table only
- Unique value in both tables, in first table only, in second table only
The function is auto explained and you may change the last step easily to fit to your needs.
(Table1 as table, Column1 as text, Table2 as table, Column2 as text) =>
let
/* For tests only
Table1 = Table.FromList({"x", "x"}, null, {"First column"}),
Column1 = "First column",
Table2 = Table.FromList({"x", "y"}, null, {"Second column"}),
Column2 = "Second column",
*/
Source = Table.SelectColumns(Table1,{Column1}),
#"Added Table" = Table.AddColumn(Source, "Table", each "First table"),
#"Renamed First table column" = Table.RenameColumns(#"Added Table",{{Column1, Column2}}),
#"Appended second table" = Table.Combine({#"Renamed First table column", Table.SelectColumns(Table2, {Column2})}),
#"Renamed Compared value" = Table.RenameColumns(#"Appended second table",{{Column2, "Compared value"}}),
#"Replaced null" = Table.ReplaceValue(#"Renamed Compared value",null,"Second table", Replacer.ReplaceValue, {"Table"}),
#"Added 1" = Table.AddColumn(#"Replaced null", "Custom", each 1),
#"Pivoted Column" = Table.Pivot(#"Added 1", List.Distinct(#"Added 1"[Table]), "Table", "Custom", List.Count),
#"Changed Type" = Table.TransformColumnTypes(#"Pivoted Column",{{"First table", Int64.Type}, {"Second table", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Analysis", each
if [First table] > 1 and [Second table] > 1 then "Duplicated value in both tables" else
if [First table] > 1 then "Duplicated value in first table only" else
if [Second table] > 1 then "Duplicated value in second table only" else
if [First table] = 1 and [Second table] = 0 then "Unique value in first table only" else
if [First table] = 0 and [Second table] = 1 then "Unique value in second table only" else
if [First table] = 1 and [Second table] = 1 then "Unique value in both tables" else
"")
in
#"Added Custom"
At last, after you compare the same data from two columns, you may filter on the status to reach your goals.
In conclusion, this useful function may help in many different cases. The simplest one is to find duplicates.
Check an other way to find duplicates : Duplicates detection using different methods in Power Query
Have a look at some videos about data profiling in Power Query : Watch videos with Google search
Custom analyser
With ‘Custom analyser’ feature, Power BI Sidetools users can use external tools built by the…
DAX debugger in Power BI Sidetools
DAX debugger is meant to help Power BI report developers displaying sample data from a…
Manage your external tools easily with Power BI Sidetools
Since the Power BI team allowed to launch external tools from the Power BI desktop…
Easily export to Excel with Power BI Sidetools
While working in Power BI desktop, we often need to export to Excel some data…
Powerful text and file search in Visual Studio Code
As Power BI Sidetools creates files from your report, you are able to search through…
Compare DAX and M formulas in Visual Studio Code
Compare DAX and M formulas in Visual Studio Code is really easy with the help…
Hello,
Thank you for this very useful Power Query. Can you please to adapt it in order to be not case sensitive, I mean “A=a” and not A≠a for example.
Thank you in advance
Best Regards
Hello,
I suggest you convert your columns to uppercase or to lowercase before you call the function.
Regards
Hi, thank you for the useful function.
I’m using it to check 2 columns within same table.
However, First Table returned correct result while the Second Table returned all 0.
I did not modify any function. I just specify the Table 1 & 2 and Column names.
Appreciate your help to explain where could have gone wrong and how to correct the error.
Thanks,
Aloysius
Hello Aloysius,
I suppose your columns have a text data type. Could you please send to me a sample data in order I can reproduce the error.
Regards