compare the same data from two columns

Compare the same data from two columns in Power Query

Sometimes, it is useful to compare the same data from two columns. So, here is a simple function to do it in Power Query. The columns may be in the same table or in different tables. It provides data profiling in a very simple way.

How to compare the same data from two columns in Power Query

  • Open Power Query and create a new blank query
  • Copy the code down below and paste it in the ‘Advanced editor’
compare
  • Then select tables and columns and click on ‘Invoke’ button

After you executed the function, the result of the analysis for each value will be one of the following statuses :

  • Duplicated value in both tables, in first table only, in second table only
  • Unique value in both tables, in first table only, in second table only

The function is auto explained and you may change the last step easily to fit to your needs.

(Table1 as table, Column1 as text, Table2 as table, Column2 as text) => 

let
/* For tests only
    Table1 = Table.FromList({"x", "x"}, null, {"First column"}),
    Column1 = "First column",
    Table2 = Table.FromList({"x", "y"}, null, {"Second column"}),
    Column2 = "Second column",
*/
    
    Source = Table.SelectColumns(Table1,{Column1}),
    #"Added Table" = Table.AddColumn(Source, "Table", each "First table"),
    #"Renamed First table column" = Table.RenameColumns(#"Added Table",{{Column1, Column2}}),
    #"Appended second table" = Table.Combine({#"Renamed First table column", Table.SelectColumns(Table2, {Column2})}),
    #"Renamed Compared value" = Table.RenameColumns(#"Appended second table",{{Column2, "Compared value"}}),
    #"Replaced null" = Table.ReplaceValue(#"Renamed Compared value",null,"Second table", Replacer.ReplaceValue, {"Table"}),
    #"Added 1" = Table.AddColumn(#"Replaced null", "Custom", each 1),
    #"Pivoted Column" = Table.Pivot(#"Added 1", List.Distinct(#"Added 1"[Table]), "Table", "Custom", List.Count),
    #"Changed Type" = Table.TransformColumnTypes(#"Pivoted Column",{{"First table", Int64.Type}, {"Second table", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Analysis", each
        if [First table] > 1 and [Second table] > 1 then "Duplicated value in both tables" else 
        if [First table] > 1 then "Duplicated value in first table only" else
        if [Second table] > 1 then "Duplicated value in second table only" else
        
        if [First table] = 1 and [Second table] = 0 then "Unique value in first table only" else 
        if [First table] = 0 and [Second table] = 1 then "Unique value in second table only" else 
        if [First table] = 1 and [Second table] = 1 then "Unique value in both tables" else 
         "")
in
    #"Added Custom"

At last, after you compare the same data from two columns, you may filter on the status to reach your goals.

In conclusion, this useful function may help in many different cases. The simplest one is to find duplicates.

Check an other way to find duplicates : Duplicates detection using different methods in Power Query

Have a look at some videos about data profiling in Power Query : Watch videos with Google search

Custom analyser

Custom analyser

With ‘Custom analyser’ feature, Power BI Sidetools users can use external tools built by the…

Read More

4 Comments

  1. Hello,
    Thank you for this very useful Power Query. Can you please to adapt it in order to be not case sensitive, I mean “A=a” and not A≠a for example.
    Thank you in advance
    Best Regards

  2. Hi, thank you for the useful function.
    I’m using it to check 2 columns within same table.
    However, First Table returned correct result while the Second Table returned all 0.
    I did not modify any function. I just specify the Table 1 & 2 and Column names.

    Appreciate your help to explain where could have gone wrong and how to correct the error.

    Thanks,
    Aloysius

Leave a Reply

Your email address will not be published. Required fields are marked *