compare the same data from two columns

Compare the same data from two columns in Power Query

Sometimes, it is useful to compare the same data from two columns. So, here is a simple function to do it in Power Query. The columns may be in the same table or in different tables. It provides data profiling in a very simple way.

How to

  • Open Power Query and create a new blank query
  • Copy the code down below and paste it in the ‘Advanced editor’
  • Then select tables and columns and click on ‘Invoke’ button

After you executed the function, the result of the analysis for each value will be one of the following statuses :

  • Duplicated value in both tables, in first table only, in second table only
  • Unique value in both tables, in first table only, in second table only

The function is auto explained and you may change the last step easily to fit to your needs.

(Table1 as table, Column1 as text, Table2 as table, Column2 as text) => 

let
/* For tests only
    Table1 = Table.FromList({"x", "x"}, null, {"First column"}),
    Column1 = "First column",
    Table2 = Table.FromList({"x", "y"}, null, {"Second column"}),
    Column2 = "Second column",
*/
    
    Source = Table.SelectColumns(Table1,{Column1}),
    #"Added Table" = Table.AddColumn(Source, "Table", each "First table"),
    #"Renamed First table column" = Table.RenameColumns(#"Added Table",{{Column1, Column2}}),
    #"Appended second table" = Table.Combine({#"Renamed First table column", Table.SelectColumns(Table2, {Column2})}),
    #"Renamed Compared value" = Table.RenameColumns(#"Appended second table",{{Column2, "Compared value"}}),
    #"Replaced null" = Table.ReplaceValue(#"Renamed Compared value",null,"Second table", Replacer.ReplaceValue, {"Table"}),
    #"Added 1" = Table.AddColumn(#"Replaced null", "Custom", each 1),
    #"Pivoted Column" = Table.Pivot(#"Added 1", List.Distinct(#"Added 1"[Table]), "Table", "Custom", List.Count),
    #"Changed Type" = Table.TransformColumnTypes(#"Pivoted Column",{{"First table", Int64.Type}, {"Second table", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Analysis", each
        if [First table] > 1 and [Second table] > 1 then "Duplicated value in both tables" else 
        if [First table] > 1 then "Duplicated value in first table only" else
        if [Second table] > 1 then "Duplicated value in second table only" else
        
        if [First table] = 1 and [Second table] = 0 then "Unique value in first table only" else 
        if [First table] = 0 and [Second table] = 1 then "Unique value in second table only" else 
        if [First table] = 1 and [Second table] = 1 then "Unique value in both tables" else 
         "")
in
    #"Added Custom"

At last, after you compare the same data from two columns, you may filter on the status to reach your goals.

In conclusion, this useful function may help in many different cases. The simplest one is to find duplicates.

Check an other way to find duplicates : Duplicates detection using different methods in Power Query

Have a look at some videos about data profiling in Power Query : Watch videos with Google search

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.