{"id":219,"date":"2019-01-20T15:30:15","date_gmt":"2019-01-20T14:30:15","guid":{"rendered":"https:\/\/didier-terrien.000webhostapp.com\/?p=219"},"modified":"2020-12-16T13:52:15","modified_gmt":"2020-12-16T12:52:15","slug":"duplicates-detection-using-different-methods-in-power-query","status":"publish","type":"post","link":"https:\/\/thebipower.fr\/index.php\/2019\/01\/20\/duplicates-detection-using-different-methods-in-power-query\/","title":{"rendered":"Duplicates detection in Power Query using different methods"},"content":{"rendered":"\n<p>As duplicates are a nightmare in databases, I wrote a function to identify them between two tables. Duplicates detection in Power Query is extremely useful for data consistency. Additionally, it helps to avoid relationships isues.<\/p>\n\n\n\n<p>You may also use the same function with only one table. The function is optionally insensitive to case, separators and accents. It uses three methods : <\/p>\n\n\n\n<ul><li>Power query fuzzy merge (not the best results but fast)<\/li><li>Cartesian product (provides distance and best results but very slow)<\/li><li>Ngram (provides distance but very slow)<\/li><\/ul>\n\n\n\n<p>If some of you have better solutions, I would be glad to know.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to implement duplicates detection in Power Query<\/h2>\n\n\n\n<ul><li>Download the PBIX file :   <a rel=\"noreferrer noopener\" label=\"\" class=\"rank-math-link\" href=\"https:\/\/github.com\/didierterrien\/Power-BI-assistant\" target=\"_blank\">&#8216;Data cleansing.pbix&#8217;<\/a><\/li><\/ul>\n\n\n\n<ul><li>Open Power Query<\/li><li>Copy parameters into your own PBIX file : Fuzzy matching threshold, Duplicates detection method<\/li><li>Copy functions into your own PBIX file : Fn Cartesian product, Fn Ngram, Fn Duplicates analysis<\/li><li>Then, call the main function like this (documentation is inside the function) :<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>#\u201dFn Duplicates analysis\u201d(Table1 as table, Column1 as text, Index1 as text, Table2 as table, Column2 as text, Index2 as text, Options as text)<\/code><\/pre>\n\n\n\n<ul><li>Select the method in &#8216;Duplicates detection method&#8217; parameter<\/li><li>Select the threshold in &#8216;Fuzzy matching threshold&#8217; parameter (between 0 and 1)<\/li><li>Apply changes and refresh the report <\/li><\/ul>\n\n\n\n<p> I hope this duplicates detection in Power Query will help you. Please share your methods in the comments down below.<\/p>\n\n\n\n<p>Check an other way to find duplicates :&nbsp;  <a href=\"https:\/\/thebipower.fr\/index.php\/2019\/03\/08\/compare-the-same-data-from-two-columns-in-power-query\/\">Compare the same data from two columns in Power Query<\/a><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\t\t\t<div class=\"wp-block-uagb-post-carousel uagb-post-grid  uagb-post__image-position-top uagb-post__image-enabled uagb-block-b4d81b5c     uagb-post__arrow-outside uagb-slick-carousel uagb-post__items uagb-post__columns-3 is-carousel uagb-post__columns-tablet-2 uagb-post__columns-mobile-1\" data-total=\"5\" style=\"\">\n\n\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2022\/10\/22\/custom-analyser\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"674\" height=\"193\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2022\/10\/Custom-analyser.png\" class=\"attachment-large size-large\" alt=\"Custom analyser\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2022\/10\/Custom-analyser.png 674w, https:\/\/thebipower.fr\/wp-content\/uploads\/2022\/10\/Custom-analyser-300x86.png 300w\" sizes=\"(max-width: 674px) 100vw, 674px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2022\/10\/22\/custom-analyser\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Custom analyser<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>With &#8216;Custom analyser&#8217; feature, Power BI Sidetools users can use external tools built by the&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2022\/10\/22\/custom-analyser\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2021\/04\/05\/dax-debugger\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"592\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger-1024x592.png\" class=\"attachment-large size-large\" alt=\"DAX debugger\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger-1024x592.png 1024w, https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger-300x173.png 300w, https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger-768x444.png 768w, https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger-1320x763.png 1320w, https:\/\/thebipower.fr\/wp-content\/uploads\/2021\/04\/DAX-debugger.png 1452w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2021\/04\/05\/dax-debugger\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">DAX debugger in Power BI Sidetools<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>DAX debugger is meant to help Power BI report developers displaying sample data from a&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2021\/04\/05\/dax-debugger\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/12\/01\/manage-your-external-tools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"817\" height=\"495\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/12\/External-tools-manager.png\" class=\"attachment-large size-large\" alt=\"External tools manager\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/12\/External-tools-manager.png 817w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/12\/External-tools-manager-300x182.png 300w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/12\/External-tools-manager-768x465.png 768w\" sizes=\"(max-width: 817px) 100vw, 817px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/12\/01\/manage-your-external-tools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Manage your external tools easily with Power BI Sidetools<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>Since the Power BI team allowed to launch external tools from the Power BI desktop&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2020\/12\/01\/manage-your-external-tools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/10\/30\/easily-export-to-excel-with-power-bi-sidetools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"414\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-1024x414.png\" class=\"attachment-large size-large\" alt=\"Export to Excel\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-1024x414.png 1024w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-300x121.png 300w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-768x311.png 768w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-1536x621.png 1536w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel-1320x534.png 1320w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/10\/Export-to-Excel.png 1854w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/10\/30\/easily-export-to-excel-with-power-bi-sidetools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Easily export to Excel with Power BI Sidetools<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>While working in Power BI desktop, we often need to export to Excel some data&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2020\/10\/30\/easily-export-to-excel-with-power-bi-sidetools\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/03\/powerful-text-and-file-search\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"209\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-6-1024x209.png\" class=\"attachment-large size-large\" alt=\"search\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-6-1024x209.png 1024w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-6-300x61.png 300w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-6-768x156.png 768w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-6.png 1286w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/03\/powerful-text-and-file-search\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Powerful text and file search in Visual Studio Code<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>As Power BI Sidetools creates files from your report, you are able to search through&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/03\/powerful-text-and-file-search\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<article class=\"uagb-post__inner-wrap\">\t\t\t\t\t\t\t\t<div class='uagb-post__image'>\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/02\/compare-dax-and-m-formulas-in-visual-studio-code\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"294\" src=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-5-1024x294.png\" class=\"attachment-large size-large\" alt=\"Visual Studio Code\" srcset=\"https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-5-1024x294.png 1024w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-5-300x86.png 300w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-5-768x220.png 768w, https:\/\/thebipower.fr\/wp-content\/uploads\/2020\/09\/image-5.png 1269w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<h4 class=\"uagb-post__title uagb-post__text\">\n\t\t\t\t<a href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/02\/compare-dax-and-m-formulas-in-visual-studio-code\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Compare DAX and M formulas in Visual Studio Code<\/a>\n\t\t\t<\/h4>\n\t\t\t\t\t\t<div class='uagb-post__text uagb-post-grid-byline'>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class='uagb-post__text uagb-post__excerpt'>\n\t\t\t\t\t<p>Compare DAX and M formulas in Visual Studio Code is really easy with the help&#8230;\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"uagb-post__text uagb-post__cta wp-block-button\">\n\t\t\t\t<a class=\"wp-block-button__link uagb-text-link\" href=\"https:\/\/thebipower.fr\/index.php\/2020\/09\/02\/compare-dax-and-m-formulas-in-visual-studio-code\/\" target=\"_self\" rel=\"bookmark noopener noreferrer\">Read More<\/a>\n\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t<\/article>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t","protected":false},"excerpt":{"rendered":"<p>As duplicates are a nightmare in databases, I wrote a function to identify them between two tables. Duplicates detection in Power Query is extremely useful for data consistency. Additionally, it helps to avoid relationships isues. You may also use the same function with only one table. The function is optionally insensitive to case, separators and&#8230;<\/p>\n","protected":false},"author":2,"featured_media":1455,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"categories":[3],"tags":[67,16,17,19,26],"uagb_featured_image_src":{"full":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false],"thumbnail":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1-150x150.jpg",150,150,true],"medium":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1-300x177.jpg",300,177,true],"medium_large":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false],"large":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false],"1536x1536":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false],"2048x2048":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false],"mailpoet_newsletter_max":["https:\/\/thebipower.fr\/wp-content\/uploads\/2019\/01\/Duplicates-detection-1.jpg",421,248,false]},"uagb_author_info":{"display_name":"Didier TERRIEN","author_link":"https:\/\/thebipower.fr\/index.php\/author\/masterpower\/"},"uagb_comment_info":0,"uagb_excerpt":"As duplicates are a nightmare in databases, I wrote a function to identify them between two tables. Duplicates detection in Power Query is extremely useful for data consistency. Additionally, it helps to avoid relationships isues. You may also use the same function with only one table. The function is optionally insensitive to case, separators and...","_links":{"self":[{"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/posts\/219"}],"collection":[{"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/comments?post=219"}],"version-history":[{"count":5,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/posts\/219\/revisions"}],"predecessor-version":[{"id":3697,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/posts\/219\/revisions\/3697"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/media\/1455"}],"wp:attachment":[{"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/media?parent=219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/categories?post=219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thebipower.fr\/index.php\/wp-json\/wp\/v2\/tags?post=219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}