Jump to content
  • 0

Duplicate Data


LWSChad

Question

Hello,

 

I need my users to be able to easily identify duplicate data.

 

Please consider the following scenarios:

 

Lead                               Address

John Daly                        11 Main St

Kerry Webb                     11 Main St

            

Byron Nelson                   22 Wall St

Sam Snead                      22 Wall St

 

John and Kerry are married and should be merged.

Byron and Sam are roommates and should not be merged.

 

A computer is not capable of determining who should be merged and who shouldn't; rules like same last name = merged or different last name = not-merged don't work in the real world, so I need want to perform a search the identifies duplicate data and allow the users to decide what to do next.

 

Consider this sample list:

Lead                               Address

Albert                              11 Main St

Betty                               33 3rd

Chris                               22 Wall St

Dan                                 44 Monroe

Eve                                 55 Post St

Frank                              11 Main St

Gloria                              66 6th St

Harry                               77 Sky Dr

Isla                                  44 Monroe

Jenny                              88 Broadway

Kyle                                 99 Bourbon St

Lenny                              33 3rd

 

The results of the duplicate identification should be:

Albert & Frank - 11 Main St

Chris & Lenny - 33 3rd

Dan & Isla - 44 Monroe

 

The user will then decide what to do with each match.

 

Any ideas?

 

Thanks

CHAD

 

Link to comment
Share on other sites

12 answers to this question

Recommended Posts

  • 0

Hello CHAD,

 

As far as I know, the Filter option allows users to find all duplicate records.

 

If you open a table and click Filter, you will see the Show duplicates in field check box. You can select a field and click Apply, that only duplicate records will be displayed.

 

Does it work for you?

Or do you want to display duplicate records on a Report DataPage?

Link to comment
Share on other sites

  • 0

Hello CHAD,

 

I think, you can try the following script:

​<script type="text/javascript">

function compare_three_values(first,second,third)
{
if ((first == second)||(first == third)) return true;
else return false;
}

//create the list of all cells
var elems = document.getElementsByClassName("cbResultSetTableCell");

//insert the number of columns
var number_columns = 2;

//insert the order of the column with address, the first column is 0, the second is 1...
var correct_column = 0;

//select only required cells from all cells
var k = 0; var n = 0; var list_tds = [];
for (var i=0, m=elems.length; i<m; i++) 
 {
   if (k == correct_column)
   { list_tds[n] = elems[i]; n = n + 1;   }
   k = k + 1;
   if (k == number_columns) { k = 0; }
 }

//the list for rows, true means the data is duplicate, false means that data is not duplicate
var display_row = [];
var index_display_row = 1;

//check the first row
display_row[index_display_row] = compare_three_values(list_tds[0].innerHTML, list_tds[1].innerHTML,"");
index_display_row = index_display_row + 1;

//check other rows
for (var i=1, m=list_tds.length-1; i<m; i++) 
{
   display_row[index_display_row] = compare_three_values(list_tds[i].innerHTML,list_tds[i-1].innerHTML,list_tds[i+1].innerHTML);
index_display_row = index_display_row + 1;
 }

//check the last row
var i = list_tds.length-1;
display_row[index_display_row] = compare_three_values(list_tds[i].innerHTML,list_tds[i-1].innerHTML,"");
index_display_row = index_display_row + 1;

//create the list of elements Row
var list_rows = document.getElementsByTagName("tr");

//if the data is not duplicate, to hide the row
for (var i=1, m=index_display_row; i<=m; i++) 
{
if (!display_row[i]) { list_rows[i].style.display='none'; }
}

</script>

This script hides all not duplicated records.

Please note, that the script works only if:

- the Tabular Report DataPage is used;

- the table is sorted by the correspond field (you can select your field in the Default Sort order field and deselect "Interactive Sorting" on the Search Results Options step).

 
Please, enter your values instead 0 and 2 in the following lines:
//insert the number of columns
var number_columns = 2;

//insert the order of the column with address, the first column is 0, the second is 1...
var correct_column = 0;
 

I hope it helps.

Link to comment
Share on other sites

  • 0

Hi - If you want to hide duplicate records/rows in the Results Page, you can use this workaround:

1. Insert a Calculated Field for the formula to check if the record is a duplicate or has multiple records.

ROW_NUMBER() OVER (PARTITION BY [@FIELDNAME] ORDER BY [@FIELDNAME] DESC)

2. Insert an HTML Block to hide the rows that will be greater than 1. The solution comes from this post: 

<div id="hide-this-[@cbRecordIndex]"></div>

<script>
var isi = document.getElementById("hide-this-[@cbRecordIndex]");
if('[@calcfield:1]' > 1){
 isi.parentNode.parentNode.style.display = 'none';
}
</script>

3. Hide the column Calculated Field. The solution comes from this post: 

 

Link to comment
Share on other sites

  • 0

Hello! If you want to highlight duplicate records/column in the Results Page, you can use this workaround:

1. Insert a Calculated Field that will check the record (specific field) if it has duplicates or not. The formula would be:

CASE WHEN (SELECT COUNT(FIELD_NAME) FROM TABLE_NAME WHERE FIELD_NAME=target.[@field:FIELD_NAME]) > 1
THEN 'Has Duplicate'
ELSE
'No Duplicate'
END

This formula uses a CASE WHEN statement that will determine if the COUNT of the field is more than 1. More than 1 means that the record has duplicates.

References:
- https://howto.caspio.com/function-reference/#:~:text=Return-,Case,-Evaluates a list
- https://howto.caspio.com/datapages/reports/advanced-reporting/calculations-in-forms-and-reports/

2. Insert an HTML Block that will highlight the column. Insert this code:

<div id="visi[@cbRecordIndex#]"></div>

<script>

var isi = document.getElementById("visi[@cbRecordIndex#]");

if('[@calcfield:1]' == 'Has Duplicate'){
isi.parentNode.parentNode.querySelector("td:nth-child(4)").style.cssText = 'background: yellow';
}

</script>

Make sure to disable the HTML Editor on the Advanced tab before pasting the code: https://howto.caspio.com/datapages/datapage-components/disabling-html-editor-in-datapage-header-footer-and-html-blocks/

The code means that it will check the value of the Calculated Field and highlight the specific column 4 if it meets the criteria. You can change the number 4 based on the arrangement of columns. To know the number, you can select the column and then copy its selector. For more information: https://www.scrapingbee.com/tutorials/how-to-extract-css-selectors-using-chrome/#:~:text=The process is very simple,and click on Copy selector. 

3. Hide the column Calculated Field. The solution comes from this post: 

I hope this helps!

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Answer this question...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...