What is considered a duplicate record?

An episode or visit is considered to be a unique combination of (Organisation ID), Clinic ID, Patient ID and Attendance Date within a Submission.  Where different records hold the same combination of these items, they are considered duplicate records.

It is a requirement that patient-level information capture systems use a common Clinic Identifier and Patient Identifier across their systems, so that patients and clinics can be collated across both GU and SRH records.

When are records de-duplicated?

The Transpose process will automatically de-duplicate data and consolidate all activity for a single episode or visit into one record within any file uploaded.

Where multiple files are uploaded, whether or not they are Transposed, the records are de-duplicated across all the files uploaded within the month of Submission for that organisation.

It is therefore possible to upload duplicate data into different organisations, or in different months of submission, without it being de-duplicated.  Although this might not be normal practice, this feature is in place so that providers can upload data for testing or analysis purposes.
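
The scope described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the field names (organisation_id, submission_month, clinic_id, patient_id, attendance_date) and the organisation code ORG1 are hypothetical, chosen only for illustration. It shows that records are treated as duplicates only when they share the same organisation and month of submission as well as the same Clinic ID, Patient ID and Attendance Date.

```python
from collections import defaultdict


def duplicate_groups(records):
    """Group records by the combination that identifies one episode or visit
    within one organisation's submission for one month."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["organisation_id"],   # hypothetical field name
            rec["submission_month"],  # hypothetical field name
            rec["clinic_id"],
            rec["patient_id"],
            rec["attendance_date"],
        )
        groups[key].append(rec)
    return groups


def make_record(organisation_id, submission_month):
    # Same clinic, patient and attendance date every time; only the
    # organisation and the month of submission vary.
    return {
        "organisation_id": organisation_id,
        "submission_month": submission_month,
        "clinic_id": "RBC01",
        "patient_id": "24681357",
        "attendance_date": "2017-04-01",
    }


records = [
    make_record("ORG1", "2017-04"),
    make_record("ORG1", "2017-04"),  # duplicate of the first record
    make_record("ORG1", "2017-05"),  # different month: not de-duplicated
]

groups = duplicate_groups(records)
duplicated = {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Here only the two records uploaded for the same organisation in the same month fall into one group; the third record, uploaded in a different month, is kept apart.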

How are records de-duplicated?

Records within an organisation's submission with the same Clinic ID, Patient ID and Attendance Date are collated.  Each field will be overwritten by the latest record to hold data for that field (a blank field will not overwrite a field that already contains some data).

For example, imagine three duplicate records, with the same Clinic ID, Patient ID and Attendance Date, but with the following populated data fields:

 

Is Duplicate  Clinic ID  Patient ID  Attendance Date  SHHAPT 1  SHHAPT 2  ...  SRH Care Activity 1
TRUE          RBC01      24681357    2017-04-01       T4        D2B
TRUE          RBC01      24681357    2017-04-01                                1
TRUE          RBC01      24681357    2017-04-01       T4        C4
FALSE         RBC01      24681357    2017-04-01       T4        C4             1

 

The T4 has been carried into the final record.  The D2B has been overwritten by the C4 from the third record.  The SRH Care Activity 1 value is also carried into the final record, even though the third record has no value for this field.
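
The collation rule in this example can be sketched in a few lines of Python. This is a minimal sketch and not the system's actual code: the function name merge_duplicates and the lower-case field names are hypothetical, and records are assumed to arrive in upload order, earliest first. The latest non-blank value wins for each field, and a blank never overwrites existing data.

```python
def merge_duplicates(records, key_fields=("clinic_id", "patient_id", "attendance_date")):
    """Collate records that share the same key into one record per key.

    Assumes `records` is ordered earliest first. A later non-blank value
    overwrites an earlier one; a blank value never overwrites existing data.
    """
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value not in ("", None):  # blanks are skipped
                target[field] = value
    return list(merged.values())


key = {"clinic_id": "RBC01", "patient_id": "24681357", "attendance_date": "2017-04-01"}
records = [
    {**key, "shhapt_1": "T4", "shhapt_2": "D2B", "srh_care_activity_1": ""},
    {**key, "shhapt_1": "", "shhapt_2": "", "srh_care_activity_1": "1"},
    {**key, "shhapt_1": "T4", "shhapt_2": "C4", "srh_care_activity_1": ""},
]

final = merge_duplicates(records)[0]
# final holds T4, C4 and SRH Care Activity 1 = "1", matching the final record above.
```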

The Accepted Data tab in the submission offers a data extract that identifies any duplicate records in a column headed Is Duplicate.  In the extract, the final record will be labelled Is Duplicate = FALSE and will contain the finally accepted data items for that record.  In the example above, Record 3 would be altered to contain the finally accepted record values (the last row shown above).

Record Count

Whilst de-duplication is running and files are being uploaded, the record count will adjust according to the duplicates found.

Caution!

If a GUMCAD file that contains a single column for activity data is not Transposed, the activity in each record will be overwritten by the next record.  The final record for each episode / visit will then contain only a single activity code: the last one recorded in the file for that patient episode / visit.  In this circumstance, providers will see a significant fall in charges.
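
The effect described in this caution can be shown with a small Python sketch. It is illustrative only: the function name last_non_blank_wins and the field name activity are hypothetical stand-ins for a single activity column. Because every record holds a value in the same column, each record overwrites the one before it and only the last code survives.

```python
def last_non_blank_wins(records, field):
    """Return the value left in `field` after collating records in order,
    where each non-blank value overwrites the previous one."""
    value = None
    for rec in records:
        if rec.get(field) not in ("", None):
            value = rec[field]
    return value


# Three rows for the same episode / visit in a file that was not Transposed,
# each carrying one activity code in the same single column.
rows = [
    {"clinic_id": "RBC01", "patient_id": "24681357", "attendance_date": "2017-04-01", "activity": "T4"},
    {"clinic_id": "RBC01", "patient_id": "24681357", "attendance_date": "2017-04-01", "activity": "D2B"},
    {"clinic_id": "RBC01", "patient_id": "24681357", "attendance_date": "2017-04-01", "activity": "C4"},
]

surviving = last_non_blank_wins(rows, "activity")
lost = [r["activity"] for r in rows if r["activity"] != surviving]
# Only the last code remains in the final record; the earlier codes are lost.
```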