Removing duplicates 

To remove a duplicate, let's do some setup first with the following queries:

  1. Let's create a copy of the schools table with the following query: 
USE lahmansbaseballdb;
DROP TABLE IF EXISTS schools_copy;
CREATE TABLE schools_copy
SELECT *
FROM schools
WHERE 1=0;
  1. Now, let's insert some data into the schools_copy table with the following query: 
USE lahmansbaseballdb;
INSERT INTO schools_copy VALUES
('adelphi','Adelphi University','Garden City','NY','USA'),
('adelphi1','Adelphi University','Garden City','NY','USA'),
('akron','University of Akron','Akron','OH','USA'),
('alabama','University of Alabama','Tuscaloosa','AL','USA'),
('alabamast','Alabama State University','Montgomery','AL','USA');
  1. Now that we have a table set up so that we can remove duplicates, execute the following query: 
USE lahmansbaseballdb;
SELECT name_full
FROM schools_copy
GROUP BY name_full
HAVING count(*) >= 2;

The preceding query gives you the results shown in the following screenshot: 

Bethel College is a possible duplicate. 

  1. If you execute the following query, you will see all the fields in the table to tell whether they are genuinely duplicates: 
USE lahmansbaseballdb;
SELECT * FROM schools_copy
WHERE name_full = 'Bethel College';

The preceding query gives you the results shown in the following screenshot: 

Here, we can see that Bethel College truly is a duplicate value, even though schoolID is different. It's just that the schoolID column can't have duplicates, so maybe someone inserted it with a different schoolID at some point, not realizing that it already existed with another schoolID. You can delete one of these rows based on schoolid, as shown in the following query:

USE lahmansbaseballdb;
DELETE FROM schools_copy
WHERE schoolid = 'inbethel';

Execute this query again: 

SELECT name_full
FROM schools_copy
GROUP BY name_full
HAVING count(*) >= 2;

You will see that you no longer have any duplicates in the school_copy table. 

Execute this query again: 

USE lahmansbaseballdb;
SELECT * FROM schools_copy
WHERE name_full = 'Bethel College';

The preceding query gives you the results shown in the following screenshot:

Now, you only have the one result for Bethel College since you removed the duplicate.

If you didn't have a unique ID for each row, then this will become much more difficult. This is why it's crucial to design tables so that you have a unique identifier for each row or a combination of values, or make a unique identifier enforced by a primary key. 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.177.14