In my work, I often need to perform certain data-engineering tasks frequently. One such frequent task is to create one-off database queries( aka sql queries) for data migration purposes. The process is often like this.
- Oh, I need to figure out how to write this query as part of my data migration script due to this new feature I need to add.
- Google and find the right StackOverflow answer.
- Copy, paste, and modify till it works.
- Promptly forget how it’s done.
- When I need the query again, repeat step 1-3.
I want to break the cycle of Google-StackOverflow-Copy-Paste-Forget. Looking ahead, I also expect to talk in greater depth about my work involving data engineering work. Hence, I want to fulfil both functions with this page.
For the time being, I will simply append the queries to this page as and when I encounter the need for one-off sql queries during my work.
When the list gets too long, I may reorganise this page. For now, my cycle is now Google-StackOverflow-Copy-Paste-Append-To-This-Page.
Reset the Auto-increment to Max ID + 1
Situation: The table has auto-increment for the id field which uses int. You want to reset the auto-increment counter so that it carries on from the maximum id field that’s already being used.
The issue here is that you need to use stored procedures. So there are basically two steps.
First, create the stored procedure. Second, run the stored procedure.
Step 1: Create Stored Procedure
DELIMITER // CREATE PROCEDURE reset_autoincrement(IN tablename varchar(200)) BEGIN SET @get_next_inc = CONCAT('SELECT @next_inc := max(id) + 1 FROM ',tablename,';'); PREPARE stmt FROM @get_next_inc; EXECUTE stmt; SELECT @next_inc AS result; DEALLOCATE PREPARE stmt; SET @alter_statement = CONCAT('ALTER TABLE ', tablename, ' AUTO_INCREMENT = ', @next_inc, ';'); PREPARE stmt FROM @alter_statement; EXECUTE stmt; DEALLOCATE PREPARE stmt; END // DELIMITER ;
Step 2: Call the stored procedure
-- To be determined
Insert Into on Duplicate Update with Select
Situation: You want to insert a bunch of records into a table using data from other tables. At the same time, when there’s duplication in terms identity values, you want to use update instead of insert.
INSERT INTO table1 (val1, val2) SELECT val_a, val_b FROM table2 ON DUPLICATE KEY UPDATE val1=table2.val_a, val2=table2.val_b;
Will add this later.
Insert Into on Duplicate Update with Group By Clause
Situation: Similar situation as the case above. You want to insert a bunch of records into a table using data from other tables. At the same time, when there’s duplication in terms identity values, you want to use update instead of insert. The tricky part is that the inserted is an aggregation of data from other tables. This typically requires Group By clause in the select subquery.
INSERT INTO table1(val1, val2) SELECT table2sum.val_a, table2sum.val_b FROM (SELECT SUM(val_a) as val_a, val_b from table2 group by table2.val_b) as table2sum ON DUPLICATE KEY UPDATE val1=table2sum.val_a, val2=table2sum.val_b;
Will add this later.
Update with Select SUM from another table
Situation: You have a table that is the parent of another child table. You want to sum a field in the child table and update the value to the corresponding parent record in the parent table.
UPDATE parent_table pt INNER JOIN ( SELECT pt_id, COALESCE(SUM(field1),0) as field1_sum FROM child_table GROUP BY pt_id ) i ON pt.id = i.pt_id SET pt.total_field1 = i.field1_sum [WHERE predicate];
UPDATE parent_table pt SET pt.total_field1 = i.field1_sum FROM ( SELECT pt_id, COALESCE(SUM(field1),0) as field1_sum FROM child_table GROUP BY pt_id ) i WHERE i.pt_id = pt.id [and more WHERE predicate];
Update One Table with Data From Another Related Table
Situation: You want to update the fields for a particular row in the child table with data from the parent table.
UPDATE parent_table p INNER JOIN child_table c ON p.id = c.parent_id SET c.name = concat(p.name, ' > Child' WHERE c.parent_id = 123;
UPDATE child_table AS c SET name = concat(p.name, ' > Child') FROM parent AS p WHERE c.parent_id= p.id and c.parent_id = 123;
Update Two Tables That Are Related
Situation: You want to update the fields for a particular row in both the parent and child table.
UPDATE parent_table p INNER JOIN child_table c ON p.id = c.parent_id SET p.parent_name = 'parent', c.child_name = 'child' WHERE p.serial_number = '1234ST';
Postgres does not have a similar
update-inner-join statement like MySQL.
You can do this in two ways. Do it in 2 separate queries. Or do it in 1 query. The 2 queries way is easier to understand.
UPDATE parent_table AS p SET parent_name = 'parent' WHERE p.serial_number = '1234ST'; UPDATE child_table AS c SET child_name = 'child' FROM parent_table p WHERE c.parent_id = p.id and p.serial_number = '1234ST';
The technique below is also known as the writeable common table query but honestly, I find it not as intuitive.
The advantage of this over the 2 queries method is that you only need to state the WHERE clause condition (
WHERE serial_number = '1234ST') once.
WITH parent_update as ( UPDATE parent_table SET parent_name = 'parent' WHERE serial_number = '1234ST' returning id, serial_number ) UPDATE child_table SET child_name = 'child' WHERE (parent_id, serial_number) IN (select id, serial_number from parent_update);
Order by Length of Field
Situation: You want to see the records of a table (table1) ordered by the string length of a field (field1) which is a text or char field. You want to do it by descending order too.
SELECT * from table1 ORDER BY CHAR_LENGTH(field1) desc;
SELECT * from table1 ORDER BY LENGTH(field1) desc;
TRUNCATE and CASCADE
Situation: You want to truncate parent table but it has child table using foreign key reference on it. You want to cascade the truncate.
TRUNCATE TABLE parent CASCADE;