SQL Queries

In my work, I often need to perform certain data-engineering tasks frequently. One such frequent task is to create one-off database queries( aka sql queries) for data migration purposes. The process is often like this.

  1. Oh, I need to figure out how to write this query as part of my data migration script due to this new feature I need to add.
  2. Google and find the right StackOverflow answer.
  3. Copy, paste, and modify till it works.
  4. Promptly forget how it’s done.
  5. When I need the query again, repeat step 1-3.

I want to break the cycle of Google-StackOverflow-Copy-Paste-Forget. Looking ahead, I also expect to talk in greater depth about my work involving data engineering work. Hence, I want to fulfil both functions with this page.

For the time being, I will simply append the queries to this page as and when I encounter the need for one-off sql queries during my work.

When the list gets too long, I may reorganise this page. For now, my cycle is Google => StackOverflow => Copy-Paste => Modify => Append-To-This-Page.

Delete Duplicate Rows

This has its own page. I have yet to figure out whether to have separate pages for individual data engineering use cases. I’m experimenting with having some of the queries be in their own pages. And this is one of them.

*New* Select First Matching Row For All Variations

Another query with its own page.

Get the Auto-increment Identity and Set it to an Arbitrary Value

Situation: You have a table using autoincrement for its id primary key field. You want to know what’s the next value that comes for the id. Or you want to reset it to an arbitrary value. Sometimes, the auto-increment is also known as identity.

Note: if you want to set the current auto-increment to the max id + 1, see the next section.


Postgres uses SEQUENCE as a way to track the autoincrement for the primary key field. The general form is <tablename>_<autoincrement_field>_seq.

To retrieve the next available value for the autoincrement in table1 and the field is id:

SELECT nextval('table1_id_seq');

To reset the next value for the same field and table to 1337:

SELECT setval('table1_id_seq', 1337, false);  

Note: you need to set the last argument to false, so the next value will be set arbitrarily to 1337. If it’s not included, it will take 1338 instead.


MySQL uses AUTOINCREMENTinside INFORMATION_SCHEMA.TABLES to track the autoincrement.

To retrieve the next available value for the autoincrement in table1 and the database is database_name:

WHERE TABLE_SCHEMA = 'database_name'
AND   TABLE_NAME   = 'table1';

To reset the next value for the same table to 1337:


Reset the Auto-increment to Max ID + 1

Situation: The table has auto-increment for the id field which uses int. You want to reset the auto-increment counter so that it carries on from the maximum id field that’s already being used.


The issue here is that you need to use stored procedures. So there are basically two steps.

First, create the stored procedure. Second, run the stored procedure.

Step 1: Create Stored Procedure


    CREATE PROCEDURE reset_autoincrement(IN tablename varchar(200))


     SET @get_next_inc = CONCAT('SELECT @next_inc := max(id) + 1 FROM ',tablename,';');

     PREPARE stmt FROM @get_next_inc; 

     EXECUTE stmt; 

     SELECT @next_inc AS result;


     SET @alter_statement = CONCAT('ALTER TABLE ', tablename, ' AUTO_INCREMENT = ', @next_inc, ';');

      PREPARE stmt FROM @alter_statement;

      EXECUTE stmt;


    END //

Before you run the stored procedure, you shouldn’t be able to find it.
No such stored procedure when search for `reset`
When I run, I simply select the stored procedure and press Run Selection.
Running the Stored Procedure
Note: the delimiters part are crucial.
After I run, I should be able to find the stored procedure.
Now the stored procedure exists
Note: If you want to change the stored procedure, you need to delete the stored procedure, then change it and run again.

Step 2: Call the stored procedure

This time you can simply use normal MySQL queries to call the stored procedure.
call reset_autoincrement('products');


The table is productsand the autoincrement field is id.

SELECT setval('products_id_seq', (SELECT max(id) from products), true);  

Note: you need to set the last argument to true, so the next value will be max(id) + 1.

Insert Into on Duplicate Update with Select

Situation: You want to insert a bunch of records into a table using data from other tables. At the same time, when there’s duplication in terms identity values, you want to use update instead of insert.


INSERT INTO table1 (val1, val2) SELECT val_a, val_b FROM table2 ON DUPLICATE KEY UPDATE val1=table2.val_a, val2=table2.val_b;


Will add this later.

Insert Into on Duplicate Update with Group By Clause

Situation: Similar situation as the case above. You want to insert a bunch of records into a table using data from other tables. At the same time, when there’s duplication in terms identity values, you want to use update instead of insert. The tricky part is that the inserted is an aggregation of data from other tables. This typically requires Group By clause in the select subquery.


INSERT INTO table1(val1, val2) SELECT table2sum.val_a, table2sum.val_b FROM (SELECT SUM(val_a) as val_a, val_b from table2 group by table2.val_b) as table2sum ON DUPLICATE KEY UPDATE val1=table2sum.val_a, val2=table2sum.val_b;


Will add this later.

Update One Table with Data From Another Related Table

There are at least 3 variations of this type of query. That includes updating a parent table based on aggregated data in the child table. This now has its own page.

Update Two Tables That Are Related

Situation: You want to update the fields for a particular row in both the parent and child table.


UPDATE parent_table p
    child_table c
    p.id = c.parent_id
    p.parent_name = 'parent',
    c.child_name = 'child'
WHERE p.serial_number = '1234ST';


Postgres does not have a similar update-inner-join statement like MySQL.

You can do this in two ways. Do it in 2 separate queries. Or do it in 1 query. The 2 queries way is easier to understand.

2 queries

UPDATE parent_table AS p
SET parent_name = 'parent'
WHERE p.serial_number = '1234ST';

UPDATE child_table AS c
SET child_name = 'child'
FROM parent_table p
WHERE c.parent_id = p.id and p.serial_number = '1234ST';

1 query

The technique below is also known as the writeable common table query but honestly, I find it not as intuitive.

The advantage of this over the 2 queries method is that you only need to state the WHERE clause condition (WHERE serial_number = '1234ST') once.

WITH parent_update as (
  UPDATE parent_table
   SET parent_name = 'parent'
  WHERE serial_number = '1234ST'
  returning id, serial_number
UPDATE child_table 
  SET child_name = 'child'
WHERE (parent_id, serial_number) IN (select id, serial_number from parent_update);

Order by Length of Field

Situation: You want to see the records of a table (table1) ordered by the string length of a field (field1) which is a text or char field. You want to do it by descending order too.


SELECT * from table1 ORDER BY CHAR_LENGTH(field1) desc;


SELECT * from table1 ORDER BY LENGTH(field1) desc;


Situation: You want to truncate parent table but it has child table using foreign key reference on it. You want to cascade the truncate.





Return Different String Based on Boolean Field

Situation: You have a boolean field but instead of returning that, you want to display different strings based on the value.




SELECT (CASE WHEN column <> 0 THEN 'yes' ELSE 'no' END) As Value

Update Field With String Replacement

Situation: You have a string field and you want to use it to update another string field with replacements for certain characters such as empty space and hyphens.

E.g. I want to remove all empty spaces and hyphens before shortening the string to 6 characters long


# | is or operator
# g means global
# postgres uses 1-indexed
UPDATE table_a set shortname = SUBSTR(regexp_replace(name, '( |-)', '', 'g'), 1, 6);