Skip to content

Database Notes

  • In relational databases, a relation is basically a table.
  • It consists of rows (tuples) and columns (attributes).

👉 Example: A STUDENT relation

student_idnameagemajor
1Alice20Biology
2Bob21Computer Science
3Carol22Math
  • The schema is the structure (definition) of a relation.
  • It describes what attributes (columns) the relation has and their data types.
  • Think of it as the blueprint of a table.

👉 Example:
STUDENT(student_id INT, name VARCHAR(50), age INT, major VARCHAR(50))

This schema says:

  • student_id �?integer
  • name �?string up to 50 characters
  • age �?integer
  • major �?string up to 50 characters
  • An instance is the actual content (data rows) in a table at a given moment.
  • The schema is fixed, but the instance can change over time (when you insert/update/delete rows).

👉 Example:
The current instance of STUDENT might be:

student_idnameagemajor
1Alice20Biology
2Bob21Computer Science

If tomorrow you insert Carol, the instance changes.

  • An attribute is a column of a relation.
  • Each attribute has a name and a domain (data type).

👉 Example:
In STUDENT(student_id, name, age, major)

  • student_id is an attribute (domain: integer)
  • name is an attribute (domain: string)
  • etc.
  • A tuple is a row (record) in a relation.
  • It represents one entry in the table.

👉 Example:
(1, 'Alice', 20, 'Biology') is a tuple in STUDENT.

DDL: definition

DML: manipulation

RDBMS

primary key (sid)

do not have to be one column

foreign key => primary key (by default)

you can practice these on Leetcode, or simply ask AI

FROM

WHERE

SELECT DISTINCT / AVG

ORDER BY DESC / ASC

LIMIT

GROUP BY

HAVING COUNT (*) > 2

SELECT seller, SUM(amount) AS total_sales
FROM sales
GROUP BY seller
HAVING SUM(amount) > 200;

FROM : identify table

WHRER : apply selections (eliminate rows)

SELECT : project away columns

GROUP BY : form groups & aggregate

HAVING : eliminate groups

DISTINCT : eliminate duplicates

A list of tables => combination of tuples => WHERE selection

self-join

SELECT x.name, x.age, y.sname AS sname2, y.age AS age2
FROM Sailors AS X, Sailors AS y
WHERE x.age > y.age

you can combine with arithmetic expressions

SELECT S.sname
FROM Sailors S
WHERE S.name ~ 'B.*'

AND / OR

UNION / INTERSECT / EXCEPT

{UNION, INTERSECT, EXCEPT} ALL

ANY

SELECT *
FROM Sailors S
WHERE S.rating > ANY
(SELECT S2.rating
FROM Sailors S2
WHERE S2.sname='Popeye')

IN => subquery NOT IN

EXISTS

correlated subquery

SELECT S.name
FROM Sailors S
WHERE EXISTS
(SELECT *
FROM RESERVES R
WHERE R.bid=102 AND S.sid=R.sid)

Find sailors who’ve reserved all boats. (sailors with no counterexample missing boats)

SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS
(SELECT B.bid
FROM Boats B
WHERE NOT EXISTS (SELECT R.bid
FROM Reserves R
WHERE R.bid=B.bid
AND R.sid=S.sid))

commonly used in optimization problems

-- just the max
SELECT MAX(S.rating)
FROM Sailors S;
-- illegal query
SELECT S.*, MAX(S.rating)
FROM Sailors S;

right implementations

SELECT *
FROM S.rating >= ALL
(SELECT s2.rating
FROM Sailors S2)
-- single return, not entirely deterministic
SELECT *
FROM Sailors S
ORDER BY rating DESC
LIMIT 1;

INNER JOIN

SELECT s.*, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND ...
SELECT s.*, r.bid
FROM Sailors s INNER JOIN Reserves r
ON s.sid = r.sid
WHERE ...

others

-- INNER is default
SELECT <column expression list>
FROM table name
[INNER | NATURAL
| {LEFT | RIGHT | FULL} {OUTER}] JOIN
table name
ON <qualification_list>
WHERE ...

generally speaking, natural join is not a good a idea to use

create a virtual table

CREATE VIEW Redcount
...
SELECT * FROM Redcount;

view on the fly

SELECT bname, scount
FROM Boats2 B,
(SELECT B.bid, COUNT (*)
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid) AS Reds(bid, scount)
WHERE Reds.bid = B.bid
AND scount < 10

another “view on the fly” syntax, you can have many queries in WITH

WITH Reds(bid, scount) AS
(SELECT B.bid, COUNT (*)
FROM Boats2 B, Reserves2 R
WHERE R.bid = B. bid AND B.color = 'red'
GROUP BY B.bid),
UnpopularReds AS
SELECT bname, scount
FROM Boats2 B, Reds
WHERE Reds.bid = B.bid
AND scount < 10
SELECT * FROM UnpopularReds;

ARGMAX GROUP BY

for each age, we’re going to group the max rating

WITH maxratings(age, maxrating) AS
(SELECT age, MAX(rating)
FROM Sailors
GROUP BY age)
SELECT S.*
FROM Sailors S, maxratings m
WHERE S.age = m.age
AND S.rating = m.maxrating;

Rule: (x op NULL) evaluates to … NULL!

With Boolean Logic

NULL AND TRUE => NULL;
NULL AND FALSE => FALSE;
NULL OR TRUE => TRUE;
NULL OR FALSE => FALSE;

Aggregation: NULL will be ignored

SELECT count(*) FROM sailors; -- should consider NULL
SELECT count(rating) FROM sailors;
SELECT SUM(rating) FROM sailors;
SELECT AVG(rating) FROM sailors;