Joins as Matrices

Warning: if you're unfamiliar with SQL Joins, go have a look at the Venn Diagram explanation here.

We'll start with the last example, Cartesian Joins. Recall the definition of a Cartesian Product:

[math]X\times Y = \{\,(x,y)\mid x\in X \ \text{and} \ y\in Y\,\}.[/math]

In other words, it's a set of every element from the first set paired with every element of the second set. It's fairly easy to transform this into a matrix. Supposing you have two column vectors A and B, which are the sets X and Y, do a standard matrix multiplication of A and B transposed. For a 3x3 matrix:

[math]\begin{bmatrix}a \\ b \\ c\end{bmatrix} * {\begin{bmatrix}d \\ e \\ f\end{bmatrix}}^T = \begin{bmatrix}ad & ae & af \\ bd & be & bf \\ cd & ce & cf\end{bmatrix}[/math]

So using the table data from Atwood's post, a cross join / Cartesian product looks like this:


select * from tableA cross join tableB;

+----+-----------+----+-------------+

| id | name      | id | name        |

+----+-----------+----+-------------+

|  1 | Pirate    |  1 | Rutabaga    |

|  2 | Monkey    |  1 | Rutabaga    |

|  3 | Ninja     |  1 | Rutabaga    |

|  4 | Spaghetti |  1 | Rutabaga    |

|  1 | Pirate    |  2 | Pirate      |

|  2 | Monkey    |  2 | Pirate      |

|  3 | Ninja     |  2 | Pirate      |

|  4 | Spaghetti |  2 | Pirate      |

|  1 | Pirate    |  3 | Darth Vader |

|  2 | Monkey    |  3 | Darth Vader |

|  3 | Ninja     |  3 | Darth Vader |

|  4 | Spaghetti |  3 | Darth Vader |

|  1 | Pirate    |  4 | Ninja       |

|  2 | Monkey    |  4 | Ninja       |

|  3 | Ninja     |  4 | Ninja       |

|  4 | Spaghetti |  4 | Ninja       |

+----+-----------+----+-------------+

Note: the cross join here can be rewritten as select * from tableA, tableB;

which can be turned into a Matrix with each cell containing the tuple (idA, nameA, idB, nameB):

[math]M = \begin{bmatrix}(1, Pirate, 1, Rutabaga) & (1, Pirate, 2, Pirate) & (1, Pirate, 3, Darth Vader) & (1, Pirate, 4, Ninja) \\
(2, Monkey, 1, Rutabaga) & (2, Monkey, 2, Pirate) & (2, Monkey, 3, Darth Vader) & (2, Monkey, 4, Ninja) \\
(3, Ninja, 1, Rutabaga) & (3, Ninja, 2, Pirate) & (3, Ninja, 3, Darth Vader) & (3, Ninja, 4, Ninja) \\
(4, Spaghetti, 1, Rutabaga) & (4, Spaghetti, 2, Pirate) & (4, Spaghetti, 3, Darth Vader) & (4, Spaghetti, 4, Ninja)\end{bmatrix}[/math]

Phew!

The advantages of constructing this matrix is that it makes it very easy to see what "cells" (aka records) are selected based on the join condition. The entire set of records is taken on no joins, if we stipulate a join condition that "nameA = nameB" then only two cells are returned: $$M(0,1)$$ and $$M(2,3)$$ (the matching Pirates and Ninjas) (where $$M(i,j)$$ returns the cell in the ith row and jth column).

It's also easy to see how to construct the result set from "weird" join conditions; that is, join conditions that aren't strict equality checks on keys that can take advantage of DB indexing magic. As an example, consider joining when the distance between idA and idB is more than or equal to 2. That is, select * from tableA, tableB where abs(tableA.id - tableB.id) >= 2. (Alternatively cross join on abs....) The grid should light up like this, where black cells are matching cells:

(Kind of pretty, no?)

The main disadvantage to this matrix approach is when we have big tables with lots of data. Computers have finite memory; good luck dealing with millions by millions size matrices! There is a very clever solution to this problem, however, which can make use of the MapReduce framework. It's called the 1-bucket theta algorithm! (And its cousins, M-bucket output/input.) I'll write about that when I understand it better and have a simple demo program or at least some cool numbers.

Posted on 2011-10-25 by Jach

Tags: databases, programming

Permalink: https://www.thejach.com/view/id/217

Trackback URL: https://www.thejach.com/view/2011/10/joins_as_matrices

Back to the top

Back to the first comment

Comment using the form below

TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)

Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Joins as Matrices

Posted on 2011-10-25 by Jach

Archives

Selected Posts

Recent Posts

Recent Comments

Better Websites

Email:
Password:
Remember Me

TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)

Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Joins as Matrices

Posted on 2011-10-25 by Jach

Archives

Selected Posts

Recent Posts

Recent Comments

Tag Cloud

aliens

altruism

Anarchy

anime

anti-anarchy

artificial intelligence

assembly

atheism

awesome

bash

basics

bayes

books

business

c

c++

cars

circuit analysis

clojure

cloning

cognition

college

comics

computer engineering

couchdb

cryonics

curl

daily life

databases

debate

demo

design

disaster

economics

evolution

existentialism

FFP Machine

fiction

flex

fodder

food

forth

FPGA

free will

french

furcadia

future

games

git

government

grammar

hacking

HDL

hiring

history

immigration

intellectual property

italy

japan

java

jMonkeyEngine

language

learning

lisp

LucidDB

management

math

medicine

memo