SQL Joins with Sequelize

Posted by Alex on October 11, 2020

While working on an API for my day job last week, I needed to do a SQL Inner Join with Sequelize and Typescript in a web API. Here’s how I achieved it.

Environment:

Things move quickly in the JS ecosystem. Here are the library versions used in the subject API at the time of writing:

1
2
3
4
5
6
"dependencies": {
    "@types/express": "~4.0.39",
    "@types/sequelize": "^4.27.21",
    "express": "~4.16.2",
    "sequelize": "4.38.0",
  }

Context:

I have an identifier in one system that I need to correlate with an identifier in a second system, but I could only do that via an intermediate identifier, also in the second system. A good case for a SQL Join.

System 1 System 2 System 2
Electronic Identifier Animal Identifier Legacy Animal Identifier
ABCD 555 WXYZ 1234

In System 1, I have enough information to send this request:

1
2
3
4
5
6
7
{
    "items": [
        "ABC",
        "DEF",
        "GHI"
    ]
}

And in response I want:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
{
    "links": [
        {
            "rel": "self",
            "href": "https://api.com/123-i-am-a-uuid"
        }
    ],
    "reference": "123-i-am-a-uuid",
    "items": [
        {
            "legacyId": 123,
            "electronicId": "ABC"
        },
        {
            "legacyId": 456,
            "electronicId": "DEF"
        },
        {
            "legacyId": 789,
            "electronicId": "GHI"
        }
    ]
}

The initial approach was to make three repository queries to get the data I needed, and then go through three javascript map operations to splice the data into the above format. Using a series of javascript maps is inefficient, particularly as the data set grows.

A better way is to get SQL to do the grunt work, as it’s far more time and memory efficient than the javascript I would have needed to write.

As an aside, I’m not too fond of Javascript on the back end. I don’t feel it’s performant enough and as your codebase grows the lack of a robust type system hurts more and more. Typescript helps address the latter (assuming your libraries support it) but does nothing for the former. A typed language like C# or Java is preferable for web programming in a large enterprise due to better performance, more coherent ecosystem (see: left-pad) and robust type system.

I need to make a map of identifiers from an electronicIdEvent entity (that deals with electronic identifiers) and an animal entity that deals with animal information.

  electronicId animalId legacyAnimalId
electronicIdEvent  
animal  

I need the legacyAnimalId field. To find a legacyAnimalId given an electronicIdEvent, I have to:

  1. Find all electronicIdEvent rows that match an electronicId.
  2. Find all animal rows that match the animalId of the electronicIdEvent where an electronicId is present.
  3. Return all legacyAnimalId rows that match the animalId of both the animal and the electronicIdEvent.

Hopefully, you can see why using javascript maps, and SQL selects for this would be tedious and inefficient.

Here’s a SQL query that demonstrates what I want if you’re more familiar with SQL then Javascript:

1
2
3
4
select event.eid, event.animalId, animal.id, animal.animalKey
from "electronicIdEvent" event
    inner join "animal" animal
on event."animalId" = animal.id

For Sequelize to generate the right SQL, it needs to know about the relationships between your data.

Here’s the code that creates associations in Sequelize:

1
2
3
4
5
6
7
8
this.animal.hasMany(this.electronicIdEvent, {
  foreignKeyConstraint: true,
  foreignKey: "animalId",
});
this.electronicIdEvent.belongsTo(this.animal, {
  foreignKeyConstraint: true,
  foreignKey: "animalId",
});

Associations allow the ORM to understand your relations and map One To One, One To Many and Many To Many relationships to appropriate SQL code. The docs describe this here

2. Create a Sequelize repository method that uses this association to form a SQL query that returns the correct data:

There is some stuff in here that’s not great, but it works well enough. If I had more time, I’d figure out a way of using typescript’s not-null override (!), and ideally find a way of making event a typed response.

The pseudocode I want is:

  • First return all the rows for the electronicId and animalId columns in electronicIdEvent, where the electronicId is present in my electronicIdList
  • Next join the rows for those columns with the rows for the legacyAnimalId column, where the animalId is the same in both the electronicIdEvent and animal entities.
  • When the data comes back, drop the animalId column. I don’t need it once it has done the job of being the glue between the other entities and columns.

Or, as Javascript:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// imports omitted for brevity

export class AnimalRepository {
  constructor(
    private db: AnimalTimelineDbContext,
  ) {}

getLegacyAnimalIdentifierByElectronicIdentifier(electronicIdList: Array<string>): Promise<Array<IdentifierMap>> {
    return this.db.electronicIdEvent
      .findAll({
        attributes: ["electronicId", "animalId"],
        where: {
          electronicId: electronicIdList,
        },
        include: [ 
          {
            model: this.db.animal,
            required: true,
            attributes: ["legacyAnimalId"],
          },
        ],
      })
      .then((events) =>
        events.map((event) => ({
          eid: event.electronicId,
          legacyAnimalId: event.Animal!.legacyAnimalId,
        }))
      );
  }
}

Note: Setting required: true in the include block forces the query to only return records which have an associated model, this converts the query from an OUTER JOIN to an INNER JOIN.

Because we only return complete rows (no partials) event.Animal cannot be null, I can override the typescript hint, i.e. event.Animal!.legacyAnimalId.

I’m done at this point. We’ve submitted a list of electronic identifiers and gotten back an Array<IdentifierMap>, which is syntactic sugar for Array<[string, integer]>.

3. Convert the raw data into my map object, and fire the response back.

Because Sequelize is doing most of the grunt work in formatting the data, all I need in my web layer is to add the UUID that the user-submitted and save that map. Once I’ve saved the map and returned the reference to the calling system, the calling system can ask for that map and get the data (this is the most RESTful way of handling the data exchange and I’m omitting that setup for a follow-up post).