Convert SQLite JSON columns to JSONB#1224
Conversation
d958f6b to
0df950c
Compare
Alright, so this one's been some cleanup that I've been intending for a while. Previously, sqlc made using JSONB in SQLite completely unusable because it sent back the binary format which you're explicitly not supposed to parse yourself. I ended up fixing this like a year ago in [1], but getting a sqlc release out the door took so long that I ended up forgetting about it. Here, modify our SQLite definitions so that JSON becomes JSONB, and add migration to convert existing installation values to the same. Luckily, this doesn't require a table rewrite because both JSON and JSONB are SQLite BLOB types. I didn't want to add a migration version just for SQLite, so to keep everything in sync I also added a version 7 for Postgres that's just a no-op with a comment. [1] sqlc-dev/sqlc#3968
0df950c to
bfe5ba0
Compare
bgentry
left a comment
There was a problem hiding this comment.
LGTM aside from the broader concern of no-op migrations for the vast majority of users.
| -- No-op migration to keep version numbers in sync across all drivers. | ||
| -- The SQLite driver uses this version for a JSONB conversion. |
There was a problem hiding this comment.
I hadn't considered this possibility before but tbh I don't love no-op migrations for the vast majority of users just to facilitate something in sqlite. We could consider pulling in some of these other items to make the v7 migration broadly useful everywhere? That would at least punt the issue for now. https://github.com/riverqueue/river/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22needs%20db%20migration%22
There was a problem hiding this comment.
Yeah, I figured we'd probably combined this with #1115 at the very least (which contains fixes for all three items you linked).
I don't love the idea of shipping a DB migration that isn't totally necessary, but it's made a little more complicated by the fact that the longer we go without the migration, the more people will have to migrate later, so there is some cost to not doing this.
The good news with #1224 and #1115 is that we may be able to ship the migration, but make a clear note saying it's not a critical migration for Postgres, so if people want to defer it, they can. If there's ever another migration later on, they could always run it to get the cleanups at that point in time.
A fairly long-standing issue with SQLite compared to Postgres is the lack of a listen/notify system because SQLite doesn't offer such a mechanism. This has forced SQLite to operate in poll-only mode which means there's a longer delay before it sees jobs that it can work. Honker [1] hit Hacker News a few weeks/months back. It implements a listen/notify-like mechanism, and when reading about how it's implemented, I realized it wouldn't be all that crazy to do something similar for River. Basically, it's poll based, and relies on polling a single table fairly actively. That's something that's not great in general, but we already do that for poll-only, except that it's multiple tables instead of just one (i.e. `river_job` + `river_leader`). Here, I propose that we implement listen/notify using a similar strategy by adding a `river_notification` in SQLite. Like listen/notify, this table has a listen/notify, along with a timestamp that we can use to track event creation and prune older rows. We add a SQLite `Listener` similar to the one for Postgres, but which polls `river_notification`, remembering the last ID that it so it can make sure to only fetch new notifications in each River process. A new maintenance service that runs only in SQLite takes care of pruning events in the table older than 5 minutes. A nice property of `river_notification` is that like listen/notify, it all works transactionally as well. New rows are hidden until the transaction that inserted them commits, keeping them invisible from the listener. After commit, the listener sees them and responds. We track a `lastID` in the listener for more efficient reads, which seems like it might be a problem due to out-of-order IDs, but for better or worse this is not possible in SQLite due its single-writer model. IDs always commit in order. You might think that this would create a table that's overly high volume because 1,000 job insertions would create 1,000 notifications, but luckily we've been deduplicating notifications through the use of a notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a long time now. Leadership notifications are also low volume, so this table should never have to be very big at any given time, which is nice. Requires a database migration, but that might dovetail fairly nicely with #1224 (convert SQLite json to jsonb), which also does. [1] https://github.com/russellromney/honker
A fairly long-standing issue with SQLite compared to Postgres is the lack of a listen/notify system because SQLite doesn't offer such a mechanism. This has forced SQLite to operate in poll-only mode which means there's a longer delay before it sees jobs that it can work. Honker [1] hit Hacker News a few weeks/months back. It implements a listen/notify-like mechanism, and when reading about how it's implemented, I realized it wouldn't be all that crazy to do something similar for River. Basically, it's poll based, and relies on polling a single table fairly actively. That's something that's not great in general, but we already do that for poll-only, except that it's multiple tables instead of just one (i.e. `river_job` + `river_leader`). Here, I propose that we implement listen/notify using a similar strategy by adding a `river_notification` in SQLite. Like listen/notify, this table has a listen/notify, along with a timestamp that we can use to track event creation and prune older rows. We add a SQLite `Listener` similar to the one for Postgres, but which polls `river_notification`, remembering the last ID that it so it can make sure to only fetch new notifications in each River process. A new maintenance service that runs only in SQLite takes care of pruning events in the table older than 5 minutes. A nice property of `river_notification` is that like listen/notify, it all works transactionally as well. New rows are hidden until the transaction that inserted them commits, keeping them invisible from the listener. After commit, the listener sees them and responds. We track a `lastID` in the listener for more efficient reads, which seems like it might be a problem due to out-of-order IDs, but for better or worse this is not possible in SQLite due its single-writer model. IDs always commit in order. You might think that this would create a table that's overly high volume because 1,000 job insertions would create 1,000 notifications, but luckily we've been deduplicating notifications through the use of a notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a long time now. Leadership notifications are also low volume, so this table should never have to be very big at any given time, which is nice. Requires a database migration, but that might dovetail fairly nicely with #1224 (convert SQLite json to jsonb), which also does. [1] https://github.com/russellromney/honker
A fairly long-standing issue with SQLite compared to Postgres is the lack of a listen/notify system because SQLite doesn't offer such a mechanism. This has forced SQLite to operate in poll-only mode which means there's a longer delay before it sees jobs that it can work. Honker [1] hit Hacker News a few weeks/months back. It implements a listen/notify-like mechanism, and when reading about how it's implemented, I realized it wouldn't be all that crazy to do something similar for River. Basically, it's poll based, and relies on polling a single table fairly actively. That's something that's not great in general, but we already do that for poll-only, except that it's multiple tables instead of just one (i.e. `river_job` + `river_leader`). Here, I propose that we implement listen/notify using a similar strategy by adding a `river_notification` in SQLite. Like listen/notify, this table has a listen/notify, along with a timestamp that we can use to track event creation and prune older rows. We add a SQLite `Listener` similar to the one for Postgres, but which polls `river_notification`, remembering the last ID that it so it can make sure to only fetch new notifications in each River process. A new maintenance service that runs only in SQLite takes care of pruning events in the table older than 5 minutes. A nice property of `river_notification` is that like listen/notify, it all works transactionally as well. New rows are hidden until the transaction that inserted them commits, keeping them invisible from the listener. After commit, the listener sees them and responds. We track a `lastID` in the listener for more efficient reads, which seems like it might be a problem due to out-of-order IDs, but for better or worse this is not possible in SQLite due its single-writer model. IDs always commit in order. You might think that this would create a table that's overly high volume because 1,000 job insertions would create 1,000 notifications, but luckily we've been deduplicating notifications through the use of a notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a long time now. Leadership notifications are also low volume, so this table should never have to be very big at any given time, which is nice. Requires a database migration, but that might dovetail fairly nicely with #1224 (convert SQLite json to jsonb), which also does. I've experimented with a few versions of this. This one also adds a `river_notification` to Postgres, though doesn't use it. The rationale here is (1) that the Postgres `river_notification` makes it easier to test the client as if it was SQLite but without having to bring a SQLite driver into the top-level project as a dependency, and (2) it may be a useful alternative to poll-only even in Postgres down the line, where something like a bouncer makes listen/notify difficult to use. [1] https://github.com/russellromney/honker
A fairly long-standing issue with SQLite compared to Postgres is the lack of a listen/notify system because SQLite doesn't offer such a mechanism. This has forced SQLite to operate in poll-only mode which means there's a longer delay before it sees jobs that it can work. Honker [1] hit Hacker News a few weeks/months back. It implements a listen/notify-like mechanism, and when reading about how it's implemented, I realized it wouldn't be all that crazy to do something similar for River. Basically, it's poll based, and relies on polling a single table fairly actively. That's something that's not great in general, but we already do that for poll-only, except that it's multiple tables instead of just one (i.e. `river_job` + `river_leader`). Here, I propose that we implement listen/notify using a similar strategy by adding a `river_notification` in SQLite. Like listen/notify, this table has a listen/notify, along with a timestamp that we can use to track event creation and prune older rows. We add a SQLite `Listener` similar to the one for Postgres, but which polls `river_notification`, remembering the last ID that it so it can make sure to only fetch new notifications in each River process. A new maintenance service that runs only in SQLite takes care of pruning events in the table older than 5 minutes. A nice property of `river_notification` is that like listen/notify, it all works transactionally as well. New rows are hidden until the transaction that inserted them commits, keeping them invisible from the listener. After commit, the listener sees them and responds. We track a `lastID` in the listener for more efficient reads, which seems like it might be a problem due to out-of-order IDs, but for better or worse this is not possible in SQLite due its single-writer model. IDs always commit in order. You might think that this would create a table that's overly high volume because 1,000 job insertions would create 1,000 notifications, but luckily we've been deduplicating notifications through the use of a notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a long time now. Leadership notifications are also low volume, so this table should never have to be very big at any given time, which is nice. Requires a database migration, but that might dovetail fairly nicely with #1224 (convert SQLite json to jsonb), which also does. I've experimented with a few versions of this. This one also adds a `river_notification` to Postgres, though doesn't use it. The rationale here is (1) that the Postgres `river_notification` makes it easier to test the client as if it was SQLite but without having to bring a SQLite driver into the top-level project as a dependency, and (2) it may be a useful alternative to poll-only even in Postgres down the line, where something like a bouncer makes listen/notify difficult to use. [1] https://github.com/russellromney/honker
Alright, so this one's been some cleanup that I've been intending for a
while. Previously, sqlc made using JSONB in SQLite completely unusable
because it sent back the binary format which you're explicitly not
supposed to parse yourself. I ended up fixing this like a year ago in
[1], but getting a sqlc release out the door took so long that I ended
up forgetting about it.
Here, modify our SQLite definitions so that JSON becomes JSONB, and add
migration to convert existing installation values to the same. Luckily,
this doesn't require a table rewrite because both JSON and JSONB are
SQLite BLOB types.
I didn't want to add a migration version just for SQLite, so to keep
everything in sync I also added a version 7 for Postgres that's just a
no-op with a comment.
[1] sqlc-dev/sqlc#3968