Skip to content

Convert SQLite JSON columns to JSONB#1224

Open
brandur wants to merge 1 commit into
masterfrom
brandur-sqlite-jsonb-2
Open

Convert SQLite JSON columns to JSONB#1224
brandur wants to merge 1 commit into
masterfrom
brandur-sqlite-jsonb-2

Conversation

@brandur
Copy link
Copy Markdown
Contributor

@brandur brandur commented Apr 21, 2026

Alright, so this one's been some cleanup that I've been intending for a
while. Previously, sqlc made using JSONB in SQLite completely unusable
because it sent back the binary format which you're explicitly not
supposed to parse yourself. I ended up fixing this like a year ago in
[1], but getting a sqlc release out the door took so long that I ended
up forgetting about it.

Here, modify our SQLite definitions so that JSON becomes JSONB, and add
migration to convert existing installation values to the same. Luckily,
this doesn't require a table rewrite because both JSON and JSONB are
SQLite BLOB types.

I didn't want to add a migration version just for SQLite, so to keep
everything in sync I also added a version 7 for Postgres that's just a
no-op with a comment.

[1] sqlc-dev/sqlc#3968

@brandur brandur force-pushed the brandur-sqlite-jsonb-2 branch from d958f6b to 0df950c Compare April 21, 2026 22:50
Alright, so this one's been some cleanup that I've been intending for a
while. Previously, sqlc made using JSONB in SQLite completely unusable
because it sent back the binary format which you're explicitly not
supposed to parse yourself. I ended up fixing this like a year ago in
[1], but getting a sqlc release out the door took so long that I ended
up forgetting about it.

Here, modify our SQLite definitions so that JSON becomes JSONB, and add
migration to convert existing installation values to the same. Luckily,
this doesn't require a table rewrite because both JSON and JSONB are
SQLite BLOB types.

I didn't want to add a migration version just for SQLite, so to keep
everything in sync I also added a version 7 for Postgres that's just a
no-op with a comment.

[1] sqlc-dev/sqlc#3968
@brandur brandur force-pushed the brandur-sqlite-jsonb-2 branch from 0df950c to bfe5ba0 Compare April 21, 2026 22:56
@brandur
Copy link
Copy Markdown
Contributor Author

brandur commented Apr 21, 2026

@bgentry Thoughts on this? I'm kinda wondering if we should try to bundle it up with #1115 (database cleanup) so we don't waste any migration versions. 007 would still be optional for Postgres since the cleanup wouldn't be strictly necessary.

@brandur brandur requested a review from bgentry April 21, 2026 22:59
@brandur brandur added the needs db migration Fixing this issue would require a new database migration label May 9, 2026
Copy link
Copy Markdown
Contributor

@bgentry bgentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from the broader concern of no-op migrations for the vast majority of users.

Comment on lines +1 to +2
-- No-op migration to keep version numbers in sync across all drivers.
-- The SQLite driver uses this version for a JSONB conversion.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered this possibility before but tbh I don't love no-op migrations for the vast majority of users just to facilitate something in sqlite. We could consider pulling in some of these other items to make the v7 migration broadly useful everywhere? That would at least punt the issue for now. https://github.com/riverqueue/river/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22needs%20db%20migration%22

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured we'd probably combined this with #1115 at the very least (which contains fixes for all three items you linked).

I don't love the idea of shipping a DB migration that isn't totally necessary, but it's made a little more complicated by the fact that the longer we go without the migration, the more people will have to migrate later, so there is some cost to not doing this.

The good news with #1224 and #1115 is that we may be able to ship the migration, but make a clear note saying it's not a critical migration for Postgres, so if people want to defer it, they can. If there's ever another migration later on, they could always run it to get the cleanups at that point in time.

brandur added a commit that referenced this pull request Jun 5, 2026
A fairly long-standing issue with SQLite compared to Postgres is the
lack of a listen/notify system because SQLite doesn't offer such a
mechanism. This has forced SQLite to operate in poll-only mode which
means there's a longer delay before it sees jobs that it can work.

Honker [1] hit Hacker News a few weeks/months back. It implements a
listen/notify-like mechanism, and when reading about how it's
implemented, I realized it wouldn't be all that crazy to do something
similar for River. Basically, it's poll based, and relies on polling a
single table fairly actively. That's something that's not great in
general, but we already do that for poll-only, except that it's multiple
tables instead of just one (i.e. `river_job` + `river_leader`).

Here, I propose that we implement listen/notify using a similar strategy
by adding a `river_notification` in SQLite. Like listen/notify, this
table has a listen/notify, along with a timestamp that we can use to
track event creation and prune older rows. We add a SQLite `Listener`
similar to the one for Postgres, but which polls `river_notification`,
remembering the last ID that it so it can make sure to only fetch new
notifications in each River process.

A new maintenance service that runs only in SQLite takes care of pruning
events in the table older than 5 minutes.

A nice property of `river_notification` is that like listen/notify, it
all works transactionally as well. New rows are hidden until the
transaction that inserted them commits, keeping them invisible from the
listener. After commit, the listener sees them and responds. We track a
`lastID` in the listener for more efficient reads, which seems like it
might be a problem due to out-of-order IDs, but for better or worse this
is not possible in SQLite due its single-writer model. IDs always commit
in order.

You might think that this would create a table that's overly high volume
because 1,000 job insertions would create 1,000 notifications, but
luckily we've been deduplicating notifications through the use of a
notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a
long time now. Leadership notifications are also low volume, so this
table should never have to be very big at any given time, which is nice.

Requires a database migration, but that might dovetail fairly nicely
with #1224 (convert SQLite json to jsonb), which also does.

[1] https://github.com/russellromney/honker
brandur added a commit that referenced this pull request Jun 5, 2026
A fairly long-standing issue with SQLite compared to Postgres is the
lack of a listen/notify system because SQLite doesn't offer such a
mechanism. This has forced SQLite to operate in poll-only mode which
means there's a longer delay before it sees jobs that it can work.

Honker [1] hit Hacker News a few weeks/months back. It implements a
listen/notify-like mechanism, and when reading about how it's
implemented, I realized it wouldn't be all that crazy to do something
similar for River. Basically, it's poll based, and relies on polling a
single table fairly actively. That's something that's not great in
general, but we already do that for poll-only, except that it's multiple
tables instead of just one (i.e. `river_job` + `river_leader`).

Here, I propose that we implement listen/notify using a similar strategy
by adding a `river_notification` in SQLite. Like listen/notify, this
table has a listen/notify, along with a timestamp that we can use to
track event creation and prune older rows. We add a SQLite `Listener`
similar to the one for Postgres, but which polls `river_notification`,
remembering the last ID that it so it can make sure to only fetch new
notifications in each River process.

A new maintenance service that runs only in SQLite takes care of pruning
events in the table older than 5 minutes.

A nice property of `river_notification` is that like listen/notify, it
all works transactionally as well. New rows are hidden until the
transaction that inserted them commits, keeping them invisible from the
listener. After commit, the listener sees them and responds. We track a
`lastID` in the listener for more efficient reads, which seems like it
might be a problem due to out-of-order IDs, but for better or worse this
is not possible in SQLite due its single-writer model. IDs always commit
in order.

You might think that this would create a table that's overly high volume
because 1,000 job insertions would create 1,000 notifications, but
luckily we've been deduplicating notifications through the use of a
notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a
long time now. Leadership notifications are also low volume, so this
table should never have to be very big at any given time, which is nice.

Requires a database migration, but that might dovetail fairly nicely
with #1224 (convert SQLite json to jsonb), which also does.

[1] https://github.com/russellromney/honker
brandur added a commit that referenced this pull request Jun 5, 2026
A fairly long-standing issue with SQLite compared to Postgres is the
lack of a listen/notify system because SQLite doesn't offer such a
mechanism. This has forced SQLite to operate in poll-only mode which
means there's a longer delay before it sees jobs that it can work.

Honker [1] hit Hacker News a few weeks/months back. It implements a
listen/notify-like mechanism, and when reading about how it's
implemented, I realized it wouldn't be all that crazy to do something
similar for River. Basically, it's poll based, and relies on polling a
single table fairly actively. That's something that's not great in
general, but we already do that for poll-only, except that it's multiple
tables instead of just one (i.e. `river_job` + `river_leader`).

Here, I propose that we implement listen/notify using a similar strategy
by adding a `river_notification` in SQLite. Like listen/notify, this
table has a listen/notify, along with a timestamp that we can use to
track event creation and prune older rows. We add a SQLite `Listener`
similar to the one for Postgres, but which polls `river_notification`,
remembering the last ID that it so it can make sure to only fetch new
notifications in each River process.

A new maintenance service that runs only in SQLite takes care of pruning
events in the table older than 5 minutes.

A nice property of `river_notification` is that like listen/notify, it
all works transactionally as well. New rows are hidden until the
transaction that inserted them commits, keeping them invisible from the
listener. After commit, the listener sees them and responds. We track a
`lastID` in the listener for more efficient reads, which seems like it
might be a problem due to out-of-order IDs, but for better or worse this
is not possible in SQLite due its single-writer model. IDs always commit
in order.

You might think that this would create a table that's overly high volume
because 1,000 job insertions would create 1,000 notifications, but
luckily we've been deduplicating notifications through the use of a
notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a
long time now. Leadership notifications are also low volume, so this
table should never have to be very big at any given time, which is nice.

Requires a database migration, but that might dovetail fairly nicely
with #1224 (convert SQLite json to jsonb), which also does.

I've experimented with a few versions of this. This one also adds a
`river_notification` to Postgres, though doesn't use it. The rationale
here is (1) that the Postgres `river_notification` makes it easier to
test the client as if it was SQLite but without having to bring a SQLite
driver into the top-level project as a dependency, and (2) it may be a
useful alternative to poll-only even in Postgres down the line, where
something like a bouncer makes listen/notify difficult to use.

[1] https://github.com/russellromney/honker
brandur added a commit that referenced this pull request Jun 5, 2026
A fairly long-standing issue with SQLite compared to Postgres is the
lack of a listen/notify system because SQLite doesn't offer such a
mechanism. This has forced SQLite to operate in poll-only mode which
means there's a longer delay before it sees jobs that it can work.

Honker [1] hit Hacker News a few weeks/months back. It implements a
listen/notify-like mechanism, and when reading about how it's
implemented, I realized it wouldn't be all that crazy to do something
similar for River. Basically, it's poll based, and relies on polling a
single table fairly actively. That's something that's not great in
general, but we already do that for poll-only, except that it's multiple
tables instead of just one (i.e. `river_job` + `river_leader`).

Here, I propose that we implement listen/notify using a similar strategy
by adding a `river_notification` in SQLite. Like listen/notify, this
table has a listen/notify, along with a timestamp that we can use to
track event creation and prune older rows. We add a SQLite `Listener`
similar to the one for Postgres, but which polls `river_notification`,
remembering the last ID that it so it can make sure to only fetch new
notifications in each River process.

A new maintenance service that runs only in SQLite takes care of pruning
events in the table older than 5 minutes.

A nice property of `river_notification` is that like listen/notify, it
all works transactionally as well. New rows are hidden until the
transaction that inserted them commits, keeping them invisible from the
listener. After commit, the listener sees them and responds. We track a
`lastID` in the listener for more efficient reads, which seems like it
might be a problem due to out-of-order IDs, but for better or worse this
is not possible in SQLite due its single-writer model. IDs always commit
in order.

You might think that this would create a table that's overly high volume
because 1,000 job insertions would create 1,000 notifications, but
luckily we've been deduplicating notifications through the use of a
notify limiter (i.e. `insertNotifyLimiter.ShouldTrigger(queue)`) for a
long time now. Leadership notifications are also low volume, so this
table should never have to be very big at any given time, which is nice.

Requires a database migration, but that might dovetail fairly nicely
with #1224 (convert SQLite json to jsonb), which also does.

I've experimented with a few versions of this. This one also adds a
`river_notification` to Postgres, though doesn't use it. The rationale
here is (1) that the Postgres `river_notification` makes it easier to
test the client as if it was SQLite but without having to bring a SQLite
driver into the top-level project as a dependency, and (2) it may be a
useful alternative to poll-only even in Postgres down the line, where
something like a bouncer makes listen/notify difficult to use.

[1] https://github.com/russellromney/honker
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs db migration Fixing this issue would require a new database migration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants