[Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Attached is a patch implementing support for a WHERE clause in REFRESH
MATERIALIZED VIEW.
The syntax allows for targeted refreshes:
REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}');
REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42;
REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01';
I was inspired to implement this feature after watching the Hacking
Postgres discussion on the topic:
https://www.youtube.com/watch?v=6cZvHjDrmlQ
This allows the user to restrict the refresh operation to a subset of the
view. The qualification is applied to the view's output columns. The
optimizer can then push this condition down to the underlying base tables,
avoiding a full scan when only a known subset of data has changed.
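To make the push-down concrete, here is a sketch using the benchmark's schema (`invoices`, `invoice_lines`, `invoice_summary`; column names are illustrative):

```sql
-- An aggregated MV over the base tables
CREATE MATERIALIZED VIEW invoice_summary AS
SELECT i.invoice_id, sum(l.amount) AS total
FROM invoices i
JOIN invoice_lines l USING (invoice_id)
GROUP BY i.invoice_id;

-- Refresh only the affected invoices; the predicate on the MV's
-- output column can be pushed below the aggregate into the scans
-- of invoices and invoice_lines, avoiding a full recomputation.
REFRESH MATERIALIZED VIEW invoice_summary
WHERE invoice_id = ANY ('{1,2,3}');
```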
Implementation notes:
1. The grammar accepts an optional WHERE clause. We forbid volatile
functions in the clause to ensure correctness.
2. Non-Concurrent Partial Refresh: When `CONCURRENTLY` is not specified,
the operation performs an in-place modification using a `ROW EXCLUSIVE`
lock.
* This mode requires a unique index to ensure constraint violations
are handled correctly (e.g., when a row's values change such that it
"drifts" into or out of the `WHERE` clause scope).
* It executes a Prune + Upsert strategy:
* `DELETE` all rows in the materialized view that match the `WHERE`
clause.
* `INSERT` the new data from the source query.
* It uses `ON CONFLICT DO UPDATE` during the insert phase to handle
concurrency edge cases, ensuring the refresh is robust against constraint
violations.
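For illustration, the non-concurrent path behaves roughly like the following hand-written SQL (a sketch of the effect, not the literal generated statements; assumes a unique index on `invoice_id`):

```sql
BEGIN;

-- Prune: remove existing MV rows in scope of the predicate
DELETE FROM invoice_summary
WHERE invoice_id = ANY ('{1,2,3}');

-- Upsert: re-insert fresh data from the view's defining query,
-- restricted to the same predicate; ON CONFLICT covers rows that
-- reappear concurrently during the gap
INSERT INTO invoice_summary (invoice_id, total)
SELECT invoice_id, sum(amount)
FROM invoices JOIN invoice_lines USING (invoice_id)
WHERE invoice_id = ANY ('{1,2,3}')
GROUP BY invoice_id
ON CONFLICT (invoice_id) DO UPDATE SET total = EXCLUDED.total;

COMMIT;
```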
3. Concurrent Partial Refresh: When `CONCURRENTLY` is specified, it uses
the existing diff/merge infrastructure (`refresh_by_match_merge`), limiting
the scope of the diff (and the temporary table population) to the rows
matching the predicate. This requires an `EXCLUSIVE` lock and a unique
index, consistent with existing concurrent refresh behavior. It is much
slower than the non-concurrent partial refresh.
4. The execution logic uses SPI to inject the predicate into the source
queries during execution.
I have attached a benchmark suite to validate performance and correctness:
* `setup.sql`: Creates a schema `mv_benchmark` modeling an invoicing
system (`invoices` and `invoice_lines`). It includes an aggregated
materialized view (`invoice_summary`) and a control table
(`invoice_summary_table`).
* `workload_*.sql`: pgbench scripts simulating a high-churn environment
(45% inserts, 10% updates, 45% deletes) to maintain roughly stable dataset
sizes while generating significant refresh work.
* `run_benchmark_comprehensive.sh`: Orchestrates the benchmark across
multiple scale factors and concurrency levels.
The benchmark compares strategies for keeping a summary up to date (vs
baseline):
* Partial Refresh: Triggers on the base table collect modified IDs and
execute `REFRESH MATERIALIZED VIEW ... WHERE ...`.
* Materialized Table (Control): A standard table maintained via complex
PL/pgSQL triggers (the traditional manual workaround).
* Full Refresh (Legacy): Manually refresh the view after changes.
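The partial-refresh strategy amounts to something like the following (heavily simplified from the attached `setup.sql`, which I haven't inlined here; the real workload batches IDs rather than refreshing per row):

```sql
CREATE OR REPLACE FUNCTION refresh_changed_invoice() RETURNS trigger AS $$
DECLARE
  v_id bigint;
BEGIN
  -- Pick the affected invoice ID from whichever row image exists
  IF TG_OP = 'DELETE' THEN
    v_id := OLD.invoice_id;
  ELSE
    v_id := NEW.invoice_id;
  END IF;
  -- Targeted refresh of just that invoice's summary row
  EXECUTE format(
    'REFRESH MATERIALIZED VIEW invoice_summary WHERE invoice_id = %s',
    v_id);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_refresh
AFTER INSERT OR UPDATE OR DELETE ON invoices
FOR EACH ROW EXECUTE FUNCTION refresh_changed_invoice();
```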
Results are below:
Concurrency: 1 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 5309.05 | 0.002x 0.437x 0.470x
20000 50 | 1209.32 | 0.010x 0.600x 0.598x
20000 1000 | 56.05 | 0.164x 0.594x 0.576x
400000 1 | 5136.91 | 0x 0.450x 0.487x
400000 50 | 1709.17 | 0x 0.497x 0.482x
400000 1000 | 110.35 | 0.006x 0.507x 0.460x
Concurrency: 4 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 19197.50 | 0x 0.412x 0.435x
20000 50 | 1016.14 | 0.007x 0.966x 1.036x
20000 1000 | 9.94 | 0.708x 1.401x 1.169x
400000 1 | 19637.36 | 0x 0.436x 0.483x
400000 50 | 4669.32 | 0x 0.574x 0.566x
400000 1000 | 23.26 | 0.029x 1.147x 0.715x
Concurrency: 8 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 30358.32 | 0x 0.440x 0.457x
20000 50 | 262.75 | 0.026x 2.943x 2.740x
20000 1000 | 11.28 | 0.575x 0.840x 0.578x
400000 1 | 36007.15 | 0x 0.430x 0.464x
400000 50 | 6664.58 | 0x 0.563x 0.494x
400000 1000 | 11.61 | 0.058x 1.000x 1.277x
In these tests, the partial refresh behaves as O(delta) rather than
O(total), performing comparably to the manual PL/pgSQL approach but with
significantly lower code complexity for the user.
I recognize that adding a WHERE clause to REFRESH is an extension to the
SQL standard. I believe the syntax is intuitive, but I am open to
discussion regarding alternative implementation strategies or syntax if the
community feels a different approach is warranted.
New regression tests are included in the patch.
This is my first time submitting a patch to PostgreSQL, so please bear with
me if I've missed anything or made any procedural mistakes. I'm happy to
address any feedback.
Thanks,
Adam Brusselback
On Mon, Dec 8, 2025 at 3:58 PM Adam Brusselback <adambrusselback@gmail.com>
wrote:
+1 (But I was in that hacking session).
Our situation was a wonderful MV with all the columns we needed (some
hard to calculate) to augment search data done millions of times/day. It
was a thing of beauty. Until we realized we needed to update 1 record
(vendor inventory UPDATE date/time) every time we processed a file
(something we do 24x7, a hundred times each hour!).
For that ONE field, we ended up doing REFRESH MV concurrently; OVER
2,000 times per day.
Our understanding is that many people run into this exact issue. The
cache needs small frequent updates.
(After reading the code that handles MVs, we just created our own TABLE,
and maintain it with a scheduler to rebuild HOURLY,
and when we process the file, a Simple UPDATE is issued for the one column).
While this "Works", the CONCEPT of this patch (untested by me, as of
yet), would have fixed this with far less effort,
and would be easier to maintain.
After I review the code, I will add additional comments.
I am curious what others think? (And FWIW, I believe that the larger the
MV, the MORE this feature is needed,
vs refreshing the ENTIRE view).
Regards...
On Tue, Dec 09, 2025 at 12:08:58AM -0500, Kirk Wolak wrote:
What I do is I have my own materialized view infrastructure, written
entirely in PlPgSQL, and I completely avoid PG's MV support. This
alternative MV scheme creates an actual table for each MV, which means:
- one can update the MV directly (and I do, via triggers)
- one can have triggers on the MV (e.g., to record history)
This has been very handy for me.
I also have a state table in which to keep track of whether an MV needs
a refresh, and I have a function I can use to mark an MV as needing a
refresh. Marking an MV as needing a refresh sends a NOTIFY, and then I
have a daemon that will refresh views as needed (with some
debouncing/coalescing of notifications).
This way I can have MVs with very complex underlying queries for which
some kinds of updates I can easily write fast triggers for and others
where I can't (or where they would slow down transactions too much) I
simply mark the MV as needing a refresh.
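In outline (my paraphrase, with hypothetical table and function names, not the actual code from the repository), the state-table-plus-NOTIFY scheme looks something like:

```sql
-- State table tracking which MVs are stale
CREATE TABLE mv_refresh_state (
  mv_name       text PRIMARY KEY,
  needs_refresh boolean NOT NULL DEFAULT false,
  marked_at     timestamptz
);

-- Mark an MV as needing a refresh and wake the refresh daemon,
-- which debounces/coalesces the notifications
CREATE OR REPLACE FUNCTION mark_mv_needs_refresh(p_mv text) RETURNS void AS $$
BEGIN
  INSERT INTO mv_refresh_state (mv_name, needs_refresh, marked_at)
  VALUES (p_mv, true, now())
  ON CONFLICT (mv_name)
  DO UPDATE SET needs_refresh = true, marked_at = now();
  PERFORM pg_notify('mv_refresh', p_mv);
END;
$$ LANGUAGE plpgsql;
```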
Typical MV queries I have that this works very well for include
transitive reachability closure computations (e.g., all the groups a
thing is a member of, directly and indirectly, or vice versa --
recursive CTEs basically). Though I do now have triggers that can do a
reasonably good job of synchronously and quickly updating MVs with such
queries, it's just I didn't always.
Refreshes are always 'concurrent'.
This is my 80/20 solution to the "Incremental View Maintenance" (IVM)
problem.
A not very current version is here:
https://github.com/twosigma/postgresql-contrib/blob/master/mat_views.sql
If you like it I might be able to get a newer version out. The version
above has a few minor issues:
- it uses DELETE FROM instead of TRUNCATE for its sort-of temp tables
- using TRUNCATE ends up requiring some care to avoid occasional
deadlocks with VACUUM that are due to using tables as types of the
columns of the deltas tables
- logging -- lots of logging in the newest version
Another issue is that I rely on NATURAL FULL OUTER JOIN to avoid having
to generate ON conditions, but that means that all columns of the
underlying VIEW must not have NULLs. As I've not needed to support
nullable columns in these MVs, I don't mind.
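The NULL restriction follows from how NATURAL join compares rows: since NULL = NULL does not evaluate to true, identical rows containing NULLs never match and always show up as spurious deltas. A minimal illustration:

```sql
-- The two snapshots contain the "same" row, but because val is NULL
-- on both sides, the natural join condition evaluates to NULL and the
-- FULL OUTER JOIN reports the row as unmatched on each side,
-- i.e. as one deletion plus one insertion.
SELECT *
FROM (VALUES (1, NULL::int)) AS old_snap(id, val)
NATURAL FULL OUTER JOIN
     (VALUES (1, NULL::int)) AS new_snap(id, val);
```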
Nico
--
Thanks for the feedback and the use case, Kirk.
Regarding that specific scenario where a single column ("last updated" or
similar) churns significantly faster than the heavy-computation columns:
Even with this patch, you might find it beneficial to separate that
high-velocity column into its own small materialized view (or regular view)
and join it to the main MV at query time. That will reduce the bloat you
get on the main MV by quite a lot, especially if you have very wide rows
(which it seems like you do).
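Concretely, that split might look like this (all names hypothetical, sketched from Kirk's description of the workload):

```sql
-- The high-churn "last updated" value lives in its own small MV
-- (or a plain table), separate from the wide, expensive main MV
CREATE MATERIALIZED VIEW vendor_inventory_updated AS
SELECT vendor_id, max(processed_at) AS last_updated
FROM vendor_file_log
GROUP BY vendor_id;

-- Queries join the two at read time, so refreshing the small MV
-- never rewrites the wide rows of the main MV
SELECT m.*, u.last_updated
FROM main_search_mv m
LEFT JOIN vendor_inventory_updated u USING (vendor_id);
```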
I initially tried to implement logic that would allow for direct UPDATEs
(which would enable HOT updates). However, to handle rows that matched the
predicate but were no longer present in the new source data, I had to run
an anti-join to identify them for deletion. That approach caused
performance issues, so I settled on the "Prune + Upsert" strategy (DELETE
matching rows, then INSERT from source).
Because this patch performs a delete/insert cycle, updating that one
timestamp column will still result in rewriting the whole tuple in the MV.
> For that ONE field, we ended up doing REFRESH MV concurrently; OVER
> 2,000 times per day.
That said, 2,000 refreshes per day is nothing for this implementation,
provided your updates are selective enough and your queries allow for
predicate push-down to the base tables.
I look forward to your thoughts after reviewing the code.
Thanks,
Adam Brusselback
Hi all,
I've been running some more concurrency tests against this patch
(specifically looking for race conditions), and I found a flaw in the
implementation for the REFRESH ... WHERE ... mode (without CONCURRENTLY).
I believe the issue is that the DELETE -> INSERT strategy leaves a
consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
reads, the moment we delete the rows, we lose the physical lock on them. If
a concurrent transaction inserts a colliding row during that gap, the
materialized view ends up inconsistent with the base query (or hits a
constraint violation).
I initially was using SELECT ... FOR UPDATE to lock the rows before
modification, but that lock is (now that I know) obviously lost when the
row is deleted.
My plan is to replace that row-locking strategy with transaction-level
advisory locks inside the refresh logic:
Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
This effectively locks the "logical" ID of the row, preventing concurrent
refreshes on the same ID even while the physical tuple is temporarily gone.
Hash collisions should not cause any correctness issues that I can think of.
However, before I sink time into implementing that fix:
Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE ...
in core?
If the community feels this feature is a footgun or conceptually wrong for
Postgres, I'd rather know now before spending more time on this.
If the feature concept is sound, does the advisory lock approach seem like
the right way to handle the concurrency safety here?
Thanks,
Adam Brusselback
Hey Adam,
Apologies for the delay, and as promised on Discord, I did a review of the
current patch (cf 6305) and wanted to share findings that line up with the
thread’s design discussion, plus one additional correctness bug that I
could reproduce.
1. In the non-concurrent REFRESH ... WHERE ... path, the UPSERT SQL is
built using the unique index metadata. The code currently uses indnatts
when building the ON CONFLICT (...) target list. That includes INCLUDE
columns, so for an index like:
CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra);
the generated statement becomes effectively ON CONFLICT (id, extra) ...,
which fails with:
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT
specification
The fix appears straightforward: use indnkeyatts (key attributes only) when
generating the conflict target, and also when deciding which columns are
“key” for the UPDATE SET clause. I’ve attached a minimal repro SQL script
(repro_include_issue.sql)
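A minimal reproduction along these lines (my reconstruction, since the attached script isn't inlined in the thread):

```sql
CREATE TABLE t (id int, extra int);
CREATE MATERIALIZED VIEW mv AS SELECT id, extra FROM t;
-- Only (id) is the unique key; (extra) is a non-key INCLUDE column
CREATE UNIQUE INDEX ON mv (id) INCLUDE (extra);

-- With indnatts, the patch effectively generates
--   INSERT ... ON CONFLICT (id, extra) DO UPDATE ...
-- which fails:
--   ERROR: there is no unique or exclusion constraint matching
--          the ON CONFLICT specification
REFRESH MATERIALIZED VIEW mv WHERE id = 1;
```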
2. Another small test quality issue: the regression script has a comment
“Subqueries -> Error” but the expected output shows no error for the
schema-qualified subquery. There is no explicit check forbidding subqueries
in transformRefreshWhereClause(), so schema-qualified subqueries appear
allowed.
Moving on to broader questions
> I believe the issue is that the DELETE -> INSERT strategy leaves a
> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
> reads, the moment we delete the rows, we lose the physical lock on them. If
> a concurrent transaction inserts a colliding row during that gap, the
> materialized view ends up inconsistent with the base query (or hits a
> constraint violation).
The consistency gap in the non-concurrent mode matches what I’d expect: with
ROW EXCLUSIVE you allow concurrent readers/writers, and a pure DELETE →
INSERT approach can create a window where the old tuple is gone and a
concurrent session can insert a conflicting logical row.
That said, I think it would help the patch to explicitly define the
intended safety model:
1. Is the goal to be safe against concurrent DML on base tables only (i.e.,
refresh sees a snapshot and updates MV accordingly), or also to be safe
against concurrent partial refreshes and direct writes to the MV (when
maintenance is enabled)?
2. Should the non-concurrent partial refresh be “best effort” like normal
DML (user coordinates), or should it be “maintenance-like” (serialized /
logically safe by default)?
If the intent is “safe by default”, I’d encourage documenting very clearly
what’s guaranteed, and adding regression/README-style notes for footguns.
From a reviewer standpoint, I think the feature concept is sound and
valuable, but it needs a crisp statement of semantics and safety
boundaries. The tricky part is exactly what you called out: incremental
refresh implies concurrency questions that aren’t present with full rebuild
+ strong locks.
I’m happy to keep reviewing iterations (especially around the advisory lock
approach), and I’ll attach the reproduction scripts and notes I used.
As a possible staging approach: it might be simplest to start with a
conservative serialization model for non-concurrent WHERE (while still
allowing readers), and then iterate toward finer-grained logical locking
if/when needed for throughput.
Thanks,
Dharin
On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback <adambrusselback@gmail.com>
wrote:
Attachments:
(repro_include_issue.sql)
Typo fix: test_include_bug.sql (attached file)
Thanks,
Dharin
On Thu, Jan 15, 2026 at 7:46 PM Dharin Shah <dharinshah95@gmail.com> wrote:
Hi Dharin, thanks for the review.
1. indnatts vs indnkeyatts
Good catch. Will fix.
2. Subqueries -> Error
That comment is wrong, I never added a check for that because it turned out
to be unnecessary. Will remove.
3. Concurrency gap / safety model
To answer your questions directly:
1. The goal is to be safe against concurrent partial refreshes on
overlapping rows, not just concurrent DML on base tables.
2. The intent is maintenance-like and safe by default.
Because we lose the physical lock on the row after the DELETE, I plan to
enforce that safety default via transaction-level advisory locks acquired
before the DELETE with something like:
SELECT pg_advisory_xact_lock(matviewOid, hashtext(ROW(key_cols)::text))
FROM matview
WHERE (condition);
Concurrent refreshes on the same logical rows will serialize while
non-overlapping rows still run in parallel.
This also made me think about whether the CONCURRENTLY keyword is doing the
right thing here. Here's how the guarantees break down across all the
refresh modes:
Refresh Command / State                | Base Table Lock  | Concurrent Reads? | Concurrent Writes? | Same-Row Concurrent Refreshes
---------------------------------------+------------------+-------------------+--------------------+------------------------------
Standard Full Refresh                  | ACCESS EXCLUSIVE | Blocked           | Blocked            | Blocked (Table Level)
CONCURRENTLY (Full)                    | EXCLUSIVE        | Allowed           | Blocked            | Blocked (Table Level)
Partial (WHERE) - Current Patch        | ROW EXCLUSIVE    | Allowed           | Allowed            | Race condition (Fails)
Partial (WHERE) - With Advisory Locks  | ROW EXCLUSIVE    | Allowed           | Allowed            | Serialized (Waits)
Partial (CONCURRENTLY WHERE)           | EXCLUSIVE        | Allowed           | Blocked            | Serialized (Waits)
Because of this, the `CONCURRENTLY` distinction gets inverted with a
`WHERE` clause. With a full refresh, `CONCURRENTLY` is the more permissive
option (allowing readers). But here, the bare `WHERE` path allows both
reads and writes, while `CONCURRENTLY WHERE` blocks writers. Non-concurrent
ends up being the more permissive option, which goes against what the
keyword generally implies.
One option is to swap the two implementations to restore that intuition.
`CONCURRENTLY WHERE` becomes the advisory locks approach (maximum
throughput), and bare `WHERE` becomes the diff approach (conservative,
blocks writers). On the other hand, `CONCURRENTLY` has historically meant
the diff-based algorithm specifically, not just a lower lock level.
I don't have a strong opinion here and would rather let the community
decide. The updated patch will leave the algorithms as-is for now. Happy to
swap them if that's the preferred direction.
Will post an updated patch soon.
Thanks,
Adam Brusselback
Hi Dharin,
Attached is an updated patch addressing your review comments and fixing the
concurrency model for the non-concurrent path.
Regarding the items you brought up:
1. Fixed the indnatts vs indnkeyatts issue for generating the ON CONFLICT
target.
2. Removed the incorrect regression test comment regarding subqueries.
Regarding the concurrency gap and safety model:
In my last email, I mentioned planning to use transaction-level advisory
locks to fix the consistency gap. After prototyping it, I had to abandon
that approach. Testing revealed that it falls over at scale, quickly
hitting `max_locks_per_transaction` limits and causing issues with bulk
operations. I worked on this for a while before deciding it wasn't workable.
Instead, I went a different direction. The non-concurrent partial refresh
now uses a different two-step strategy:
1. It first executes a `SELECT FROM mv WHERE ... FOR UPDATE` to lock
existing rows matching the predicate. This serializes concurrent partial
refreshes on overlapping rows while allowing non-overlapping refreshes to
proceed in parallel.
2. It then executes a single CTE that evaluates the underlying query,
upserts the results into the matview, and deletes rows that no longer match
the predicate via an anti-join.
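In rough terms, the two steps look like this (my sketch of the generated SQL using the benchmark's column names, not the literal statements the patch emits):

```sql
-- Step 1: lock existing rows in scope, serializing overlapping
-- partial refreshes while non-overlapping ones proceed in parallel
SELECT 1 FROM invoice_summary
WHERE invoice_id = ANY ('{1,2,3}')
FOR UPDATE;

-- Step 2: one statement that upserts fresh rows and prunes stale ones
WITH fresh AS (
  SELECT invoice_id, sum(amount) AS total
  FROM invoices JOIN invoice_lines USING (invoice_id)
  WHERE invoice_id = ANY ('{1,2,3}')
  GROUP BY invoice_id
), upserted AS (
  INSERT INTO invoice_summary (invoice_id, total)
  SELECT invoice_id, total FROM fresh
  ON CONFLICT (invoice_id) DO UPDATE SET total = EXCLUDED.total
)
-- Anti-join: delete rows matching the predicate that no longer
-- appear in the new source data
DELETE FROM invoice_summary m
WHERE m.invoice_id = ANY ('{1,2,3}')
  AND NOT EXISTS (SELECT 1 FROM fresh f
                  WHERE f.invoice_id = m.invoice_id);
```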
In my testing, this approach had similar performance to the original
implementation, but hasn't exhibited the same correctness issues.
Thanks,
Adam Brusselback