Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures
Hi hackers, Alexander,
An assertion failure (server crash in assert-enabled builds) occurs when
WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();
The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that
requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776
Attached patches for both the test case and a potential fix. Please review.
Thanks,
Satya
Attachments:
0002-Add-regression-tests-for-WAIT-FOR-LSN-inside-PL-pgSQ.patchapplication/octet-stream; name=0002-Add-regression-tests-for-WAIT-FOR-LSN-inside-PL-pgSQ.patchDownload+35-1
0001-Fix-assertion-failure-in-WAIT-FOR-LSN-when-used-insi.patchapplication/octet-stream; name=0001-Fix-assertion-failure-in-WAIT-FOR-LSN-when-used-insi.patchDownload+11-1
Hi, Satya!
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.
------
Regards,
Alexander Korotkov
Supabase
On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Satya!
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.
--
Best,
Xuneng
Hi Satya, Alexander,
On Thu, Apr 9, 2026 at 2:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Satya!
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.
Opus, sorry for clicking send incidentally before typing anything...
I had looked at these patches before and didn’t see anything
particularly wrong except:
1. patch 1 unconditionally nulled ActivePortal->portalSnapshot
whenever it was non-NULL after the pop;
2. patch 2 used RAISE NOTICE after WAIT FOR, which seems not excercise
the bug straightforwardly.
I didn't realized the safety implications of releasing a procedure/DO
snapshot during PL execution. I’ve noticed several warnings in the
tree advising against this.
/*
* Ensure there's an active snapshot whilst we execute whatever's
* involved here. Note that this is *not* sufficient to make the
* world safe for TOAST pointers to be included in the returned data:
* the referenced data could have gone away while we didn't hold a
* snapshot. Hence, it's incumbent on PLs that can do COMMIT/ROLLBACK
* to not return TOAST pointers, unless those pointers were fetched
* after the last COMMIT/ROLLBACK in the procedure.
*
* XXX that is a really nasty, hard-to-test requirement. Is there a
* way to remove it?
*/
EnsurePortalSnapshotExists();
--
Best,
Xuneng
Hi Alexnader,
On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
Hi, Satya!
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when
WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that
requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please
review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.
Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.
Thanks,
Satya
On Thu, Apr 9, 2026 at 10:04 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Thu, Apr 9, 2026 at 2:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Satya!
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.Opus, sorry for clicking send incidentally before typing anything...
I had looked at these patches before and didn’t see anything
particularly wrong except:
1. patch 1 unconditionally nulled ActivePortal->portalSnapshot
whenever it was non-NULL after the pop;
2. patch 2 used RAISE NOTICE after WAIT FOR, which seems not excercise
the bug straightforwardly.I didn't realized the safety implications of releasing a procedure/DO
snapshot during PL execution. I’ve noticed several warnings in the
tree advising against this./*
* Ensure there's an active snapshot whilst we execute whatever's
* involved here. Note that this is *not* sufficient to make the
* world safe for TOAST pointers to be included in the returned data:
* the referenced data could have gone away while we didn't hold a
* snapshot. Hence, it's incumbent on PLs that can do COMMIT/ROLLBACK
* to not return TOAST pointers, unless those pointers were fetched
* after the last COMMIT/ROLLBACK in the procedure.
*
* XXX that is a really nasty, hard-to-test requirement. Is there a
* way to remove it?
*/
EnsurePortalSnapshotExists();
Regarding functions, function may be part of bigger query running with
particular snapshot. Note that single query is always executed within
the single snapshot (barring EvalPlanQual()) even in read committed
mode. Releasing a snapshot in the middle of the query could cause, if
even we somehow re-acquire a new snapshot, could cause rest of query
to be executed inconsistently with its beginning.
It is probably different for procedures, which don't have to stick to
a single snapshot. But I can still imagine WAIT FOR LSN to be inside
a loop over some query results or something similar. We are now past
FF for PG19. I suggest we should now forbid WAIT FOR LSN both in
functions and procedures. For PG20 we can reconsider some cases when
running WAIT FOR LSN inside the stored procedure is safe.
------
Regards,
Alexander Korotkov
Supabase
On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.
Thank you. I've slightly revised your patch. I'm going to push it if
no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v3-0001-Explicitly-forbid-WAIT-FOR-inside-functions-and-p.patchapplication/octet-stream; name=v3-0001-Explicitly-forbid-WAIT-FOR-inside-functions-and-p.patchDownload+36-7
Hi Alexander,
On Thu, Apr 9, 2026 at 5:28 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs
when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement
that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please
review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.Thank you. I've slightly revised your patch. I'm going to push it if
no objections.
LGTM, thank you!
At 2026-04-09 20:28:16, "Alexander Korotkov" <aekorotkov@gmail.com> wrote:
On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
Repro:
-- Run this on a standby
CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776Attached patches for both the test case and a potential fix. Please review.
Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.Thank you. I've slightly revised your patch. I'm going to push it if
no objections.------
Regards,
Alexander Korotkov
Supabase
Hi,
This patch looks good to me overall. I spotted a typo in the commit message:
"it could pass this check causing an error elsewhere. This commit implments"
implments -> implements, missing an "e".
Regard,
Xiaopeng Wang