Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

Started by SATYANARAYANA NARLAPURAM1 day ago9 messageshackers
Jump to latest
#1SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com

Hi hackers, Alexander,

An assertion failure (server crash in assert-enabled builds) occurs when
WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that
requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thanks,
Satya

Attachments:

0002-Add-regression-tests-for-WAIT-FOR-LSN-inside-PL-pgSQ.patchapplication/octet-stream; name=0002-Add-regression-tests-for-WAIT-FOR-LSN-inside-PL-pgSQ.patchDownload+35-1
0001-Fix-assertion-failure-in-WAIT-FOR-LSN-when-used-insi.patchapplication/octet-stream; name=0001-Fix-assertion-failure-in-WAIT-FOR-LSN-when-used-insi.patchDownload+11-1
#2Alexander Korotkov
aekorotkov@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#1)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

Hi, Satya!

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

------
Regards,
Alexander Korotkov
Supabase

#3Xuneng Zhou
xunengzhou@gmail.com
In reply to: Alexander Korotkov (#2)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi, Satya!

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

--
Best,
Xuneng

#4Xuneng Zhou
xunengzhou@gmail.com
In reply to: Xuneng Zhou (#3)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

Hi Satya, Alexander,

On Thu, Apr 9, 2026 at 2:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi, Satya!

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Opus, sorry for clicking send incidentally before typing anything...

I had looked at these patches before and didn’t see anything
particularly wrong except:
1. patch 1 unconditionally nulled ActivePortal->portalSnapshot
whenever it was non-NULL after the pop;
2. patch 2 used RAISE NOTICE after WAIT FOR, which seems not excercise
the bug straightforwardly.

I didn't realized the safety implications of releasing a procedure/DO
snapshot during PL execution. I’ve noticed several warnings in the
tree advising against this.

/*
* Ensure there's an active snapshot whilst we execute whatever's
* involved here. Note that this is *not* sufficient to make the
* world safe for TOAST pointers to be included in the returned data:
* the referenced data could have gone away while we didn't hold a
* snapshot. Hence, it's incumbent on PLs that can do COMMIT/ROLLBACK
* to not return TOAST pointers, unless those pointers were fetched
* after the last COMMIT/ROLLBACK in the procedure.
*
* XXX that is a really nasty, hard-to-test requirement. Is there a
* way to remove it?
*/
EnsurePortalSnapshotExists();

--
Best,
Xuneng

#5SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Alexander Korotkov (#2)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

Hi Alexnader,

On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:

Hi, Satya!

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when

WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that

requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please

review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.

Thanks,
Satya

Attachments:

v2-0001-waitforlsn-forbid-in-functions.patchapplication/octet-stream; name=v2-0001-waitforlsn-forbid-in-functions.patchDownload+16-3
v2-0001-waitforlsn-tests.patchapplication/octet-stream; name=v2-0001-waitforlsn-tests.patchDownload+1-1
#6Alexander Korotkov
aekorotkov@gmail.com
In reply to: Xuneng Zhou (#4)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

On Thu, Apr 9, 2026 at 10:04 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Thu, Apr 9, 2026 at 2:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi, Satya!

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Opus, sorry for clicking send incidentally before typing anything...

I had looked at these patches before and didn’t see anything
particularly wrong except:
1. patch 1 unconditionally nulled ActivePortal->portalSnapshot
whenever it was non-NULL after the pop;
2. patch 2 used RAISE NOTICE after WAIT FOR, which seems not excercise
the bug straightforwardly.

I didn't realized the safety implications of releasing a procedure/DO
snapshot during PL execution. I’ve noticed several warnings in the
tree advising against this.

/*
* Ensure there's an active snapshot whilst we execute whatever's
* involved here. Note that this is *not* sufficient to make the
* world safe for TOAST pointers to be included in the returned data:
* the referenced data could have gone away while we didn't hold a
* snapshot. Hence, it's incumbent on PLs that can do COMMIT/ROLLBACK
* to not return TOAST pointers, unless those pointers were fetched
* after the last COMMIT/ROLLBACK in the procedure.
*
* XXX that is a really nasty, hard-to-test requirement. Is there a
* way to remove it?
*/
EnsurePortalSnapshotExists();

Regarding functions, function may be part of bigger query running with
particular snapshot. Note that single query is always executed within
the single snapshot (barring EvalPlanQual()) even in read committed
mode. Releasing a snapshot in the middle of the query could cause, if
even we somehow re-acquire a new snapshot, could cause rest of query
to be executed inconsistently with its beginning.

It is probably different for procedures, which don't have to stick to
a single snapshot. But I can still imagine WAIT FOR LSN to be inside
a loop over some query results or something similar. We are now past
FF for PG19. I suggest we should now forbid WAIT FOR LSN both in
functions and procedures. For PG20 we can reconsider some cases when
running WAIT FOR LSN inside the stored procedure is safe.

------
Regards,
Alexander Korotkov
Supabase

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: SATYANARAYANA NARLAPURAM (#5)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.

Thank you. I've slightly revised your patch. I'm going to push it if
no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachments:

v3-0001-Explicitly-forbid-WAIT-FOR-inside-functions-and-p.patchapplication/octet-stream; name=v3-0001-Explicitly-forbid-WAIT-FOR-inside-functions-and-p.patchDownload+36-7
#8SATYANARAYANA NARLAPURAM
satyanarlapuram@gmail.com
In reply to: Alexander Korotkov (#7)
Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

Hi Alexander,

On Thu, Apr 9, 2026 at 5:28 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:

On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com>

wrote:

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs

when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void
procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement

that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please

review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.

Thank you. I've slightly revised your patch. I'm going to push it if
no objections.

LGTM, thank you!

#9wang.xiao.peng
wxp_728@163.com
In reply to: Alexander Korotkov (#7)
Re:Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

At 2026-04-09 20:28:16, "Alexander Korotkov" <aekorotkov@gmail.com> wrote:

On Thu, Apr 9, 2026 at 10:27 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

On Wed, Apr 8, 2026 at 11:00 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:

An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.

Repro:

-- Run this on a standby

CREATE PROCEDURE test_wait()
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
WAIT FOR LSN '0/1234' INTO result;
RAISE NOTICE '%', result;
END;
$$;
CALL test_wait();

The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:

TRAP: failed Assert("portal->portalSnapshot == NULL"),
File: "pquery.c", Line: 1776

Attached patches for both the test case and a potential fix. Please review.

Thank you for reporting. But I doubt the fix is correct. Even that
this particular might work OK, I don't think it's safe to release
snapshots belonging to functions/procedures: it might affect them. I
tend to think we must forbid wrapping WAIT FOR LSN with
functions/procedures. I'll explore more on this today.

Agreed, attached a v2 patch with your suggestion on preventing it running
from procedures.

Thank you. I've slightly revised your patch. I'm going to push it if
no objections.

------
Regards,
Alexander Korotkov

Supabase

Hi,
This patch looks good to me overall. I spotted a typo in the commit message:

"it could pass this check causing an error elsewhere. This commit implments"

implments -> implements, missing an "e".

Regard,
Xiaopeng Wang