fix(sql): fix bug where queries using `IN` clauses with INT/LONG columns containing `-1` return incorrect results. by kafka1991 · Pull Request #6439 · questdb/questdb

kafka1991 · 2025-11-24T07:20:19Z

Fixes a bug where queries using IN clauses with INT/LONG columns containing -1 could return incorrect results.

Example

 SELECT * FROM table WHERE int_column IN (-1, 1);

Root cause: The internal HashSet implementation used -1 as a special marker value, which conflicted with actual -1 values in the data, causing incorrect query results.

close #6438

coderabbitai · 2025-11-24T07:20:27Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce a new DirectLongHashSet off-heap hash set implementation for long keys with configurable memory management and load factors, replace the usage of LongHashSet with DirectLongHashSet in InLongFunctionFactory, and add comprehensive tests including a test case that reproduces the original bug involving mixed positive and negative values in IN clauses.

Changes

Cohort / File(s)	Summary
New DirectLongHashSet Data Structure `core/src/main/java/io/questdb/std/DirectLongHashSet.java`	Added new off-heap hash set implementation for long keys with support for zero-key handling via dedicated flag, configurable load factor and capacity, insertion/lookup/removal/clear semantics, linear probing collision handling, rehashing on capacity overflow, iteration-friendly serialization via toSink, and lifecycle methods (close, clear). Implements Closeable, Mutable, and Sinkable interfaces.
InLongFunctionFactory Refactoring `core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java`	Replaced LongHashSet with DirectLongHashSet for in-value collections, introduced memory tagging and explicit RSS memory management, added try-catch around DirectLongHashSet population for proper deallocation on failure, updated helper methods to accept DirectLongHashSet, extended InLongConstFunction and InLongRuntimeConstFunction with close() and init() methods for resource management.
DirectLongHashSet Unit Tests `core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java`	Added comprehensive test suite covering basic add/contains behavior, zero-key handling, size tracking, clear operations, excludes behavior, collision handling, mixed positive/negative values, rehashing/scaling, removal operations, sentinel values (Long.MIN_VALUE, Long.MAX_VALUE), and serialization via toSink.
IN Predicate Tests `core/src/test/java/io/questdb/test/griffin/InTest.java`	Added testIntInWithSentinelValues test method validating IN predicates with sentinel values including normal values, null, and extreme values with parameterized binding.
Regex Replace Function Test Utility `core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java`	Replaced hard-coded long SQL string in testStackOverFlowError with dynamically constructed SQL using StringSink and loop to generate 2000 "address" tokens, preserving failure expectation validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

DirectLongHashSet memory management: Off-heap allocation, deallocation paths, rehashing logic with linear probing, and zero-key special handling require careful verification
InLongFunctionFactory resource lifecycle: Integration of DirectLongHashSet with factory pattern, close() and init() methods, and try-catch error handling paths
Collision and edge case handling: Verify rehashing triggers, boundary conditions with zero keys, mixed positive/negative values, and sentinel values
Test coverage validation: Ensure test cases comprehensively cover the bug scenario (#6438) involving mixed positive and negative values in IN clauses

Suggested reviewers

bluestreak01

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Out of Scope Changes check	❓ Inconclusive	Changes to RegexpReplaceStrFunctionFactoryTest and modifications to InTest appear unrelated to the sentinel value fix for InLongFunction and DirectLongHashSet implementation.	Clarify whether test modifications in RegexpReplaceStrFunctionFactoryTest and InTest are necessary for this PR or should be in separate commits.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR fully addresses issue #6438 by replacing LongHashSet with DirectLongHashSet using Long.MAX_VALUE as sentinel, enabling IN lists with negative values to return correct results.
Title check	✅ Passed	The title accurately describes the main fix: resolving a bug in IN clause queries where -1 values in INT/LONG columns were incorrectly handled due to sentinel value conflicts.
Description check	✅ Passed	The description clearly explains the bug, provides a concrete example, identifies the root cause (sentinel value conflict with -1), and references the related issue.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java

kafka1991 · 2025-11-24T10:05:04Z

@coderabbitai review

coderabbitai · 2025-11-24T10:05:10Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java (1)
120-121: Avoid double load‑factor scaling when constructing DirectLongHashSet

DirectLongHashSet’s ctor already scales its capacity argument by 1 / loadFactor:
this.capacity = Math.max(Numbers.ceilPow2((int) (capacity / loadFactor)), MIN_CAPACITY);
But callers here pre-scale as well:
DirectLongHashSet inVals =
    new DirectLongHashSet((int) (argCount / DirectLongHashSet.DEFAULT_LOAD_FACTOR),
                          DirectLongHashSet.DEFAULT_LOAD_FACTOR,
                          MemoryTag.NATIVE_FUNC_RSS);

this.inSet =
    new DirectLongHashSet((int) ((valueFunctions.size() - 1) / DirectLongHashSet.DEFAULT_LOAD_FACTOR),
                          DirectLongHashSet.DEFAULT_LOAD_FACTOR,
                          MemoryTag.NATIVE_FUNC_RSS);
This effectively applies the 1 / loadFactor factor twice, significantly over-allocating off-heap memory for large IN-lists.

If the ctor’s capacity parameter is intended to be “expected element count” (which matches the name), callers should pass the raw element count and let the ctor do the scaling:
- DirectLongHashSet inVals = new DirectLongHashSet(
-         (int) (argCount / DirectLongHashSet.DEFAULT_LOAD_FACTOR),
-         DirectLongHashSet.DEFAULT_LOAD_FACTOR,
-         MemoryTag.NATIVE_FUNC_RSS);
+ DirectLongHashSet inVals = new DirectLongHashSet(
+         argCount,
+         DirectLongHashSet.DEFAULT_LOAD_FACTOR,
+         MemoryTag.NATIVE_FUNC_RSS);
...
- this.inSet = new DirectLongHashSet(
-         (int) ((valueFunctions.size() - 1) / DirectLongHashSet.DEFAULT_LOAD_FACTOR),
-         DirectLongHashSet.DEFAULT_LOAD_FACTOR,
-         MemoryTag.NATIVE_FUNC_RSS);
+ this.inSet = new DirectLongHashSet(
+         valueFunctions.size() - 1,
+         DirectLongHashSet.DEFAULT_LOAD_FACTOR,
+         MemoryTag.NATIVE_FUNC_RSS);
This keeps the load factor behavior while avoiding unnecessary off-heap over-allocation.

Also applies to: 230-231
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (1)
220-231: Minor: avoid fully‑qualified StringSink where it’s already imported

Here you can rely on the existing import instead of the fully qualified name:
-            io.questdb.std.str.StringSink sink = new io.questdb.std.str.StringSink();
+            StringSink sink = new StringSink();
This keeps the test code a bit cleaner and consistent with the other tests using StringSink.
core/src/main/java/io/questdb/std/DirectLongHashSet.java (1)

37-70: DirectLongHashSet implementation looks correct; only minor API/robustness nits

Algorithmically this is solid:

Open addressing with linear probing and backward‑shift deletion (removeAt + canMoveBack) is implemented correctly.

Zero is handled via hasZero so it doesn’t conflict with the internal 0‑as‑empty sentinel.

size(), clear(), rehash(), and toSink() all behave consistently with that design, including coverage for sentinel values and ordering (as validated in the tests).

Two small nits to consider:

Clarify/lock in ctor semantics

The ctor currently interprets its capacity argument as an expected element count and scales it by 1 / loadFactor internally. It would be good to document this explicitly in the Javadoc (and keep callers from pre‑scaling), e.g., “capacity is the expected number of elements”.

Optional: guard against use after close()

Methods like add, contains, remove, and keyIndex assume memStart != 0. If someone accidentally calls them after close(), they’ll operate on addresses derived from 0. Adding a lightweight guard (e.g., if (memStart == 0) throw new IllegalStateException("closed")) would make misuse fail fast instead of doing undefined off‑heap access. This is optional but can help future refactors.

Also applies to: 83-103, 115-188, 190-257, 259-285

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 555f6d4 and fa2ef16.

📒 Files selected for processing (5)

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java (6 hunks)
core/src/main/java/io/questdb/std/DirectLongHashSet.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/InTest.java (2 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java (2 hunks)
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-19T12:21:00.062Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.

Applied to files:

core/src/test/java/io/questdb/test/griffin/InTest.java
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java

📚 Learning: 2025-07-31T08:21:17.390Z

Learnt from: mtopolnik
Repo: questdb/questdb PR: 5997
File: core/src/main/java/io/questdb/griffin/engine/functions/array/DoubleArrayAccessFunctionFactory.java:127-131
Timestamp: 2025-07-31T08:21:17.390Z
Learning: In DoubleArrayAccessFunctionFactory.java, overflow validation for array index casting from long to int is performed in the validateIndexArgs() method rather than at each individual cast site. The validateIndexArgs() method is called early in newInstance() to validate all index arguments, making subsequent casts safe.

Applied to files:

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java

🧬 Code graph analysis (4)

core/src/test/java/io/questdb/test/griffin/InTest.java (1)

core/src/main/java/io/questdb/std/Numbers.java (1)

Numbers (44-3523)

core/src/main/java/io/questdb/std/DirectLongHashSet.java (5)

core/src/main/java/io/questdb/std/MemoryTag.java (1)

MemoryTag (27-184)

core/src/main/java/io/questdb/std/Numbers.java (1)

Numbers (44-3523)

core/src/main/java/io/questdb/std/Unsafe.java (1)

Unsafe (40-542)

core/src/main/java/io/questdb/std/Vect.java (1)

Vect (27-501)

core/src/main/java/io/questdb/std/Hash.java (1)

Hash (30-229)

core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java (1)

core/src/main/java/io/questdb/std/str/StringSink.java (1)

StringSink (31-194)

core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (2)

core/src/main/java/io/questdb/std/DirectLongHashSet.java (1)

DirectLongHashSet (36-286)

core/src/main/java/io/questdb/std/Rnd.java (1)

Rnd (39-481)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: New pull request (Check Changes Check changes)
GitHub Check: build
GitHub Check: build

🔇 Additional comments (4)

core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java (1)

90-100: Dynamic SQL construction via StringSink is fine

The looped construction preserves the original SQL structure while avoiding a massive literal, and the comma/infix logic (i > 0) looks correct. Initial capacity 8192 is just a hint given StringSink’s growth semantics, so no risk there.

core/src/test/java/io/questdb/test/griffin/InTest.java (1)

27-29: Sentinel/null IN regression test is well-targeted

This test covers INT IN-lists with negative values, zero, and both literal null and Numbers.INT_NULL via bind variables, plus asserts the execution plan shows a sorted, null-aware in-set. That’s exactly the scenario that previously failed and should guard the new DirectLongHashSet-based implementation going forward. (Deterministic rnd_* usage is fine here.)

Also applies to: 246-305

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java (1)

38-42: Off-heap set usage and lifecycle wiring look correct

Using DirectLongHashSet removes the sentinel collision issue and the ownership model is clear:

Const case: the set is built once, freed on failure in newInstance, and later freed in InLongConstFunction.close() via Misc.free(inSet).

Runtime-const case: the set is allocated in the ctor, repopulated in init() via clear() + parseToLong, and freed in close().

getBool() and toPlan() simply delegate to inSet.contains() / inSet.toSink(), so INT/LONG/null value semantics are now driven by the set rather than magic sentinels.

No correctness issues spotted in these paths.

Also applies to: 120-127, 183-217, 220-255

core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (1)

35-272: Good behavioral and edge‑case coverage for DirectLongHashSet

These tests exercise all the important aspects of the new set: duplicates, zero handling, positive/negative/sentinel values, rehashing, removal (including collision cases), and toSink ordering. The use of try‑with‑resources keeps off‑heap memory safe.

bluestreak01 · 2025-11-24T17:13:58Z

hey @kafka1991 could you update PR description to make it use-facing. "fix" will be included in public release notes

UtkarshDashora · 2025-11-24T18:21:59Z

✅ Passed checks (3 passed)

kafka1991 · 2025-11-25T00:41:29Z

hey @kafka1991 could you update PR description to make it use-facing. "fix" will be included in public release notes

fixed

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java

core/src/main/java/io/questdb/std/DirectLongHashSet.java

core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java

glasstiger · 2025-11-25T20:16:07Z

[PR Coverage check]

😍 pass : 115 / 122 (94.26%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java	11	14	78.57%
🔵	io/questdb/std/DirectLongHashSet.java	104	108	96.30%

fix Longhashset usage in InlongFunction

033a7ba

kafka1991 changed the title ~~fix(sql): fix LongHashSet's sentinel for InLongFunction~~ fix(sql): fix LongHashSet's sentinel value for InLongFunction Nov 24, 2025

puzpuzpuz reviewed Nov 24, 2025

View reviewed changes

core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java Outdated Show resolved Hide resolved

kafka1991 added 3 commits November 24, 2025 16:01

fix flake tests

5528eaa

use DirectLongHashSet instead of LongHashSet

9943828

code cleanup

fa2ef16

kafka1991 force-pushed the victor_longhashset_fix branch from 543916e to fa2ef16 Compare November 24, 2025 10:02

coderabbitai bot reviewed Nov 24, 2025

View reviewed changes

kafka1991 added 2 commits November 24, 2025 22:02

code review

f02ad17

Merge branch 'master' into victor_longhashset_fix

e6a3a95

puzpuzpuz added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution labels Nov 24, 2025

kafka1991 changed the title ~~fix(sql): fix LongHashSet's sentinel value for InLongFunction~~ fix(sql): fix bug where queries using IN clauses with INT columns containing -1 return incorrect results. Nov 25, 2025

Merge branch 'master' into victor_longhashset_fix

88de536

kafka1991 changed the title ~~fix(sql): fix bug where queries using IN clauses with INT columns containing -1 return incorrect results.~~ fix(sql): fix bug where queries using IN clauses with INT/LONG columns containing -1 return incorrect results. Nov 25, 2025

code format

6c9f8bd

puzpuzpuz reviewed Nov 25, 2025

View reviewed changes

kafka1991 added 2 commits November 25, 2025 19:34

code reviews.

a85522e

Merge branch 'master' into victor_longhashset_fix

cbc6334

puzpuzpuz reviewed Nov 25, 2025

View reviewed changes

core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java Outdated Show resolved Hide resolved

kafka1991 and others added 2 commits November 25, 2025 22:48

remove remove() in DirectLongHashSet

050d3f3

Merge branch 'master' into victor_longhashset_fix

009e80a

puzpuzpuz approved these changes Nov 25, 2025

View reviewed changes

Merge branch 'master' into victor_longhashset_fix

7d07cb2

bluestreak01 merged commit 9968b7c into master Nov 25, 2025
41 checks passed

bluestreak01 deleted the victor_longhashset_fix branch November 25, 2025 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sql): fix bug where queries using `IN` clauses with INT/LONG columns containing `-1` return incorrect results.#6439

fix(sql): fix bug where queries using `IN` clauses with INT/LONG columns containing `-1` return incorrect results.#6439
bluestreak01 merged 13 commits intomasterfrom
victor_longhashset_fix

kafka1991 commented Nov 24, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 24, 2025 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

kafka1991 commented Nov 24, 2025

Uh oh!

coderabbitai bot commented Nov 24, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

bluestreak01 commented Nov 24, 2025

Uh oh!

UtkarshDashora commented Nov 24, 2025

Uh oh!

kafka1991 commented Nov 25, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glasstiger commented Nov 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

kafka1991 commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

kafka1991 commented Nov 24, 2025

Uh oh!

coderabbitai bot commented Nov 24, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

bluestreak01 commented Nov 24, 2025

Uh oh!

UtkarshDashora commented Nov 24, 2025

Uh oh!

kafka1991 commented Nov 25, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glasstiger commented Nov 25, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kafka1991 commented Nov 24, 2025 •

edited

Loading

coderabbitai bot commented Nov 24, 2025 •

edited

Loading