fix(sql): fix bug where queries using IN clauses with INT/LONG columns containing -1 return incorrect results.#6439
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes introduce a new DirectLongHashSet off-heap hash set implementation for long keys with configurable memory management and load factors, replace the usage of LongHashSet with DirectLongHashSet in InLongFunctionFactory, and add comprehensive tests including a test case that reproduces the original bug involving mixed positive and negative values in IN clauses. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
LongHashSet's sentinel for InLongFunctionLongHashSet's sentinel value for InLongFunction
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java
Outdated
Show resolved
Hide resolved
543916e to
fa2ef16
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java (1)
120-121: Avoid double load‑factor scaling when constructingDirectLongHashSet
DirectLongHashSet’s ctor already scales itscapacityargument by1 / loadFactor:this.capacity = Math.max(Numbers.ceilPow2((int) (capacity / loadFactor)), MIN_CAPACITY);But callers here pre-scale as well:
DirectLongHashSet inVals = new DirectLongHashSet((int) (argCount / DirectLongHashSet.DEFAULT_LOAD_FACTOR), DirectLongHashSet.DEFAULT_LOAD_FACTOR, MemoryTag.NATIVE_FUNC_RSS); this.inSet = new DirectLongHashSet((int) ((valueFunctions.size() - 1) / DirectLongHashSet.DEFAULT_LOAD_FACTOR), DirectLongHashSet.DEFAULT_LOAD_FACTOR, MemoryTag.NATIVE_FUNC_RSS);This effectively applies the
1 / loadFactorfactor twice, significantly over-allocating off-heap memory for large IN-lists.If the ctor’s
capacityparameter is intended to be “expected element count” (which matches the name), callers should pass the raw element count and let the ctor do the scaling:- DirectLongHashSet inVals = new DirectLongHashSet( - (int) (argCount / DirectLongHashSet.DEFAULT_LOAD_FACTOR), - DirectLongHashSet.DEFAULT_LOAD_FACTOR, - MemoryTag.NATIVE_FUNC_RSS); + DirectLongHashSet inVals = new DirectLongHashSet( + argCount, + DirectLongHashSet.DEFAULT_LOAD_FACTOR, + MemoryTag.NATIVE_FUNC_RSS); ... - this.inSet = new DirectLongHashSet( - (int) ((valueFunctions.size() - 1) / DirectLongHashSet.DEFAULT_LOAD_FACTOR), - DirectLongHashSet.DEFAULT_LOAD_FACTOR, - MemoryTag.NATIVE_FUNC_RSS); + this.inSet = new DirectLongHashSet( + valueFunctions.size() - 1, + DirectLongHashSet.DEFAULT_LOAD_FACTOR, + MemoryTag.NATIVE_FUNC_RSS);This keeps the load factor behavior while avoiding unnecessary off-heap over-allocation.
Also applies to: 230-231
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (1)
220-231: Minor: avoid fully‑qualifiedStringSinkwhere it’s already importedHere you can rely on the existing import instead of the fully qualified name:
- io.questdb.std.str.StringSink sink = new io.questdb.std.str.StringSink(); + StringSink sink = new StringSink();This keeps the test code a bit cleaner and consistent with the other tests using
StringSink.core/src/main/java/io/questdb/std/DirectLongHashSet.java (1)
37-70: DirectLongHashSet implementation looks correct; only minor API/robustness nitsAlgorithmically this is solid:
- Open addressing with linear probing and backward‑shift deletion (
removeAt+canMoveBack) is implemented correctly.- Zero is handled via
hasZeroso it doesn’t conflict with the internal 0‑as‑empty sentinel.size(),clear(),rehash(), andtoSink()all behave consistently with that design, including coverage for sentinel values and ordering (as validated in the tests).Two small nits to consider:
Clarify/lock in ctor semantics
The ctor currently interprets its
capacityargument as an expected element count and scales it by1 / loadFactorinternally. It would be good to document this explicitly in the Javadoc (and keep callers from pre‑scaling), e.g., “capacityis the expected number of elements”.Optional: guard against use after
close()Methods like
add,contains,remove, andkeyIndexassumememStart != 0. If someone accidentally calls them afterclose(), they’ll operate on addresses derived from 0. Adding a lightweight guard (e.g.,if (memStart == 0) throw new IllegalStateException("closed")) would make misuse fail fast instead of doing undefined off‑heap access. This is optional but can help future refactors.Also applies to: 83-103, 115-188, 190-257, 259-285
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java(6 hunks)core/src/main/java/io/questdb/std/DirectLongHashSet.java(1 hunks)core/src/test/java/io/questdb/test/griffin/InTest.java(2 hunks)core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java(2 hunks)core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-19T12:21:00.062Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.
Applied to files:
core/src/test/java/io/questdb/test/griffin/InTest.javacore/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java
📚 Learning: 2025-07-31T08:21:17.390Z
Learnt from: mtopolnik
Repo: questdb/questdb PR: 5997
File: core/src/main/java/io/questdb/griffin/engine/functions/array/DoubleArrayAccessFunctionFactory.java:127-131
Timestamp: 2025-07-31T08:21:17.390Z
Learning: In DoubleArrayAccessFunctionFactory.java, overflow validation for array index casting from long to int is performed in the validateIndexArgs() method rather than at each individual cast site. The validateIndexArgs() method is called early in newInstance() to validate all index arguments, making subsequent casts safe.
Applied to files:
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java
🧬 Code graph analysis (4)
core/src/test/java/io/questdb/test/griffin/InTest.java (1)
core/src/main/java/io/questdb/std/Numbers.java (1)
Numbers(44-3523)
core/src/main/java/io/questdb/std/DirectLongHashSet.java (5)
core/src/main/java/io/questdb/std/MemoryTag.java (1)
MemoryTag(27-184)core/src/main/java/io/questdb/std/Numbers.java (1)
Numbers(44-3523)core/src/main/java/io/questdb/std/Unsafe.java (1)
Unsafe(40-542)core/src/main/java/io/questdb/std/Vect.java (1)
Vect(27-501)core/src/main/java/io/questdb/std/Hash.java (1)
Hash(30-229)
core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java (1)
core/src/main/java/io/questdb/std/str/StringSink.java (1)
StringSink(31-194)
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (2)
core/src/main/java/io/questdb/std/DirectLongHashSet.java (1)
DirectLongHashSet(36-286)core/src/main/java/io/questdb/std/Rnd.java (1)
Rnd(39-481)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (4)
core/src/test/java/io/questdb/test/griffin/engine/functions/regex/RegexpReplaceStrFunctionFactoryTest.java (1)
90-100: Dynamic SQL construction viaStringSinkis fineThe looped construction preserves the original SQL structure while avoiding a massive literal, and the comma/infix logic (
i > 0) looks correct. Initial capacity 8192 is just a hint givenStringSink’s growth semantics, so no risk there.core/src/test/java/io/questdb/test/griffin/InTest.java (1)
27-29: Sentinel/null IN regression test is well-targetedThis test covers
INTIN-lists with negative values, zero, and both literalnullandNumbers.INT_NULLvia bind variables, plus asserts the execution plan shows a sorted, null-aware in-set. That’s exactly the scenario that previously failed and should guard the new DirectLongHashSet-based implementation going forward. (Deterministic rnd_* usage is fine here.)Also applies to: 246-305
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java (1)
38-42: Off-heap set usage and lifecycle wiring look correctUsing
DirectLongHashSetremoves the sentinel collision issue and the ownership model is clear:
- Const case: the set is built once, freed on failure in
newInstance, and later freed inInLongConstFunction.close()viaMisc.free(inSet).- Runtime-const case: the set is allocated in the ctor, repopulated in
init()viaclear()+parseToLong, and freed inclose().
getBool()andtoPlan()simply delegate toinSet.contains()/inSet.toSink(), so INT/LONG/null value semantics are now driven by the set rather than magic sentinels.No correctness issues spotted in these paths.
Also applies to: 120-127, 183-217, 220-255
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java (1)
35-272: Good behavioral and edge‑case coverage forDirectLongHashSetThese tests exercise all the important aspects of the new set: duplicates, zero handling, positive/negative/sentinel values, rehashing, removal (including collision cases), and
toSinkordering. The use of try‑with‑resources keeps off‑heap memory safe.
|
hey @kafka1991 could you update PR description to make it use-facing. "fix" will be included in public release notes |
|
✅ Passed checks (3 passed) |
LongHashSet's sentinel value for InLongFunctionIN clauses with INT columns containing -1 return incorrect results.
IN clauses with INT columns containing -1 return incorrect results.IN clauses with INT/LONG columns containing -1 return incorrect results.
fixed |
core/src/main/java/io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java
Outdated
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/std/DirectLongHashSetTest.java
Outdated
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 115 / 122 (94.26%) file detail
|
Fixes a bug where queries using
INclauses with INT/LONG columns containing-1could return incorrect results.Example
Root cause: The internal HashSet implementation used -1 as a special marker value, which conflicted with actual -1 values in the data, causing incorrect query results.
close #6438