Querying#
SQLThunder provides three methods for executing SELECT queries, each optimized for different use cases and data volumes:
Overview#
Method |
Description |
Best For |
|---|---|---|
One-shot SELECT query |
Small to medium result sets |
|
Parallel chunking with LIMIT/OFFSET |
Large tables without a primary key |
|
Key-based pagination |
Very large tables with a sortable key |
query — One-shot SELECT#
SQLThunder.core.client.DBClient.query()
Use this for standard SELECT queries that return manageable amounts of data.
df = client.query("SELECT * FROM trades WHERE symbol = :symbol", args={"symbol": "AAPL"})
or equivalently:
df = client.query("SELECT * FROM trades WHERE symbol = 'AAPL'")
Arguments#
Name |
Default |
Description |
|---|---|---|
|
— |
A valid |
|
|
Query parameters (dict or tuple). |
|
|
|
|
|
Whether to print a preview to stdout. |
|
|
Rows to print if |
Returns#
results
Use case#
Fastest method for complex queries or queries returning up to a couple million rows.
Ideal for filters, aggregations, and data previews.
query_batch — Parallelized Chunked SELECT#
SQLThunder.core.client.DBClient.query_batch()
Splits your SELECT query into chunks using LIMIT and OFFSET and executes them in parallel.
df = client.query_batch("SELECT * FROM orders WHERE client_type = :client_type", args={"client_type": "vip"}, chunk_size=10000, max_workers=10)
or equivalently:
df = client.query_batch("SELECT * FROM orders WHERE client_type = 'vip'", chunk_size=10000, max_workers=10)
Arguments#
Name |
Default |
Description |
|---|---|---|
|
— |
A valid |
|
|
Query parameters (dict or tuple). |
|
|
Rows per thread per fetch. |
|
|
Number of threads. Must be less than |
|
|
|
|
|
Whether to print a preview to stdout. |
|
|
Rows to print if |
|
|
Whether to return a success flag. If a single or more chunk failed, then it becomes |
Returns#
resultsifreturn_statusisFalseTuple of
(results, success_flag)ifreturn_statusisTrue
Why use it?#
Querying very large tables (>3M rows) with no usable key column.
Useful even if speed is not a concern:
Prevents memory overload by streaming in parts.
Reduces risk of query timeouts or network issues.
Threads allow partial failure recovery.
⚠️ Caution#
OFFSET becomes slower with higher chunk index — avoid on huge datasets if possible.
Always ensure your SQL does not include LIMIT or OFFSET manually, otherwise SQLThunder will raise QueryDisallowedClauseError.
For long-running queries, configure timeouts:
MySQL example:#
read_timeout: 120
connect_timeout: 20
PostgreSQL example:#
pg_options: "-c statement_timeout=60000 -c idle_in_transaction_session_timeout=30000"
query_keyed — Key-based Chunked SELECT#
SQLThunder.core.client.DBClient.query_keyed()
Splits the query by incrementing a key column (e.g. id, timestamp), avoiding OFFSET.
df = client.query_keyed(
sql="SELECT * FROM trades WHERE symbol = :symbol",
key_column="id",
key_column_type="int",
start_key=0,
chunk_size=10000,
args={"symbol": "AAPL"}
)
or equivalently:
df = client.query_keyed(
sql="SELECT * FROM trades WHERE symbol = 'AAPL'",
key_column="id",
key_column_type="int",
start_key=0,
chunk_size=10000,
)
Arguments#
Name |
Default |
Description |
|---|---|---|
|
— |
A valid |
|
— |
The column to paginate by. Must be sortable and ideally indexed. |
|
— |
|
|
|
Query parameters (dict or tuple). |
|
|
Lower bound (inclusive). Required for |
|
|
Upper bound (inclusive). Optional. |
|
|
|
|
|
Rows per chunk. |
|
|
|
|
|
Return last key from result set. Useful for resuming pagination. Return value becomes (result, last_key) or (result, success, last_key) |
|
|
Whether to return a success flag. If the query fails before reaching the last key it returns |
|
|
Whether to print a preview to stdout. |
|
|
Rows to print if |
Returns#
resultsifreturn_statusisFalseTuple of
(results, success_flag)ifreturn_statusisTrueTuple of
(results, last_key)ifreturn_last_keyisTrueTuple of
(results, success_flag, last_key)ifreturn_last_keyisTrueandreturn_statusisTrue
Why use it?#
Highly efficient for huge tables (millions+ rows).
Avoids the cost of OFFSET — queries remain fast even in deep pagination.
Recovers partial results on failure.
Returns last key to allow seamless resume or incremental ingestion.
Requirements#
A unique or monotonically increasing key (e.g. primary key, created_at).
No
LIMITorOFFSETin your query, otherwise SQLThunder will raise QueryDisallowedClauseError.Key column type must be explicitly set to validate keys.
Which Should I Use?#
If… |
Use |
|---|---|
Small to medium result set |
|
Large table with no primary key |
|
Large table with indexed primary/sort key |
|
When performance isn’t the goal, querying in chunks helps mitigate:
Database connection timeouts
Network instability
Memory pressure
Large unbuffered result sets
Logging and Error Handling#
By default, SQLThunder does not raise exceptions for query_batch() and query_keyed() during query failures caused by database errors (e.g. invalid table name, malformed SQL, or permission issues). Instead:
The function will return normally.
If
return_status=Truewas set, the second return value (success_flag) will beFalse.The error message will be logged via Python logging.
To monitor query issues, you should configure logging:
from SQLThunder import configure_logging
import logging
configure_logging(level=logging.WARNING)
This will ensure that any query failure (e.g. SQL syntax issues) will be visible in the console logs. If you want to manually raise an exception, you can check the success_flag like so:
result, success = client.query_batch(..., return_status=True)
if not success:
raise RuntimeError("Query failed. Check logs for details.")
The original database error will be available in the logs (stdout/stderr).
Testing and Debugging Tips#
Use
print_result=Trueduring development to inspect query output.Use
print_limit=10to preview the first few rows.Set
return_status=Trueto confirm success without crashing pipelines.Log or store
return_last_keyfromquery_keyed()to resume ingestion jobs.
Next Steps#
API Reference:
queryAPI Reference:
query_batchAPI Reference:
query_keyed