API Reference#

Client#

class SQLThunder.core.client.DBClient(config_file_path: str, db_type: str | None = None, pool_size: int = 10, max_overflow: int = 5, max_workers: int | None = None)#

Bases: object

DBClient handles all SQL database interactions, including threaded bulk insert/update operations, query batching, and efficient pagination. Compatible with MySQL, PostgreSQL, and SQLite.

close() → None#

Cleanly dispose of the SQLAlchemy engine and shutdown the thread pool.

After calling close(), the instance becomes unusable unless reopen_connection() is called.

Executes a single non-SELECT SQL statement (INSERT, UPDATE, DELETE, CREATE, …) within a transaction.

This method is intended for single-row or non-batch operations. For multi-row or large-scale writes, consider using execute_many() or execute_batch() instead.

Parameters:

sql (str) – SQL statement with optional named placeholders (e.g., :id).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – A single row of bound parameters.
on_duplicate (Optional[str]) – Optional duplicate-handling clause to apply (e.g., “ignore”, “replace”).
return_failures (bool) – If True, returns a DataFrame with error details on failure. Defaults to True.
return_status (bool) – If True, includes a boolean success flag in the return. Defaults to False.

Returns:

Union[

tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]

]: A tuple containing:

A DataFrame of the failed record (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.

Raises:

InvalidSQLOperation – If the SQL or bound arguments are invalid or malformed.
SQLExecutionError – If duplicate-handling logic could not be applied.
DBClientClosedError – If the instance has already been closed.

Executes a SQL operation (INSERT, UPDATE, DELETE) in parallel batches using threads.

The input data is split into chunks and each chunk is executed in a separate thread. The method supports error capture per chunk, duplicate-handling clauses, and optional success/failure reporting.

Parameters:

sql (str) – SQL statement with placeholders (e.g., :id, :value).
args (Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]) – Rows of parameters to bind to the SQL query.
chunk_size (int) – Number of rows per batch. Defaults to 512.
max_workers (Optional[int]) – Maximum number of concurrent threads. Defaults to internal pool size.
on_duplicate (Optional[str]) – Conflict resolution mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, includes failed records with error messages in the result.
return_status (bool) – If True, includes a boolean success flag in the result.

Returns:

Union[

tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]

]: A tuple containing:

A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.

Raises:

InvalidSQLOperation – If the SQL or arguments are invalid or cannot be processed.
BadArgumentsBulk – If no valid rows are provided.
SQLExecutionError – If duplicate-handling clause generation fails.
BaseSQLConversionError – If argument conversion fails during SQL preparation.
UnsupportedMultiThreadedDatabase – If multithreaded writes are attempted on SQLite.
LimitMaxWorkersError – If max_workers exceeds the available thread pool capacity.
DBClientClosedError – If the instance has already been closed.

Executes a bulk non-SELECT SQL operation (INSERT, UPDATE, or DELETE) in a single transaction.

This is an all-or-nothing operation: if any row in the batch fails, the entire transaction is rolled back. Useful for applying a uniform statement across many rows (e.g., bulk inserts with conflict handling).

Parameters:

sql (str) – SQL statement with placeholders (e.g., :id, :value).
args (Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]) – Bound parameters to apply to the SQL query. Can be a list of tuples/dicts or a DataFrame.
on_duplicate (Optional[str]) – Optional duplicate-handling mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, returns a DataFrame of failed rows with error messages on failure.
return_status (bool) – If True, includes a success flag in the return value.

Returns:

Union[

tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]

]: A tuple containing:

A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.

Raises:

InvalidSQLOperation – If the SQL or input arguments are malformed or cannot be converted.
SQLExecutionError – If duplicate-handling clause insertion fails.
BadArgumentsBulk – If no valid rows are provided for execution.
BaseSQLConversionError – If argument conversion fails.
DBClientClosedError – If the instance has already been closed.

insert_batch(df: DataFrame, table_name: str, chunk_size: int = 512, max_workers: int | None = None, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) → tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#

Inserts a pandas DataFrame into a SQL table using concurrent threaded chunking.

This wraps execute_batch() by automatically generating an INSERT statement based on the DataFrame columns and table name. Data is split into chunks and written concurrently using a thread pool.

Parameters:

df (pandas.DataFrame) – The DataFrame containing rows to insert. Columns must match the target table schema.
table_name (str) – Full table name, e.g., “schema.table”.
chunk_size (int) – Number of rows per batch. Defaults to 512.
max_workers (Optional[int]) – Maximum number of concurrent threads. Defaults to internal pool size.
on_duplicate (Optional[str]) – Conflict resolution mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, includes failed records with error messages in the result.
return_status (bool) – If True, includes a boolean success flag in the result.

Returns:

Union[

tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]

]: A tuple containing:

A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.

Raises:

BadArgumentsBulk – If the input DataFrame is empty or invalid.
UnsupportedDatabaseType – If the current database type does not support insert generation.
UnsupportedMultiThreadedDatabase – If multithreaded inserts are attempted on SQLite.
LimitMaxWorkersError – If max_workers exceeds available thread pool capacity.
SQLExecutionError – If duplicate-handling logic insertion fails.
BaseSQLConversionError – If argument conversion fails internally.
DBClientClosedError – If the instance has already been closed.

insert_many(df: DataFrame, table_name: str, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) → tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#

Inserts a pandas DataFrame into a SQL table using a single atomic transaction.

This method builds a parameterized INSERT statement and executes it with the data from the DataFrame. All rows are inserted in a single commit. If any row fails, the entire transaction is rolled back.

Parameters:

df (pandas.DataFrame) – DataFrame containing the rows to insert.
table_name (str) – Target table name, e.g., “schema.table”.
on_duplicate (Optional[str]) – Conflict handling mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, returns a DataFrame of failed rows with error messages on failure.
return_status (bool) – If True, includes a success flag in the return value.

Returns:

Union[

tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]

]: A tuple containing:

A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.

Raises:

InvalidSQLOperation – If SQL insert generation fails or arguments are malformed.
SQLExecutionError – If duplicate-handling clause generation fails.
UnsupportedDatabaseType – If the current database type is unsupported for insert generation.
BadArgumentsBulk – If the input DataFrame is empty or invalid.
BaseSQLConversionError – If argument conversion during delegated insert fails.
DBClientClosedError – If the instance has already been closed.

property is_closed: bool#

Indicates whether the DBClient has been closed.

Returns:: True if close() was called and the instance is now inactive.
Return type:: bool

Executes a single SQL SELECT query with optional bound parameters.

Parameters:

sql (str) – A SQL SELECT statement. May include named placeholders (e.g., :id).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – Parameters to bind to the query. Must be a single dict or tuple, or a list containing exactly one such element.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
print_result (bool) – Whether to print the query result to stdout.
print_limit (int) – Number of rows to display when printing. Only used if print_result is True.

Returns:

The query result in the specified format.

Return type:

Union[pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None]

Raises:

QuerySelectOnlyError – If the SQL statement is not a SELECT query.
InvalidSQLOperation – If the SQL is malformed or multiple argument sets are passed.
QueryExecutionError – If query execution fails.
QueryResultFormatError – If return_type is unsupported.
DBClientClosedError – If the instance has already been closed.

Executes a large SQL SELECT query in parallel chunks using LIMIT and OFFSET.

This method divides a SELECT query into chunks and fetches them concurrently using a thread pool. It stops automatically when a chunk returns no rows. Useful for large data extractions where key-based pagination is not possible.

Parameters:

sql (str) – Base SQL SELECT query (without LIMIT or OFFSET clauses).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]]) – Parameters to bind to the query. Must represent a single row of parameters.
chunk_size (int) – Number of rows to fetch per chunk. Defaults to 10,000.
max_workers (int) – Number of threads to run in parallel. Defaults to 15.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
return_status (bool) – If True, returns a tuple with the result and a success flag.
print_result (bool) – Whether to print a preview of the result to stdout.
print_limit (int) – Number of rows to print if print_result is True.

Returns:

Union[: pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None, tuple[Union[pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None], bool]
]:: The query result in the specified format. If return_status is True, a tuple is returned with a success flag.

Raises:

QuerySelectOnlyError – If the SQL statement is not a SELECT query.
QueryDisallowedClauseError – If LIMIT or OFFSET is present in the SQL.
QueryResultFormatError – If return_type is not one of the supported values.
InvalidSQLOperation – If the SQL or arguments are malformed or incompatible.
LimitMaxWorkersError – If max_workers exceeds the connection pool capacity.
UnsupportedMultiThreadedDatabase – If multithreaded reads are attempted on a SQLite database.
DBClientClosedError – If the instance has already been closed.

Executes a large SQL SELECT query using key-based pagination on a sortable column.

This method chunks a large query by using a monotonically increasing key column, avoiding OFFSET-based pagination for better performance on large tables. It supports resuming, printing, and returning additional metadata.

Parameters:

sql (str) – Base SQL SELECT query (without LIMIT or pagination conditions).
key_column (str) – The column to use as the pagination key.
key_column_type (Literal["int", "string", "date"]) – Type of the key column for proper formatting and validation.
start_key (Optional[Union[int, datetime.datetime, str]]) – Exclusive lower bound key to start from.
end_key (Optional[Union[int, datetime.datetime, str]]) – Inclusive upper bound key to stop at.
order (Literal["asc", "desc"]) – Sort direction for pagination. Defaults to “asc”.
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – Parameters to bind to the SQL query. Must be a single dict/tuple or a list containing one such element.
chunk_size (int) – Number of rows to fetch per chunk. Defaults to 10,000.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
print_result (bool) – Whether to print a preview of the query result.
print_limit (int) – Number of rows to preview if printing is enabled.
return_last_key (bool) – Whether to return the last key seen in the result set.
return_status (bool) – Whether to return a boolean indicating query success.

Returns:

Union[: pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None, tuple[result, last_key], tuple[result, success], tuple[result, success, last_key]
]:: The query result in the specified format. Optionally includes the last key and/or a success flag.

Raises:

QuerySelectOnlyError – If the SQL is not a SELECT statement.
QueryDisallowedClauseError – If LIMIT or OFFSET is present in the SQL.
QueryResultFormatError – If return_type is not one of the supported formats.
InvalidSQLOperation – If the key column type or bound arguments are invalid, or if required key values are missing or incorrectly typed.
DBClientClosedError – If the instance has already been closed.

reopen_connection() → None#

Reopen a previously closed DBClient by recreating the engine and thread pool.

Useful for long-running applications where connections might be dropped or closed.

Raises:: ReopenConnectionError – If reconnection fails after engine creation.

test_connection() → bool#

Checks if the database connection is currently valid.

Returns:: True if connection test succeeds, False otherwise.
Return type:: bool

API Reference#

Client#

This Page