API Reference#
Client#
- class SQLThunder.core.client.DBClient(config_file_path: str, db_type: str | None = None, pool_size: int = 10, max_overflow: int = 5, max_workers: int | None = None)#
Bases:
objectDBClient handles all SQL database interactions, including threaded bulk insert/update operations, query batching, and efficient pagination. Compatible with MySQL, PostgreSQL, and SQLite.
- close() None#
Cleanly dispose of the SQLAlchemy engine and shutdown the thread pool.
After calling close(), the instance becomes unusable unless reopen_connection() is called.
- execute(sql: str, args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | None = None, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#
Executes a single non-SELECT SQL statement (INSERT, UPDATE, DELETE, CREATE, …) within a transaction.
This method is intended for single-row or non-batch operations. For multi-row or large-scale writes, consider using execute_many() or execute_batch() instead.
- Parameters:
sql (str) – SQL statement with optional named placeholders (e.g., :id).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – A single row of bound parameters.
on_duplicate (Optional[str]) – Optional duplicate-handling clause to apply (e.g., “ignore”, “replace”).
return_failures (bool) – If True, returns a DataFrame with error details on failure. Defaults to True.
return_status (bool) – If True, includes a boolean success flag in the return. Defaults to False.
- Returns:
- Union[
tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]
- ]: A tuple containing:
A DataFrame of the failed record (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.
- Raises:
InvalidSQLOperation – If the SQL or bound arguments are invalid or malformed.
SQLExecutionError – If duplicate-handling logic could not be applied.
DBClientClosedError – If the instance has already been closed.
- execute_batch(sql: str, args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | DataFrame, chunk_size: int = 512, max_workers: int | None = None, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#
Executes a SQL operation (INSERT, UPDATE, DELETE) in parallel batches using threads.
The input data is split into chunks and each chunk is executed in a separate thread. The method supports error capture per chunk, duplicate-handling clauses, and optional success/failure reporting.
- Parameters:
sql (str) – SQL statement with placeholders (e.g., :id, :value).
args (Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]) – Rows of parameters to bind to the SQL query.
chunk_size (int) – Number of rows per batch. Defaults to 512.
max_workers (Optional[int]) – Maximum number of concurrent threads. Defaults to internal pool size.
on_duplicate (Optional[str]) – Conflict resolution mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, includes failed records with error messages in the result.
return_status (bool) – If True, includes a boolean success flag in the result.
- Returns:
- Union[
tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]
- ]: A tuple containing:
A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.
- Raises:
InvalidSQLOperation – If the SQL or arguments are invalid or cannot be processed.
BadArgumentsBulk – If no valid rows are provided.
SQLExecutionError – If duplicate-handling clause generation fails.
BaseSQLConversionError – If argument conversion fails during SQL preparation.
UnsupportedMultiThreadedDatabase – If multithreaded writes are attempted on SQLite.
LimitMaxWorkersError – If max_workers exceeds the available thread pool capacity.
DBClientClosedError – If the instance has already been closed.
- execute_many(sql: str, args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | DataFrame, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#
Executes a bulk non-SELECT SQL operation (INSERT, UPDATE, or DELETE) in a single transaction.
This is an all-or-nothing operation: if any row in the batch fails, the entire transaction is rolled back. Useful for applying a uniform statement across many rows (e.g., bulk inserts with conflict handling).
- Parameters:
sql (str) – SQL statement with placeholders (e.g., :id, :value).
args (Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]) – Bound parameters to apply to the SQL query. Can be a list of tuples/dicts or a DataFrame.
on_duplicate (Optional[str]) – Optional duplicate-handling mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, returns a DataFrame of failed rows with error messages on failure.
return_status (bool) – If True, includes a success flag in the return value.
- Returns:
- Union[
tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]
- ]: A tuple containing:
A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.
- Raises:
InvalidSQLOperation – If the SQL or input arguments are malformed or cannot be converted.
SQLExecutionError – If duplicate-handling clause insertion fails.
BadArgumentsBulk – If no valid rows are provided for execution.
BaseSQLConversionError – If argument conversion fails.
DBClientClosedError – If the instance has already been closed.
- insert_batch(df: DataFrame, table_name: str, chunk_size: int = 512, max_workers: int | None = None, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#
Inserts a pandas DataFrame into a SQL table using concurrent threaded chunking.
This wraps execute_batch() by automatically generating an INSERT statement based on the DataFrame columns and table name. Data is split into chunks and written concurrently using a thread pool.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing rows to insert. Columns must match the target table schema.
table_name (str) – Full table name, e.g., “schema.table”.
chunk_size (int) – Number of rows per batch. Defaults to 512.
max_workers (Optional[int]) – Maximum number of concurrent threads. Defaults to internal pool size.
on_duplicate (Optional[str]) – Conflict resolution mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, includes failed records with error messages in the result.
return_status (bool) – If True, includes a boolean success flag in the result.
- Returns:
- Union[
tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]
- ]: A tuple containing:
A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.
- Raises:
BadArgumentsBulk – If the input DataFrame is empty or invalid.
UnsupportedDatabaseType – If the current database type does not support insert generation.
UnsupportedMultiThreadedDatabase – If multithreaded inserts are attempted on SQLite.
LimitMaxWorkersError – If max_workers exceeds available thread pool capacity.
SQLExecutionError – If duplicate-handling logic insertion fails.
BaseSQLConversionError – If argument conversion fails internally.
DBClientClosedError – If the instance has already been closed.
- insert_many(df: DataFrame, table_name: str, on_duplicate: str | None = None, return_failures: bool = True, return_status: bool = False) tuple[DataFrame, bool] | tuple[DataFrame, None] | tuple[None, bool] | tuple[None, None]#
Inserts a pandas DataFrame into a SQL table using a single atomic transaction.
This method builds a parameterized INSERT statement and executes it with the data from the DataFrame. All rows are inserted in a single commit. If any row fails, the entire transaction is rolled back.
- Parameters:
df (pandas.DataFrame) – DataFrame containing the rows to insert.
table_name (str) – Target table name, e.g., “schema.table”.
on_duplicate (Optional[str]) – Conflict handling mode (“ignore”, “replace”, or None).
return_failures (bool) – If True, returns a DataFrame of failed rows with error messages on failure.
return_status (bool) – If True, includes a success flag in the return value.
- Returns:
- Union[
tuple[pandas.DataFrame, bool], tuple[pandas.DataFrame, None], tuple[None, bool], tuple[None, None]
- ]: A tuple containing:
A DataFrame of failed records (if any, else empty DataFrame, and return_failures is True) or None.
A success flag (if return_status is True), otherwise None.
- Raises:
InvalidSQLOperation – If SQL insert generation fails or arguments are malformed.
SQLExecutionError – If duplicate-handling clause generation fails.
UnsupportedDatabaseType – If the current database type is unsupported for insert generation.
BadArgumentsBulk – If the input DataFrame is empty or invalid.
BaseSQLConversionError – If argument conversion during delegated insert fails.
DBClientClosedError – If the instance has already been closed.
- property is_closed: bool#
Indicates whether the DBClient has been closed.
- Returns:
True if close() was called and the instance is now inactive.
- Return type:
bool
- query(sql: str, args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | None = None, return_type: Literal['df', 'raw', 'list', 'none'] = 'df', print_result: bool = False, print_limit: int = 5) DataFrame | list[dict[str, Any]] | Sequence[Row] | None#
Executes a single SQL SELECT query with optional bound parameters.
- Parameters:
sql (str) – A SQL SELECT statement. May include named placeholders (e.g., :id).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – Parameters to bind to the query. Must be a single dict or tuple, or a list containing exactly one such element.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
print_result (bool) – Whether to print the query result to stdout.
print_limit (int) – Number of rows to display when printing. Only used if print_result is True.
- Returns:
The query result in the specified format.
- Return type:
Union[pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None]
- Raises:
QuerySelectOnlyError – If the SQL statement is not a SELECT query.
InvalidSQLOperation – If the SQL is malformed or multiple argument sets are passed.
QueryExecutionError – If query execution fails.
QueryResultFormatError – If return_type is unsupported.
DBClientClosedError – If the instance has already been closed.
- query_batch(sql: str, args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | DataFrame | None = None, chunk_size: int = 10000, max_workers: int = 15, return_type: Literal['df', 'raw', 'list', 'none'] = 'df', return_status: bool = False, print_result: bool = False, print_limit: int = 10) DataFrame | list[dict[str, Any]] | Sequence[Row] | None | tuple[DataFrame | list[dict[str, Any]] | Sequence[Row] | None, bool]#
Executes a large SQL SELECT query in parallel chunks using LIMIT and OFFSET.
This method divides a SELECT query into chunks and fetches them concurrently using a thread pool. It stops automatically when a chunk returns no rows. Useful for large data extractions where key-based pagination is not possible.
- Parameters:
sql (str) – Base SQL SELECT query (without LIMIT or OFFSET clauses).
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any], pandas.DataFrame]]) – Parameters to bind to the query. Must represent a single row of parameters.
chunk_size (int) – Number of rows to fetch per chunk. Defaults to 10,000.
max_workers (int) – Number of threads to run in parallel. Defaults to 15.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
return_status (bool) – If True, returns a tuple with the result and a success flag.
print_result (bool) – Whether to print a preview of the result to stdout.
print_limit (int) – Number of rows to print if print_result is True.
- Returns:
- Union[
pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None, tuple[Union[pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None], bool]
- ]:
The query result in the specified format. If return_status is True, a tuple is returned with a success flag.
- Raises:
QuerySelectOnlyError – If the SQL statement is not a SELECT query.
QueryDisallowedClauseError – If LIMIT or OFFSET is present in the SQL.
QueryResultFormatError – If return_type is not one of the supported values.
InvalidSQLOperation – If the SQL or arguments are malformed or incompatible.
LimitMaxWorkersError – If max_workers exceeds the connection pool capacity.
UnsupportedMultiThreadedDatabase – If multithreaded reads are attempted on a SQLite database.
DBClientClosedError – If the instance has already been closed.
- query_keyed(sql: str, key_column: str, key_column_type: Literal['int', 'string', 'date'], start_key: int | datetime | str | None = None, end_key: int | datetime | str | None = None, order: Literal['asc', 'desc'] = 'asc', args: list[tuple[Any, ...]] | list[dict[str, Any]] | tuple[Any, ...] | dict[str, Any] | None = None, chunk_size: int = 10000, return_type: Literal['df', 'raw', 'list', 'none'] = 'df', print_result: bool = False, print_limit: int = 10, return_last_key: bool = False, return_status: bool = False) DataFrame | list[dict[str, Any]] | Sequence[Row] | None | tuple[DataFrame | list[dict[str, Any]] | Sequence[Row] | None, Any] | tuple[DataFrame | list[dict[str, Any]] | Sequence[Row] | None, bool] | tuple[DataFrame | list[dict[str, Any]] | Sequence[Row] | None, bool, Any]#
Executes a large SQL SELECT query using key-based pagination on a sortable column.
This method chunks a large query by using a monotonically increasing key column, avoiding OFFSET-based pagination for better performance on large tables. It supports resuming, printing, and returning additional metadata.
- Parameters:
sql (str) – Base SQL SELECT query (without LIMIT or pagination conditions).
key_column (str) – The column to use as the pagination key.
key_column_type (Literal["int", "string", "date"]) – Type of the key column for proper formatting and validation.
start_key (Optional[Union[int, datetime.datetime, str]]) – Exclusive lower bound key to start from.
end_key (Optional[Union[int, datetime.datetime, str]]) – Inclusive upper bound key to stop at.
order (Literal["asc", "desc"]) – Sort direction for pagination. Defaults to “asc”.
args (Optional[Union[list[tuple[Any, ...]], list[dict[str, Any]], tuple[Any, ...], dict[str, Any]]]) – Parameters to bind to the SQL query. Must be a single dict/tuple or a list containing one such element.
chunk_size (int) – Number of rows to fetch per chunk. Defaults to 10,000.
return_type (Literal["df", "raw", "list", "none"]) – Format of the returned result. One of: - “df”: Return a pandas.DataFrame (default) - “list”: Return a list of dictionaries - “raw”: Return a list of SQLAlchemy Row objects - “none”: Return None
print_result (bool) – Whether to print a preview of the query result.
print_limit (int) – Number of rows to preview if printing is enabled.
return_last_key (bool) – Whether to return the last key seen in the result set.
return_status (bool) – Whether to return a boolean indicating query success.
- Returns:
- Union[
pandas.DataFrame, list[dict[str, Any]], Sequence[Row], None, tuple[result, last_key], tuple[result, success], tuple[result, success, last_key]
- ]:
The query result in the specified format. Optionally includes the last key and/or a success flag.
- Raises:
QuerySelectOnlyError – If the SQL is not a SELECT statement.
QueryDisallowedClauseError – If LIMIT or OFFSET is present in the SQL.
QueryResultFormatError – If return_type is not one of the supported formats.
InvalidSQLOperation – If the key column type or bound arguments are invalid, or if required key values are missing or incorrectly typed.
DBClientClosedError – If the instance has already been closed.
- reopen_connection() None#
Reopen a previously closed DBClient by recreating the engine and thread pool.
Useful for long-running applications where connections might be dropped or closed.
- Raises:
ReopenConnectionError – If reconnection fails after engine creation.
- test_connection() bool#
Checks if the database connection is currently valid.
- Returns:
True if connection test succeeds, False otherwise.
- Return type:
bool