API Reference

class heavyai.Connection(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)
change_dashboard_sources(dashboard: omnisci.thrift.ttypes.TDashboard, remap: dict) omnisci.thrift.ttypes.TDashboard

Change the sources of a dashboard

Parameters
dashboard: TDashboard

The HeavyDB dashboard object to transform

remap: dict

EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a 'name' key holding the new table value. This structure can be used later to support changing column names.

Returns
dashboard: TDashboard

A HeavyDB dashboard with the sources remapped

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> dash = con.get_dashboard(1)
>>> newdash = con.change_dashboard_sources(dash, source_remap)
create_dashboard(dashboard: omnisci.thrift.ttypes.TDashboard) int

Create a new dashboard

Parameters
dashboard: TDashboard

The HeavyDB dashboard object to create

Returns
dashboardid: int

The dashboard id of the new dashboard
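
Examples

A minimal sketch, assuming an existing dashboard is fetched and reused as a template; dashboard_name is a field of the Thrift TDashboard struct:

>>> dash = con.get_dashboard(1)
>>> dash.dashboard_name = 'my new dashboard'
>>> newid = con.create_dashboard(dash)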

create_table(table_name, data, preserve_index=False)

Create a table from a pandas.DataFrame

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to create a column in the table for the DataFrame index
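
Examples

A minimal sketch, assuming an open connection con and that the table name 'foo' (hypothetical) is not already taken:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['x', 'y', 'z']})
>>> con.create_table('foo', df)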

deallocate_ipc(df, device_id=0)

Deallocate a DataFrame using CPU shared memory.

Parameters
df: pandas.DataFrame

The DataFrame to deallocate

device_id: int

GPU which contains the TDataFrame
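
Examples

A minimal sketch, assuming df was returned by select_ipc with release_memory=False:

>>> df = con.select_ipc("SELECT qty FROM stocks", release_memory=False)
>>> con.deallocate_ipc(df)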

deallocate_ipc_gpu(df, device_id=0)

Deallocate a DataFrame using GPU memory.

Parameters
df: cudf.GpuDataFrame

The GPU DataFrame to deallocate

device_id: int

GPU which contains the TDataFrame
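
Examples

A minimal sketch, assuming gdf was returned by select_ipc_gpu with release_memory=False:

>>> gdf = con.select_ipc_gpu("SELECT qty FROM stocks", release_memory=False)
>>> con.deallocate_ipc_gpu(gdf)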

duplicate_dashboard(dashboard_id, new_name=None, source_remap=None)

Duplicate an existing dashboard, returning the new dashboard id.

Parameters
dashboard_id: int

The id of the dashboard to duplicate

new_name: str

The name for the new dashboard

source_remap: dict

EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a 'name' key holding the new table value. This structure can be used later to support changing column names.

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
get_dashboard(dashboard_id)

Return the dashboard object of a specific dashboard

Examples

>>> con.get_dashboard(123)
get_dashboards()

List all the dashboards in the database

Examples

>>> con.get_dashboards()
get_table_details(table_name)

Get the column names and data types associated with a table.

Parameters
table_name: str
Returns
details: List[ColumnDetails]

Examples

>>> con.get_table_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ...
]
get_tables()

List all the tables in the database

Examples

>>> con.get_tables()
['flights_2008_10k', 'stocks']
load_table(table_name, data, method='infer', preserve_index=False, create='infer', column_names=[])

Load data into a table

Parameters
table_name: str
data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
method: {'infer', 'columnar', 'rows', 'arrow'}

Method to use for loading the data. Three options are available

  1. pyarrow and Apache Arrow loader

  2. columnar loader

  3. row-wise loader

The Arrow loader is typically the fastest, followed by the columnar loader, followed by the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader is used. If pyarrow isn't available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.

preserve_index: bool, default False

Whether to keep the index when loading a pandas DataFrame

create: {'infer', True, False}

Whether to issue a CREATE TABLE before inserting the data.

  • infer: check to see if the table already exists, and create a table if it does not

  • True: attempt to create the table, without checking if it exists

  • False: do not attempt to create the table
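
Examples

A minimal sketch; with a pandas DataFrame and pyarrow installed this dispatches to the Arrow loader, creating the table 'foo' (hypothetical) if it does not exist:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table('foo', df)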

load_table_arrow(table_name, data, preserve_index=False, load_column_names=[])

Load a pandas.DataFrame, pyarrow.Table, or pyarrow.RecordBatch into the database using the Arrow columnar format for interchange

Parameters
table_name: str
data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
load_table_columnar(table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False, column_names=[])

Load a pandas DataFrame to the database using HeavyDB’s Thrift-based columnar format

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

chunk_size_bytes: integer, default 0

Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk and send the dataframe as a single request

col_names_from_schema: bool, default False

Read the schema of the existing table in HeavyDB and match its column names to the column names of the dataframe. This is a convenience when loading data whose columns are unordered, and is especially handy when a table has a large number of columns.

Notes

Use pymapd >= 0.11.0 while running with heavydb >= 4.6.0 in order to avoid loading inconsistent values into a DATE column.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
load_table_rowwise(table_name, data, column_names=[])

Load data into a table row-wise

Parameters
table_name: str
data: Iterable of tuples

Each element of data should be a row to be inserted

Examples

>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table_rowwise('bar', data)
render_vega(vega, compression_level=1)

Render vega data on the database backend, returning the image as a PNG.

Parameters
vega: dict

The vega specification to render.

compression_level: int

The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
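
Examples

A minimal sketch, assuming vega_spec is a complete Vega specification dict referencing an existing table; in Jupyter the returned object displays inline as a PNG:

>>> png = con.render_vega(vega_spec, compression_level=3)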

select_ipc(operation, parameters=None, first_n=-1, release_memory=True, transport_method=1)

Execute a SELECT operation using CPU shared memory

Parameters
operation: str

A SQL select statement

parameters: dict, optional

Parameters to insert for a parametrized query

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc(df) after the DataFrame is created

Returns
df: pandas.DataFrame

Notes

This method requires the Python code to be executed on the same machine where HeavyDB is running.
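
Examples

A minimal sketch, run on the same machine as the HeavyDB server:

>>> df = con.select_ipc("SELECT qty FROM stocks")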

select_ipc_gpu(operation, parameters=None, device_id=0, first_n=-1, release_memory=True)

Execute a SELECT operation using GPU memory.

Parameters
operation: str

A SQL statement

parameters: dict, optional

Parameters to insert into a parametrized query

device_id: int

GPU to return results to

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc_gpu(df) after the DataFrame is created

Returns
gdf: cudf.GpuDataFrame

Notes

This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.

This method requires the Python code to be executed on the same machine where HeavyDB is running.
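
Examples

A minimal sketch, assuming cudf is installed and GPU 0 is available on the server machine:

>>> gdf = con.select_ipc_gpu("SELECT qty FROM stocks", device_id=0)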

class heavyai.Cursor(connection)

A database cursor.

property arraysize

The number of rows to fetch at a time with fetchmany. Default 1.

See also

fetchmany
close()

Close this cursor.

property description

Read-only sequence describing columns of the result set. Each column is an instance of Description describing

  • name

  • type_code

  • display_size

  • internal_size

  • precision

  • scale

  • null_ok

We only use name, type_code, and null_ok; the rest are always None.

execute(operation, parameters=None)

Execute a SQL statement.

Parameters
operation: str

A SQL query

parameters: dict

Parameters to substitute into operation.

Returns
self: Cursor

Examples

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

Passing in parameters:

>>> c.execute("select symbol qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
[('RHAT', 100.0), ('IBM', 500.0)]
executemany(operation, parameters)

Execute a SQL statement for many sets of parameters.

Parameters
operation: str
parameters: list of dict
Returns
results: list of lists
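
Examples

A minimal sketch, assuming a table whose columns match the VALUES clause; each dict in parameters supplies one set of bindings (the table and values are hypothetical):

>>> c = conn.cursor()
>>> c.executemany("insert into stocks values (:symbol, :qty)",
...               parameters=[{"symbol": "IBM", "qty": 200},
...                           {"symbol": "RHAT", "qty": 50}])
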
fetchmany(size=None)

Fetch size rows from the result set.

fetchone()

Fetch a single row from the result set.
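
Examples

A minimal sketch of both fetch styles, reusing the stocks data from the execute example above:

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> c.fetchone()
('RHAT', 100.0)
>>> c.fetchmany(size=2)
[('IBM', 1000.0), ('MSFT', 1000.0)]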

heavyai.connect(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)

Create a new Connection.

Parameters
uri: str
user: str
password: str
host: str
port: int
dbname: str
protocol: {'binary', 'http', 'https'}
sessionid: str
bin_cert_validate: bool, optional, binary encrypted connection only

Whether to continue if there is any certificate error

bin_ca_certs: str, optional, binary encrypted connection only

Path to the CA certificate file

idpurl: str

EXPERIMENTAL Enable SAML authentication by providing the logon page of the SAML Identity Provider.

idpformusernamefield: str

The HTML form ID for the username, defaults to ‘username’.

idpformpasswordfield: str

The HTML form ID for the password, defaults to ‘password’.

idpsslverify: bool

Enable / disable certificate checking, defaults to True.

Returns
conn: Connection

Examples

You can pass a string uri, all of the individual components, or an existing sessionid (in which case user, password, and dbname are omitted)

>>> connect('mapd://admin:HyperInteractive@localhost:6274/heavydb?'
...         'protocol=binary')
Connection(mapd://admin:***@localhost:6274/heavydb?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='heavydb')
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=443, idpurl='https://sso.localhost/logon',
...         protocol='https')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')

Exceptions

Define exceptions as specified by the DB API 2.0 spec.

Includes some helper methods for translating thrift exceptions to the ones defined here.
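
Examples

A minimal sketch of DB API style error handling; catching the Error base class covers every subclass below (the failing query is hypothetical):

>>> from omnisci.exceptions import Error
>>> try:
...     c.execute("select * from no_such_table")
... except Error as e:
...     print(e)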

exception omnisci.exceptions.DatabaseError

Raised when the database encounters an error.

exception omnisci.exceptions.Error

Base class for all pymapd errors.

exception omnisci.exceptions.IntegrityError

Raised when the relational integrity of the database is affected.

exception omnisci.exceptions.InterfaceError

Raised whenever you use the pymapd interface incorrectly.

exception omnisci.exceptions.InternalError

Raised for errors internal to the database, e.g. an invalid cursor.

exception omnisci.exceptions.NotSupportedError

Raised when an API not supported by the database is used.

exception omnisci.exceptions.OperationalError

Raised for non-programmer related database errors, e.g. an unexpected disconnect.

exception omnisci.exceptions.ProgrammingError

Raised for programming errors, e.g. syntax errors, table already exists.