API Reference

class heavyai.Connection(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)
change_dashboard_sources(dashboard: omnisci.thrift.ttypes.TDashboard, remap: dict) omnisci.thrift.ttypes.TDashboard

Change the sources of a dashboard

Parameters
dashboard: TDashboard

The HeavyDB dashboard object to transform

remap: dict

EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a 'name' key holding the new table value. This structure can be used later to support changing column names.

Returns
dashboard: TDashboard

A HeavyDB dashboard with the sources remapped

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> dash = con.get_dashboard(1)
>>> newdash = con.change_dashboard_sources(dash, source_remap)
create_dashboard(dashboard: omnisci.thrift.ttypes.TDashboard) int

Create a new dashboard

Parameters
dashboard: TDashboard

The HeavyDB dashboard object to create

Returns
dashboardid: int

The dashboard id of the new dashboard
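
Examples

A minimal sketch, assuming an existing dashboard is fetched and reused as a template; dashboard_name is a field of the Thrift TDashboard struct:

>>> dash = con.get_dashboard(1)
>>> dash.dashboard_name = 'my new dashboard'
>>> newid = con.create_dashboard(dash)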

create_table(table_name, data, preserve_index=False)

Create a table from a pandas.DataFrame

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to create a column in the table for the DataFrame index
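
Examples

A minimal sketch, assuming an open connection con and that the table name 'foo' (hypothetical) is not already taken:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['x', 'y', 'z']})
>>> con.create_table('foo', df)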

deallocate_ipc(df, device_id=0)

Deallocate a DataFrame using CPU shared memory.

Parameters
df: pandas.DataFrame

The DataFrame to deallocate

device_id: int

GPU which contains the TDataFrame
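
Examples

A minimal sketch, assuming df was returned by select_ipc with release_memory=False:

>>> df = con.select_ipc("SELECT qty FROM stocks", release_memory=False)
>>> con.deallocate_ipc(df)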

deallocate_ipc_gpu(df, device_id=0)

Deallocate a DataFrame using GPU memory.

Parameters
df: cudf.GpuDataFrame

The GPU DataFrame to deallocate

device_id: int

GPU which contains the TDataFrame
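
Examples

A minimal sketch, assuming gdf was returned by select_ipc_gpu with release_memory=False:

>>> gdf = con.select_ipc_gpu("SELECT qty FROM stocks", release_memory=False)
>>> con.deallocate_ipc_gpu(gdf)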

duplicate_dashboard(dashboard_id, new_name=None, source_remap=None)

Duplicate an existing dashboard, returning the new dashboard id.

Parameters
dashboard_id: int

The id of the dashboard to duplicate

new_name: str

The name for the new dashboard

source_remap: dict

EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a 'name' key holding the new table value. This structure can be used later to support changing column names.

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
get_dashboard(dashboard_id)

Return the dashboard object of a specific dashboard

Examples

>>> con.get_dashboard(123)
get_dashboards()

List all the dashboards in the database

Examples

>>> con.get_dashboards()
get_table_details(table_name)

Get the column names and data types associated with a table.

Parameters
table_name: str
Returns
details: List[ColumnDetails]

Examples

>>> con.get_table_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ...
]
get_tables()

List all the tables in the database

Examples

>>> con.get_tables()
['flights_2008_10k', 'stocks']
load_table(table_name, data, method='infer', preserve_index=False, create='infer', column_names=[])

Load data into a table

Parameters
table_name: str
data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
method: {'infer', 'columnar', 'rows', 'arrow'}

Method to use for loading the data. Three options are available

  1. pyarrow and Apache Arrow loader

  2. columnar loader

  3. row-wise loader

The Arrow loader is typically the fastest, followed by the columnar loader, followed by the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader is used. If pyarrow isn't available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.

preserve_index: bool, default False

Whether to keep the index when loading a pandas DataFrame

create: {'infer', True, False}

Whether to issue a CREATE TABLE before inserting the data.

  • infer: check to see if the table already exists, and create a table if it does not

  • True: attempt to create the table, without checking if it exists

  • False: do not attempt to create the table
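
Examples

A minimal sketch; with a pandas DataFrame and pyarrow installed this dispatches to the Arrow loader, creating the table 'foo' (hypothetical) if it does not exist:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table('foo', df)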

load_table_arrow(table_name, data, preserve_index=False, load_column_names=[])

Load a pandas.DataFrame, pyarrow.Table, or pyarrow.RecordBatch into the database using the Arrow columnar format for interchange

Parameters
table_name: str
data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
load_table_columnar(table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False, column_names=[])

Load a pandas DataFrame to the database using HeavyDB’s Thrift-based columnar format

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

chunk_size_bytes: integer, default 0

Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk and send the dataframe as a single request

col_names_from_schema: bool, default False

Read the schema of the existing table in HeavyDB and match its column names to the column names of the dataframe. This is a convenience when loading data whose columns are unordered, and is especially handy when a table has a large number of columns.

Notes

Use pymapd >= 0.11.0 while running with heavydb >= 4.6.0 in order to avoid loading inconsistent values into a DATE column.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
load_table_rowwise(table_name, data, column_names=[])

Load data into a table row-wise

Parameters
table_name: str
data: Iterable of tuples

Each element of data should be a row to be inserted

Examples

>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table_rowwise('bar', data)
render_vega(vega, compression_level=1)

Render vega data on the database backend, returning the image as a PNG.

Parameters
vega: dict

The vega specification to render.

compression_level: int

The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
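
Examples

A minimal sketch, assuming vega_spec is a complete Vega specification dict referencing an existing table; in Jupyter the returned object displays inline as a PNG:

>>> png = con.render_vega(vega_spec, compression_level=3)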

select_ipc(operation, parameters=None, first_n=-1, release_memory=True, transport_method=1)

Execute a SELECT operation using CPU shared memory

Parameters
operation: str

A SQL select statement

parameters: dict, optional

Parameters to insert for a parametrized query

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc(df) after the DataFrame is created

Returns
df: pandas.DataFrame

Notes

This method requires the Python code to be executed on the same machine where HeavyDB is running.
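
Examples

A minimal sketch, run on the same machine as the HeavyDB server:

>>> df = con.select_ipc("SELECT qty FROM stocks")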

select_ipc_gpu(operation, parameters=None, device_id=0, first_n=-1, release_memory=True)

Execute a SELECT operation using GPU memory.

Parameters
operation: str

A SQL statement

parameters: dict, optional

Parameters to insert into a parametrized query

device_id: int

GPU to return results to

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc_gpu(df) after the DataFrame is created

Returns
gdf: cudf.GpuDataFrame

Notes

This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.

This method requires the Python code to be executed on the same machine where HeavyDB is running.
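
Examples

A minimal sketch, assuming cudf is installed and GPU 0 is available on the server machine:

>>> gdf = con.select_ipc_gpu("SELECT qty FROM stocks", device_id=0)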

class heavyai.Cursor(connection)

A database cursor.

property arraysize

The number of rows to fetch at a time with fetchmany. Default 1.

See also

fetchmany
close()

Close this cursor.

property description

Read-only sequence describing columns of the result set. Each column is an instance of Description describing

  • name

  • type_code

  • display_size

  • internal_size

  • precision

  • scale

  • null_ok

We only use name, type_code, and null_ok; the rest are always None.

execute(operation, parameters=None)

Execute a SQL statement.

Parameters
operation: str

A SQL query

parameters: dict

Parameters to substitute into operation.

Returns
self: Cursor

Examples

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

Passing in parameters:

>>> c.execute("select symbol qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
[('RHAT', 100.0), ('IBM', 500.0)]
executemany(operation, parameters)

Execute a SQL statement for many sets of parameters.

Parameters
operation: str
parameters: list of dict
Returns
results: list of lists
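
Examples

A minimal sketch, assuming a table whose columns match the VALUES clause; each dict in parameters supplies one set of bindings (the table and values are hypothetical):

>>> c = conn.cursor()
>>> c.executemany("insert into stocks values (:symbol, :qty)",
...               parameters=[{"symbol": "IBM", "qty": 200},
...                           {"symbol": "RHAT", "qty": 50}])
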
fetchmany(size=None)

Fetch size rows from the result set.

fetchone()

Fetch a single row from the result set.
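
Examples

A minimal sketch of both fetch styles, reusing the stocks data from the execute example above:

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> c.fetchone()
('RHAT', 100.0)
>>> c.fetchmany(size=2)
[('IBM', 1000.0), ('MSFT', 1000.0)]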

heavyai.connect(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)

Create a new Connection.

Parameters
uri: str
user: str
password: str
host: str
port: int
dbname: str
protocol: {'binary', 'http', 'https'}
sessionid: str
bin_cert_validate: bool, optional, binary encrypted connection only

Whether to continue if there is any certificate error

bin_ca_certs: str, optional, binary encrypted connection only

Path to the CA certificate file

idpurl: str

EXPERIMENTAL Enable SAML authentication by providing the logon page of the SAML Identity Provider.

idpformusernamefield: str

The HTML form ID for the username, defaults to ‘username’.

idpformpasswordfield: str

The HTML form ID for the password, defaults to ‘password’.

idpsslverify: bool

Enable / disable certificate checking, defaults to True.

Returns
conn: Connection

Examples

You can pass a string uri, all of the individual components, or an existing sessionid (in which case user, password, and dbname are omitted)

>>> connect('mapd://admin:HyperInteractive@localhost:6274/heavydb?'
...         'protocol=binary')
Connection(mapd://admin:***@localhost:6274/heavydb?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='heavydb')
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=443, idpurl='https://sso.localhost/logon',
...         protocol='https')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')

Exceptions

Define exceptions as specified by the DB API 2.0 spec.

Includes some helper methods for translating thrift exceptions to the ones defined here.
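
Examples

A minimal sketch of DB API style error handling; catching the Error base class covers every subclass below (the failing query is hypothetical):

>>> from omnisci.exceptions import Error
>>> try:
...     c.execute("select * from no_such_table")
... except Error as e:
...     print(e)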

exception omnisci.exceptions.DatabaseError

Raised when the database encounters an error.

exception omnisci.exceptions.Error

Base class for all pymapd errors.

exception omnisci.exceptions.IntegrityError

Raised when the relational integrity of the database is affected.

exception omnisci.exceptions.InterfaceError

Raised whenever you use the pymapd interface incorrectly.

exception omnisci.exceptions.InternalError

Raised for errors internal to the database, e.g. an invalid cursor.

exception omnisci.exceptions.NotSupportedError

Raised when an API not supported by the database is used.

exception omnisci.exceptions.OperationalError

Raised for non-programmer related database errors, e.g. an unexpected disconnect.

exception omnisci.exceptions.ProgrammingError

Raised for programming errors, e.g. syntax errors, table already exists.