Contributing to heavyai
As an open-source company, Heavy.AI welcomes contributions to all of its open-source repositories, including heavyai. All discussion and development take place via the heavyai GitHub repository.
It is suggested, but not required, that you create a GitHub issue before contributing a feature or bug fix. This lets other developers know that you are working on the feature or issue, and lets internal Heavy.AI experts help you navigate any database-specific logic that may not be obvious within heavyai. All patches should be submitted as pull requests; once a pull request passes the test suite and review by Heavy.AI, it will be merged to master for release as part of the next package release cycle.
Development Environment Setup
heavyai is written in plain Python 3 (i.e. no Cython), and as such, doesn’t require any specialized development environment outside of installing the dependencies. However, we do suggest creating a new conda development environment with the provided conda environment.yml file to ensure that your changes work without relying on unspecified system-level Python packages.
Two development environment files are provided: one to provide the packages needed to develop on CPU only, and the other to provide GPU development packages. Only one is required, but you may decide to use both in order to run pytest against a CPU or GPU environment.
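A heavyai development environment can be set up with the following commands. The first set uses the CPU-only environment file and the second uses the GPU environment file: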
    # clone heavyai repo
    git clone https://github.com/heavyai/heavyai.git && cd heavyai
    conda env create -f ci/environment.yml
    # ensure you have activated the environment
    conda activate heavyai-dev
    # install pre-commit hooks
    make develop
    # from the heavyai project root
    conda env create -f ci/environment_gpu.yml
    # ensure you have activated the environment
    conda activate heavyai-gpu-dev
    # install pre-commit hooks
    make develop
At this point, you have everything you need to develop heavyai. However, to run the test suite, you need to be running an instance of HeavyDB on the same machine you are developing on. Heavy.AI provides Docker images that work well for this purpose.
Docker Environment Setup
HeavyDB Core CPU-only
Unless you are planning on developing GPU-specific functionality in heavyai, using the CPU image is enough to run the test suite:
    docker run \
      -d \
      --name heavyai \
      -p 6274:6274 \
      -p 6278:6278 \
      --ipc=host \
      -v /home/<username>/heavydb-storage:/heavydb-storage \
      heavyai/core-os-cpu
With the above code, we:
- create/run an instance of HeavyDB Core CPU as a daemon (i.e. running in the background until stopped)
- forward ports 6274 (binary connection) and 6278 (HTTP connection)
- set ipc=host for testing shared memory/IPC functionality
- point to a local directory to store data loaded to HeavyDB. This allows our container to be ephemeral.
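Before running the test suite, it can help to confirm that the container is actually accepting connections on the forwarded ports. The following is a minimal sketch using only the Python standard library; the helper name port_is_open and the use of localhost are assumptions for illustration and are not part of heavyai.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports come from the docker run command; "localhost" is an assumption.
    for port in (6274, 6278):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```

An open port only shows that something is listening; it does not prove that HeavyDB has finished starting up, so a failing test run may still call for checking the container logs.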
To run the test suite, call pytest from the top-level heavyai folder:
(heavyai_dev) laptop:~/github_work/heavyai$ pytest
pytest will run through the test suite, running the tests against the Docker container. Because we are using the CPU-only image, the test suite skips the GPU tests, and you can expect to see the following messages at the end of the test suite run:
    =============================================== short test summary info ================================================
    SKIPPED  tests/test_data_no_nulls_gpu.py:15: No GPU available
    SKIPPED  tests/test_deallocate.py:34: No GPU available
    SKIPPED  tests/test_deallocate.py:54: deallocate non-functional in recent distros
    SKIPPED  tests/test_deallocate.py:67: No GPU available
    SKIPPED  tests/test_deallocate.py:80: deallocate non-functional in recent distros
    SKIPPED  tests/test_deallocate.py:92: No GPU available
    SKIPPED  tests/test_deallocate.py:105: deallocate non-functional in recent distros
    SKIPPED  tests/test_integration.py:207: No GPU available
    SKIPPED  tests/test_integration.py:238: No GPU available
    ================================== 69 passed, 13 skipped, 1 warnings in 19.40 seconds ==================================
HeavyDB Core GPU-enabled
To run the heavyai test suite with the GPU tests, the workflow is pretty much the same as CPU-only, except with the HeavyDB Core GPU-enabled container:
    docker run \
      --runtime=nvidia \
      -d \
      --name heavyai \
      -p 6274:6274 \
      -p 6278:6278 \
      --ipc=host \
      -v /home/<username>/heavydb-storage:/heavydb-storage \
      heavyai/core-os-cuda
You also need to install cudf in your development environment. Because cudf is in active development, and requires attention to the specific version of CUDA installed, we recommend checking the cudf documentation to get the most up-to-date installation instructions.
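A quick way to check whether cudf is visible in the active environment, without paying the cost of importing it, is sketched below. The helper name module_available is hypothetical; it relies only on importlib from the standard library and says nothing about whether the installed cudf build matches your CUDA version.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("cudf"):
        print("cudf found in this environment")
    else:
        print("cudf not found; see the cudf documentation for install steps")
```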
Updating Apache Thrift Bindings
When the upstream HeavyDB project updates its Apache Thrift definition file, the bindings shipped with heavyai need to be regenerated. Note that the heavydb repository must be cloned locally.
    # Clone the heavydb repository
    git clone https://github.com/heavyai/heavydb
    # Ensure you are at the root of the heavydb directory.
    cd ./heavydb
    # Use Thrift to generate the Python bindings
    thrift -gen py -r heavy.thrift
    # Copy the generated bindings to the heavyai root
    cp -r ./gen-py/heavydb/* ../heavyai/heavydb/
Updating the Documentation
The documentation for heavyai is generated by ReadTheDocs on each commit. Some pages (such as this one) are manually created; others, such as the API Reference, are generated from the docstrings of each method.
If you are planning on making non-trivial changes to the documentation and want to preview the result before making a commit, you need to install sphinx and sphinx-rtd-theme into your development environment:
    pip install sphinx sphinx-rtd-theme
Once you have sphinx installed, build the documentation by switching to the heavyai/docs directory and running make html. This updates the documentation in the heavyai/docs/build/html directory. From that directory, index.html can be opened in the browser. Run make html each time you save a file to see the changes in the documentation.
Publishing a new package version
heavyai doesn’t currently follow a rigid release schedule; rather, we release a new version when enough new functionality has accumulated or a sufficiently serious bug/issue is fixed. heavyai is distributed via PyPI and conda-forge.
To publish to PyPI, we use flit in the CI. When a new tag is pushed, the package is built and published on PyPI. Be sure that the version in pyproject.toml matches the tag.
Authorized users can also publish a new version locally:
    conda install flit
    flit build
    flit publish
The release process for conda-forge is triggered by creating a new version number on the heavyai GitHub repository. Given the volume of packages released on conda-forge, it can take several hours for the bot to open a PR on heavyai-feedstock. Nothing can be done to speed this up, so just be patient.
When the conda-forge bot opens a PR on the heavyai-feedstock repo, one of the feedstock maintainers needs to validate the correctness of the PR, check the accuracy of the package versions in the meta.yaml recipe file, and then merge once the CI tests pass.