Hopsworks comes with a prepackaged Python environment that contains libraries for data engineering, machine learning, and general data science development. You can also install additional packages from several sources, such as PyPI, a Conda channel, or a public or private Git repository. In some cases, these libraries depend on Linux/OS-level packages that must be installed first. It is also important to track how the environment evolves over time.
In Hopsworks 3.4, we introduced new capabilities to help manage the Python environment: running custom commands and tracking the environment history.
The Hopsworks installation ships with a Miniconda environment that comes preinstalled with the most popular libraries in a data scientist's toolkit, including TensorFlow, PyTorch, and scikit-learn. You manage the environment through the Hopsworks Python service, which installs libraries that can then be used in Jupyter notebooks or in the Jobs service of the platform.
Some Python libraries require OS-level libraries to be installed. In other cases, you may need to add more complex configuration to your environment. Both require writing your own commands and executing them on top of the existing environment.
The Python environment is shared by all members of a project. When a member changes the environment, for example by installing or uninstalling a library, a new environment is created and becomes the de facto environment for everyone in the project. It is therefore important to track how the environment has changed over time: which libraries were installed, uninstalled, upgraded, or downgraded, when the environment was created, and who introduced the changes.
In this blog post, we describe how you can run custom commands to install OS-level packages or add extra configuration to the Python environment in Hopsworks. We also show how you can track the changes to your Python environment.
To follow this tutorial, you need an instance of Hopsworks version 3.4 or above.
In this section, we will see how you can run custom bash commands in Hopsworks to configure your Python environment.
In Hopsworks, we maintain a Docker image built on top of the Ubuntu Linux distribution. You can run generic bash commands on top of the project environment from the UI or the REST API.
To use the UI, navigate to the Python environment in the Project settings. On the Python environment page, navigate to custom commands. There you can write the bash commands in the textbox provided. These commands are uploaded and executed when your new environment is built. You can also include build artifacts, e.g., binaries that you would like to execute or include when building the environment.
From the REST API, you provide the HopsFS paths of the bash script and the artifacts, so you must first upload them to the Hopsworks filesystem (HopsFS). The REST API endpoint for running custom commands is hopsworks-api/api/project/<projectId>/python/environments/<pythonVersion>/commands/custom. The request is a POST whose body provides these HopsFS paths.
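As a minimal sketch of the endpoint path only (not the request body), the URL can be assembled from a project ID and a Python version as follows. The helper name `custom_commands_url`, the `https` scheme, the host `hopsworks.example.com`, and the sample values are assumptions for illustration; the body fields and authentication are left to the caller.

```python
from urllib.parse import quote

# Endpoint path as given in the text, with the two placeholders made explicit.
ENDPOINT_TEMPLATE = (
    "hopsworks-api/api/project/{project_id}/python/environments/"
    "{python_version}/commands/custom"
)


def custom_commands_url(host, project_id, python_version):
    """Build the custom-commands URL (hypothetical helper, assumes HTTPS)."""
    path = ENDPOINT_TEMPLATE.format(
        # Percent-encode each value so it cannot break out of its path segment.
        project_id=quote(str(project_id), safe=""),
        python_version=quote(str(python_version), safe=""),
    )
    return f"https://{host.rstrip('/')}/{path}"


if __name__ == "__main__":
    # Example values are made up; substitute your own host, project, and version.
    print(custom_commands_url("hopsworks.example.com", 119, "3.10"))
```

The resulting URL can then be passed to any HTTP client together with your credentials and the POST body.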
Now let’s see an example of how you can install a Linux package, install a Python package, and use artifacts that you included in the commands file that you provide.
The bash script below shows how you can install OS-level packages and use the artifacts included during the build.
Now let’s look at what each command in the script does.
The Python environment evolves over time as libraries are installed, uninstalled, upgraded, and downgraded. To help you keep track of these changes, you can now access the Python environment history via the UI. This feature lets you review the specific changes made in each new iteration of the environment. Hopsworks retains a versioned YAML file for each environment, enabling you to revert to an earlier environment if necessary. To compare environments, click the button shown in Figure 2. This displays the differences between the current environment and the previous one from which it was derived.
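The kind of comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Hopsworks computes its history: the function names `version_key` and `diff_environments`, the package-to-version dictionaries, and the sample versions are all hypothetical, and versions are compared by their numeric dot-separated parts only.

```python
def version_key(version):
    """Turn '1.24.0' into (1, 24, 0); non-numeric parts are ignored (a simplification)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def diff_environments(old, new):
    """Classify the package changes between two {name: version} mappings."""
    changes = {"installed": [], "uninstalled": [], "upgraded": [], "downgraded": []}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes["installed"].append(name)
        elif name not in new:
            changes["uninstalled"].append(name)
        elif old[name] != new[name]:
            newer = version_key(new[name]) > version_key(old[name])
            changes["upgraded" if newer else "downgraded"].append(name)
    return changes


if __name__ == "__main__":
    # Made-up environments, standing in for two consecutive environment versions.
    previous = {"numpy": "1.24.0", "pandas": "2.0.1", "requests": "2.31.0"}
    current = {"numpy": "1.26.0", "pandas": "1.5.3", "scipy": "1.11.0"}
    for kind, names in diff_environments(previous, current).items():
        print(f"{kind}: {names}")
```

A production comparison would use a proper version parser, since pre-release and local version suffixes are not handled here.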
As Figure 3 shows, if an environment was built using custom commands, you can also review those commands in the UI.
In this article, we have shown how you can write and execute custom commands to add more sophisticated configuration to your Python environment. We have also shown how you can track the changes made to your Python environment in the UI.