Extension autoloading¶

Under hamilton.plugins, there are many modules named *_extensions (e.g., hamilton.plugins.pandas_extensions, hamilton.plugins.mlflow_extensions). They implement Hamilton features for 3rd party libraries, including @extract_columns, materializers (to.parquet, from_.mlflow), and more.

Autoloading behavior¶

By default, Hamilton attempts to load all extensions one-by-one. This means that as you have more Python packages in your environment (e.g., pandas, pyspark, mlflow, xgboost), importing Hamilton appears to become slower because it actually imports many packages.

This behavior can be less desirable when your Hamilton dataflow doesn’t use any of these packages, but you need them in your Python environment nonetheless. For example, if only pandas is needed for your dataflow, but you have mlflow and xgboost in your environment their respective extensions will be loaded each time.

Disable autoloading¶

Disabling extension autoloading allows to import Hamilton without any extensions, which can reduce import time from 2-3 sec to less than 0.5 sec. This speedup is welcomed when you need to restart a notebook’s kernel often or you’re operating in a low RAM environment (some Python packages are larger than 50Mbs).

There are three ways to opt-out: programmatically, environment variables, configuration file. You must opt-out before having any other hamilton import.

1. Programmatically¶

from hamilton import registry
registry.disable_autoload()

2. Environment variables¶

From the console

export HAMILTON_AUTOLOAD_EXTENSIONS=0

Programmatically via Python os.environ.

import os
os.environ["HAMILTON_AUTOLOAD_EXTENSIONS"] = "0"

Programmatically in Jupyter notebooks

%env HAMILTON_AUTOLOAD_EXTENSIONS=0

3. Configuration file¶

Using the following command disables autoloading via the configuration file ./hamilton.conf. Hamilton won’t autoload extensions anymore (i.e., you won’t need to use approach 1 or 2 each time).

hamilton-disable-autoload-extensions

To revert this configuration use the following command

hamilton-enable-autoload-extensions

To reenable autoloading in specific files, you can delete the environment variable or use registry.enable_autoload() before calling registry.initialize()

from hamilton import registry
registry.enable_autoload()
registry.initialize()

Manually loading extensions¶

If you disabled autoloading, extensions need to be loaded manually. You should load them before having any other hamilton import to avoid hard-to-track bugs. There are two ways.

1. Importing the extension¶

from hamilton.plugins import pandas_extensions, mlflow_extensions

2. Registering the extension¶

This approach has good IDE support via typing.Literal

from hamilton import registry
registry.load_extensions("mlflow")