load_from

class hamilton.function_modifiers.load_from

Decorator to inject externally loaded data into a function. Ideally, any function that is not a pure transform should either use this decorator or accept its inputs from an external location.

This decorator works by "injecting" a parameter into the function. For example, the following code loads a JSON file and injects it into the function as the parameter input_data. Note that the path to the JSON file comes from another node called raw_data_path (which could also be passed in as an external input).

from hamilton.function_modifiers import load_from, source, value

@load_from.json(path=source("raw_data_path"))
def raw_data(input_data: dict) -> dict:
    return input_data
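Conceptually, the example splits into two steps: a loader step that reads the JSON file whose path comes from raw_data_path, and the original function, which only ever sees the parsed dict as input_data. The stdlib-only sketch below wires those two steps together by hand. load_json_node is a hypothetical name for the loader node the decorator generates; Hamilton builds this wiring into the dataflow graph for you, so this is an illustration of the idea rather than its implementation.

```python
import json
import os
import tempfile


def load_json_node(raw_data_path: str) -> dict:
    # Hypothetical stand-in for the loader node created by load_from.json:
    # it reads the file whose path is the value of the upstream raw_data_path node.
    with open(raw_data_path) as f:
        return json.load(f)


def raw_data(input_data: dict) -> dict:
    # The decorated function body: it receives already-loaded data.
    return input_data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.json")
    with open(path, "w") as f:
        json.dump({"a": 1, "b": 2}, f)
    # Feed the loader's output into the function, as the dataflow would.
    result = raw_data(load_json_node(raw_data_path=path))

print(result)  # {'a': 1, 'b': 2}
```

Keeping the file I/O in its own step is what lets raw_data stay a pure transform.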

The decorator can also be used with value to inject a constant value into the loader. In the following case, we use the literal value “some/path.json” as the path to the JSON file.

@load_from.json(path=value("some/path.json"))
def raw_data(input_data: dict) -> dict:
    return input_data

If neither source nor value is specified, the argument is passed in as a literal value, exactly as if it were wrapped in value. The following is therefore equivalent to the previous example:

@load_from.json(path="some/path.json")
def raw_data(input_data: dict) -> dict:
    return input_data
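The three forms differ only in how the path argument is resolved before it reaches the loader. The sketch below expresses that rule in code using hypothetical Source and Value classes as stand-ins for source and value; it is not Hamilton's implementation, only a model of the behaviour described above (an unwrapped argument is treated like a value).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Source:
    # Hypothetical stand-in for source(): names an upstream node.
    name: str


@dataclass(frozen=True)
class Value:
    # Hypothetical stand-in for value(): wraps a constant.
    literal: Any


def resolve_kwarg(arg: Any, upstream: Dict[str, Any]) -> Any:
    """Resolve one loader kwarg to the concrete value the loader receives."""
    if isinstance(arg, Source):
        # A dependency: use the computed result of the named node.
        return upstream[arg.name]
    if isinstance(arg, Value):
        return arg.literal
    # Neither wrapper: treat the raw argument as if it were wrapped in Value.
    return arg


upstream_results = {"raw_data_path": "from/upstream.json"}
print(resolve_kwarg(Source("raw_data_path"), upstream_results))  # from/upstream.json
print(resolve_kwarg(Value("some/path.json"), upstream_results))  # some/path.json
print(resolve_kwarg("some/path.json", upstream_results))  # some/path.json
```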

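You can also use the inject_ parameter to choose which function parameter receives the loaded data. This is useful when the function takes other parameters as well (here, valid_keys), because the loaded data must go to exactly one of them. For example, the following code loads the JSON file and injects it into the function as the parameter data.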

from typing import List

@load_from.json(path=source("raw_data_path"), inject_="data")
def raw_data(data: dict, valid_keys: List[str]) -> dict:
    return {key: val for key, val in data.items() if key in valid_keys}
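The choice of target parameter can be modelled with inspect.signature. choose_inject_param is a hypothetical helper, not Hamilton's own code; it assumes that inject_ is honoured when given and that, without it, the function must have exactly one parameter for the target to be unambiguous. The raw_data body here returns a dict so that it matches its annotation.

```python
import inspect
from typing import Callable, List, Optional


def choose_inject_param(fn: Callable, inject_: Optional[str] = None) -> str:
    """Hypothetical model of how a loader decorator could pick its target parameter."""
    params = list(inspect.signature(fn).parameters)
    if inject_ is not None:
        if inject_ not in params:
            raise ValueError(f"{fn.__name__} has no parameter named {inject_!r}")
        return inject_
    if len(params) != 1:
        # With several parameters, the target is ambiguous unless inject_ is given.
        raise ValueError(f"{fn.__name__} has {len(params)} parameters; pass inject_")
    return params[0]


def raw_data(data: dict, valid_keys: List[str]) -> dict:
    return {key: val for key, val in data.items() if key in valid_keys}


print(choose_inject_param(raw_data, inject_="data"))  # data
try:
    choose_inject_param(raw_data)
except ValueError as err:
    print(err)  # raw_data has 2 parameters; pass inject_
```

Under this model, valid_keys stays an ordinary input that the rest of the dataflow must supply.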

You can also apply multiple data loaders to a single function, each with its own inject_ parameter, to load from multiple files:

@load_from.json(path=source("raw_data_path"), inject_="data")
@load_from.json(path=source("raw_data_path2"), inject_="data2")
def raw_data(data: dict, data2: dict) -> dict:
    return {key: val for key, val in data.items() if key in data2}
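Stacking works because each decorator supplies exactly the parameter named by its inject_. The toy model below shows that effect in plain Python. toy_load_json is a hypothetical decorator, not part of Hamilton: it wraps the function and fills in one keyword argument per decorator, whereas Hamilton instead adds a separate loader node to the graph for each load_from.

```python
import functools
import json
import os
import tempfile
from typing import Callable


def toy_load_json(path: str, inject_: str) -> Callable:
    """Hypothetical stand-in for load_from.json: reads JSON and passes it as inject_."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**kwargs):
            # Fill in this decorator's parameter, then defer to the next layer.
            with open(path) as f:
                kwargs[inject_] = json.load(f)
            return fn(**kwargs)

        return wrapper

    return decorator


with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "data.json")
    path2 = os.path.join(tmp, "data2.json")
    with open(path1, "w") as f:
        json.dump({"a": 1, "b": 2, "c": 3}, f)
    with open(path2, "w") as f:
        json.dump({"b": 0, "c": 0}, f)

    @toy_load_json(path1, inject_="data")
    @toy_load_json(path2, inject_="data2")
    def raw_data(data: dict, data2: dict) -> dict:
        return {key: val for key, val in data.items() if key in data2}

    # Both parameters are supplied by the two stacked decorators.
    result = raw_data()

print(result)  # {'b': 2, 'c': 3}
```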

This functionality is highly pluggable. Here are the basics of how it works:

1. Every "key" (json above, but others include csv, literal, file, pickle, etc.) corresponds to a set of loader classes. For example, the json key corresponds to the JSONLoader class in default_data_loaders. These classes implement the classmethod name, which returns their key; once a class is registered with the central registry, it becomes available under that key (as in load_from.json).

2. Every data loader class (they are all dataclasses) implements the load_targets method, which returns a list of the types it can load to. For example, the JSONLoader class can load data of type dict. The candidate loader classes for a key are evaluated in reverse order of registration, so the most recently registered class that can load the requested type is the one used. That way, you can register custom loaders that take precedence over the built-in ones.

3. The loader class is instantiated with the kwargs passed to the decorator. For example, the JSONLoader class takes a path kwarg, which is the path to the JSON file.

4. The decorator then creates a node that loads the data, and modifies the node that runs the function so that it accepts the loaded data as an input. The loader also returns metadata (customizable at the loader-class level) to enable debugging after the fact. This metadata is unstructured, but it can be used downstream to record whatever helps with debugging.
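The four steps above can be condensed into a small, self-contained sketch. Everything in it is hypothetical: REGISTRY, register, ToyJSONLoader, pick_loader and run_with_loader only mirror the roles described (a name classmethod, load_targets, dataclass kwargs, returned metadata, reverse-order lookup) and do not reproduce Hamilton's actual DataLoader interface or registry.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical central registry: key -> loader classes, in registration order.
REGISTRY: Dict[str, List[type]] = {}


def register(loader_cls: type) -> type:
    # Step 1: file the class under the key returned by its name() classmethod.
    REGISTRY.setdefault(loader_cls.name(), []).append(loader_cls)
    return loader_cls


@register
@dataclass
class ToyJSONLoader:
    path: str  # Step 3: populated from the decorator's kwargs.

    @classmethod
    def name(cls) -> str:
        return "json"

    @classmethod
    def load_targets(cls) -> List[type]:
        # Step 2: the types this loader can produce.
        return [dict]

    def load_data(self) -> Tuple[Any, Dict[str, Any]]:
        with open(self.path) as f:
            data = json.load(f)
        # Step 4: unstructured metadata kept for debugging.
        return data, {"path": self.path, "loader": type(self).__name__}


def pick_loader(key: str, target_type: type) -> type:
    # Step 2: scan newest registrations first, so custom loaders take precedence.
    for loader_cls in reversed(REGISTRY.get(key, [])):
        if target_type in loader_cls.load_targets():
            return loader_cls
    raise ValueError(f"no {key!r} loader can produce {target_type}")


def run_with_loader(key: str, fn: Callable, param: str, **loader_kwargs: Any):
    # Pick a loader from the parameter's annotation, then feed its output to fn.
    loader = pick_loader(key, fn.__annotations__[param])(**loader_kwargs)
    data, metadata = loader.load_data()
    return fn(**{param: data}), metadata


def raw_data(input_data: dict) -> dict:
    return input_data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.json")
    with open(path, "w") as f:
        json.dump({"x": 1}, f)
    result, metadata = run_with_loader("json", raw_data, "input_data", path=path)

print(result)  # {'x': 1}
print(metadata["loader"])  # ToyJSONLoader


@register
@dataclass
class CustomJSONLoader(ToyJSONLoader):
    """A later registration under the same key now wins for dict targets."""


print(pick_loader("json", dict).__name__)  # CustomJSONLoader
```

The reverse scan is what makes overriding cheap: registering a subclass is enough, with no need to remove the built-in loader.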

The "core" Hamilton library contains a few basic data loaders that can be implemented using only Python's standard library. pandas_extensions contains a few more that require pandas to be installed.

Note that loaders can have default arguments, specified as defaults on the dataclass fields. For the full set of "keys" and "types" (e.g. load_from.json, etc.), look for all classes that inherit from DataLoader in the Hamilton library. We plan to improve the documentation shortly to make this more discoverable.
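As a rough illustration of both points, the sketch below uses a hypothetical ToyDataLoader base class: dataclass field defaults become optional loader arguments, and walking __subclasses__() recursively lists every loader class and its key. The same subclass walk could presumably be applied to Hamilton's DataLoader class, but this sketch does not import it, and the toy loaders here are not Hamilton's.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Dict, List


class ToyDataLoader:
    """Hypothetical base class standing in for a loader base such as DataLoader."""

    @classmethod
    def name(cls) -> str:
        raise NotImplementedError


@dataclass
class ToyCSVLoader(ToyDataLoader):
    path: str
    sep: str = ","  # Default: callers may omit sep.

    @classmethod
    def name(cls) -> str:
        return "csv"


@dataclass
class ToyFileLoader(ToyDataLoader):
    path: str
    encoding: str = "utf-8"  # Default: callers may omit encoding.

    @classmethod
    def name(cls) -> str:
        return "file"


def all_subclasses(base: type) -> List[type]:
    # __subclasses__() only returns direct children, so recurse to reach every level.
    found: List[type] = []
    for sub in base.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def loaders_by_key(base: type) -> Dict[str, List[str]]:
    # Several classes may share one key (e.g. one per target type), so collect lists.
    table: Dict[str, List[str]] = {}
    for cls in all_subclasses(base):
        table.setdefault(cls.name(), []).append(cls.__name__)
    return table


def field_defaults(loader_cls: type) -> Dict[str, object]:
    # Fields with a default value correspond to optional decorator kwargs.
    return {f.name: f.default for f in fields(loader_cls) if f.default is not MISSING}


print(loaders_by_key(ToyDataLoader))
print(field_defaults(ToyCSVLoader))  # {'sep': ','}
print(ToyCSVLoader(path="data.csv"))  # ToyCSVLoader(path='data.csv', sep=',')
```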

__init__()