save_to

Reference Documentation

class hamilton.function_modifiers.save_to

Decorator that saves data to an external destination. You can think of it as the inverse of load_from.

It decorates a function, takes the final node that function produces, and appends an additional node that saves the function's output.

Like load_from, this decorator is accessed dynamically by format. For instance, @save_to.json saves the output of the function to a JSON file. This means the output of the function must be a dictionary (or a subclass thereof); otherwise the decorator will fail.
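To make the dictionary requirement concrete, here is a minimal standard-library sketch of what a JSON save step might do. The name save_json_sketch and the shape of the returned metadata are assumptions for illustration only; this is not Hamilton's JSONSaver.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_json_sketch(data: Any, path: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a JSON saver; not Hamilton's implementation."""
    # Mirror the documented constraint: only dicts (or subclasses) are accepted.
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    # Return a small metadata dict describing what was written (assumed shape).
    return {"path": path, "size_bytes": os.path.getsize(path)}


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "out.json")
    print(save_json_sketch({"a": 1}, target))
```

Passing a list instead of a dict raises TypeError here, which is the kind of failure the note above warns about.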

Looking at the json example:

from typing import List
from hamilton.function_modifiers import save_to, source

@save_to.json(path=source("raw_data_path"), output_name_="data_save_output")
def final_output(data: dict, valid_keys: List[str]) -> dict:
    return {k: v for k, v in data.items() if k in valid_keys}

This adds a final node to the DAG with the name “data_save_output” that accepts the output of the function “final_output” and saves it to a JSON file. In this case, the JSONSaver accepts a path parameter, which is provided by the upstream node (or input) named “raw_data_path”. The output_name_ parameter sets the name used to refer to the saver node's output in the DAG.

If you ran this with the driver and requested only the function's node:

from hamilton import driver
dr = driver.Driver({}, my_module)
output = dr.execute(['final_output'], {'raw_data_path': '/path/my_data.json'})

You would just get the final result, and nothing would be saved.

If you instead requested the saver node:

dr = driver.Driver({}, my_module)
output = dr.execute(['data_save_output'], {'raw_data_path': '/path/my_data.json'})

You would get a dictionary of metadata about the save, and the final result would be written to the given path.
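The difference between the two calls can be sketched with a toy resolver that computes only the requested nodes and their dependencies. Everything here (TOY_NODES, toy_execute, and the hand-written data_save_output function) is a hypothetical stand-in, not Hamilton's driver or the node it generates.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Tuple


def final_output(data: dict, valid_keys: List[str]) -> dict:
    return {k: v for k, v in data.items() if k in valid_keys}


def data_save_output(final_output: dict, raw_data_path: str) -> dict:
    # Stand-in for the appended saver node: write JSON, return metadata.
    with open(raw_data_path, "w", encoding="utf-8") as f:
        json.dump(final_output, f)
    return {"path": raw_data_path}


# Toy graph: node name -> (function, names of its dependencies).
TOY_NODES: Dict[str, Tuple[Callable[..., Any], List[str]]] = {
    "final_output": (final_output, ["data", "valid_keys"]),
    "data_save_output": (data_save_output, ["final_output", "raw_data_path"]),
}


def toy_execute(requested: List[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Compute only the requested nodes and whatever they depend on."""
    computed = dict(inputs)

    def resolve(name: str) -> Any:
        if name not in computed:
            fn, deps = TOY_NODES[name]
            computed[name] = fn(*(resolve(dep) for dep in deps))
        return computed[name]

    return {name: resolve(name) for name in requested}


if __name__ == "__main__":
    inputs = {
        "data": {"a": 1, "b": 2},
        "valid_keys": ["a"],
        "raw_data_path": os.path.join(tempfile.mkdtemp(), "my_data.json"),
    }
    print(toy_execute(["final_output"], inputs))      # saver never runs
    print(toy_execute(["data_save_output"], inputs))  # saver runs, file written
```

Because the saver is a separate downstream node, it only runs when something requests it; asking for final_output alone never touches the file system.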

Note that you can also hardcode the path, rather than using a dependency:

from hamilton.function_modifiers import save_to, value

@save_to.json(path=value('/path/my_data.json'), output_name_="data_save_output")
def final_output(data: dict, valid_keys: List[str]) -> dict:
    return {k: v for k, v in data.items() if k in valid_keys}

As with load_from, you can also pass literal values directly as kwargs, and they will be interpreted as values. If you need savers, you should also look into .materialize on the driver; it is a clean way to do this in a more ad-hoc, decoupled manner.
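The distinction between a dependency, an explicit value, and a bare literal can be sketched as follows. SourceRef, LiteralValue, and resolve_saver_kwargs are hypothetical names invented for this illustration; they only mimic the behavior described above and are not Hamilton's source and value objects.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical marker: take the argument from a named node or input."""
    name: str


@dataclass(frozen=True)
class LiteralValue:
    """Hypothetical marker: use this literal as the argument."""
    literal: Any


def resolve_saver_kwargs(kwargs: Dict[str, Any], available: Dict[str, Any]) -> Dict[str, Any]:
    """Turn decorator kwargs into concrete arguments for a saver."""
    resolved = {}
    for key, spec in kwargs.items():
        if isinstance(spec, SourceRef):
            resolved[key] = available[spec.name]  # looked up at run time
        elif isinstance(spec, LiteralValue):
            resolved[key] = spec.literal  # fixed where the decorator is written
        else:
            resolved[key] = spec  # bare literal, treated like LiteralValue
    return resolved


if __name__ == "__main__":
    available = {"raw_data_path": "/tmp/from_input.json"}
    print(resolve_saver_kwargs({"path": SourceRef("raw_data_path")}, available))
    print(resolve_saver_kwargs({"path": "/path/my_data.json"}, available))
```

In this sketch only the SourceRef case depends on what is available at execution time; the other two produce the same path on every run.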

If you want to layer savers, you’ll have to use the target_ parameter, which tells the saver which node to use.

@save_to.json(path=source("raw_data_path"), output_name_="data_save_output", target_="data")
@save_to.json(path=source("raw_data_path2"), output_name_="data_save_output2", target_="data")
def final_output(data: dict, valid_keys: List[str]) -> dict:
    return {k: v for k, v in data.items() if k in valid_keys}
__init__()