parameterized_subdag

Reference Documentation

class hamilton.function_modifiers.parameterized_subdag(*load_from: module | Callable, inputs: Dict[str, ParametrizedDependency | LiteralDependency] = None, config: Dict[str, Any] = None, external_inputs: List[str] = None, **parameterization: SubdagParams)

parameterized_subdag lets you create multiple subdags at once. Why might you want to do this?

  1. You have multiple data sets you want to run the same feature engineering pipeline on.

  2. You want to run some sort of optimization routine that produces a variety of results.

  3. You want to run some sort of pipeline over slightly different configurations (e.g. region/business line).

Note that this is really just syntactic sugar for creating multiple subdags, just as @parameterize is syntactic sugar for creating multiple nodes from a function. That said, it is common that you won’t know which subdags you want until compile time (e.g. when the config is available), so this decorator, combined with the @resolve decorator, is a good way to make that feasible.

Note that we are getting into advanced Hamilton here, and we don’t recommend starting with this. In fact, we generally recommend repeating subdags multiple times if you don’t have too many. That can get cumbersome if you have a lot, though, and this decorator is a good way to help with that.
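To make the compile-time point more concrete, here is a minimal sketch of building the per-subdag parameterization from values that only become known once the config is available. It is plain Python and does not import Hamilton. The helper name `build_parameterization`, the `from_` prefix, and the `region` config key are all hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


def build_parameterization(regions: List[str]) -> Dict[str, Dict[str, Any]]:
    """Return one subdag spec per region, keyed by the subdag's name.

    Hypothetical helper: the "from_" prefix and the "region" config key
    are illustrative choices, not Hamilton conventions.
    """
    return {f"from_{region}": {"config": {"region": region}} for region in regions}


# The list of regions would typically be read from the driver config.
specs = build_parameterization(["us", "eu"])
print(specs)
```

A dict shaped like this could then be unpacked as the keyword arguments of the decorator (for instance from inside the callable handed to @resolve), which is presumably where the two decorators meet.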

Let’s take a look at an example:

import pandas as pd

from hamilton.function_modifiers import parameterized_subdag, value

# feature_modules: the module(s) containing the feature functions to load
@parameterized_subdag(
    feature_modules,
    from_datasource_1={"inputs": {"data": value("datasource_1.csv")}},
    from_datasource_2={"inputs": {"data": value("datasource_2.csv")}},
    from_datasource_3={
        "inputs": {"data": value("datasource_3.csv")},
        "config": {"filter": "only_even_client_ids"},
    },
)
def feature_engineering(feature_df: pd.DataFrame) -> pd.DataFrame:
    return feature_df

This is (obviously) contrived, but it creates three subdags, each with a different data source. The third one also applies a configuration to that subdag. Note that we can also pass inputs/config to the decorator itself, in which case they are applied to all subdags.

The following is effectively the same as the example above:

@parameterized_subdag(
    feature_modules,
    # Decorator-level inputs act as the default for every subdag
    inputs={"data": value("datasource_1.csv")},
    from_datasource_1={},
    from_datasource_2={"inputs": {"data": value("datasource_2.csv")}},
    from_datasource_3={
        "inputs": {"data": value("datasource_3.csv")},
        "config": {"filter": "only_even_client_ids"},
    },
)
def feature_engineering(feature_df: pd.DataFrame) -> pd.DataFrame:
    return feature_df
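The way decorator-level values combine with per-subdag values can be sketched in plain Python. This is not Hamilton's implementation. It assumes that inputs and config are merged key by key with the per-subdag entry winning, and that external_inputs is replaced wholesale, which is how the parameter descriptions on this page read. The function name `resolve_subdag_spec` is hypothetical, and plain strings stand in for `value(...)`.

```python
from typing import Any, Dict, List, Optional


def resolve_subdag_spec(
    default_inputs: Optional[Dict[str, Any]],
    default_config: Optional[Dict[str, Any]],
    default_external_inputs: Optional[List[str]],
    params: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine decorator-level defaults with one subdag's parameterization.

    Illustrative only: assumes key-wise merging for inputs/config and
    wholesale replacement for external_inputs.
    """
    return {
        # Later keys win, so per-subdag entries override the defaults.
        "inputs": {**(default_inputs or {}), **params.get("inputs", {})},
        "config": {**(default_config or {}), **params.get("config", {})},
        # Replaced, not merged, when the subdag supplies its own list.
        "external_inputs": list(
            params.get("external_inputs", default_external_inputs or [])
        ),
    }


defaults = {"data": "datasource_1.csv"}  # plain string stands in for value(...)
first = resolve_subdag_spec(defaults, None, None, {})
third = resolve_subdag_spec(
    defaults,
    None,
    None,
    {
        "inputs": {"data": "datasource_3.csv"},
        "config": {"filter": "only_even_client_ids"},
    },
)
print(first)
print(third)
```

Under these assumptions, an empty parameterization such as from_datasource_1={} simply inherits the decorator-level inputs, which is why the two examples above are equivalent.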

Again, think about whether this feature is really the one you want: verbose, static DAGs are often far more readable than very concise, highly parameterized DAGs.

__init__(*load_from: module | Callable, inputs: Dict[str, ParametrizedDependency | LiteralDependency] = None, config: Dict[str, Any] = None, external_inputs: List[str] = None, **parameterization: SubdagParams)

Initializes a parameterized_subdag decorator.

Parameters:
  • load_from – Modules to load from

  • inputs – Inputs for each subdag generated by the decorated function

  • config – Config for each subdag generated by the decorated function

  • external_inputs – External inputs to all parameterized subdags. Note that if an individual subdag’s parameterization specifies its own external inputs, they override this list rather than being merged with it.

  • parameterization

    Parameterizations for each subdag generated. Note that this overrides any inputs/config passed to the decorator itself.

    Furthermore, note the following:

    1. The parameterizations passed to the constructor are **kwargs, so you are not allowed to name these load_from, inputs, or config. That’s a good thing, as these are not good names for variables anyway.

    2. Any empty items (not included) will default to an empty dict (or an empty list in the case of parameterization)
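The first point follows from how Python binds keyword arguments, which a small stand-in function can show. The function `capture` below is hypothetical and only mirrors the shape of the signature above. It is not part of Hamilton, and it covers only the `inputs` and `config` names.

```python
from typing import Any, Dict, Optional


def capture(
    *load_from: Any,
    inputs: Optional[Dict[str, Any]] = None,
    config: Optional[Dict[str, Any]] = None,
    **parameterization: Dict[str, Any],
) -> Dict[str, Any]:
    """Stand-in with the same signature shape as the decorator's __init__."""
    return {
        "inputs": inputs,
        "config": config,
        "subdags": sorted(parameterization),
    }


# `inputs` binds to the named parameter, so it can never become a subdag
# called "inputs"; only the remaining keywords land in **parameterization.
result = capture("feature_modules", inputs={"data": "x.csv"}, from_a={}, from_b={})
print(result)
```

Any keyword that matches a named parameter is consumed before `**parameterization` is filled, so a subdag with one of those names cannot be expressed through this call style.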