Using Data Adapters
This is an index of all the available data adapters, both savers and loaders. Note that some savers and loaders are the same (certain classes can handle both), but some are different. You will want to reference this when calling out to any of the following:
Using load_from (or dataloader, if you only want to expose metadata).
Using materializers.
To read these tables, you want to first look at the key to determine which format you want – these should be human-readable and familiar to you. Then you’ll want to look at the types field to figure out which is the best for your case (the object you want to load from or save to).
Finally, look up the adapter params to see what parameters you can pass to the data adapters. The optional params come with their default value specified.
If you want more information, click on the module; the link takes you to the code that implements the adapter, where you can see how the parameters are used.
As an example, say we wanted to save a pandas dataframe to a CSV file. We would first find the key csv, which would inform us that we want to call save_to.csv (or to.csv in the case of materialize). Then, we would look at the types field, finding that there is a pandas dataframe adapter. Finally, we would look at the params field, finding that we can pass path, and (optionally) sep (which we’d realize defaults to , when looking at the code).
All together, we’d end up with:
import pandas as pd
from hamilton.function_modifiers import value, save_to

@save_to.csv(path=value("my_file.csv"))
def my_data(...) -> pd.DataFrame:
    ...
For a less “abstracted” approach, where you just expose metadata from saving and loading, you can annotate your saving/loading functions to do so. For example, analogous to the above, you could do:
import pandas as pd
from hamilton.function_modifiers import datasaver

def my_data(...) -> pd.DataFrame:
    # your function
    ...
    return _df  # return some df

@datasaver()
def my_data_saver(my_data: pd.DataFrame, path: str) -> dict:
    # code to save my_data
    return {"path": path, "type": "csv", ...}  # add other metadata
See dataloader for more information on how to load data and expose metadata in this more lightweight way.
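As a rough sketch of that lighter-weight pattern, the two functions below save and load rows with the standard library's csv module and return metadata alongside the data. The Hamilton decorators are deliberately left out so the snippet runs on its own. The function names, the row format (a list of dicts), and the metadata keys are illustrative assumptions, and the (data, metadata) tuple returned by the loader is an assumption about the dataloader convention rather than something stated on this page.

```python
import csv
import os
import tempfile
from typing import Dict, List, Tuple

# Illustrative row format: each row is a dict of column name -> string value.
Rows = List[Dict[str, str]]


def save_rows(rows: Rows, path: str) -> dict:
    """Write rows to a CSV file and return metadata about what was written."""
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return {"path": path, "type": "csv", "rows": len(rows)}


def load_rows(path: str) -> Tuple[Rows, dict]:
    """Read rows from a CSV file and return them together with metadata."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return rows, {"path": path, "type": "csv", "rows": len(rows)}


# Round-trip through a temporary file to show both directions.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_file.csv")
    save_meta = save_rows([{"a": "1", "b": "2"}, {"a": "3", "b": "4"}], target)
    loaded, load_meta = load_rows(target)
```

In a Hamilton module, functions shaped like these would be the ones you annotate with the datasaver and dataloader decorators, so the returned dictionaries become the exposed metadata.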
If you want to extend the @save_to or @load_from decorators, see Using Data Adapters for documentation and the example in the repository for how to do so.
Note that you will need to call registry.register_adapters (or import a module that does that) prior to dynamically referring to these in the code – otherwise we won’t know about them, and won’t be able to access that key!
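To make the key-plus-type lookup and the registration requirement concrete, here is a toy registry in plain Python. It is a simplified stand-in, not Hamilton's implementation: `ToyRegistry`, `register`, `resolve`, and `save_json` are hypothetical names chosen for illustration, and the real entry point remains registry.register_adapters as described above.

```python
import json
from typing import Any, Callable, Dict, List, Tuple, Type

# Hypothetical saver signature: (data, path) -> metadata dict.
Saver = Callable[[Any, str], Dict[str, Any]]


class ToyRegistry:
    """Maps a format key (e.g. "json") to savers that handle specific types."""

    def __init__(self) -> None:
        self._savers: Dict[str, List[Tuple[Tuple[Type, ...], Saver]]] = {}

    def register(self, key: str, types: Tuple[Type, ...], saver: Saver) -> None:
        # Several adapters may share one key, distinguished by the types they accept.
        self._savers.setdefault(key, []).append((types, saver))

    def resolve(self, key: str, data: Any) -> Saver:
        # An unregistered key cannot be found, which mirrors the note above.
        if key not in self._savers:
            raise KeyError(f"No adapter registered under key {key!r}")
        for types, saver in self._savers[key]:
            if isinstance(data, types):
                return saver
        raise TypeError(f"No {key!r} adapter handles {type(data).__name__}")


def save_json(data: Any, path: str) -> Dict[str, Any]:
    """Example saver for dict/list data; returns metadata about the write."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return {"path": path, "type": "json"}


registry = ToyRegistry()

# Looking up a key before registering anything under it fails.
try:
    registry.resolve("json", {"a": 1})
    lookup_before_registration_failed = False
except KeyError:
    lookup_before_registration_failed = True

# After registration, the key and the data's type together select the adapter.
registry.register("json", (dict, list), save_json)
resolved = registry.resolve("json", {"a": 1})
```

The same two-step resolution is what the tables below encode: the key column picks the format, and the types column picks the adapter for your object.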
Data Loaders
key |
loader params |
types |
module |
---|---|---|---|
json |
str |
dict list |
hamilton.io.default_data_loaders |
json |
Union chunksize Optional=None compression Union=infer convert_axes Optional=None convert_dates Union=True date_unit Optional=None dtype Union=None dtype_backend Optional=None encoding Optional=None encoding_errors Optional=strict engine str=ujson keep_default_dates bool=True lines bool=False nrows Optional=None orient Optional=None precise_float bool=False storage_options Optional=None typ str=frame |
DataFrame |
hamilton.plugins.pandas_extensions |
json |
Union schema Union=None schema_overrides Union=None |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
json |
Union |
XGBModel Booster |
hamilton.plugins.xgboost_extensions |
literal |
Any |
Any |
hamilton.io.default_data_loaders |
file |
str encoding str=utf-8 |
str |
hamilton.io.default_data_loaders |
file |
Union |
LGBMModel Booster CVBooster |
hamilton.plugins.lightgbm_extensions |
pickle |
str |
object Any |
hamilton.io.default_data_loaders |
pickle |
Union=None path Union=None compression Union=infer storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
environment |
Tuple |
dict |
hamilton.io.default_data_loaders |
yaml |
Union |
str int float bool dict list |
hamilton.plugins.yaml_extensions |
npy |
Union mmap_mode Optional=None allow_pickle Optional=None fix_imports Optional=None encoding Literal=ASCII |
ndarray |
hamilton.plugins.numpy_extensions |
csv |
Union sep Optional=, delimiter Optional=None header Union=infer names Optional=None index_col Union=None usecols Union=None dtype Union=None engine Optional=None converters Optional=None true_values Optional=None false_values Optional=None skipinitialspace Optional=False skiprows Union=None skipfooter int=0 nrows Optional=None na_values Union=None keep_default_na bool=True na_filter bool=True verbose bool=False skip_blank_lines bool=True parse_dates Union=False keep_date_col bool=False date_format Optional=None dayfirst bool=False cache_dates bool=True iterator bool=False chunksize Optional=None compression Union=infer thousands Optional=None decimal str=. lineterminator Optional=None quotechar Optional=None quoting int=0 doublequote bool=True escapechar Optional=None comment Optional=None encoding str=utf-8 encoding_errors Union=strict dialect Union=None on_bad_lines Union=error delim_whitespace bool=False low_memory bool=True memory_map bool=False float_precision Optional=None storage_options Optional=None dtype_backend Literal=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
csv |
Union has_header bool=True include_header bool=True columns Union=None new_columns Sequence=None separator str=, comment_char str=None quote_char str=" skip_rows int=0 dtypes Union=None null_values Union=None missing_utf8_is_empty_string bool=False ignore_errors bool=False try_parse_dates bool=False n_threads int=None infer_schema_length int=100 batch_size int=8192 n_rows int=None encoding Union=utf8 low_memory bool=False rechunk bool=True use_pyarrow bool=False storage_options Dict=None skip_rows_after_header int=0 row_count_name str=None row_count_offset int=0 sample_size int=1024 eol_char str=\n raise_if_empty bool=True |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
csv |
Union has_header bool=True columns Union=None new_columns Sequence=None separator str=, comment_char str=None quote_char str=" skip_rows int=0 dtypes Union=None null_values Union=None missing_utf8_is_empty_string bool=False ignore_errors bool=False try_parse_dates bool=False n_threads int=None infer_schema_length int=100 batch_size int=8192 n_rows int=None encoding Union=utf8 low_memory bool=False rechunk bool=True use_pyarrow bool=False storage_options Dict=None skip_rows_after_header int=0 row_count_name str=None row_count_offset int=0 eol_char str=\n raise_if_empty bool=True |
LazyFrame |
hamilton.plugins.polars_lazyframe_extensions |
csv |
SparkSession path str header bool=True sep str=, |
DataFrame |
hamilton.plugins.spark_extensions |
parquet |
Union engine Literal=auto columns Optional=None storage_options Optional=None use_nullable_dtypes bool=False dtype_backend Literal=numpy_nullable filesystem Optional=None filters Union=None |
DataFrame |
hamilton.plugins.pandas_extensions |
parquet |
Union columns Union=None n_rows int=None use_pyarrow bool=False memory_map bool=True storage_options Dict=None parallel Any=auto row_count_name str=None row_count_offset int=0 low_memory bool=False pyarrow_options Dict=None use_statistics bool=True rechunk bool=True |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
parquet |
Union columns Union=None n_rows int=None use_pyarrow bool=False memory_map bool=True storage_options Dict=None parallel Any=auto row_count_name str=None row_count_offset int=0 low_memory bool=False use_statistics bool=True rechunk bool=True |
LazyFrame |
hamilton.plugins.polars_lazyframe_extensions |
parquet |
SparkSession path str |
DataFrame |
hamilton.plugins.spark_extensions |
sql |
str db_connection Union chunksize Optional=None coerce_float bool=True columns Optional=None dtype Union=None dtype_backend Optional=None index_col Union=None params Union=None parse_dates Union=None |
DataFrame |
hamilton.plugins.pandas_extensions |
xml |
Union xpath Optional=./* namespace Optional=None elems_only Optional=False attrs_only Optional=False names Optional=None dtype Optional=None converters Optional=None parse_dates Union=False encoding Optional=utf-8 parser str=lxml stylesheet Union=None iterparse Optional=None compression Union=infer storage_options Optional=None dtype_backend str=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
html |
Union match Optional=.+ flavor Union=None header Union=None index_col Union=None skiprows Union=None attrs Optional=None parse_dates Optional=None thousands Optional=, encoding Optional=None decimal str=. converters Optional=None na_values Iterable=None keep_default_na bool=True displayed_only bool=True extract_links Optional=None dtype_backend Literal=numpy_nullable storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
stata |
Union convert_dates bool=True convert_categoricals bool=True index_col Optional=None convert_missing bool=False preserve_dtypes bool=True columns Optional=None order_categoricals bool=True chunksize Optional=None iterator bool=False compression Union=infer storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
feather |
Union columns Optional=None use_threads bool=True storage_options Optional=None dtype_backend Literal=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
feather |
Union columns Union=None n_rows Optional=None use_pyarrow bool=False memory_map bool=True storage_options Optional=None row_count_name Optional=None row_count_offset int=0 rechunk bool=True |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
feather |
Union columns Union=None n_rows Optional=None use_pyarrow bool=False memory_map bool=True storage_options Optional=None row_count_name Optional=None row_count_offset int=0 rechunk bool=True |
LazyFrame |
hamilton.plugins.polars_lazyframe_extensions |
orc |
Union columns Optional=None dtype_backend Literal=numpy_nullable filesystem Union=None |
DataFrame |
hamilton.plugins.pandas_extensions |
excel |
Union=None sheet_name Union=0 header Union=0 names Optional=None index_col Union=None usecols Union=None dtype Union=None engine Optional=None converters Union=None true_values Optional=None false_values Optional=None skiprows Union=None nrows Optional=None keep_default_na bool=True na_filter bool=True verbose bool=False parse_dates Union=False date_format Union=None thousands Optional=None decimal str=. comment Optional=None skipfooter int=0 storage_options Optional=None dtype_backend Literal=numpy_nullable engine_kwargs Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
table |
Union sep Optional=None delimiter Optional=None header Union=infer names Optional=None index_col Union=None usecols Optional=None dtype Union=None engine Optional=None converters Optional=None true_values Optional=None false_values Optional=None skipinitialspace bool=False skiprows Union=None skipfooter int=0 nrows Optional=None na_values Union=None keep_default_na bool=True na_filter bool=True verbose bool=False skip_blank_lines bool=True parse_dates Union=False infer_datetime_format bool=False keep_date_col bool=False date_parser Optional=None date_format Optional=None dayfirst bool=False cache_dates bool=True iterator bool=False chunksize Optional=None compression Union=infer thousands Optional=None decimal str=. lineterminator Optional=None quotechar Optional=" quoting int=0 doublequote bool=True escapechar Optional=None comment Optional=None encoding Optional=None encoding_errors Optional=strict dialect Optional=None on_bad_lines Union=error delim_whitespace bool=False low_memory bool=True memory_map bool=False float_precision Optional=None storage_options Optional=None dtype_backend Literal=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
fwf |
Union colspecs Union=infer widths Optional=None infer_nrows int=100 dtype_backend Literal=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
spss |
Union usecols Union=None convert_categoricals bool=True dtype_backend Literal=numpy_nullable |
DataFrame |
hamilton.plugins.pandas_extensions |
avro |
Union columns Union=None n_rows Optional=None |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
database |
str connection Union iter_batches bool=False batch_size Optional=None schema_overrides Optional=None infer_schema_length Optional=None execute_options Optional=None |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
spreadsheet |
Union sheet_id Union=None sheet_name Union=None engine Literal=xlsx2csv engine_options Optional=None read_options Optional=None schema_overrides Optional=None raise_if_empty bool=True |
DataFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
dlt |
DltResource |
DataFrame |
hamilton.plugins.dlt_extensions |
mlflow |
Optional=None mode Literal=tracking run_id Optional=None path Union=model model_name Optional=None version Union=None version_alias Optional=None flavor Union=None mlflow_kwargs Dict=None |
Any |
hamilton.plugins.mlflow_extensions |
Data Savers
key |
saver params |
types |
module |
---|---|---|---|
json |
str |
dict list |
hamilton.io.default_data_loaders |
json |
Union compression str=infer date_format str=epoch date_unit str=ms default_handler Optional=None double_precision int=10 force_ascii bool=True index Optional=None indent int=0 lines bool=False mode str=w orient Optional=None storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
json |
Union |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
json |
Union |
XGBModel Booster |
hamilton.plugins.xgboost_extensions |
file |
str encoding str=utf-8 |
str |
hamilton.io.default_data_loaders |
file |
Union |
bytes BytesIO |
hamilton.io.default_data_loaders |
file |
Union num_iteration Optional=None start_iteration int=0 importance_type Literal=split |
LGBMModel Booster CVBooster |
hamilton.plugins.lightgbm_extensions |
pickle |
str |
object |
hamilton.io.default_data_loaders |
pickle |
Union compression Union=infer protocol int=5 storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
memory |
(none) |
Any |
hamilton.io.default_data_loaders |
yaml |
Union |
str int float bool dict list |
hamilton.plugins.yaml_extensions |
plt |
Union dpi Union=None format Optional=None metadata Optional=None bbox_inches Union=None pad_inches Union=None facecolor Union=None edgecolor Union=None backend Optional=None orientation Optional=None papertype Optional=None transparent Optional=None bbox_extra_artists Optional=None pil_kwargs Optional=None |
Figure |
hamilton.plugins.matplotlib_extensions |
npy |
Union allow_pickle Optional=None fix_imports Optional=None |
ndarray |
hamilton.plugins.numpy_extensions |
csv |
Union sep Optional=, na_rep str= float_format Union=None columns Optional=None header Union=True index Optional=False index_label Union=None mode str=w encoding Optional=None compression Union=infer quoting Optional=None quotechar Optional=" lineterminator Optional=None chunksize Optional=None date_format Optional=None doublequote bool=True escapechar Optional=None decimal str=. errors str=strict storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
csv |
Union include_header bool=True separator str=, line_terminator str=\n quote_char str=" batch_size int=1024 datetime_format str=None date_format str=None time_format str=None float_precision int=None null_value str=None quote_style Type=None |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
parquet |
Union engine Literal=auto compression Optional=snappy index Optional=None partition_cols Optional=None storage_options Optional=None extra_kwargs Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
parquet |
Union compression Any=zstd compression_level int=None statistics bool=False row_group_size int=None use_pyarrow bool=False pyarrow_options Dict=None |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
sql |
str db_connection Any chunksize Optional=None dtype Union=None if_exists str=fail index bool=True index_label Union=None method Union=None schema Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
xml |
Union index bool=True root_name str=data row_name str=row na_rep Optional=None attr_cols Optional=None elems_cols Optional=None namespaces Optional=None prefix Optional=None encoding str=utf-8 xml_declaration bool=True pretty_print bool=True parser str=lxml stylesheet Union=None compression Union=infer storage_options Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
html |
Union=None columns Optional=None col_space Union=None header Optional=True index Optional=True na_rep Optional=NaN formatters Union=None float_format Optional=None sparsify Optional=True index_names Optional=True justify str=None max_rows Optional=None max_cols Optional=None show_dimensions bool=False decimal str=. bold_rows bool=True classes Union=None escape Optional=True notebook Literal=False border int=None table_id Optional=None render_links bool=False encoding Optional=utf-8 |
DataFrame |
hamilton.plugins.pandas_extensions |
stata |
Union=None convert_dates Optional=None write_index bool=True byteorder Optional=None time_stamp Optional=None data_label Optional=None variable_labels Optional=None version Literal=114 convert_strl Optional=None compression Union=infer storage_options Optional=None value_labels Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
feather |
Union dest Optional=None compression Literal=None compression_level Optional=None chunksize Optional=None version Optional=2 |
DataFrame |
hamilton.plugins.pandas_extensions |
feather |
Union=None compression Type=uncompressed |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
orc |
Union engine Literal=pyarrow index Optional=None engine_kwargs Optional=None |
DataFrame |
hamilton.plugins.pandas_extensions |
excel |
Union sheet_name str=Sheet1 na_rep str= float_format Optional=None columns Optional=None header Union=True index bool=True index_label Union=None startrow int=0 startcol int=0 engine Optional=None merge_cells bool=True inf_rep str=inf freeze_panes Optional=None storage_options Optional=None engine_kwargs Optional=None mode Optional=w if_sheet_exists Optional=None datetime_format str=None date_format str=None |
DataFrame |
hamilton.plugins.pandas_extensions |
avro |
Union compression Any=uncompressed |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
database |
str connection Union if_table_exists Literal=fail engine Literal=sqlalchemy |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
spreadsheet |
Union worksheet Optional=None position Union=A1 table_style Union=None table_name Optional=None column_formats Optional=None dtype_formats Optional=None conditional_formats Optional=None header_format Optional=None column_totals Union=None column_widths Union=None row_totals Union=None row_heights Union=None sparklines Optional=None formulas Optional=None float_precision int=3 include_header bool=True autofilter bool=True autofit bool=False hidden_columns Union=None hide_gridlines bool=None sheet_zoom Optional=None freeze_panes Union=None |
DataFrame LazyFrame |
hamilton.plugins.polars_post_1_0_0_extensions |
png |
Union dpi float=200 format str=png metadata Optional=None bbox_inches str=None pad_inches float=0.1 backend Optional=None papertype str=None transparent bool=None bbox_extra_artists Optional=None pil_kwargs Optional=None |
ConfusionMatrixDisplay DetCurveDisplay PrecisionRecallDisplay PredictionErrorDisplay RocCurveDisplay DecisionBoundaryDisplay LearningCurveDisplay PartialDependenceDisplay ValidationCurveDisplay Figure |
hamilton.plugins.sklearn_plot_extensions |
dlt |
Pipeline table_name str primary_key Optional=None write_disposition Optional=None columns Optional=None schema Optional=None loader_file_format Optional=None |
Iterable DataFrame Table RecordBatch |
hamilton.plugins.dlt_extensions |
mlflow |
Union=model register_as Optional=None flavor Union=None run_id Optional=None mlflow_kwargs Dict=None |
Any |
hamilton.plugins.mlflow_extensions |