Using Data Adapters

This is an index of all the available data adapters, both savers and loaders. Note that some savers and loaders are the same (certain classes can handle both), but some are different. You will want to reference this when calling out to any of the following:

  1. Using save_to (or datasaver, if you only want to expose metadata).

  2. Using load_from (or dataloader, if you only want to expose metadata).

  3. Using materializers.

To read these tables, first look at the key to determine which format you want; the keys should be human-readable and familiar to you. Then look at the types field to figure out which adapter is best for your case (the type of object you want to load into or save from).

Finally, look up the adapter params to see what parameters you can pass to the data adapters. The optional params come with their default value specified.

If you want more information, click on the module. It links to the code that implements the adapter, where you can see how the parameters are used.
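The three lookup steps above (key, then types, then params) can be modeled in a few lines. This is only a sketch: AdapterRow, SAVER_ROWS, find_adapter and resolve_params are hypothetical names, the rows are a small subset copied from the saver table with one optional param each, and none of this reflects how Hamilton's registry is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdapterRow:
    """One table row. Hypothetical model for illustration, not a Hamilton class."""

    key: str
    types: tuple
    module: str
    required: tuple = ()
    optional: dict = field(default_factory=dict)


# A few rows copied from the saver table. Type names are qualified here for
# clarity, and only one optional param per row is listed.
SAVER_ROWS = [
    AdapterRow("json", ("dict", "list"), "hamilton.io.default_data_loaders", ("path",)),
    AdapterRow(
        "csv",
        ("pandas.DataFrame",),
        "hamilton.plugins.pandas_extensions",
        ("path",),
        {"sep": ","},
    ),
    AdapterRow(
        "csv",
        ("polars.DataFrame", "polars.LazyFrame"),
        "hamilton.plugins.polars_post_1_0_0_extensions",
        ("file",),
        {"separator": ","},
    ),
]


def find_adapter(key: str, type_name: str) -> AdapterRow:
    """Steps 1 and 2: match on the key, then on the type being saved."""
    for row in SAVER_ROWS:
        if row.key == key and type_name in row.types:
            return row
    raise LookupError(f"no adapter for key={key!r} and type={type_name!r}")


def resolve_params(row: AdapterRow, **params) -> dict:
    """Step 3: require mandatory params and fill optional ones with defaults."""
    missing = [name for name in row.required if name not in params]
    unknown = [
        name
        for name in params
        if name not in row.required and name not in row.optional
    ]
    if missing or unknown:
        raise TypeError(f"missing={missing}, unknown={unknown}")
    return {**row.optional, **params}


row = find_adapter("csv", "pandas.DataFrame")
print(row.module)
print(resolve_params(row, path="my_file.csv"))
```

The same key can map to several rows, which is why the type is needed to pick one: csv resolves to a different module and different param names for pandas than for polars.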

As an example, say we wanted to save a pandas dataframe to a CSV file. We would first find the key csv, which would inform us that we want to call save_to.csv (or to.csv in the case of materialize). Then, we would look at the types field, finding that there is a pandas dataframe adapter. Finally, we would look at the params field, finding that we can pass path, and (optionally) sep (which we’d realize defaults to , when looking at the code).

All together, we’d end up with:

import pandas as pd
from hamilton.function_modifiers import value, save_to

@save_to.csv(path=value("my_file.csv"))
def my_data(...) -> pd.DataFrame:
    ...

For a less “abstracted” approach, where you just expose metadata from saving and loading, you can annotate your saving/loading functions to do so. For example, analogous to the above, you could do:

import pandas as pd
from hamilton.function_modifiers import datasaver

def my_data(...) -> pd.DataFrame:
    # your function
    ...
    return _df  # return some df

@datasaver()
def my_data_saver(my_data: pd.DataFrame, path: str) -> dict:
    # code to save my_data
    my_data.to_csv(path)
    return {"path": path, "type": "csv", ...}  # add other metadata

See dataloader for more information on how to load data and expose metadata in this more lightweight way.

If you want to extend the @save_to or @load_from decorators, see Using Data Adapters for documentation, and see the example in the repository for how to do so.

Note that you will need to call registry.register_adapters (or import a module that does that) prior to dynamically referring to these in the code – otherwise we won’t know about them, and won’t be able to access that key!

Data Loaders

| key | loader params | types | module |
| --- | --- | --- | --- |
| json | path: str | dict, list | hamilton.io.default_data_loaders |
| json | filepath_or_buffer: Union; chunksize: Optional = None; compression: Union = infer; convert_axes: Optional = None; convert_dates: Union = True; date_unit: Optional = None; dtype: Union = None; dtype_backend: Optional = None; encoding: Optional = None; encoding_errors: Optional = strict; engine: str = ujson; keep_default_dates: bool = True; lines: bool = False; nrows: Optional = None; orient: Optional = None; precise_float: bool = False; storage_options: Optional = None; typ: str = frame | DataFrame | hamilton.plugins.pandas_extensions |
| json | source: Union; schema: Union = None; schema_overrides: Union = None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| json | path: Union | XGBModel, Booster | hamilton.plugins.xgboost_extensions |
| literal | value: Any | Any | hamilton.io.default_data_loaders |
| file | path: str; encoding: str = utf-8 | str | hamilton.io.default_data_loaders |
| file | path: Union | LGBMModel, Booster, CVBooster | hamilton.plugins.lightgbm_extensions |
| pickle | path: str | object, Any | hamilton.io.default_data_loaders |
| pickle | filepath_or_buffer: Union = None; path: Union = None; compression: Union = infer; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| environment | names: Tuple | dict | hamilton.io.default_data_loaders |
| yaml | path: Union | str, int, float, bool, dict, list | hamilton.plugins.yaml_extensions |
| npy | path: Union; mmap_mode: Optional = None; allow_pickle: Optional = None; fix_imports: Optional = None; encoding: Literal = ASCII | ndarray | hamilton.plugins.numpy_extensions |
| csv | path: Union; sep: Optional = ,; delimiter: Optional = None; header: Union = infer; names: Optional = None; index_col: Union = None; usecols: Union = None; dtype: Union = None; engine: Optional = None; converters: Optional = None; true_values: Optional = None; false_values: Optional = None; skipinitialspace: Optional = False; skiprows: Union = None; skipfooter: int = 0; nrows: Optional = None; na_values: Union = None; keep_default_na: bool = True; na_filter: bool = True; verbose: bool = False; skip_blank_lines: bool = True; parse_dates: Union = False; keep_date_col: bool = False; date_format: Optional = None; dayfirst: bool = False; cache_dates: bool = True; iterator: bool = False; chunksize: Optional = None; compression: Union = infer; thousands: Optional = None; decimal: str = .; lineterminator: Optional = None; quotechar: Optional = None; quoting: int = 0; doublequote: bool = True; escapechar: Optional = None; comment: Optional = None; encoding: str = utf-8; encoding_errors: Union = strict; dialect: Union = None; on_bad_lines: Union = error; delim_whitespace: bool = False; low_memory: bool = True; memory_map: bool = False; float_precision: Optional = None; storage_options: Optional = None; dtype_backend: Literal = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| csv | file: Union; has_header: bool = True; include_header: bool = True; columns: Union = None; new_columns: Sequence = None; separator: str = ,; comment_char: str = None; quote_char: str = "; skip_rows: int = 0; dtypes: Union = None; null_values: Union = None; missing_utf8_is_empty_string: bool = False; ignore_errors: bool = False; try_parse_dates: bool = False; n_threads: int = None; infer_schema_length: int = 100; batch_size: int = 8192; n_rows: int = None; encoding: Union = utf8; low_memory: bool = False; rechunk: bool = True; use_pyarrow: bool = False; storage_options: Dict = None; skip_rows_after_header: int = 0; row_count_name: str = None; row_count_offset: int = 0; sample_size: int = 1024; eol_char: str = ; raise_if_empty: bool = True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| csv | file: Union; has_header: bool = True; columns: Union = None; new_columns: Sequence = None; separator: str = ,; comment_char: str = None; quote_char: str = "; skip_rows: int = 0; dtypes: Union = None; null_values: Union = None; missing_utf8_is_empty_string: bool = False; ignore_errors: bool = False; try_parse_dates: bool = False; n_threads: int = None; infer_schema_length: int = 100; batch_size: int = 8192; n_rows: int = None; encoding: Union = utf8; low_memory: bool = False; rechunk: bool = True; use_pyarrow: bool = False; storage_options: Dict = None; skip_rows_after_header: int = 0; row_count_name: str = None; row_count_offset: int = 0; eol_char: str = ; raise_if_empty: bool = True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| csv | spark: SparkSession; path: str; header: bool = True; sep: str = , | DataFrame | hamilton.plugins.spark_extensions |
| parquet | path: Union; engine: Literal = auto; columns: Optional = None; storage_options: Optional = None; use_nullable_dtypes: bool = False; dtype_backend: Literal = numpy_nullable; filesystem: Optional = None; filters: Union = None | DataFrame | hamilton.plugins.pandas_extensions |
| parquet | file: Union; columns: Union = None; n_rows: int = None; use_pyarrow: bool = False; memory_map: bool = True; storage_options: Dict = None; parallel: Any = auto; row_count_name: str = None; row_count_offset: int = 0; low_memory: bool = False; pyarrow_options: Dict = None; use_statistics: bool = True; rechunk: bool = True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| parquet | file: Union; columns: Union = None; n_rows: int = None; use_pyarrow: bool = False; memory_map: bool = True; storage_options: Dict = None; parallel: Any = auto; row_count_name: str = None; row_count_offset: int = 0; low_memory: bool = False; use_statistics: bool = True; rechunk: bool = True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| parquet | spark: SparkSession; path: str | DataFrame | hamilton.plugins.spark_extensions |
| sql | query_or_table: str; db_connection: Union; chunksize: Optional = None; coerce_float: bool = True; columns: Optional = None; dtype: Union = None; dtype_backend: Optional = None; index_col: Union = None; params: Union = None; parse_dates: Union = None | DataFrame | hamilton.plugins.pandas_extensions |
| xml | path_or_buffer: Union; xpath: Optional = ./*; namespace: Optional = None; elems_only: Optional = False; attrs_only: Optional = False; names: Optional = None; dtype: Optional = None; converters: Optional = None; parse_dates: Union = False; encoding: Optional = utf-8; parser: str = lxml; stylesheet: Union = None; iterparse: Optional = None; compression: Union = infer; storage_options: Optional = None; dtype_backend: str = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| html | io: Union; match: Optional = .+; flavor: Union = None; header: Union = None; index_col: Union = None; skiprows: Union = None; attrs: Optional = None; parse_dates: Optional = None; thousands: Optional = ,; encoding: Optional = None; decimal: str = .; converters: Optional = None; na_values: Iterable = None; keep_default_na: bool = True; displayed_only: bool = True; extract_links: Optional = None; dtype_backend: Literal = numpy_nullable; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| stata | filepath_or_buffer: Union; convert_dates: bool = True; convert_categoricals: bool = True; index_col: Optional = None; convert_missing: bool = False; preserve_dtypes: bool = True; columns: Optional = None; order_categoricals: bool = True; chunksize: Optional = None; iterator: bool = False; compression: Union = infer; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| feather | path: Union; columns: Optional = None; use_threads: bool = True; storage_options: Optional = None; dtype_backend: Literal = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| feather | source: Union; columns: Union = None; n_rows: Optional = None; use_pyarrow: bool = False; memory_map: bool = True; storage_options: Optional = None; row_count_name: Optional = None; row_count_offset: int = 0; rechunk: bool = True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| feather | source: Union; columns: Union = None; n_rows: Optional = None; use_pyarrow: bool = False; memory_map: bool = True; storage_options: Optional = None; row_count_name: Optional = None; row_count_offset: int = 0; rechunk: bool = True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| orc | path: Union; columns: Optional = None; dtype_backend: Literal = numpy_nullable; filesystem: Union = None | DataFrame | hamilton.plugins.pandas_extensions |
| excel | path: Union = None; sheet_name: Union = 0; header: Union = 0; names: Optional = None; index_col: Union = None; usecols: Union = None; dtype: Union = None; engine: Optional = None; converters: Union = None; true_values: Optional = None; false_values: Optional = None; skiprows: Union = None; nrows: Optional = None; keep_default_na: bool = True; na_filter: bool = True; verbose: bool = False; parse_dates: Union = False; date_format: Union = None; thousands: Optional = None; decimal: str = .; comment: Optional = None; skipfooter: int = 0; storage_options: Optional = None; dtype_backend: Literal = numpy_nullable; engine_kwargs: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| table | filepath_or_buffer: Union; sep: Optional = None; delimiter: Optional = None; header: Union = infer; names: Optional = None; index_col: Union = None; usecols: Optional = None; dtype: Union = None; engine: Optional = None; converters: Optional = None; true_values: Optional = None; false_values: Optional = None; skipinitialspace: bool = False; skiprows: Union = None; skipfooter: int = 0; nrows: Optional = None; na_values: Union = None; keep_default_na: bool = True; na_filter: bool = True; verbose: bool = False; skip_blank_lines: bool = True; parse_dates: Union = False; infer_datetime_format: bool = False; keep_date_col: bool = False; date_parser: Optional = None; date_format: Optional = None; dayfirst: bool = False; cache_dates: bool = True; iterator: bool = False; chunksize: Optional = None; compression: Union = infer; thousands: Optional = None; decimal: str = .; lineterminator: Optional = None; quotechar: Optional = "; quoting: int = 0; doublequote: bool = True; escapechar: Optional = None; comment: Optional = None; encoding: Optional = None; encoding_errors: Optional = strict; dialect: Optional = None; on_bad_lines: Union = error; delim_whitespace: bool = False; low_memory: bool = True; memory_map: bool = False; float_precision: Optional = None; storage_options: Optional = None; dtype_backend: Literal = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| fwf | filepath_or_buffer: Union; colspecs: Union = infer; widths: Optional = None; infer_nrows: int = 100; dtype_backend: Literal = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| spss | path: Union; usecols: Union = None; convert_categoricals: bool = True; dtype_backend: Literal = numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| avro | file: Union; columns: Union = None; n_rows: Optional = None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| database | query: str; connection: Union; iter_batches: bool = False; batch_size: Optional = None; schema_overrides: Optional = None; infer_schema_length: Optional = None; execute_options: Optional = None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| spreadsheet | source: Union; sheet_id: Union = None; sheet_name: Union = None; engine: Literal = xlsx2csv; engine_options: Optional = None; read_options: Optional = None; schema_overrides: Optional = None; raise_if_empty: bool = True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| dlt | resource: DltResource | DataFrame | hamilton.plugins.dlt_extensions |
| mlflow | model_uri: Optional = None; mode: Literal = tracking; run_id: Optional = None; path: Union = model; model_name: Optional = None; version: Union = None; version_alias: Optional = None; flavor: Union = None; mlflow_kwargs: Dict = None | Any | hamilton.plugins.mlflow_extensions |

Data Savers

| key | saver params | types | module |
| --- | --- | --- | --- |
| json | path: str | dict, list | hamilton.io.default_data_loaders |
| json | filepath_or_buffer: Union; compression: str = infer; date_format: str = epoch; date_unit: str = ms; default_handler: Optional = None; double_precision: int = 10; force_ascii: bool = True; index: Optional = None; indent: int = 0; lines: bool = False; mode: str = w; orient: Optional = None; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| json | file: Union | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| json | path: Union | XGBModel, Booster | hamilton.plugins.xgboost_extensions |
| file | path: str; encoding: str = utf-8 | str | hamilton.io.default_data_loaders |
| file | path: Union | bytes, BytesIO | hamilton.io.default_data_loaders |
| file | path: Union; num_iteration: Optional = None; start_iteration: int = 0; importance_type: Literal = split | LGBMModel, Booster, CVBooster | hamilton.plugins.lightgbm_extensions |
| pickle | path: str | object | hamilton.io.default_data_loaders |
| pickle | path: Union; compression: Union = infer; protocol: int = 5; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| memory | | Any | hamilton.io.default_data_loaders |
| yaml | path: Union | str, int, float, bool, dict, list | hamilton.plugins.yaml_extensions |
| plt | path: Union; dpi: Union = None; format: Optional = None; metadata: Optional = None; bbox_inches: Union = None; pad_inches: Union = None; facecolor: Union = None; edgecolor: Union = None; backend: Optional = None; orientation: Optional = None; papertype: Optional = None; transparent: Optional = None; bbox_extra_artists: Optional = None; pil_kwargs: Optional = None | Figure | hamilton.plugins.matplotlib_extensions |
| npy | path: Union; allow_pickle: Optional = None; fix_imports: Optional = None | ndarray | hamilton.plugins.numpy_extensions |
| csv | path: Union; sep: Optional = ,; na_rep: str = ; float_format: Union = None; columns: Optional = None; header: Union = True; index: Optional = False; index_label: Union = None; mode: str = w; encoding: Optional = None; compression: Union = infer; quoting: Optional = None; quotechar: Optional = "; lineterminator: Optional = None; chunksize: Optional = None; date_format: Optional = None; doublequote: bool = True; escapechar: Optional = None; decimal: str = .; errors: str = strict; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| csv | file: Union; include_header: bool = True; separator: str = ,; line_terminator: str = ; quote_char: str = "; batch_size: int = 1024; datetime_format: str = None; date_format: str = None; time_format: str = None; float_precision: int = None; null_value: str = None; quote_style: Type = None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| parquet | path: Union; engine: Literal = auto; compression: Optional = snappy; index: Optional = None; partition_cols: Optional = None; storage_options: Optional = None; extra_kwargs: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| parquet | file: Union; compression: Any = zstd; compression_level: int = None; statistics: bool = False; row_group_size: int = None; use_pyarrow: bool = False; pyarrow_options: Dict = None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| sql | table_name: str; db_connection: Any; chunksize: Optional = None; dtype: Union = None; if_exists: str = fail; index: bool = True; index_label: Union = None; method: Union = None; schema: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| xml | path_or_buffer: Union; index: bool = True; root_name: str = data; row_name: str = row; na_rep: Optional = None; attr_cols: Optional = None; elems_cols: Optional = None; namespaces: Optional = None; prefix: Optional = None; encoding: str = utf-8; xml_declaration: bool = True; pretty_print: bool = True; parser: str = lxml; stylesheet: Union = None; compression: Union = infer; storage_options: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| html | buf: Union = None; columns: Optional = None; col_space: Union = None; header: Optional = True; index: Optional = True; na_rep: Optional = NaN; formatters: Union = None; float_format: Optional = None; sparsify: Optional = True; index_names: Optional = True; justify: str = None; max_rows: Optional = None; max_cols: Optional = None; show_dimensions: bool = False; decimal: str = .; bold_rows: bool = True; classes: Union = None; escape: Optional = True; notebook: Literal = False; border: int = None; table_id: Optional = None; render_links: bool = False; encoding: Optional = utf-8 | DataFrame | hamilton.plugins.pandas_extensions |
| stata | path: Union = None; convert_dates: Optional = None; write_index: bool = True; byteorder: Optional = None; time_stamp: Optional = None; data_label: Optional = None; variable_labels: Optional = None; version: Literal = 114; convert_strl: Optional = None; compression: Union = infer; storage_options: Optional = None; value_labels: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| feather | path: Union; dest: Optional = None; compression: Literal = None; compression_level: Optional = None; chunksize: Optional = None; version: Optional = 2 | DataFrame | hamilton.plugins.pandas_extensions |
| feather | file: Union = None; compression: Type = uncompressed | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| orc | path: Union; engine: Literal = pyarrow; index: Optional = None; engine_kwargs: Optional = None | DataFrame | hamilton.plugins.pandas_extensions |
| excel | path: Union; sheet_name: str = Sheet1; na_rep: str = ; float_format: Optional = None; columns: Optional = None; header: Union = True; index: bool = True; index_label: Union = None; startrow: int = 0; startcol: int = 0; engine: Optional = None; merge_cells: bool = True; inf_rep: str = inf; freeze_panes: Optional = None; storage_options: Optional = None; engine_kwargs: Optional = None; mode: Optional = w; if_sheet_exists: Optional = None; datetime_format: str = None; date_format: str = None | DataFrame | hamilton.plugins.pandas_extensions |
| avro | file: Union; compression: Any = uncompressed | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| database | table_name: str; connection: Union; if_table_exists: Literal = fail; engine: Literal = sqlalchemy | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| spreadsheet | workbook: Union; worksheet: Optional = None; position: Union = A1; table_style: Union = None; table_name: Optional = None; column_formats: Optional = None; dtype_formats: Optional = None; conditional_formats: Optional = None; header_format: Optional = None; column_totals: Union = None; column_widths: Union = None; row_totals: Union = None; row_heights: Union = None; sparklines: Optional = None; formulas: Optional = None; float_precision: int = 3; include_header: bool = True; autofilter: bool = True; autofit: bool = False; hidden_columns: Union = None; hide_gridlines: bool = None; sheet_zoom: Optional = None; freeze_panes: Union = None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| png | path: Union; dpi: float = 200; format: str = png; metadata: Optional = None; bbox_inches: str = None; pad_inches: float = 0.1; backend: Optional = None; papertype: str = None; transparent: bool = None; bbox_extra_artists: Optional = None; pil_kwargs: Optional = None | ConfusionMatrixDisplay, DetCurveDisplay, PrecisionRecallDisplay, PredictionErrorDisplay, RocCurveDisplay, DecisionBoundaryDisplay, LearningCurveDisplay, PartialDependenceDisplay, ValidationCurveDisplay, Figure | hamilton.plugins.sklearn_plot_extensions |
| dlt | pipeline: Pipeline; table_name: str; primary_key: Optional = None; write_disposition: Optional = None; columns: Optional = None; schema: Optional = None; loader_file_format: Optional = None | Iterable, DataFrame, Table, RecordBatch | hamilton.plugins.dlt_extensions |
| mlflow | path: Union = model; register_as: Optional = None; flavor: Union = None; run_id: Optional = None; mlflow_kwargs: Dict = None | Any | hamilton.plugins.mlflow_extensions |