Data versioning
This module contains hashing functions for Python objects. It uses functools.singledispatch to allow specialized implementations based on type. Singledispatch automatically applies the most specific implementation registered for the type of the argument.
This module houses implementations for the Python standard library. Supporting all types is a considerable endeavor, so we’ll add support as types are requested by users.
Otherwise, third-party types can be supported via the h_databackends module. This registers abstract types that can be checked without having to import the third-party library. For instance, there are implementations for pandas.DataFrame and polars.DataFrame despite these libraries not being imported here.
IMPORTANT: All container types that make a recursive call to hash_value or to a specific implementation should pass the depth parameter along to prevent RecursionError.
- hamilton.caching.fingerprinting.hash_bytes(obj, *args, **kwargs) → str
Convert the primitive to a string and hash it.
Primitive type returns a hash and doesn’t have to handle depth.
- hamilton.caching.fingerprinting.hash_mapping(obj, *, ignore_order: bool = True, depth: int = 0, **kwargs) → str
Hash each key then its value.
By default (ignore_order=True), the mapping is sorted first because order shouldn’t matter in a mapping.
NOTE Since Python 3.7, dictionaries preserve insertion order. However, this function assumes that the key order doesn’t matter to uniquely identify the dictionary.
foo = {"key": 3, "key2": 13}
bar = {"key2": 13, "key": 3}
hash_mapping(foo) == hash_mapping(bar)
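To make the effect of ignore_order concrete, here is a minimal standalone sketch. It is not the library's code: sketch_hash_mapping is a hypothetical name, repr() stands in for the recursive hash_value call, and sorting keys by their repr() is an assumption made so that mixed key types do not raise.

```python
import hashlib


def sketch_hash_mapping(mapping: dict, *, ignore_order: bool = True) -> str:
    """Hash each key, then its value; optionally sort the items first."""
    items = list(mapping.items())
    if ignore_order:
        # Sorting makes the result independent of insertion order.
        items.sort(key=lambda item: repr(item[0]))
    hasher = hashlib.sha256()
    for key, value in items:
        hasher.update(repr(key).encode())
        hasher.update(repr(value).encode())
    return hasher.hexdigest()


foo = {"key": 3, "key2": 13}
bar = {"key2": 13, "key": 3}

same_when_sorted = sketch_hash_mapping(foo) == sketch_hash_mapping(bar)
same_when_ordered = sketch_hash_mapping(
    foo, ignore_order=False
) == sketch_hash_mapping(bar, ignore_order=False)
```

In this sketch same_when_sorted is True and same_when_ordered is False, because the two dictionaries only differ in insertion order.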
- hamilton.caching.fingerprinting.hash_none(obj, *args, **kwargs) → str
Hash for None is <none>.
Primitive type returns a hash and doesn’t have to handle depth.
- hamilton.caching.fingerprinting.hash_numpy_array(obj, *args, depth: int = 0, **kwargs) → str
Get the bytes representation of the array raw data and hash it.
Might not be ideal because different higher-level numpy objects could have the same underlying array representation (e.g., masked arrays). Unsure, but it’s an area to investigate.
- hamilton.caching.fingerprinting.hash_pandas_obj(obj, *args, depth: int = 0, **kwargs) → str
Convert a pandas dataframe, series, or index to a dictionary of {index: row_hash} then hash it.
Given the hashing for mappings, the physical ordering of rows doesn’t matter. For example, if the index is a date, the hash will represent the {date: row_hash}, and won’t preserve how dates were ordered in the DataFrame.
- hamilton.caching.fingerprinting.hash_polars_column(obj, *args, depth: int = 0, **kwargs) → str
Promote the single Series to a dataframe and hash it.
- hamilton.caching.fingerprinting.hash_polars_dataframe(obj, *args, depth: int = 0, **kwargs) → str
Convert a polars dataframe, series, or index to a list of hashes then hash it.
- hamilton.caching.fingerprinting.hash_primitive(obj, *args, **kwargs) → str
Convert the primitive to a string and hash it.
Primitive type returns a hash and doesn’t have to handle depth.
- hamilton.caching.fingerprinting.hash_repr(obj, *args, **kwargs) → str
Use the built-in repr() to get a string representation of the object and hash it.
While .__repr__() might not be implemented for all classes, the function repr() will handle it, along with exceptions, to always return a value.
Primitive type returns a hash and doesn’t have to handle depth.
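As a rough illustration of hashing through repr(), the sketch below uses hypothetical names (sketch_hash_repr, WithRepr, WithoutRepr) and SHA-256 as an assumed algorithm. It also shows a property of Python's default repr() worth keeping in mind: without a custom __repr__, the string includes the object's memory address, so two live instances produce different hashes.

```python
import hashlib


def sketch_hash_repr(obj) -> str:
    """Hash the string returned by the built-in repr()."""
    return hashlib.sha256(repr(obj).encode()).hexdigest()


class WithRepr:
    def __init__(self, value):
        self.value = value

    def __repr__(self) -> str:
        return f"WithRepr(value={self.value!r})"


class WithoutRepr:
    """Relies on the default repr, which embeds the memory address."""


# Equal content and a custom __repr__ give equal hashes.
stable = sketch_hash_repr(WithRepr(1)) == sketch_hash_repr(WithRepr(1))

# Both objects are alive at once, so their addresses (and reprs) differ.
first, second = WithoutRepr(), WithoutRepr()
address_dependent = sketch_hash_repr(first) != sketch_hash_repr(second)
```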
- hamilton.caching.fingerprinting.hash_sequence(obj, *args, depth: int = 0, **kwargs) → str
Hash each object of the sequence.
Order matters for the hash since order matters in a sequence.
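The order sensitivity can be sketched as follows. This is an illustration under assumptions, not the module's code: sketch_hash_sequence is a hypothetical name and repr() stands in for the recursive hash of each element.

```python
import hashlib


def sketch_hash_sequence(sequence) -> str:
    """Feed each element's hash into one hasher, in iteration order."""
    hasher = hashlib.sha256()
    for element in sequence:
        element_hash = hashlib.sha256(repr(element).encode()).hexdigest()
        hasher.update(element_hash.encode())
    return hasher.hexdigest()


forward = sketch_hash_sequence([1, 2, 3])
backward = sketch_hash_sequence([3, 2, 1])
order_matters = forward != backward
```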
- hamilton.caching.fingerprinting.hash_set(obj, *args, depth: int = 0, **kwargs) → str
Hash each element of the set, then sort hashes, and create a hash of hashes.
For the same objects in the set, the hashes will be the same.
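The three steps (hash each element, sort the hashes, hash the hashes) might look like the sketch below. The name sketch_hash_set, the use of repr() as the element hash input, and SHA-256 are assumptions made for illustration.

```python
import hashlib


def sketch_hash_set(elements) -> str:
    """Hash each element, sort the hashes, then hash the sorted hashes."""
    element_hashes = sorted(
        hashlib.sha256(repr(element).encode()).hexdigest()
        for element in elements
    )
    hasher = hashlib.sha256()
    for element_hash in element_hashes:
        hasher.update(element_hash.encode())
    return hasher.hexdigest()


# Lists are passed here only to force two different iteration orders.
first_order = sketch_hash_set(["a", "b", "c"])
second_order = sketch_hash_set(["c", "a", "b"])
order_independent = first_order == second_order
```

Sorting the element hashes rather than the elements themselves avoids comparing elements of unrelated types, and it keeps the result stable even though the iteration order of a set of strings can change between interpreter runs.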
- hamilton.caching.fingerprinting.hash_unordered_mapping(obj, *args, depth: int = 0, **kwargs) → str
When hashing an unordered mapping, the following two dicts have the same hash.
foo = {"key": 3, "key2": 13}
bar = {"key2": 13, "key": 3}
hash_mapping(foo) == hash_mapping(bar)
- hamilton.caching.fingerprinting.hash_value(obj, *args, depth=0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: None, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: bool, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: float, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: int, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: str, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: bytes, *args, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: Sequence, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: Mapping, *, ignore_order: bool = True, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: Set, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: AbstractPandasColumn, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: AbstractPandasDataFrame, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: AbstractPolarsDataFrame, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: AbstractPolarsColumn, *args, depth: int = 0, **kwargs) → str
- hamilton.caching.fingerprinting.hash_value(obj: AbstractNumpyArray, *args, depth: int = 0, **kwargs) → str
Fingerprinting strategy that computes a hash of the full Python object.
The default case hashes the __dict__ attribute of the object (recursive).
- hamilton.caching.fingerprinting.set_max_depth(depth: int) → None
Set the maximum recursion depth for fingerprinting non-supported types.
- Parameters:
depth – The maximum depth for fingerprinting.
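How a maximum depth interacts with recursive hashing of __dict__ can be sketched as below. Everything here is a stand-in written for illustration: sketch_set_max_depth, sketch_hash_object, the default limit of 6, and the UNHASHABLE sentinel are assumptions, not the library's names or values.

```python
import hashlib

_MAX_DEPTH = 6  # hypothetical default for this sketch
UNHASHABLE = "<unhashable>"  # hypothetical sentinel value


def sketch_set_max_depth(depth: int) -> None:
    """Change the module-level recursion limit used by the sketch."""
    global _MAX_DEPTH
    _MAX_DEPTH = depth


def sketch_hash_object(obj, depth: int = 0) -> str:
    """Hash an object through its __dict__, one level per nested object."""
    if depth > _MAX_DEPTH:
        return UNHASHABLE
    attributes = getattr(obj, "__dict__", None)
    if not attributes:
        # No instance attributes: fall back to the repr of the value.
        return hashlib.sha256(repr(obj).encode()).hexdigest()
    hasher = hashlib.sha256()
    for name in sorted(attributes):
        hasher.update(name.encode())
        hasher.update(sketch_hash_object(attributes[name], depth + 1).encode())
    return hasher.hexdigest()


class Leaf:
    def __init__(self, value):
        self.value = value


class Branch:
    def __init__(self, child):
        self.child = child


# With the default limit, the differing leaf values are reached.
deep_a = sketch_hash_object(Branch(Branch(Leaf(1))))
deep_b = sketch_hash_object(Branch(Branch(Leaf(2))))

# With a limit of 1, recursion stops before the leaves are inspected.
sketch_set_max_depth(1)
shallow_a = sketch_hash_object(Branch(Branch(Leaf(1))))
shallow_b = sketch_hash_object(Branch(Branch(Leaf(2))))
```

In this sketch a lower limit trades precision for safety: deep_a and deep_b differ, while shallow_a and shallow_b are equal because the leaves are never visited.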