Data catalogs

Last updated Aug 7, 2023

# Metadata

See Metadata.

# Data catalog rise

While data catalogs are not new, they have been on the rise in recent years. Notable mentions:

This shows both the need for them and the complexity companies face when dealing with data in the modern world.

# Data catalog and co

Many tools and methodologies do some form of metadata cataloging. For example:

Data catalogs provide metadata management capabilities:

But they often do more than that:

This shows the deep roots of the “data catalog” approach in RDBMSs and data warehouses in general. On the other hand, data lineage, browsing, and data quality don’t make sense in the context of data serialisation schemas.

# New tool

While there are many open-source tools, they all seem to be database-centric. I see great potential in a general-purpose metadata tool:

# Uber’s metadata infrastructure

Uber’s metadata infrastructure seems quite close to what I described above. It had two core tools: Databook and Dragon. Both are closed-source, but their successors are open-source:

```mermaid
flowchart LR
  u[Uber metadata infrastructure]
  u --> Dragon --> Hydra
  u --> Databook --> OpenMetadata -.- JSONSchema
```

Unfortunately, the two lines are disconnected: OpenMetadata uses JSON Schema for schema description, not Dragon/Hydra.

# Dragon

A little algebra goes a long way

Dragon is based on Algebraic Property Graphs (APGs), a mathematical formalism from the same authors.
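To make the APG idea a bit more concrete, here is a minimal Python sketch under a simplified reading of the formalism: a schema assigns each label a type built from base types, a unit type, references to other labels, products (records) and sums (tagged unions); an instance assigns each element a label and a value that must conform to that label's type. The class names, the `conforms`/`validate` helpers and the catalog-flavoured example data are hypothetical illustrations, not Dragon's code or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Tuple, Union

# Hypothetical, simplified APG sketch; not Dragon's implementation.

@dataclass
class Base:      # primitive type, e.g. "string" or "int"
    name: str

@dataclass
class Unit:      # type with exactly one value (represented as None)
    pass

@dataclass
class Ref:       # reference to an element carrying the given label
    label: str

@dataclass
class Product:   # record: every named field must be present
    fields: Dict[str, Type]

@dataclass
class Sum:       # tagged union: exactly one named case, as (case, value)
    cases: Dict[str, Type]

Type = Union[Base, Unit, Ref, Product, Sum]
Elements = Dict[str, Tuple[str, Any]]  # element id -> (label, value)

BASE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

def conforms(value: Any, ty: Type, elements: Elements) -> bool:
    """Check that a value matches a type within a given set of elements."""
    if isinstance(ty, Base):
        check = BASE_CHECKS.get(ty.name)
        return check is not None and check(value)
    if isinstance(ty, Unit):
        return value is None
    if isinstance(ty, Ref):
        # A reference must name an existing element with the expected label.
        return (isinstance(value, str) and value in elements
                and elements[value][0] == ty.label)
    if isinstance(ty, Product):
        return (isinstance(value, dict) and set(value) == set(ty.fields)
                and all(conforms(value[k], t, elements)
                        for k, t in ty.fields.items()))
    if isinstance(ty, Sum):
        return (isinstance(value, tuple) and len(value) == 2
                and value[0] in ty.cases
                and conforms(value[1], ty.cases[value[0]], elements))
    return False

def validate(schema: Dict[str, Type], elements: Elements) -> bool:
    """Every element's label must be in the schema and its value must conform."""
    return all(label in schema and conforms(value, schema[label], elements)
               for label, value in elements.values())

# Example: a tiny metadata graph of people and the tables they own.
schema: Dict[str, Type] = {
    "Person": Product({"name": Base("string"), "age": Base("int")}),
    "Table": Product({
        "name": Base("string"),
        "owner": Ref("Person"),
        "status": Sum({"active": Unit(), "deprecated": Base("string")}),
    }),
}
elements: Elements = {
    "p1": ("Person", {"name": "Ada", "age": 36}),
    "t1": ("Table", {"name": "trips", "owner": "p1", "status": ("active", None)}),
}
print(validate(schema, elements))  # True
```

Note the design choice this illustrates: references (`Ref`) point at elements by identity rather than embedding their values, which is what makes the structure a graph rather than a tree of nested records.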

# Hydra

Hydra is a transformation toolkit along the lines of Dragon, but open source, and with a more advanced type system and other new features.
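As a toy illustration of the kind of schema translation a transformation toolkit deals with, the sketch below maps a small algebraic type language onto JSON Schema, the format OpenMetadata uses. It does not use or mirror Hydra's actual API: the tuple-based type encoding and the `to_json_schema` function are hypothetical, chosen only to keep the example short.

```python
from typing import Any, Dict

# Hypothetical, minimal type encoding (not Hydra's or Dragon's):
#   ("base", "string" | "int"), ("unit",), ("ref", label),
#   ("product", {field: type}), ("sum", {case: type})

BASE_TO_JSON = {"string": "string", "int": "integer"}

def to_json_schema(ty: tuple) -> Dict[str, Any]:
    """Translate one algebraic type into a JSON Schema fragment."""
    kind = ty[0]
    if kind == "base":
        return {"type": BASE_TO_JSON[ty[1]]}
    if kind == "unit":
        return {"type": "null"}
    if kind == "ref":
        # JSON Schema has no notion of element identity, so a reference
        # degrades to a plain string id; the target label survives only
        # as a comment.
        return {"type": "string", "$comment": f"id of a {ty[1]} element"}
    if kind == "product":
        fields = ty[1]
        return {
            "type": "object",
            "properties": {k: to_json_schema(t) for k, t in fields.items()},
            "required": sorted(fields),
            "additionalProperties": False,
        }
    if kind == "sum":
        # Each case becomes a single-key object; oneOf enforces exactly one.
        return {"oneOf": [
            {
                "type": "object",
                "properties": {case: to_json_schema(t)},
                "required": [case],
                "additionalProperties": False,
            }
            for case, t in ty[1].items()
        ]}
    raise ValueError(f"unknown type kind: {kind!r}")

table = ("product", {
    "name": ("base", "string"),
    "owner": ("ref", "Person"),
    "status": ("sum", {"active": ("unit",), "deprecated": ("base", "string")}),
})
table_schema = to_json_schema(table)
```

The `ref` case shows why such translations are not free: the target format may be unable to express a concept (here, a typed link to another element), so the mapping has to be lossy or encode the concept by convention.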