Avro and schema evolution#
Sasquatch uses the Avro format. An advantage of Avro is that it has a schema that comes with the data and supports schema evolution.
Sasquatch uses the Confluent Schema Registry to ensure schemas can evolve safely. In Sasquatch, schema changes must be forward-compatible so that consumers of Sasquatch won’t break. That includes Kafka consumers, InfluxDB queries, and even Chronograf dashboards.
Forward compatibility means that data produced with a new schema can be read by consumers using the previous schema. An example of a forward-compatible schema change is adding a new field. Removing or renaming an existing field are non forward-compatible schema changes.
Read more about forward compatibility in the Confluent Schema Registry documentation.
For example, assume the skyFluxMetric
metric with the following payload:
{
"timestamp": 1681248783000000,
"band": "y",
"instrument": "LSSTCam-imSim",
"meanSky": -213.75839364883444,
"stdevSky": 2328.906118708811,
}
Suppose there’s a dashboard in Chronograf with a chart that displays a time series of meanSky
and stdevSky
values grouped by band
.
Thus the timestamp
, band
, meanSky
and stdevSky
fields are required in the metric record for the dashboard to work.
The following Avro schema will ensure these fields are always present:
{
"namespace": "lsst.example",
"type": "record",
"name": "skyFluxMetric",
"fields": [
{
"name": "timestamp",
"type": "long"
},
{
"name": "band",
"type": "string"
},
{
"name": "instrument",
"type": "string",
},
{
"name": "meanSky",
"type": "float"
},
{
"name": "stdevSky",
"type": "float"
}
]
}
Suppose you want to add a table linked to the previous chart in the dashboard to display the visit ID associated with this metric.
Adding the visit
field to the schema is a forward-compatible change, so that’s allowed:
{
"namespace": "lsst.example",
"type": "record",
"name": "skyFluxMetric",
"fields": [
{
"name": "timestamp",
"type": "long"
},
{
"name": "band",
"type": "string"
},
{
"name": "instrument",
"type": "string",
},
{
"name": "visit",
"type": "int"
},
{
"name": "meanSky",
"type": "float"
},
{
"name": "stdevSky",
"type": "float"
}
]
}
New messages sent to Sasquatch now require the visit
field and a new version of the dashboard that uses the visit
information can be implemented.
Because this is a forward-compatible schema change, previous dashboard versions won’t break since they don’t use the visit
field.
In Sasquatch, a metric (or a telemetry topic) corresponds to a Kafka topic.
The metric namespace is specified in the Avro schema, and the metric full qualified name in this example is lsst.example.skyFluxMetric
.
Read more about Avro schemas and types in the Avro specification.