Avro in a nutshell

Apache Avro is a binary serialization format. It is schema-based: data is written and read according to schemas that are themselves defined in JSON. A schema declares the fields of a record, their types, and which fields are mandatory. Avro also supports arrays, enums, and nested records.
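As an illustration, a schema for a customer entity might look like the one below. The record name, namespace, and fields are hypothetical, invented for this sketch. The Python wrapper only parses the JSON with the standard library to show which fields declare a default (and are therefore optional) and which do not.

```python
import json

# Hypothetical schema: the record name, namespace, and fields are
# illustrative and do not come from a real system.
CUSTOMER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "tier",
     "type": {"type": "enum", "name": "Tier", "symbols": ["BASIC", "GOLD"]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "address",
     "type": {"type": "record", "name": "Address",
              "fields": [{"name": "city", "type": "string"}]}}
  ]
}
"""

schema = json.loads(CUSTOMER_SCHEMA_JSON)


def fields_without_default(record_schema):
    """Names of fields a writer must always supply."""
    return [f["name"] for f in record_schema["fields"] if "default" not in f]


def fields_with_default(record_schema):
    """Names of fields that fall back to a default when absent."""
    return [f["name"] for f in record_schema["fields"] if "default" in f]


mandatory = fields_without_default(schema)
optional = fields_with_default(schema)
```

Here `email` is the only optional field: its type is a union of `null` and `string`, and its default is `null`.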

One major advantage of Avro is its support for schema evolution: a schema can change over time, and several historical versions of it can coexist.

Systems must normally adapt to the changing needs of the business, so we add or remove fields from our entities and sometimes even change their data types. To keep such changes forward or backward compatible, we must pay attention to which fields are optional, that is, which ones declare a default value.
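The role of optional fields can be sketched with a deliberately simplified model of how a reader projects a record onto its own version of the schema. This is not Avro's actual resolution algorithm (which also covers type promotion, aliases, and nested types); the function name `resolve` and both schema versions are assumptions made for the example.

```python
def resolve(record, reader_fields):
    """Project a decoded record onto the reader's field list.

    Simplified model: fields the reader does not know are dropped, and
    fields the writer did not supply are filled from the reader's default.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# Hypothetical schema versions: v2 adds an optional "email" field.
V1_FIELDS = [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
]
V2_FIELDS = V1_FIELDS + [
    {"name": "email", "type": ["null", "string"], "default": None},
]

old_record = {"id": 1, "name": "Ada"}
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}

# Backward compatibility: a v2 reader consumes data written with v1.
upgraded = resolve(old_record, V2_FIELDS)

# Forward compatibility: a v1 reader consumes data written with v2.
downgraded = resolve(new_record, V1_FIELDS)
```

If the new field had been added without a default, the first call would fail, which is why a mandatory field cannot be added without breaking readers of older data.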

Because Avro serializes data into arrays of bytes and Kafka messages are also sent as binary data, we can send Avro-formatted messages through Apache Kafka. The real question is: where do we store the schemas that Avro needs in order to work?
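To make the byte arrays concrete, and to show why the schema question matters, the following sketch hand-encodes a two-field record according to the binary encoding rules of the Avro specification: a long is written as a zig-zag variable-length integer, a string as its byte length followed by UTF-8 data, and a record as its field values in schema order. The function names are ours; real code would rely on an Avro library.

```python
def encode_long(n):
    """Zig-zag encode a signed 64-bit integer, then emit it as a varint."""
    z = (n << 1) ^ (n >> 63)  # small magnitudes map to small unsigned values
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: set the continuation bit
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Length (as an Avro long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(a, b):
    """Encode a record whose schema is {a: long, b: string}.

    Only the values are written, in schema order. Field names and types
    are not part of the output.
    """
    return encode_long(a) + encode_string(b)


payload = encode_record(27, "foo")
payload_hex = payload.hex()
```

The five resulting bytes contain no field names and no type information, so a consumer cannot interpret them without access to the schema that produced them.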

Recall that one of the main functions of an enterprise service bus is to validate the format of the messages it processes. It is even better if the bus also keeps a historical record of those formats.

The Kafka Schema Registry is the module responsible for three important functions. The first is to validate that messages are in the appropriate format, the second is to serve as a repository for the schemas, and the third is to keep a version history of those schemas.

The Schema Registry is a server that runs alongside our Kafka brokers. It stores the schemas, including every version of each schema. When messages are sent to Kafka in Avro format, each message carries the identifier of a schema stored in the Schema Registry.
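In the wire format documented for the Confluent serializers, that identifier travels in a five-byte header placed in front of the Avro payload: a magic byte of 0 followed by the schema ID as a 4-byte big-endian integer. The sketch below builds such a message with Python's `struct` module. The function name `frame`, the schema ID 42, and the payload bytes are assumptions made for the example.

```python
import struct

MAGIC_BYTE = 0
# ">bI": big-endian, 1-byte magic followed by a 4-byte unsigned schema ID.
HEADER = struct.Struct(">bI")


def frame(schema_id, avro_payload):
    """Prefix an Avro-encoded payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


# Hypothetical values: schema ID 42 and a five-byte Avro payload.
avro_payload = bytes.fromhex("3606666f6f")
message = frame(42, avro_payload)
message_hex = message.hex()
```

The message is only five bytes longer than the raw Avro data, which is far cheaper than shipping the full JSON schema with every record.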

There is a library for serializing and deserializing messages in Avro format. It works transparently with the Schema Registry.

When a message is sent in Avro format, the serializer ensures that the schema is registered and obtains the schema ID. If we send an Avro message whose schema is not yet in the Schema Registry, the serializer registers the current version of the schema automatically. If you do not want this behavior, you can disable it by setting the serializer's auto.register.schemas property to false.
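The registration logic can be sketched with a toy, in-memory stand-in for the Registry. Everything here is an assumption made for illustration: the class `InMemoryRegistry`, the helper `schema_id_for`, the subject name, and the way schemas are compared (sorted JSON text rather than Avro's canonical form). The real serializer talks to the Registry over HTTP; only the `auto_register` parameter is meant to mirror the auto.register.schemas setting.

```python
import json


def canonical(schema):
    """Stable text form used as a lookup key (not Avro's canonical form)."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


class InMemoryRegistry:
    """Toy stand-in for the Schema Registry."""

    def __init__(self):
        self._ids = {}       # canonical schema text -> schema ID
        self._versions = {}  # subject -> list of schema IDs, oldest first

    def lookup(self, subject, schema):
        """Return the schema ID if registered under the subject, else None."""
        schema_id = self._ids.get(canonical(schema))
        if schema_id in self._versions.get(subject, []):
            return schema_id
        return None

    def register(self, subject, schema):
        """Store the schema as a new version of the subject; return its ID."""
        schema_id = self._ids.setdefault(canonical(schema), len(self._ids) + 1)
        versions = self._versions.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id


def schema_id_for(registry, subject, schema, auto_register=True):
    """What a serializer does before writing: obtain the schema ID."""
    schema_id = registry.lookup(subject, schema)
    if schema_id is not None:
        return schema_id
    if not auto_register:
        raise LookupError(f"schema not registered under {subject!r}")
    return registry.register(subject, schema)


registry = InMemoryRegistry()
subject = "customers-value"  # hypothetical subject name
v1 = {"type": "record", "name": "Customer",
      "fields": [{"name": "id", "type": "long"}]}

try:
    schema_id_for(registry, subject, v1, auto_register=False)
    strict_result = "sent"
except LookupError:
    strict_result = "rejected"

first_id = schema_id_for(registry, subject, v1)   # registers v1
repeat_id = schema_id_for(registry, subject, v1)  # reuses the same ID
```

With automatic registration disabled, schemas have to be registered ahead of time by some other process, which gives tighter control over what producers are allowed to publish.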

When a message is received in Avro format, the deserializer reads the schema ID from the message, fetches the corresponding schema from the Registry, and uses it to deserialize the message.
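A minimal sketch of the consumer side follows, assuming the Confluent wire format (a magic byte of 0, then a 4-byte big-endian schema ID, then the Avro data). The class `CachingSchemaLookup` and the `fake_fetch` function are hypothetical; in a real client the fetch would be an HTTP request to the Registry, and the returned schema would then drive an Avro decoder.

```python
import struct


class CachingSchemaLookup:
    """Resolve the schema for a message, fetching each schema ID only once."""

    def __init__(self, fetch_schema):
        self._fetch = fetch_schema  # callable: schema ID -> schema
        self._cache = {}

    def schema_for(self, message):
        """Return (schema, avro_payload) for a framed message."""
        if len(message) < 5:
            raise ValueError("message too short to contain a header")
        magic, schema_id = struct.unpack_from(">bI", message)
        if magic != 0:
            raise ValueError(f"unexpected magic byte: {magic}")
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id], message[5:]


fetch_calls = []


def fake_fetch(schema_id):
    """Stand-in for a request to the Registry; records each call."""
    fetch_calls.append(schema_id)
    return {"type": "record", "name": "Example", "fields": []}


lookup = CachingSchemaLookup(fake_fetch)
message = bytes.fromhex("000000002a3606666f6f")  # schema ID 42 + payload

schema, payload = lookup.schema_for(message)
schema_again, _ = lookup.schema_for(message)
```

Because schemas are cached by ID, consuming many messages written with the same schema costs a single round trip to the Registry.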

Both the Schema Registry and the library for serializing and deserializing messages in Avro format are part of the Confluent Platform. It is important to mention that when you need to use the Schema Registry, you must use the Confluent Platform.

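It is also important to mention that, with the Schema Registry, the Confluent library should be used for Avro serialization. The Apache Avro library alone doesn't work, because it does not add the schema identifier that the Registry-aware deserializer expects to find in each message.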
