Time for action – examining the output data with Java

To show that the data is accessible from multiple languages, let's also display the job output using Java.

  1. Create the following as OutputRead.java:
    import java.io.File;
    import java.io.IOException;
    
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;
    
    public class OutputRead
    {
        public static void main(String[] args) throws IOException
        {
            String filename = args[0] ;
    
            File file = new File(filename) ;
            // No schema is passed in; the reader uses the one stored in the datafile.
            DatumReader<GenericRecord> reader =
                new GenericDatumReader<GenericRecord>();
            DataFileReader<GenericRecord> dataFileReader =
                new DataFileReader<GenericRecord>(file, reader);
    
            while (dataFileReader.hasNext())
            {
                GenericRecord result = dataFileReader.next();
                // Fields are looked up by name.
                String output = String.format("%s %d",
                    result.get("shape"), result.get("count")) ;
                System.out.println(output) ;
            }
            dataFileReader.close() ;
        }
    }
  2. Compile and run the program:
    $ javac OutputRead.java
    $ java OutputRead result.avro
    blur 1
    cylinder 1
    diamond 2
    formation 1
    light 3
    saucer 1
    

What just happened?

We added this example to show the Avro data being read by more than one language. The code is very similar to the earlier InputRead class; the only difference is that the named fields are used to display each datum as it is read from the datafile.

Have a go hero – graphs in Avro

As previously mentioned, we worked hard to reduce representation-related complexity in our GraphPath class. But with mappings to and from flat lines of text and objects, there was an overhead in managing these transformations.
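To make that overhead concrete, here is a minimal sketch of the kind of hand-written mapping code a flat-text representation needs. The tab-separated layout (`id`, comma-separated neighbors, distance, status) and the `FlatNode` class are assumptions made for illustration only; they are not the actual GraphPath format or code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node class, for illustration only.
class FlatNode
{
    final int id;
    final List<Integer> neighbors;
    final int distance;
    final String status;

    FlatNode(int id, List<Integer> neighbors, int distance, String status)
    {
        this.id = id;
        this.neighbors = neighbors;
        this.distance = distance;
        this.status = status;
    }

    // Parse an assumed line layout: id <TAB> n1,n2,... <TAB> distance <TAB> status
    static FlatNode fromLine(String line)
    {
        // Limit of -1 keeps empty fields, such as an empty neighbor list.
        String[] parts = line.split("\t", -1);
        List<Integer> neighbors = new ArrayList<Integer>();
        if (!parts[1].isEmpty())
        {
            for (String n : parts[1].split(","))
            {
                neighbors.add(Integer.parseInt(n));
            }
        }
        return new FlatNode(Integer.parseInt(parts[0]), neighbors,
            Integer.parseInt(parts[2]), parts[3]);
    }

    // Serialize back to the same assumed line layout.
    String toLine()
    {
        StringBuilder sb = new StringBuilder();
        sb.append(id).append('\t');
        for (int i = 0; i < neighbors.size(); i++)
        {
            if (i > 0)
            {
                sb.append(',');
            }
            sb.append(neighbors.get(i));
        }
        sb.append('\t').append(distance).append('\t').append(status);
        return sb.toString();
    }
}
```

Every field added to the node means another change to both `fromLine` and `toLine`, which is exactly the bookkeeping a schema-based format takes off our hands.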

With its support for nested complex types, Avro can natively support a representation of a node that is much closer to the runtime object. Modify the GraphPath class job to read and write the graph representation to an Avro datafile consisting of one datum per node. The following example schema may be a good starting point, but feel free to enhance it (Avro requires every enum to have a name; Status here is an arbitrary choice):

{ "type": "record",
  "name": "Graph_representation",
  "fields" : [
    {"name": "node_id", "type": "int"},
    {"name": "neighbors", "type": {"type": "array", "items": "int"}},
    {"name": "distance", "type": "int"},
    {"name": "status", "type": {"type": "enum", "name": "Status",
      "symbols": ["PENDING", "CURRENT", "DONE"]}}
  ]
}
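As a possible first step, the following sketch builds a single node datum with Avro's generic API and writes it to a datafile. It assumes the Avro library is on the classpath and uses the example schema in fully nested form. The enum name `Status`, the class name `GraphNodeWrite`, and the helper `buildNode` are our own choices for illustration, not part of the GraphPath code.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Hypothetical class, for illustration only.
class GraphNodeWrite
{
    // The example schema; "Status" is an assumed name for the enum.
    static final String SCHEMA_JSON =
        "{ \"type\": \"record\", \"name\": \"Graph_representation\", \"fields\": ["
        + "{\"name\": \"node_id\", \"type\": \"int\"},"
        + "{\"name\": \"neighbors\", \"type\": {\"type\": \"array\", \"items\": \"int\"}},"
        + "{\"name\": \"distance\", \"type\": \"int\"},"
        + "{\"name\": \"status\", \"type\": {\"type\": \"enum\", \"name\": \"Status\","
        + " \"symbols\": [\"PENDING\", \"CURRENT\", \"DONE\"]}} ] }";

    // Build one node datum; the fields are set by name.
    static GenericRecord buildNode(Schema schema, int id, Integer[] neighbors,
        int distance, String status)
    {
        GenericRecord node = new GenericData.Record(schema);
        node.put("node_id", id);
        node.put("neighbors", Arrays.asList(neighbors));
        node.put("distance", distance);
        // Enum values are wrapped in an EnumSymbol tied to the field's schema.
        Schema statusSchema = schema.getField("status").schema();
        node.put("status", new GenericData.EnumSymbol(statusSchema, status));
        return node;
    }

    public static void main(String[] args) throws IOException
    {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord node =
            buildNode(schema, 1, new Integer[] {2, 3}, 0, "CURRENT");

        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
            new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File(args[0]));
        writer.append(node);
        writer.close();
    }
}
```

Run it with an output filename as the only argument. The neighbor list and the status are stored as a real array and a real enum, so no string splitting or joining is needed when the datum is read back.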

Going forward with Avro

There are many features of Avro we did not cover in this case study. We focused only on its value as an at-rest data representation. It can also be used within a remote procedure call (RPC) framework and can optionally be used as the default RPC format in Hadoop 2.0. We didn't use Avro's code generation facilities, which produce a much more domain-focused API. Nor did we cover Avro's support for schema evolution, which, for example, allows new fields to be added to recent records without invalidating old datums or breaking existing clients. It's a technology you are very likely to see more of in the future.
