Cloud Dataproc

Cloud Dataproc is a fully managed Hadoop and Spark service whose clusters can be spun up quickly, typically in about 90 seconds or less. A Dataproc cluster supports autoscaling and can be used to run Hadoop, Spark, and AI and ML applications effectively. At peak hours, nodes can be added to the cluster based on usage, and the cluster can scale back down when demand drops.

Dataproc is integrated with other Google Cloud services such as Cloud Storage, BigQuery, Stackdriver, Identity and Access Management (IAM), and networking. This makes the cluster easy to use and secure.

Beneath a Dataproc cluster, Google runs Compute Engine instances. Users can choose from a wide range of predefined machine configurations to build the cluster or, if none of them meets their needs, build a cluster with a custom machine configuration. One very important thing to note here is the use of preemptible instances with the Dataproc cluster, which can significantly reduce the cost of the cluster. Preemptible instances come at much lower prices, approximately 20% of the price of a regular instance with the same configuration, with the catch that Google can take the instance back with only 30 seconds of notice.
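The effect of that discount on a cluster's instance cost can be worked through with a little arithmetic. The sketch below is illustrative only: the function name, the hourly price of 0.20, and the node counts are assumptions made up for the example, not real Google Cloud prices, and the 20% fraction is the approximate figure quoted above. It models instance cost only and ignores any other charges.

```python
def hourly_cluster_cost(standard_nodes, preemptible_nodes,
                        standard_price, preemptible_fraction=0.20):
    """Estimate the hourly instance cost of a cluster.

    standard_price is a hypothetical hourly price for one regular instance.
    preemptible_fraction is the approximate ratio quoted in the text (about 20%).
    """
    preemptible_price = standard_price * preemptible_fraction
    return (standard_nodes * standard_price
            + preemptible_nodes * preemptible_price)


# Hypothetical price: 0.20 per instance-hour (not a real quote).
all_standard = hourly_cluster_cost(10, 0, standard_price=0.20)
mixed = hourly_cluster_cost(2, 8, standard_price=0.20)

# Fraction saved by replacing 8 of the 10 nodes with preemptible ones.
savings = 1 - mixed / all_standard

print(f"all standard: {all_standard:.2f} per hour")
print(f"mixed:        {mixed:.2f} per hour")
print(f"savings:      {savings:.0%}")
```

Under these assumed numbers, most of the cluster's instance cost disappears once the bulk of the workers are preemptible, which is why the option matters so much for pricing.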

With a Dataproc cluster, preemptible instances can be used as worker nodes because a Dataproc cluster is generally used for compute only, with all data saved in Cloud Storage. In this case, even if a preemptible instance is reclaimed, its work is rescheduled on another node and no data is lost. Cloud Dataproc cluster pricing varies with the instances chosen, but it is very competitive.
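As a rough sketch of how such a cluster might be described, the snippet below builds a plain Python dictionary shaped like a Dataproc cluster configuration, with regular primary workers and preemptible secondary workers. The helper name, the machine type, and the node counts are assumptions for illustration. The field names follow the Dataproc v1 cluster configuration as best understood here and should be checked against the current API reference before use. Nothing is sent to Google Cloud.

```python
def build_cluster_config(primary_workers, preemptible_workers,
                         machine_type="n1-standard-4"):
    """Return a dict shaped like a Dataproc cluster configuration.

    Hypothetical helper: it only builds the dictionary and does not
    call any Google Cloud API.
    """
    return {
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": machine_type,
        },
        # Primary workers are regular (non-preemptible) instances.
        "worker_config": {
            "num_instances": primary_workers,
            "machine_type_uri": machine_type,
        },
        # Secondary workers add compute capacity and may be reclaimed.
        "secondary_worker_config": {
            "num_instances": preemptible_workers,
            "preemptibility": "PREEMPTIBLE",
        },
    }


config = build_cluster_config(primary_workers=2, preemptible_workers=8)
print(config["secondary_worker_config"])
```

Keeping a small set of regular primary workers alongside the preemptible ones reflects the idea above: the cluster keeps running when preemptible nodes are taken back, and the data itself stays in Cloud Storage.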
