appendix. Apache TinkerPop installation and overview

For the examples in this book, we use graph databases and tools from the Apache Software Foundation’s TinkerPop project (http://tinkerpop.apache.org/). The project’s software is properly called Apache TinkerPop or simply TinkerPop. This appendix delivers an overview of the TinkerPop project and explains how to install and configure the features needed to run the code examples in this book.

A.1 Overview

TinkerPop is a top-level Apache Software Foundation project that offers an open source, vendor-agnostic graph computing framework with both transactional (OLTP) and analytical (OLAP) capabilities. In addition to the core libraries included as part of the project, there is a wide array of third-party libraries that are part of the TinkerPop ecosystem.

TinkerPop provides a standardized interface that is currently implemented by more than 20 separate database engines. These include DBaaS (Database-as-a-Service) products (such as Amazon Neptune and Azure Cosmos DB), commercial offerings (such as DataStax Enterprise Graph and Neo4j), and open source software (such as TinkerGraph and JanusGraph).

Note A TinkerPop-enabled graph database is a database that implements at least the minimum APIs required to perform traversals via the Gremlin query language.

The TinkerPop project is made up of multiple pieces. This overview covers the ones that we use throughout this book.

A.1.1 Gremlin traversal language

The Gremlin traversal language is the graph query language of the TinkerPop project and is the query language we use for the examples in this book. Gremlin supports both imperative and declarative syntaxes, but the imperative syntax is the predominant approach.

Gremlin allows for both query and mutation operations on data through the use of a series of steps that are chained together, similar to the way a functional language chains methods. This ability to chain operations enables the construction of complex traversals through our graphs. It is often useful to think of a Gremlin traversal in terms of a stream processor: data enters from the previous step, an operation is performed on it, and data is transmitted on to the next step.
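The stream-processor view can be illustrated without any TinkerPop code. The following is a minimal sketch in plain Python, not Gremlin itself: the sample vertices and the functions source, has_label, and values are hypothetical stand-ins that only loosely mimic how a filter step and a property-lookup step pass data along a chain.

```python
# Toy model of chained steps as a stream pipeline (plain Python, not TinkerPop).
# The sample vertices and the step functions below are illustrative assumptions.

def source(vertices):
    """Emit each vertex into the pipeline, one at a time."""
    for v in vertices:
        yield v

def has_label(stream, label):
    """Filter step: pass along only vertices with the given label."""
    for v in stream:
        if v["label"] == label:
            yield v

def values(stream, key):
    """Map step: replace each vertex with one of its property values."""
    for v in stream:
        if key in v:
            yield v[key]

def run_pipeline(vertices):
    """Chain the steps: each one consumes the previous step's output."""
    return list(values(has_label(source(vertices), "person"), "name"))

SAMPLE = [
    {"label": "person", "name": "Ted"},
    {"label": "software", "name": "TinkerGraph"},
    {"label": "person", "name": "Dave"},
]

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))  # prints ['Ted', 'Dave']
```

Because each function is a generator, no step sees the whole data set at once: an item flows through the filter and then the lookup before the next item enters the chain, which is the mental model described above.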

A.1.2 TinkerGraph

TinkerGraph is an in-memory graph engine that supports both OLTP and OLAP workloads and is part of the TinkerPop Gremlin Server and Gremlin Console. TinkerGraph is built as a reference implementation of the TinkerPop API. It is a full-featured, open source implementation of TinkerPop. TinkerGraph is the core graph engine used in the various tools and software provided as part of TinkerPop.

Note that TinkerGraph isn’t a piece of software that you download. It is the core engine that is used by the downloadable software such as Gremlin Server and Gremlin Console. Other vendors may choose to include it in their implementations.

A.1.3 Gremlin Console

The Gremlin Console is an interactive terminal application used with TinkerPop-enabled graph databases. The Gremlin Console enables users to connect to local or remote databases, load data into a graph, and interactively traverse around the graph. It can be used either as a standalone application with its own in-memory graph data or as a client to a graph database server. We use the Gremlin Console as a client throughout this book for our interactions with a separately running Gremlin Server.

A.1.4 Gremlin Language Variants (GLVs)

Gremlin Language Variants (GLVs) are like language-specific drivers that allow developers to use Gremlin as a query language, but to do so with the vernacular and idioms of their preferred development language. GLVs are exceptionally powerful and go well beyond our common understanding of database drivers.

When using a GLV for your language, be it Java, Python, C#, or JavaScript, you are using that language’s tools and syntax. GLVs encourage writing Gremlin traversals in the style of the application’s programming language: a Java developer uses Java syntax, a .NET developer uses .NET syntax, and so forth. In this book, we use the Gremlin-Java variant.

A.1.5 Gremlin Server

The Gremlin Server facilitates remote execution of graph commands against graph data. The Gremlin Server also allows non-JVM clients to communicate with JVM-based graph databases and provides a mechanism to communicate with databases hosted on separate machines. In this book, we use the Gremlin Server to host our graph data in a client-server architecture.

A.1.6 Documentation

The Apache TinkerPop website (http://tinkerpop.apache.org/) has a complete set of documentation including tutorials, getting started examples, and Gremlin recipes. Although we discuss some Gremlin concepts and syntax within this book, this book is not intended to serve as a replacement for the TinkerPop documentation. We strongly recommend that you take time to familiarize yourself with the available resources on the site if you choose to use a TinkerPop-enabled database.

A.2 Installation

The first step in installing the TinkerPop framework is to download the reference tools from the Apache TinkerPop site: http://tinkerpop.apache.org/downloads.html. The most recent version at the time of publication is 3.4.6, but any TinkerPop 3.4 implementation should work with the examples. For this book, you need to download and install both the Gremlin Console and the Gremlin Server.

Note This book utilizes the MacOS syntax for all examples, but we provide the Windows syntax for the same options as well.

A.2.1 Installing and verifying the Java Runtime

The prerequisite for running the Gremlin Console is Java version 8. If you do not have Java installed, you should download and install the latest Java Development Kit (JDK) from Oracle (http://mng.bz/ZrPj), OpenJDK (https://openjdk.java.net/), or your preferred Java distribution. To verify that Java is installed and to check its version number, use the command java -version like this:

$ java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)

This indicates that this machine is running Java version 1.8.0_222, which is a Java 8 release. From the response, we know that Java is properly configured and ready to use.
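If you want to script this check, the version string can be parsed. The following is a minimal Python sketch under stated assumptions: the function names parse_java_major and installed_java_major are our own, and the parser only handles the two common formats (the legacy "1.8.0_222" style, where the second number is the major version, and the newer "11.0.2" style).

```python
import re
import subprocess

def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_222" means Java 8; newer scheme: "11.0.2" means 11.
    if first == 1 and second is not None:
        return int(second)
    return first

def installed_java_major():
    """Run `java -version` and parse the result; None if Java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # `java -version` writes its report to stderr, not stdout.
    return parse_java_major(result.stderr)
```

Calling installed_java_major() on the machine shown above should return 8. A result of None suggests that Java is not on the PATH or that the output format is one this sketch does not recognize.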

A.2.2 Installing Gremlin Console

Now that we have all the prerequisites installed and verified, the next step is to install and run the Gremlin Console:

  1. From the TinkerPop downloads page (http://tinkerpop.apache.org/downloads), click the button for Gremlin Console.

  2. The page that opens lists the available download mirrors. Select a mirror and click the link to download the file.

  3. Once the download completes, unzip the archive using either a command-line tool or a GUI tool to a directory that we refer to as GREMLIN_CONSOLE_HOME.

  4. Open a command-line terminal.

  5. Navigate to the GREMLIN_CONSOLE_HOME directory.

  6. Start the Gremlin Console:

    1. For MacOS or Linux, type bin/gremlin.sh.

    2. For Windows, type bin\gremlin.bat.

  7. Once the Gremlin Console starts, you will see it move through a loading process where any configured plugins are activated. Once the plugins are activated, you get an input prompt as shown here:

    $ bin/gremlin.sh                         
     
             ,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    plugin activated: tinkerpop.tinkergraph
    gremlin>                                 

    Starts the Gremlin Console

    The Gremlin Console command prompt ready to accept input

A.2.3 Installing Gremlin Server

Now that we have installed and can run the Gremlin Console, it is time for us to install and run the Gremlin Server:

Note The Gremlin Server uses TCP port 8182. You may need to adjust your OS or local firewall settings in order to permit access on this port.

  1. From the TinkerPop downloads page (http://tinkerpop.apache.org/downloads), click the button for Gremlin Server.

  2. The page that opens lists the available download mirrors. Select a mirror and click the link to download the file.

  3. Once the download completes, unzip the archive using either a command-line tool or a GUI tool to a directory that we refer to as GREMLIN_SERVER_HOME.

  4. Open a command-line terminal.

  5. Navigate to the GREMLIN_SERVER_HOME directory.

  6. Start the Gremlin Server:

    1. For MacOS or Linux, type bin/gremlin-server.sh start.

    2. For Windows, type bin\gremlin-server.bat start.

  7. You will get a message saying that the server has started, along with the process ID. The process ID is different each time you start the server. For example:

    $ bin/gremlin-server.sh start
    Server started 56799.
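To confirm that the server is actually accepting connections on TCP port 8182, you can attempt a plain socket connection. The following is a minimal Python sketch, not part of TinkerPop: the function name port_is_open is our own, and the check only shows that something is listening on the port, not that it is a Gremlin Server.

```python
import socket

def port_is_open(host="localhost", port=8182, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print("Port 8182 on localhost is " + state)
```

If the port is reported as not reachable shortly after startup, the server may still be initializing, or a firewall rule may be blocking the port as described in the earlier note.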

A.2.4 Configuring the Gremlin Console to connect to the Gremlin Server

With both the Gremlin Server and the Gremlin Console running, it is time to connect the Gremlin Console to our Gremlin Server instance:

Note If you have any Gremlin Console instances running, close these with the console commands :q or :exit.

  1. Open a command-line terminal.

  2. From the GREMLIN_CONSOLE_HOME directory, navigate to the conf directory.

  3. In a text editor, open the remote.yaml file. This file contains three parameters that you might need to adjust. If you are running everything locally, then you will not need to change any of these parameters:

    • hosts: [localhost]--This parameter is the IP or domain name of the Gremlin Server where we want to connect.

    • port: 8182--This parameter is the port to connect to; it defaults to 8182.

    • serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}--This parameter is the data interchange format between the Gremlin Console and the Gremlin Server. Depending on the production database you choose, you may need to adjust it to the format specified in that database vendor's documentation.

  4. Save the file and close it.

  5. From the GREMLIN_CONSOLE_HOME directory, start the Gremlin Console with the following commands:

    1. For MacOS/Linux, type bin/gremlin.sh.

    2. For Windows, type bin\gremlin.bat.

  6. Once the Gremlin Console starts, execute the following command:

    :remote connect tinkerpop.server conf/remote.yaml

    This command uses the parameters that we just defined to connect to the Gremlin Server instance we have running.

  7. A message is returned confirming that you are connected:

             ,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    plugin activated: tinkerpop.tinkergraph
    gremlin> :remote connect tinkerpop.server conf/remote.yaml
    ==>Configured localhost/127.0.0.1:8182       
    gremlin>

    Gremlin Console command connects to a Gremlin Server

    Response confirms that the connection is configured

  8. Next, run the command to switch from local mode to server mode:

    :remote console

    The Gremlin Console informs you that it has switched modes:

    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - 
     [localhost/127.0.0.1:8182] - 
     type ':remote console' to return to local mode
    gremlin>

  9. Run the following command to display some basic information about the graph database hosted on the Gremlin Server:

    gremlin> g
    ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]

We have now successfully connected to our Gremlin Server via the Gremlin Console. To exit the session with the Gremlin Server and close the connection, execute the following command:

:remote close

A.2.5 Gremlin Console command modes: Local versus remote

When issuing commands to a remote graph database server, you can choose between two modes: local mode and remote mode. The preferred method of sending commands to the Gremlin Server is to put the Gremlin Console into remote mode. This is what we did in the previous section, and it is becoming the default mode when using Gremlin Console connected to a server. In remote mode, any command executed in the Gremlin Console is sent to the Gremlin Server and run there, and the results are then displayed by the Gremlin Console.

If you are only going to issue one or two commands, you can do this in local mode by prefacing each command with :>. This sends the command to the configured remote connection. Only the commands prefaced by these two characters (:>) are executed on the Gremlin Server. Any commands that are not prefaced with these characters run within the Gremlin Console's own process. To switch between the two modes, use the :remote console command like this:

gremlin> :remote console 
==>All scripts will now be sent to Gremlin Server - 
 [localhost/127.0.0.1:8182] - type ':remote console' 
 to return to local mode
gremlin> :remote console
==>All scripts will now be evaluated locally - 
 type ':remote console' to return to remote mode 
 for Gremlin Server - [localhost/127.0.0.1:8182]
gremlin>
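The routing rules just described can be summarized as a small decision procedure. The following Python sketch is a toy model only and is not how the Gremlin Console is implemented: the class name ConsoleRouter and its return values are hypothetical, and it assumes that the console starts in local mode and that other colon-prefixed commands are handled by the console itself.

```python
class ConsoleRouter:
    """Toy model of where a line typed into the console would be evaluated."""

    def __init__(self):
        # Assumption: a fresh console starts in local mode.
        self.remote_mode = False

    def route(self, line):
        """Return 'toggle', 'remote', or 'local' for one line of input."""
        stripped = line.strip()
        if stripped == ":remote console":
            # The same command switches in both directions.
            self.remote_mode = not self.remote_mode
            return "toggle"
        if stripped.startswith(":>"):
            # Explicit submit: sent to the server regardless of mode.
            return "remote"
        if stripped.startswith(":"):
            # Assumption: other console commands (such as :help) stay local.
            return "local"
        return "remote" if self.remote_mode else "local"

if __name__ == "__main__":
    router = ConsoleRouter()
    for text in ["g.V().count()", ":> g.V().count()",
                 ":remote console", "g.V().count()"]:
        print(text, "->", router.route(text))
```

The model makes one point concrete: in local mode only the :> lines reach the server, whereas after :remote console every script does.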

A.2.6 Using the Gremlin Console

Before we fire up the Gremlin Console, there are a few additional options we should discuss. If you want to see a list of the options available for the Gremlin Console, type the following:

$ bin/gremlin.sh --help 
Usage: gremlin.sh [-CDhlQvV] [-e=<SCRIPT ARG1 ARG2 ...>]... 
  [-i=<SCRIPT ARG1 ARG2 ...>...]...
  -C, --color     Disable use of ANSI colors
  -D, --debug     Enabled debug Console output
  -e, --execute=<SCRIPT ARG1 ARG2 ...>
                  Execute the specified script and close the console 
                  on completion
  -h, --help      Display this help message
  -i, --interactive=<SCRIPT ARG1 ARG2 ...>...
                  Execute the specified script and leave the console 
                  open on completion
  -l              Set the logging level of components that use 
                  standard logging output independent of the Console
  -Q, --quiet     Suppress superfluous Console output
  -v, --version   Display the version
  -V, --verbose   Enable verbose Console output

As depicted, there are a variety of options to use, but the most common one we use (-i) loads a script while starting the Gremlin Console. This is handy for configuring the Gremlin Console, loading data, and then leaving the Gremlin Console up and running, waiting for further input. All of the scripts provided in the book’s GitHub repository (https://github.com/bechbd/graph-databases-in-action) do the following:

  • Configure a remote connection to a Gremlin Server on localhost

  • Set the Gremlin Console in remote mode

  • Load the data, either with scripted operations or from a GraphSON import file

What follows is an example of running a simple data load script:

$ bin/gremlin.sh -i $BASE_DIR/path/to/data-load-script.groovy
 
         ,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> g                                                         
==>graphtraversalsource[tinkergraph[vertices:4 edges:5], standard]
gremlin>                                                           

Starts the Gremlin Console in interactive mode with a data load script

Uses the built-in g variable to quickly verify data is loaded in the graph

The Gremlin Console prompt waiting for input

The Gremlin Console is what is known as a REPL (read-eval-print loop) terminal. This means that the commands we type are immediately executed, and the results of that evaluation are printed to the screen. Because the Gremlin Console runs on Groovy, you can execute standard Groovy code, such as a simple addition, inside the Gremlin Console. For example:

gremlin> a = 1
==>1
gremlin> b = 2
==>2
gremlin> a + b
==>3

The ability to run Groovy code allows you to perform complex queries on your graphs and to save the results of those queries to variables that you can use later for additional calculations. This ability to write code is also extremely helpful when debugging your graph traversal code.

Inside the Gremlin Console, there are several available commands, all of which begin with a colon (:). To see a listing of the available commands, type :help and press Enter. The most common commands we use are :exit, :quit, :x, and :q, which are all functionally identical and exit the Gremlin Console:

gremlin> :help
 
For information about Groovy, visit:
    http://groovy-lang.org 
 
Available commands:
  :help       (:h  ) Display this help message
  ?           (:?  ) Alias to: :help
  :exit       (:x  ) Exit the shell
  :quit       (:q  ) Alias to: :exit
  import      (:i  ) Import a class into the namespace
  :display    (:d  ) Display the current buffer
  :clear      (:c  ) Clear the buffer and reset the prompt counter
  :show       (:S  ) Show variables, classes or imports
  :inspect    (:n  ) Inspect a variable or the last result with the 
                       GUI object browser
  :purge      (:p  ) Purge variables, classes, imports or preferences
  :edit       (:e  ) Edit the current buffer
  :load       (:l  ) Load a file or URL into the buffer
  .           (:.  ) Alias to: :load
  :save       (:s  ) Save the current buffer to a file
  :record     (:r  ) Record the current session to a file
  :history    (:H  ) Display, manage and recall edit-line history
  :alias      (:a  ) Create an alias
  :grab       (:g  ) Add a dependency to the shell environment
  :register   (:rc ) Register a new command with the shell
  :doc        (:D  ) Open a browser window displaying the doc for the
                       argument
  :set        (:=  ) Set (or list) preferences
  :uninstall  (:-  ) Uninstall a Maven library and its dependencies from 
                       the Gremlin Console
  :install    (:+  ) Install a Maven library and its dependencies into 
                       the Gremlin Console
  :plugin     (:pin) Manage plugins for the Console
  :remote     (:rem) Define a remote connection
  :submit     (:>  ) Send a Gremlin script to Gremlin Server
  :bytecode   (:bc ) Gremlin bytecode helper commands
 
For help on a specific command type:
    :help command 