JDBC

This plugin is used to import data from a database into Logstash. Each row in the result set becomes an event, and each column is converted into a field of the event. Using this plugin, you can import all the data at once by running a query, or you can schedule periodic imports using cron syntax (via the schedule parameter/setting). When using this plugin, you need to specify the path of the JDBC driver appropriate for the database; the driver library is specified using the jdbc_driver_library parameter.

The SQL query can be specified inline using the statement parameter, or it can be stored in a file whose path is specified using the statement_filepath parameter. You can use either statement or statement_filepath to specify the query; it is good practice to store bigger queries in a file. This plugin accepts only one SQL statement per input. If you need to execute multiple queries to ingest data from multiple tables/views, you need to define multiple JDBC inputs (that is, one JDBC input per query) in the input section of the Logstash configuration. 
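For instance, two tables could be ingested side by side with two jdbc blocks in the same input section. This is a sketch; the connection settings and table names are placeholders:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/company"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM employees"
  }
  jdbc {
    # same connection settings, a second query for another table
    jdbc_driver_library => "/path/to/mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/company"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM departments"
  }
}
```

Each jdbc block maintains its own state, so the two queries run and track progress independently.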

The result set size can be specified using the jdbc_fetch_size parameter. The plugin persists the sql_last_value parameter in a metadata file stored at the path configured by the last_run_metadata_path parameter. Upon query execution, this file is updated with the current value of sql_last_value. sql_last_value is used to incrementally import data from the database each time the query runs, based on the configured schedule. Parameters for the SQL statement can be specified using the parameters setting, which accepts a hash of query parameters.

Let's look at an example:

#jdbc.conf
input {
  jdbc {
    # path of the jdbc driver
    jdbc_driver_library => "/path/to/mysql-connector-java-5.1.36-bin.jar"

    # the name of the driver class
    jdbc_driver_class => "com.mysql.jdbc.Driver"

    # MySQL jdbc connection string to the company database
    jdbc_connection_string => "jdbc:mysql://localhost:3306/company"

    # user credentials to connect to the DB
    jdbc_user => "user"
    jdbc_password => "password"

    # when to periodically run the statement, in cron format
    # (here: at minute 30 of every hour)
    schedule => "30 * * * *"

    # query parameters
    parameters => { "department" => "IT" }

    # sql statement
    statement => "SELECT * FROM employees WHERE department = :department AND created_at >= :sql_last_value"
  }
}

output {
  elasticsearch {
    index => "company"
    document_type => "employee"
    hosts => "localhost:9200"
  }
}

The previous configuration connects to the company database in MySQL and pulls employee records belonging to the IT department. The SQL statement is run once an hour (at minute 30) to check for new employees created since the last run. The fetched rows are sent to Elasticsearch, which is configured as the output.

By default, sql_last_value is set to Thursday, January 1, 1970 before the first execution of the query, and it is updated with the timestamp of each run. You can force it to store a column value rather than the last execution time by setting the use_column_value parameter to true and specifying the column name with the tracking_column parameter.
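For example, incremental imports could track a monotonically increasing id column instead of the run timestamp. This is a sketch building on the earlier example; the id column name is illustrative:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/company"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "30 * * * *"

    # track the highest id seen so far instead of the last run time
    use_column_value => true
    tracking_column => "id"
    statement => "SELECT * FROM employees WHERE id > :sql_last_value"
  }
}
```

Here sql_last_value holds the largest id value fetched so far, so each scheduled run imports only rows inserted since the previous run.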