7.4. GridFTP

The Globus Toolkit 2.2 uses GridFTP, an efficient and robust protocol for data movement. Use this protocol whenever large files are involved, in preference to the http and https protocols that the GASS subsystem also supports.

The Globus Toolkit 2.2 provides a GridFTP server based on the wu-ftpd code and a C API that applications can use to access GridFTP functionality. This server does not implement every feature of the GridFTP protocol: it works only as a non-striped server, although it can interoperate with striped servers.

All Globus Toolkit 2.2 shell commands can transparently use the GridFTP protocol whenever the URL used for a file begins with gsiftp://.
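As an illustration of dispatching on the URL prefix, the following minimal sketch classifies a file URL by its scheme. It is not taken from the Globus Toolkit: the names transport_t, has_prefix(), and classify_url() are hypothetical, and the real commands may decide differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical transport categories, named for this sketch only. */
typedef enum {
    TRANSPORT_GRIDFTP,   /* gsiftp://            */
    TRANSPORT_GASS,      /* http:// or https://  */
    TRANSPORT_LOCAL,     /* file://              */
    TRANSPORT_UNKNOWN
} transport_t;

/* Return non-zero if s begins with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a URL by its scheme prefix. */
transport_t classify_url(const char *url)
{
    if (url == NULL)
        return TRANSPORT_UNKNOWN;
    if (has_prefix(url, "gsiftp://"))
        return TRANSPORT_GRIDFTP;
    if (has_prefix(url, "https://") || has_prefix(url, "http://"))
        return TRANSPORT_GASS;
    if (has_prefix(url, "file://"))
        return TRANSPORT_LOCAL;
    return TRANSPORT_UNKNOWN;
}
```

With this sketch, classify_url("gsiftp://g1/bin/hostname") selects the GridFTP category, while a plain path such as /tmp/FILE is reported as unknown.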

7.4.1. GridFTP examples

The following example copies the file jndi-1_2_1.zip from m0.itso-maya.com to the host g2.itso-guarani.com. Note that this command can be issued from a third machine, such as t2.itso-tupi.com.

globus-url-copy gsiftp://m0/~/jndi-1_2_1.zip gsiftp://g2/~/jndi-1_2_1.zip

The following example executes on g2.itso-guarani.com a binary that is retrieved from g1.itso-guarani.com. This command could be issued from t3.itso-tupi.com.

globus-job-run g2 gsiftp://g1/bin/hostname

A grid-enabled application needs to use the GridFTP API to be able to transparently use the Globus Toolkit 2.2 data grid features. This API is detailed in 7.4.2, Globus GridFTP APIs.

7.4.2. Globus GridFTP APIs

This section discusses the APIs that can be used with GridFTP.

Skeletons for C/C++ applications

globus_module_activate(GLOBUS_FTP_CLIENT_MODULE) must be called at the beginning of the program to activate the globus_ftp_client module.

Within the globus_ftp_client API, all FTP operations require a handle parameter, and only one FTP operation may be in progress at a time per handle. The handle has the type globus_ftp_client_handle_t and must be initialized using globus_ftp_client_handle_init().

The properties of the FTP connection can be configured using another handle of type globus_ftp_client_handleattr_t that also must be initialized by using globus_ftp_client_handleattr_init().

By using these two handles, a client can easily execute all of the usual FTP commands:

  • globus_ftp_client_put(), globus_ftp_client_get(), globus_ftp_client_mkdir(), globus_ftp_client_rmdir(), globus_ftp_client_list(), globus_ftp_client_delete(), globus_ftp_client_verbose_list(), globus_ftp_client_move().

  • globus_ftp_client_exists() tests the existence of a file or a directory.

  • globus_ftp_client_modification_time() returns the modification time of a file.

  • globus_ftp_client_size() returns the size of the file.

The globus_ftp_client_*get() functions only start a get file transfer from an FTP server. If the function returns GLOBUS_SUCCESS, the user may immediately begin calling globus_ftp_client_register_read() to retrieve the data associated with this URL.

Similarly, the globus_ftp_client_*put() functions only start a put file transfer to an FTP server. If the function returns GLOBUS_SUCCESS, the user may immediately begin calling globus_ftp_client_register_write() to write the data associated with this URL.

Example 7-22. First example extracted from the Globus tutorial
/************************************************************************
* Globus Developers Tutorial: GridFTP Example - Simple Authenticated Put
*
* There are no handle or operation attributes used in this example.
* This means the transfer runs using all defaults, which implies standard
* FTP stream mode. Note that while this program shows proper usage of
* the Globus GridFTP client library functions, it is not an example of
* proper coding style. Much error checking has been left out and other
* simplifications made to keep the program simple.
***************************************************************************/

#include <stdio.h>
#include <errno.h>
#include "globus_ftp_client.h"


static globus_mutex_t lock;
static globus_cond_t cond;
static globus_bool_t done;

#define MAX_BUFFER_SIZE 2048
#define ERROR -1
#define SUCCESS 0

/****************************************************************
* done_cb: A pointer to this function is passed to the call to
* globus_ftp_client_put (and all the other high level transfer
* operations).   It is called when the transfer is completely
* finished, i.e. both the data channel and control channel exchange.
* Here it simply sets a global variable (done) to true so the main
* program will exit the while loop.
********************************************************************/
static
void
done_cb(
        void *                                   user_arg,
        globus_ftp_client_handle_t *             handle,
        globus_object_t *                        err)
{
    char * tmpstr;

     if(err)
     {
         fprintf(stderr, "%s", globus_object_printable_to_string(err));
     }
    globus_mutex_lock(&lock);
    done = GLOBUS_TRUE;
    globus_cond_signal(&cond);
    globus_mutex_unlock(&lock);
     return;
}

/*************************************************************************
 * data_cb: A pointer to this function is passed to the call to
 * globus_ftp_client_register_write.  It is called when the user supplied
 * buffer has been successfully transferred to the kernel.  Note that this
 * does not mean it has been successfully transmitted.  In this simple
 * version, it just reads the next block of data and calls register_write
 * again.
 *************************************************************************/
static
void
data_cb(
    void *                                       user_arg,
    globus_ftp_client_handle_t *                 handle,
    globus_object_t *                            err,
    globus_byte_t *                              buffer,
    globus_size_t                                length,
    globus_off_t                                 offset,
    globus_bool_t                                eof)
{
     if(err)
     {
         fprintf(stderr, "%s", globus_object_printable_to_string(err));
     }
    else
     {
         if(!eof)
         {
            FILE *fd = (FILE *) user_arg;
            int rc;
            rc = fread(buffer, 1, MAX_BUFFER_SIZE, fd);
            if (ferror(fd) != SUCCESS)
            {
                printf("Read error in function data_cb; errno = %d\n", errno);
                 return;
            }
            globus_ftp_client_register_write(
                 handle,
                 buffer,
                 rc,
                 offset + length,
                 feof(fd) != SUCCESS,
                 data_cb,
                 (void *) fd);
         } /* if(!eof) */
     } /* else */
    return;
} /* data_cb */

/**************************
 * Main Program
 *************************/

int main(int argc, char **argv)
{
    globus_ftp_client_handle_t              handle;
    globus_byte_t                           buffer[MAX_BUFFER_SIZE];
    globus_size_t                           buffer_length = MAX_BUFFER_SIZE;
    globus_result_t                         result;
    char *                                  src;
    char *                                  dst;
    FILE *                                  fd;


    /*************************************
      * Process the command line arguments
     *************************************/

    if (argc != 3)
     {
        printf("Usage: put local_file DST_URL\n");
        return(ERROR);
     }
    else
     {
         src = argv[1];
         dst = argv[2];
        }

        /*********************************
         * Open the local source file
         *********************************/
        fd = fopen(src,"r");
        if(fd == NULL)
        {
           printf("Error opening local file: %s\n",src);
           return(ERROR);

        }

     /*********************************************************************
     * Initialize the module, and client handle
     * This has to be done EVERY time you use the client library
     * The mutex and cond are theoretically optional, but highly recommended
     * because they will make the code work correctly in a threaded build.
     *
     * NOTE: It is possible for each of the initialization calls below to
     * fail and we should be checking for errors.  To keep the code simple
     * and clean we are not.  See the error checking after the call to
     * globus_ftp_client_put for an example of how to handle errors in
     * the client library.
     *********************************************************************/

        globus_module_activate(GLOBUS_FTP_CLIENT_MODULE);
        globus_mutex_init(&lock, GLOBUS_NULL);
        globus_cond_init(&cond, GLOBUS_NULL);
        globus_ftp_client_handle_init(&handle,  GLOBUS_NULL);

     /********************************************************************
     * globus_ftp_client_put starts the protocol exchange on the control
     * channel.  Note that this does NOT start moving data over the data
     * channel
     *******************************************************************/
        done = GLOBUS_FALSE;

        result = globus_ftp_client_put(&handle,
                                   dst,
                                   GLOBUS_NULL,
                                   GLOBUS_NULL,
                                   done_cb,
                                   0);
        if(result != GLOBUS_SUCCESS)
        {
        globus_object_t * err;
        err = globus_error_get(result);
        fprintf(stderr, "%s", globus_object_printable_to_string(err));
        done = GLOBUS_TRUE;
        }
        else
        {
         int rc;

        /**************************************************************
         * This is where the data movement over the data channel is initiated.
         * You read a buffer, and call register_write.  This is an asynch
         * call which returns immediately.  When it is finished writing
         * the buffer, it calls the data callback (defined above) which
         * reads another buffer and calls register_write again.
         * The data callback will also indicate when you have hit eof
         * Note that eof on the data channel does not mean the control
         * channel protocol exchange is complete.  This is indicated by
         * the done callback being called.
         *************************************************************/
        rc = fread(buffer, 1, MAX_BUFFER_SIZE, fd);
        globus_ftp_client_register_write(
            &handle,
            buffer,
            rc,
            0,
            feof(fd) != SUCCESS,
            data_cb,
            (void *) fd);
        }

     /*********************************************************************
     * The following is a standard thread construct.  The while loop is
     * required because pthreads may wake up arbitrarily.  In non-threaded
     * code, cond_wait becomes globus_poll and it sits in a loop using
     * CPU to wait for the callback.  In a threaded build, cond_wait would
     * put the thread to sleep
     *********************************************************************/
        globus_mutex_lock(&lock);
        while(!done)
        {
            globus_cond_wait(&cond, &lock);
        }
        globus_mutex_unlock(&lock);
     /**********************************************************************
     * Since done has been set to true, the done callback has been called.
     * The transfer is now completely finished (both control channel and
     * data channel).  Now, Clean up and go home
     **********************************************************************/
        globus_ftp_client_handle_destroy(&handle);
        globus_module_deactivate_all();

        return 0;

}

To compile the program:

gcc -I /usr/local/globus/include/gcc32 -L/usr/local/globus/lib \
    -o gridftpclient1 gridftpclient1.c -lglobus_ftp_client_gcc32

To use it:

[globus@m0 globus]$ grid-proxy-init
Your identity: /O=Grid/O=Globus/OU=itso-maya.com/CN=globus
Enter GRID pass phrase for this identity:
Creating proxy ................................................ Done
Your proxy is valid until: Thu Mar 6 02:17:53 2003

[globus@m0 globus]$ ./gridftpclient1 LocalFile gsiftp://g2/tmp/RemoteFile

Partial transfer

All operations are asynchronous and require a callback function that is called when the operation has completed. Mutexes and condition variables must be used to ensure thread safety.

GridFTP supports partial transfer. To do this, you need to use offsets that will determine the beginning and the end of data that you want to transfer. The type of the offset is globus_off_t.

The functions globus_ftp_client_partial_put() and globus_ftp_client_partial_get() execute partial transfers.
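To make the offset arithmetic concrete, the following sketch splits a file of known size into nearly equal byte ranges, each of which could serve as the begin and end offsets of one partial transfer. It does not call the Globus API. The type sketch_off_t is a stand-in for globus_off_t, byte_range_t and partial_range() are hypothetical names, and the half-open convention (end is one past the last byte) is an assumption of this sketch that should be checked against the globus_ftp_client documentation.

```c
#include <assert.h>

/* Stand-in for globus_off_t so the sketch compiles without Globus headers. */
typedef long long sketch_off_t;

typedef struct {
    sketch_off_t start;  /* first byte of the range (inclusive)       */
    sketch_off_t end;    /* one past the last byte (exclusive, assumed) */
} byte_range_t;

/* Return range number `index` out of `parts` ranges covering file_size bytes.
 * The first (file_size % parts) ranges are one byte longer than the rest. */
byte_range_t partial_range(sketch_off_t file_size, int parts, int index)
{
    byte_range_t r;
    sketch_off_t base;
    sketch_off_t extra;

    assert(file_size >= 0 && parts > 0 && index >= 0 && index < parts);

    base  = file_size / parts;
    extra = file_size % parts;

    r.start = index * base + (index < extra ? index : extra);
    r.end   = r.start + base + (index < extra ? 1 : 0);
    return r;
}
```

The ranges are contiguous and cover the whole file, so transferring each of them once moves every byte exactly once.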

The Globus FTP Client library provides the ability to start a file transfer from a known location in the file. This is accomplished by passing a restart marker to globus_ftp_client_get() or globus_ftp_client_put(). The restart marker has the type globus_ftp_client_restart_marker_t and must be initialized by calling globus_ftp_client_restart_marker_init().

For a complete description of the globus_ftp_client API, see:

http://www-unix.globus.org/api/c/globus_ftp_client/html/index.html

Parallelism

GridFTP supports two kinds of transfer:

  • Stream mode is a file transfer mode where all data is sent over a single TCP socket, without any data framing. In stream mode, data will arrive in sequential order. This mode is supported by nearly all FTP servers.

  • Extended block mode is a file transfer mode where data can be sent over multiple parallel connections and to multiple data storage nodes to provide a high-performance data transfer. In extended block mode, data may arrive out of order. ASCII type files are not supported in extended block mode.
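Because blocks may arrive out of order in extended block mode, a receiver has to place each block according to its offset rather than its arrival order. The following sketch shows that idea on an in-memory buffer. It does not use the Globus API, and place_block() is a hypothetical helper named for this illustration only.

```c
#include <assert.h>
#include <string.h>

/* Copy one received block to its position in the destination buffer.
 * Returns 0 on success, or -1 if the block would not fit in the buffer. */
int place_block(unsigned char *dest, size_t dest_size,
                const unsigned char *block, size_t length, size_t offset)
{
    assert(dest != NULL && block != NULL);

    /* Reject blocks that start or end beyond the destination buffer. */
    if (offset > dest_size || length > dest_size - offset)
        return -1;

    memcpy(dest + offset, block, length);
    return 0;
}
```

Placing the block for offset 5 before the block for offset 0 yields the same final buffer as placing them in order, which is why the data callback receives an offset along with each buffer.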

Use globus_ftp_client_operationattr_set_mode() to select the mode. This function takes an operation attribute set of type globus_ftp_client_operationattr_t, which must first be initialized with globus_ftp_client_operationattr_init().

Currently, only a “fixed” parallelism level is supported. The FTP server interprets this level as the number of parallel data connections to be allowed for each stripe of data. Use globus_ftp_client_operationattr_set_parallelism() to set the parallelism level.

You also need to define a layout that defines what regions of a file will be stored on each stripe of a multiple-striped FTP server. You can do this by using the function globus_ftp_client_operationattr_set_layout().
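As a rough illustration of what a layout expresses, the following sketch maps a byte offset to a stripe index under a blocked round-robin scheme, where consecutive blocks of a fixed size are assigned to stripes in rotation. This mapping is an assumption made for illustration and is not taken from the Globus source. The function stripe_for_offset() is hypothetical, and a real striped server may assign regions differently.

```c
#include <assert.h>

/* Return the stripe index that would hold the byte at `offset`,
 * assuming blocks of block_size bytes are dealt to stripes in rotation. */
int stripe_for_offset(long long offset, long long block_size, int stripe_count)
{
    assert(offset >= 0 && block_size > 0 && stripe_count > 0);
    return (int)((offset / block_size) % stripe_count);
}
```

With a block size of 64*1024 bytes and four stripes, the first 64 KB falls on stripe 0, the next 64 KB on stripe 1, and the fifth block wraps around to stripe 0 again.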

Example 7-23. Parallel transfer example extracted from Globus tutorial
 /***************************************************************************
 * Globus Developers Tutorial: GridFTP Example - Authenticated Put w/ attrs
 *
 * Operation attributes are used in this example to set a parallelism of 4.
 * This means the transfer must run in extended block mode MODE E.
 * Note that while this program shows proper usage of
 * the Globus GridFTP client library functions, it is not an example of
 * proper coding style.  Much error checking has been left out and other
 * simplifications made to keep the program simple.
 ***************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include "globus_ftp_client.h"


static globus_mutex_t lock;
static globus_cond_t cond;
static globus_bool_t done;
int           global_offset = 0;


#define MAX_BUFFER_SIZE (64*1024)
#define ERROR  -1
#define SUCCESS 0
#define PARALLELISM 4


 /********************************************************************
 * done_cb:  A pointer to this function is passed to the call to
 * globus_ftp_client_put (and all the other high level transfer
 * operations).   It is called when the transfer is completely
 * finished, i.e. both the data channel and control channel exchange.
 * Here it simply sets a global variable (done) to true so the main
 * program will exit the while loop.
 ********************************************************************/
static
void
done_cb(
        void *                                   user_arg,
        globus_ftp_client_handle_t *             handle,
        globus_object_t *                        err)
{
    char * tmpstr;

    if(err)
    {
        fprintf(stderr, "%s", globus_object_printable_to_string(err));
    }
    globus_mutex_lock(&lock);
    done = GLOBUS_TRUE;
    globus_cond_signal(&cond);
    globus_mutex_unlock(&lock);
    return;
}

 /*************************************************************************
 * data_cb: A pointer to this function is passed to the call to
 * globus_ftp_client_register_write.  It is called when the user supplied
 * buffer has been successfully transferred to the kernel.  Note that this
 * does not mean it has been successfully transmitted.  In this simple
 * version, it just reads the next block of data and calls register_write
 * again.
 *************************************************************************/
static
void
data_cb(
    void *                                       user_arg,
    globus_ftp_client_handle_t *                 handle,
    globus_object_t *                            err,
    globus_byte_t *                              buffer,
    globus_size_t                                length,
    globus_off_t                                 offset,
    globus_bool_t                                eof)
{
    if(err)
    {
        fprintf(stderr, "%s", globus_object_printable_to_string(err));
    }
    else
    {
        if(!eof)
        {
            FILE *fd = (FILE *) user_arg;
            int rc;
            rc = fread(buffer, 1, MAX_BUFFER_SIZE, fd);
            if (ferror(fd) != SUCCESS)
            {
                printf("Read error in function data_cb; errno = %d\n", errno);
                return;
            }
            globus_ftp_client_register_write(
               handle,
               buffer,
               rc,
               global_offset,
               feof(fd) != SUCCESS,
               data_cb,
               (void *) fd);
            global_offset += rc;
        } /* if(!eof) */
        else
        {
           globus_libc_free(buffer);
        }

    } /* else */
    return;
} /* data_cb */

/**************************
 * Main Program
 *************************/

int main(int argc, char **argv)
{
    globus_ftp_client_handle_t              handle;
    globus_ftp_client_operationattr_t       attr;
    globus_ftp_client_handleattr_t          handle_attr;
    globus_byte_t *                         buffer;
    globus_result_t                         result;
    char *                                  src;
    char *                                  dst;
    FILE *                                  fd;
    globus_ftp_control_parallelism_t        parallelism;
    globus_ftp_control_layout_t             layout;
    int                                     i;

   /*************************************
    * Process the command line arguments
    *************************************/

   if (argc != 3)
   {
       printf("Usage: ext-put local_file DST_URL\n");
       return(ERROR);
   }
   else
   {
       src = argv[1];
       dst = argv[2];
   }

   /*********************************
    * Open the local source file
    *********************************/
   fd = fopen(src,"r");
   if(fd == NULL)
   {
       printf("Error opening local file: %s\n",src);
       return(ERROR);
   }

   /*********************************************************************
    * Initialize the module, handleattr, operationattr, and client handle
    * This has to be done EVERY time you use the client library
    * (if you don't use attrs, you don't need to initialize them and can
    * pass NULL in the parameter list)
    * The mutex and cond are theoretically optional, but highly recommended
    * because they will make the code work correctly in a threaded build.
    *
    * NOTE: It is possible for each of the initialization calls below to
    * fail and we should be checking for errors.  To keep the code simple
    * and clean we are not.  See the error checking after the call to
    * globus_ftp_client_put for an example of how to handle errors in
    * the client library.
    *********************************************************************/

   globus_module_activate(GLOBUS_FTP_CLIENT_MODULE);
   globus_mutex_init(&lock, GLOBUS_NULL);
   globus_cond_init(&cond, GLOBUS_NULL);
   globus_ftp_client_handleattr_init(&handle_attr);
   globus_ftp_client_operationattr_init(&attr);

   /************************************************************************
    * Set any desired attributes, in this case we are using parallel streams
    ************************************************************************/

   parallelism.mode = GLOBUS_FTP_CONTROL_PARALLELISM_FIXED;
   parallelism.fixed.size = PARALLELISM;
   layout.mode = GLOBUS_FTP_CONTROL_STRIPING_BLOCKED_ROUND_ROBIN;
   layout.round_robin.block_size = 64*1024;
   globus_ftp_client_operationattr_set_mode(
      &attr,
      GLOBUS_FTP_CONTROL_MODE_EXTENDED_BLOCK);
   globus_ftp_client_operationattr_set_parallelism(&attr,
                                                   &parallelism);

   globus_ftp_client_operationattr_set_layout(&attr,
                                              &layout);

   globus_ftp_client_handle_init(&handle,  &handle_attr);

   /********************************************************************
    * globus_ftp_client_put starts the protocol exchange on the control
    * channel.  Note that this does NOT start moving data over the data
    * channel
    *******************************************************************/
   done = GLOBUS_FALSE;

   result = globus_ftp_client_put(&handle,
                                  dst,
                                  &attr,
                                  GLOBUS_NULL,
                                  done_cb,
                                  0);
   if(result != GLOBUS_SUCCESS)
   {
       globus_object_t * err;
       err = globus_error_get(result);
       fprintf(stderr, "%s", globus_object_printable_to_string(err));
       done = GLOBUS_TRUE;
   }
   else
   {
       int rc;

       /**************************************************************
        * This is where the data movement over the data channel is initiated.
        * You read a buffer, and call register_write.  This is an asynch
        * call which returns immediately.  When it is finished writing
        * the buffer, it calls the data callback (defined above) which
        * reads another buffer and calls register_write again.
        * The data callback will also indicate when you have hit eof
        * Note that eof on the data channel does not mean the control
        * channel protocol exchange is complete.  This is indicated by
        * the done callback being called.
        *
        * NOTE: The for loop is present BECAUSE of the parallelism, but
        * it is not CAUSING the parallelism.  The parallelism is hidden
        * inside the client library.  This for loop simply ensures that
        * we have sufficient buffers queued up so that we don't have
        * TCP streams sitting idle.
        *************************************************************/
       for (i = 0; i< 2 * PARALLELISM && feof(fd) == SUCCESS; i++)
       {
           buffer = malloc(MAX_BUFFER_SIZE);
           rc = fread(buffer, 1, MAX_BUFFER_SIZE, fd);
           globus_ftp_client_register_write(
               &handle,
               buffer,
               rc,
               global_offset,
               feof(fd) != SUCCESS,
               data_cb,
               (void *) fd);
           global_offset += rc;

       }

   }

   /*********************************************************************
    * The following is a standard thread construct.  The while loop is
    * required because pthreads may wake up arbitrarily.  In non-threaded
    * code, cond_wait becomes globus_poll and it sits in a loop using
    * CPU to wait for the callback.  In a threaded build, cond_wait would
    * put the thread to sleep
    *********************************************************************/
   globus_mutex_lock(&lock);
   while(!done)
   {
       globus_cond_wait(&cond, &lock);
   }
   globus_mutex_unlock(&lock);

   /**********************************************************************
    * Since done has been set to true, the done callback has been called.
    * The transfer is now completely finished (both control channel and
    * data channel).  Now, Clean up and go home
    **********************************************************************/
   globus_ftp_client_handle_destroy(&handle);
   globus_module_deactivate_all();

   return 0;
}

To compile the program:

gcc -I /usr/local/globus/include/gcc32 -L/usr/local/globus/lib \
    -o gridftpclient2 gridftpclient2.c -lglobus_ftp_client_gcc32

To use it:

[globus@m0 globus]$ grid-proxy-init
Your identity: /O=Grid/O=Globus/OU=itso-maya.com/CN=globus
Enter GRID pass phrase for this identity:
Creating proxy ..................................................Done
Your proxy is valid until: Thu Mar 6 02:17:53 2003
[globus@m0 globus]$ ./gridftpclient2 LocalFile gsiftp://g2/tmp/RemoteFile

Shell tools

globus-url-copy is the shell tool for transferring files from one location to another. It takes two parameters: the source URL and the destination URL. The prefix gsiftp://<hostname>/ specifies a GridFTP server.

The following example copies a file from the host m0 to the server a1:

globus-url-copy gsiftp://m0/tmp/FILE gsiftp://a1/~/tmp

The following example uses a GASS server started on host b0 and listening on port 23213:

globus-url-copy https://b0:23213/home/globus/OtherFile gsiftp://a1/~/tmp

The following example uses a local file as a source file:

globus-url-copy file:///tmp/FILE gsiftp://a1/~/tmp
