Chapter 6: Parallel Processing in DS2

6.1 Introduction

6.2 Understanding Threaded Processing

6.2.1 The Need for Speed

6.2.2 Loading Data to and from RAM

6.2.3 Manipulating Data in RAM

6.3 DS2 Thread Programs

6.3.1 Writing DS2 Thread Programs

6.3.2 Parallel Processing Data with DS2 Threads

6.4 DS2 and the SAS In-Database Code Accelerator

6.4.1 DS2 Program In-Database Processing

6.5 DS2 and SAS® Viya® and SAS Cloud Analytic Services (CAS)

6.5.1 A Brief Introduction to SAS Viya and CAS

6.5.2 Running DS2 Programs in CAS

6.6 Review of Key Concepts

6.1 Introduction

In Chapter 1, we noted that the traditional SAS DATA step always processes data one row at a time using a single compute thread. We said that DS2 is capable of significantly speeding up compute-intensive data manipulation by processing multiple rows of data in parallel. But up to this point, all of our DS2 data programs have also been using only one compute thread. In this chapter, we’ll write DS2 thread programs and execute them from DS2 data programs to process multiple rows of data in parallel and, as we practice with this technique, we’ll demonstrate the benefits of parallel processing for compute-intensive applications. Specifically, we will cover these points:

   threading in data processing

   writing DS2 thread programs

   thread parameters and constructor methods

   global and local variables in threads

   storing and reusing threads

   parallel processing with DS2 threads

   executing threads on the Base SAS compute platform

   executing threads in-database with the SAS In-Database Code Accelerator

6.2 Understanding Threaded Processing

6.2.1 The Need for Speed

We humans are such an impatient species! Since the inception of the computer, extraordinary effort has been expended to increase processing speed. There are so many factors affecting processing speed that we won’t be able to address the problem in realistic detail in this book. However, to simplify the discussion, we’ll break down the problem into two major categories: moving data from off-line storage into system memory (RAM) and back to off-line storage, and processing data that resides in memory.

6.2.2 Loading Data to and from RAM

Memory and storage have traditionally been limited assets, requiring careful conservation on the part of the programmer. Off-line storage has taken many forms over the years, from punch cards to paper tape, magnetic tape, floppy disks, hard drives, thumb drives, and today’s solid-state drives (SSDs). Moving data to and from off-line storage was traditionally an electromechanical process and has always been orders of magnitude slower than processing data in RAM. Even today’s solid-state technologies have significantly higher latency than RAM. Latency in data retrieval can be exacerbated by the need to draw the data to the compute platform over a busy network. When it takes longer to retrieve the data than it takes to perform the computations, the process is referred to as input/output bound, or more typically as I/O bound. If you have a long-running DATA step and the SAS log indicates significantly more elapsed time than CPU time, the process is likely to be I/O bound.

With off-line storage, the goal is to move the data into and out of RAM as quickly as possible. Many strategies have evolved to address this issue. Some of the more significant improvements have come from redundant arrays of independent disks (RAID) technology and the rise of SSDs. Caching strategies along with physically parallel data storage and retrieval schemes like RAID have greatly increased data throughput rates. Instead of the spinning platters and moving read heads of the traditional hard drive, SSDs store data in memory chips that retain their contents without constant power. Because the process is completely electronic, the data transfer rates of SSDs are much faster than those of hard drives, though SSDs usually cost a bit more. SSDs are slower than the dynamic memory technologies used in RAM, but the gap is narrowing.

All of these technologies and strategies present data to a CPU in the same order as it would have appeared if read serially from a single disk. Because of this, any modifications required to accomplish threaded Read/Write operations primarily affect the operating system and are transparent to the applications that are consuming the data.

6.2.3 Manipulating Data in RAM

As data was moved into RAM at ever faster rates and as the computations performed on each row of data became more complex, it became more common for processes to receive additional data faster than the CPU could accomplish the calculations on the current row. This type of process is described as CPU bound. You can identify long-running CPU-bound DATA steps in SAS by looking at the log. If the CPU time is consistently close to the elapsed time, the process is probably CPU bound.
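As a rough sketch of how you might gather the timings discussed above, the FULLSTIMER system option asks SAS to write a more detailed resource summary to the log, including real time, user CPU time, and system CPU time. The example assumes that the sas_data library used later in this chapter is already assigned; the output table name work.campaign_copy is hypothetical and used only for illustration.

```sas
/* Request detailed timing statistics in the SAS log */
options fullstimer;

/* Any long-running step can go here; this one simply copies a table. */
/* work.campaign_copy is a hypothetical name used for illustration.   */
data work.campaign_copy;
   set sas_data.campaign;
run;

/* Restore the shorter default timing summary */
options nofullstimer;
```

If real time is far larger than the CPU times reported for the step, the step is probably I/O bound; if they are close, it is probably CPU bound.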

Like the field of off-line storage, our CPUs and supporting systems were evolving to meet the new demand, becoming ever faster and more efficient. From the 8-bit processors with ~3.5K transistors operating at 1-2 MHz clock speeds in the first Apple computers to today’s 64-bit processors with more than a billion transistors operating at clock speeds greater than 4 GHz, we pushed the CPU speed envelope to the limits. Eventually, making a single CPU with more, smaller transistors and a faster clock no longer produced dramatic processing speed boosts, and we needed to rethink our approaches to boosting speed. And so, the age of parallel application processing was born. Instead of a single CPU, today’s processor chips boast multiple CPU cores, each capable of processing separate tasks in parallel. Each task is normally referred to as a thread. Frequently, a core is capable of processing two threads at once, so a CPU chip with four cores can process eight threads simultaneously. Today, even our smart phones boast multicore CPUs and the associated parallel processing capabilities.

Having more than one CPU available opens up whole new ways of approaching processing to improve the speed of operations. Those willing to redesign processes from the ground up can identify program tasks that could be accomplished independently of each other and rewrite the program to do these tasks in parallel, greatly reducing the clock time required to complete the entire program. As you might imagine, this greatly increases the complexity of writing software, but can provide dramatic speed boosts. This is the world in which DS2 was created to operate.

6.3 DS2 Thread Programs

6.3.1 Writing DS2 Thread Programs

DS2 threads are stored programs that can be executed in parallel. They are in some ways similar to data programs in that they must contain at least one explicit system method (INIT, RUN, or TERM), can declare local and global variables that affect the PDV, and can instantiate and use packages.

Threads are also similar to packages: they are stored in SAS libraries as encrypted source code and can include user-defined methods. Threads can accept parameters but, unlike packages, they don’t have user-defined constructor methods and thus do not accept parameter input upon instantiation. Instead, parameter values are set using the thread’s SETPARMS method, which must be called before the thread is executed with a SET FROM statement. As with a package, parameter variables are private to the thread program but are globally available within the thread.

Let’s write our first thread program and take a look at the results.

proc ds2;

thread work.myThread(double flag)/overwrite=y;

   dcl int ThreadNo;

   dcl int Count;

   method run();

      set sas_data.banks;

      count+1;

   end;

   method term();

      ThreadNo=_threadid_;

      put ThreadNo= Count=;

   end;

endthread;

run;

quit;

A quick look at the thread data set reveals that it looks much like a package, but the column that contains the encrypted source is named SAS_TEXTTHREAD_ instead of SAS_TEXTPACKAGE_, as shown in Figure 6.1:

Figure 6.1: Viewing the Contents of a Stored DS2 Thread Program


To execute the thread, first we must declare an instance in a data program. Because the thread requires a parameter, we’ll call the thread’s SETPARMS method to pass in the parameter value before we use a SET FROM statement to execute the thread. The THREADS=3 option will cause three copies of the thread program to execute in parallel.

proc ds2;

data _null_;

   dcl thread work.myThread t;

   method init();

      t.setparms(1);

   end;

   method run();

      set from t threads=3;

   end;

   method term();

      put 'DATA Program: ';

      put _all_;

      put;

   end;

enddata;

run;

quit;

SAS Log:

ThreadNo=3 Count=0

ThreadNo=1 Count=3

ThreadNo=2 Count=0

DATA Program:

Count=3 ThreadNo= Bank=National Savings and Trust _N_=4 Rate=0.0328

The log reveals that the thread program’s parameter variable flag is available for processing in the thread program, but not in the data program. However, the thread’s globally declared ThreadNo and Count variables do appear in the data program’s PDV, and this can be a bit confusing. Because Count was incremented with a SUM statement, its value is retained in the PDV. However, the ThreadNo variable is not retained. So, by the time the data program TERM method is executed, Count is still 3, but ThreadNo has been reinitialized to missing, even though _N_ is 4! In this case, it might have been a better idea just to drop ThreadNo and Count in the thread program, as we really have no use for them in the output result set.

It is also clear from the log that although three threads were spawned, threads 2 and 3 didn’t process any data. Thread 1 performed all the work. Because sas_data.banks is very small, there was really only one block of data and therefore all the data was sent to a single thread. The remaining threads executed and terminated without processing any data.
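Returning to the suggestion of dropping ThreadNo and Count, a minimal sketch of that change might look like the following. It is the same thread program with one added DROP statement; the variables remain available inside the thread for the PUT statement in the TERM method, but they should no longer be passed to the data program’s PDV.

```sas
proc ds2;
thread work.myThread(double flag)/overwrite=yes;
   dcl int ThreadNo;
   dcl int Count;
   /* Keep the bookkeeping variables out of the thread's output */
   drop ThreadNo Count;
   method run();
      set sas_data.banks;
      count+1;
   end;
   method term();
      /* Both variables are still usable inside the thread */
      ThreadNo=_threadid_;
      put ThreadNo= Count=;
   end;
endthread;
run;
quit;
```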

When executing DS2 threads on the SAS compute platform with realistically sized data, you can visualize the process as shown in Figure 6.2:

Figure 6.2: DS2 Threaded Processing on the SAS Compute Platform


The SET FROM statement spawns a single read thread that distributes the source data to the prescribed number of compute threads. The single read thread ensures that a row of data is never passed to more than one thread. As each thread completes its computations on a row of data, it returns the result to the data program’s PDV. The data program processes the data row and writes it to the target data set in a single-threaded process.
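One way to see this flow for yourself is to count, in the data program, the rows that come back from the threads. The sketch below assumes that the work.myThread thread program from the previous section has already been stored; the variable name RowsReturned is hypothetical. Because the data program’s RUN method executes once for each row returned by any thread, the counter should match the total number of rows read, no matter how many threads were requested.

```sas
proc ds2;
data _null_;
   dcl thread work.myThread t;
   /* Hypothetical counter owned by the data program, not the thread */
   dcl bigint RowsReturned;
   method init();
      t.setparms(1);
   end;
   method run();
      /* Executes once per row handed back by a compute thread */
      set from t threads=3;
      RowsReturned+1;
   end;
   method term();
      put 'Rows returned to the data program: ' RowsReturned;
   end;
enddata;
run;
quit;
```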

6.3.2 Parallel Processing Data with DS2 Threads

When you are considering whether threading with DS2 on the SAS compute platform will improve performance, ask these two primary questions:

   Is the process CPU bound? If the process is I/O bound, remember that there will be only one read thread for our process, so adding compute threads will probably degrade performance rather than improve it.

   Is the data set that we are processing large enough to produce several blocks that can be distributed to more than one thread? Even if the process is CPU bound, if the data is so small that it all gets sent to a single compute thread, we shouldn’t expect improved performance.

We have access to a DS2 package containing a scoring method and a DS2 data program that is currently used for scoring. We’ve been asked to convert this to a threaded process. The data set that we are scoring is sas_data.campaign.

Let’s run the data program and check the log to see whether it might be CPU bound:

proc ds2;

data scored/overwrite=yes;

   dcl package sas_data.Scoring s;

   dcl double FinalScore;

   dcl bigint Count;

   drop Count;

   keep id FinalScore;

   method run();

      set sas_data.campaign;

      /* Instantiate the SCORING package with the input variables */

      s=_new_ sas_data.Scoring

              (IM_DemMedHomeValue,IM_GiftAvgAll,IM_PromCntAll);

      /* Call the SCORE method to score the data */

      s.Score(FinalScore);

      Count+1;

   end;

   method term();

      put 'The DATA step processed ' Count ' observations.';

   end;

enddata;

run;

quit;

Here is the SAS log excerpt from the DS2 data program:

The DATA step processed  10000  observations.

NOTE: Execution succeeded. 10000 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

       real time           0.47 seconds

       cpu time            0.46 seconds

Real time is very close to CPU time, so it’s quite possible the process is CPU bound. Let’s convert the data program to a thread program, execute the thread, and compare the performance:

proc ds2;

thread ScoreIt / overwrite=yes;

   dcl package sas_data.Scoring s;

   dcl double FinalScore;

   dcl bigint Count;

   drop Count;

   keep id FinalScore;

   method run();

      set sas_data.campaign;

      /*Instantiate the SCORING package with input values */

      s=_new_ sas_data.Scoring

             (IM_DemMedHomeValue,IM_GiftAvgAll,IM_PromCntAll);

      /* Call the SCORE method to score the data */

      s.Score(FinalScore);

      Count+1;

   end;

   method term();

      put 'Thread ' _threadid_ 'processed ' Count ' observations.';

   end;

endthread;

run;

quit;

 

proc ds2;

data scored_thread/overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th;

   end;

enddata;

run;

quit;

Here is the SAS log excerpt from the DS2 data program executing a single thread program:  

Thread  0 processed  10000  observations.

NOTE: Execution succeeded. 10000 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

       real time           0.45 seconds

       cpu time            0.46 seconds

The threaded version is running without error, and it’s taking about the same time as the original data program. This is expected, as we did not specify THREADS=, so the program was running single threaded. My laptop has four cores, so let’s execute this in four threads and see whether it improves performance.

proc ds2;

data scored_threads/overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th threads=4;

   end;

enddata;

run;

quit;

Here is the SAS log excerpt from the DS2 data program executing four thread programs in parallel:  

Thread  3 processed  1634  observations.

Thread  0 processed  2314  observations.

Thread  2 processed  2492  observations.

Thread  1 processed  3560  observations.

NOTE: Execution succeeded. 10000 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

      real time           0.23 seconds

      cpu time            0.56 seconds

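The multi-threaded version also ran without error and cut elapsed time roughly in half, from 0.45 to 0.23 seconds! Total CPU time rose slightly, from 0.46 to 0.56 seconds, because distributing the work across four threads carries some overhead.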
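The best number of threads depends on your hardware and data, so it can be worth timing a few values. Here is a minimal sketch that wraps the data program in a macro so that the THREADS= value can be varied. The macro name time_threads and its parameter n are hypothetical, and the sketch assumes that the ScoreIt thread program shown earlier has already been stored.

```sas
/* Hypothetical helper: run the scoring data program with n threads */
%macro time_threads(n);
proc ds2;
data scored_threads/overwrite=yes;
   dcl thread ScoreIt th;
   method run();
      /* &n is resolved by the macro processor before DS2 compiles */
      set from th threads=&n;
   end;
enddata;
run;
quit;
%mend time_threads;

/* Compare real time and CPU time in the log for each run */
%time_threads(2)
%time_threads(4)
%time_threads(8)
```

Each call overwrites the same output table, so only the log timings differ from run to run.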

6.4 DS2 and the SAS In-Database Code Accelerator

6.4.1 DS2 Program In-Database Processing

Regardless of where your DS2 threads are stored, if you have licensed, installed, and configured the SAS In-Database Code Accelerator for a supported database, the thread program can be sent into the database as code, where it compiles and executes distributed across the database hardware. This allows us to use the power and flexibility of SAS DS2 compute processes while benefiting from the massively parallel processing (MPP) capability of the database hardware. If the prospect of running thread programs on an MPP platform excites you, then you will really love knowing that, on Teradata or Hadoop installations, the DS2 data program itself can also run in parallel inside the database! As of the time this book was written, the SAS In-Database Code Accelerator is available for Teradata, Hadoop, and Greenplum databases.

To set your DS2 program free from the bounds of the SAS compute platform, you need only to ensure that the thread program is reading from a database table in the SET statement and that you have granted DS2 permission to run in the database. This can be accomplished by specifying the PROC DS2 option DS2ACCEL=YES, or by setting the SAS system option DS2ACCEL=ANY. If you also write the results of your program to the same database, your code goes in, all processing happens on the DBMS platform, and nothing is returned to the SAS session but the log. It really is impressively speedy!

You might not have access to a Teradata instance with the SAS In-Database Code Accelerator installed, but I did while writing this section of the book. First, I ran a DS2 data program in threads on the SAS compute platform, using a large Teradata table as input and writing the results to Teradata. I then ran the exact same program in-database on Teradata by setting the PROC DS2 option DS2ACCEL=YES.

Here is the code that ran on the SAS platform:

proc ds2;

data db_data.scored_thread/overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th threads=4;

   end;

enddata;

run;

quit;

Here is the SAS log excerpt from execution on the SAS platform:  

NOTE: PROCEDURE DS2 used (Total process time):

       real time           4:08.68

       cpu time            6.48 seconds

And here is the code used for in-database processing:

proc ds2 ds2accel=yes;

data db_data.scored_thread/overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th;

   end;

enddata;

run;

quit;

Note that when running DS2 inside the database, we no longer need to specify THREADS= on the SET FROM statement. The SAS In-Database Code Accelerator will ensure that the program is distributed to the appropriate nodes where the data resides. If you forget and specify a value for THREADS=, DS2 will inform you in the log that the THREADS= specification was ignored while running inside the database. Here is the SAS log excerpt from executing the DS2 program in-database:

NOTE: PROCEDURE DS2 used (Total process time):

       real time           16.65 seconds

       cpu time            0.40 seconds

There was some extra network latency introduced because my SAS server was located in Boston, MA, and the Teradata server was located in Cary, NC, but as you can see, the processing speed was tremendously improved by moving the processing into the Teradata database, using the amazing, massively parallel processing capabilities of the Teradata system.
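The other route mentioned earlier, the DS2ACCEL= system option, can be sketched as follows. This assumes the same db_data library and a ScoreIt thread program whose SET statement reads a table in that database; restoring the option to NONE afterward is a choice made for this example, not a requirement.

```sas
/* Allow DS2 programs in this session to run in-database */
options ds2accel=any;

proc ds2;
data db_data.scored_thread/overwrite=yes;
   /* Assumes ScoreIt reads from a table in the same database */
   dcl thread ScoreIt th;
   method run();
      set from th;
   end;
enddata;
run;
quit;

/* Turn in-database execution back off for later steps */
options ds2accel=none;
```

Setting the system option once is convenient when several PROC DS2 steps in a session should all be eligible for in-database execution.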

6.5 DS2 and SAS® Viya® and SAS Cloud Analytic Services (CAS)

6.5.1 A Brief Introduction to SAS Viya and CAS

In 2016, SAS announced a radically new architecture known as SAS Viya. SAS Viya is a cloud-enabled, in-memory analytics engine that is elastic, highly scalable, fault-tolerant and self-healing. The engine that underpins SAS Viya is called SAS Cloud Analytic Services (CAS). CAS provides resilient, in-memory distributed processing capability for optimized processing of complex analytical workloads. CAS can be programmed using traditional SAS programming interfaces, but is also accessible from open-source languages like Python, R, or Lua via the SAS Scripting Wrapper for Analytics Transfer (SWAT). Java classes are provided to enable connections to the server and for data analysis. SAS Viya also includes public REST APIs to make the underlying analytics easily accessible from just about any application.

While single machine deployment is possible, CAS is normally deployed in a multi-machine, distributed, massively parallel processing (MPP) architecture. Data in the server is managed in blocks, which can be cached to disk to help manage memory efficiently. This allows CAS to handle enormous data volumes efficiently while remaining responsive to multiple users.

6.5.2 Running DS2 Programs in CAS

If your organization has licensed SAS Viya and you have access to CAS, you’re in for a real treat as a DS2 programmer! As with in-database processing on Teradata and Hadoop, if both your source and target data reside in CAS, CAS will execute both the DATA and the THREAD programs distributed in the MPP environment, dramatically improving processing speed.

Let’s score the campaign data set in CAS. First, we will connect to CAS and load the campaign data:

/* Connect to CAS */

OPTIONS CASHOST="mycas.mydomain.com" CASPORT=1234;

cas mycas sessopts=(caslib=casuser timeout=1800 locale="en_US");

 

/*****************************************************************

 Make current CAS libraries visible in SAS library tree

******************************************************************/

caslib _all_ assign;

 

/*****************************************************************

 Load Base SAS data into CAS

   DATA=      SAS data set to copy into CAS

   OUTCASLIB= CAS LIBREF to hold the new CAS table

   CASOUT=    name for the new CAS table

   PROMOTE makes the new table available to all active CAS sessions.

******************************************************************/

proc casutil;

   load

   data=db_data.campaign

   outcaslib="casuser"

   casout="campaign" promote;

run;

quit;

As indicated in the SAS log, it didn’t take long to load the data into CAS:  

NOTE: PROCEDURE CASUTIL used (Total process time):

       real time           0.94 seconds

       cpu time            0.51 seconds

It would be even faster if the data were stored in SASHDAT format in the CAS environment.
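Before moving on, you might want to confirm that the table actually arrived in CAS. A minimal sketch, assuming the mycas session and the casuser caslib from the code above, uses the LIST and CONTENTS statements of PROC CASUTIL:

```sas
proc casutil;
   /* Show the in-memory tables currently loaded in the caslib */
   list tables incaslib="casuser";
   /* Show column names and types for the newly loaded table */
   contents casdata="campaign" incaslib="casuser";
quit;
```

The CONTENTS output is also a convenient place to check column types before passing the columns to a DS2 method.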

Next, we re-create the scoring package in CAS to make the package available in the CAS environment. Only two changes are required to the DS2 package program:

1.   Use the SESSREF= option in the PROC DS2 statement to process in CAS.

2.   Store the package in the CAS library.

proc ds2 sessref=mycas;

package casuser.Scoring / overwrite=yes;

   method Score(in_out double ScoreVar,

            double im_demmedhomevalue,

            double im_giftavgcard36,

            double im_giftcnt36,

            double im_gifttimefirst,

            double im_gifttimelast,

            char(5) statuscat96nk,

            double m_giftavgcard36);

     /*ScoreVar is the target variable*/

      dcl char(12) I_TARGET_B  _ST12;

      /* _ST5 holds the formatted STATUSCAT96NK text, so it must be character */

      dcl char(5) _ST5;

      dcl double

          p_target_b0 u_target_b _TEMP _P1 _P0 _IY _MAXP _LP0

          _DM_FIND _LMR_BAD _2_5 _2_4 _2_3 _2_2 _2_1 _2_0

          _6_1 _6_0;

      _LMR_BAD = 0.0;

      if MISSING(IM_DEMMEDHOMEVALUE) then

         do;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      if MISSING(IM_GIFTAVGCARD36) then

         do;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      if MISSING(IM_GIFTCNT36) then

         do;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      if MISSING(IM_GIFTTIMEFIRST) then

         do;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      if MISSING(IM_GIFTTIMELAST) then

         do;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      _2_0 = 0.0;

      _2_1 = 0.0;

      _2_2 = 0.0;

      _2_3 = 0.0;

      _2_4 = 0.0;

      _2_5 = 0.0;

      _ST5 = LEFT(TRIM(put(STATUSCAT96NK, $5.)));

      _DM_FIND = 0.0;

      if _ST5 <= 'F' then

         do;

            if _ST5 <= 'E' then

               do;

                  if _ST5 = 'A' then

                     do;

                        _2_0 = 1.0;

                        _DM_FIND = 1.0;

                     end;

                  else

                     do;

                        if _ST5 = 'E' then

                           do;

                              _2_1 = 1.0;

                              _DM_FIND = 1.0;

                           end;

                     end;

               end;

            else

               do;

                  if _ST5 = 'F' then

                     do;

                        _2_2 = 1.0;

                        _DM_FIND = 1.0;

                     end;

               end;

         end;

      else

         do;

            if _ST5 <= 'N' then

               do;

                  if _ST5 = 'L' then

                     do;

                        _2_3 = 1.0;

                        _DM_FIND = 1.0;

                     end;

                  else

                     do;

                        if _ST5 = 'N' then

                           do;

                              _2_4 = 1.0;

                              _DM_FIND = 1.0;

                           end;

                     end;

               end;

            else

               do;

                  if _ST5 = 'S' then

                     do;

                        _2_5 = 1.0;

                        _DM_FIND = 1.0;

                     end;

               end;

         end;

      if ^_DM_FIND then

         do;

            _2_0 = .;

            _2_1 = .;

            _2_2 = .;

            _2_3 = .;

            _2_4 = .;

            _2_5 = .;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      _6_0 = 0.0;

      _6_1 = 0.0;

      _ST12 = LEFT(TRIM(put(M_GIFTAVGCARD36, BEST12.)));

      if _ST12 = '0' then

         do;

            _6_0 = 1.0;

         end;

      else if _ST12 = '1' then

         do;

            _6_1 = 1.0;

         end;

      else

         do;

            _6_0 = .;

            _6_1 = .;

            _LMR_BAD = 1.0;

            goto DefaultExit;

         end;

      _LP0 = 0.0;

      _LP0 = _LP0 + (1.4083626419216E-6) * IM_DEMMEDHOMEVALUE;

      _LP0 = _LP0 + (-0.01129757026733) * IM_GIFTAVGCARD36;

      _LP0 = _LP0 + (0.07014728733418) * IM_GIFTCNT36;

      _LP0 = _LP0 + (0.00325352658053) * IM_GIFTTIMEFIRST;

      _LP0 = _LP0 + (-0.03942784854233) * IM_GIFTTIMELAST;

      _TEMP = 1.0;

      _LP0 = _LP0 + (0.28218821521876) * _TEMP * _6_0;

      _LP0 = _LP0 + (0.0) * _TEMP * _6_1;

      _TEMP = 1.0;

      _LP0 = _LP0 + (-0.04430991827625) * _TEMP * _2_0;

      _LP0 = _LP0 + (0.33212057095241) * _TEMP * _2_1;

      _LP0 = _LP0 + (-0.15280893542678) * _TEMP * _2_2;

      _LP0 = _LP0 + (-0.02184926539233) * _TEMP * _2_3;

      _LP0 = _LP0 + (0.08403485184956) * _TEMP * _2_4;

      _LP0 = _LP0 + (0.0) * _TEMP * _2_5;

      _TEMP = 0.0513406495515 + _LP0;

      if (_TEMP < 0.0) then

         do;

            _TEMP = EXP(_TEMP);

            _P0 = _TEMP / (1.0 + _TEMP);

         end;

      else _P0 = 1.0 / (1.0 + EXP(-_TEMP));

      _P1 = 1.0 - _P0;

      ScoreVar = _P0;

      _MAXP = _P0;

      _IY = 1.0;

      P_TARGET_B0 = _P1;

      if (_P1 > _MAXP + 1E-8) then

         do;

            _MAXP = _P1;

            _IY = 2.0;

         end;

      select (_IY);

         when (1.0)

            do;

               I_TARGET_B = '1';

               U_TARGET_B = 1.0;

            end;

         when (2.0)

            do;

               I_TARGET_B = '0';

               U_TARGET_B = 0.0;

            end;

         otherwise

            do;

               I_TARGET_B = '';

               U_TARGET_B = .;

            end;

      end;

      DefaultExit:

      if _LMR_BAD = 1.0 then

         do;

            I_TARGET_B = '';

            U_TARGET_B = .;

            ScoreVar = .;

            P_TARGET_B0 = .;

         end;

      ;

   end;

endpackage;

run;

quit;

And the SAS log indicates that this process ran quickly and successfully:  

NOTE: Created package scoring in data set casuser.scoring.

NOTE: Execution succeeded. No rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

      real time           0.10 seconds

      cpu time            0.01 seconds

Next, we will store the THREAD program in CAS to make it available in the CAS environment. The same two changes are required to the DS2 THREAD program. In the TERM method, I added references to the _THREADID_, _NTHREADS_, and _HOSTNAME_ automatic variables, which provide the thread identifier number, the total number of threads spawned, and the name of the host on which the thread is executing, respectively. This helps me visualize how the data was processed.

proc ds2 sessref=mycas;

thread ScoreIt / overwrite=yes;

   dcl package casuser.Scoring s();

   dcl double FinalScore;

   dcl bigint Count;

   keep id FinalScore Target_B;

   method run();

      set casuser.campaign;

      /* Call the SCORE method of the SCORING package to score the data */

      s.Score(FinalScore,

           im_demmedhomevalue,

           im_giftavgcard36,

           im_giftcnt36,

           im_gifttimefirst,

           im_gifttimelast,

           statuscat96nk,

           m_giftavgcard36);

      Count+1;

   end;

   method term();

      put 'Thread' _threadid_ 'of' _nthreads_ 'processed'

          Count 'observations on' _hostname_ '.';

   end;

endthread;

run;

quit;

And the SAS log indicates that this process also ran quickly and successfully:  

NOTE: Created thread scoreit in data set "casuser(majord)".scoreit.

NOTE: Execution succeeded. No rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

      real time           0.17 seconds

      cpu time            0.00 seconds

Now, we will execute the DATA program in CAS to score the data:

proc ds2 sessref=mycas;

data casuser.scored_thread/overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th;

   end;

enddata;

run;

quit;

The SAS log indicates that the THREAD and DATA programs executed in parallel in CAS:  

Thread 2 of 2 processed 53273 observations on casnode1.

Thread 1 of 2 processed 53273 observations on casnode2.

NOTE: Running THREAD program on all nodes

NOTE: Running DATA program on all nodes

NOTE: Execution succeeded. 106546 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

      real time           5.71 seconds

      cpu time            0.00 seconds

The observations processed by the threads are returned to the DATA program, where further processing is possible. If the DATA program does additional work on the rows returned by the threads, I would expect it to run on a single node (single threaded). Let’s test that by separating the scored observations into high-, medium-, and low-scoring data sets:

proc ds2 sessref=mycas;

data casuser.high_score

     casuser.medium_score

     casuser.low_score   /overwrite=yes;

   dcl thread ScoreIt th;

   method run();

      set from th;

      if FinalScore >= .65 then output casuser.high_score;

      else if FinalScore >= .30 then output casuser.medium_score;

      else output casuser.low_score;

   end;

enddata;

run;

quit;

As expected, the SAS log indicates that the THREAD program executed in parallel on all CAS nodes, but the DATA program ran on a single node:  

Thread 1 of 2 processed 53273 observations on casnode2.

Thread 2 of 2 processed 53273 observations on casnode1.

NOTE: Running THREAD program on all nodes

NOTE: Running DATA program on one node

NOTE: Execution succeeded. 106546 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):

      real time           6.17 seconds

      cpu time            0.00 seconds
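To check that every scored row landed in exactly one of the three output tables, you could count the rows in each. This is only a sketch: it assumes that the casuser libref was assigned by the earlier CASLIB _ALL_ ASSIGN statement, and the column aliases are hypothetical. The three counts should add up to the number of rows reported in the log.

```sas
proc sql;
   /* Row counts for each score band */
   select count(*) as high_rows   from casuser.high_score;
   select count(*) as medium_rows from casuser.medium_score;
   select count(*) as low_rows    from casuser.low_score;
quit;
```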

6.6 Review of Key Concepts

   A program step is likely to be CPU bound if the SAS log reports that real time and CPU time for the step are within 10% of each other.

   Traditional SAS DATA steps process individual rows of data sequentially in a single compute thread.

   DATA steps that perform many computations on each row of data can easily become CPU bound.

   CPU-bound DATA steps will likely perform better if converted to DS2 and executed in several parallel threads.

   The SAS In-Database Code Accelerator is available for Hadoop, Teradata, and Greenplum.

   With the SAS In-Database Code Accelerator installed, DS2 thread programs can be executed in-database. For Hadoop and Teradata, DS2 data programs can also be executed in-database.

   If your data resides in a DBMS with the SAS In-Database Code Accelerator installed, even I/O bound processes will likely see significant improvements in execution speed if converted to DS2 THREADs and executed in-database.

   If SAS Viya is available, it is quite simple to execute DS2 programs fully distributed using Cloud Analytic Services (CAS). CAS provides benefits similar to in-database processing for a wide array of data sources accessible from SAS.
