Cajoling HTML and JavaScript

Before we begin compiling our mixed HTML and JavaScript documents into a safe code subset, we need to look at the tools that we will be using.

In the caja directory that we created for the project, you’ll see a directory containing the scripts that we will use to compile our code. The cajole_html script handles the task of cajoling standard HTML and JavaScript, so it’s the one we’ll use here. After the cajoling process completes, we will have two output files:

  • An HTML output file containing the markup of our document, divorced from any embedded JavaScript blocks. This HTML file will contain secure, directly embeddable markup that we can insert within a site. All unsafe markup tags, such as iframes, will be stripped from the final derived markup.

  • The cajoled JavaScript file. This is a secured version of the JavaScript we started with, with any insecure script removed.

To run the mixed HTML/JavaScript command-line cajoler, we go to the root of the caja directory into which we checked out the SVN source and run the cajole_html script with a few parameters:

cd caja
bin/cajole_html -i <htmlInputFile> -o <outputTarget>

cajole_html allows us to specify an input file to cajole (htmlInputFile) and a base name for the two cajoled output files (outputTarget). htmlInputFile can be an absolute URL of a file to be cajoled or a direct reference to a file on the local system. outputTarget is the name to give the output files, optionally preceded by the path of the directory in which to build them. The two output files will be named:

  • {outputTarget}.out.html

  • {outputTarget}.out.js

These are the two files that you should expect to be generated when you run the cajoler against a source file with mixed HTML and JavaScript.
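As a small illustration of this naming convention, the following sketch derives both output names from an outputTarget value. The cajoledOutputNames helper is hypothetical and is not part of the Caja tools; it only mirrors the two name patterns listed above.

```javascript
// Hypothetical helper (not part of Caja): builds the two file names
// that the cajoler is described as producing for a given outputTarget.
function cajoledOutputNames(outputTarget) {
  return {
    html: outputTarget + ".out.html", // sanitized markup
    js: outputTarget + ".out.js"      // cajoled JavaScript
  };
}

var names = cajoledOutputNames("caja_sample");
console.log(names.html); // caja_sample.out.html
console.log(names.js);   // caja_sample.out.js
```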

Running the cajoler

Let’s look at an example of the cajoling process:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Caja Sample HTML</title>
</head>
<body>

<h1>Sample Redirection Script</h1>
<a onclick="goRedirect()">Click to Redirect</a>

<script type="text/javascript">
//redirect user to new site
function goRedirect(){
   var redirects;
   with(redirects){
      var href = "http://www.yahoo.com"
      window.location = href;
   }
}
</script>

</body>
</html>

When we cajole this mixed HTML/JavaScript file via the command line, we get the following messages:

1    notseveral-lm:caja jleblanc$ bin/cajole_html -i 
     ../git/programming-social-applications/caja/ch9_caja_sample_html.html 
     -o caja_sample
2    LOG    : Checkpoint: LegacyNamespaceFixupStage at T+0.113971 seconds
3    LOG    : Checkpoint: ResolveUriStage at T+0.12005 seconds
4    LOG    : Checkpoint: RewriteHtmlStage at T+0.124126 seconds
5    LINT   : ch9_caja_sample_html.html:16+42: Semicolon inserted
6    LOG    : Checkpoint: InlineCssImportsStage at T+0.204033 seconds
7    LOG    : Checkpoint: SanitizeHtmlStage at T+0.204083 seconds
8    WARNING: ch9_caja_sample_html.html:2+1 - 23+8: folding element html into parent
9    WARNING: ch9_caja_sample_html.html:3+1 - 5+8: folding element head into parent
10    WARNING: ch9_caja_sample_html.html:4+1 - 32: removing disallowed tag title
11    WARNING: ch9_caja_sample_html.html:6+1 - 22+8: folding element body into parent
12    LOG    : Checkpoint: ValidateCssStage at T+0.206399 seconds
13    LOG    : Checkpoint: RewriteCssStage at T+0.222766 seconds
14    LOG    : Checkpoint: HtmlToBundleStage at T+0.222807 seconds
15    LOG    : Checkpoint: OptimizeJavascriptStage at T+0.279367 seconds
16    LOG    : Checkpoint: ValidateJavascriptStage at T+0.279401 seconds
17    ERROR  : ch9_caja_sample_html.html:15+5 - 18+6: "with" blocks are not allowed
18    LOG    : Checkpoint: ConsolidateCodeStage at T+0.553624 seconds
19    LOG    : Checkpoint: CheckForErrorsStage at T+0.561566 seconds

For the sake of this example, we will ignore the LOG messages—they are just notifications at different stages of the cajoling process. We are, however, interested in the LINT, WARNING, and ERROR messages, as those are pertinent to our build process.
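If you capture the cajoler output as text, a short script can drop the LOG lines and keep only the messages that matter for the build. The sketch below is an assumption-laden illustration: filterCajolerMessages is a hypothetical helper, and the pattern it uses is inferred only from the sample output shown above, not from any documented Caja log format.

```javascript
// Hypothetical helper: keeps LINT, WARNING, and ERROR lines and skips LOG lines.
// The line format is assumed from the sample output in this section.
function filterCajolerMessages(outputText) {
  // Optional leading listing number, then the level, a colon, and the message.
  var pattern = /^\s*(?:\d+\s+)?(LOG|LINT|WARNING|ERROR)\s*:\s*(.*)$/;
  var kept = [];
  var lines = outputText.split("\n");
  for (var i = 0; i < lines.length; i++) {
    var match = pattern.exec(lines[i]);
    if (match && match[1] !== "LOG") {
      kept.push({ level: match[1], text: match[2] });
    }
  }
  return kept;
}

var sampleOutput = [
  "LOG    : Checkpoint: RewriteHtmlStage at T+0.124126 seconds",
  "LINT   : ch9_caja_sample_html.html:16+42: Semicolon inserted",
  "WARNING: ch9_caja_sample_html.html:3+1 - 5+8: folding element head into parent",
  "ERROR  : ch9_caja_sample_html.html:15+5 - 18+6: \"with\" blocks are not allowed"
].join("\n");

var messages = filterCajolerMessages(sampleOutput);
for (var m = 0; m < messages.length; m++) {
  console.log(messages[m].level + " -> " + messages[m].text);
}
```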

The LINT message on line 5 states that a semicolon was inserted. This message was generated because we omitted the semicolon at the end of the line where we defined the href variable in our HTML sample code. By default, JavaScript tries to help developers by automatically inserting a semicolon if one was omitted. Because this process can sometimes insert semicolons where you do not want them, causing errors in the program flow, the cajoler reports each insertion.
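To see how automatic semicolon insertion can change program flow, consider the well-known case of a line break after return. This sketch is independent of Caja; the function names are made up for illustration.

```javascript
// The line break after "return" triggers automatic semicolon insertion,
// so this function returns undefined. The braces that follow are parsed
// as an unreachable block statement, not as an object literal.
function getRedirectBroken() {
  return
  {
    href: "http://www.yahoo.com"
  };
}

// Keeping the opening brace on the same line as "return" avoids the problem.
function getRedirectFixed() {
  return {
    href: "http://www.yahoo.com"
  };
}

console.log(typeof getRedirectBroken()); // undefined
console.log(getRedirectFixed().href);    // http://www.yahoo.com
```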

Next, we have the WARNING messages on lines 8 through 11. Since Caja is building HTML and JavaScript files to be inserted into an existing page (such as a gadget), the html, head, and body tags are all folded up into the parent and thus removed from the output in the HTML file. The title element is removed as well, because the code base is running in an existing container.

Last is the ERROR message on line 17, which tells us that with blocks are not allowed in JavaScript that is to be cajoled. This error stops the cajoling process, so no output files are produced.
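The cajoler output does not explain why with is rejected, but one commonly cited problem with the construct is that the meaning of a name inside the block depends on the object supplied at runtime, which makes the code hard to analyze ahead of time (ECMAScript strict mode forbids with as well). The following sketch shows that ambiguity. It builds the function with the Function constructor only so that the body runs in non-strict mode wherever the example is executed; the variable names are illustrative.

```javascript
// Function-constructor bodies run in non-strict mode, where "with" is still legal.
var assignInsideWith = new Function(
  "target",
  "with (target) { href = 'http://www.yahoo.com'; }"
);

var hasHref = { href: "" }; // has an href property
var noHref = {};            // has no href property

// Here "href" resolves to a property of the object...
assignInsideWith(hasHref);
console.log(hasHref.href); // http://www.yahoo.com

// ...but here the same statement falls through and creates a global variable.
assignInsideWith(noHref);
console.log(noHref.href);         // undefined
console.log(typeof globalThis.href); // string
```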

If we were to remove the with block in question from our code, we would be able to produce cajoled files. This involves changing our script block to the following:

<script type="text/javascript">
//redirect user to new site
function goRedirect(){
   var href = "http://www.yahoo.com";
   window.location = href;
}
</script>

If we were to then recajole the scripts:

notseveral-lm:caja jleblanc$ bin/cajole_html -i 
../git/programming-social-applications/caja/ch9_caja_sample_html.html 
-o caja_sample

We would get the following two output files:

caja_sample.out.html

The sanitized HTML of our file

caja_sample.out.js

The cajoled JavaScript of our original file, with the added layers of Caja security

Next, we’ll explore these files to see what content is produced.

The cajoled HTML

When we look at the content of the caja_sample.out.html file, we see the following:

<h1>Sample Redirection Script</h1>
<a id="id_2___" target="_blank">Click to Redirect</a>

Our html, head, and body elements have all been removed from the output. Since the content of a cajoled file is meant to exist within the body of some container, it will exist in the same DOM as that container and thus must not include competing root nodes. The content of our HTML file is stripped down to our h1 and the redirect <a> tag, which now carries a generated id and a target attribute. Within the <a> tag, our inline onclick attribute is silently stripped out of the HTML content. When Caja runs into tags or attributes that are not allowed and it can safely remove them without compromising valuable output, the cajoler will silently strip them out and output usable files.

Warning

If you are embedding onclick handlers directly into your markup layer, Caja will most likely strip them from the returned HTML, depending on how strict your implementation is. To avoid this, attach JavaScript event handlers after the page content has loaded, using one of the traditional methods: object.onclick, object.attachEvent, or object.addEventListener.
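A minimal sketch of that approach follows. The addClickHandler helper and the "redirect" id are hypothetical, and whether each of these methods is available inside a given Caja container is an assumption you should verify against your container; the code only shows the usual feature-detection order.

```javascript
// Hypothetical helper: attaches a click handler using whichever
// mechanism the node supports, instead of an inline onclick attribute.
function addClickHandler(node, handler) {
  if (node.addEventListener) {
    node.addEventListener("click", handler, false); // W3C event model
  } else if (node.attachEvent) {
    node.attachEvent("onclick", handler);           // older Internet Explorer
  } else {
    node.onclick = handler;                         // basic property fallback
  }
}

function goRedirect() {
  var href = "http://www.yahoo.com";
  window.location = href;
}

// Only runs in a browser-like environment; "redirect" is an assumed id
// that you would add to the <a> element in place of the onclick attribute.
if (typeof document !== "undefined") {
  var link = document.getElementById("redirect");
  if (link) {
    addClickHandler(link, goRedirect);
  }
}
```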

The cajoled JavaScript

Next let’s look at the JavaScript file compiled during the cajoling process, caja_sample.out.js. If we open this file, we see a much larger JavaScript construct than we defined in our functional script block:

{
   ___.loadModule({
      'instantiate': function (___, IMPORTS___) {
         return ___.prepareModule({
            'instantiate': function (___, IMPORTS___) {
               var $v = ___.readImport(IMPORTS___, '$v', {
                  'getOuters': { '()': {} },
                  'initOuter': { '()': {} },
                  'cf': { '()': {} },
                  'ro': { '()': {} }
               });
               var moduleResult___, $dis, el___, emitter___, c_1___;
               moduleResult___ = ___.NO_RESULT;
               $dis = $v.getOuters();
               $v.initOuter('onerror');
               {
                  emitter___ = IMPORTS___.htmlEmitter___;
                  el___ = emitter___.byId('id_2___'),
                  c_1___ = ___.markFuncFreeze(function (event, thisNode___) {
                     $v.cf($v.ro('goRedirect'), [ ]);
                  });
                  el___.onclick = function (event) {
                     return plugin_dispatchEvent___(this, event,
                     ___.getId(IMPORTS___), c_1___);
                  };
                  emitter___.setAttr(el___, 'id', 'redirect-' +
                  IMPORTS___.getIdClass___());
                  el___ = emitter___.finish();
               }
               return moduleResult___;
            },
            'cajolerName': 'com.google.caja',
            'cajolerVersion': '4319',
            'cajoledDate': 1288626955029
         })(IMPORTS___), ___.prepareModule({
            'instantiate': function (___, IMPORTS___) {
               var $v = ___.readImport(IMPORTS___, '$v', {
                  'getOuters': { '()': {} },
                  'initOuter': { '()': {} },
                  'so': { '()': {} },
                  's': { '()': {} },
                  'ro': { '()': {} },
                  'dis': { '()': {} }
               });
               var moduleResult___, $dis;
               moduleResult___ = ___.NO_RESULT;
               $dis = $v.getOuters();
               $v.initOuter('onerror');
               try {
                  {
                     $v.so('goRedirect', ___.markFuncFreeze(function () {
                        var goRedirect;
                        function goRedirect$_caller($dis) {
                           var href;
                           href = 'http://www.yahoo.com';
                           $v.s($v.ro('window'), 'location', href);
                        }
                        goRedirect$_caller.FUNC___ = 'goRedirect$_caller';
                        goRedirect = $v.dis(___.primFreeze(goRedirect$_caller),
                           'goRedirect');
                        return goRedirect;
                     }).CALL___());
                  }
               } catch (ex___) {
                  ___.getNewModuleHandler().handleUncaughtException(ex___,
                  $v.ro('onerror'), 'ch9_caja_sample_html.html', '13');
               }
               return moduleResult___;
            },
            'cajolerName': 'com.google.caja',
            'cajolerVersion': '4319',
            'cajoledDate': 1288626955094
         })(IMPORTS___), ___.prepareModule({
            'instantiate': function (___, IMPORTS___) {
               var moduleResult___;
               moduleResult___ = ___.NO_RESULT;
               {
                  IMPORTS___.htmlEmitter___.signalLoaded();
               }
               return moduleResult___;
            },
            'cajolerName': 'com.google.caja',
            'cajolerVersion': '4319',
            'cajoledDate': 1288626955121
         })(IMPORTS___);
      },
      'cajolerName': 'com.google.caja',
      'cajolerVersion': '4319',
      'cajoledDate': 1288626955128
   });
}

The reason this new JavaScript block is so much more extensive than the code we started with is that the cajoled code applies error checks and security layers on top of our original code. Our original functionality is still visible in the preceding example, in the goRedirect$_caller function that assigns href and then sets the location of window, now with secured access to our redirection code.
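In the listing, the direct assignment window.location = href has become $v.s($v.ro('window'), 'location', href), so both the read of window and the write to location pass through helper calls. The toy sketch below imitates only that general shape. It is not Caja's runtime, every name in it is hypothetical, and the allow-list policy is an assumption chosen to keep the example short.

```javascript
// Toy illustration only: NOT Caja's runtime. All names are hypothetical.
function makeGuardedScope(outers, writableProps) {
  return {
    // Read a top-level name, but only if the container exposed it.
    read: function (name) {
      if (!Object.prototype.hasOwnProperty.call(outers, name)) {
        throw new ReferenceError(name + " is not available to this module");
      }
      return outers[name];
    },
    // Set a property, but only if the policy lists it as writable.
    set: function (target, prop, value) {
      if (writableProps.indexOf(prop) === -1) {
        throw new Error("Setting " + prop + " is not permitted");
      }
      target[prop] = value;
      return value;
    }
  };
}

// A stand-in object plays the role of window so the sketch runs anywhere.
var fakeWindow = { location: "" };
var scope = makeGuardedScope({ window: fakeWindow }, ["location"]);

scope.set(scope.read("window"), "location", "http://www.yahoo.com");
console.log(fakeWindow.location); // http://www.yahoo.com
```

Because every access goes through read and set, a container can decide in one place which names and properties the embedded code may touch.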
