Before we begin compiling our mixed HTML and JavaScript documents into a safe code subset, we need to look at the tools that we will be using.
In the caja directory that we created for the project, you’ll see a bin directory containing the scripts that we will use to compile our code. The cajole_html script is specific to the task of cajoling mixed HTML and JavaScript, and it’s the script we’ll use here. After the cajoling process completes, we will have two output files:
An HTML output file containing the markup of our script, divorced from any embedded JavaScript blocks. This HTML file will contain secure, directly embeddable markup that we can insert within a site. All unsafe markup tags, such as iframes, will be stripped from the final derived markup.
The cajoled JavaScript file. The JavaScript will be a secured version of what we started with, with any insecure script stripped out.
To run the mixed HTML/JavaScript command-line cajoler, we can simply go to the root of the caja directory from which we checked out the SVN source and run the appropriate cajole_html script with a few parameters:
cd caja
bin/cajole_html -i <htmlInputFile> -o <outputTarget>
cajole_html allows us to specify an input file to cajole (htmlInputFile) and an output target for our two cajoled files (outputTarget). htmlInputFile can be an absolute URL of a file to be cajoled or a direct reference to a file on the local system. outputTarget is the base name to give the output files, along with the file path to build them to. The two output files will be named:
{outputTarget}.out.html
{outputTarget}.out.js
These are the two files that you should expect to be generated when you run the cajoler against a source file with mixed HTML and JavaScript.
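As a small illustration of this naming convention, the following sketch assembles the command-line arguments and predicts the two output filenames. The helper name buildCajoleInvocation is hypothetical and not part of Caja; only the -i and -o flags and the .out.html and .out.js suffixes come from the text above.

```javascript
// Hypothetical helper (not part of Caja): builds the argument list for
// bin/cajole_html and predicts the names of the files it should write.
function buildCajoleInvocation(htmlInputFile, outputTarget) {
  return {
    command: 'bin/cajole_html',
    args: ['-i', htmlInputFile, '-o', outputTarget],
    outputs: {
      html: outputTarget + '.out.html',
      js: outputTarget + '.out.js'
    }
  };
}

var invocation = buildCajoleInvocation('ch9_caja_sample_html.html', 'caja_sample');
console.log(invocation.command + ' ' + invocation.args.join(' '));
console.log(invocation.outputs.html);
console.log(invocation.outputs.js);
```

The sketch only builds strings; it does not run the cajoler or check that the files exist.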
Let’s look at an example of the cajoling process:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Caja Sample HTML</title>
</head>
<body>

<h1>Sample Redirection Script</h1>
<a onclick="goRedirect()">Click to Redirect</a>

<script type="text/javascript">
//redirect user to new site
function goRedirect(){
    var redirects;
    with(redirects){
        var href = "http://www.yahoo.com"
        window.location = href;
    }
}
</script>

</body>
</html>
When we cajole this mixed HTML/JavaScript file via the command line, we get the following messages:
1 notseveral-lm:caja jleblanc$ bin/cajole_html -i ../git/programming-social-applications/caja/ch9_caja_sample_html.html -o caja_sample
2 LOG : Checkpoint: LegacyNamespaceFixupStage at T+0.113971 seconds
3 LOG : Checkpoint: ResolveUriStage at T+0.12005 seconds
4 LOG : Checkpoint: RewriteHtmlStage at T+0.124126 seconds
5 LINT : ch9_caja_sample_html.html:16+42: Semicolon inserted
6 LOG : Checkpoint: InlineCssImportsStage at T+0.204033 seconds
7 LOG : Checkpoint: SanitizeHtmlStage at T+0.204083 seconds
8 WARNING: ch9_caja_sample_html.html:2+1 - 23+8: folding element html into parent
9 WARNING: ch9_caja_sample_html.html:3+1 - 5+8: folding element head into parent
10 WARNING: ch9_caja_sample_html.html:4+1 - 32: removing disallowed tag title
11 WARNING: ch9_caja_sample_html.html:6+1 - 22+8: folding element body into parent
12 LOG : Checkpoint: ValidateCssStage at T+0.206399 seconds
13 LOG : Checkpoint: RewriteCssStage at T+0.222766 seconds
14 LOG : Checkpoint: HtmlToBundleStage at T+0.222807 seconds
15 LOG : Checkpoint: OptimizeJavascriptStage at T+0.279367 seconds
16 LOG : Checkpoint: ValidateJavascriptStage at T+0.279401 seconds
17 ERROR : ch9_caja_sample_html.html:15+5 - 18+6: "with" blocks are not allowed
18 LOG : Checkpoint: ConsolidateCodeStage at T+0.553624 seconds
19 LOG : Checkpoint: CheckForErrorsStage at T+0.561566 seconds
For the sake of this example, we will ignore the LOG messages; they are just notifications at different stages of the cajoling process. We are, however, interested in the LINT, WARNING, and ERROR messages, as those are pertinent to our build process.
The LINT message on line 5 states that a semicolon was inserted. This message was generated because we omitted the semicolon at the end of the line that defines our href variable in the HTML sample code. By default, JavaScript tries to help developers by automatically inserting a semicolon where one was omitted. Because this process can sometimes insert semicolons where you do not want them, causing errors in the program flow, the cajoler reports each insertion with a message like this.
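To see how an inserted semicolon can change program flow, consider the following minimal sketch in plain JavaScript. The function names are hypothetical and are not part of the sample file; the behavior shown is standard automatic semicolon insertion.

```javascript
// Automatic semicolon insertion (ASI) changing what a function returns.
function getConfigBroken() {
  // ASI inserts a semicolon directly after `return`, so the braces below
  // are parsed as a separate block and the function returns undefined.
  return
  {
    href: 'http://www.yahoo.com'
  };
}

function getConfigFixed() {
  // Keeping the opening brace on the same line as `return` avoids the problem.
  return {
    href: 'http://www.yahoo.com'
  };
}

console.log(typeof getConfigBroken()); // "undefined"
console.log(getConfigFixed().href);    // "http://www.yahoo.com"
```

In our sample the inserted semicolon happens to be harmless, which is why the cajoler reports it as a LINT message and not as an error.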
Next, we have the WARNING messages on lines 8 through 11. Since Caja is building HTML and JavaScript files to be inserted into an existing page (such as a gadget), the html, head, and body tags are all folded into the parent and thus removed from the HTML output. The title element is also removed because the code is running in an existing container.
Last is the ERROR message on line 17, which tells us that with blocks are not allowed in JavaScript that is to be cajoled. This error stops the cajoling process, so no output files are produced.
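Caja is not alone in rejecting with blocks: ECMAScript 5 strict mode forbids them as well. The following sketch uses only standard JavaScript to show two problems with the with block in our sample. The function bodies are built with the Function constructor so that they are parsed separately from the surrounding code; this is an illustration of the language rules, not of how the cajoler detects the block.

```javascript
// 1. In strict mode a `with` block is a syntax error, so the function
//    cannot even be created.
var strictError = null;
try {
  new Function('"use strict"; with ({}) { }');
} catch (e) {
  strictError = e;
}
console.log(strictError instanceof SyntaxError); // true

// 2. Outside strict mode the sample still fails at run time, because
//    `redirects` is undefined when the `with` block is entered.
var sloppyError = null;
try {
  new Function('var redirects; with (redirects) { var href = "x"; }')();
} catch (e) {
  sloppyError = e;
}
console.log(sloppyError instanceof TypeError); // true
```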
If we were to remove the with block in question from our code, we would be able to produce cajoled files. This involves changing our script block to the following:
<script type="text/javascript">
//redirect user to new site
function goRedirect(){
    var href = "http://www.yahoo.com";
    window.location = href;
}
</script>
We can then recajole the script:
notseveral-lm:caja jleblanc$ bin/cajole_html -i ../git/programming-social-applications/caja/ch9_caja_sample_html.html -o caja_sample
We would get the following two output files:
caja_sample.out.html: The sanitized HTML of our file
caja_sample.out.js: The cajoled JavaScript of our original file, with the added layers of Caja security
Next, we’ll explore these files to see what content is produced.
When we look at the content of the caja_sample.out.html file, we see the following:
<h1>Sample Redirection Script</h1>
<a id="id_2___" target="_blank">Click to Redirect</a>
Our html, head, and body elements have all been removed from the output. Since the content of a cajoled file is meant to exist within the body of some container, it will exist in the same DOM as that container and thus must not include competing root nodes. The content of our HTML file is stripped down to our h1 and the redirect <a> tag. Within the <a> tag, our onclick event is silently stripped out of the HTML content. When Caja runs into markup that is not allowed and it can safely remove that markup without compromising valuable output, the cajoler will silently strip it out and output usable files.
If you are embedding onclick handlers directly into your markup layer, Caja will most likely strip them from the returned HTML, depending on how strict your implementation is. To avoid this, you should attach JavaScript event handlers after the page content has loaded, using one of the traditional methods: object.onclick, object.attachEvent, or object.addEventListener.
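A minimal sketch of this approach follows. The attachClick helper and the redirect-link id are hypothetical (the sample link has no id of its own), and which of the three mechanisms is available inside a cajoled gadget depends on the container, so treat this as an illustration and not as Caja's prescribed pattern. The script is assumed to run after the link already exists in the page.

```javascript
// Attach a click handler using whichever mechanism the element supports.
function attachClick(el, handler) {
  if (el.addEventListener) {
    el.addEventListener('click', handler, false); // W3C event model
  } else if (el.attachEvent) {
    el.attachEvent('onclick', handler);           // older Internet Explorer
  } else {
    el.onclick = handler;                         // plain property fallback
  }
}

// Redirect the user to a new site.
function goRedirect() {
  window.location = 'http://www.yahoo.com';
}

// Assumption: the markup contains <a id="redirect-link">Click to Redirect</a>.
// The guard keeps the sketch harmless when no DOM is available.
if (typeof document !== 'undefined') {
  var link = document.getElementById('redirect-link');
  if (link) {
    attachClick(link, goRedirect);
  }
}
```

With the handler attached from script, the markup no longer needs an onclick attribute for the cajoler to strip.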
Next, let’s look at the JavaScript file compiled during the cajoling process, caja_sample.out.js. If we open this file, we see a much larger JavaScript construct than we defined in our functional script block:
{
  ___.loadModule({
    'instantiate': function (___, IMPORTS___) {
      return ___.prepareModule({
        'instantiate': function (___, IMPORTS___) {
          var $v = ___.readImport(IMPORTS___, '$v', {
            'getOuters': { '()': {} },
            'initOuter': { '()': {} },
            'cf': { '()': {} },
            'ro': { '()': {} }
          });
          var moduleResult___, $dis, el___, emitter___, c_1___;
          moduleResult___ = ___.NO_RESULT;
          $dis = $v.getOuters();
          $v.initOuter('onerror');
          {
            emitter___ = IMPORTS___.htmlEmitter___;
            el___ = emitter___.byId('id_2___');
            c_1___ = ___.markFuncFreeze(function (event, thisNode___) {
              $v.cf($v.ro('goRedirect'), [ ]);
            });
            el___.onclick = function (event) {
              return plugin_dispatchEvent___(this, event, ___.getId(IMPORTS___), c_1___);
            };
            emitter___.setAttr(el___, 'id', 'redirect-' + IMPORTS___.getIdClass___());
            el___ = emitter___.finish();
          }
          return moduleResult___;
        },
        'cajolerName': 'com.google.caja',
        'cajolerVersion': '4319',
        'cajoledDate': 1288626955029
      })(IMPORTS___), ___.prepareModule({
        'instantiate': function (___, IMPORTS___) {
          var $v = ___.readImport(IMPORTS___, '$v', {
            'getOuters': { '()': {} },
            'initOuter': { '()': {} },
            'so': { '()': {} },
            's': { '()': {} },
            'ro': { '()': {} },
            'dis': { '()': {} }
          });
          var moduleResult___, $dis;
          moduleResult___ = ___.NO_RESULT;
          $dis = $v.getOuters();
          $v.initOuter('onerror');
          try {
            {
              $v.so('goRedirect', ___.markFuncFreeze(function () {
                var goRedirect;
                function goRedirect$_caller($dis) {
                  var href;
                  href = 'http://www.yahoo.com';
                  $v.s($v.ro('window'), 'location', href);
                }
                goRedirect$_caller.FUNC___ = 'goRedirect$_caller';
                goRedirect = $v.dis(___.primFreeze(goRedirect$_caller), 'goRedirect');
                return goRedirect;
              }).CALL___());
            }
          } catch (ex___) {
            ___.getNewModuleHandler().handleUncaughtException(ex___, $v.ro('onerror'), 'ch9_caja_sample_html.html', '13');
          }
          return moduleResult___;
        },
        'cajolerName': 'com.google.caja',
        'cajolerVersion': '4319',
        'cajoledDate': 1288626955094
      })(IMPORTS___), ___.prepareModule({
        'instantiate': function (___, IMPORTS___) {
          var moduleResult___;
          moduleResult___ = ___.NO_RESULT;
          {
            IMPORTS___.htmlEmitter___.signalLoaded();
          }
          return moduleResult___;
        },
        'cajolerName': 'com.google.caja',
        'cajolerVersion': '4319',
        'cajoledDate': 1288626955121
      })(IMPORTS___);
    },
    'cajolerName': 'com.google.caja',
    'cajolerVersion': '4319',
    'cajoledDate': 1288626955128
  });
}
The reason this new JavaScript block is so much more extensive than the code we started with is that the cajoled code applies error checks and security layers on top of our original code. Our original functionality is still recognizable in the preceding example, in the goRedirect$_caller function that sets href and assigns it to the window location, now with secured access to our redirection code.
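One of these layers is visible in the try/catch wrapper of the cajoled output, which routes uncaught exceptions to an onerror handler along with the source file and line. The following toy sketch mimics that pattern in plain JavaScript. It is not Caja's implementation; runGuarded, collectError, and their arguments are hypothetical names used only for illustration.

```javascript
// Toy illustration only: run a block of code and, if it throws, report the
// failure to an error handler together with the source location.
function runGuarded(body, onerror, sourceFile, sourceLine) {
  try {
    return body();
  } catch (ex) {
    if (typeof onerror === 'function') {
      var message = (ex && ex.message) ? ex.message : String(ex);
      onerror(message, sourceFile, sourceLine);
    }
    return undefined;
  }
}

// Collect reported errors so they can be inspected afterward.
var reported = [];
function collectError(message, file, line) {
  reported.push(file + ':' + line + ' ' + message);
}

runGuarded(function () {
  throw new Error('redirect failed');
}, collectError, 'ch9_caja_sample_html.html', '13');

console.log(reported[0]); // "ch9_caja_sample_html.html:13 redirect failed"
```

The real cajoled code goes further than this, mediating reads, writes, and function calls through the $v helpers seen in the output.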