CHAPTER 9

Intel® Mash Maker

Rob Ennals
Intel Research

ABSTRACT

Intel® Mash Maker is a mashup creation tool that was initially developed at Intel Research and is now being developed by Intel’s Software Solutions Group.

Mash Maker allows a user to customize and improve Web pages by applying mashups that add new content. Such “overlay mashups” add content that is visually distinguished from the host Web page, but is integrated into the normal page layout.

Mash Maker allows a user to apply a mashup to a Web page without having to trust the mashup or even know what the mashup does. Mash Maker suggests user-created mashups that a user might want to apply to the page that they are currently browsing. A user can try out one of these mashups to see if they like it, knowing that it cannot do anything harmful beyond reading the information on the current page.

Mash Maker uses a three-level structure to create mashups. Information is extracted from Web sites using wrappers that are written collaboratively by users in a wiki-like model. Widgets written in JavaScript query this information and add new information and visualizations to the page. A user can then arrange several widgets on a page to create a mashup, which they can publish and share with other users.

INTRODUCTION

Intel Mash Maker (Ennals & Gay, 2007; Ennals et al., 2007) is a browser extension for Firefox or Internet Explorer. Mash Maker allows a user to customize existing Web pages by applying mashups that add new content. Such “overlay mashups” add content that is visually distinguished from the host Web page while being integrated into the normal page layout.

Mash Maker encourages users to take a “try it and see” approach to finding mashups that they like. As a user browses the Web, Mash Maker suggests mashups that it believes the user will find useful. If a mashup looks interesting then the user can turn it on, see if the added content looks useful, and turn it off again if they do not like it.

Mash Maker uses a three-level architecture (Figure 9.1) to create mashups: wrappers extract structured information from the HTML of a Web page; widgets query information that has been extracted from a page and publish new information and visualizations; and mashups specify what widgets should be added to a page, what their settings should be, and how their output visualizations should be integrated into the layout of a page.

FIGURE 9.1

Wrappers, widgets, and mashups.

Most users interact with Mash Maker in a passive way, browsing to the sites that they normally use and occasionally turning on a suggested mashup. More skilled users may occasionally create a new mashup or edit a wrapper. Only expert users are expected to write their own widgets.

In this chapter, we discuss several of the key concepts behind Mash Maker:

1. Overlay mashups: Mash Maker allows a user to customize existing Web pages by applying mashups that add new content.

2. Mashup suggestions: Mash Maker tries to suggest mashups that a user will find useful.

3. Collaborative creation of Web wrappers: Mash Maker uses a wiki model in which each page type has a single canonical wrapper that can be edited by anyone.

4. The shared data tree: Mash Maker widgets communicate by reading and writing a shared data tree.

5. Untrusted widgets: Mash Maker restricts what a widget can do, so that widgets do not need to be trusted.

6. Copy and paste: Mash Maker allows a user to combine Web sites using a simple “copy and paste” metaphor.

Intel Mash Maker was originally created at Intel Research and is now being developed and maintained by the Intel Software Solutions Group. You can download Intel Mash Maker from the following URL: http://mashmaker.intel.com.

EXAMPLE: NEWS ON A MAP

We will begin with an example of what it is like to browse the Web using Mash Maker.

Alice has the Mash Maker browser extension installed in her Web browser and opens the front page of CNN news. Mash Maker looks through its database of wrappers and finds that a wrapper called “CNN Home Page” knows how to extract information from this page. Mash Maker also finds that a mashup called “News Map” can be applied to pages that match “CNN Home Page”. Because other users who used this mashup said they liked it, Mash Maker suggests the mashup to Alice using a toggle button on its toolbar.

Alice isn’t sure what the “News Map” mashup does, or whether she will like it, but since it looks like it might be interesting, she turns on the mashup to see what it does.

When Alice turns on the “News Map” mashup, Mash Maker enhances the CNN home page by inserting a map into the existing page layout (Figure 9.2). This map shows the locations of all the stories on the CNN front page. Alice can browse around the map to see what is going on in different parts of the world, and can click on one of the pins on the map to see more information about a particular story.

FIGURE 9.2

Showing CNN news stories on a map.

After a while, Alice decides that she doesn’t want to enhance the CNN home page in this way, and so she turns the mashup off by clicking on its toggle button again.

This example made use of the three layers of the Mash Maker architecture:

• A wrapper called “CNN Home Page” extracted the names and URLs of the stories on the front page and made this information available for use by widgets. The wrapper also specified good places where additional visualizations could be inserted into the page layout.

• A mashup called “News Map” was linked to the “CNN Home Page” wrapper. This mashup contains configured instances of the “Linked Data” and “Google Maps” widgets, and specifies a good place to insert the Google Maps map into the page layout.

• Two widgets called “Linked Data” and “Google Maps” made enhancements to the page. “Linked Data” followed the links for each of the stories,¹ applied a wrapper to each story, and published the information extracted from each story page for other widgets to see. “Google Maps” looked for published information that had addresses, and showed it all on a map.

EXAMPLE: LEGROOM FOR FLIGHT LISTINGS

Bob wants to book a flight and is viewing a list of potential flights on Expedia. Mash Maker sees that this page can be handled by the “Expedia Flights” wrapper, and that the “Legroom” mashup is associated with this wrapper. Because Bob has rated the Legroom mashup highly in the past, Mash Maker suggests it to Bob.

When Bob turns on the “Legroom” mashup, Mash Maker enhances the Expedia search results by adding legroom information to each of the flight entries on the page (Figure 9.3). The legroom information is the range of legroom amounts that the airline offers for economy class seats. Bob decides that he would like legroom information to be present on Expedia searches by default, so he clicks the “pin” icon on the Mash Maker toolbar. The next time Bob does an Expedia search, the “Legroom” mashup will be applied as soon as the page has loaded.

As with the previous example, this example uses all three layers of the Mash Maker architecture.

• A wrapper called “Expedia Flights” extracts the airline for each flight, and specifies a location where additional information should be inserted.

FIGURE 9.3

Adding legroom to Expedia.

• A mashup called “Legroom” enhances the page by adding configured instances of the Paste, Correct, and Join widgets.

• The Paste widget loads a remote page that contains legroom information from every airline, applies an extractor to it, and publishes the resulting information for use by other widgets. The Correct widget allows users to contribute alternative canonical names for airlines, to cope with the fact that the pasted page doesn’t use the same airline names as the Expedia page. For each flight, the Correct widget publishes a corrected airline name. The Join widget joins the corrected airline name with the airline in the pasted legroom table and then publishes a legroom field for each flight.

• The mashup specifies that the legroom field published by the Join widget should be made visible to the user, and specifies where it should be placed in the page layout.
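To make the Correct and Join steps concrete, here is a minimal Python sketch under stated assumptions: the flights and the pasted legroom table are plain lists of dicts, the user-contributed corrections are a simple name mapping, and all airline names and legroom figures are invented for illustration. The functions `correct` and `join` are hypothetical stand-ins, not Mash Maker's widget API.

```python
# Hypothetical data: airline names and legroom figures are invented.
flights = [
    {"flight": "F1", "airline": "United Airlines"},
    {"flight": "F2", "airline": "Delta Air Lines"},
    {"flight": "F3", "airline": "Acme Air"},
]

# Rows a Paste-style widget might publish from a remote legroom page.
legroom_table = [
    {"airline": "United", "legroom": "31-32 in"},
    {"airline": "Delta", "legroom": "30-32 in"},
]

# User-contributed corrections: name used on the page -> name used in the table.
corrections = {"United Airlines": "United", "Delta Air Lines": "Delta"}


def correct(items, corrections):
    """Publish a corrected airline name on each item; unknown names pass through."""
    for item in items:
        item["corrected_airline"] = corrections.get(item["airline"], item["airline"])
    return items


def join(items, table, item_key, row_key, field):
    """Copy `field` from the matching table row onto each item, if any."""
    index = {row[row_key]: row for row in table}
    for item in items:
        row = index.get(item[item_key])
        if row is not None:
            item[field] = row[field]
    return items


join(correct(flights, corrections), legroom_table, "corrected_airline", "airline", "legroom")
for f in flights:
    print(f["flight"], f.get("legroom", "(no match)"))
```

Flights whose airline has no matching row are left without a legroom field rather than being given a guessed value.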

OVERLAY MASHUPS

Mash Maker allows a user to enhance the Web sites that they browse by applying mashups that add new content. In addition to the two examples given earlier, there are many other possibilities. For example, one might add price comparison information to a shopping site to show what the same products cost on other sites, add a button to every phone number that calls the number when clicked, or annotate craigslist housing ads to show the ratings of the nearest restaurants.

There has been a lot of prior work on building tools that modify existing Web sites. OreO (Brooks et al., 1995) and WBI (Barrett, Maglio, & Kellem, 1997; Maglio & Barrett, 2000) use a proxy to modify Web pages without requiring support from the Web browser. Other tools such as Greasemonkey and Chickenfoot (Chapter 3) run as extensions to the Web browser that modify Web pages on the client. Mash Maker was initially implemented as a Web proxy (Ennals & Gay, 2007) and was then re-implemented as a browser extension (Ennals et al., 2007).

Unlike previous work, Mash Maker only allows a mashup to add new content to an existing page. A mashup cannot edit or remove content. In addition, any content added to a page is surrounded by a blue border to visually distinguish it from the page’s original content. We refer to this restricted class of mashups as overlay mashups.
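One way to picture the overlay restriction is as an interface that offers only insertion. The Python sketch below is a hypothetical model, not Mash Maker's implementation: the page's original nodes are read-only strings, each addition is wrapped in a blue-bordered container, and additions are tagged with the mashup that made them, so that a mashup can be turned off without reloading the page.

```python
class OverlayPage:
    """Hypothetical page model: original nodes are read-only, mashups may only add."""

    BORDER = "border: 2px solid blue"  # visual marker for added content

    def __init__(self, original_nodes):
        self._original = tuple(original_nodes)  # never modified after construction
        self._overlays = []                     # (anchor index, mashup id, html)

    def insert_after(self, anchor, mashup_id, content):
        """Add content after an original node; there is no edit or remove method."""
        if not 0 <= anchor < len(self._original):
            raise IndexError("anchor must be an original node")
        wrapped = f'<div style="{self.BORDER}">{content}</div>'
        self._overlays.append((anchor, mashup_id, wrapped))

    def turn_off(self, mashup_id):
        """Drop everything one mashup added; the original page is untouched."""
        self._overlays = [o for o in self._overlays if o[1] != mashup_id]

    def render(self):
        out = []
        for i, node in enumerate(self._original):
            out.append(node)
            out.extend(html for anchor, _, html in self._overlays if anchor == i)
        return out


page = OverlayPage(["<h1>Top stories</h1>", "<ul><li>Storm hits coast</li></ul>"])
page.insert_after(0, "news-map", "<iframe src='map.html'></iframe>")
print(len(page.render()))  # 3
page.turn_off("news-map")
print(len(page.render()))  # 2
```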

By restricting itself to overlay mashups, Mash Maker excludes some mashups that can be created in tools such as Chickenfoot. For example, a Chickenfoot mashup can change the text on a Web site, make text boxes resizable, add keyboard shortcuts to a Web site, or provide automatic login for Web sites.

There are three reasons why Mash Maker restricts mashups to be overlay mashups:

To make it clear what a mashup does: Unlike previous work, Mash Maker assumes that a user will apply a mashup to a Web site without knowing in advance what it is that the mashup does, or whether they should trust it. Because Mash Maker visually distinguishes the content it added, a user can quickly see what a mashup has done and thus evaluate whether it is useful.

To prevent mashups misbehaving: Since Mash Maker assumes that a user will be applying an untrusted mashup to a Web site without knowing what it does, it is important that Mash Maker restrict the extent to which mashups can do things that are unexpected or harmful. Restricting ourselves to overlay mashups makes this easier.

To reduce legal concerns: If a mashup can modify a page in an arbitrary way, what happens if a mashup modifies a page in a way that the content owner does not approve of? One of the reasons why Mash Maker restricts itself to overlay mashups is to reduce the likelihood that a content owner sees a mashup as an unlawful derived work. Since all additional content is visually distinguished from the page, and the mashup does not modify existing page content, one could argue that Mash Maker is a browser that overlays additional content about a page, rather than a tool that modifies someone else’s content.

More generally, if mashups are to become widespread then it is important that they be structured in a way that is beneficial or at least acceptable to content owners. This is the primary reason why Mash Maker cannot, at present, remove content from a Web page. If content could be removed then users could upset content owners by removing advertisements.

MASHUP SUGGESTIONS

As the user browses the Web, Mash Maker suggests mashups that it believes the user might like. Suggested mashups appear as toggle buttons on Mash Maker’s toolbar (Figures 9.1 and 9.4). A user can turn a particular mashup on or off by clicking on the button for that mashup. Mash Maker assumes that a user does not know what existing mashups will be useful for them, but that they will recognize useful mashups when they see them applied to a page.

FIGURE 9.4

The Mash Maker toolbar.

Mash Maker builds on a lot of prior work that recommends content based on user browsing behavior (Teevan, Dumais, & Horvitz, 2005; Seo & Zhang, 2000; Rucker & Polanco, 1997; Kelly & Teevan, 2003; Gasparetti & Micarelli, 2007). The key feature that distinguishes Mash Maker from previous systems is that it recommends mashups rather than Web pages. Mash Maker bases its suggestions on the wrapper that matches the page currently being viewed, the user’s recent browsing history, the mashups the user seemed to like previously, and the behavior of other users.
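As an illustration only, the signals listed above could be combined into a per-mashup score. Every weight, field name, and the smoothing formula in this Python sketch are assumptions; the chapter does not describe how Mash Maker actually weighs its signals. Because such a function runs on the client, recently browsed sites can be used without being sent to a server. The mashup name "Headline Cloud" and the site "maps.google.com" are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mashup:
    name: str
    wrapper: Optional[str]  # wrapper it is linked to; None means "any page"
    services: set           # sites whose services the mashup uses
    others_on: int = 0      # how often other users turned it on
    others_off: int = 0     # how often other users turned it off again


def score(m, current_wrapper, recent_sites, liked):
    """Score one mashup, or return None if it does not apply to this page.

    The weights and smoothing are illustrative assumptions only.
    """
    if m.wrapper is not None and m.wrapper != current_wrapper:
        return None
    s = 2.0 if m.name in liked else 0.0             # the user liked it before
    s += 1.0 if m.services & recent_sites else 0.0  # uses a recently browsed site
    # Other users' behavior, smoothed so that new mashups are not ignored.
    s += (m.others_on + 1) / (m.others_on + m.others_off + 2)
    return s


def suggest(mashups, current_wrapper, recent_sites, liked, k=3):
    """Return up to k mashup names, best first."""
    scored = []
    for m in mashups:
        s = score(m, current_wrapper, recent_sites, liked)
        if s is not None:
            scored.append((s, m.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


ms = [
    Mashup("News Map", "CNN Home Page", {"maps.google.com"}, others_on=40, others_off=10),
    Mashup("Headline Cloud", "CNN Home Page", set(), others_on=5, others_off=15),
    Mashup("Legroom", "Expedia Flights", set(), others_on=30, others_off=3),
]
print(suggest(ms, "CNN Home Page", recent_sites={"maps.google.com"}, liked=set()))
# ['News Map', 'Headline Cloud']
```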

If a user decides that they like a mashup, they can pin it by clicking on the pin button on the toolbar (Figure 9.4). Once a mashup is pinned, it will be enabled by default whenever the user visits a Web page that matches the current wrapper.

It is possible for a user to apply multiple mashups to the same page simultaneously. When a user has several mashups turned on, all mashups will add their content to the page. In some cases, one mashup can use information added by another mashup. For example, if a user turns on a mashup that annotates apartment listings with nearby restaurants, and then turns on a mashup that displays everything with an address on a map, then the restaurants will also be shown on the map. If the new combination of mashups is itself an interesting mashup, then the user can save the new mashup by clicking on the share button on the toolbar (Figure 9.4).

Mash Maker’s suggestion system is largely orthogonal to the rest of the system. One could conceivably use Mash Maker’s suggestion toolbar to automatically suggest mashups created with a tool such as Chickenfoot (Chapter 3). The advantage of combining Mash Maker’s suggestion system with Mash Maker’s restricted overlay mashups is that a user can easily see what a mashup has done, it is easy to remove a mashup without reloading the page if the user decides they do not like it, and it is harder for a mashup to do something undesirable.

If the suggestion toolbar does not suggest an interesting mashup, a user can also use Mash Maker’s more conventional Web-based gallery to find mashups. The gallery allows one to search for mashups by keyword and can show either mashups associated with a particular wrapper, or mashups for arbitrary pages. Clicking on a mashup in the Web gallery takes the user to an example Web page with that mashup enabled.

There is a trade-off between privacy and suggestion accuracy. If Mash Maker knows more about user behavior then it can potentially be better at suggesting mashups, but this would be at the cost of sacrificing user privacy. In the current implementation, the client tells the server when a user turns a mashup on or off, or pins it – but it does not report browsing history, or any exact URLs. The Mash Maker client will sometimes take browsing history into account when suggesting a mashup, and will prefer mashups that use a service from a site the user recently browsed, but browsing history is not reported to the central server.

COLLABORATIVE CREATION OF WEB WRAPPERS

To apply a mashup to a Web page, Mash Maker needs to extract machine-readable data that it can use as input to the mashup. For example, if a mashup wants to add a map to an apartment listing site showing the location of each apartment, then Mash Maker needs to extract the apartment addresses from the Web site. Although standards such as microformats and RDFa exist to allow a Web site to expose machine-readable data directly in its HTML, at the time of writing most Web sites do not do this. Mash Maker thus extracts machine-readable information from raw HTML using user-created wrappers.

A wrapper (Laender et al., 2002) is a set of rules that can be used to extract machine-readable data from a Web site. Though some authors have had some success extracting data from a Web site without any human assistance (Crescenzi & Mecca, 2004; Simon & Lausen, 2005; Chang & Lui, 2001; Arasu & Garcia-Molina, 2003), most systems, including Mash Maker, require some form of user guidance to teach the wrapper-generator what a Web page means (Kushmerick, Weld, & Doorenbos, 1997; Ashish & Knoblock, 1997; Muslea, Minton, & Knoblock, 1999; Zheng et al., 2008; Irmak & Suel, 2006; Baumgartner, Flesca, & Gottlob, 2001b).

Mash Maker’s Web wrapper system has two unusual features. Firstly, Mash Maker organizes its wrappers in a wiki-style model in which there is a single canonical Web wrapper for any given Web page layout, and any user can edit any wrapper. Secondly, wrappers created by Mash Maker include drop zones that indicate where additional content should be inserted into the page layout.

In Mash Maker’s wiki-style wrapper model, any page is handled by one canonical wrapper, and any user can edit any wrapper. If a user believes that the canonical wrapper is not extracting the information they need then they must modify that wrapper, rather than creating a new one. CoScripter (Chapter 5) independently developed a similar wiki model for sharing scripts.

Mash Maker’s wiki-style model differs from the more conventional model in which multiple users create their own competing wrappers for different Web sites and writers of mashups pick the wrappers that work well for them. One advantage of the wiki model is that it removes the need for a user to look through a list of potentially broken wrappers in order to find the one that works best with their Web page. For example, if a user wants to create a new mashup that visualizes the data on a particular Web page using a calendar, then they can just drag a calendar widget onto the page, without having to choose what wrapper to use. Choosing a wrapper can be a confusing process: “I just want to apply the calendar to this page. Why do I have to choose something from this list?”

Another advantage of the wiki model is that it is easier to recover when a change to the structure of a Web page breaks a wrapper. In a study conducted by Dontcheva et al. (2007a), 74% of tested Web sites underwent, within the 5-month test period, a significant structural change of the kind that would be likely to break a wrapper. If a broken wrapper was owned by a particular person, then every mashup creator would have to either wait for the wrapper owner to fix their wrapper or move their mashups to a new wrapper, adding the old wrapper to the set of dead wrappers that mashup authors need to avoid. With the wiki approach, the first user to notice that the wrapper has broken can open Mash Maker’s wrapper editor and fix the problem for all mashups. Although techniques exist to repair a broken wrapper automatically (Raposo et al., 2005; Meng, Hu, & Li, 2003; Chidlovskii, 2001), these techniques are not yet bulletproof.

The big disadvantage of the wiki model is that it opens Mash Maker up to potential vandalism (Denning et al., 2005). If anyone can edit a wrapper, then anyone can break a wrapper. A common problem during the early days of Mash Maker deployment was that a novice user would decide to experiment with the wrapper editing tools and break something high profile, such as Google search, without realizing that their changes were affecting all users. Mash Maker’s mechanisms for dealing with this problem are the same as those taken by text wikis such as Wikipedia (Riehle, 2006) – a history is recorded so that bad edits can be rolled back, and sensitive wrappers can be locked. In the future, one could potentially also use wrapper verification (Kushmerick, 2000) to prevent users from saving wrappers that had obviously been broken.
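The rollback-and-lock mechanism can be sketched as a small revision store. This Python class is hypothetical: wrapper rules are plain dicts, the selector strings are placeholders rather than any real site's markup, and the names `WrapperWiki`, `save`, `rollback`, and `lock` are invented for illustration. A rollback is recorded as a new revision, so the history still shows the bad edit.

```python
class WrapperWiki:
    """Hypothetical wiki-style store: one canonical wrapper per name,
    a full revision history, rollback, and locking."""

    def __init__(self):
        self._history = {}   # wrapper name -> list of (author, rules)
        self._locked = set()

    def save(self, name, author, rules, is_admin=False):
        if name in self._locked and not is_admin:
            raise PermissionError(f"wrapper {name!r} is locked")
        self._history.setdefault(name, []).append((author, dict(rules)))

    def current(self, name):
        return self._history[name][-1][1]

    def history(self, name):
        return [author for author, _ in self._history[name]]

    def rollback(self, name, to_revision, by="rollback"):
        # The old rules become a new revision; the bad edit stays in the log.
        _, rules = self._history[name][to_revision]
        self._history[name].append((by, dict(rules)))

    def lock(self, name):
        self._locked.add(name)


wiki = WrapperWiki()
wiki.save("Google Search", "alice", {"result": "div.result"})
wiki.save("Google Search", "novice", {})  # an experiment that breaks the wrapper
wiki.rollback("Google Search", to_revision=0)
wiki.lock("Google Search")
print(wiki.current("Google Search"))  # {'result': 'div.result'}
print(wiki.history("Google Search"))  # ['alice', 'novice', 'rollback']
```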

FIGURE 9.5

Teaching Mash Maker what something means.

Mash Maker’s wiki model for Web wrappers is largely orthogonal to its other features. Mash Maker could have used the more conventional “roll your own” model, in which case a user would be required to choose a suitable wrapper when creating a mashup. Similarly, Mash Maker’s wrapper database can be useful for other tools; indeed Yahoo! SearchMonkey uses Mash Maker as one of its methods for extracting data from Web pages (Goer, 2008). Mash Maker’s wiki model is also largely independent of Mash Maker’s particular choice of wrapper editor. One could potentially adapt other wrapper editors to support Mash Maker’s wiki model by adding support for revision tracking and other collaborative features.

Like Chickenfoot (Chapter 3) and Greasemonkey, Mash Maker uses URL regular expressions to determine what pages a wrapper is applicable to. Users will not usually edit this regular expression themselves. Instead they provide example URLs that the wrapper should apply to, and Mash Maker automatically computes a regular expression that matches the known examples. If several wrappers match a URL, then Mash Maker chooses the one with the highest priority – where the priority is a user-editable number. More sophisticated techniques exist for determining what wrapper to apply to a page. Zheng et al. (2007) automatically cluster pages based on the similarity of their page layouts. Dontcheva et al. (2006) choose the wrapper with the highest number of matching rules. Although regular expressions can be difficult for end users, they have the advantage of being cheap and predictable.
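The following Python sketch shows one plausible way to turn example URLs into a regular expression and then pick the highest-priority matching wrapper. The generalization rule (keep tokens that every example shares, replace varying tokens with a wildcard) and the wrapper names "Yelp Business" and "Yelp Generic" are assumptions; the chapter does not specify Mash Maker's algorithm.

```python
import re

_SEPARATORS = re.compile(r"([/?&=])")  # capturing group keeps separators in the split


def infer_pattern(example_urls):
    """Generalize example URLs into one regular expression (hypothetical rule).

    Tokens shared by every example are kept literally; tokens that vary
    become a wildcard that cannot cross a separator.
    """
    if not example_urls:
        raise ValueError("need at least one example URL")
    token_lists = [_SEPARATORS.split(url) for url in example_urls]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("examples have different shapes")
    parts = []
    for i, tokens in enumerate(zip(*token_lists)):
        if len(set(tokens)) == 1:
            parts.append(re.escape(tokens[0]))
        elif i % 2 == 1:  # odd positions hold separators
            raise ValueError("examples use different separators")
        else:
            parts.append(r"[^/?&=]*")
    return "^" + "".join(parts) + "$"


def choose_wrapper(url, wrappers):
    """Return the highest-priority wrapper whose pattern matches, or None."""
    matching = [w for w in wrappers if re.match(w["pattern"], url)]
    if not matching:
        return None
    return max(matching, key=lambda w: w["priority"])["name"]


biz = infer_pattern([
    "https://www.yelp.com/biz/joes-pizza",
    "https://www.yelp.com/biz/blue-cafe",
])
wrappers = [
    {"name": "Yelp Business", "pattern": biz, "priority": 10},
    {"name": "Yelp Generic", "pattern": r"^https://www\.yelp\.com/", "priority": 1},
]
print(choose_wrapper("https://www.yelp.com/biz/new-place", wrappers))  # Yelp Business
```

Keeping the generated pattern strict (anchored, no wildcard across separators) trades coverage for predictability, which matches the chapter's stated reason for using regular expressions.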

FIGURE 9.6

The wrapper editor (the UI calls a wrapper an extractor).

EXAMPLE: EDITING A WRAPPER FOR YELP

Carl is looking at business reviews on Yelp and has an idea for a useful mashup he could apply to the page. Carl opens the expert panel (Figure 9.6) and sees that the current wrapper for this page does not extract the address of a business.

Carl follows the process in Figure 9.5 to teach Mash Maker how to extract the address of a business. He clicks “pick from page” to tell Mash Maker that there is something on the page that he would like to extract. He then clicks on the address of one of the businesses on the current page. Mash Maker asks Carl whether what he clicked on is a property of the whole page, or a property of a repeated item on the page. Carl says that the address is a property of a repeated item, and that the repeated item is a “business review.”

A previous user had already shown Mash Maker how to find business reviews on the page, and the address that Carl clicked on was inside one of the business reviews that the wrapper already knew about. Had this not been the case, Mash Maker would have asked Carl to show it the area of the page occupied by the business review that the address was part of.

From this one example, Mash Maker attempts to infer a rule that will find an address for every business review on the page. If Mash Maker infers this rule incorrectly, then Carl can help Mash Maker out by either giving more examples or manually editing the underlying rules that Mash Maker has generated (revealed by turning on the “expert controls” option).
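A simple way to generalize from a single click is to record the path from the example item to the clicked node and replay that path inside every other item. The Python sketch below assumes pages are small trees of dicts and identifies each step by tag name and position among same-tag siblings; the helper `el` and the business data are hypothetical. It illustrates the idea only and is not Mash Maker's actual inference, which may use richer rules.

```python
def el(tag, text="", *children):
    """Build a tiny DOM-like node (illustrative stand-in for real HTML)."""
    return {"tag": tag, "text": text, "children": list(children)}


def path_from(root, target):
    """Steps (tag, index among same-tag siblings) from root down to target, or None."""
    if root is target:
        return []
    counts = {}
    for child in root["children"]:
        n = counts.get(child["tag"], 0)
        counts[child["tag"]] = n + 1
        rest = path_from(child, target)
        if rest is not None:
            return [(child["tag"], n)] + rest
    return None


def follow(root, path):
    """Replay a path inside another item; None if the item lacks that structure."""
    node = root
    for tag, n in path:
        same_tag = [c for c in node["children"] if c["tag"] == tag]
        if n >= len(same_tag):
            return None
        node = same_tag[n]
    return node


def infer_property(items, example_item, clicked):
    """Learn a relative path from one click and apply it to every item."""
    path = path_from(example_item, clicked)
    if path is None:
        raise ValueError("the clicked node is not inside the example item")
    values = []
    for item in items:
        node = follow(item, path)
        values.append(node["text"] if node is not None else None)
    return path, values


reviews = [
    el("div", "", el("h3", "Joe's Pizza"), el("p", "12 Main St"), el("p", "4 stars")),
    el("div", "", el("h3", "Blue Cafe"), el("p", "99 Oak Ave"), el("p", "5 stars")),
]
clicked = reviews[0]["children"][1]  # the user clicks the first address
path, addresses = infer_property(reviews, reviews[0], clicked)
print(path)       # [('p', 0)]
print(addresses)  # ['12 Main St', '99 Oak Ave']
```

If the inferred path picks the wrong node on some items, giving a second example or editing the path by hand corresponds to the two recovery options described above.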

THE WRAPPER EDITOR

A Mash Maker wrapper extracts three kinds of information from a page:

• A top level property is something that appears once on a page. It has a name and a textual value.

• An item has a type and a set of item properties. A page may contain several items with the same type, and may contain items with multiple types. All items with the same type are found using the same matching rules. In our example we had items of type “business review.”

• An item property has a name and textual value and is contained inside an item.
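These three kinds of information map naturally onto a small data model. The Python dataclasses below are a hypothetical rendering (the class and field names are illustrative, not Mash Maker's), populated with invented Yelp-style values.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_type: str                                  # e.g. "business review"
    properties: dict = field(default_factory=dict)  # item property -> text


@dataclass
class Extraction:
    top_level: dict = field(default_factory=dict)   # properties that appear once
    items: list = field(default_factory=list)       # repeated items of any type

    def items_of_type(self, item_type):
        """All items of one type (found by the same matching rules)."""
        return [item for item in self.items if item.item_type == item_type]


page = Extraction(
    top_level={"search term": "pizza"},
    items=[
        Item("business review", {"name": "Joe's Pizza", "address": "12 Main St"}),
        Item("business review", {"name": "Blue Cafe", "address": "99 Oak Ave"}),
        Item("advertisement", {"title": "Sponsored listing"}),
    ],
)
print([i.properties["address"] for i in page.items_of_type("business review")])
# ['12 Main St', '99 Oak Ave']
```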

Mash Maker’s wrapper editor works in a similar way to other user-guided wrapper editors, such as Pictor (Zheng et al., 2008), Lixto (Baumgartner, Flesca, & Gottlob, 2001b), WIEN (Kushmerick, Weld, & Doorenbos, 1997), STALKER (Muslea, Minton, & Knoblock, 1999), Dapper.net, and Irmak/Suel (Irmak & Suel, 2006). A user gives examples of things on the page that they think are interesting, and Mash Maker attempts to infer extraction rules from these examples. If Mash Maker’s inference does the wrong thing, then a user can either provide more examples or manually edit the inferred rules.

In addition to specifying where information can be found on a page, Mash Maker wrappers also specify the drop zones on a Web page where extra content should be added. A drop zone is a place on a Web page where it is good to insert additional content without disturbing the layout of the Web page. Since the best drop zones on a Web page are largely independent of the particular content being inserted, it makes sense that drop zones be factored into the wrapper, rather than requiring each mashup to specify its own layout from scratch. A mashup is free to ignore the drop zones in the wrapper and place content in other locations if the author thinks that is appropriate.

Mash Maker uses wrappers to extract data from a Web page even if a Web site provides programmatic APIs that can be used to query its data. This is for several reasons. Firstly, if an API was used, then it would still be necessary to use a wrapper to determine where the data was on the page (as done by d.mix (Hartmann et al., 2007)). Secondly, since the Web page is already loaded, it is more efficient to get the information from the HTML rather than accessing an external API. A mashup may however use APIs to bring in additional information from other sources that should be added to the current page.

Mash Maker wrappers do not follow “next” links to extract information from other pages on the same Web site. The purpose of a Mash Maker wrapper is only to extract information from the current page. If a mashup wants to obtain large amounts of information from a Web site then it can use a widget that loads that information using an API.

It is useful if wrappers agree on a common vocabulary to describe their data. For example, if one wrapper says “price” whereas another says “cost”, then it becomes harder to write widgets that can work across multiple Web sites. Mash Maker encourages wrapper authors to choose type and property names that conform to a common ontology, which all users can edit using a collaborative ontology editor. This editor is significantly more primitive than systems like Protege (Tudorache et al., 2008) or Freebase.com: it allows users to specify type names, associated property names, and simple subtype relationships, but lacks higher-level features. Unlike Freebase’s ontology editor, which lets only a type’s owner edit the properties associated with that type, Mash Maker allows anyone to edit them. The motivation is to encourage people to reuse existing types rather than creating new ones.
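A minimal Python model of such an ontology, assuming single inheritance and plain string names, shows why subtypes help: a widget that asks for the properties of a type also sees those inherited from its supertypes. The class, method, and type names here are hypothetical.

```python
class Ontology:
    """Hypothetical shared ontology: types, property names, and subtype links."""

    def __init__(self):
        self.declared = {}  # type name -> property names declared on that type
        self.parent = {}    # type name -> supertype (single inheritance assumed)

    def add_type(self, name, props=(), supertype=None):
        self.declared.setdefault(name, set()).update(props)
        if supertype is not None:
            self.parent[name] = supertype

    def add_property(self, type_name, prop):
        # Any user may extend an existing type rather than creating a new one.
        self.declared.setdefault(type_name, set()).add(prop)

    def _lineage(self, type_name):
        """Yield the type and its supertypes, guarding against cycles."""
        seen = set()
        t = type_name
        while t is not None and t not in seen:
            seen.add(t)
            yield t
            t = self.parent.get(t)

    def all_properties(self, type_name):
        props = set()
        for t in self._lineage(type_name):
            props |= self.declared.get(t, set())
        return props

    def is_a(self, type_name, ancestor):
        return ancestor in self._lineage(type_name)


onto = Ontology()
onto.add_type("place", {"name", "address"})
onto.add_type("restaurant", {"cuisine"}, supertype="place")
onto.add_property("restaurant", "rating")
print(sorted(onto.all_properties("restaurant")))  # ['address', 'cuisine', 'name', 'rating']
print(onto.is_a("restaurant", "place"))           # True
```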

Mash Maker does not currently use any kind of data detector (e.g., Miro (Chapter 4)) to detect objects on a page. Mash Maker does however take advantage of microformats when they are available.

THE SHARED DATA TREE

As mentioned in the introduction, Mash Maker uses a three-level architecture for creating mashups (Figure 9.1). Wrappers extract data from Web pages, widgets create additional data and visualizations, and mashups connect multiple widgets together and place their content in drop zones in the page layout. These three layers are largely independent: A wrapper may support many mashups, a widget may accept data from many wrappers, a mashup may use many widgets, and a widget may be used by many mashups.

Several other mashup tools allow one to create a mashup by composing multiple components. Good examples include Yahoo! Pipes, Microsoft Popfly, Clip Connect Clone (Chapter 8), and Marmite (Wong & Hong, 2006), all of which use some form of dataflow model. Pipes and Popfly adopt a visual dataflow programming model (Ingalls, 1988; Koelma, van Balen, & Smeulders, 1992; Raeder, 1985) in which wires are drawn between widgets to allow data to flow between them. Clip Connect Clone allows one to specify dataflow using a spreadsheet metaphor, and Marmite behaves like Apple’s Automator by allowing one to create a mashup as a sequence of stages, each of which acts on the output of the previous stage. Mash Maker instead uses a tuple space–inspired (Gelernter, 1989) publish/subscribe model (Eugster et al., 2003) in which widgets communicate by reading and writing a shared data tree. The difference is similar to the difference between the Blackboard and Pipeline models in software engineering.

Mash Maker maintains a data tree for every Web page currently open in the browser, showing a structured view of the data on the Web page. Initially the data tree contains the information extracted from the page by the wrapper. Any widgets on the page can query the data on the page, add additional information to the data tree, and modify or remove information. An expert user can view the data tree for the current page using the data tree tab in the expert side panel (Figure 9.7).

FIGURE 9.7

A data tree showing additions from widgets.

The data tree is the only means by which widgets can communicate with each other. The Mash Maker API allows a widget to ask to be notified when the result of a query changes due to actions by another widget. For example, the map widget asks to be notified when the set of objects with addresses changes. This allows the map widget to dynamically update its map when other widgets add or remove objects with addresses.
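The query-and-notify pattern can be sketched in a few lines of Python. This is a hypothetical model, not the Mash Maker API: nodes are dicts, a query is a predicate, and, as a simplification, a subscriber is called only when the set of matching nodes changes. The example mimics a map widget that learns about addresses added by other widgets; the story titles and addresses are invented.

```python
class DataTree:
    """Hypothetical shared data tree with query subscriptions.

    Nodes are dicts keyed by id; a query is a predicate over a node. A
    subscriber is called when the set of matching node ids changes.
    """

    def __init__(self, nodes=()):
        self._nodes = {}
        self._next_id = 0
        self._subs = []  # [predicate, callback, last matching ids]
        for node in nodes:
            self.add(node)

    def query(self, predicate):
        return {nid: n for nid, n in self._nodes.items() if predicate(n)}

    def subscribe(self, predicate, callback):
        result = self.query(predicate)
        self._subs.append([predicate, callback, frozenset(result)])
        callback(result)  # deliver the initial answer

    def add(self, node):
        nid = self._next_id
        self._next_id += 1
        self._nodes[nid] = dict(node)
        self._notify()
        return nid

    def update(self, nid, **props):
        self._nodes[nid].update(props)
        self._notify()

    def remove(self, nid):
        del self._nodes[nid]
        self._notify()

    def _notify(self):
        for sub in self._subs:
            predicate, callback, old_ids = sub
            result = self.query(predicate)
            if frozenset(result) != old_ids:
                sub[2] = frozenset(result)
                callback(result)


# The wrapper's extraction seeds the tree; widgets then read and write it.
tree = DataTree([{"type": "story", "title": "Storm hits coast"},
                 {"type": "story", "title": "Summit opens"}])
seen = []  # what a map-like widget has been told to display
tree.subscribe(lambda n: "address" in n,
               lambda result: seen.append(sorted(n["title"] for n in result.values())))
tree.update(0, address="Miami, FL")  # e.g. added by a Linked Data-like widget
tree.add({"type": "restaurant", "title": "Blue Cafe", "address": "99 Oak Ave"})
print(seen)  # [[], ['Storm hits coast'], ['Blue Cafe', 'Storm hits coast']]
```

The subscriber never learns which widget added a node, which is the property the next paragraph relies on.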

The mental model is that adding a widget to a page creates an improved page, which can itself be enhanced further by adding more widgets. A widget is not expected to distinguish between information that was originally on the page and information that has been added to the page by other widgets. For example, a price comparison widget does not care whether the dollar price of an item was calculated by a separate currency conversion widget.

Mash Maker could have used the same visual dataflow approach that is used by Yahoo! Pipes and Microsoft Popfly. Under this model, the wrapper would be treated as being just another box in the network whose extracted data could be fed to other boxes. Work started on Mash Maker before Pipes or Popfly were publicly known, so no deliberate decision was taken to use a different model. There are however advantages of the “shared tree” model for the domain in which Mash Maker works.

One motivation for the “shared tree” model is that it allows a user to create a mashup by adding the features that they think that they want, without having to think about how they should fit into a logical structure. The widgets find each other by looking for information that they want that other widgets are providing. This works well for simple cases (e.g., find home country and visualize on a map), but for more complex mashups one may have to use a widget’s settings panel (Figure 9.8) to tell it which other widget it should be talking to.

FIGURE 9.8

Steps to creating a mashup that shows Facebook friends.

Another motivation for the “shared tree” model is that it makes it easier for the physical location of a widget on the page to correspond to where its primary visualization will be inserted. If a visual dataflow model is used, then some layouts of boxes will make the wires hard to read. Since Mash Maker’s boxes are not connected by wires, this problem does not arise.

In addition to manipulating the data tree, a widget can also publish content that can be inserted into the layout of the page. Content can currently be either text, a clickable action icon, or an iframe that can contain arbitrary Web content. Figure 9.2 shows a map visualization running in an iframe that has been published by the Map widget. Figure 9.3 shows text annotations that have been associated with objects on a page.

A widget has no control over how its content will be integrated into the layout of the page, because the only view it has of the page is the data tree. It is entirely up to the user creating the mashup to insert content into the page layout appropriately. They can do this by either dropping content into drop zones described by the wrapper, or placing content at a physical location relative to the node that it is associated with. This separation allows widgets to focus on the high-level data processing task they are concerned with, without having to worry about how they might integrate into any particular layout.
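The division of labor described above can be sketched as follows. In this hypothetical Python model, a widget is a function that sees only the data tree and returns named outputs, while the mashup definition decides which drop zone each output goes into. The `Count` widget, the drop zone names, and the dictionary layout of the mashup are all invented for illustration, and placement relative to a node is not modeled.

```python
def count_widget(tree, settings):
    """Hypothetical widget: it sees only the data tree, never the page layout."""
    n = sum(1 for node in tree if node.get("type") == settings["type"])
    return {"summary": f"{n} {settings['type']} items on this page"}


def apply_mashup(mashup, tree, drop_zones, widgets):
    """Run each configured widget, then place its outputs into drop zones.

    Placement comes from the mashup definition, not from the widget.
    """
    layout = {zone: [] for zone in drop_zones}
    for instance in mashup["widgets"]:
        outputs = widgets[instance["widget"]](tree, instance["settings"])
        for output_name, zone in instance.get("placement", {}).items():
            if zone not in layout:
                raise KeyError(f"the wrapper defines no drop zone {zone!r}")
            layout[zone].append(outputs[output_name])
    return layout


mashup = {
    "wrapper": "Yelp Search",
    "widgets": [{"widget": "Count",
                 "settings": {"type": "business review"},
                 "placement": {"summary": "above-results"}}],
}
tree = [{"type": "business review", "name": "Joe's Pizza"},
        {"type": "business review", "name": "Blue Cafe"}]
result = apply_mashup(mashup, tree, ["above-results", "sidebar"], {"Count": count_widget})
print(result)  # {'above-results': ['2 business review items on this page'], 'sidebar': []}
```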

Each widget can have a settings panel that can be used to configure its behavior. For some settings the default choice will usually be correct (e.g., a map widget should map everything on the page with an address), but for other settings a user is likely to want to set things manually in order to get good results (e.g., which property of an object should be used to decide its icon on the map). The options provided by a widget’s settings panel are entirely up to the widget; indeed, the settings panel can contain arbitrary HTML.
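A settings panel can be thought of as a small record of options with sensible defaults. The sketch below models a map widget's panel as a Python dataclass, assuming a default of mapping every object that has an `address` property and leaving the icon property unset until the user picks one. In Mash Maker the panel itself is arbitrary HTML supplied by the widget; `MapSettings`, `markers`, and the property names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MapSettings:
    """Hypothetical settings-panel state for a map widget."""
    # Usually correct as-is: map every object that has an address.
    location_property: str = "address"
    # Usually set by hand: which property chooses each marker's icon.
    icon_property: str | None = None


def markers(objects: list[dict], settings: MapSettings) -> list[tuple[str, str]]:
    """Return (address, icon) pairs for the objects the map should show."""
    result = []
    for obj in objects:
        if settings.location_property not in obj:
            continue  # nothing to map for this object
        icon = "default"
        if settings.icon_property is not None and settings.icon_property in obj:
            icon = str(obj[settings.icon_property])
        result.append((obj[settings.location_property], icon))
    return result


hotels = [
    {"name": "Harbor Inn", "address": "1 Quay St", "stars": 4},
    {"name": "No Address Lodge"},
]
print(markers(hotels, MapSettings()))                       # [('1 Quay St', 'default')]
print(markers(hotels, MapSettings(icon_property="stars")))  # [('1 Quay St', '4')]
```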

Once a user has created a mashup that they think is useful, they can publish it by clicking on the “share” button. A user will then be prompted to enter a short description of their mashup and will be shown the preview screenshot that Mash Maker will save with the mashup (Figure 9.8). Once a mashup has been published, Mash Maker can suggest it to other users.

Figure 9.8 shows the process of creating a mashup that adds a map to the Facebook friends list. The map shows the location of each of the user’s friends, based on the “home town” information they provided in their profile. In this example the user opens the expert sidebar, double-clicks on two widgets to add them to the page, adjusts their settings appropriately, and then drops the map into an appropriate drop zone on the page. Once the user has created a mashup that they like, they can click the “share” button to make it available to other users looking at pages that are interpreted by the same wrapper.

UNTRUSTED WIDGETS

Mash Maker has to be particularly careful about security, because mashups have access to private data and users are encouraged to run untrusted mashups without investigating them beforehand.

Since Mash Maker runs inside the browser, it has access to all the information that the browser shows to the user. This includes Web pages that require logins, Web pages on intranets, and content that is generated dynamically inside the browser. The advantage of this approach is that it allows Mash Maker to mash up useful content that would not be easily accessible to a mashup running on a separate server (e.g., Yahoo! Pipes or Microsoft Popfly). The disadvantage is that some of this information may be private and should not be leaked to third parties.

Without proper precautions, Mash Maker could easily let attackers steal confidential data. For example, a mashup could scan email messages for passwords and use an external API to send them to a site run by an attacker. In the absence of security controls, any data that is visible on a Web site could be scraped by a wrapper and sent to a malicious Web site by a widget.

Other browser extension mashup tools like Greasemonkey and Chickenfoot (Chapter 3) suffer from this problem to a lesser extent. Unlike Mash Maker, Greasemonkey and Chickenfoot do not suggest mashups to users automatically. Instead, their model is more similar to installing desktop software; a user browses a list of recommended mashups hosted on a trusted Web site, picks one that seems useful and trustworthy, and installs it. By contrast, Mash Maker encourages users to turn on unvetted mashups written by unknown third parties, with little information available about them other than their name.

Mash Maker addresses this problem by distinguishing between trusted and untrusted widgets. A trusted widget is one that has been checked by the Mash Maker administrators to make sure it cannot leak data to an untrusted server. The choice of which widgets are trusted is subjective. For example, the Google Maps widget is considered trusted, even though it sends addresses to Google. If the addresses were highly confidential and Google were considered untrusted, then one might not want this widget to be applied to a page.

If a widget is not trusted, then it is not permitted to see any data that was fetched with cookies or HTTP authentication enabled. This restriction also prevents an untrusted widget from being applied to information that another widget fetched with cookies. For example, if a calendar widget inserted information from your personal calendar, then that information could not be viewed by an untrusted widget. The intention is to restrict an untrusted widget to content that it could see if it were running on another machine, and to prevent it from seeing content that was personalized for the current user. In addition, no mashups can be applied to a page served over HTTPS.
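The rules above amount to a simple provenance check on data in the shared tree. The Python sketch below marks a node as personal when it was fetched with cookies or HTTP authentication, filters personal nodes out of an untrusted widget's view, and refuses mashups on HTTPS URLs. The chapter states the rule only for fetched data; carrying the flag over to values derived from personal data (the `sources` list) is an assumption added here, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Node:
    """One value in the shared data tree, with its provenance flag."""
    value: str
    # True if fetched with cookies/HTTP auth, or (assumption) derived from such data.
    personal: bool = False
    sources: list[Node] = field(default_factory=list)


def add_node(value: str, sources: list[Node] | None = None,
             fetched_with_credentials: bool = False) -> Node:
    """Create a node; it is personal if it, or anything it came from, is personal."""
    sources = sources or []
    personal = fetched_with_credentials or any(s.personal for s in sources)
    return Node(value, personal, sources)


def visible_to(nodes: list[Node], trusted: bool) -> list[Node]:
    """Trusted widgets see everything; untrusted widgets see only non-personal data."""
    return list(nodes) if trusted else [n for n in nodes if not n.personal]


def mashups_allowed(url: str) -> bool:
    """No mashup at all is applied to a page served over HTTPS."""
    return urlparse(url).scheme.lower() != "https"


flight = add_node("BA123 London to Boston")                        # public page content
meeting = add_node("Dentist, 3pm", fetched_with_credentials=True)  # personal calendar
clash = add_node("Clashes with an appointment", sources=[flight, meeting])

print([n.value for n in visible_to([flight, meeting, clash], trusted=False)])
# ['BA123 London to Boston']
print(mashups_allowed("https://mail.example.com/inbox"))  # False
```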

One loophole in the “no cookies” security model is that an untrusted widget can still see content that is private to a local intranet. The correct solution would be to treat any page fetched from a corporate intranet as a “secure” page that cannot be seen by untrusted widgets; however, it is difficult to determine which pages are on the intranet rather than on the outside Web. In particular, simply checking whether a page can be fetched from a remote server is not sufficient, because some intranets serve private information on pages that have the same URL as an externally accessible, nonprivate page. Mash Maker does not currently have a solution for this problem, so it is not recommended for use on corporate intranet Web pages.

A Mash Maker widget is implemented rather like a Google gadget: it is a piece of JavaScript code that runs inside its own iframe, embedded on the page. The browser’s same-origin policy prevents a widget from directly manipulating the page that is being mashed up. This is in contrast to Greasemonkey, which injects mashup scripts directly into the page.

Mash Maker provides extensions to the standard browser JavaScript API to allow widget code to query the shared data tree and publish information and visualizations. The implementation is similar to MashupOS (Howell et al., 2007) and SMash (Keukelaere et al., 2008), although it was created independently.
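Because a widget lives in its own frame, everything it does to the page has to go through the API that the host exposes. The sketch below models that boundary in Python: the widget can only send text messages, and a page-side `Host` honors just two operations, querying the data tree and publishing content, rejecting anything else. The message format and the `query` and `publish` operation names are hypothetical; in a browser the transport would be cross-frame messaging such as `window.postMessage`.

```python
from __future__ import annotations

import json


class Host:
    """Page-side broker for one widget running in an isolated frame.

    The widget cannot touch the page directly; it can only send text messages,
    and the host carries out a small, fixed set of operations on its behalf.
    """

    ALLOWED_OPS = ("query", "publish")

    def __init__(self, tree: dict[str, list[str]]) -> None:
        self.tree = tree                 # property name -> values from the wrapper
        self.published: list[str] = []  # content the user may later place on the page

    def handle(self, raw: str) -> str:
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            return json.dumps({"error": "malformed message"})
        op = msg.get("op") if isinstance(msg, dict) else None
        if op not in self.ALLOWED_OPS:
            return json.dumps({"error": "unsupported operation"})
        if op == "query":
            return json.dumps({"values": self.tree.get(str(msg.get("property")), [])})
        self.published.append(str(msg.get("content", "")))
        return json.dumps({"ok": True})


host = Host({"address": ["1 Quay St", "9 Pier Rd"]})
print(host.handle('{"op": "query", "property": "address"}'))
# {"values": ["1 Quay St", "9 Pier Rd"]}
print(host.handle('{"op": "publish", "content": "map iframe"}'))  # {"ok": true}
print(host.handle('{"op": "set_html", "html": "<b>hi</b>"}'))
# {"error": "unsupported operation"}
```

Keeping the allowed operations as a short fixed list means a new capability has to be added to the host deliberately; a widget cannot reach anything the host does not expose.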

COPY AND PASTE

As a special case, Mash Maker allows a user to create a mashup by using a copy-and-paste metaphor to combine Web sites. To create a mashup that inserts content from site A into site B, the user browses to site A, clicks the “copy” button on the toolbar (Figure 9.4), and then browses to site B and clicks “paste”. For example, to add legroom information to a flight listing, one can browse to a Web site that gives legroom information for different airlines, click “copy”, and then browse to a list of flights and click “paste”.

Mash Maker will try to guess how to combine the two sites and create an appropriate mashup. In the current version of Mash Maker, the support for copy and paste is fairly simple: Mash Maker looks at the information extracted from the two Web sites and tries to find a matching property that can be used for a simple join. The resulting mashup is implemented by adding an instance of the “paste” widget to the page. If the result of the copy and paste is not as desired, then the user can tweak it using the settings panel on the Join widget. Since the release of Mash Maker, this concept has been improved on by Karma (Tuchinda, Szekely, & Knoblock, 2008a), which uses more intelligent techniques to guess how Web sites should be combined.
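The chapter says only that Mash Maker looks for a matching property that can be used for a simple join. The Python sketch below assumes one plausible heuristic: among the properties both sites share, pick the one whose values overlap the most (compared case-insensitively), then left-join the copied rows onto the target page. The function names and the legroom and flight records are invented for illustration.

```python
from __future__ import annotations


def _norm(value: object) -> str:
    """Compare values case- and whitespace-insensitively."""
    return str(value).strip().lower()


def best_join_key(source: list[dict], target: list[dict]) -> str | None:
    """Pick the property present on both sites whose values overlap the most."""
    src_props = set().union(*(row.keys() for row in source))
    tgt_props = set().union(*(row.keys() for row in target))
    best, best_overlap = None, 0
    for prop in sorted(src_props & tgt_props):
        src_vals = {_norm(r[prop]) for r in source if prop in r}
        tgt_vals = {_norm(r[prop]) for r in target if prop in r}
        overlap = len(src_vals & tgt_vals)
        if overlap > best_overlap:
            best, best_overlap = prop, overlap
    return best


def paste(source: list[dict], target: list[dict]) -> list[dict]:
    """Left-join the copied rows (source) onto the page being pasted into (target)."""
    key = best_join_key(source, target)
    if key is None:
        return [dict(row) for row in target]  # no usable match: page unchanged
    index = {_norm(r[key]): r for r in source if key in r}
    joined = []
    for row in target:
        match = index.get(_norm(row[key]), {}) if key in row else {}
        joined.append({**match, **row})  # the page's own values win on conflicts
    return joined


legroom = [  # copied from a legroom site (invented data)
    {"airline": "Acme Air", "legroom": "31 in"},
    {"airline": "Zed Jet", "legroom": "34 in"},
]
flights = [  # the flight listing being pasted into (invented data)
    {"flight": "AA1", "airline": "Acme Air", "price": "$200"},
    {"flight": "ZJ9", "airline": "zed jet", "price": "$180"},
    {"flight": "XY3", "airline": "Other", "price": "$150"},
]
print(best_join_key(legroom, flights))                      # airline
print([r.get("legroom") for r in paste(legroom, flights)])  # ['31 in', '34 in', None]
```

A left join keeps flights with no legroom entry, so a flight on an unknown airline is still listed, just without the added information.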

The core idea of using copy and paste to create mashups was inspired by previous work on Web clipping tools (Lingam & Elbaum, 2008; schraefel & Zhu, 2002). While Mash Maker uses Web wrappers to extract data from a copied Web site, d.mix (Hartmann et al., 2007) takes a more elegant approach by determining an API that could be used to obtain information from the source site.

Dontcheva et al. (2007) take a more general approach in which the pages that should be joined together are found using a Web search. In contrast, Mash Maker’s copy and paste system combines a single pair of pages chosen by the user.

SUMMARY

Mash Maker combines several novel ideas into a complete system. An overlay mashup extends a Web page by adding new content that is visually distinguished from the page but integrated into its layout. Suggestions allow users to find useful mashups as they browse. The collaborative wiki model for wrapper creation allows a mashup to keep working when the wrapper it depends on breaks, because any user can repair that wrapper. The shared data tree allows a user to create a mashup by adding a set of widgets to a page and letting them work out how to help each other. The untrusted widget model allows a user to apply a mashup to a page without having to trust it. Finally, the copy and paste model allows a user to combine Web sites by simply choosing which Web sites have information that they like.

Mash Maker was originally developed as a technology demonstration at Intel Research and is now being developed and maintained by a group in the Intel Software Solutions Group. At the time of writing, the preview version of Mash Maker has around 18,000 registered users, of whom 822 have created or edited a wrapper, and 387 have created a mashup. The Mash Maker database has 1190 mashups.

INTEL MASH MAKER

Intended users:

All users

Domain:

Web pages with structured content

Description:

Intel® Mash Maker is a browser extension that allows a user to customize Web pages by applying mashups that add additional content. Such “overlay mashups” add content that is visually distinguished from the host page while being integrated into the normal page layout.

Example:

When booking a flight, a user may wish to know how much legroom each flight is likely to provide. When the user browses a flight listing, Mash Maker will suggest a mashup that adds legroom information to each flight on the page. If the user decides that they like this mashup, then they can “pin” it so that it is always enabled for pages that list flights.

Automation:

Mash Maker automates the task of collecting and compositing information from different sources.

Mashups:

Yes, Mash Maker can combine information from multiple Web sites within the browser.

Scripting:

No, though expert users can write new “widgets” in JavaScript.

Natural language:

No.

Recordability:

No.

Inferencing:

Simple heuristics are used to infer what mashups a user might like, and to infer how widgets should be connected together to create a mashup.

Sharing:

Yes, mashups can be published to a Web site. Published mashups can be suggested to other users and users may vote for mashups that they like.

Comparison to other systems:

Like Greasemonkey and Chickenfoot, Mash Maker is a browser extension that modifies existing Web pages. Mash Maker mashups are more restricted, however, since they can only add content to a page. This restriction prevents most malicious mashups and encourages users to try the untrusted mashups that Mash Maker suggests.

Platform:

Implemented as an extension to the Mozilla Firefox and Microsoft Internet Explorer browsers.

Availability:

Publicly available at http://mashmaker.intel.com.

1 To improve performance, Linked Data caches all of its fetched data on a shared server. The map will appear immediately, provided that the cache already contains the data for the linked story pages.
