Improving on the First Solution

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Improving on the First Solution

Naturally, we wondered what we could do to improve our newly parallel program.

What’s wrong with using parallel_for in this example, you ask? The problem is that the applications that would use ODE are dynamic: the number of objects in the scene changes, the number of islands also changes over time, and so on. It’s hard to come up with a good grainsize parameter for parallel_for. Also, the more objects you have, the more time you spend searching for islands. What if we could make the island processing overlap with other operations? Our prototype does not allow this to happen, and it becomes a major problem.

The task scheduler interface is a natural place to go when you have a dynamic ability to find more tasks. Islands make good tasks, but in the parallel_for structure they are not balanced well because they pop up as surprises.

We created yet another class, process_one_island (Example 11-60), that has a class task as a parent. We overrode the pure virtual execute method where we just called dInternalIslandStepFast for the given island. So, we just described our own logical task.

Tip

Aha! Slipping in a task when you can identify work is easy and can be very powerful.

We wanted processIslandsFast to spawn tasks. To do this we needed to create a root task. We wrote a new class for this root task, process_world, which would just call a modified processIslandsFast (processIslandsFastTask, shown in Example 11-61). The modified processIslandsFast needed to be changed to accept a new parameter—a reference to the root task, which is used to create, spawn, and wait for process_one_island tasks. The new children tasks were created and spawned as soon as the new island was found (so that we did not have to wait until all islands were found to start processing, like we did in the parallel_for case).

Example 11-60. ODE: process_one_island (task)

// Task: calls solver for the given island
class process_one_island: public tbb::task
{
    island_t &p;
    int obj_num;
public:
    process_one_island (island_t& params, int num):
                        p (params), obj_num (num) {}
    task* execute ()
    {
        dInternalStepIslandFast (obj_num, p);
        return NULL;
    }
};

Example 11-61. ODE: processIslandsFastTask

void processIslandsFastTask (tbb::task& tbb_root,
                             int counter, world_t& world);
// Root task: calls processIslandsFastTask to search for islands
// spawns process_one_island tasks for the found islands
class process_world: public tbb::task
{
    int counter;
    world_t& world;
public:
    process_world (int num, world_t& w) :
                   world (w), counter(num) {}
    task* execute ()
    {
        processIslandsFastTask (*this, counter, world);
        return NULL;
    }
};

// Modified processIslandsFast: spawns process_one_island task
// for the island new parameter "tbb_root" is added
void processIslandsFastTask (tbb::task& tbb_root,
                             int counter, world_t& world)
{
    int obj_num = counter;
    for (size_t i = 0; i < world.size(); i++) {
        busy (search_time);
        if (i % counter == 0) {
            // Spawn the solver task immediately,
            // not wait until all islands are found
            tbb::task& t =
              *new (tbb_root.allocate_additional_child_of (tbb_root))
                 process_one_island (island_t(), obj_num);
            tbb_root.spawn (t);
        }
    }
    // Join all the children tasks
    tbb_root.wait_for_all ();
}

We are seeing more parallel work now, and better resource utilization. The new version can adapt to the dynamically changing environment better than the parallel_for version.

The first attempt gave a 1.1X speedup, whereas the second effort gave us a 1.29X speedup when run with 400 simple objects on a quad-core system, all in the span of less than a day of effort.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Improving on the First Solution

Create new playlist

Sign In

Sign Up

Improving on the First Solution

Tip

Table of Contents for
Improving on the First Solution