Multi-Robot Planning and Coordination

Motivation

For many applications, a team of robots can be effectively used. A robot team can accomplish a given task more quickly than a single agent can by dividing the task into sub-tasks and executing them concurrently. A team can also make effective use of specialists designed for a single purpose (e.g., scouting an area, picking up objects, hauling payload), rather than requiring that a single robot be a generalist, capable of performing all tasks but expert at no tasks.

The Problem

Centralized Approaches

The difficulty arises in coordinating all of these robots to perform a single, global task. One approach is to consider the robot team to be a single robot “system” with many degrees of freedom. A central computer coordinates the group optimally to perform the specified task. The problem is that optimal coordination is computationally difficult—the best known algorithms are exponential in complexity. Thus, the approach is intractable for teams larger than a few robots, even if a powerful computer is available. Additionally, the approach assumes that all information about the robots and their environment can be transmitted to a single location for processing and that this information does not change during the time that an optimal plan is constructed. These assumptions are unrealistic for problems in which the environment is unknown and/or changing, communication is limited, and robots behave in unpredictable ways. Another weakness with this approach is that it produces a highly vulnerable system. That is, if the leader (the central planning unit) malfunctions, a new leader must be available or the entire team is disabled.

Distributed Approaches

Distributed approaches address the problems that arise with centralized, globally coordinated approaches. The idea is that each robot operates largely independently, acting on information that is locally available through its sensors. A robot may coordinate with other robots in its vicinity, perhaps to divide a problem into multiple sub-problems or to work together on a sub-task that cannot be accomplished by a single robot. The advantage of this approach is that it typically requires little computation, since each robot need only plan and execute its own activities. Also, little communication is required, since the robots only communicate with others in their vicinity. The robots are better able to respond to unknown or changing environments, since they sense and respond to the environment locally. Moreover, the system is more robust since the entire team’s performance no longer depends on the guidance of a single leader. The approach works best for problems that can be decomposed into largely unrelated sub-problems, or problems for which a desired group behavior results from the aggregate of individual behaviors and interactions. The primary weakness is that the robots can execute the global task in a grossly suboptimal way.

Generalized Robotic Autonomous Mobile Mission Planning System (GRAMMPS)

My early work in multi-robots focused on addressing the problem of uncertainty and change in the robots' environment. Barry Brumitt and I developed the GRAMMPS system for planning and executing mobile missions using a team of heterogeneous robots [Brummit, Stentz, 96] [Brummit, Stentz, 98] . GRAMMPS is a two-layer architecture. The top layer runs on a single robot, designated the leader, and the bottom layer runs on all of the robots. The top layer accepts a mission description statement that specifies a set of goals to be visited by the robot team, with some constraints, such as ordering constraints or limitations on which robots can visit which goals. The mission statements are required to satisfy the following grammar:

m -> M(r,g) | (m and m) | (m or m) | (m -> m)

r -> Ri | (r and r) | (r or r)

r -> Gj | (g and g) | (g or g)

where lower case m, r, and g symbols are variables for move commands, robots, and goals respectively, and the upper case letters are specific instances of the same. An example mission statement is:

M(R1 or R2 or R3, G1 and G2 and G3 and G4) -> M(R1 and R2 and R3, S)

This mission specification states that at least one of the team of three robots (R1, R2, and R3) should visit all four goals (G1, G2, G3, and G4), followed by a return of all three team members to the start state S. This mission implements the distributed travelling salesman problem (MTSP).

The top layer of GRAMMPS parses the mission statement and assigns goals to robots, including the ordering. Since the grammar permits NP-complete problems to be specified, large problems are solved approximately and small problems optimally. The leader then dispatches the assignments to the robots in the team (i.e., lower layer of architecture). The robots plan routes to the sequence of goals and begin executing them. As the robots drive through the uncertain and changing environments, they discover new obstacles and re-plan the routes, using the D* algorithm [Stentz, 94] [Stentz, 95] . If the cost of a new route is significantly different than the old route, the robot reports the cost change to the leader (top layer). The leader then re-plans the entire mission, possibly cancelling old goal assignments and making new assignments for one or more robots.

GRAMMPS was implemented and tested on real robots. The following figures show results for a team of two HMMWVs equipped with laser rangefinders and GPS sensors, capable of driving to goal locations while avoiding obstacles. The mission called for the team to visit a total of eight goals, minimizing total travel time. The environment was initially assumed to contain no obstacles. GRAMMPS produced the initial assignments and tours for the two robots (one in blue and one in green) as shown below. Note that one robot received three goals and the other five.

As the robots drove the mission, the blue vehicle was deflected off course, due to unexpected obstacles. GRAMMPS decided to re-plan assignments and to give the blue vehicle one of green's goals since it was headed in that direction anyway.

Finally, the robot team visited all eight goals and returned to base. The resultant tours taken by the two robots are shown below.

GRAMMPS is quite good at responding to uncertainty and changing conditions. It is able to re-plan both routes and goal assignments as new information is discovered. However, GRAMMPS is a centralized system. If the leader is incapacited or is unable to communicate to its team members, the system breaks. Also, GRAMMPS is limited in its ability to solve large problems. The top layer is the bottleneck, since it must solve (at least approximately) what amounts to NP-complete problems each time substantially new information is discovered by any robot, leading to increasingly larger delays in the system.

TraderBots

Market Economies

GRAMMPS has problems scaling up to large problems and is dependent on centralized planning and heavy communications. My interests shifted to addressing these problems. I noted that an economic system, in particular a market economy, addresses these issues quite well, even for a very large number of citizens. Bernardine Dias and I developed the TraderBots architecture [Stentz, Dias, 99] [Dias, Stentz, 00] that implements a market economy for a team of robots. Market techniques have been used extensively by software agents, but TraderBots is the first application of such techniques to physical robots.

To understand TraderBots, consider an economic system for coordinating robots. An economy is nothing more than a population of agents (i.e., citizens) producing a global output. The agents coordinate with each other to produce an aggregate set of goods. Centralized economies, such as socialist/communist systems, suffer from an inability to gather all salient information, uncertainty in how to optimize with it, and unresponsiveness to changing conditions. Additionally, since economic output is divided equally amongst the entire population, individuals have little incentive to work harder or more efficiently than what is required to minimally comply with the economic plan. Individual input is de-coupled from individual output. The net effect is a sluggish, brittle, inefficient economy.

Free market economies are unencumbered by centralized planning; instead, individuals are free to exchange goods and services and enter into contracts as they see fit. Despite the fact that individuals in the economy act only to advance their own self-interests, the aggregate effect is a highly productive society. Individuals are in the best position to understand their needs and the means to satisfy them. Thus, individuals reap the direct benefits of their own good decisions and suffer the direct consequences of their bad ones. At times they cooperate with other members of the society to achieve an outcome greater than that possible by each member alone. At times they compete with other members to provide goods or services at the lowest possible cost, thus eliminating waste and inefficiency. But at every turn, the individual members act solely to reap the greatest profit for themselves.

Determining Revenues and Costs

Consider a team of robots assembled to perform a particular mission. The goal of the team is to perform the mission well while minimizing costs. A function, Fo, is needed that maps possible mission outcomes onto revenue values. Another function, Fi, is needed that maps possible schemes for performing the mission onto cost values. As a team, the goal is to execute some plan P such that profit, Fo(P) - Fi(P), is maximized.

But it is not enough to define just the revenue and cost functions for the team. These functions must provide a means for distributing the revenue and assessing costs to individual robots. Preferably, these individual revenues and costs are assigned based on factors over which the individuals have direct control. For example, if the mission is to find and retrieve a scattering of objects, the team’s revenue, Fo, could be the number of objects retrieved (converted to a “cash” value), and the team’s cost, Fi, could be the amount of energy consumed by the entire team to find the objects (again, converted to a cash value). The individual revenues and costs, fo and fi, could be the cash value of the number of objects turned in and the energy expended, respectively, by that individual.

Therefore, the sum of the individual revenues and costs equals the team’s revenues and costs. However, the distribution is not even: individuals are compensated in accordance with their contribution to the overall mission, based on factors that are within the control of the individual. An individual that maximizes its own personal production and minimizes its own personal cost receives a larger share of the overall profit. Therefore, by acting strictly in their own self-interests, individuals maximize not only their own profit but also the overall profit of the team.

The various features of the TraderBots architecture are described below.

The Role of Price and the Bidding Process

Robots receive revenue and incur costs for accomplishing a specific team mission, but the team’s revenue function is not the only source of income. A robot can also receive revenue from another robot in exchange for goods or services. For example, a robot may not be equipped to find objects for which the team function provides revenue, but it can transport the objects to the goal once they have been found. Therefore, this haulage robot provides a service to the robots that find the objects, and it receives payment for performing such a service.

The price dictates the payment amount for the good or service. How is the price determined? Assume that robot A would like to purchase a service from robot B. Robot B incurs a cost Y for performing the service. Robot A can make an additional revenue of X if B performs the service for it. Therefore, if X > Y, then both parties have an incentive to execute the deal. But how should the composite profit, X - Y, be divided amongst the two parties? It may sound fair to split the winnings (X - Y) / 2 by setting the price at (X + Y) / 2. But robots A and B may have other opportunities—they may be considering other deals that contend for the same money and resources. Robot A may have another way of spending its money that nets it more than (X - Y) / 2 dollars. Likewise, robot B may have another opportunity to provide its service for a net exceeding (X - Y) / 2 dollars.

Since all of the factors that contribute to a robot’s decision about how much it can charge or pay for a given good or service may be hidden or difficult to reason about, a common approach is to bid for a good or service until a mutually acceptable price is found. For example, robot A could start by bidding a price of Y (i.e., robot A receives the entire profit). Robot B could decline and counter with a bid of X (i.e., robot B receives the entire profit). The idea is to start by bidding a price that is personally most favorable, and then successively retreat from this position until a price is mutually agreed upon.

Note that a given robot can negotiate several potential deals at the same time. It begins by bidding the most favorable price for itself for all of the deals, successively retreats from this position with counter bids, and closes the first deal that is mutually acceptable. Note also that a deal can be multi-party, requiring that all parties agree before any part of the deal is binding.

The negotiated price will tend toward the intersection of the supply and demand curves for a given service. If a service is in high demand or short supply, the price will be high. This information will prompt other suppliers to enter the fray, driving the price down. Likewise, if demand is low or supply high, the low price will drive suppliers into another line of business. Thus, price serves to optimize the matching of supply to demand.

Finally, it is important to note that price and bidding are low bandwidth mechanisms for communicating aggregate information about costs. When consumers decide between purchasing apple juice or orange juice for breakfast, they do not analyze land acreage dedicated to both crops, the costs of producing each, the demand for each, and the impact of weather and pest infestations. Instead, they merely look at the price of each and weigh them against their own personal preferences. Yet the price encodes all of these factors in a concise fashion that enables them to make a locally optimal decision based on low-bandwidth information available at the point of sale.

Cooperation vs. Competition

As described above, robots interact with each other to exchange goods and services. Two robots are cooperative if they have complementary roles, that is, if both robots can make more profit by working together than by working individually. Generally, robot teams foster cooperation between members of different types (heterogeneous). For instance, a robot able to grasp and lift objects and a robot able to transport objects could team together to provide a pick-and-place service that neither one could offer independently.

Conversely, two robots are competitive if they have the same role, that is, if the amount of profit that one can make is negatively affected by the presence of the other robot. Generally, robot teams foster competition amongst members of the same type (homogeneous). For instance, two robots that are able to transport objects compete for the services of a given grasping robot, thus driving the price down. Either one could charge more money if the other were not present.

These delineations are not strict however. Subgroups of heterogeneous robots could form that provide a given service. These subgroups would compete with each other, thus providing an example where robots of different types compete rather than cooperate with each other. Heterogeneous robots could also compete if the same task can be accomplished in different ways. Conversely, two robots of the same type may cooperate by agreeing to segment the market. This can make sense if the relative positions of the two robots make for a cost-effective division of labor (i.e. if the robots cooperate they could both make more profit than if they compete). In this sense, even homogeneous robots can be considered heterogeneous since they differ in location or situation. Homogeneous robots can also cooperate if accomplishing a specific task requires more than one robot. For example, several robots with grasping capability may need to cooperate in order to move a heavy object, or two robots with cameras may need to cooperate to provide stereo-mapping data. The flexibility of the market-model allows the robots to cooperate and compete as necessary to accomplish a task, regardless of the homogeneity or heterogeneity of the team.

Self Organization

Conspicuously absent from the free market system is a rigid, top-down hierarchy. That is not to say that the system is in anarchy; instead, the robots organize themselves in a way that is mutually beneficial. Since the aggregate profit amassed by the individuals is directly tied to the success of the mission, this self-organization yields the best results.

Consider a group of ten robots. An eleventh robot, A, offers its services as their leader. It does not become their leader by coercion or decree, but by convincing the group that they will make more money by following its advice than by acting individually or in subgroups. A does this by investigating “plans” for utilizing all ten robots. If A comes up with a truly good plan, it will maximize profit across the whole group. The prospective leader can use this large profit to bid for the services of the group members, and of course, retain a portion of the profit for itself. The leader may be bidding not only against the individuals’ plans, but also against group plans produced by other prospective leaders. This self-organization need not be limited to two levels. A leader could organize a group of subgroups, each of which has its own leader.

From this example, it is evident that leadership roles are generally reserved for the “deep thinkers”, the robots with the computational capacity to reason about potentially complicated group interactions. If the team has no deep thinkers, the mission may still be achievable but in a less optimal fashion. The deep thinkers offer the opportunity to milk every last bit of efficiency out of the team. But there is a limit to this organization. As the group becomes larger, the combinatorics become intractable and the process of gathering all of the relevant information to produce a good plan becomes increasingly difficult. A leader will realize this when it can no longer convince its subjects (via bidding for their services) to follow its plans.

Learning and Adaptation

The robot economy is able to learn new behaviors and strategies as it executes its mission. This learning applies to both individual behaviors and negotiations as well as to the entire team. Individual robots may learn that certain strategies are not profitable, or that certain robots are apt to break a contract by failing to deliver the goods or proper payment. Individuals may also learn successful bidding strategies or which deals to offer when. The robot team may learn that certain types of robots are in over-supply, indicated by widespread bankruptcy or an inability to make much money. Conversely, the robot team may learn that certain types of robots are in under-supply, evidenced by excessive profits captured by members of the type. Thus, the population can learn to exit members of one type and enter members of another. Moreover, in this approach, successful agents are able to accumulate wealth and perpetuate their winning strategies because of their ability to offer higher payments to other agents.

One of the greatest strengths of the market economy is its ability to deal successfully with changing conditions. Since the economy does not rely on a hierarchical structure for coordination and task assignment, the system is highly robust to changes in the environment, including malfunctioning robots. Disabling any single robot should not jeopardize the system’s performance. By adding escape clauses for “broken deals”, any tasks undertaken by a robot that malfunctions can be re-bid to other robots, and the entire mission can thus be accomplished. Furthermore, if a robot is partially disabled in some manner (e.g., a robot is immobilized due to losing a wheel), it can still be useful to the economy by seeking a new role that makes use of its remaining resources (e.g., computation or communication resources). Thus, the market model allows the robots to deal with the uncertainties of a dynamic environment in an opportunistic and adaptive manner.

Subcontracting Example

Consider the following example. We would like a team of two robots to pick up two objects, A and B, and return to their home base. We would like for the robots to minimize their total travel cost when fetching the objects. The problem is depicted in the figure below. We define the costs to be one dollar for each foot of travel. We open by requesting bids from the two robots to perform the two tasks. For the task to retrieve object A, Robot 1 incures a cost of $50 to drive to A and $50 to drive back. Robot 1 bids $120 for the task, covering its cost plus $20 profit. Robot 2 incures a cost of $110 to drive to A and $110 to drive back. Robot 2 bids $240 for the task, covering its cost plus $20 profit. Similarly, Robot 1 bids $220 and Robot 2 bids $180 for Task B. Based on lowest bids, we award Task A to Robot 1 and Task B to Robot 2. Note that Robot 1 makes a profit of $20, and Robot 2 makes $30.

Note that although this solution is not globally optimal, it is reasonable. In fact, it may be the best solution the team can come up with, assuming they are limited in their computational abilities, or are pressed for time.

After the initial assignments are made, assume the robots get some more time to think. Robot 2 notices that Robot 1 is already committed to going to A. Since A is near B, it may be cost effective for Robot 1 to also retrieve object B. If Robot 2 subcontracts Task B to Robot 1, it will save $150, namely its travel distance. Therefore, Robot 2 would be willing to subcontract Task B if it ends up paying less than $150 to have Robot 1 retrieve B. Likewise, Robot 1 notices that by detouring to pick up object B, the cost of its tour becomes $50 + $60 + $100 = $210. This is $210 - $100 = $110 more cost than it would incur if it just retrieved A. Therefore, Robot 1 would be willing to fetch B if it receives more than $110 for the subcontract. See the figure below.

Therein are the makings of a deal. Robots 1 and 2 negotiate and arrive at a subcontract price of $130. Note that both of the robots increase their profits through the subcontract, therefore they are incentivized to make the deal. But more importantly, we are happier because the team task is solved for less cost: 210 feet driven instead of 250 feet.

Mapping Environments with Known Structure

TraderBots was initially implemented and tested in simulation for a mapping application [Dias, Stentz, 00] . A team of robots was tasked with visiting a set of locations inside of a building with a known floor plan to collect sensor data, while minimizing total distance travelled. An OpExec, representing the user, started the process by soliciting bids from the robots for each of the locations. Using a greedy algorithm, the OpExec awarded the tasks, one by one, to the lowest bidder. After initial assignments were made, the robots were free to buy/sell task assignments amongst themselves.

The figure below shows the case for four robots and 20 locations. The robots started in the four corners of the environment and bid on the 20 interior locations. The initial robot tours are shown in the left figure below. Note that the tours are relatively inefficient. The figure on the right shows the final tours after the robots traded tasks amongst themselves to optimize costs, and no more profitable deals could be made.

The figure below shows the cost of the global solution as a function of number of rounds of trades. The total cost was reduced by 40% over the greedy solution by the time the market reached quiescence.

Mapping Unknown Environments

Rob Zlot joined our team, and we ported the TraderBots architecture to our robot testbeds. Our test bed consisted of ten Pioneer robots equipped with sonar rings for sensing, gyro/odometry for position estimation, cameras for cooperative stereo, and wireless comms for negotiating with each other. See the figure below.

We modified the application to have the robots map an unknown environment. Initial task assignments could not be made, since the size, layout, and scope of the environment were unknown. Instead, the robots created new tasks whenever they detected new areas to explore. These new areas were "owned" by the robot that discovered them, and the task of exploring each one was put out to bid on the market. Thus, new tasks were constantly created as the robots mapped the environment. To the best of our knowledge, the resulting mapping system was more efficient, in terms of area covered per unit distance travelled, than any other published system [Zlot, Stentz, Dias, Thayer, 02] .

The figure below shows the results of this mapping system. The top two figures show the Field Robotics Center highbay that was mapped. Note all of the clutter. The bottom figure shows an overhead view of the aggregate map built by the robot team using their sonar sensors.

Recently, we upgraded the robot test beds to use SICK laser rangefinders, rather than sonar sensors, to map. The upgraded robots are shown below.

The figure below shows an overhead view of an area mapped by four of the robots. The dark green areas are walls, objects, and people. The light green areas are sensed free space. The black areas are unknown space. Note that the map is built incrementally as the robots discover and explore new areas. The area covered was about 200 feet on a side.

Opportunistic Optimization and Clustering

Since the first implementation of TraderBots, Bernardine Dias has worked on improving aspects of the market system [Dias, Stentz, 01] . The first version of the system permitted only single-task auctions. Because of this limitation, the market would sometimes reach quiescence in a fairly shallow local minimum. By clustering tasks that can be performed together, cost effectively, a lower global minimum can be reached [Dias, Stentz, 02] .

Bernardine also pushed on the concept of the "leader" robot. A leader is a robot that produces cost efficient plans, involving several robots, and sells them on the market. One type of leader is a robot that figures out better assignments of tasks to robots. Bernardine employed a combinatorial exchange to better map tasks onto robots and lower the global cost.

Hierarchical Task Decomposition

Rob Zlot is working on accommodating more complex tasks in the market architecture. He introduced the notion of AND/OR trees for representing an abstract task at the root level along with multiple decompositions into more detailed, primitive tasks. Each tree constitutes one or more "plans" for solving the abstract task. Robots may bid on the leaf nodes of the tree to implement the plan, or if they have a better decomposition, they may bid on intermediate nodes--perhaps even the root. Rob developed a computationally efficient algorithm for clearing task trees on the market [Zlot, Stentz, 03] . He is applying it to the problem of area reconnaissance, whereby a team of robots are deployed to sense a collection of areas. Optimizing the problem is difficult, since areas can be typically observed from many different viewing locations.

Tight Coordination of Robots

Nidhi Kalra is working on the problem of tight robot coordination. To date, we have applied the TraderBots architecture to tasks where the robots need to coordinate only for the purpose of subdividing the problem. For some tasks, such as jointly picking up an object or guarding an area, the actions of the robots must be tightly coupled. Nidhi developed an architecture whereby the robot's actions are decomposed into a sequence of fine-grained moves, and each robot attempts to influence its neighbor's actions by bidding on the neighbor's moves. She is applying this scheme to the problem of sweeping an area using an expanding perimeter [Kalra, Stentz, 03] . The robots are rewarded for the amount of area they clear, and they are penalized if they break the perimeter and permit an object or person to enter the cleared area.

TraderBot Legacy

The original Traderbots architecture was developed in the DARPA- funded Cognitive Colonies project. Since then, it has been ported to new projects and applications. Traderbots is now used by the ARL-funded Robotics Collaborative Technology Alliance project to control a section of unmanned ground vehicles performing a reconnaissance mission. The new work includes handling more complex tasks and tight coordination between robots.

Traderbots is also used by the NASA-funded Federation of Intelligent Robotic Explorers (FIRE) project to coordinate a prototype colony of robot explorers on Mars. The new work includes the incorporation of scheduling techniques into the market. Traderbots forms the market layer of FIRE's three-tiered architecture.

Back to home page.