An Interconnection Method for Multiple Computers using Sorting Nets Hans P. Moravec Artificial Intelligence Lab Computer Science Dept. Stanford University Stanford, Ca. 94305 Patent Disclosure, 1978 by Hans P. Moravec This invention concerns an improved method for connecting large numbers of digital computers to each other, or to large numbers of memories. Computers are becoming cheaper, and the demands on their processing power is growing. This trend is leading to systems with increasing numbers of cooperating computers. The usefulness of such systems depends strongly on the richness of the interconnections between the machines. A crossbar provides full generality, and is economical for systems with up to about 100 computers. The crossbar construction uses a number of components proportional to the square of the number of computers interconnected, and becomes prohibitively expensive for larger systems. My invention provides the generality of a crossbar, at far smaller cost. It can economically interconnect more than one million computers. Sorting Nets The interconnection method employs sorting nets, and the number of switching elements and the delay in the scheme depends on the construction of the sorting nets. A sorting net is a circuit which can N accept numbers on N input ports and produce them in numerical order a short time later at N output ports. Kenneth E. Batcher has invented a sorting net which can sort N numbers and which can be constructed with about N (log2 N)^2 simple elements. Each number must pass through about (log2 N)^2 simple elements between the input and the output ports. There is no known reason why a sorting net with only about N log2 N elements cannot be built, but no one has discovered a construction with this few elements. My invention could use such a discovery to advantage. Overview The interconnection method delivers messages from N originating devices to N or fewer destination devices. There can be fewer than N destinations if some of the outputs are left off. The originating devices can be computers issuing arbitrary messages or memories responding to memory requests. The destination devices can be computers receiving messages from other computers or responses from memories, or memories recieving requests from computers. Simultaneously the N originating devices each send out one message. Each message is addressed to one destination device, and includes a priority number. These messages pass through the interconnection network and arrive at their destinations at the same time. The originating devices can begin sending a subsequent wave of N messages as soon as they finish the first, even though the first wave may not yet have reached its destinations. The interconnection scheme is pipelined. Each destination device receives the message with the highest priority addressed to it in each message wave. If no one sent it a message in that wave, it recieves a null (dummy) message instead. At the same instant a message wave arrives, the devices that had originated each delivered message receive an acknowledgement, the processors whose messages were blocked by higher priority ones receive notices of failure. The notice of failure is a copy of the undelivered message. The success notice is a null message. Using fast but conventional contemporary electronics, and Batcher's sorting net construction, 100 to over 1,000,000 small to medium size computers could be interconnected economically by my method. The interconnection net would cost about as much as the the computers do. Messages waves could come about every 250 nanoseconds, and each wave would arrive about a microsecond after it had been sent. Detail The scheme is described in terms of processor to processor communication for simplicity. It is understood that the destination devices are not necessarily the same as the source ones, nor need there be the same number of destinations as sources. Also the messages need not be serial, nor must the duty cycle be 100%, nor must every message wave have the same length. The interconnection scheme is diagrammed in Figs 1, 2 and 3. Each processor is assigned a number, its "address", as indicated. In the sorters and the merger the smaller numbers come out towards the top of the diagram. Messages are serial bit streams, and consist of a destination processor address, a priority number (invented by the originating computer), a one bit "dummy" flag field (set to 0 for actual messages), the address of the source processor (i.e. a return address), and the data to be communicated. A low priority number implies high priority. Zero is the highest priority. The net is assumed to run at 100% duty cycle, with the processors emitting successive synchronized waves of messages. Every processor emits a message every message interval. The following discussion examines a single message wave. The first sorting net orders the messages by destination address, and within a given destination by priority number. Thus the upper inputs of the merger receive a list of messages, grouped by destination, with the highest priority message to each processor heading its group. The lower inputs of the merger receive N dummy messages, exactly one for each destination processor. The priority field is the highest possible (i.e. zero), the dummy flag is 1, the source address is the same as the destination, and the data portion is unused. Merging these dummies with the sorted list of real messages results in a list still grouped by destination, with each group headed by a dummy, by virtue of its high priority, followed immediately by the highest priority real message, if any, for the destination. This list is fed to the exchange network, which examines adjacent pairs of messages (considering overlapping pairs), and exchanges the data portions of a pair if the first member happens to be a dummy to a given address, and the second is a real message to the same address (i.e. it is the highest priority real message to that destination). The sorting network following the exchanger sorts the messages by the field begining with the dummy flag, which acts as the high order bit, followed by the source address. Since there were N real messages, one from each processor, and N dummies, also nominally one from each processor, and since real messages are sorted ahead of dummy messages due to the high order bit, (i.e. the dummy flag) being 0, the second sorter restores the messages into the same order in which they were introduced. Each processor has two input ports, one labeled "acknowledgement", and the other "incoming message". The acknowledgement input of processor i is connected to output i of the second sorter. The incoming message input is connected to output N+i (i.e. the i'th output of the lower half). In the absence of the exchange network, the i'th processor would receive its own message back on its acknowledgement input, and the i'th dummy message on the incoming message input. Because of the exchanger, however, if the message that processor i had sent happened to be the highest priority message to the requested destination, then the data portion of the message on the acknowledgement input would be that of the dummy it had been swapped with (signaling success). Also, if any messages had been addressed to processor i, the data portion of the highest priority one would arrive on the incoming message port, in place of the dummy message. Thus a processor receives the highest priority message addressed to it on its incoming message port, or a dummy if nobody wanted to talk to it. It receives a dummy on its acknowledgement port if its message has gotten through, or the message back if it hasn't, due to the existence of a higher priority message to the same destination. Actually the serial nature of the sorter causes the destination and priority field to be lost in the source address sorter (it tails after the previous wave of messages). In the case of messages that fail to get delivered, this means that the originating processor must remember to whom it sent the message (about four message times ago in a typical design, due to the latency of the net), if it wants to try again. This is probably undesirable. Also, delivered messages contain no indication of who sent them, having had their source address field exchanged with that of a dummy, unless the source address is included in the data field. These shortcomings can be overcome if the exchanger shuffles the destination address, source address and priority fields in the manner suggested by Figs 4, 5 and 6. Such shuffling can be accomplished with an amount of storage at each exchanger position equal to the number of bits in the destination and priority fields. Reference BATCHER, K.E. Sorting Networks and their Applications, 1968 Spring Joint Computer Conf. Proceedings April 1968, 307-314. Communication scheme organization +---------+ +-------+ +---+ +----------+ 1 -+| |--+| |--+| |--+| |-+ 1 | | | | | | | | 2 -+| |--+| |--+| |--+| |-+ 2 Processor | N | | | | | | | Processor 3 -+| |--+| |--+| |--+| |-+ 3 Outgoing | Input | | | | | | | Acknowledge- | | | | | | | | ment Message . | Sorter | . | | . | | . | | . . | | . | | . | | . | | . Input Ports . | (Desti- | . | | . | | . | 2N | . | nation | | | | | | | Ports | field) | | | | E | | Input | | | | N | | x | | | N -+| |--+| |--+| c |--+| Sorter |-+ N +---------+ +-+ M | | h | | | | e | | a | | (Source | -------------- +-+ r | | n | | field) | 1 --+| g |--+| g |--+| |-+ 1 | e | | e | | | 2 --+| r |--+| r |--+| |-+ 2 | | | | | | Processor Dummy 3 --+| |--+| |--+| |-+ 3 | | | | | | Incoming Message | | | | | | . | | . | | . | | . Message Injection . | | . | | . | | . . | | . | | . | | . Ports | | | | | | | | | | | | N --+| |--+| |--+| |-+ N +-------+ +---+ +----------+ FIG 1: Block diagram of the interconnection scheme. +----------------------+-----------------+---+----------+-----------------+ | Data | Source Address | 0 | Priority | Destination Ad | +----------------------+-----------------+---+----------+-----------------+ FIG 2: Processor message format. High order bit is on the right. +----------------------+-----------------+---+----------+-----------------+ | Unused | Position Number | 1 | Zero | Position Number | +----------------------+-----------------+---+----------+-----------------+ FIG 3: Dummy message format. Before: +----------------------+-----------------+---+----------+-----------------+ | Unused | Destination Ad | 1 | Zero | Destination Ad | +----------------------+-----------------+---+----------+-----------------+ +----------------------+-----------------+---+----------+-----------------+ | Data | Source Address | 0 | Priority | Destination Ad | +----------------------+-----------------+---+----------+-----------------+ After: +----------+-----------------+----------------------+-----------------+---+ | Priority | Source Address | Data | Destination Ad | 1 | +----------+-----------------+----------------------+-----------------+---+ +----------+-----------------+----------------------+-----------------+---+ | Zero | Destination Ad | Unused | Source Address | 0 | +----------+-----------------+----------------------+-----------------+---+ FIG 4: Rearrangements effected by the exchanger in an exchanged pair. Before: +----------------------+-----------------+---+----------+-----------------+ | Data | Source Address | 0 | Priority | Destination Ad | +----------------------+-----------------+---+----------+-----------------+ After: +----------+-----------------+----------------------+-----------------+---+ | Priority | Destination Ad | Data | Source Address | 0 | +----------+-----------------+----------------------+-----------------+---+ FIG 5: Rearrangements in an unsuccessful message. Before: +----------------------+-----------------+---+----------+-----------------+ | Unused | Destination Ad | 1 | Zero | Destination Ad | +----------------------+-----------------+---+----------+-----------------+ After: +----------+-----------------+----------------------+-----------------+---+ | Zero | Destination Ad | Unused | Destination Ad | 1 | +----------+-----------------+----------------------+-----------------+---+ FIG 6: Rearrangements in an isolated dummy message.