References

 

[1] July Abraham, S., Padmanabhan, K. Performance of multicomputer networks under pin-out constraints. Journal of Parallel and Distributed Computing. 1991;vol. 12(no. 3):237–248.

[2] March Adve, V.S., Vernon, M.K. Performance analysis of mesh interconnection networks with deterministic routing. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 3):225–246.

[3] October Agarwal, A. Limits on interconnection network performance. IEEE Transactions on Parallel and Distributed Systems. 1991;vol. 2(no. 4):398–412.

[4] June Agarwal, A., et al. APRIL: A processor architecture for multiprocessing. Proceedings of the 17th International Symposium on Computer Architecture. 1990:104–114.

[5] Agerwala, T., et al. SP2 system architecture. IBM Systems Journal. 1995;vol. 34(no. 2):152–184.

[6] April Akers, S.B., Krishnamurthy, B. A group-theoretic model for symmetric interconnection networks. IEEE Transactions on Computers. 1989;vol. C-38(no. 4):555–566.

[7] April Allen, J.D., et al. Ariadne—an adaptive router for fault-tolerant multicomputers. Proceedings of the 21st International Symposium on Computer Architecture. 1994:278–288.

[8] April Anjan, K.V., Pinkston, T.M. DISHA: A deadlock recovery scheme for fully adaptive routing. Proceedings of the 9th International Parallel Processing Symposium. 1995:537–543.

[9] June Anjan, K.V., Pinkston, T.M. An efficient fully adaptive deadlock recovery scheme: DISHA. Proceedings of the 22nd International Symposium on Computer Architecture. 1995:201–210.

[10] April Anjan, K.V., Pinkston, T.M., Duato, J. Generalized theory for deadlock-free adaptive routing and its application to Disha Concurrent. Proceedings of the 10th International Parallel Processing Symposium. 1996:815–821.

[11] Technical Report Aoyama, K., Chien, A.A. The cost of adaptivity and virtual lanes in a wormhole router. Urbana-Champaign: Department of Computer Science, University of Illinois; 1994.

[12] Arvind, Nikhil, R.S. Executing a program on the MIT tagged token dataflow architecture. In: Proceedings of the PARLE Conference (published as Lecture Notes in Computer Science). Eindhoven: Springer-Verlag; 1987:1–29.

[13] August Athas, W.C., Seitz, C.L. Multicomputers: Message-passing concurrent computers. IEEE Computer. 1988;vol. 21(no. 8):9–24.

[14] Technical Report NAS-95-020 Bailey, D., et al. The NAS parallel benchmarks 2.0. 1995;December. Available from http://www.nas.nasa.gov/NAS/NPB.

[15] June Baker, W.E., et al. A flexible ServerNet-based fault tolerant architecture. Proceedings of the Fault Tolerant Computing Symposium. 1995:2–11.

[16] Bakoglu, H.B. Circuits, Interconnections, and Packaging for VLSI. Reading, MA: Addison-Wesley; 1990.

[17] April Balakrishnan, S., Panda, D.K. Impact of multiple consumption channels on wormhole routed k-ary n-cube networks. Proceedings of the 7th International Parallel Processing Symposium. 1993:163–167.

[18] December Basak, D., Panda, D.K. Scalable architectures with k-ary n-cube cluster-c organization. Proceedings of the 5th IEEE Symposium on Parallel and Distributed Processing. 1993:780–787.

[19] September Basak, D., Panda, D.K. Designing clustered multiprocessor systems under packaging and technological advancements. IEEE Transactions on Parallel and Distributed Systems. 1996;vol. 7(no. 9):962–978.

[20] November Beckmann, C.J., Polychronopoulos, C.D. Fast barrier synchronization hardware. Proceedings of Supercomputing ’90. 1990:180–189.

[21] May Bedichek, R.C. Talisman: Fast and accurate multicomputer simulation. Proceedings of ACM Joint International Conference on Measurement and Modeling of Computer Systems/SIGMETRICS’95, and Performance Evaluation Review. 1995;vol. 23(no. 1):14–24.

[22] November Beecroft, J., Homewood, M., McLaren, M. Meiko CS-2 interconnect Elan-Elite design. Parallel Computing. 1994;vol. 20(no. 10-11):1627–1638.

[23] Berge, C. Graphs and Hypergraphs. 1973.

[24] June Berman, P.E., et al. Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks. Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures. 1992:3–12.

[25] April Bhuyan, L.M., Agrawal, D.P. Generalized hypercube and hyperbus structures for a computer network. IEEE Transactions on Computers. 1984;vol. C-33(no. 4):323–333.

[26] October Blough, D., Bagherzadeh, N. Near-optimal message routing and broadcasting in faulty hypercubes. International Journal of Parallel Programming. 1990;vol. 19(no. 5):405–423.

[27] April Blough, D., Najand, S. Fault tolerant multiprocessor system routing using incomplete diagnostic information. Proceedings of the 6th International Parallel Processing Symposium. 1992:398–402.

[28] June Blough, D., Wang, H. Cooperative diagnosis and routing in fault-tolerant multiprocessor systems. Journal of Parallel and Distributed Computing. 1995;vol. 27:205–211.

[29] April Blumrich, M.A., et al. Virtual memory mapped network interface for the SHRIMP multicomputer. Proceedings of the 21st International Symposium on Computer Architecture. 1994:142–153.

[30] February Boden, N.J., et al. Myrinet: A gigabit per second local area network. IEEE Micro. 1995;vol. 15(no. 1):29–36.

[31] Technical Report UW-CSE-93-12-03 Bolding, K. Multicomputer interconnection network channel design. Seattle: Department of Computer Science and Engineering, University of Washington; 1993.

[32] September Bolding, K., et al. The Chaos router chip: Design and implementation of an adaptive router. Proceedings of the International Conference on VLSI’93. 1993. Also in IFIP Transactions A. 1994;vol. A-42:311–320.

[33] August Bolding, K., Konstantinidou, S. On the comparison of hypercube and torus networks. Proceedings of the 1992 International Conference on Parallel Processing. 1992;vol. I:62–66.

[34] Bolding, K., Snyder, L. Mesh and torus chaotic routing. Proceedings of the MIT/Brown Conference on Advanced Research in VLSI. 1992.

[35] May Bolding, K., Yost, W. Design of a router for fault tolerant networks. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:226–240.

[36] May Boppana, R.V., Chalasani, S. A comparison of adaptive wormhole routing algorithms. Proceedings of the 20th International Symposium on Computer Architecture. 1993:351–360.

[37] July Boppana, R.V., Chalasani, S. Fault-tolerant wormhole routing algorithms for mesh networks. IEEE Transactions on Computers. 1995;vol. 44(no. 7):848–864.

[38] October Boppana, R.V., Chalasani, S., Raghavendra, C.S. On multicast wormhole routing in multicomputer networks. Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing. 1994:722–729.

[39] November Borkar, S., et al. iWarp: An integrated solution to high-speed parallel computing. Proceedings of Supercomputing’88. 1988:330–339.

[40] May Borkar, S., et al. Supporting systolic and memory communication in iWarp. Proceedings of the 17th International Symposium on Computer Architecture. 1990:70–81.

[41] May Boughton, G.A. Arctic routing chip. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:310–317.

[42] June Boura, Y.M., Das, C.R. Efficient fully adaptive wormhole routing in n-dimensional meshes. Proceedings of the 14th International Conference on Distributed Computing Systems. 1994:589–596.

[43] August Boura, Y.M., Das, C.R. Fault-tolerant routing in mesh networks. Proceedings of the 1995 International Conference on Parallel Processing. 1995;vol. I:106–109.

[44] Bridges, P., et al. User’s Guide to MPICH, a Portable Implementation of MPI. Argonne: Argonne National Laboratory; 1995.

[45] Technical Report TM-ANL-92-17 Butler, R., Lusk, E. Users guide to the p4 programming system. Argonne: Argonne National Laboratory; 1992.

[46] April Butler, R., Lusk, E. Monitors, messages, and clusters: The p4 parallel programming system. Journal of Parallel and Distributed Computing. 1994;vol. 20(no. 4):547–564.

[47] August Byrd, G.T., Saraiya, N.P., Delagi, B.A. Multicast communication in multiprocessor systems. Proceedings of the 1989 International Conference on Parallel Processing. 1989;vol. I:196–200.

[48] August Carbonaro, J., Verhoorn, F. Cavallino: The teraflops router and NIC. Proceedings of Hot Interconnects Symposium IV. 1996.

[49] July Chalasani, S., Boppana, R.V. Fault-tolerant wormhole routing in tori. Proceedings of the 8th International Conference on Supercomputing. 1994:146–155.

[50] August Chalasani, S. Communication in multicomputers with nonconvex faults. Proceedings of Euro-Par’95. 1995:673–684.

[51] October Chandra, S., Larus, J.R., Rogers, A. Where is time spent in message-passing and shared-memory programs. Proceedings of the 6th International Conference on the Architectural Support for Programming Languages and Operating Systems. 1994:61–73.

[52] April Chen, M.-S., Shin, K.G. Depth-first search approach for fault-tolerant routing in hypercube multicomputers. IEEE Transactions on Parallel and Distributed Systems. 1990;vol. 1(no. 2):152–159.

[53] December Chen, M.-S., Shin, K.G. Adaptive fault-tolerant routing in hypercube multicomputers. IEEE Transactions on Computers. 1990;vol. C-39(no. 12):1406–1416.

[54] May Chiang, C.-M., Ni, L.M. Multi-address encoding for multicast. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:146–160.

[55] Chiang, C.-M., Ni, L.M. Deadlock-free multi-head wormhole routing. Proceedings of the First High Performance Computing-Asia. 1995.

[56] Chiang, C.-M., Ni, L.M. Efficient software multicast in wormhole-routed unidirectional multistage networks. Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing. 1995:106–113.

[57] August Chien, A.A. A cost and speed model for k-ary n-cube wormhole routers. Proceedings of Hot Interconnects’93. 1993.

[58] May Chien, A.A., Kim, J.H. Planar-adaptive routing: Low-cost adaptive networks for multiprocessors. Proceedings of the 19th International Symposium on Computer Architecture. 1992:268–277.

[59] July Chiu, G., Chalasani, S., Raghavendra, C.S. Flexible, fault-tolerant routing criteria for circuit switched hypercubes. Proceedings of the 11th International Conference on Distributed Computing Systems. 1991:582–589.

[60] June Choi, H.A., Esfahanian, A.H. On complexity of a message-routing strategy for multicomputer systems. Proceedings of the 16th International Workshop on Graph-Theoretic Concepts in Computer Science. 1990:170–181.

[61] April Choi, Y., Pinkston, T.M. Crossbar analysis for optimal deadlock recovery router architecture. Proceedings of the 11th International Parallel Processing Symposium. 1997:583–588.

[62] May-June Chow, E., et al. Hyperswitch network for the hypercube computer. Proceedings of the 15th International Symposium on Computer Architecture. 1988:90–99.

[63] August Cypher, R., Gravano, L. Requirements for deadlock-free, adaptive packet routing. Proceedings of the 11th ACM Symposium on Principles of Distributed Computing. 1992:25–33.

[64] August Cypher, R., Gravano, L. Adaptive, deadlock-free packet routing in torus networks with minimal storage. Proceedings of the 1992 International Conference on Parallel Processing. 1992;vol. III:204–211.

[65] December Cypher, R., Gravano, L. Storage-efficient, deadlock-free packet routing algorithms for torus networks. IEEE Transactions on Computers. 1994;vol. 43(no. 12):1376–1385.

[66] May Cypher, R., Ho, A., Konstantinidou, S., Messina, P. Architectural requirements of parallel scientific applications with explicit communication. Proceedings of the 20th International Symposium on Computer Architecture. 1993:2–13.

[67] Cypher, R., Konstantinidou, S. Bounds on the efficiency of message passing protocols for parallel computers. Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. 1993:173–181.

[68] October Cypher, R., Konstantinidou, S. Bounds on the efficiency of message passing protocols for parallel computers. SIAM Journal on Computing. 1996;vol. 25(no. 5):1082–1104.

[69] August Dai, D., Panda, D.K. Reducing cache invalidation overheads in wormhole routed DSMs using multidestination message passing. Proceedings of the 1996 International Conference on Parallel Processing. 1996:138–145.

[70] June Dally, W.J. Performance analysis of k-ary n-cube interconnection networks. IEEE Transactions on Computers. 1990;vol. C-39(no. 6):775–785.

[71] September Dally, W.J. Express cubes: Improving the performance of k-ary n-cube interconnection networks. IEEE Transactions on Computers. 1991;vol. C-40(no. 9):1016–1023.

[72] March Dally, W.J. Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems. 1992;vol. 3(no. 2):194–205.

[73] April Dally, W.J., Aoki, H. Deadlock-free adaptive routing in multicomputer networks using virtual channels. IEEE Transactions on Parallel and Distributed Systems. 1993;vol. 4(no. 4):466–475.

[74] May Dally, W.J., et al. The Reliable Router: A reliable and high-performance communication substrate for parallel computers. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:241–255.

[75] August Dally, W.J., et al. Architecture and implementation of the Reliable Router. Proceedings of Hot Interconnects Symposium II. 1994.

[76] October Dally, W.J., Seitz, C.L. The torus routing chip. Journal of Distributed Computing. 1986;vol. 1(no. 3):187–196.

[77] May Dally, W.J., Seitz, C.L. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers. 1987;vol. C-36(no. 5):547–553.

[78] October Dally, W.J., Song, P. Design of a self-timed VLSI multicomputer communication controller. Proceedings of the International Conference on Computer Design. 1987:230–234.

[79] June Dao, B.V., Duato, J., Yalamanchili, S. Configurable flow control mechanisms for fault-tolerant routing. Proceedings of the 22nd International Symposium on Computer Architecture. 1995:220–229.

[80] January Dao, B.V., Duato, J., Yalamanchili, S. Dynamically configurable message flow control for fault-tolerant routing. IEEE Transactions on Parallel and Distributed Systems. 1999;vol. 10(no. 1):7–22.

[81] February Dao, B.V., Yalamanchili, S., Duato, J. Architectural support for reducing communication overhead in multiprocessor interconnection networks. Proceedings of the Third International Symposium on High-Performance Computer Architecture. 1997:343–352.

[82] May Davis, A., et al. R2: A damped adaptive router design. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:295–309.

[83] February Dehkordi, P., Ramamurthi, K., Bouldin, D. Early cost/performance cache analysis of a split MCM based MicroSparc CPU. Proceedings of the Multi-Chip Module Conference. 1996:148–153.

[84] Dickens, P.M., Heidelberger, P., Nicol, D.M. Parallelized network simulators for message passing parallel programs. Proceedings of the International Workshop on Modeling, Analysis, Simulation of Computer and Telecommunication Systems. 1995:72–76.

[85] Technical Report UT-CS-95-229 Dongarra, J.J., Dunigan, T. Message-passing performance of various computers. Computer Science Department, University of Tennessee; 1995. Available at http://www.cs.utk.edu/~library/1995.html. The benchmark for point-to-point communication is available at http://www.netlib.org/benchmark/comm.shar.

[86] June Duato, J. On the design of deadlock-free adaptive routing algorithms for multicomputers: Design methodologies. Proceedings of Parallel Architectures and Languages Europe. 1991:390–405.

[87] December Duato, J. Deadlock-free adaptive routing algorithms for multicomputers: Evaluation of a new algorithm. Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. 1991:840–847.

[88] June Duato, J. Improving the efficiency of virtual channels with time-dependent selection functions. Proceedings of Parallel Architectures and Languages Europe. 1992:635–650.

[89] December Duato, J. Channel classes: A new concept for deadlock avoidance in wormhole networks. Parallel Processing Letters. 1992;vol. 2(no. 4):347–354.

[90] December Duato, J. A new theory of deadlock-free adaptive multicast routing in wormhole networks. Proceedings of the 5th IEEE Symposium on Parallel and Distributed Processing. 1993:64–71.

[91] December Duato, J. A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Transactions on Parallel and Distributed Systems. 1993;vol. 4(no. 12):1320–1331.

[92] August Duato, J. A necessary and sufficient condition for deadlock-free adaptive routing in wormhole networks. Proceedings of the 1994 International Conference on Parallel Processing. 1994;vol. I:142–149.

[93] June Duato, J. A theory to increase the effective redundancy in wormhole networks. Parallel Processing Letters. 1994;vol. 4(nos. 1 and 2):125–138.

[94] April Duato, J. Improving the efficiency of virtual channels with time-dependent selection functions. Future Generation Computer Systems. 1994;vol. 10(no. 1):45–58.

[95] December Duato, J. A theory of fault-tolerant routing in wormhole networks. Proceedings of the International Conference on Parallel and Distributed Systems. 1994:600–607.

[96] September Duato, J. A theory of deadlock-free adaptive multicast routing in wormhole networks. IEEE Transactions on Parallel and Distributed Systems. 1995;vol. 6(no. 9):976–987.

[97] October Duato, J. A necessary and sufficient condition for deadlock-free adaptive routing in wormhole networks. IEEE Transactions on Parallel and Distributed Systems. 1995;vol. 6(no. 10):1055–1067.

[98] August Duato, J. A necessary and sufficient condition for deadlock-free routing in cut-through and store-and-forward networks. IEEE Transactions on Parallel and Distributed Systems. 1996;vol. 7(no. 8):841–854.

[99] December Duato, J., et al. Scouting: Fully adaptive, deadlock-free routing in faulty pipelined networks. Proceedings of the International Conference on Parallel and Distributed Systems. 1994:608–613.

[100] May Duato, J., López, P. Performance evaluation of adaptive routing algorithms for k-ary n-cubes. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:45–59.

[101] August Duato, J., et al. A high performance router architecture for interconnection networks. Proceedings of the 1996 International Conference on Parallel Processing. 1996;vol. I:61–68.

[102] August Duato, J., Malumbres, M.P. Optimal topology for distributed shared-memory multiprocessors: Hypercubes again? Proceedings of Euro-Par’96. 1996;vol. 1:205–212.

[103] May Dubnicki, C., Li, K., Mesarina, M. Network interface support for user-level buffer management. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:256–265.

[104] February von Eicken, T., Basu, A., Buch, V. Low latency communication over ATM networks using active messages. IEEE Micro. 1995;vol. 15(no. 1):46–53.

[105] May von Eicken, T., et al. Active messages: A mechanism for integrated communication and computation. Proceedings of the 19th International Symposium on Computer Architecture. 1992:256–266.

[106] January Felderman, R.E., et al. ATOMIC: A high-speed local communication architecture. Journal of High Speed Networks. 1994;vol. 3(no. 1):1–28.

[107] June Feldmann, A., et al. Subset barrier synchronization on a private memory parallel system. Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures. 1992:209–218.

[108] Technical Report 5241:TR:87 Flaig, C.M. VLSI mesh routing systems. Dept. of Computer Science, California Institute of Technology; 1987.

[109] August Fleury, E., Fraigniaud, P. Strategies for multicasting in meshes. Proceedings of the 1994 International Conference on Parallel Processing. 1994;vol. III:151–158.

[110] July Fleury, E., Fraigniaud, P. A general theory for deadlock avoidance in wormhole-routed networks. IEEE Transactions on Parallel and Distributed Systems. 1998;vol. 9(no. 7):626–638.

[111] Flynn, M. Computer Architecture: Pipelined and Parallel Processor Design. Boston, MA: Jones and Bartlett; 1995:63–140.

[112] Fountain T.J., Shute M.J., eds. Multiprocessor Computer Architectures. 1990.

[113] Technical Report COMP TR90-141 Fox, G., et al. Fortran D language specification. December. Houston, TX: Rice University, Department of Computer Science; 1990.

[114] Fox, G.C., et al. Solving Problems on Concurrent Processors. Volume I: General Techniques and Regular Problems. Englewood Cliffs, NJ: Prentice Hall; 1988.

[115] August Frank, M.I., Vernon, M.K. A hybrid shared memory/message passing parallel machine. Proceedings of the 1993 International Conference on Parallel Processing. 1993;vol. I:232–236.

[116] August Fulgham, M.L., Snyder, L. A comparison of input and output driven routers. Proceedings of Euro-Par’96. 1996;vol. 1:195–204.

[117] August Galles, M. Scalable pipelined interconnect for distributed endpoint routing: The SPIDER chip. Proceedings of Hot Interconnects Symposium IV. 1996.

[118] January–February Galles, M. Spider: A high speed network interconnect. IEEE Micro. 1997;vol. 17(no. 1):34–39.

[119] December García, J.M., Duato, J. An algorithm for dynamic reconfiguration of a multicomputer network. Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. 1991:848–855.

[120] Garey, M.R., Johnson, D.S. The rectilinear Steiner tree problem is NP-complete. SIAM Journal on Applied Mathematics. 1977;vol. 32.

[121] Garey, M.R., Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman; 1979.

[122] March Garg, V., et al. Incorporating multi-chip module packaging constraints into system design. Proceedings of the European Design and Test Conference. 1996.

[123] June Gaughan, P.T., et al. Distributed, deadlock-free routing in faulty, pipelined, direct interconnection networks. IEEE Transactions on Computers. 1996;vol. 45(no. 6):651–665.

[124] December Gaughan, P.T., Yalamanchili, S. Pipelined circuit-switching: A fault-tolerant variant of wormhole routing. Proceedings of the 4th IEEE Symposium on Parallel and Distributed Processing. 1992:148–155.

[125] May Gaughan, P.T., Yalamanchili, S. Adaptive routing protocols for hypercube interconnection networks. IEEE Computer. 1993;vol. 26(no. 5):12–23.

[126] May Gaughan, P.T., Yalamanchili, S. A family of fault-tolerant routing protocols for direct multiprocessor networks. IEEE Transactions on Parallel and Distributed Systems. 1995;vol. 6(no. 5):482–497.

[127] August Gaughan, P.T., Yalamanchili, S. A performance model of pipelined k-ary n-cubes. IEEE Transactions on Computers. 1995;vol. 44(no. 8):1059–1063.

[128] October Gelernter, D. A DAG-based algorithm for prevention of store-and-forward deadlock in packet networks. IEEE Transactions on Computers. 1981;vol. C-30:709–715.

[129] May Glass, C.J., Ni, L.M. The turn model for adaptive routing. Proceedings of the 19th International Symposium on Computer Architecture. 1992:278–287.

[130] August Glass, C.J., Ni, L.M. Maximally fully adaptive routing in 2D meshes. Proceedings of the 1992 International Conference on Parallel Processing. 1992.

[131] June Glass, C.J., Ni, L.M. Fault-tolerant wormhole routing in meshes. Proceedings of the 23rd International Symposium on Fault-Tolerant Computing. 1993:240–249.

[132] Goke, L.R., Lipovski, G.J. Banyan networks for partitioning multiprocessing systems. Proceedings of the First International Symposium on Computer Architecture. 1973:21–28.

[133] December Gopal, I.S. Prevention of store-and-forward deadlock in computer networks. IEEE Transactions on Communications. 1985;vol. COM-33(no. 12):1258–1264.

[134] January Gordon, J.M., Stout, Q.F. Hypercube message routing in the presence of faults. Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications. 1988:318–327.

[135] Graham, R.L., Foulds, L.R. Unlikelihood that minimal phylogenies for realistic biological study can be constructed in reasonable computational time. Mathematical Biosciences. 1982;vol. 60:133–142.

[136] December Gravano, L., et al. Adaptive deadlock- and livelock-free routing with all minimal paths in torus networks. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 12):1233–1251.

[137] June Greenberg, A.G., Hajek, B. Deflection routing in hypercube networks. IEEE Transactions on Communications. 1992;vol. COM-40(no. 6):1070–1081.

[138] Gropp, W., Lusk, E., Skjellum, A. Using MPI: Portable Parallel Programming with the Message Passing Interface. Cambridge, MA: MIT Press; 1994.

[139] Technical Report UIUC-DCS-R-89-1486 Grunwald, D., Reed, D.A. Analysis of backtracking routing in binary hypercube computers. February. Urbana, IL: Department of Computer Science, University of Illinois at Urbana-Champaign; 1989.

[140] April Gunther, K.D. Prevention of deadlocks in packet-switched data transport systems. IEEE Transactions on Communications. 1981;vol. COM-29:512–524.

[141] April Gupta, R. The fuzzy barrier: A mechanism for the high speed synchronization of processors. Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems. 1989:54–63.

[142] June Gupta, A., et al. Comparative evaluation of latency reducing and tolerating techniques. Proceedings of the 18th International Symposium on Computer Architecture. 1991:254–263.

[143] Gurd, J., Kirkham, C.C., Boehm, A.P. The Manchester Dataflow Computing System. In: Dongarra, J., ed. Experimental Parallel Computing Systems. Amsterdam: North-Holland.

[144] January Gurd, J., Kirkham, C.C., Watson, I. The Manchester prototype dataflow computer. Communications of the ACM. 1985;vol. 28(no. 1):34–52.

[145] January Hadas, R.L., Brandt, E. Origin-based fault-tolerant routing in the mesh. Proceedings of the First International Symposium on High-Performance Computer Architecture. 1995:102–111.

[146] Harary, F. Graph Theory. Addison-Wesley; 1972.

[147] Hedetniemi, S.M., Hedetniemi, S.T., Liestman, A.L. A survey of gossiping and broadcasting in communication networks. Networks. 1988;vol. 18(no. 4):319–349.

[148] November Heinlein, J., et al. Integration of message passing and shared memory in the Stanford FLASH multiprocessor. Proceedings of the 6th International Conference on the Architectural Support for Programming Languages and Operating Systems. 1994:38–50.

[149] May Heller, S. Congestion-free routing on the CM-5 data router. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:176–184.

[150] October Henry, D.S., Joerg, C.F. A tightly coupled processor-network interface. Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. 1992:111–122.

[151] January High Performance Fortran Forum. High Performance Fortran language specification (version 1.0, draft). 1993.

[152] August Ho, C.-T., Johnsson, S.L. Distributed routing algorithms for broadcasting and personalized communication in hypercubes. Proceedings of the 1986 International Conference on Parallel Processing. 1986:640–648.

[153] February Ho, C.-T., Kao, M. Optimal broadcast in all-port wormhole-routed hypercubes. IEEE Transactions on Parallel and Distributed Systems. 1995;vol. 6(no. 2):200–204.

[154] August Honeywell, Inc. Remote exploration and experimentation (REE) project study phase: Interim technical report. Reading, MA: National Aeronautics and Space Administration/Jet Propulsion Laboratory; 1996.

[155] February Horst, R. TNet: A reliable system area network. IEEE Micro. 1995;vol. 15(no. 1):36–44.

[156] April Horst, R. ServerNet deadlock avoidance and fractahedral topologies. Proceedings of the 10th International Parallel Processing Symposium. 1996:274–280.

[157] April Horst, R., et al. Performance modeling of ServerNet topologies. Proceedings of the 10th International Parallel Processing Symposium. 1996:518–523.

[158] August Hsu, J.-M., Banerjee, P. Hardware support for message routing in a distributed memory multicomputer. Proceedings of the 1990 International Conference on Parallel Processing. 1990;vol. I:508–515.

[159] July Hsu, J.-M., Banerjee, P. Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer. IEEE Transactions on Parallel and Distributed Systems. 1992;vol. 3(no. 4):451–464.

[160] March Hsu, W.T., Yew, P.C. The impact of wiring constraints on hierarchical network performance. Proceedings of the 6th International Parallel Processing Symposium. 1992:580–588.

[161] Hwang, K. Advanced Computer Architecture. New York: McGraw-Hill; 1993.

[162] Hwang, K., Briggs, F.A. Computer Architecture and Parallel Processing. New York: McGraw-Hill; 1984.

[163] Intel. iPSC/1 Reference Manual. Beaverton, OR; 1986.

[164] Intel. Paragon XP/S Product Overview. Beaverton, OR: Supercomputer Systems Division; 1991.

[165] June Special Issue on Interconnection Networks. IEEE Computer. 1987;vol. 20(no. 6).

[166] Technical Report Izu, C., et al. A router node architecture for cut-through torus networks. Spain: Departamento de Arquitectura y Tecnologia de Computadores, Universidad del Pais Vasco; 1994.

[167] May-June Jesshope, C.R., Miller, P.R., Yantchev, J.T. High performance communications in processor networks. Proceedings of the 16th International Symposium on Computer Architecture. 1989:150–157.

[168] September Johnsson, S.L., Ho, C.-T. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers. 1989;vol. C-38(no. 9):1249–1268.

[169] January Jump, J.R., Lakshmanamurthy, S. NETSIM: A general-purpose interconnection network simulator. Proceedings of the International Workshop on Modeling, Analysis, Simulation of Computer and Telecommunication Systems. 1993:121–125.

[170] May Karamcheti, V., Chien, A.A. Do faster routers imply faster communication? Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:1–15.

[171] October Karamcheti, V., Chien, A.A. Software overhead in messaging layers: Where does the time go? Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. 1994:51–60.

[172] Kermani, P., Kleinrock, L. Virtual cut-through: A new computer communication switching technique. Computer Networks. 1979;vol. 3:267–286.

[173] August Kesavan, R., Panda, D.K. Minimizing node contention in multiple multicast on wormhole k-ary n-cube networks. Proceedings of the 1996 International Conference on Parallel Processing. 1996;vol. I:188–195.

[174] Kessler, R.E., Schwarzmeier, J.L. CRAY T3D: A new dimension for Cray Research. Proceedings of Compcon. 1993:176–182.

[175] December Kim, J.H., Chien, A.A. An evaluation of the planar/adaptive routing. Proceedings of the 4th IEEE Symposium on Parallel and Distributed Processing. 1992:470–478.

[176] January Kim, J.H., Chien, A.A. Evaluation of wormhole-routed networks under hybrid traffic loads. Proceedings of the 26th Hawaii International Conference on System Sciences. 1993:276–285.

[177] June Kim, J.H., Chien, A.A. The impact of packetization in wormhole-routed networks. Proceedings of Parallel Architectures and Languages Europe 93. 1993:242–253.

[178] July Kim, J., Das, C.R. Hypercube communication delay with wormhole routing. IEEE Transactions on Computers. 1994;vol. C-43(no. 7):806–814.

[179] April Kim, J.H., Liu, Z., Chien, A.A. Compressionless routing: A framework for adaptive and fault-tolerant routing. Proceedings of the 21st International Symposium on Computer Architecture. 1994:289–300.

[180] March Kim, J.H., Liu, Z., Chien, A.A. Compressionless routing: A framework for adaptive and fault-tolerant routing. IEEE Transactions on Parallel and Distributed Systems. 1997;vol. 8(no. 3):229–244.

[181] August Kim, J.-Y., et al. Drop-and-reroute: A new flow control policy for adaptive wormhole routing. Proceedings of the 1995 International Conference on Parallel Processing. 1995;vol. I:60–67.

[182] April Knight, T.F., Krymm, A. A self-terminating low-voltage swing CMOS output driver. IEEE Journal of Solid-State Circuits. 1988;vol. 23(no. 2):457–464.

[183] April Koike, N. NEC Cenju-3: A microprocessor-based parallel computer. Proceedings of the 8th International Parallel Processing Symposium. 1994:396–401.

[184] April Konstantinidou, S. Adaptive, minimal routing in hypercubes. Proceedings of the Sixth MIT Conference on Advanced Research in VLSI. 1990:139–153.

[185] May Konstantinidou, S. On the effect of queue sizes and channel scheduling policies in the segment router. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:72–85.

[186] June Konstantinidou, S. Segment router: A novel router design for parallel computers. Proceedings of the 6th Annual ACM Symposium on Parallel Algorithms and Architectures. 1994:364–373.

[187] July Konstantinidou, S., Snyder, L. The Chaos router: A practical application of randomization in network routing. Proceedings 2nd Annual ACM Symposium on Parallel Algorithms and Architectures. 1990:21–31.

[188] June Konstantinidou, S., Snyder, L. Chaos router: Architecture and performance. Proceedings of the 18th International Symposium on Computer Architecture. 1991:79–88.

[189] December Konstantinidou, S., Snyder, L. The Chaos router. IEEE Transactions on Computers. 1994;vol. 43(no. 12):1386–1397.

[190] Kowalik J.S., ed. Parallel MIMD Computation: HEP Supercomputer and Its Applications. Cambridge, MA: MIT Press, 1985.

[191] December Kruskal, C., Snir, M. The performance of multistage interconnection networks for multiprocessors. IEEE Transactions on Computers. 1983;vol. C-32(no. 12):1091–1098.

[192] April Kuskin, J., et al. The Stanford FLASH multiprocessor. Proceedings of the 21st International Symposium on Computer Architecture. 1994:302–313.

[193] September Lam, K., Dennison, L.R., Dally, W.J. Simultaneous bidirectional signaling for IC systems. Proceedings of the International Conference on Computer Design. 1990:430–433.

[194] August Lan, Y. Multicast in faulty hypercubes. Proceedings of the 1992 International Conference on Parallel Processing. 1992;vol. I:58–61.

[195] January Lan, Y., Esfahanian, A.H., Ni, L.M. Distributed multi-destination routing in hypercube multiprocessors. Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications. 1988:631–639.

[196] January Lan, Y., Esfahanian, A.H., Ni, L.M. Multicast in hypercube multiprocessors. Journal of Parallel and Distributed Computing. 1990;vol. 8(no. 1):30–41.

[197] August Lan, Y., Ni, L.M., Esfahanian, A.H. A VLSI router design for hypercube multiprocessors. Integration: The VLSI Journal. 1989;vol. 7:103–125.

[198] October Lee, T., Hayes, J.P. A fault-tolerant communication scheme for hypercube computers. IEEE Transactions on Computers. 1992;vol. C-41(no. 10):1242–1256.

[199] August van Leeuwen, J., Tan, R.B. Interval routing. The Computer Journal. 1987;vol. 30(no. 4):298–307.

[200] Leighton, F.T. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Francisco: Morgan Kaufmann; 1992.

[201] October Leiserson, C.E. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computers. 1985;vol. C-34:892–901.

[202] June-July Leiserson, C.E., et al. The network architecture of the Connection Machine CM-5. Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures. 1992:272–285.

[203] March Lenoski, D., et al. The Stanford DASH multiprocessor. IEEE Computer. 1992;vol. 25(no. 3):63–79.

[204] December Lin, X., et al. Adaptive wormhole routing in hypercube multicomputers. Proceedings of the 5th IEEE Symposium on Parallel and Distributed Processing. 1993:72–79.

[205] June Lin, X., McKinley, P.K., Esfahanian, A.H. Adaptive multicast wormhole routing in 2D mesh multicomputers. Proceedings of Parallel Architectures and Languages Europe 93. 1993:228–241.

[206] August Lin, X., McKinley, P.K., Ni, L.M. Performance evaluation of multicast wormhole routing in 2D-mesh multicomputers. Proceedings of the 1991 International Conference on Parallel Processing. 1991;vol. I:435–442.

[207] August Lin, X., McKinley, P.K., Ni, L.M. The message flow model for routing in wormhole-routed networks. Proceedings of the 1993 International Conference on Parallel Processing. 1993;vol. I:294–297.

[208] August Lin, X., McKinley, P.K., Ni, L.M. Deadlock-free multicast wormhole routing in 2D mesh multicomputers. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 8):793–804.

[209] July Lin, X., McKinley, P.K., Ni, L.M. The message flow model for routing in wormhole-routed networks. IEEE Transactions on Parallel and Distributed Systems. 1995;vol. 6(no. 7):755–760.

[210] August Lin, X., Ni, L.M. Multicast communication in multicomputer networks. Proceedings of the 1990 International Conference on Parallel Processing. 1990;vol. III:114–118.

[211] May Lin, X., Ni, L.M. Deadlock-free multicast wormhole routing in multicomputer networks. Proceedings of the 18th International Symposium on Computer Architecture. 1991:116–125.

[212] October Lin, X., Ni, L.M. Multicast communication in multicomputer networks. IEEE Transactions on Parallel and Distributed Systems. 1993;vol. 4(no. 10):1104–1117.

[213] January Linder, D.H., Harden, J.C. An adaptive and fault tolerant wormhole routing strategy for k-ary n-cubes. IEEE Transactions on Computers. 1991;vol. C-40(no. 1):2–12.

[214] February Littlefield, R.J. Characterizing and tuning communications performance for real applications. Proceedings of the First Intel DELTA Applications Workshop. 1992.

[215] October Liu, Z., Chien, A.A. Hierarchical adaptive routing. Proceedings of the 6th IEEE International Symposium on Parallel and Distributed Processing. 1994:688–695.

[216] January Liu, Z., Duato, J. Adaptive unicast and multicast in 3D mesh networks. Proceedings of the 27th Hawaii International Conference on System Sciences. 1994:173–182.

[217] June Liu, Z., Duato, J., Thorelli, L.-E. Grouping virtual channels for deadlock-free adaptive wormhole routing. Proceedings of Parallel Architectures and Languages Europe 93. 1993:254–265.

[218] Ph.D. Dissertation López, P. Diseño de un Circuito de Comunicaciones de Altas Prestaciones para Redes de Interconexión con Control de Flujo “Wormhole”. Spain: Polytechnical University of Valencia; 1995.

[219] August López, P., et al. A methodology to speed up the evaluation of interconnection networks. IEEE Technical Committee on Computer Architecture Newsletter. 1995:32–37.

[220] June López, P., Duato, J. Deadlock-free adaptive routing algorithms for the 3D-torus: Limitations and solutions. Proceedings of Parallel Architectures and Languages Europe 93. 1993:684–687.

[221] López, P., Duato, J. Deadlock-free fully-adaptive minimal routing algorithms: Limitations and solutions. Computers and Artificial Intelligence. 1995;vol. 14(no. 1):105–125.

[222] October Loucif, S., Ould-Khaoua, M., Mackenzie, L.M. The express channel concept in hypermeshes and k-ary n-cubes. Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing. 1996:566–569.

[223] Technical Report 119/R19 Mackenzie, L.M., et al. COBRA: A high-performance interconnection network for large multicomputer. Computer Science Department, University of Glasgow; 1991.

[224] October Malumbres, M.P., Duato, J., Torrellas, J. An efficient implementation of tree-based multicast routing in distributed shared-memory multiprocessors. Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing. 1996:186–189.

[225] Martin, R. HPAM: An active message layer for a network of HP workstations. Proceedings of Hot Interconnects Symposium II. 1994. Available from ftp://ftp.cs.berkeley.edu/CASTLE/Active_Messages/hotipaper.ps.

[226] August Martínez, J.M., et al. Software-based deadlock recovery technique for true fully adaptive routing in wormhole networks. Proceedings of the 1997 International Conference on Parallel Processing. 1997.

[227] April May, M.D. The next generation transputers and beyond. Proceedings of the 2nd European Distributed Memory Computing Conference. 1991:7–22.

[228] May M.D., Thompson P.W., Welch P.H., eds. Networks, Routers and Transputers: Function, Performance and Application. IOS Press, 1993.

[229] May McKenzie, N.R., et al. Cranium: An interface for message passing on adaptive packet routing networks. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:266–281.

[230] January McKinley, P.K., Trefftz, C. MultiSim: A simulation tool for the study of large-scale multiprocessors. Proceedings of the International Workshop on Modeling, Analysis, Simulation of Computer and Telecommunication Systems. 1993:57–62.

[231] August McKinley, P.K., Trefftz, C. Efficient broadcast in all-port wormhole-routed hypercubes. Proceedings of the 1993 International Conference on Parallel Processing. 1993;vol. II:288–291.

[232] December McKinley, P.K., Tsai, Y.-J., Robinson, D.F. Collective communication in wormhole-routed massively parallel computers. IEEE Computer. 1995;vol. 28(no. 12):39–50.

[233] August McKinley, P.K., et al. Unicast-based multicast communication in wormhole-routed networks. Proceedings of the 1992 International Conference on Parallel Processing. 1992.

[234] December McKinley, P.K., et al. Unicast-based multicast communication in wormhole-routed networks. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 12):1252–1265.

[235] November McKinley, P.K., et al. ComPaSS: Efficient communication services for scalable architectures. Proceedings of Supercomputing’92. 1992:478–487.

[236] Technical Report SNARC 92-02 Merlin, J. Techniques for the automatic parallelization of Distributed Fortran 90. University of Southampton, Southampton Novel Architecture Research Center; 1992.

[237] March Merlin, P.M., Schweitzer, P.J. Deadlock avoidance in store-and-forward networks-I: Store-and-forward deadlock. IEEE Transactions on Communications. 1980;vol. COM-28:345–354.

[238] Miguel, J., et al. Assessing the performance of the new IBM SP2 communication subsystem. IEEE Parallel & Distributed Technology. 1996;vol. 4(no. 4):12–22.

[239] Ph.D. Dissertation Miller, P.R. Efficient Communications for Fine-Grain Distributed Computers. U.K.: Southampton University; 1991.

[240] February Minnich, R., Burns, D., Hady, F. The memory-integrated network interface. IEEE Micro. 1995;vol. 15(no. 1):11–20.

[241] Technical Report TR-ACAR-95-03 Mohapatra, P. Wormhole routing techniques in multicomputer systems. Department of Electrical and Computer Engineering, Iowa State University; 1995.

[242] Message Passing Interface Forum. MPI: A message-passing interface standard. International Journal of Supercomputer Applications and High Performance Computing. 1994;vol. 8(no. 3/4). Available at ftp://www.netlib.org/mpi/mpi-report.ps.

[243] Web page at http://www.erc.mstate.edu/mpi.

[244] Web page at http://cisr.anu.edu.au/pub/papers/meglicki/mpi/tutorial/mpi.

[245] Web page at http://www.osc.edu/Lam/mpi/mpi-ezstart.html.

[246] June Mudge, T.N., Hayes, J.P., Winsor, D.C. Multiple bus architectures. IEEE Computer. 1987;vol. 20(no. 6):42–48.

[247] January-February Mukherjee, S.S., Bannon, P., Lang, S., Spink, A., Webb, D. The 21364 network architecture. IEEE Micro. 2002;vol. 22(no. 1):26–35.

[248] NCUBE Corporation. NCUBE 6400 Processor Manual. 1990.

[249] June Ngai, J.Y., Seitz, C.L. A framework for adaptive routing in multicomputer networks. Proceedings of the 1st Annual ACM Symposium on Parallel Algorithms and Architectures. 1989:1–9.

[250] May Nguyen, J., Pezaris, J., Pratt, G., Ward, S. Three-dimensional network topologies. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:101–115.

[251] May Nguyen, T.D., Snyder, L. Performance analysis of a minimal adaptive router. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:31–44.

[252] August Ni, L.M. Should scalable parallel computers support efficient hardware multicast? Proceedings of the 1995 ICPP Workshop on Challenges for Parallel Processing. 1995:2–7.

[253] August Ni, L.M. Issues in designing truly scalable interconnection networks. Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing. 1996:74–83.

[254] August Ni, L.M., Gui, Y., Moore, S. Performance evaluation of switch-based wormhole networks. Proceedings of the 1995 International Conference on Parallel Processing. 1995;vol. I:32–40.

[255] February Ni, L.M., McKinley, P.K. A survey of wormhole routing techniques in direct networks. IEEE Computer. 1993;vol. 26(no. 2):62–76.

[256] May Noakes, M., Wallach, D.A., Dally, W.J. The J-Machine multicomputer: An architectural evaluation. Proceedings of the 20th International Symposium on Computer Architecture. 1993:224–235.

[257] June Nowatzyk, A.G., et al. S-Connect: From networks of workstations to supercomputer performance. Proceedings of the 22nd International Symposium on Computer Architecture. 1995:71–82.

[258] January Nugent, S. The iPSC/2 direct connect communications technology. Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications. 1988:51–59.

[259] Oed, W. The Cray Research Massively Parallel Processing System: Cray T3D. Cray Research; 1993.

[260] August O’Keefe, M.T., Dietz, H.G. Hardware barrier synchronization: Static barrier MIMD (SBM). Proceedings of the 1990 International Conference on Parallel Processing. 1990:35–42.

[261] August O’Keefe, M.T., Dietz, H.G. Hardware barrier synchronization: Dynamic barrier MIMD (DBM). Proceedings of the 1990 International Conference on Parallel Processing. 1990:43–46.

[262] April Olk, E. PARSE: Simulation of message passing communication networks. Proceedings of the 27th Annual Simulation Symposium. 1994:115–124.

[263] August Oruç, A.Y. Multiple tracks of research on interconnection networks. Proceedings of the 1995 ICPP Workshop on Challenges for Parallel Processing. 1995:16–23.

[264] November Pakin, S., Lauria, M., Chien, A.A. High performance messaging on workstations: Illinois fast messages on Myrinet. Proceedings of Supercomputing 95. 1995.

[265] January Panda, D.K. Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms. Proceedings of the First International Symposium on High-Performance Computer Architecture. 1995:200–209.

[266] April Panda, D.K. Global reduction in wormhole k-ary n-cube networks with multidestination exchange worms. Proceedings of the 9th International Parallel Processing Symposium. 1995:652–659.

[267] August Panda, D.K. Issues in designing efficient and practical algorithms for collective communication on wormhole-routed systems. Proceedings of the 1995 ICPP Workshop on Challenges for Parallel Processing. 1995:8–15.

[268] January Panda, D.K., Singal, S., Kesavan, R. Multidestination message passing in wormhole k-ary n-cube networks with base routing conformed paths. IEEE Transactions on Parallel and Distributed Systems. 1999;vol. 10(no. 1):76–96.

[269] May Panda, D.K., Singal, S., Prabhakaran, P. Multidestination message passing mechanism conforming to base wormhole routing scheme. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:131–145.

[270] November Papadopoulos, G., et al. *T: Integrating building blocks for parallel computing. Proceedings of Supercomputing’93. 1993:624–635.

[271] August Park, J., et al. Construction of optimal multicast trees based on the parameterized communication model. Proceedings of the 1996 International Conference on Parallel Processing. 1996;vol. I:180–187.

[272] Technical Report, Cray Research, January Pase, D.M. MPP Fortran programming model. 1992.

[273] October Patel, J.H. Performance of processor-memory interconnections for multiprocessors. IEEE Transactions on Computers. 1981;vol. C-30:771–780.

[274] April Petrini, F., Vanneschi, M. Performance analysis of minimal adaptive wormhole routing with time-dependent deadlock recovery. Proceedings of the 11th International Parallel Processing Symposium. 1997:589–595.

[275] October Pfister, G., Norton, A. Hot spot contention and combining in multistage interconnect networks. IEEE Transactions on Computers. 1985;vol. C-34:943–948.

[276] January Pierce, P. The NX/2 operating system. Proceedings of the 3rd Conference on Hypercube Concurrent Computers and Applications. 1988:384–390.

[277] June Pifarré, G.D., et al. Fully-adaptive minimal deadlock-free packet routing in hypercubes, meshes and other networks. Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures. 1991:278–290.

[278] March Pifarré, G.D., et al. Fully adaptive minimal deadlock-free packet routing in hypercubes, meshes, and other networks: Algorithms and simulations. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 3):247–263.

[279] Technical Report CENG-96-34, December Pinkston, T.M., Borsody, J., Kostis, W. Turn selection enhancements to deadlock recovery algorithms. University of Southern California; 1996.

[280] February Pinkston, T.M., Ha, J.-H. SPEED DMON: Cache coherence on an optical multi-channel interconnect architecture. Journal of Parallel and Distributed Computing. 1997;vol. 41(no. 1).

[281] March Pinkston, T.M., Raksapatcharawong, M., Choi, Y. Smart-pixel implementation of network router deadlock handling mechanisms. Spring Topical Meeting on Optics in Computing Technical Digest. 1997.

[282] August Pinkston, T.M., Raksapatcharawong, M., Kuznia, C. An asynchronous optical token smart-pixel design based on hybrid CMOS/SEED integration. IEEE/LEOS 1996 Summer Topical Meeting on Smart Pixels Technical Digest. 1996:40–41.

[283] June Pinkston, T.M., Warnakulasuriya, S. On deadlocks in interconnection networks. Proceedings of the 24th International Symposium on Computer Architecture. 1997.

[284] May Preparata, F.P., Vuillemin, J. The cube-connected cycles: A versatile network for parallel computation. Communications of the ACM. 1981;vol. 24(no. 5):300–309.

[285] August Qiao, W., Ni, L.M. Adaptive routing in irregular networks using cut-through switches. Proceedings of the 1996 International Conference on Parallel Processing. 1996;vol. I:52–60.

[286] July Raghavendra, C.S., Yang, P.-J., Tien, S.-B. Free dimensions, an effective approach to achieving fault tolerance in hypercubes. Proceedings of the 22nd International Symposium on Fault-Tolerant Computing. 1992:170–177.

[287] September Raghavendra, C.S., Yang, P.-J., Tien, S.-B. Free dimensions, an effective approach to achieving fault tolerance in hypercubes. IEEE Transactions on Computers. 1995;vol. 44(no. 9):1152–1157.

[288] June Reed, D.A., Grunwald, D.C. The performance of multicomputer interconnection networks. IEEE Computer. 1987;vol. 20(no. 6):63–73.

[289] March Reeves, D.S., Gehringer, E.F., Chandiramani, A. Adaptive routing and deadlock recovery: A simulation study. Proceedings of the 4th Conference on Hypercube Concurrent Computers and Applications. 1989:331–337.

[290] April Reinhardt, S.K., Larus, J.R., Wood, D.A. Tempest and Typhoon: User-level shared memory. Proceedings of the 21st International Symposium on Computer Architecture. 1994:325–336.

[291] April Rexford, J., et al. PP-MESS-SIM: A simulator for evaluating multicomputer interconnection networks. Proceedings of the Simulation Symposium. 1995:84–93.

[292] April Rexford, J., Dolter, J., Shin, K.G. Hardware support for controlled interaction of guaranteed and best-effort communication. Proceedings of the Workshop on Parallel and Distributed Real-Time Systems. 1994:188–193.

[293] January Rexford, J., et al. PP-MESS-SIM: A flexible and extensible simulator for evaluating multicomputer networks. IEEE Transactions on Parallel and Distributed Systems. 1997;vol. 8(no. 1):25–40.

[294] May Rexford, J., Shin, K.G. Support for multiple classes of traffic in multicomputer routers. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:116–130.

[295] November Robinson, D.F., et al. Efficient collective data distribution in all-port wormhole-routed hypercubes. Proceedings of Supercomputing’93. 1993:792–801.

[296] December Robinson, D.F., et al. Efficient multicast in all-port wormhole-routed hypercubes. Journal of Parallel and Distributed Computing. 1995;vol. 31(no. 2):126–140.

[297] June Robles, A., Duato, J. Multilinks: A new approach to the design of adaptive routing algorithms for multicomputers. Proceedings of the IMACS-IFAC Symposium on Parallel and Distributed Computing in Engineering Systems. 1991:405–410.

[298] Oxford University Computing Laboratory Report Roscoe, A.W. Routing messages through networks: An exercise in deadlock avoidance. 1987.

[299] June Rothberg, E., Singh, J.P., Gupta, A. Working sets, cache sizes, and node granularity issues for large scale multiprocessors. Proceedings of the 20th International Symposium on Computer Architecture. 1993:14–25.

[300] July Saad, Y., Schultz, M.H. Topological properties of hypercubes. IEEE Transactions on Computers. 1988;vol. C-37(no. 7):867–872.

[301] April Samatham, M.R., Pradhan, D.K. The de Bruijn multiprocessor network: A versatile parallel processing and sorting network for VLSI. IEEE Transactions on Computers. 1989;vol. C-38(no. 4):567–581.

[302] June Sandborn, P.A., Abadir, M.S., Murphy, C.F. The trade-off between peripheral and area array bonding of components in multi-chip modules. IEEE Transactions on Components, Packaging, and Manufacturing Technologies. 1994;vol. 17(no. 2):249–256.

[303] Sandborn, P.A., Moreno, H. Conceptual Design of Multi-chip Modules and Systems. Boston, MA: Kluwer Academic Publishers; 1994.

[304] Technical Report GIT/CSRL-94/1, February Schimmel, D.E., Allen, J.D., Gaughan, P.T. Efficient self-timed distributed mutual exclusion. Georgia Institute of Technology; 1994. Available via anonymous ftp at ftp://ee.gatech.edu:pub/csrl.

[305] Technical Report SRC Research Report 59, DEC, April Schroeder, M.D., et al. Autonet: A high-speed, self-configuring local area network using point-to-point links. 1990.

[306] November Schwiebert, L., Jayasimha, D.N. Optimal fully adaptive wormhole routing for meshes. Proceedings of Supercomputing’93. 1993:782–791.

[307] July Schwiebert, L., Jayasimha, D.N. A universal proof technique for deadlock-free routing in interconnection networks. Proceedings of the Symposium on Parallel Algorithms and Architectures. 1995:175–184.

[308] May Schwiebert, L., Jayasimha, D.N. Optimal fully adaptive minimal wormhole routing for meshes. Journal of Parallel and Distributed Computing. 1995;vol. 27:56–70.

[309] January Schwiebert, L., Jayasimha, D.N. A necessary and sufficient condition for deadlock-free wormhole routing. Journal of Parallel and Distributed Computing. 1996;vol. 32(no. 1):103–117.

[310] October Scott, S.L. Synchronization and communication in the T3E multiprocessor. Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems. 1996:26–36.

[311] January Scott, S.L., Goodman, J.R. The impact of pipelined channels on k-ary n-cube networks. IEEE Transactions on Parallel and Distributed Systems. 1994;vol. 5(no. 1):2–16.

[312] May Scott, S.L., Thorson, G. Optimized routing in the Cray T3D. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:281–294.

[313] August Scott, S.L., Thorson, G. The Cray T3E network: Adaptive routing in a high performance 3D torus. Proceedings of Hot Interconnects Symposium IV. 1996.

[314] January Seitz, C.L. The Cosmic Cube. Communications of the ACM. 1985;vol. 28(no. 1):22–33.

[315] Seitz, C.L., Su, W. A family of routing and communication chips based on the Mosaic. Proceedings of the Washington Symposium on Integrated Systems. 1993.

[316] April Shafer, S., Ghose, K. Improving parallel program execution time with message consolidation. Proceedings of the 8th International Parallel Processing Symposium. 1994:736–742.

[317] June Shin, K.G., Daniel, S.W. Analysis and implementation of hybrid switching. IEEE Transactions on Computers. 1996;vol. C-45(no. 6):684–692.

[318] December Siegel, H.J., et al. Using the multistage cube network topology in parallel supercomputers. Proceedings of the IEEE. 1989;vol. 77:1932–1953.

[319] December Silla, F., Duato, J. Improving the efficiency of adaptive routing in networks with irregular topology. Proceedings of the 1997 Conference on High Performance Computing. 1997.

[320] February Silla, F., et al. Efficient adaptive routing in networks of workstations with irregular topology. Proceedings of the Workshop on Communications and Architectural Support for Network-Based Parallel Computing. 1997:46–60.

[321] December Singhal, M. Dllel and Distributed Processing. 1993:780–787.

[322] October Sivaram, R., Panda, D.K., Stunkel, C.B. Efficient broadcast and multicast on multistage interconnection networks using multiport encoding. Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing. 1996:36–45.

[323] Technical Report GIT-CC-93/63, October Sivasubramaniam, A., et al. Machine abstractions and locality issues in studying parallel systems. Georgia Institute of Technology; 1993.

[324] Snir, M., et al. The communication software and parallel environment of the IBM SP2. IBM Systems Journal. 1995;vol. 34(no. 2):205–221.

[325] Snir, M., et al. MPI: The Complete Reference. Cambridge, MA: MIT Press; 1996.

[326] February Stone, H.S. Parallel processing with the perfect shuffle. IEEE Transactions on Computers. 1971;vol. 20:153–161.

[327] April Stunkel, C.B., et al. Architecture and implementation of Vulcan. Proceedings of the 8th International Parallel Processing Symposium. 1994:266–274.

[328] May Stunkel, C.B., et al. The SP1 high-performance switch. Proceedings of the Scalable High Performance Computing Conference. 1994:150–157.

[329] Technical Report, August Stunkel, C.B., et al. The SP2 communication subsystem. IBM T. J. Watson Research Center; 1994.

[330] February Stunkel, C.B., et al. The SP2 high-performance switch. IBM Systems Journal. 1995;vol. 34(no. 2):185–204.

[331] August Su, C., Shin, K.G. Adaptive deadlock-free routing in multicomputers using only one extra channel. Proceedings of the 1993 International Conference on Parallel Processing. 1993.

[332] June Su, C., Shin, K.G. Adaptive fault-tolerant deadlock-free routing in meshes and hypercubes. IEEE Transactions on Computers. 1996;vol. 45(no. 6):666–683.

[333] August Suh, Y.-J., et al. Software based fault-tolerant oblivious routing in pipelined networks. Proceedings of the 1995 International Conference on Parallel Processing. 1995;vol. I:101–105.

[334] March Sullivan, H., Bashkow, T.R. A large scale, homogeneous, fully distributed parallel machine. Proceedings of the 4th International Symposium on Computer Architecture. 1977.

[335] April Sunderam, V.S., et al. The PVM concurrent computing system: Evolution, experiences, and trends. Parallel Computing. 1994;vol. 20(no. 4):531–545.

[336] January Szymanski, T. Hypermeshes: Optical interconnection networks for parallel processing. Journal of Parallel and Distributed Computing. 1995;vol. 26:1–23.

[337] August Tanabe, N., et al. Base-m n-cube: High performance interconnection networks for highly parallel computer PRODIGY. Proceedings of the 1991 International Conference on Parallel Processing. 1991.

[338] Tanenbaum, A.S. Computer Networks, 2nd ed. Englewood Cliffs, NJ: Prentice Hall; 1988.

[339] January Tang, D., Iyer, R. Dependability measurement and modeling of computer systems. IEEE Transactions on Computers. 1993;vol. 42(no. 1):62–75.

[340] January Thinking Machines Corporation. CM Fortran Programming Guide. 1991.

[341] October Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary. 1991.

[342] May Tsai, Y.-J., McKinley, P.K. An extended dominating node approach to collective communication in wormhole-routed 2D meshes. Proceedings of the Scalable High-Performance Computing Conference. 1994:199–206.

[343] February Tseng, Y.-C., Panda, D.K., Lai, T.-H. A trip-based multicasting model in wormhole-routed networks with virtual channels. IEEE Transactions on Parallel and Distributed Systems. 1996;vol. 7(no. 2):138–150.

[344] January Upadhyay, J.H., Varavithya, V., Mohapatra, P. An efficient and balanced routing in two-dimensional meshes. Proceedings of the First International Symposium on High-Performance Computer Architecture. 1995:112–121.

[345] Valiant, L.G. A scheme for fast parallel communication. SIAM Journal on Computing. 1982;vol. 11:350–361.

[346] August Varavithya, V., Mohapatra, P. Tree-based multicasting on wormhole routed multistage interconnection networks. Proceedings of the 1997 International Conference on Parallel Processing. 1997.

[347] April Warnakulasuriya, S., Pinkston, T.M. Characterization of deadlocks in interconnection networks. Proceedings of the 11th International Parallel Processing Symposium. 1997:80–86.

[348] May Wills, S., et al. The offset cube: An optoelectronic interconnection network. Proceedings of the Workshop on Parallel Computer Routing and Communication. 1994:86–100.

[349] April Wittie, L.D. Communication structures for large networks of microcomputers. IEEE Transactions on Computers. 1981;vol. C-30(no. 4):264–273.

[350] December Worley, P.H., Foster, I.T. PSTSWM v4.0 (ParkBench MPI version). 1995. Available at http://www.netlib.org/parkbench/compapps.

[351] August Wu, J. Unicasting in faulty hypercubes using safety levels. Proceedings of the 1995 International Conference on Parallel Processing. 1995;vol. III:133–136. Also available as Technical Report TR-CSE-95-2, Department of Computer Science and Engineering, Florida Atlantic University.

[352] August Wu, C.L., Feng, T-Y. On a class of multistage interconnection networks. IEEE Transactions on Computers. 1980;vol. C-29:694–702.

[353] Xu, H., Gui, Y., Ni, L.M. Optimal software multicast in wormhole routed multistage networks. Proceedings of Supercomputing. 1994:703–712.

[354] Spring Xu, Z., Hwang, K. Modeling communication overhead: MPI and MPL performance on the IBM SP2. IEEE Parallel & Distributed Technology. 1996:9–23.

[355] October Xu, H., McKinley, P.K., Ni, L.M. Efficient implementation of barrier synchronization in wormhole-routed hypercube multicomputers. Journal of Parallel and Distributed Computing. 1992;vol. 16:172–184.

[356] Master’s Thesis Yost, W. Cost Effective Fault Tolerance for Network Routing. Seattle: Department of Computer Science and Engineering, University of Washington; 1995.

[357] Zima, H., Brezany, P., Chapman, B., Mehrotra, P., Schwald, A. Vienna Fortran: A Language Specification (Version 1.1). 1991.
