◇◇新语丝(www.xys.org)(xys.dxiong.com)(xys.3322.org)(xys.freedns.us)◇◇

IEEE conference paper plagiarism by Zhixin Ba, Haichang Zhou, Huai Zhang and Zhenxiao Yang of Tsinghua University

BerkeleyWolf

This is a detailed comparison of a plagiarized article published by IEEE. The 
plagiarizing article is credited to Zhixin Ba, Haichang Zhou, Huai Zhang and 
Zhenxiao Yang of Tsinghua University, and appeared in 2000 at The Fourth 
International Conference on High-Performance Computing in the Asia-Pacific 
Region, Volume 1. The plagiarized source appeared at the 1994 Scalable 
Parallel Libraries Conference, authored by Natawut Nupairoj and Lionel M. Ni. 
The 2000 article is a rather sparse 3 pages; the 1994 article is a rather 
dense 8 pages. Below, text without 【】 is the plagiarizing 2000 article, 
given in full except for the figures, tables, and formulas removed because of 
the limitations of the xys format. Text in 【】 is from the 1994 article, 
giving only the passages that correspond to the 2000 article. Because of the 
limitations of PDF-to-TXT conversion, individual words may contain errors.


Professor Lionel M. Ni, who is still at Michigan State University 
(http://www.cse.msu.edu/~ni/), has been notified of this plagiarism case by 
e-mail.

The plagiarism was first discovered and reported by anarch, who posted it on 
MITBBS. Thanks to XYS reader Mr. Chen (full name withheld for lack of 
permission; our apologies) for providing the electronic text of both articles.

Performance Evaluation of some MPI Implementations
on Workstation Clusters
Zhixin Ba ,Haichang Zhou ,Huai Zhang and Zhenxiao Yang
High performance computing center
Cernet, Tsinghua University, 100084
bazx@chpcc.edu.cn

http://csdl.computer.org/comp/proceedings/hpc/2000/0589/01/05890392abs
.htm
The Fourth International Conference on High-Performance Computing in 
the Asia-Pacific Region-Volume 1
May 14 - 17, 2000
Beijing, China

【Performance Evaluation of Some MPI Implementations on
Workstation Clusters *
Natawut Nupairoj and Lionel M. Ni
Department of Computer Science
Michigan State University
East Lansing, MI 48824-1027
{nupairoj, ni}@cps.msu.edu
http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=376999
Proceedings of the 1994 Scalable Parallel Libraries Conference
】

Abstract
Message Passing Interface (MPI) has already become a standard for 
communication libraries on distributed-memory computing systems.
【Message Passing Interface (MPI) is an attempt to standardize the 
communication library for distributed-memory computing systems. 】

Since the release of the new versions of the MPI specification, several 
MPI implementations have been made publicly available.
【Since the release of the recent MPI specification, several MPI 
implementations have been made publicly available.】

Different implementations employ different approaches. Since the performance 
of communication is extremely crucial to these applications, it is critical 
to select an appropriate MPI implementation for message passing.

【Different implementations employ different approaches, and thus, the 
performance of each implementation may vary. Since the performance of 
communication is extremely crucial to message-passing based 
applications, selecting an appropriate MPI implementation becomes 
critical. 】

Our study is intended to provide a guideline on how to submit a task 
and how to perform such a task, economically and effectively, on 
workstation clusters in high performance computing.

【Our study is intended to provide a guideline on how to perform such 
a task on workstation clusters which are known to be an economical and 
effective platform in high performance computing.】

We investigate several MPI aspects that affect the communication performance, 
including the implementations themselves, the supporting hardware environment, 
and derived datatypes. Finally, our results point out the strengths and 
weaknesses of the different implementations on our experimental system.

【We investigate several MPI aspects including its functionalities and 
performance. Our results also point out the strength and weakness of 
each implementation on our experimental system. 】

1. Introduction
In our study, four popular MPI implementations, shown in Figure 1, are 
considered. Our testing environment is based on an IBM SP2 system 
interconnected via both Ethernet and a high performance switch. The high 
performance switch can provide up to 100 Mbps per channel.

【Our testing environment consists of 6 DEC Alpha workstations 
interconnected via both Ethernet and a DEC GIGAswitch. The DEC GIGA 
switch can provide up to 100 Mbps per channel. 】

Figure 1. The model of the communication modes

【identical to figure in 1994 paper: Figure 1. The model of the 
communication modes】

We have developed a set of benchmarks to evaluate the performance of 
both point-to-point and collective communication services. These 
benchmark programs include:

【We have developed a set of benchmarks to evaluate the performance of 
both point-to-point and collective communication services. These 
benchmark programs include: 】

1. Ping: to measure the peak performance of the point-to-point 
communication over a communication channel;

【1. Ping: to measure the peak performance of the point-to-point 
communication over a communication channel; 】


2. PingPong: to evaluate the end-to-end communication latency, which 
includes the effect of the communication protocol;
【2. PingPong: to evaluate the end-to-end communication latency which 
includes the effect of the communication protocol; and】


3. Collective: to evaluate the performance of some collective 
communication, including broadcast, and barrier synchronization.
【3. Collective: to evaluate the performance of some collective 
communication, including broadcast, and barrier synchronization.】
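The PingPong benchmark described above can be illustrated outside of MPI. The following is a minimal stand-in sketch, not the authors' benchmark: it times round trips over a local Unix socket pair between a parent process and a forked child, so the message size and iteration count are illustrative assumptions and the resulting numbers say nothing about any MPI implementation.

```python
import os
import socket
import time

def recv_exact(sock, n):
    """Receive exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def pingpong_latency(n_bytes, iters=100):
    """Estimate one-way latency for n_bytes messages, PingPong style:
    send a message, wait for the echo, and halve the round-trip time."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child: echo every message back
        parent_sock.close()
        for _ in range(iters):
            child_sock.sendall(recv_exact(child_sock, n_bytes))
        child_sock.close()
        os._exit(0)
    child_sock.close()
    msg = b"x" * n_bytes
    start = time.perf_counter()
    for _ in range(iters):
        parent_sock.sendall(msg)
        recv_exact(parent_sock, n_bytes)
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)
    return elapsed / (2 * iters)
```

As with the paper's metrics, averaging over many iterations washes out scheduling noise; a real MPI version would replace the socket calls with sends and receives on two ranks.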

The rest of this paper is organized as follows. In Section 2, we discuss 
the model and performance metrics used in our study. Section 3 presents the 
experimental results. In Section 4, we conclude the paper.

Due to space limitation, only partial results are presented in this 
paper.
【Due to space limitation, only partial results are presented. 
Interested readers may refer to [6] for additional performance results. 
】

2 Model and Metrics

【3 Model and Metrics 】

2.1 Measurement Model

【3.1 Measurement Model】

Figure 2. The model of the communication

【identical to 1994 paper: Figure 2. The measurement model 】

2.2 Performance Metrics

【3.2 Performance Metrics】

We run our benchmark programs on different communication modes, focusing on 
the differences in communication performance between the modes. The 
following two metrics are sufficient for the evaluation.

【Comparing two communication systems requires measuring several 
metrics. In our study, we compare the implementation of different 
communication libraries. Thus, only two metrics are sufficient for the 
evaluation.】

Communication latency (t)
The communication latency (t) is defined to be the time that a process 
spends when it sends or receives (or both) a message. The communication 
latency is proportional to the message size, and is given by

【We define the communication latency (t ) to be the time that a 
process has to spend when it sends or receives (or both) a message. 
The communication latency is proportional to the message size which is 
given by 】

t = t_s + n × t_t + ⌈n/p⌉ × t_p (1)

【t = t_s + n × t_t + ⌈n/p⌉ × t_p (1)】

where t_s is the start-up latency, which is fixed for each message, n 
indicates the size of the message, t_t is the transmission latency (usually 
much less than t_s), and t_p is the packaging latency. The start-up latency 
also includes the fixed cost of system call and initialization overhead.

【where t_s is the start-up latency which is fixed for each message, n 
indicates the size of the message, t_t is the transmission latency (usually 
much less than t_s), and t_p is the packetization latency. The start-up 
latency also includes the fixed cost of system call and initialization 
overhead.】
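Equation (1) can be read off directly as a small cost model. The sketch below simply encodes the formula; the parameter values are illustrative assumptions, not measurements of any implementation discussed here.

```python
import math

def latency(n, t_s, t_t, t_p, p):
    """Equation (1): t = t_s + n*t_t + ceil(n/p)*t_p.

    n   -- message size (bytes)
    t_s -- fixed start-up latency (system call + initialization cost)
    t_t -- per-byte transmission latency (usually much less than t_s)
    t_p -- per-packet packaging latency, paid once per p-byte packet
    """
    return t_s + n * t_t + math.ceil(n / p) * t_p

# Illustrative values (microseconds and bytes); not measured data.
t_s, t_t, t_p, p = 500.0, 0.08, 40.0, 1024

# Even an empty message pays the full start-up latency:
# latency(0, t_s, t_t, t_p, p) == t_s
```

The model makes the text's point concrete: for short messages the start-up latency t_s dominates, while for long messages the per-byte and per-packet terms take over.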

Channel throughput (ρ)
Channel throughput (ρ), or bandwidth, is the rate at which the network 
can deliver data (usually in Mbits per second). It is widely used 
among the vendors because of its simplicity. We use this metric when 
we compare the performance of different message sizes. The throughput 
can be directly computed from the communication latency by

【The channel throughput (ρ) or bandwidth is the rate at which the 
network can deliver data (usually in Mbits per second). It is widely 
used among the vendors because of its simplicity. We use this metric 
when we compare the performance of different message sizes. The 
throughput can be directly computed from the communication latency by】


####### formula (2) is omitted here ####### (2)
【identical formula】
if we substitute t with Equation (1), the throughput becomes
【if we substitute t with Equation (1), the throughput becomes】

####### formula (3) is omitted here ####### (3)
【identical formula】


So the peak throughput will be limited to (10^-6)/t_t when the message 
size is infinite.
【Thus the peak throughput will be limited to (10^-6)/t_t when the 
message size is infinite.】

Furthermore, the maximum throughput that can be achieved is defined as 
the sustained throughput. By sending messages as fast as possible, such as 
in the buffered mode, we can compute the sustained throughput from 
Equation (2).
【We further define the sustained throughput as the maximum throughput 
that can be achieved. By injecting messages into the communication 
channel as fast as possible, such as repeatedly sending messages in 
the buffered mode, we can compute the sustained throughput from 
Equation (2).】
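Formulas (2) and (3) are omitted above, but the stated limit of 10^-6/t_t pins down their shape: throughput is essentially ρ = n × 10^-6 / t, with t taken from Equation (1). That reading is an assumption on our part, as is the constant factor (bits versus bytes). Note also that, taken literally, Equation (1) gives an asymptotic limit of 10^-6/(t_t + t_p/p), which reduces to the stated 10^-6/t_t when the per-byte packaging cost t_p/p is negligible. A quick numeric check:

```python
import math

def latency(n, t_s, t_t, t_p, p):
    # Equation (1): t = t_s + n*t_t + ceil(n/p)*t_p
    return t_s + n * t_t + math.ceil(n / p) * t_p

def throughput(n, t_s, t_t, t_p, p):
    # Assumed form of the omitted Equation (2): rho = n * 1e-6 / t.
    # Units are arbitrary here; the bits-vs-bytes factor is part of
    # the omitted formula and is not reconstructed.
    return n * 1e-6 / latency(n, t_s, t_t, t_p, p)

# Illustrative parameters only; not measured values.
t_s, t_t, t_p, p = 500.0, 0.08, 40.0, 1024

# Throughput rises with message size and saturates near the
# per-byte limit 1e-6 / (t_t + t_p / p).
limit = 1e-6 / (t_t + t_p / p)
```

This is exactly the behaviour the benchmarks probe: small messages are start-up dominated, and only very large messages approach the channel's peak.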

2.3 Communication Parameters
【3.3 Communication Parameters】

There may be a relationship between some communication parameters and the 
communication performance. The communication performance can be greatly 
improved when appropriate values are set for the parameters. In our 
benchmarks, we focus on two parameters: the message size and the buffer size.
【Some communication parameters may have a dramatic impact on the 
communication performance. The communication performance can be greatly 
improved when appropriate values are used for the parameters. In our 
benchmarks, we study two major parameters: the message size and the 
buffer size.】

3 Experiments
3.1 Testing Environment
In our study, we perform our experiments on an IBM SP2 workstation cluster, 
which consists of 28 RS/6000 nodes interconnected via a network and high 
performance switches, including 4 broad nodes with 512 Mbytes of main 
memory and 24 narrow nodes with 256 Mbytes of main memory. The parallel 
programs are run on the narrow nodes. The operating system is AIX 4.1.5.


3.2 Experiments Results
In this section, we mainly present the results from our experiments 
and then analysis these results. Each data-point in our results is 
average of 10 testing data. More exact to gain, the maximum length of 
messages is 10 Kbytes.

Figure 3. Sending Latency (short messages)
4 Conclusion
【7 Conclusion】

In this paper, we discuss the performance of some publicly available MPI 
implementations on workstation clusters. From the analysis, we can see 
that the software overhead is very high and that it plays a very important 
role in improving the overall network throughput.

【In this paper, we discuss the evaluation of some MPI implementations 
which are currently publicly available on workstation clusters. Our 
results indicate that the software overhead is very high and has to be 
greatly reduced in order to fully exploit the bandwidth of the 
high-speed switch. 】

Among all these implementations, we suggest selecting the buffered mode as 
the best communication mode on the IBM SP2 machine, because this mode can 
efficiently employ the bandwidth of the high performance switches and thus 
improve the overall communication throughput of a program. Certainly, when 
choosing this communication mode you should be careful, because it requires 
a lot of memory.

If the communication is end-to-end, we can replace the standard send and 
receive functions with the sendrecv function, so as to simplify the 
program and to prevent the communication from deadlocking.

Since the space of this paper is limited, we cannot discuss the case of 
performing end-to-end communication with non-blocking communication 
functions rather than with blocking functions. Moreover, some other 
communication modes that MPI provides, such as non-contiguous datatypes 
and pack/unpack, will be discussed later.

【Because of time limitation, we could not conduct an extensive set of 
experiments on different distribution of non-contiguous datatypes. But 
our initial results based on simple vector datatype show that the cost 
of sending non-contiguous datatype is not much higher than sending 
contiguous datatype of the same size. Further investigation on the 
impact of the noncontiguous datatype is needed. We are also 
investigating the performance of other collective communication 
services.】


References
1. M.P.I. Forum. MPI: A Message-Passing Interface Standard. Mar. 1994.
2. B. Gropp, R. Lusk, T. Skjellum, and N. Doss. Portable MPI Model 
Implementation. Argonne National Laboratory, July 1994.
3. N. Nupairoj and L. M. Ni. "Performance Evaluation of Some MPI 
Implementations." Tech. Rep. MSUCPS-ACS-94, Department of Computer 
Science, Michigan State University, Sept. 1994.

(XYS20050313)
