Supercomputing Overview
1. TOP500
https://top500.org/lists/top500/2022/06/
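2. National Supercomputing Centers
Our University's Supercomputer
1. Hardware Resources
2. Software Resources
2.1 Commercial Software
2.2 Open-Source Software
Using the Supercomputer
1. User Guide
2. Applying for an Account
3. Running Jobs
3.1 Install a terminal emulator and log in to the cluster
Download Xshell from: https://www.xshell.com/zh/free-for-home-school/
3.2 Prepare the PBS script
The examples below assume that the script for each type of job is saved in a file named test.pbs.
1) Simplified version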
#PBS -N hello
#PBS -q blades
#PBS -l nodes=2:ppn=20
#PBS -j oe
#PBS -l walltime=0:5:0
cd $PBS_O_WORKDIR
JOBID=`echo $PBS_JOBID | awk -F. '{print $1}'`
echo This job id is $JOBID | tee job_info.log
echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log
echo Start time is `date` | tee -a job_info.log
echo This job runs on the following nodes: | tee -a job_info.log
echo `cat $PBS_NODEFILE | sort | uniq` | tee -a job_info.log
NPROCS=`cat $PBS_NODEFILE | wc -l`
NNODES=`uniq $PBS_NODEFILE | wc -l`
PPROCS=$(($NPROCS/$NNODES))
echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log
uniq $PBS_NODEFILE | sort | sed s/$/:$PPROCS/ > $PBS_O_WORKDIR/hostfile
source /public/software/profile.d/mpi_intelmpi-2017.u1.sh
MPIRUN="mpiexec.hydra -np $NPROCS -ppn $PPROCS -f $PBS_O_WORKDIR/hostfile "
JOBCMD="./hello.intel"
{ time $MPIRUN $JOBCMD; } >$PBS_O_WORKDIR/output_$JOBID.log 2>&1
echo End time is `date`| tee -a job_info.log
rm -f $PBS_O_WORKDIR/hostfile
pkill -P $$
exit 0
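The hostfile step in the script above can be tried outside the scheduler. The following is a minimal sketch, assuming a node file with one line per allocated core (as $PBS_NODEFILE provides); the function name build_hostfile and the demo node names are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# Turn a PBS-style node file (one line per allocated core) into
# "host:slots" lines, mirroring the NPROCS/NNODES/PPROCS logic above.
build_hostfile() {
    local nodefile="$1"
    local nprocs nnodes pprocs
    nprocs=$(wc -l < "$nodefile")            # total allocated cores
    nnodes=$(sort -u "$nodefile" | wc -l)    # distinct nodes
    pprocs=$((nprocs / nnodes))              # cores per node
    sort -u "$nodefile" | sed "s/\$/:$pprocs/"
}

# Hypothetical node file standing in for $PBS_NODEFILE
demo_nodefile=$(mktemp)
printf '%s\n' c1234 c1234 c5678 c5678 > "$demo_nodefile"
build_hostfile "$demo_nodefile"
rm -f "$demo_nodefile"
```

Inside a real job, the same function could be called as build_hostfile "$PBS_NODEFILE" > hostfile.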
2) ANSYS Mechanical
#PBS -N Mechanical_Test
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l walltime=20:0:0
#PBS -q blades
#PBS -j oe
cd $PBS_O_WORKDIR
echo This job id is $PBS_JOBID | tee job_info.log
echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log
cd $PBS_O_WORKDIR
echo Job start time is `date` | tee -a job_info.log
echo This job runs on the following processors: | tee -a job_info.log
echo `cat $PBS_NODEFILE|uniq` | tee -a job_info.log
NPROCS=`wc -l < $PBS_NODEFILE`
NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`
PPROCS=$(($NPROCS/$NNODES))
echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log
machines=`uniq -c ${PBS_NODEFILE} | awk '{print $2 ":" $1}' | paste -s -d ':'`
ANSYS_HOME="/public/software/apps/ansys_inc/v182"
MECHANICAL="${ANSYS_HOME}/ansys/bin/mapdl"
$MECHANICAL -b -dis -mpi INTELMPI -machines ${machines} -j "test" -i test.inp 2>&1 | tee -a mechanical_out.txt
echo End time is `date`| tee -a job_info.log
pkill -P $$
exit 0
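The machines variable in the script above packs the allocation into a single host:count:host:count string. A minimal sketch of that pipeline, assuming a node file with one line per allocated core; the function name machines_string and the node names are hypothetical.

```shell
#!/bin/bash
# Build the "host1:n1:host2:n2" string from a PBS-style node file.
# uniq -c counts consecutive identical lines, so the node file is
# assumed to be grouped by host, as in the script above.
machines_string() {
    uniq -c "$1" | awk '{print $2 ":" $1}' | paste -s -d ':' -
}

# Hypothetical node file standing in for $PBS_NODEFILE
demo_nodefile=$(mktemp)
printf '%s\n' c1234 c1234 c5678 c5678 > "$demo_nodefile"
machines_string "$demo_nodefile"
rm -f "$demo_nodefile"
```

Printing the string before launching the solver is a cheap way to confirm that the allocation matches the nodes and ppn values requested in the script header.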
3) ANSYS LS-DYNA
#PBS -N ansys_lsdyna
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l walltime=72:00:0
#PBS -q blades
#PBS -j oe
cd $PBS_O_WORKDIR
echo This job id is $PBS_JOBID | tee job_info.log
echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log
cd $PBS_O_WORKDIR
echo Job start time is `date` | tee -a job_info.log
echo This job runs on the following processors: | tee -a job_info.log
echo `uniq $PBS_NODEFILE` | tee -a job_info.log
NPROCS=`wc -l < $PBS_NODEFILE`
NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`
PPROCS=$(($NPROCS/$NNODES))
echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log
uniq $PBS_NODEFILE | sort | sed s/$/i:$PPROCS/ > $PBS_O_WORKDIR/hostfile
hostlist=`cat hostfile | xargs | sed "s/ /:/g"`
ANSYS_HOME="/public/software/apps/ansys_inc/v182"
LSDYNA="${ANSYS_HOME}/ansys/bin/ansys182 -lsdynampp -dis -mpi intelmpi -machines $hostlist memory=60000000"
$LSDYNA i=inclinedcylinder.k 2>&1 | tee -a out_lsdyna.txt
echo End time is `date`| tee -a job_info.log
rm -f $PBS_O_WORKDIR/hostfile
pkill -P $$
exit 0
4) MATLAB
#PBS -N MATLAB
#PBS -l nodes=1:ppn=20
#PBS -j oe
#PBS -q blades
#PBS -l walltime=72:0:0
cd $PBS_O_WORKDIR
JOBID=`echo $PBS_JOBID | awk -F. '{print $1}'`
echo This job id is $JOBID | tee job_info.log
echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log
echo Start time is `date` | tee -a job_info.log
echo This job runs on the following nodes: | tee -a job_info.log
echo `cat $PBS_NODEFILE | sort | uniq` | tee -a job_info.log
NPROCS=`cat $PBS_NODEFILE | wc -l`
NNODES=`uniq $PBS_NODEFILE | wc -l`
PPROCS=$(($NPROCS/$NNODES))
echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log
uniq $PBS_NODEFILE | sort | sed s/$/i:$PPROCS/ > $PBS_O_WORKDIR/hostfile
# Add the MATLAB bin directory to PATH
export PATH=$PATH:/public/software/apps/MATLAB/R2018a/bin
# "matlabfile" below is the script name without the ".m" extension
matlab -c 27000@admin1 -nodesktop -nodisplay -r matlabfile > matlab1.out 2>&1
echo End time is `date`| tee -a job_info.log
rm -f $PBS_O_WORKDIR/hostfile
pkill -P $$
exit 0
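The -r option in the script above expects the script name without its ".m" extension. A minimal sketch of deriving that name from a file name in the working directory; the function name matlab_script_name is hypothetical.

```shell
#!/bin/bash
# Strip a trailing ".m" so the result can be passed to "matlab -r".
# Names without the extension are returned unchanged.
matlab_script_name() {
    printf '%s\n' "${1%.m}"
}

matlab_script_name "matlabfile.m"
```

In a job script the result could be used as matlab ... -r "$(matlab_script_name matlabfile.m)", assuming the .m file sits in the working directory.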
5) Fluent
#PBS -N FLUENT
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=20
#PBS -l walltime=24:0:0
#PBS -q blades
#PBS -j oe
cd $PBS_O_WORKDIR
echo This job id is $PBS_JOBID | tee job_info.log
echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log
cd $PBS_O_WORKDIR
echo Job start time is `date` | tee -a job_info.log
echo This job runs on the following processors: | tee -a job_info.log
echo `cat $PBS_NODEFILE` | tee -a job_info.log
NPROCS=`wc -l < $PBS_NODEFILE`
NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`
PPROCS=$(($NPROCS/$NNODES))
echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log
#Generate hostfile for IB
cat $PBS_NODEFILE | uniq | sort > $PBS_O_WORKDIR/hostfile
#Job command
ANSYS_HOME="/public/software/apps/ansys_inc/v180"
FLUENT="${ANSYS_HOME}/fluent/bin/fluent"
$FLUENT 3ddp -t$NPROCS -mpi=intel -cnf=hostfile -g -i inputfile.jou 2>&1 | tee -a fluent.txt
echo End time is `date`| tee -a job_info.log
#rm -f $PBS_O_WORKDIR/hostfile
pkill -P $$
exit 0
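6) More examples
/public/software/pbs_examples
3.3 Transferring files
Use XFTP to upload the PBS script and data files prepared on Windows to your user directory on the cluster. You can also download results from the cluster to a local Windows directory.
1) Open XFTP
In the login window, click the icon shown in the figure to open XFTP.
2) Transfer files
In the window that opens, the left pane shows the local Windows directory and the right pane shows your user directory on the cluster. Drag a file from the left pane to the right pane to upload it to the cluster; drag a file from the right pane to the left pane to download it to the local Windows directory.
3.4 Job management
1) Submit a job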
$ qsub test.pbs
81693.admin-ha
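qsub prints the full job identifier, while the scripts above use only its numeric part (for example in output_$JOBID.log). A minimal sketch of extracting that number; the function name job_number is hypothetical.

```shell
#!/bin/bash
# Keep only the part of a job identifier before the first dot.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# In a real session the identifier would come from: full_id=$(qsub test.pbs)
full_id="81693.admin-ha"
job_number "$full_id"
```

The numeric part can then be passed to qstat or qdel, as in the commands that follow.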
2) Query jobs
$ qstat
Job id            Name        User        Time Use  S  Queue
----------------  ----------  ----------  --------  -  ------
81602.admin1      G09         test        107:36:4  R  blades
81604.admin1      CBN         test        01:18:17  C  fnode
81693.admin1      G63         test        0         Q  blades
Job state codes:
E: exiting   Q: queued   H: held   R: running   C: completed
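The state codes can be decoded automatically when scanning a long listing. A minimal sketch, assuming the six-column qstat layout shown above with two header lines; the function names describe_state and summarize_jobs are hypothetical.

```shell
#!/bin/bash
# Map a one-letter job state to the description listed above.
describe_state() {
    case "$1" in
        E) echo "exiting" ;;
        Q) echo "queued" ;;
        H) echo "held" ;;
        R) echo "running" ;;
        C) echo "completed" ;;
        *) echo "unknown" ;;
    esac
}

# Read a qstat listing on stdin, skip the two header lines, and print
# "jobid description" for each job (state is assumed to be column 5).
summarize_jobs() {
    awk 'NR > 2 {print $1, $5}' | while read -r id state; do
        printf '%s %s\n' "$id" "$(describe_state "$state")"
    done
}

# Sample listing copied from the qstat output above
summarize_jobs <<'EOF'
Job id            Name   User   Time Use  S  Queue
----------------  -----  -----  --------  -  ------
81602.admin1      G09    test   107:36:4  R  blades
81604.admin1      CBN    test   01:18:17  C  fnode
81693.admin1      G63    test   0         Q  blades
EOF
```

On the cluster the same function could be fed directly with qstat | summarize_jobs.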
Show which nodes a job is running on:
$ qstat -n 81602
81602.admin1
c1437/0+c1437/1+c1437/2+c1437/3+c1437/4+c1437/5+c1437/6+c1437/7+c1437/8
+c1437/9+c1437/10+c1437/11+c1437/12+c1437/13+c1437/14+c1437/15+c1437/16
+c1437/17+c1437/18+c1437/19
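The node/core list above can be condensed into a per-node core count. A minimal sketch, assuming the node list has already been joined into a single "host/core+host/core" string (the wrapped lines of qstat -n must be concatenated first); the function name cores_per_node is hypothetical.

```shell
#!/bin/bash
# Count how many cores of each node appear in an exec_host-style string.
cores_per_node() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort | uniq -c |
        awk '{print $2 ":" $1}'
}

cores_per_node "c1437/0+c1437/1+c1437/2+c1437/3"
```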
Show detailed job information:
$ qstat -f 81602
Job Id: 81602.admin1
Job_Name = G09
Job_Owner = test@login1
resources_used.cput = 108:04:40
resources_used.mem = 13133068kb
resources_used.vmem = 16141896kb
resources_used.walltime = 05:48:16
job_state = R
queue = blades
server = admin1
Checkpoint = u
ctime = Tue May 16 22:47:16 2017
Error_Path = login1:/public/home/wu/G09.e81602
exec_host = c1437/0+c1437/1+c1437/2+c1437/3+c1437/4+c1437/5+c1437/6+c1437/….
Hold_Types = n
Join_Path = oe
……
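A single attribute can be pulled out of the detailed listing above with a short filter. A minimal sketch, assuming the "name = value" layout shown above; the function name qstat_field is hypothetical.

```shell
#!/bin/bash
# Print the value of one attribute from "qstat -f" style text on stdin.
# The attribute name is matched literally, so dots in names such as
# resources_used.walltime are not treated as wildcards.
qstat_field() {
    awk -v key="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)                 # drop leading indentation
        if (index(line, key " = ") == 1) {
            print substr(line, length(key) + 4)  # text after "key = "
            exit
        }
    }'
}

# Sample lines copied from the qstat -f output above
qstat_field job_state <<'EOF'
Job Id: 81602.admin1
Job_Name = G09
resources_used.walltime = 05:48:16
job_state = R
queue = blades
EOF
```

On the cluster this could be used as qstat -f 81602 | qstat_field resources_used.walltime.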
3) Delete a job
$ qdel 81693
3.5 Checking the job log (hello.o484890)
You can open the log file on the cluster with vim, or use XFTP to transfer it to Windows and view it there.
This job id is 484890
Working directory is /public/home/songchao/samples/hello
Start time is YYYY年 MM月 DD日 星期五 hh:mm:ss CST
This job runs on the following nodes:
c1234 c5678
This job has allocated 2 nodes, 20 processors.
End time is YYYY年 MM月 DD日 星期五 hh:mm:ss CST
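The allocation line in the log can be checked against what the script requested. A minimal sketch, assuming the log line has exactly the wording written by the example scripts above; the function name allocation_from_log is hypothetical.

```shell
#!/bin/bash
# Read a job log on stdin and print "<nodes> <processors>" taken from
# the "This job has allocated ..." line.
allocation_from_log() {
    sed -n 's/^This job has allocated \([0-9][0-9]*\) nodes, \([0-9][0-9]*\) processors\.$/\1 \2/p'
}

printf '%s\n' "This job has allocated 2 nodes, 20 processors." | allocation_from_log
```

On the cluster the function could read the log directly, for example allocation_from_log < job_info.log.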
3.6 Viewing the job output (output_484890.log)
You can open the output file on the cluster with vim, or use XFTP to transfer it to Windows and view it there.
myid is 1, coming form processor c1234
myid is 2, coming form processor c1234
......
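To see how the MPI ranks were spread over the nodes, the output can be grouped by host name. A minimal sketch, assuming every rank prints a line beginning with "myid is" and ending with the host name, as in the sample above; the function name ranks_per_host is hypothetical.

```shell
#!/bin/bash
# Read program output on stdin and print "host:rank_count" per host.
ranks_per_host() {
    awk '/^myid is/ {print $NF}' | sort | uniq -c | awk '{print $2 ":" $1}'
}

# Sample lines copied from the output above
ranks_per_host <<'EOF'
myid is 1, coming form processor c1234
myid is 2, coming form processor c1234
EOF
```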
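4. Fee Schedule
http://hpc.dlut.edu.cn/yhzx/sfbz.htm
5. Checking Compute-Time Charges
http://hpc.dlut.edu.cn/cjsf/cjsf.htm
6. Paying Compute-Time Fees
http://hpc.dlut.edu.cn/jjsf/jjsf.htm
During the pandemic: have the expense reimbursement form signed and stamped, fill in your supercomputing account name, then photograph the form and send it to Chen Yonggang (ygchen@dlut.edu.cn).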
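Frequently Asked Questions
1) For common questions and answers, see: http://hpc.dlut.edu.cn/yhzx/cjwt1.htm
2) For recordings of past training sessions, see: http://video.dlut.edu.cn/vod-show-detail/273
Attachment: 超算中心培训课件.pdf (Supercomputing Center training slides)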