Basic Training


Supercomputing Overview


1. TOP500

https://top500.org/lists/top500/2022/06/



2. National Supercomputing Centers



Our University's Supercomputer


1. Hardware Resources

2. Software Resources


2.1 Commercial Software

2.2 Open-Source Software

Using the Supercomputer


1. User Guide

2. Applying for an Account

3. Running Jobs


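3.1 Install a terminal emulator and log in to the cluster

Software download: https://www.xshell.com/zh/free-for-home-school/

3.2 Prepare the PBS script

The examples below assume that each job script is saved in a file named test.pbs.


1) Minimal version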


#PBS -N hello

#PBS -q blades

#PBS -l nodes=2:ppn=20

#PBS -j oe

#PBS -l walltime=0:5:0


cd $PBS_O_WORKDIR

JOBID=`echo $PBS_JOBID | awk -F. '{print $1}'`

echo This job id is $JOBID | tee job_info.log

echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log

echo Start time is `date` | tee -a job_info.log

echo This job runs on the following nodes: | tee -a job_info.log

echo `cat $PBS_NODEFILE | sort | uniq` | tee -a job_info.log

NPROCS=`cat $PBS_NODEFILE | wc -l`

NNODES=`uniq $PBS_NODEFILE | wc -l`

PPROCS=$(($NPROCS/$NNODES))

echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log

uniq $PBS_NODEFILE | sort | sed s/$/:$PPROCS/ > $PBS_O_WORKDIR/hostfile

source /public/software/profile.d/mpi_intelmpi-2017.u1.sh


MPIRUN="mpiexec.hydra -np $NPROCS -ppn $PPROCS  -f $PBS_O_WORKDIR/hostfile "

JOBCMD="./hello.intel"

{ time $MPIRUN $JOBCMD; } >$PBS_O_WORKDIR/output_$JOBID.log 2>&1


echo End time is `date`| tee -a job_info.log

rm -f  $PBS_O_WORKDIR/hostfile

pkill -P $$

exit 0
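
The node bookkeeping in the script above can be tried outside PBS. The following is a minimal sketch, assuming a hand-made node file in the usual PBS layout (one line per allocated core). The helper name summarize_nodefile and the sample host names are illustrative and not part of the cluster's tooling.

```shell
#!/bin/bash
# Sketch: reproduce the NNODES / NPROCS / PPROCS arithmetic from the job script
# on an arbitrary node file. summarize_nodefile is a hypothetical helper name.
summarize_nodefile() {
    local nodefile=$1
    local nprocs nnodes
    nprocs=$(( $(wc -l < "$nodefile") ))            # one line per allocated core
    nnodes=$(( $(sort -u "$nodefile" | wc -l) ))    # number of distinct hosts
    echo "$nnodes $nprocs $((nprocs / nnodes))"     # nodes, cores, cores per node
}

# A fake node file standing in for $PBS_NODEFILE: 2 nodes with 3 cores each.
demo=$(mktemp)
printf '%s\n' c1234 c1234 c1234 c5678 c5678 c5678 > "$demo"
summarize_nodefile "$demo"    # prints: 2 6 3
rm -f "$demo"
```

Inside a real job the same function could be called as summarize_nodefile "$PBS_NODEFILE".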



2) ANSYS Mechanical


#PBS -N Mechanical_Test

#PBS -S /bin/bash

#PBS -l nodes=1:ppn=16

#PBS -l walltime=20:0:0

#PBS -q blades

#PBS -j oe


cd $PBS_O_WORKDIR

echo This job id is $PBS_JOBID | tee job_info.log

echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log

cd $PBS_O_WORKDIR

echo Job start time is `date` | tee -a job_info.log

echo This job runs on the following processors: | tee -a job_info.log

echo `cat $PBS_NODEFILE|uniq` | tee -a job_info.log

NPROCS=`wc -l < $PBS_NODEFILE`

NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`

PPROCS=$(($NPROCS/$NNODES))

echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log


machines=`uniq -c ${PBS_NODEFILE} | awk '{print $2 ":" $1}' | paste -s -d ':'`


ANSYS_HOME="/public/software/apps/ansys_inc/v182"

MECHANICAL="${ANSYS_HOME}/ansys/bin/mapdl"

$MECHANICAL -b -dis -mpi INTELMPI -machines ${machines} -j "test" -i test.inp  2>&1 | tee -a mechanical_out.txt


echo End time is `date`| tee -a job_info.log

pkill -P $$

exit 0
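
The script above passes -machines a colon-separated list of host:count pairs built from the node file. The following sketch isolates that step so it can be checked on a sample file before submitting. The helper name build_machines and the host names are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: build the host:count:host:count string used for -machines.
# build_machines is a hypothetical helper name.
build_machines() {
    # uniq -c prints "<count> <host>" for each run of identical lines;
    # awk swaps the two fields, paste joins all lines with ':'.
    uniq -c "$1" | awk '{print $2 ":" $1}' | paste -s -d ':' -
}

# A fake node file standing in for $PBS_NODEFILE.
demo=$(mktemp)
printf '%s\n' c1437 c1437 c1438 > "$demo"
build_machines "$demo"    # prints: c1437:2:c1438:1
rm -f "$demo"
```

The explicit "-" after paste only makes the read from standard input portable; the result is the same as in the job script.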


3) ANSYS LS-DYNA


#PBS -N ansys_lsdyna

#PBS -S /bin/bash

#PBS -l nodes=1:ppn=16

#PBS -l walltime=72:00:0

#PBS -q blades

#PBS -j oe


cd $PBS_O_WORKDIR


echo This job id is $PBS_JOBID | tee job_info.log

echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log

cd $PBS_O_WORKDIR

echo Job start time is `date` | tee -a job_info.log

echo This job runs on the following processors: | tee -a job_info.log

echo `uniq $PBS_NODEFILE` | tee -a job_info.log

NPROCS=`wc -l < $PBS_NODEFILE`

NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`

PPROCS=$(($NPROCS/$NNODES))

echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log


uniq $PBS_NODEFILE | sort | sed s/$/i:$PPROCS/ > $PBS_O_WORKDIR/hostfile

hostlist=`cat hostfile | xargs | sed "s/ /:/g"`


ANSYS_HOME="/public/software/apps/ansys_inc/v182"

LSDYNA="${ANSYS_HOME}/ansys/bin/ansys182 -lsdynampp -dis -mpi intelmpi -machines $hostlist  memory=60000000"

$LSDYNA i=inclinedcylinder.k 2>&1 | tee -a out_lsdyna.txt


echo End time is `date`| tee -a job_info.log

rm -f  $PBS_O_WORKDIR/hostfile

pkill -P $$

exit 0


4) MATLAB

#PBS -N MATLAB

#PBS -l nodes=1:ppn=20

#PBS -j oe

#PBS -q blades

#PBS -l walltime=72:0:0


cd $PBS_O_WORKDIR

JOBID=`echo $PBS_JOBID | awk -F. '{print $1}'`

echo This job id is $JOBID | tee job_info.log

echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log

echo Start time is `date` | tee -a job_info.log

echo This job runs on the following nodes: | tee -a job_info.log

echo `cat $PBS_NODEFILE | sort | uniq` | tee -a job_info.log

NPROCS=`cat $PBS_NODEFILE | wc -l`

NNODES=`uniq $PBS_NODEFILE | wc -l`

PPROCS=$(($NPROCS/$NNODES))

echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log

uniq $PBS_NODEFILE | sort | sed s/$/i:$PPROCS/ > $PBS_O_WORKDIR/hostfile


# Add MATLAB to PATH (alternatively, source your own profile that sets it up)

export PATH=$PATH:/public/software/apps/MATLAB/R2018a/bin


# Pass the script name without the ".m" extension (here the script is matlabfile.m)

matlab -c 27000@admin1 -nodesktop -nodisplay -r matlabfile > matlab1.out 2>&1


echo End time is `date`| tee -a job_info.log

rm -f  $PBS_O_WORKDIR/hostfile

pkill -P $$

exit 0
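
A common precaution in batch runs, not taken from the script above, is to wrap the MATLAB command so that the session always exits and reports failure through its exit status. The following is a minimal sketch under that assumption. The helper name matlab_r_arg is hypothetical; the license flag and options are copied from the job script.

```shell
#!/bin/bash
# Sketch: build a -r argument that runs a script and always exits MATLAB.
# matlab_r_arg is a hypothetical helper; its argument is the script name without .m.
matlab_r_arg() {
    printf 'try, %s, catch e, disp(getReport(e)), exit(1), end, exit(0)' "$1"
}

# Print the command line that would be used instead of running MATLAB here.
echo matlab -c 27000@admin1 -nodesktop -nodisplay -r "\"$(matlab_r_arg matlabfile)\""
```

In a job script the echo would be dropped and the output redirected as before, for example to matlab1.out.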


5) Fluent

#PBS -N FLUENT

#PBS -S /bin/bash

#PBS -l nodes=2:ppn=20

#PBS -l walltime=24:0:0

#PBS -q blades

#PBS -j oe


cd $PBS_O_WORKDIR


echo This job id is $PBS_JOBID | tee job_info.log

echo Working directory is $PBS_O_WORKDIR | tee -a job_info.log

cd $PBS_O_WORKDIR

echo Job start time is `date` | tee -a job_info.log

echo This job runs on the following processors: | tee -a job_info.log

echo `cat $PBS_NODEFILE` | tee -a job_info.log

NPROCS=`wc -l < $PBS_NODEFILE`

NNODES=`cat $PBS_NODEFILE | sort | uniq | wc -l`

PPROCS=$(($NPROCS/$NNODES))

echo This job has allocated $NNODES nodes, $NPROCS processors.| tee -a job_info.log


#Generate hostfile for IB

cat $PBS_NODEFILE | uniq | sort > $PBS_O_WORKDIR/hostfile


#Job command

ANSYS_HOME="/public/software/apps/ansys_inc/v180"

FLUENT="${ANSYS_HOME}/fluent/bin/fluent"

$FLUENT 3ddp -t$NPROCS -mpi=intel -cnf=hostfile -g -i inputfile.jou  2>&1 | tee -a fluent.txt


echo End time is `date`| tee -a job_info.log

rm -f  $PBS_O_WORKDIR/hostfile

pkill -P $$

exit 0


6) More examples

/public/software/pbs_examples


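3.3 Transfer files

Use XFTP to upload the PBS script and data files prepared on Windows to your home directory on the cluster. It can also download results from the cluster to a local Windows directory.


1) Open XFTP

In the login session window, click the XFTP icon (shown in the figure) to launch XFTP.

2) Transfer files

In the window that opens, the left pane shows the local Windows directory and the right pane shows your directory on the cluster. Drag a file from left to right to upload it from Windows to the cluster; drag a file from right to left to download it from the cluster to the local Windows directory.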
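
Files edited on Windows often carry CRLF line endings, which can make the shell misread a PBS script after upload. The following sketch removes the carriage returns in place. The helper name strip_crlf is hypothetical, and dos2unix does the same job where it is installed.

```shell
#!/bin/bash
# Sketch: remove Windows carriage returns from a file before submitting it.
# strip_crlf is a hypothetical helper name.
strip_crlf() {
    local src=$1 tmp
    tmp=$(mktemp) || return 1
    # Write the cleaned text to a temporary file, then copy it back so the
    # original file keeps its name and permissions.
    tr -d '\r' < "$src" > "$tmp" && cat "$tmp" > "$src"
    rm -f "$tmp"
}
```

After uploading, a call such as strip_crlf test.pbs would clean the script before it is passed to qsub.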


3.4 Job management


1) Submit a job

$ qsub test.pbs

81693.admin-ha
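
The identifier printed by qsub has the form number.server, while the later commands in this section use the number alone. The following sketch strips the server suffix with shell parameter expansion. The helper name short_job_id is hypothetical.

```shell
#!/bin/bash
# Sketch: strip the server suffix from the identifier printed by qsub.
# short_job_id is a hypothetical helper name.
short_job_id() {
    printf '%s\n' "${1%%.*}"    # drop everything from the first dot onward
}

short_job_id 81693.admin-ha    # prints: 81693
```

In a live session the identifier could be captured with jobid=$(qsub test.pbs) and then passed to the helper.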


2) Query jobs

$ qstat


Job id            Name   User   Time Use   S   Queue
----------------  -----  -----  ---------  -   ------
81602.admin1      G09    test   107:36:4   R   blades
81604.admin1      CBN    test   01:18:17   C   fnode
81693.admin1      G63    test   0          Q   blades


Job states:

E: exiting      Q: queued      H: held      R: running      C: completed
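
The state letters listed above can be expanded in a small helper when post-processing qstat output. This is a minimal sketch; the helper name describe_state is hypothetical, and only the five states listed here are covered.

```shell
#!/bin/bash
# Sketch: translate a qstat state letter into a readable word.
# describe_state is a hypothetical helper name.
describe_state() {
    case "$1" in
        E) echo "exiting" ;;
        Q) echo "queued" ;;
        H) echo "held" ;;
        R) echo "running" ;;
        C) echo "completed" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: expand the states that appear in the sample listing.
for s in R C Q; do
    printf '%s: %s\n' "$s" "$(describe_state "$s")"
done
```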


Show which nodes a job is running on:

$ qstat -n 81602

81602.admin1


c1437/0+c1437/1+c1437/2+c1437/3+c1437/4+c1437/5+c1437/6+c1437/7+c1437/8

+c1437/9+c1437/10+c1437/11+c1437/12+c1437/13+c1437/14+c1437/15+c1437/16

+c1437/17+c1437/18+c1437/19
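
The node list above has the form host/core+host/core+... and is easier to read once it is summarized per host. The following sketch does that for a string in this format. The helper name cores_per_node is hypothetical, and the input is assumed to have been joined into a single line first.

```shell
#!/bin/bash
# Sketch: count allocated cores per node in a string such as c1437/0+c1437/1.
# cores_per_node is a hypothetical helper name.
cores_per_node() {
    printf '%s\n' "$1" \
        | tr '+' '\n' \
        | cut -d/ -f1 \
        | sort | uniq -c \
        | awk '{print $2, $1}'    # print "host count"
}

cores_per_node "c1437/0+c1437/1+c1437/2"    # prints: c1437 3
```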


Show detailed job information:

$ qstat -f 81602


Job Id: 81602.admin1

Job_Name = G09

Job_Owner = test@login1

resources_used.cput = 108:04:40

resources_used.mem = 13133068kb

resources_used.vmem = 16141896kb

resources_used.walltime = 05:48:16

job_state = R

queue = blades

server = admin1

Checkpoint = u

ctime = Tue May 16 22:47:16 2017

Error_Path = login1:/public/home/wu/G09.e81602

exec_host = c1437/0+c1437/1+c1437/2+c1437/3+c1437/4+c1437/5+c1437/6+c1437/….

Hold_Types = n

Join_Path = oe

……
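
Single fields can be pulled out of the name = value listing above with a short filter. The following is a minimal sketch that reads such text from standard input. The helper name qstat_field is hypothetical, and values that wrap onto continuation lines are not handled.

```shell
#!/bin/bash
# Sketch: print the value of one "name = value" field from qstat -f style text.
# qstat_field is a hypothetical helper name.
qstat_field() {
    awk -v key="$1" -F' = ' '
        { name = $1; gsub(/^[ \t]+/, "", name) }    # trim leading indentation
        name == key { print $2; exit }
    '
}

# Usage on a pasted sample; in a live session the input would come from qstat -f.
printf '%s\n' 'Job Id: 81602.admin1' '    Job_Name = G09' '    job_state = R' '    queue = blades' \
    | qstat_field job_state    # prints: R
```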


3) Cancel a job

$ qdel 81693


3.5 View the job log (hello.o484890)

You can open the log file on the cluster with vim, or use XFTP to copy it back to Windows and view it there.


This job id is 484890

Working directory is /public/home/songchao/samples/hello

Start time is YYYY年 MM月 DD日 星期五 hh:mm:ss CST

This job runs on the following nodes:

c1234 c5678

This job has allocated 2 nodes, 20 processors.

End time is YYYY年 MM月 DD日 星期五 hh:mm:ss CST


3.6 View the job output (output_484890.log)

You can open the output file on the cluster with vim, or use XFTP to copy it back to Windows and view it there.


myid is 1, coming form processor c1234

myid is 2, coming form processor c1234

......


4. Fee Schedule

http://hpc.dlut.edu.cn/yhzx/sfbz.htm


5. Checking Machine-Time Charges

http://hpc.dlut.edu.cn/cjsf/cjsf.htm


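6. Paying Machine-Time Charges

http://hpc.dlut.edu.cn/jjsf/jjsf.htm

During the pandemic, have the expense reimbursement form signed and stamped, fill in your supercomputer account name, then photograph the form and send it to Chen Yonggang (ygchen@dlut.edu.cn).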




Frequently Asked Questions

1) For common questions and answers, see: http://hpc.dlut.edu.cn/yhzx/cjwt1.htm

2) For recordings of past training sessions, see: http://video.dlut.edu.cn/vod-show-detail/273




Attachment: 超算中心培训课件.pdf (Supercomputing Center training slides)