Distributed Computing with the TORQUE Resource Manager - Part 2

The aim of this article is to provide an overview of job submission and control within the PBS/TORQUE framework.


By: Juno Kim on 21/10/2013 | Blog: http://www.kim.eti.br


Example Jobs for TORQUE



Below are simple, intermediate, and advanced example scripts for TORQUE:

Example 1:

#!/bin/bash

# Name of my job:
#PBS -N My-Program

# Run for 1 hour:
#PBS -l walltime=1:00:00

# Where to write stderr:
#PBS -e myprog.err

# Where to write stdout:
#PBS -o myprog.out

# Send me email when my job aborts, begins, or ends
#PBS -m abe

# This command switches to the directory from which the "qsub" command was run:

cd $PBS_O_WORKDIR

# Now run my program
./myprog argument1 argument2

echo Done!
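
With the script saved as, say, "myprog.sh" (an illustrative filename), the job can be submitted and tracked with the standard TORQUE commands; the job ID below is also illustrative:

# Submit the job; qsub prints the job ID, e.g. 1234.server
qsub myprog.sh

# Check the job's state (Q = queued, R = running, C = completed)
qstat 1234

# Remove the job from the queue if it is no longer needed
qdel 1234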

Example 2:

#!/bin/bash
#PBS -q ccs_short
#PBS -N my_serial_job
#PBS -l walltime=01:00:00
#PBS -l nodes=1:noib:ppn=1
#PBS -m e
#PBS -M user@tulane.edu


echo Start Job
cd /u00/scratch/run1
date
pwd
./a.out
echo End Job
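
Because this script sets neither "-e" nor "-o", PBS falls back to its default behavior: stdout and stderr are returned as files named after the job and its numeric ID (the ID below is illustrative):

# After the job finishes, inspect the output files:
cat my_serial_job.o4567   # stdout: the echo/date/pwd output above
cat my_serial_job.e4567   # stderr: empty if a.out ran cleanly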

Example 3:

#!/bin/sh
###
### Skeleton of a PBS script for submitting sequential or parallel
### jobs on the PSI batch cluster:
###
### https://www.millennium.berkeley.edu/PSI
###
### You MUST edit this skeleton and insert the parameters and commands
### for your program.
###
### PBS is the name of the batch scheduler protocol. The batch
### scheduler is responsible for assigning jobs to nodes in a cluster.
### It tries to make sure that jobs don't fight for processing
### resources, so that they can get the best performance. Without a
### batch scheduler, you can never be sure that some other job might
### interrupt your job on a node.
###
### A PBS script is a standard *nix shell script, with special
### commands for the PBS batch scheduler. Lines beginning with "#PBS"
### are NOT comments; they are PBS commands. You can comment them out
### by putting "### " in front of them.
###
### Once you've written the script, you can submit the job using
### the qsub command. See, for example,
###
### http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
###
### For example, if your script is called "pbs.sh", run the following:
###
### qsub pbs.sh
###
### Note that the PBS script itself only runs once, on one compute
### node. It is used to start either a process (which may optionally
### create threads on that node), or to call gexec or mpirun in order
### to start multiple MPI processes (which is only for MPI programs).
###

### Set the job name
#PBS -N myprogram

### Declare myprogram non-rerunable
#PBS -r n

### Uncomment to send email when the job is completed:
### #PBS -m ae
### #PBS -M your@email.address

### Optionally specify destinations for your program's output.
### Specify localhost and an NFS filesystem to prevent file copy errors.
### Without these lines or if the paths are invalid, PBS will redirect
### stdout and stderr to files in your home directory.
### #PBS -e localhost:/work/username/test.err
### #PBS -o localhost:/work/username/test.log

### Similar optional syntax may work for staging files in and out:
### #PBS -W stagein=/scratch/file.in@localhost:/full/file.in
### #PBS -W stageout=/scratch/file.out@localhost:/full/file.out

### Set the queue to "batch", "psi", or "zen".
### psi/batch serves the newer 8-core/48GB and 24-core/256GB nodes
### zen serves the older 3GHz/2-core/3GB and 8-core/16GB nodes
#PBS -q batch
###PBS -q psi
###PBS -q zen

### Specify the number of cpus for your job. This example will run on 16 cpus
### using 8 nodes with 2 processes per node.
### You MUST specify some number of nodes or Torque will fail to load balance.
#PBS -l nodes=8:ppn=2

### You should tell PBS how much memory you expect your job will use. mem=1g or mem=1024m
#PBS -l mem=256m

### Some portions of some clusters, such as older nodes in the PSI BATCH cluster,
### support Myrinet GM which is a faster interface for doing internode communication
### with MPI programs.
### If you want to run your MPI program with GM, you should add one of the following
### flags (probably just the "gm" flag) to ensure that you are assigned nodes
### with Myrinet cards:
###
### gm
### gm_pci64b
### gm_pcixd
###
### For example, to run on 4 processes on each of 2 nodes with Myrinet cards you write
### #PBS -l nodes=2:ppn=4:gm

### Other options may be available too. For example, to run on only 3GHz PSI
### nodes, add "cpu3000" to your nodes line.
###
### Valid options are:
### PSI 24-core: cpu2000 nehalem nehalemEx beckton mem256g
### PSI 8-core: cpu2667 nehalem gainestown mem48g
### ZEN 8-core: cpu2333 pe1950 mem16g
### ZEN 2-core: cpu3000 gm gm_pci64b pe1850 mem3g
### #PBS -l nodes=1:ppn=24:cpu2000 # for an exclusive reservation of a 24-core node (PSI cluster)
### #PBS -l nodes=1:ppn=8:cpu2667 # for an exclusive reservation of an 8-core node (PSI cluster)
### #PBS -l nodes=1:ppn=1:cpu2667 # use one core of an 8-core node (PSI cluster)
### #PBS -l nodes=1:ppn=8:cpu2333 # for an exclusive reservation of an 8-core node (ZEN cluster)
### #PBS -l nodes=1:ppn=2:cpu3000 # for an exclusive reservation of a 2-core node (ZEN cluster)
### #PBS -l nodes=8:ppn=2:cpu3000 # for reserving eight 2-core nodes (ZEN cluster)

### You can override the default 1 hour real-world time limit and 24-hour CPU-time.
### Shortening the default CPU-time may improve your chances of running as a back-fill job.
### Check out the current back-fill availability by running /usr/local/maui/bin/showbf on psi.
### or see the tail end of the status page: http://freak.millennium.berkeley.edu/psi/status.html
### -l walltime=HH:MM:SS and -l cput=HH:MM:SS
### Jobs on the public clusters are currently limited to 10 days walltime.
#PBS -l walltime=1:00:00
#PBS -l cput=24:00:00

### Switch to the working directory; by default Torque launches processes from your home directory.
### Jobs should only be run from /work; Torque returns results via NFS.

echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

### Run some informational commands.
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`

### Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS cpus

### Use gexec to run a program on the assigned processors. $GEXEC_SVRS is automatically defined for you.
gexec -n 0 myprogram

### Alternatively, run a parallel MPI executable.
mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS mympiprogram

### Or, if you're running on only one node, run your executable directly:
/some/path/to/a/binary
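
Before committing a long batch run to the queue, the same "-l" resource syntax can be tested interactively: "qsub -I" allocates the requested nodes and opens a shell on one of them (the values below are illustrative):

# Request an interactive session: 1 node, 2 cores, 30 minutes
qsub -I -l nodes=1:ppn=2,walltime=00:30:00
# Test your commands by hand on the compute node, then "exit" to release it.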

From here, we can look into a few more features and take our knowledge further.
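
A good starting point for that exploration is TORQUE's own query tools, for example:

qstat -q                  # list the queues and their limits
qstat -f 1234             # full attributes of one job (illustrative ID)
pbsnodes -a               # state and properties of every compute node
qmgr -c 'print server'    # dump the server's queue and policy configuration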

Thank you, and I hope this helps everyone!
