

4.2 How to use batch processing data analysis servers

The batch processing data analysis system consists of two middle-range batch processing data analysis servers (kaibm[01-02].ana.nao.ac.jp). These servers execute batch processes under a job management system. For this purpose, PBS Professional (PBS: Portable Batch System) is installed on them: "kaibm01.ana.nao.ac.jp" works as the PBS management server, and "kaibm[01-02].ana.nao.ac.jp" work as calculation servers. The PBS management server allocates jobs to the calculation servers so that jobs that need many CPU cores or a large amount of memory are executed efficiently.
Figure 4.1: An illustrated outline of the batch processing data analysis servers.


4.2.1 System configuration

The batch processing data analysis servers consist of two servers (FUJITSU Server PRIMERGY RX2530 M2). Each server runs Red Hat Enterprise Linux 7.
Table 4.6: Specification of the batch processing data analysis server
Host name   kaibm[01-02].ana.nao.ac.jp
Machine     FUJITSU Server PRIMERGY RX2530 M2
Quantity    2
OS          Red Hat Enterprise Linux 7
CPU         Intel Xeon E5 2667 V4 3.2 GHz 16 core
RAM         DDR4 2400 RDIMM 192GB


4.2.2 Queue configuration

PBS Professional has job queues, which control the execution order of jobs. When a user submits a job to one of the queues, the PBS management server judges whether the calculation servers can execute the job. If they cannot, the job waits in the queue. The usable computational resources and the execution priority differ from queue to queue. Select a queue that matches the scale of your job, because using an unsuitable queue wastes computational resources.

Table 4.7: Job queue configuration of the middle-range batch processing data analysis servers
Queue  CPU cores  Usable memory per job  Time limit per job  Executable jobs per user
q1     1          11GB                   30 days             32 (soft-limit: 2)
q4     4          44GB                   30 days             8 (soft-limit: 1)
q8     8          88GB                   15 days             4 (soft-limit: 1)
q16    16         176GB                  15 days             2 (soft-limit: 1)
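
The choice of queue can be sketched as a small shell helper. This is only an illustration: "choose_queue" is a hypothetical function, not part of PBS Professional, and it assumes that the limits of Table 4.7 are the only criteria. It returns the smallest queue whose CPU core count and usable memory cover the request.

```shell
#!/bin/bash
# Hypothetical helper: pick the smallest queue of Table 4.7 that fits a job.
# Arguments: required CPU cores, required memory in GB (both integers).
choose_queue() {
    local cores=$1 mem_gb=$2
    local name qcores qmem
    # Each line below is "queue cores memory_GB", copied from Table 4.7.
    while read -r name qcores qmem; do
        if [ "$cores" -le "$qcores" ] && [ "$mem_gb" -le "$qmem" ]; then
            echo "$name"
            return 0
        fi
    done <<EOF
q1 1 11
q4 4 44
q8 8 88
q16 16 176
EOF
    echo "no queue can hold $cores cores and ${mem_gb}GB" >&2
    return 1
}
```

For example, "choose_queue 4 20" prints "q4", and a job that needs 2 cores but 100GB of memory is directed to "q16" because of its memory requirement.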


4.2.3 Tutorial

To use the batch processing data analysis servers, write a shell script called a PBS script and submit it to a job queue with the "qsub" command on the middle-range or high-end interactive data analysis servers. This section introduces the basic steps for submitting jobs.
  1. How to make a PBS script
  2. How to submit and delete a job
  3. How to display job status


1. How to make a PBS script

A PBS script is a shell script that contains directives (orders) for PBS Professional and the programs to execute. The following PBS script executes the program "a.out" on the queue "q1".
#!/bin/bash
#PBS -q q1
#PBS -m abe
#PBS -M taro.tenmon@nao.ac.jp
# Go to this job's working directory
cd "$PBS_O_WORKDIR"
# Run your executable
./a.out
The "#PBS" lines are directives for PBS Professional. This script uses the following directives; others are introduced in Section 4.2.4.
#PBS -q q1: The job is submitted to the queue "q1".
#PBS -m abe: An E-mail is sent when the job is aborted (a), when it begins execution (b), and when it ends (e).
#PBS -M taro.tenmon@nao.ac.jp: The E-mail is sent to taro.tenmon@nao.ac.jp. Please use an E-mail address issued by your institution.
"$PBS_O_WORKDIR" is an environment variable defined by PBS Professional. It holds the path of the directory from which the job was submitted. Several programs can be executed concurrently by appending "&" to each command, which runs it in the background.
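
If you write many similar PBS scripts, a small generator can reduce typing mistakes. The following is a minimal sketch, not a tool provided on the servers: "make_pbs_script" is a hypothetical function that prints a script with the same structure as the example above for a given queue, E-mail address, and program.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal PBS script to standard output.
# Arguments: queue name, E-mail address, command line of the program.
make_pbs_script() {
    local queue=$1 mail=$2 program=$3
    # \$ keeps PBS_O_WORKDIR unexpanded so that it is evaluated inside the job.
    cat <<EOF
#!/bin/bash
#PBS -q $queue
#PBS -m abe
#PBS -M $mail
# Go to the directory from which the job was submitted
cd "\$PBS_O_WORKDIR" || exit 1
# Run your executable
$program
EOF
}
```

A possible use is "make_pbs_script q1 taro.tenmon@nao.ac.jp ./a.out > Your_PBS_Script.sh", followed by a check of the generated file before submission.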


2. How to submit and delete a job

We can submit a job to a queue by executing the "qsub" command.
$ qsub Your_PBS_Script.sh
The submitted job can be deleted by executing the "qdel" command. As we shall see later, the Job_ID can be seen by executing the "qstat" command.
$ qdel Job_ID
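
Because "qsub" prints the identifier of the new job on standard output, the Job_ID can be saved at submission time and passed to "qdel" later. The sketch below illustrates this. "submit_and_record", the log file name, and the "QSUB_CMD" variable are hypothetical; "QSUB_CMD" exists only so that the function can be tried with a substitute command where PBS is not installed.

```shell
#!/bin/bash
# Hypothetical helper: submit a PBS script and record its job ID.
# Arguments: PBS script, optional log file (default: submitted_jobs.txt).
submit_and_record() {
    local script=$1 log=${2:-submitted_jobs.txt}
    local job_id
    # Uses "qsub" unless QSUB_CMD names another command (for dry runs).
    job_id=$("${QSUB_CMD:-qsub}" "$script") || return 1
    # Keep "job ID <TAB> script" so that the job can be found again later.
    printf '%s\t%s\n' "$job_id" "$script" >> "$log"
    echo "$job_id"
}
```

In a session this could be used as job_id=$(submit_and_record Your_PBS_Script.sh), and the job could later be deleted with qdel "$job_id".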


3. How to display job status

We can display the status of jobs submitted to the queues by executing the "qstat" command.
$ qstat
Job id            Name       User       Time Use  S  Queue
----------------  ---------  --------   -------   -  -----
9013.a000         job1       user1      50:20:10  R  q1
9019.a000         job2       user2      40:32:13  R  q1
9030.a000         job3       user3      30:14:19  R  q1
9079.a000         job4       user4      00:59:15  R  q1
9102.a000         job5       user5             0  Q  q1
The columns show the job ID, job name, user name, CPU time used, job state, and queue name, respectively. A job is in one of the following states.
E (Exiting): The job has finished execution and is exiting.
H (Held): The job is held and will not run until it is released.
Q (Queued): The job is waiting in the queue to be executed.
R (Running): The job is running.
S (Suspended): The job has been suspended.
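
When many jobs are listed, a per-state summary is easier to read than the full output. The following sketch counts jobs by state. "count_states" is a hypothetical helper, and it assumes the six-column default layout shown above, in which the state letter is the second-to-last field.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state in default "qstat" output.
# Reads the listing from standard input; the two header lines are skipped.
count_states() {
    awk 'NR > 2 && NF >= 6 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

It is meant to be used as "qstat | count_states". For the listing above it would report 1 job in state Q and 4 jobs in state R.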


4.2.4 The details of the PBS Professional

This section introduces the details of PBS Professional as installed on the batch processing data analysis servers.
  1. About the PBS Professional
  2. How to control jobs
  3. Examples of the PBS script
  4. The orders for the PBS Professional
  5. The environment variables of the PBS Professional
  6. The way to display the job status
  7. About a priority control of the jobs
  8. About a Round-robin type control


1. About the PBS Professional

PBS Professional is a distributed workload management system that manages and monitors the computational workload on one or more computers. It serves the following three main purposes:
Queuing
Jobs that a user submits to the resource management system wait in a queue until their execution starts.
Scheduling
Selecting when and where each job is executed, according to a policy defined in advance.
Monitoring
Tracking and reserving system resources and enforcing the resource usage policy.


2. How to control jobs

Submit a job

We can submit a job to a queue by executing the "qsub" command.
$ qsub PBS_script.sh
You can pass variables to your PBS script with the "-v" option; they are set as environment variables of the job. The format and an example are shown below.
$ qsub -v var1=val,var2=val,var3=val PBS_script.sh
$ qsub -v x=10,y=20,char=abc my_pbs.sh
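
Inside the PBS script, the variables passed with "-v" can be read like any other shell variable. The following is a minimal sketch under that assumption. The function name "describe_params" and the default values are hypothetical; the defaults are used only when a variable was not supplied.

```shell
#!/bin/bash
# Hypothetical fragment of my_pbs.sh: read the variables x, y, and char.
describe_params() {
    # ${name:-default} falls back to a default when the variable is unset.
    local x_val=${x:-1} y_val=${y:-2} char_val=${char:-none}
    echo "x=$x_val y=$y_val char=$char_val sum=$((x_val + y_val))"
}
describe_params
```

With the "qsub -v x=10,y=20,char=abc" example above, this fragment would print "x=10 y=20 char=abc sum=30".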
If you submit a job without a PBS script, specify the directives for PBS Professional, the full path to your executable, and the arguments of the executable on the "qsub" command line. The format and an example are shown below.
$ qsub directives -- your_executable_with_full_path arg1 arg2
$ qsub -q q1 -m abe -M taro.tenmon@nao.ac.jp -- /lfs01/tenmontr/my_prog.out 3.14 2.71
For details of the "qsub" command, see its manual page.
$ man qsub

Delete a job

The submitted job can be deleted by executing the "qdel" command. As we shall see later, the Job_ID can be seen by executing the "qstat" command.
$ qdel Job_ID

Hold a job

Queued jobs can be held by executing the "qhold" command.
$ qhold Job_ID

Release a job

Held jobs can be released by executing the "qrls" command.
$ qrls Job_ID


3. Examples of the PBS script

We show several examples of the PBS script.
#!/bin/sh
# An example of a PBS script using a single core.
#PBS -r y
#PBS -m abe
#PBS -q q1
#PBS -o Log.out
#PBS -e Log.err
#PBS -N job_name
#PBS -M taro.tenmon@nao.ac.jp
# Go to this job's working directory
cd "$PBS_O_WORKDIR"
# Run your executable
./a.out

#!/bin/bash
# An example of a PBS script using multiple cores.
#PBS -r y
#PBS -m abe
#PBS -q q4
#PBS -o Log.out
#PBS -e Log.err
#PBS -N job_name
#PBS -M taro.tenmon@nao.ac.jp
# Go to this job's working directory
cd "$PBS_O_WORKDIR"
# Run your executables in parallel, one per core
./a_0.out &
./a_1.out &
./a_2.out &
./a_3.out
# Wait until all background programs have finished
wait
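
When several programs run in the background, the script should not end before all of them have finished, because a job normally ends when its script exits. The sketch below shows one way to start several commands, wait for each of them, and report how many failed. "run_all" is a hypothetical helper, not something provided by PBS Professional.

```shell
#!/bin/bash
# Hypothetical helper: run each argument as a command in the background,
# wait for all of them, and report the number of failures.
run_all() {
    local pids="" failed=0 cmd pid
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"   # remember the process ID of each background command
    done
    # Wait for every process so that the script does not exit too early.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed command(s) failed"
    [ "$failed" -eq 0 ]
}
```

In the multi-core example, the four programs could then be started with "run_all ./a_0.out ./a_1.out ./a_2.out ./a_3.out". The function returns a non-zero status if any program fails.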


4. The orders for the PBS Professional

Many directives (orders) for PBS Professional can be used in a PBS script. The most commonly used ones are described below.

PBS -q

This directive specifies the queue to which the job is submitted. If you omit it, PBS Professional assumes "#PBS -q q1".

PBS -r

This directive specifies whether PBS Professional reruns the job after the system is restored. If you omit it, PBS Professional assumes "#PBS -r y".

PBS -m

This directive specifies when PBS Professional sends you an E-mail: when the job is aborted (a), when it begins execution (b), and when it ends (e). If you omit it, PBS Professional assumes "#PBS -m a".

PBS -M

This directive specifies the E-mail address to which E-mail from PBS Professional is sent. If you specify "#PBS -m [abe]", you must also specify "#PBS -M Your_E-mail_Address".

PBS -N

This directive specifies the name of the job. The name is displayed when the "qstat" command is executed. The job name must be a sequence of fewer than 15 ASCII alphanumeric characters and must begin with a letter rather than a digit. If you omit this directive, the file name of the PBS script is used as the job name.

PBS -o

This directive specifies the file to which the standard output of the job is written.

PBS -e

This directive specifies the file to which the standard error of the job is written.

PBS -l

This directive sets limits on the computational resources that the job uses.


5. The environment variables of the PBS Professional

The environment variables defined by PBS Professional can be used in a PBS script. Typical variables are listed below.
PBS_O_WORKDIR: The path of the directory from which the job was submitted.
PBS_JOBNAME: The name of the job.
PBS_JOBID: The job ID of the job.
PBS_O_HOME: The home directory of the user who submitted the job.
PBS_O_QUEUE: The name of the queue to which the job was submitted.
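
These variables are convenient for writing a short header into the job's log. The following sketch prints them at the start of a job. "log_job_context" is a hypothetical function, and the fallback values are assumptions that only matter when the script is tried outside PBS, where the variables are not set.

```shell
#!/bin/bash
# Hypothetical helper: print the job context from the PBS environment variables.
log_job_context() {
    # Outside PBS the variables are unset, so placeholder values are shown.
    echo "job:     ${PBS_JOBNAME:-unknown} (${PBS_JOBID:-no job ID})"
    echo "queue:   ${PBS_O_QUEUE:-unknown}"
    echo "workdir: ${PBS_O_WORKDIR:-$PWD}"
    echo "home:    ${PBS_O_HOME:-$HOME}"
}
log_job_context
```

Calling the function near the top of a PBS script records in the standard output file which job and queue produced the results.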


6. The way to display the job status

The "qstat" command displays the status of jobs submitted to the queues. It has many options; typical ones are shown below.

qstat

The "qstat" command without options displays the following output.
$ qstat
Job id            Name       User       Time Use  S  Queue
----------------  ---------  --------   -------   -  -----
9013.a000         job1       user1      50:20:10  R  q1
9019.a000         job2       user2      40:32:13  R  q1
9030.a000         job3       user3      30:14:19  R  q1
9079.a000         job4       user4      00:59:15  R  q1
9102.a000         job5       user5             0  Q  q1
The state of each job is shown in the "S" column.
E (Exiting): The job has finished execution and is exiting.
H (Held): The job is held and will not run until it is released.
Q (Queued): The job is waiting in the queue to be executed.
R (Running): The job is running.
S (Suspended): The job has been suspended.

qstat -Q

The "qstat -Q" command displays the information of the queues available in the batch processing data analysis servers.
$ qstat -Q
Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
q1                   0     0 yes yes     0     0     0     0     0     0 Exec
q4                   0     0 yes yes     0     0     0     0     0     0 Exec
q8                   0     0 yes yes     0     0     0     0     0     0 Exec
q16                  0     0 yes yes     0     0     0     0     0     0 Exec

qstat -q

The "qstat -q" command displays the information of the queues in a different format.
$ qstat -q
Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
q1                 11gb    --    720:00:0  --      0     0   --   E R
q4                 44gb    --    720:00:0  --      0     0   --   E R
q8                 88gb    --    360:00:0  --      0     0   --   E R
q16               176gb    --    360:00:0  --      0     0   --   E R
                                               ----- -----
                                                   0     0
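
The Walltime column appears to give the time limit as hours:minutes:seconds, so "720:00:0" and "360:00:0" correspond to the 30-day and 15-day limits of Table 4.7. The conversion can be sketched as follows. "walltime_to_days" is a hypothetical helper that uses only the hours field and drops any remainder.

```shell
#!/bin/bash
# Hypothetical helper: convert a walltime such as "720:00:0" to whole days.
walltime_to_days() {
    # Keep only the text before the first ":" (the hours field).
    local hours=${1%%:*}
    echo $(( hours / 24 ))
}
walltime_to_days "720:00:0"
```

For the queues above, "walltime_to_days 720:00:0" gives 30 and "walltime_to_days 360:00:0" gives 15.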

qstat -u

The "qstat -u" command displays only the jobs of the specified user.
$ qstat -u your_account

qstat -r

The "qstat -r" command displays only running jobs.
$ qstat -r

man qstat

For details of the "qstat" command, see its manual page.
$ man qstat


7. About a priority control of the jobs

If a user submits more jobs than the soft limit on the number of executable jobs defined for the queue, the priority of that user's excess jobs is lowered. This priority control uses the soft-limit function of PBS Professional. The following settings are an example used to explain the priority control.
set queue q1 max_run = [u:PBS_GENERIC=8]
set queue q1 max_run_soft = [u:PBS_GENERIC=4]
If other users submit up to 8 jobs while low-priority jobs are running, the low-priority jobs are suspended and wait until the other users' jobs finish. Suspended jobs are shown as "S" in the output of the "qstat" command.
Job id            Name       User          Time Use  S  Queue
----------------  ---------  -----------   -------   -  -----
9013.a000         myjob      someuser      01:02:10  S  q1

Examples of the priority control

The following example shows the priority control when several users submit jobs to the queue "q1". We assume that 8 CPU cores are available in "q1", the maximum number of executable jobs is 8, and the soft-limit is 4.

1. If user "A" submits 11 jobs and no other user has submitted jobs, 8 jobs are executed.

Running : A A A A A' A' A' A'    The 5th to 8th jobs from the left (A') have low priority.
Queued  : A” A” A”               The 9th and subsequent jobs (A”) wait in the queue.


2. If user "B" then submits 4 jobs, the low-priority jobs (A') are suspended.

Running   : A A A A B B B B    Jobs B are executed in place of jobs A'.
Queued    : A” A” A”
Suspended : A' A' A' A'        Jobs A' are suspended.


3. When jobs B finish, jobs A' resume execution.

Running : A A A A A' A' A' A'    Jobs A' resume execution.
Queued  : A” A” A”
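
The counts in step 1 follow from simple arithmetic on the two limits. The sketch below reproduces that arithmetic for a single user on an otherwise idle queue. "classify_jobs" is a hypothetical helper that only illustrates the example; it is not how PBS Professional is implemented, and it assumes that enough CPU cores are free for every job up to max_run.

```shell
#!/bin/bash
# Hypothetical helper: split one user's jobs into normal-priority running,
# low-priority running, and queued jobs.
# Arguments: submitted jobs, max_run, max_run_soft.
classify_jobs() {
    local submitted=$1 max_run=$2 soft=$3
    local running normal low queued
    # At most max_run jobs of one user can run at the same time.
    running=$(( submitted < max_run ? submitted : max_run ))
    # Running jobs beyond the soft limit get low priority.
    normal=$(( running < soft ? running : soft ))
    low=$(( running - normal ))
    queued=$(( submitted - running ))
    echo "normal=$normal low=$low queued=$queued"
}
classify_jobs 11 8 4
```

For the 11 jobs of user "A" with a maximum of 8 and a soft-limit of 4, this prints "normal=4 low=4 queued=3", which matches the jobs A, A', and A” in step 1.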



8. About a Round-robin type control

When jobs are submitted while no other jobs are present, they are allocated to the calculation servers alternately, as shown below. This round-robin control shares the load among the calculation servers.
kaibm01 => kaibm02 => kaibm01 => kaibm02 => ...
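
The alternation can be written as a one-line rule: odd-numbered jobs go to kaibm01 and even-numbered jobs go to kaibm02. The sketch below expresses only this order. "next_host" is a hypothetical helper, and it ignores everything else that the scheduler takes into account, such as jobs that are already running.

```shell
#!/bin/bash
# Hypothetical helper: host that receives the n-th job (counting from 1)
# when jobs alternate strictly between the two calculation servers.
next_host() {
    local n=$1
    if [ $(( n % 2 )) -eq 1 ]; then
        echo kaibm01
    else
        echo kaibm02
    fi
}
```

For example, "next_host 3" prints "kaibm01".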


4.2.5 Handling of jobs in the scheduled maintenance

Running and queued jobs are killed during scheduled maintenance. Please resubmit your jobs after the maintenance.

ADC
2019-07-19