Chemical Visualization Facility

NQS on the Chemistry Visualization Cluster



Overview

This document describes the Network Queueing System (NQS) available on the visualization facility computers in the MSU Chemistry Department. It is only a brief introduction to help users get started with NQS; for details, consult the appropriate man pages. The nqs man page (type "man nqs") lists all of the NQS commands and their corresponding man pages.


Queue Structure

There are two batch queues on each machine. The first, named "batch_f", is a fast queue with a four-hour CPU time limit: if a job in this queue accumulates more than four hours of CPU time, the queue system kills the job and sends a mail message to its owner. The second, batch_u, has no CPU time limit.

There are also two pipe queues on each machine, "fast" and "unlimited", which balance the load across the cluster. These queues do not execute jobs themselves; they are simply routes, or "pipes", from one machine to another. A job submitted to fast or unlimited on any machine is first sent to argus (the scheduler), where it waits for the next available batch queue slot. When a slot opens up on one of the computers in the cluster, argus sends the job to the corresponding batch queue on that computer (batch_f for fast, batch_u for unlimited) for execution. If more than one computer has an open slot, argus chooses the machine with the lightest load.
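The routing choice argus makes can be pictured with a short shell sketch. It only illustrates the rule described above and is not the actual NQS scheduler code: the function name pick_host and the host, open-slot, and load figures are hypothetical sample data.

```shell
#!/bin/sh
# Hypothetical illustration of argus's routing rule: among machines with an
# open slot in the target batch queue, pick the one with the lightest load.
# Each input line is "hostname open_slots load_average" (made-up sample data).
pick_host() {
    # Keep hosts with at least one open slot, put the load first,
    # sort numerically (C locale so "." is the decimal point),
    # and print the hostname of the least-loaded host.
    awk '$2 > 0 { print $3, $1 }' | LC_ALL=C sort -n | head -n 1 | awk '{ print $2 }'
}

# huckel has no open slot; lewis is less loaded than pauling.
printf '%s\n' \
    "huckel 0 0.10" \
    "lewis 1 0.75" \
    "pauling 2 1.40" | pick_host
# prints: lewis
```

If no host has an open slot the function prints nothing, which corresponds to the job waiting on argus until a slot opens.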

Jobs should be submitted to fast or unlimited whenever possible to take advantage of load balancing. If you have a job that must run on a specific machine for some reason, submit the job to batch_f or batch_u on that machine. The unlimited queue will be used by default if you do not specify a queue when you submit a job.


Submitting a Job

Jobs are submitted to NQS with the qsub command. qsub accepts a script that contains the shell commands to be executed when the job runs. You can also change the characteristics of the job with switches, either embedded in the script or placed on the command line. Many switches are available; see the qsub man page for details.

A script file can be as simple as a single line of text containing the command to run. Here is an example script file:

 a.out <file.input >file.output

This job could be submitted to the default queue with the following command (assume the script file is named "scriptfile"):

 qsub scriptfile

To send the job to the fast queue instead of the default (unlimited), use this command:

 qsub -q fast scriptfile
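Instead of giving -q on the command line, the queue can also be requested from inside the script. The sketch below writes such a script file. It assumes this NQS version reads embedded qsub switches from lines beginning with "#@$", the convention used by Generic NQS; check the qsub man page for the exact prefix on these machines. The program name a.out and the file names are the same placeholders used above.

```shell
#!/bin/sh
# Write a job script that selects the fast queue with an embedded switch.
# ASSUMPTION: qsub on this system reads switches from lines starting "#@$".
# The quoted 'EOF' keeps the shell from expanding "$-" inside the here-document.
cat > scriptfile <<'EOF'
#@$-q fast
a.out <file.input >file.output
EOF

# Submit only if qsub is installed; the embedded switch replaces "-q fast".
if command -v qsub >/dev/null 2>&1; then
    qsub scriptfile
else
    echo "qsub not found; scriptfile written but not submitted" >&2
fi
```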

Jobs can also be submitted interactively with qsub. For example, instead of putting "a.out ..." in a file and then submitting that file, the job could be submitted interactively as follows:

 qsub <ENTER> 
 a.out <file.input >file.output <ENTER>
 <CONTROL-D>
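The same interactive submission can be done from a shell script by piping the job's commands into qsub, which takes the place of typing them and pressing CONTROL-D. This is a sketch that assumes qsub reads the job from standard input when no script file is named, which is what the interactive example relies on; job_body is a hypothetical helper name.

```shell
#!/bin/sh
# Non-interactive equivalent of typing the job at the qsub prompt.
# job_body prints the job's commands; the pipe delivers them to qsub's
# standard input and closes it, just as CONTROL-D would.
job_body() {
    echo 'a.out <file.input >file.output'
}

if command -v qsub >/dev/null 2>&1; then
    job_body | qsub            # no -q given, so the default (unlimited) queue is used
else
    echo "qsub not found; job not submitted" >&2
fi
```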

Running a Job on a Particular Computer

The unlimited and fast queues should be used whenever possible, as they take advantage of the load balancing capabilities of NQS. There may be times, however, when a job must be run on a particular computer. For example, a very large Gaussian job may need the extra memory available on huckel, lewis or pauling. In these situations, users may submit the job to either batch_u or batch_f, as appropriate. The job will then run in that queue on the computer from which it was submitted.
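Because batch_u and batch_f run the job on the machine where qsub is typed, a small guard can catch submissions from the wrong host. This is a sketch only: TARGET is a placeholder set to huckel, scriptfile is the example script from the previous section, and on_target_host is a hypothetical helper that compares the short host name reported by uname -n.

```shell
#!/bin/sh
# Submit to a machine-specific batch queue only when logged in to that machine.
# TARGET and scriptfile are placeholders; change them for your own job.
TARGET=huckel

on_target_host() {
    # Compare the short host name (text before the first dot) with TARGET.
    [ "$(uname -n | cut -d. -f1)" = "$TARGET" ]
}

if on_target_host; then
    qsub -q batch_u scriptfile
else
    echo "Log in to $TARGET first; this job must run there." >&2
fi
```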


Submitting Aces Jobs

NOTE: Aces is currently available on hbar only.

To submit an Aces job to the queue system, type the following command:

 aces2sub [-64] [-m email] [additional_qsub_arguments] filename

where "filename" is the name of your input file without the ".inp" extension. The arguments in brackets are optional. The -64 option will invoke the 64-bit version of the program. If an email address is given, the output file will be sent to that address upon completion of the job.
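Since aces2sub expects the input name without the ".inp" extension, a full file name (for example, one produced by tab completion) can be trimmed with shell parameter expansion before submitting. In this sketch the input file h2o.inp and the address user@example.edu are hypothetical.

```shell
#!/bin/sh
# Strip ".inp" from a full input file name, then submit a 64-bit Aces run
# whose output is mailed back on completion. Both values are examples only.
input=h2o.inp
name=${input%.inp}             # removes a trailing ".inp", leaving "h2o"

if command -v aces2sub >/dev/null 2>&1; then
    aces2sub -64 -m user@example.edu "$name"
else
    echo "aces2sub not found (Aces is available on hbar only); would submit: $name" >&2
fi
```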


Submitting Gaussian Jobs

Preparing a Gaussian job for the queue can be complicated, so a separate submission script, g98sub, has been provided for this purpose. It accepts qsub switches on the command line in addition to the input file. Use the following command to submit a Gaussian job:

 g98sub [additional_qsub_arguments] filename

where "filename" is the name of your input file without the ".inp" extension. The additional qsub arguments are optional. To submit a Gaussian job to the default unlimited queue, use this command:

 g98sub filename

To submit a Gaussian job to the fast queue, use this command:

 g98sub -q fast filename
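When a directory holds several Gaussian inputs, a loop can submit them all to the fast queue, stripping the ".inp" extension that g98sub does not want. This is a sketch: the DRY_RUN variable and the submit_all function are hypothetical conveniences, and by default the script only prints the g98sub commands so they can be checked before anything is submitted.

```shell
#!/bin/sh
# Submit every Gaussian input file (*.inp) in the current directory to the
# fast queue. Preview mode is the default: commands are printed, not run.
# Run the script with DRY_RUN=0 in the environment to actually submit.
DRY_RUN=${DRY_RUN:-1}

submit_all() {
    for f in *.inp; do
        [ -e "$f" ] || continue          # no matches: skip the literal "*.inp"
        name=${f%.inp}                   # g98sub wants the name without ".inp"
        if [ "$DRY_RUN" = 1 ]; then
            echo "g98sub -q fast $name"
        else
            g98sub -q fast "$name"
        fi
    done
}

submit_all
```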

Submitting GAMESS Jobs

Submitting GAMESS jobs is much like submitting Gaussian jobs. Use the following command to submit a GAMESS job:

 gmssub [additional_qsub_arguments] filename

where "filename" is the name of your input file without the ".inp" extension. The additional qsub arguments are optional.


Submitting Molpro Jobs

Use the following command to submit a Molpro job:

 m98sub [molpro options] [-q queuename] filename

where "filename" is the name of your input file. The arguments in brackets are optional.


Submitting Spartan Jobs

Spartan has been configured to submit jobs to NQS automatically. Jobs are submitted in the normal manner by selecting the "Submit" option from the "Setup" menu. A new window will appear on the screen with NQS as the only option for submitting the job. Submitting a Spartan job in this fashion will cause the job to be submitted to the unlimited queue.

There may be times when a Spartan job must be executed on a particular computer. In those situations, the following method should be used. First, use the Spartan "Setup" menu to configure the job parameters. Do NOT submit the job from Spartan. Instead, save the file and quit Spartan. From the Unix command prompt, type the following command:

 spartan -x filename -q batch_u &

where "filename" is the name of your Spartan file. To submit a Spartan job to the fast batch queue (batch_f) on the local machine instead, use this command:

 spartan -x filename -q batch_f &

Any of the other qsub switches may also be listed after the filename in the above commands.

Note that a side effect of configuring Spartan in this manner is that the Spartan Monitor can no longer find jobs on other computers. It will only work if the Hosts filter (View->Filters) is set to Local, and then it will only find jobs running on the local machine. Use the qstat command (e.g. qstat -ad) instead to find jobs running on other machines.


Getting the Status of a Job

To check the status of a job, use the qstat command. Typed alone, qstat lists only your jobs on the computer where you run it. "qstat -d" checks every computer in the cluster (NQS calls the cluster a domain, hence the -d) for jobs belonging to you, and "qstat -ad" lists all jobs on every computer in the cluster, whoever owns them. To see the jobs on just one computer, add "@hostname" to the qstat command. For example:

 qstat -a @huckel

will list all jobs running on huckel.

A complete NQS request ID consists of a number and a hostname (the host from which the job was submitted) separated by a period. For example, 97.huckel is a complete request ID. The default output of qstat gives only the number part of the request ID. To get the full request ID, use the -s or -l option with qstat. For example, to get the full request IDs for all jobs running on huckel, type:

 qstat -sa @huckel
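When scripting around qstat and qdel, it can help to separate a full request ID into its number and host parts. The sketch below does this with POSIX parameter expansion; split_request_id is a hypothetical helper name, and 97.huckel is the example ID from the text.

```shell
#!/bin/sh
# Split a full NQS request ID ("number.hostname") into its two parts.
split_request_id() {
    id=$1
    number=${id%%.*}     # everything before the first period
    host=${id#*.}        # everything after the first period (keeps any domain)
    printf '%s %s\n' "$number" "$host"
}

split_request_id 97.huckel     # prints: 97 huckel
```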

Deleting a Job

The qdel command deletes a job from a queue. First get the job ID with the qstat command, then type qdel followed by the job ID.

The default output of qstat gives only part of the job ID, but qdel may require the full ID: it is required whenever you type the qdel command on a computer other than the one from which the job was submitted. See "Getting the Status of a Job" above for how to get the full job ID.

If the job is already running, you must add the -k switch so that qdel kills it. If the job is in a queue on a different machine, append "@machine" to the job ID. For example, to delete job number 28, which is still waiting to run, type:

 qdel 28

To delete job number 27, which is running, type:

 qdel -k 27

To delete job 29.huckel, which was submitted from huckel and is running on argus, while you are logged in to a machine other than huckel, type:

 qdel -k 29.huckel@argus
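The three cases above follow one pattern: add -k for a running job, and append "@machine" when the job is on another computer. The hypothetical helper below sketches that rule; it only prints the qdel command (it does not run it), so the result can be checked before use. Its argument order is an assumption made for this illustration.

```shell
#!/bin/sh
# Print the qdel command for a job (hypothetical helper; nothing is deleted).
#   $1 = request ID, short ("28") or full ("29.huckel")
#   $2 = "running" or "queued"
#   $3 = machine the job is on (omit for the current machine)
qdel_command() {
    cmd=qdel
    if [ "$2" = running ]; then
        cmd="qdel -k"                  # a running job must be killed with -k
    fi
    target=$1
    if [ -n "${3:-}" ]; then
        target="$1@$3"                 # job is in a queue on another machine
    fi
    printf '%s %s\n' "$cmd" "$target"
}

qdel_command 28 queued                 # prints: qdel 28
qdel_command 27 running                # prints: qdel -k 27
qdel_command 29.huckel running argus   # prints: qdel -k 29.huckel@argus
```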

Summary

To do this:
 Type this:
Submit a job to the default (unlimited) queue
 qsub shell_script
Submit a job to the fast queue
 qsub -q fast shell_script
Submit a Gaussian job
 g98sub filename
Check the status of your jobs on the current machine
 qstat
Check the status of all jobs on the current machine
 qstat -a
Check the status of your jobs on all machines
 qstat -d
Check the status of all jobs on all machines
 qstat -ad
Delete job number 26, queued on the current machine
 qdel 26
Delete job number 26, running on and submitted from the current machine
 qdel -k 26
Delete job number 26, running on argus but submitted from huckel
 qdel -k 26.huckel@argus