
The documentation for using ANSYS on Compute Canada systems is available here.
The documentation for using ANSYS on SHARCNET legacy systems is provided below.


ANSYS
Description: Suite of programs that allows users to carry out fluid flow simulations
SHARCNET Package information: see ANSYS software page in web portal
Full list of SHARCNET supported software


Introduction

The ansys module provides free use of ansys fluent, fluent-gui, cfx, cfx-gui, icemcfd-gui and workbench (runwb2). Research groups that want to use ansys apdl or apdl-gui, however, must purchase an extension for the sharcnet license.

Version Selection

Graham Cluster

Sharcnet legacy clusters use the Sharcnet ANSYS license by default. Compute Canada systems, however, require that a text file named ansys.lic be created in your ~/.licenses directory as follows:

[user@gra-login1:~/.licenses] cat ansys.lic 
setenv("ANSYSLMD_LICENSE_FILE", "1055@license3.sharcnet.ca")
setenv("ANSYSLI_SERVERS", "2325@license3.sharcnet.ca")

Once that is done the module may be loaded:

module load ansys/18.2

Legacy Clusters

module load ansys/17.2

Job Submission

Before submitting a fluent or cfx job to the queue, load the ansys module as shown above in Version Selection, otherwise the software will not be found at runtime.

Graham Cluster

Instructions for submitting a 64-core fluent job to the Graham queue in packed format onto 2 nodes are given at https://docs.computecanada.ca/wiki/ANSYS. To submit a 32-core job set "#SBATCH --nodes=1", to submit a 96-core job set "#SBATCH --nodes=3", and so forth. Since jobs are submitted to whole nodes in 32-core increments, larger memory simulations can be handled by additionally setting up to a maximum of "#SBATCH --mem=120G", which is the memory allocated per node ie) 120G/32 cores = 3.75GB per core. Smaller 2d simulations may not scale well to 32 cores; in such cases a smaller parallel job, for example one using 8 cores, may be submitted by setting "#SBATCH --nodes=1" and "#SBATCH --cpus-per-task=8".
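
For reference, a minimal Slurm script following the packed 64-core pattern just described might look like the following sketch. The account name and journal file name are placeholders, and the launch lines (machinefile generation via slurm_hl2hl.py, the fluent -t/-cnf flags) follow the pattern documented on the linked Compute Canada page:

#!/bin/bash
#SBATCH --account=def-group      # placeholder sponsor account
#SBATCH --time=24:00:00          # requested wall clock time
#SBATCH --nodes=2                # 2 nodes x 32 cores/node = 64 core packed job
#SBATCH --ntasks-per-node=32     # use all 32 cores on each node
#SBATCH --mem=120G               # memory per node
module load ansys/18.2
# build the host list fluent needs, as per the Compute Canada ANSYS page
slurm_hl2hl.py --format ANSYS-FLUENT > machinefile
# launch fluent in batch mode on all allocated cores ($SLURM_NTASKS = 64 here)
fluent 3ddp -g -t${SLURM_NTASKS} -cnf=machinefile -i sample.jou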

Job Dependencies

To submit a series of jobs to the slurm scheduler on graham such that job2 doesn't start until job1 completes, and so forth, do:

jobid1=$(sbatch myscript1.sh | awk '{print $4}')
jobid2=$(sbatch --dependency=afterany:$jobid1 myscript2.sh | awk '{print $4}')
jobid3=$(sbatch --dependency=afterany:$jobid2 myscript3.sh | awk '{print $4}')
jobid4=$(sbatch --dependency=afterany:$jobid3 myscript4.sh | awk '{print $4}')
$ squeue -u $USER -o "%.8A %.4C %.10m %.20E"
  JOBID CPUS MIN_MEMORY           DEPENDENCY
3780464    4        80G                     
3780535    4        80G     afterany:3780464
3780544    4        80G     afterany:3780535
3780547    4        80G     afterany:3780544

For further information see https://docs.computecanada.ca/wiki/Running_jobs#Cancellation_of_jobs_with_dependency_conditions_which_cannot_be_met
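
If the chain needs to be abandoned, any queued members can be cancelled by job id, for example (reusing the jobid variables captured above):

scancel $jobid2 $jobid3 $jobid4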

Legacy Clusters

Fluent Job Submission

The following three files should be placed in a directory under /scratch/username or /work/username. From this directory jobs can be submitted as explained below. Note that fluent jobs should never be submitted from a directory under /home/username:

i) initialized dat file
ii) proven functional cas file
iii) corresponding journal file.  

As of October 2009 a new command "ansysstat" is available on all clusters, which shows the total number of fluent licenses checked out from the central license server. The SHARCNET license allows a total of 25 jobs to run at one time. If you submit a job to the queue but all ansys licenses are already in use when your job attempts to run, it will sit idle (though appear running according to sqjobs) until sufficient ansys licenses become available, for up to 24 hours, before being killed - in such cases you will see the error message "Fatal error has happened to some of the processes! Exiting."
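
For example, a quick check of license availability before submitting (see the Query License Usage section below for sample output):

module load ansys
ansysstat | grep Users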

Job Submission Syntax

To submit a serial job (1 processor) do the following:

sqsub -r 1d -q serial -o ofile.%J fluent Model sample.jou

where Model = 2d, 2ddp, 3d, 3ddp

Wobbie Specific

Ansys is no longer installed on wobbie. This section will remain temporarily while the potential for reinstallation is reevaluated.

Parallel fluent (and cfx) jobs must be submitted to the threaded queue on Wobbie. Use this command, where the -n and --mpp values shown are arbitrary:

sqsub -r 1h -q threaded -n 24 -f xeon --mpp=16G -o ofile.%J fluent Model sample.jou

Note that wobbie consists of several large memory xeon nodes including wob[141-142] (768GB memory and 24 cores each), wob[143-147] (128GB memory and 24 cores each), and wob[148-149] (512GB memory and 28 cores each). As a result the largest job one may submit is n=28 cores. The cpu clocks of all the nodes are identical, however the memory bandwidth on wob[141-142] is slightly slower.

Optimizing Parallel Performance

Once a job is submitted to the queue a period of time will elapse before it starts running on the cluster. Several factors influence this wait time as shown in https://www.sharcnet.ca/my/perf_cluster/cur_perf. Generally speaking, large jobs get distributed over a large number of nodes by the queue. This helps minimize queue wait time, since resources are allocated on an arbitrary number of nodes as they become available. However it can result in increased communication latency and hence some performance loss, on the order of ~10% or more according to some users. A solution (at the cost of some increased queue wait time) is to pack ansys jobs onto fewer nodes than ncores by doing something like "sqsub -n ncores -N ncores/4 -q mpi etc". For example, if submitting a job with -n 16 you might specify -N 2, -N 3 or -N 4. Due to the increased queue time one will likely not want to specify a value of -N > 4, and if submitting many jobs, ensure the desired performance enhancement is achieved by conducting upfront scaling tests.
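
For illustration, a hypothetical 16-core fluent job packed onto 4 nodes (the runtime and mpp values are arbitrary):

sqsub -r 3h --nompirun -q mpi -n 16 -N 4 --mpp=2G -o ofile.%J fluent 3d sample.jou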

The pack Option

The Sharcnet sqsub command takes a --pack switch as an alternative to directly specifying -N nodes. As described by "man sqsub", this switch ensures a minimal number of nodes is selected so that processes occupy all cpus per node. This means the user is responsible for choosing "-n" to be divisible by the number of cores per node - once submitted, the job will drain then run on N fully packed nodes (all cores utilized per node). When using --pack for scaling runs, the tests should be repeated (at least once) to verify consistency. The option works on all sharcnet clusters as follows:

sqsub -r 1h --nompirun --pack -q mpi -n nCpus -o ofile.%J fluent Model sample.jou
The ppn Option

Yet another approach is specifying the number of cores per node, or ppn. This has the benefit of not requiring N whole nodes to completely drain before a job can start; rather, only ppn cores (say ppn=4) need to drain per node. Another benefit of using n & ppn VERSUS n & N is that the distribution of cores is uniform. Consider the ease of performing the following exploration, where n is increased and therefore mpp can typically be reduced. Such tests could likewise be daisy chained as explained in the next section:

sqsub -r 3.9h --nompirun -q mpi -n 12 --ppn=6 --mpp=4G -o pres.%J fluent 3d mysimulation.jou
sqsub -r 3.9h --nompirun -q mpi -n 18 --ppn=6 --mpp=3G -o pres.%J fluent 3d mysimulation.jou
sqsub -r 3.9h --nompirun -q mpi -n 24 --ppn=6 --mpp=2G -o pres.%J fluent 3d mysimulation.jou
Batch File Hints

Solve->Monitors->Residual :
        o   Select Print    (to watch residuals: tail -f ofileJOB#)
        o   Deselect Plot    (since batch jobs do not provide graphics)

File->Write->AutoSave :
        o   Set Case File Frequency = 0
        o   Specify Filename = itrFileName    (cleanup afterwards: rm -f itr*.dat)
        o   Set Data File Frequency = 1000    (save itrFileName.dat ~once/day)

Daisy Chaining Jobs

The maximum number of cores each user can use for ansys is 48. To determine the minimum value of mpp (memory per processor), it is most convenient to conduct an ordered series of jobs in such a way that n=48 is not exceeded. This may be done as follows, where initial jobs with values of mpp that are too small will fail quickly. Once an mpp value is found such that the test runs to completion, the remaining jobs may be killed. A 3h runtime is chosen so contributor nodes may also be used:

sqsub -r 3h --idfile myid --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=1G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=2G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=3G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=4G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=5G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=6G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=7G -o ofile.%J fluent 3ddp mysim.jou
sqsub -r 3h --idfile myid -w $(<myid) --nompirun --mail-start --mail-end -q mpi -n 32 --mpp=8G -o ofile.%J fluent 3ddp mysim.jou

Scaling Test Procedure

The optimal choice of nCpu for a fluent case will typically be very machine dependent. Therefore, before running a long fluent job it is recommended that a few short test cases be run through the test queue to investigate the wall clock times required for job completion. As a rough rule of thumb, expect to find that the optimal nCpus is approximately equal to the total memory of the case in gigabytes (for example, a case requiring 24GB would be expected to run best on roughly 24 cores).

The test queue is ideal for scaling tests since it allows jobs to start running nearly immediately. However, such test jobs must finish within the 60-minute time limit of the queue; for most cases this is likely enough to run one to ten time steps. From the results of such experimentation one should be able to gain insight to help choose the amount of parallelization. For example:

    a) First determine wallclock time for 1cpu :
sqsub -r 1h -q serial -o ofile%J fluent 2d sample.jou

    b) Next determine wallclock time for 2cpu :
sqsub -r 1h --nompirun -q mpi -n 2 -o ofile%J fluent 2d sample.jou

    c) Next determine wallclock time for 4cpu :
sqsub -r 1h --nompirun -q mpi -n 4 -o ofile%J fluent 2d sample.jou

    d) Continue to investigate nCpus = 8, 16, 24, 32.

Typically small 2d fluent cases requiring less than 200MB of memory will complete fastest with nCpus = 1 or 2, while larger single precision (2d or 3d) or double precision (2ddp or 3ddp) cases will be more likely to benefit from nCpus > 2. The CPU Time value printed at the bottom of the queue output file (ofileJOB#) upon successful completion of a job should not be confused with wall clock time when determining the optimal choice of the nCpus parameter, else extremely inefficient usage of the resources and hence large amounts of wasted compute time could result! Clusters with GigE interconnects are particularly strong candidates for this behaviour, where communication overhead can quickly dominate, driving down %CPU utilization.

The following table gives an example of this point for a small test case where n=2 is shown to be the optimal choice for full runs. It is worth mentioning that choosing n=1 for this example would free the second processor and second fluent license for others to use, while only costing about 15% extra wait time for the job to complete.

nCpu   CPU Time (ofileJOB#.out)   WallClock (total real)   TIME (avg/cpu)   %CPU (avg/cpu)
n=1    106.61 sec.                107 sec                  98 sec           98.6
n=2     47.00 sec.                 92 sec                  63 sec           82.2
n=4     54.00 sec.                107 sec                  53 sec           58.6
n=8     70.00 sec.                186 sec                  62 sec           43.2
n=16    77.00 sec.                559 sec                  69 sec           14.6

Sample Journal Files

Note that some lines have a few words in round brackets at the end. These are just descriptive comments and therefore should NOT be included in your journal file. To change a line in your journal file into a comment, simply put a semi-colon ; in front of it and it will be ignored by fluent.

Steady Example (interpreted udf)

First it is recommended to open your cas file in fluent-gui, click Define --> User-Defined --> Functions --> Interpreted --> Unset File Name, then re-save the cas file before using it with the following journal file templates when submitting jobs to the queue. Please do not include the contents between the round brackets shown in either of the following two journal files, as these are just descriptive comments.

Initialize Domain

This journal file (sample1.jou) reads cas and dat files with a common prefix ie) CasDatFileName.cas, CasDatFileName.dat, then initializes the computational domain and runs 1000 iterations:

define/user-defined/interpreted-functions "myudf.c" "cpp" 10000 no
file/read-cas-data CasDatFileName
solve/initialize/initialize-flow
file/auto-save/case-frequency if-case-is-modified
file/auto-save/data-frequency 100           (save numbered dat file every 100 iterations)
file/auto-save/root-name itrCasDatFileName  (file format will be itrCasDatFileName-####.dat)
file/confirm-overwrite y
solve/iterate 1000
file/write-data finCasDatFileName
exit
yes
Restart Domain

In this journal file (sample2.jou) the cas and dat files have different prefixes. It restarts calculations using CasDatFileName.cas and the itrCasDatFileName-1000.dat output from sample1.jou, therefore the command to initialize the domain is commented out with a semicolon as follows:

define/user-defined/interpreted-functions "myudf.c" "cpp" 10000 no
file/read-case CasDatFileName
file/read-data itrCasDatFileName-1000
;solve/initialize/initialize-flow
file/auto-save/case-frequency if-case-is-modified
file/auto-save/data-frequency 100           (save numbered dat file every 100 iterations)
file/auto-save/root-name itrCasDatFileName  (file format will be itrCasDatFileName-####.dat)
file/confirm-overwrite y
solve/iterate 1000
file/write-data finCasDatFileName
exit
yes
Unsteady Example (compiled udf)

Please note that you cannot simply upload your libudf directory to sharcnet from your local workstation; it will not work. You must compile your UDF on sharcnet as described in the How to Compile a UDF section below. The following journal file (sample3.jou) can be compared with the previous interpreted sample1.jou file:

First it is recommended to open your cas file in fluent-gui and click:
o Define --> User-Defined --> Functions --> Interpreted --> (unset Source File Name)
o Define --> User-Defined --> Functions --> Compiled --> (add/build or load libudf)
o Define --> User-Defined --> Functions --> Manage --> (unload all UDF libraries)
o Define --> User-defined --> Functions --> Manage --> Load (type libudf then click Load)

1) In the following stanza, round-bracketed statements should not be included in the journal file.
2) Lines with leading semi-colons are comment lines and are not processed by fluent; they are provided as alternatives to similar lines.

define/user-defined/compiled-functions load "libudf/lnamd64/3d/libudf.so"
file/read-cas-data Filename
file/auto-save/case-frequency if-mesh-is-modified
file/auto-save/data-frequency 100         (save numbered dat file every 100 time steps)
;file/auto-save/root-name itrFilename     (file format will be itrFilename-100.dat, ...  itrFilename-1000.dat)
file/auto-save/root-name itrFilename.gz   (file format will be itrFilename-100.dat.gz, ...  itrFilename-1000.dat.gz)
file/confirm-overwrite y
solve/set/time-step 0.001
solve/dual-time-iterate 1000 40    (perform 1000 time steps with 40 iteration per time step)
;file/write-data finFilename
file/write-data finFilename.gz
exit
yes

where:
      o  number of physical time steps specified = 1000
      o  maximum number of iterations per time step = 40
      o  time step in secs (overrides the cas file) = 0.001
      o  frequency to save dat file solution = 100

Boundary Condition Check

Assuming boundary conditions were defined using interpreted UDF(s) during the case file construction/setup, it is worthwhile to verify such conditions are still defined before running large jobs. This can be done by first interpreting the UDF, then reading in the corresponding cas file using "fluent-gui", and then inspecting Define --> Boundary Condition --> Click inlet --> Click Set --> Velocity Magnitude; this should reference the udf velocity profiles defined, ie) udf YourProfileName. If this procedure is done with a compiled UDF one will find the boundary conditions are not defined unless the UDF is either interpreted first or hooked in, as shown in the compiled UDF example journal file above.

How to Compile a UDF (on SHARCNET)
GUI Interface UDF Compilation

To compile a UDF on SHARCNET for a given ansys module version, fluent-gui must be run on a sharcnet visualization workstation. To do this, establish a graphical connection: go to https://www.sharcnet.ca/my/systems/index and click one of the blue icons in the Visualization Workstation table. Once connected and the remote desktop appears, log in and open a terminal window. Next cd into the directory where your cas/dat simulation files are located, and remove or rename the libudf subdirectory if it exists. Next type module load ansys, then start fluent-gui, selecting the appropriate precision and dimensionality such as 3d and single precision [optionally now read in your cas file], and then from the pull-down menu click Define --> User-Defined --> Functions --> Compiled.

On the left side of the Compiled UDFs popup box click Add, highlight your UDF file (such as myudf.c), then click OK and finally click Build. Assuming the Library Name field said libudf, a library file named libudf/lnamd64/3d/libudf.so should have been created, assuming you are working on an x86_64 based sharcnet cluster such as orca. If you were working on an Itanium based SMP system (all of which are obsolete now) then libudf.so would have been created under libudf/lnia64/3d. Copying a libudf.so created under libudf/win64 using a workstation in a researcher's lab will most likely not work; the best solution is to erase the entire libudf and recompile the udf to create a fresh libudf.so on sharcnet as described in this section.

I suggest renaming the newly created libudf directory to match the ansys module version it was created with, for future reference:

mv libudf libudf-16.2.3
ln -s libudf-16.2.3 libudf

Doing so can also be very convenient if you are going to be testing a simulation with different versions of ansys: you can keep several version builds of libudf and simply remove the libudf link and recreate it to point to a different version before submitting the job. For instance:

rm -f libudf
ln -s libudf-17.0 libudf

The structure of a successfully created libudf directory for a 2ddp cavity problem on sharcnet should look like the following:

[roberpj@orc-login1:~/cavitydemo] tree libudf
|-- lnamd64 
| `-- 2ddp 
| |-- libudf.so 
| |-- makefile 
| |-- makelog 
| |-- udf.c -> ../../src/udf.c 
| |-- udf_names.c 
| |-- udf_names.o 
| |-- udf.o 
| `-- user.udf 
|-- Makefile 
`-- src 
        `-- udf.c

Warning! It is possible to run fluent-gui on a cluster login node (instead of vdi-centos6) over a regular ssh connection to compile your UDF, *however* the graphics will be very slow. To do this one must first run "module unload intel", otherwise the compilation will most likely fail after generating hundreds of undefined errors. This does not need to be done on vdi-centos6 or other sharcnet visualization workstations since they don't load any sharcnet modules by default when you log in.

At this point, if you wish to continue working interactively, click Load in the Compiled UDFs popup box and then read in your dat file; otherwise exit fluent-gui and set up your journal file appropriately before submitting a job to the queue, after ensuring the Source File Name field located under Define --> User-Defined --> Functions --> Interpreted is empty.

Command Line UDF Compilation

Once your udf has been compiled with fluent-gui (as described in the previous GUI section, creating the libudf directory tree which should contain your udf in the src directory) it can be recompiled at any time on a cluster using the compileudf.sh command. Besides the convenience of not having to run fluent-gui, and possibly more importantly, a sharcnet ansys license (of which researchers are currently limited to three) is not required. Simply follow these steps, where any editor can be used to modify the udf file if desired. If you want to recompile with a different version of ansys, it is necessary to use fluent-gui once to reinitialize the libudf directory:

ssh orca.sharcnet.ca
ssh orc-dev1
cd /path/to/your/simulation/libudf
nano src/myudffile.c (optional to edit)
module load ansys/17.2
module unload intel
compileudf.sh

If you are working with different ansys versions, you could create all the libudf directories with fluent-gui in one sitting, rename them by adding a version extension to libudf, then as the last step specify which libudf-version directory is active by setting a softlink as follows:

cd /path/to/simulation
module load ansys/15.0.7
run fluent-gui 
mv libudf libudf-15.0.7
module switch ansys/16.2.3  
run fluent-gui 
mv libudf libudf-16.2.3
module switch ansys/17.0  
run fluent-gui 
mv libudf libudf-17.0
ln -s libudf-16.2.3 libudf


PARALLEL VS SERIAL UDF

If you see the following message when launching a parallel fluent case, the UDF you are using is not designed for parallel processing. If this is the case, restart your fluent job in serial mode.

Primitive Error at Node 1: open_udf_library: No such file or directory

Primitive Error at Node 2: open_udf_library: No such file or directory

Primitive Error at Node 3: open_udf_library: No such file or directory

Primitive Error at Node 4: open_udf_library: No such file or directory

Cfx Job Submission

Graham Cluster

See https://docs.computecanada.ca/wiki/ANSYS

Legacy Clusters

Running cfx jobs on sharcnet requires some care to obtain high performance. When a job is running one may verify all processors are obtaining optimal performance (typically somewhere between 85%-100% on all cpus) by running the sqjobs -L jobid command. If one finds any processes running at much less, please submit a problem ticket immediately and leave the job running.

Serial Jobs

Serial jobs can be submitted by doing:

sqsub -r 1h -q serial -o ofile.%J cfx HydrofoilGrid.def
Windeee (parallel)
sqsub -r 1h --nompirun -q mpi -n 16 --mpp=2G -o ofile.%J cfx HydrofoilGrid.def
Wobbie (parallel)

Jobs on wobbie must be submitted to the threaded queue to run on a single node. For example:

sqsub -r 1d -q threaded -f xeon -n 28 --mpp=64G -o ofile.%J cfx HydrofoilGrid.def

where wobbie's node configuration (and the resulting n=28 core limit) is described above in the Wobbie Specific section.

The above sqsub submit commands were tested using the Ansys Hydrofoil example:

[roberpj@orca:~/testing/cfx] cp /opt/sharcnet/ansys/16.2.3/v162/CFX/examples/Hydro* .
[roberpj@orca:~/testing/cfx] ls
HydrofoilExperimentalCp.csv  HydrofoilGrid.def  HydrofoilIni_001.res  HydrofoilIni.pre  Hydrofoil.pre
CFX Usage Hints
Enabling Double Precision

When submitting to the queue, append the -double switch to the sqsub command line to enable double precision, for instance:

sqsub -r 1h --nompirun -q mpi -n 8 --mpp=0.5G -o ofile.%J cfx HydrofoilGrid.def
  -continue-from-file red-244109.red-admin.redfin.sharcnet_001.res -double

One can check in the output file that double precision was activated (instead of single) by doing:

[roberpj@orca:~/testing/cfx] cat hnd-6980362.hnd50_001.out | grep double
 | double-int32-64bit-novc8-noifort-novc6-optimised-supfort-noprof-nos|
 | double-int32-64bit-novc8-noifort-novc6-optimised-supfort-noprof-nos|
 | double-int32-64bit-novc8-noifort-novc6-optimised-supfort-noprof-nos|
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...
 Attributes:     double-int32-64bit-novc8-noifort-novc6-optimised-su...

Another option is to configure double precision into the simulation file manually by opening the CFX-Pre GUI and selecting the following:

Execution Control ---> Run Definition ---> Executable Selection ---> Double Precision
Boost ANSYS CFX Partitioner Estimates

To increase the partitioner memory estimates append "-sizepar factor" to the sqsub command line. A description of this option can be obtained by running the following command, available after loading the ansys module:

[roberpj@hnd17:~] cfx5solve-help
-sizepar <factor>
    Change the memory estimates used by the ANSYS CFX Partitioner by a factor of <factor>.
Boost ANSYS CFX Solver Memory Estimates

To increase the solver memory estimates append "-size factor" to the sqsub command line. A description of this option can be obtained by running the "cfx5solve-help" command. Note that increasing this factor will uniformly boost the virtual memory size of each process, hence the "--mpp" value may also need to be increased; otherwise the job will not start but rather crash with the following error output placed near the end of the cfx out file.

 +--------------------------------------------------------------------+
 | ERROR #001100279 has occurred in subroutine ErrAction.             |
 | Message:                                                           |
 | Master-Partition Nr.    1                                          |
 | Host computer: saw309                                              |
 |           *** Run-time memory allocation error ***                 |
 |   Not enough free memory is currently available on the system.     |
 |         Could not allocate requested memory - exiting!             |
 |                                                                    |
 +--------------------------------------------------------------------+

 +--------------------------------------------------------------------+
 |                An error has occurred in cfx5solve:                 |
 |                                                                    |
 | The ANSYS CFX solver exited with return code 2.   No results file  |
 | has been created.                                                  |
 +--------------------------------------------------------------------+
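
For illustration, a resubmission that boosts the solver estimates while also raising the per-process memory might look like the following (the def file name and values are hypothetical; -size usage is described above):

sqsub -r 1h --nompirun -q mpi -n 8 --mpp=6G -o ofile.%J cfx model.def -size 1.5
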
Periodic Backups With Cfx-Gui
Insert Pulldown  --> Solver --> Output Control ---> Backup ---> Add new item ---> Input "Output Frequency" --> Option: Choose "Iteration Interval", Iteration Interval 10  [click apply then save def file]
Changing From Steady To Transient
Insert Pulldown  --> Analysis Type --> Option --> Transient  (Total Time = 1s, Time Steps = 10)  [click apply then save def file]
Write Modified Def File
Click Tools Pulldown --> Solve --> Write Solver Input File
Extend Hydrofoil Example Beyond Convergence
Start CFX-gui --> Start CFX-Pre --> Reduce: Physical Timescale = 0.05 [s], Residual Target = 1e-9.
Specify Restart File On Command Line
sqsub -r 1h --nompirun -q mpi -n 4 -o ofile.%J cfx HydrofoilGrid.def -continue-from-file HydrofoilIni_002.res
Create .def File from .mdef File

Note that one must enter the full path to cfx5solve for the module version being used, as follows. See chapter 12 in the documentation for further details regarding command line use:

/opt/sharcnet/ansys/16.2.3/v162/CFX/bin/cfx5solve -mdef model.mdef -norun
Some Useful Documentation Links

Referring to the CFX documentation for ansys 16.2.3, see the following chapters:

In the CFX Introduction:

o Chapter 2: Overview of ANSYS CFX (see 2.4. ANSYS CFX File Types ---> 1.4. CFX-Pre File Types):

https://www.sharcnet.ca/Software/Ansys/16.2.3/en-us/help/cfx_intr/i1302231.html

In the CFX-Pre User's Guide:

o Chapter 1. CFX-Pre Basics (see 1.4. CFX-Pre File Types):

https://www.sharcnet.ca/Software/Ansys/16.2.3/en-us/help/cfx_pre/cfx_pre.html

In the CFX-Solver Manager User's Guide:

o Chapter 3: CFX-Solver Files:

https://www.sharcnet.ca/Software/Ansys/16.2.3/en-us/help/cfx_solv/i1299415.html

o Chapter 6: CFX-Solver Manager File Menu:

https://www.sharcnet.ca/Software/Ansys/16.2.3/en-us/help/cfx_solv/i1303068.html

o Chapter 12: Starting the CFX-Solver from the Command Line

https://www.sharcnet.ca/Software/Ansys/16.2.3/en-us/help/cfx_solv/i1304872.html

Submitting APDL Jobs to the Queue

Sharcnet no longer provides free licensing for APDL; if you would like to purchase a license to use it, open a problem ticket.

Graham Cluster

Please submit a ticket for help submitting apdl jobs to the graham queue, distributed over one or many nodes, using slurm.

Legacy Clusters

Due to a technical limitation at the time of this writing, APDL parallel jobs must be run fully distributed (-n = -N) to run efficiently, as shown in the examples below. Note that due to the large memory requirements (mpp value) observed in test jobs, it is recommended APDL jobs only be submitted to a queue on orca or hound (no longer available).

1) FULLY DISTRIBUTED IN-CORE MPI JOB EXAMPLE

[roberpj@hnd19] ls test*
test.dat  test.db

[roberpj@hnd19] sqsub -r 7d --nompirun -q mpi -n 8 -N 8 --mpp 14G -o ofile.%J apdl -i test.dat
submitted as jobid 6907821

[roberpj@hnd19] sqjobs -L 6907821
  jobid  host   pid state resident  virtual %cpu command
------- ----- ----- ----- -------- -------- ---- -------
6907821  hnd4 10886     R  2958744  3653668 61.6 /opt/sharcnet/ansys/14.0/v140/a
6907821  hnd5 11251     R  3139640  3891848 61.9 /opt/sharcnet/ansys/14.0/v140/a
6907821  hnd6 10722     R  3271392  4034556 61.9 /opt/sharcnet/ansys/14.0/v140/a
6907821  hnd7 32175     R  3372772  4080896 61.2 /opt/sharcnet/ansys/14.0/v140/a
6907821  hnd8 28648     R  2805344  3432936 61.9 /opt/sharcnet/ansys/14.0/v140/a
6907821  hnd9 18869     R  3043924  3650296 61.3 /opt/sharcnet/ansys/14.0/v140/a
6907821 hnd11 22161     R  3142136  3885664 61.2 /opt/sharcnet/ansys/14.0/v140/a
6907821 hnd12 31186     R  8119540 10420728 98.5 /opt/sharcnet/ansys/14.0/v140/a
tot_rss 28.5G tot_vsz 35.3G avg_pcpu 66.2% cur_pcpu 235.5%

2) FULLY DISTRIBUTED OUT-OF-CORE MPI JOB EXAMPLE

For jobs to run out-of-core a line containing "dspoption,,optimal" must appear in the "dat" file.

[roberpj@hnd19] sqsub -r 7d --nompirun -q mpi -n 8 -N 8 --mpp 14G -o ofile.%J apdl -i test.dat
submitted as jobid 6907820

[roberpj@hnd19] sqjobs -L 6907820
  jobid  host   pid state resident  virtual %cpu command
------- ----- ----- ----- -------- -------- ---- -------
6907820  hnd4 10772     R  1432960  1917968 50.6 /opt/sharcnet/ansys/14.0/v140/a
6907820  hnd5 11139     R  1362676  1852940 50.6 /opt/sharcnet/ansys/14.0/v140/a
6907820  hnd6 10608     R  1527744  2016608 50.7 /opt/sharcnet/ansys/14.0/v140/a
6907820  hnd7 32062     R  1519564  2007460 49.1 /opt/sharcnet/ansys/14.0/v140/a
6907820  hnd8 28535     R  1342704  1828776 50.2 /opt/sharcnet/ansys/14.0/v140/a
6907820  hnd9 18756     R  1381836  1872604 49.5 /opt/sharcnet/ansys/14.0/v140/a
6907820 hnd11 22048     R  1446620  1929116 49.7 /opt/sharcnet/ansys/14.0/v140/a
6907820 hnd12 30610     S  7934744 10420748 96.5 /opt/sharcnet/ansys/14.0/v140/a
tot_rss 30.3G tot_vsz 41.3G avg_pcpu 51.1% cur_pcpu 376.3%

3) SINGLE NODE IN-CORE THREADED JOB EXAMPLE

[roberpj@hnd19] ls test*
test.dat  test.db

[roberpj@hnd19] sqsub -r 7d -q threaded -n 8 --mpp 28G -o ofile.%J apdl -i test.dat
submitted as jobid 6907824

[roberpj@hnd19] sqjobs -L 6907824
  jobid  host   pid state resident  virtual %cpu command
------- ----- ----- ----- -------- -------- ---- -------
6907824 hnd11 22716     R  9411632 25607224  411 /opt/sharcnet/ansys/14.0/v140/a

4) SINGLE NODE OUT-OF-CORE THREADED JOB EXAMPLE

To force the job out-of-core, a line containing "dspoption,,optimal" should appear in the "dat" file. Note that in this case the virtual memory is 2.5x smaller than in the in-core example just above, therefore --mpp=10G would have been sufficient for this job to run:

[roberpj@hnd19] sqsub -r 7d -q threaded -n 8 --mpp 28G -o ofile.%J apdl -i test.dat
submitted as jobid 6907828

[roberpj@hnd19] sqjobs -L 6907828
  jobid  host pid state resident virtual %cpu command
------- ----- --- ----- -------- ------- ---- -------
6907828 hnd12 964     R  7240904 9983392  465 /opt/sharcne

Graphical Use on Visualization Machines

The recommended software for connecting to a sharcnet visualization machine to run ansys interactively is the tigervnc client. Install the client on your desktop as described in https://www.sharcnet.ca/help/index.php/Remote_Graphical_Connections#vncviewer_.28Windows.2C_MacOS.2C_and_Fedora.29. While it is also possible to connect with xming from a windows machine, use cygwin, or even the traditional ssh -Y username@vizN-site.sharcnet.ca, rendering and button clicks will unfortunately be very slow compared to connecting with tigervnc.

Compute Canada gra-vdi System

The procedure to start ansys graphical applications on gra-vdi is:

1) Connect with tigervnc to gra-vdi.computecanada.ca 
2) open a terminal window on the desktop
3) module load SnEnv
4) module load ansys/19.1
5) run: fluent-gui, cfx-gui, runwb2-gui or icemcfd-gui

By default the sharcnet license server is used by all the gui programs on the vdi machines.

To have workbench use the CMC license server, answer y at the prompt; otherwise hit any other key, ie)

module load SnEnv
module load ansys/19.1
runwb2-gui
Press y to use the CMC license server : y

To specify a license server other than sharcnet or CMC, set the following exports after loading the module, ie)

module load SnEnv
module load ansys/19.1
export ANSYSLI_SERVERS=2325@199.241.162.97
export ANSYSLMD_LICENSE_FILE=6624@199.241.162.97
runwb2
Press y to use the CMC license server : n

Legacy Sharcnet Systems

Starting fluent, cfx or icemcfd in Gui Mode

1) Connect with tigervnc to vdi-centos6.sharcnet.ca or vdi-centos7.sharcnet.ca
2) open a terminal window on the desktop
3) module load ansys/19.1
4) run: fluent-gui, cfx-gui, runwb2-gui or icemcfd-gui

Note1: cfx-gui allows one to start CFX-pre or CFD-post (turbogrid not supported)
Note2: the icemcfd-gui command may not work over xming/putty connections
Note3: ansys should not be run under /home, please use /work or preferably /scratch instead

Starting workbench in Gui Mode

This section explains how to start up ansys workbench on a sharcnet visualization workstation:

1) Connect to vdi-centos6, viz10-uwo or viz11-uwo using tigervnc as described above 
2) Log into the workstation desktop, open a terminal and run command: module load ansys/17.2   
3) Run command: runwb2  (the gui should appear; see Note3 below for exceptions)
4) In the Analysis Systems menu on the left-hand side choose a Fluid Flow such as CFX to start a session such as "A Fluid Flow (CFX)"
5) Import an External Geometry File or create a geometry in Design Modeller, for instance by doing Create->Primitives->Sphere
6) Next start the meshing program by going back to the Workbench gui screen, then under "A Fluid Flow (CFX)" click "Mesh"

Note1: The runwb2 command does not work with ansys/14.0 on sharcnet. Any attempt to run the program will result in error messages such as "Unhandled Exception: System TypeInitializationException: An exception was thrown by the type".

Note2: While workbench can be used with Design Explorer and fluent/cfx, sharcnet does not have a mechanical license to support this usage. To perform coupled problems you would need to purchase a dedicated license module from ansys for the sharcnet license server; open a ticket to request a quote if interested.

Note3: The runwb2 command currently only works on vdi-centos6, viz10-uwo and viz11-uwo, and will crash if started on any cluster.

Note4: Interactive applications such as runwb2 are automatically killed if left open for more than 24 hours on vdi/viz workstations.

General Notes

HowDoI: Query License Usage

The status of the sharcnet ansys license can be checked as shown below. Currently users are limited to consuming 3 aa_r_cfd and 128 aa_r_hpc seats; however, when there are many users this setup can result in the license capacity being exceeded and jobs failing on startup. Users are requested to do careful scaling testing to determine the maximum job size for which results can be obtained in a reasonable amount of time ie) to not use all 128 seats if possible. During some periods of high usage the license has been exhausted and some jobs have failed, therefore it may be required to cut back usage from 128 to 96 or fewer aa_r_hpc seats depending how 2019 progresses.

Graham Cluster

ssh graham.sharcnet.ca
[roberpj@gra-login3:~] lmutil lmstat -c 1055@license3.sharcnet.ca -a | grep Users
Users of aa_mcad:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aa_r_cfd:  (Total of 25 licenses issued;  Total of 11 licenses in use)  <----
Users of afsp_gui:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of afsp_optigrid:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of afsp_viewmerical:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aim_mp1:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aa_r_hpc:  (Total of 512 licenses issued;  Total of 160 licenses in use)  <----

Legacy Sharcnet Systems

ssh vdi-centos6.user.sharcnet.ca
[roberpj@vdi-centos6:~] module load ansys
[roberpj@vdi-centos6:~] ansysstat | grep Users
Users of aa_mcad:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aa_r_cfd:  (Total of 25 licenses issued;  Total of 11 licenses in use)  <----
Users of afsp_gui:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of afsp_optigrid:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of afsp_viewmerical:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aim_mp1:  (Total of 25 licenses issued;  Total of 0 licenses in use)
Users of aa_r_hpc:  (Total of 512 licenses issued;  Total of 160 licenses in use)  <----

Issue: command not found

If you try to run an ansys command but get "command not found", you are likely not in the ansys group. To check, run the groups command as shown next. If ansys is not listed then you are not in it! In that case read over the License section found on the ANSYS sharcnet software page at https://www.sharcnet.ca/my/software

[roberpj@vdi-centos6:~] groups
roberpj abaqus lsdyna ansys ilogcplex starccmplus converge fdtd comsol mode wlu_users etc ...

Issue: Unstable Convergence And Mesh Quality

If your fluent simulation diverges check the mesh by clicking:

Setup --> General --> Report Quality

To check for recommendations click:

Solution --> Run Calculation --> Check Case

Issue: Text File Corruption

Ascii mode (not binary mode) should be used when transferring text files (such as journal files) from a windows machine to SHARCNET, otherwise control-M (DOS end of line markers) or control-Z (DOS end of file marker) codes may become invisibly embedded. If such "corrupted" files are used for running jobs in the queue, unexpected crashes may occur, dumping a variety of error messages into the output ofile(s). For instance, UDF files might fail to load and compile, resulting in an error message such as "Error: yourUDFfile.c: line 1: syntax error" followed by hundreds of messages such as "Error: chip-exc: function XYZ not found". Or you might see error messages such as "Error: eval: unbound variable" or "invalid integer" and so forth.

The presence of problematic DOS codes can be confirmed by opening your file (typically your journal file) on SHARCNET in a text editor such as vi or nano, then looking for a message such as [ Read 6 lines (Converted from DOS format) ] or [dos].
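
Two quick command line checks will also reveal DOS line endings (the file name here is hypothetical):

file myjournal.jou            (reports "with CRLF line terminators" for DOS format files)
grep -c $'\r' myjournal.jou   (counts lines containing carriage returns; 0 means clean)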

To "uncorrupt" the file back to a unix compatible format run one of the following commands:

dos2unix myfile
     --- OR  ---
sed -i 's/\r//g' myfile

then close the file and finally reopen it to check that such dos conversion codes no longer exist.

Issue: Jobs Killed When VMEM Limit Exceeded

Some parallel fluent jobs need fairly large VIRT memory per process (see the following top command) versus the real RES memory allocation per process. For instance the Bramble Altix has 124GB main memory (plus 72GB swap). When an 8-cpu job is started, top briefly displays the following, then the error message is printed to the output ofile.

[roberpj@bramble:~] sqsub -r 1h --nompirun -q mpi -n 8 -o ofile8cpu fluent 3d test.jou

[roberpj@bramble:~] top
Mem:    125714M total,    30659M used,    95055M free,        0M buffers
Swap:    76294M total,        0M used,    76294M free,    22150M cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
 92820 roberpj   20   0 31.9g  84m  13m R  100  0.1   0:12.80 fluent_mpi.12.1
 92821 roberpj   20   0 31.9g  84m  13m R  100  0.1   0:12.79 fluent_mpi.12.1 
 92815 roberpj   18   0 31.9g  88m  14m R  100  0.1   0:06.66 fluent_mpi.12.1
 92816 roberpj   20   0 31.9g  90m  13m R  100  0.1   0:12.70 fluent_mpi.12.1
 92818 roberpj   20   0 31.9g  82m  14m R  100  0.1   0:12.76 fluent_mpi.12.1 
 92819 roberpj   25   0 31.9g  85m  13m R  100  0.1   0:12.77 fluent_mpi.12.1
 92822 roberpj   20   0 31.9g  88m  14m R  100  0.1   0:12.80 fluent_mpi.12.1
 92817 roberpj   18   0 31.9g  85m  13m R  100  0.1   0:12.76 fluent_mpi.12.1

[roberpj@bramble:~] cat ofile8cpu | grep vmem
auto partitioning mesh by Principal Axes=>> PBS: job killed: vmem 308727316480 exceeded limit 133143986176

o This problem can be mitigated by reducing the number of processors from 8 to 4, which results in a corresponding 50% decrease in VIRT per processor, bringing the total from 8x32G=256G (~308727316480 bytes) down to a manageable 4x16.2G=64.8G; hence the job runs without tripping the vmem error message this time. Another solution is to use the Silky Altix, which has double the memory and is configured to support much larger jobs.

[roberpj@bramble:~] sqsub -r 1h --nompirun -q mpi -n 4 -o ofile4cpu fluent 3d test.jou

[roberpj@bramble:~] top
Mem:    125714M total,    30824M used,    94890M free,        0M buffers
Swap:    76294M total,        0M used,    76294M free,    22150M cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 90690 roberpj   25   0 16.2g 200m  16m R  100  0.2  10:59.83 fluent_mpi.12.1
 90691 roberpj   25   0 16.2g 207m  16m R  100  0.2  11:00.20 fluent_mpi.12.1
 90692 roberpj   25   0 16.2g 200m  16m R  100  0.2  11:00.34 fluent_mpi.12.1
 90689 roberpj   18   0 16.2g 210m  16m R  100  0.2  10:45.82 fluent_mpi.12.1

Issue: Chip-Exec Function Not Found

If you see the following message when starting an mpi ansys fluent job in the queue, it likely means you have read your dat file before executing the interpreted-functions or compiled-functions line in your journal file. Kill the job, reorder the journal file, then resubmit the job.

Parallel variables...
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Error: chip-exec: function "myinvel" not found.
Done.

Issue: CPP Not Found For Interpreted UDF

This error may appear on Altix systems (such as Silky). If so, explicitly define /usr/bin/cpp in the field Define --> User-Defined --> Functions --> Interpreted --> CPP Command Name using fluent-gui, save the cas file, then re-submit your job through the queue using sqsub.

Issue: Gui Applications Fail to Start (overquota)

Gui applications will fail to start when your home directory quota is exceeded according to the "quota" command. The error messages are not clearly indicative of the problem, for instance:

QPixmap: It is not safe to use pixmaps outside the GUI thread.

To fix this, check inside your "~/.ansys" directory for files or directories with large contents on the order of 500MB (presumably from previously corrupted gui sessions) and, if any are found, delete them and retry.

Issue: Gui Applications Fail to Start (corrupted config files)

When trying to start runwb2, the gui crashes with the error message shown below. The solution appears to be removal of all hidden and/or configuration files from your SHARCNET home directory.

******************************************************************************
*** A fatal error has occurred within AnsysWBU.exe and it must be closed...***
******************************************************************************
Line: 6204
Char: 2
Error: Invalid Key
Code: 80004005
Source: WBControls.WBTabItems.140
Script: var setting = wb.PreferenceMgr.Preference("PID_URL");

Issue: Monitor Solution Residuals For Queue Jobs

It is possible to watch the residuals as your fluent solution progresses while it runs in the (text based) batch queue. To do this simply run: tail -f ofileJID#, where JID# is the LSF job number assigned by the queue when you submit a job. You can also use the sqjobs command to check the JID# of running or queued jobs.
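
For example, assuming sqjobs reports a running job with JID# 123456 that was submitted with -o ofile.%J:

sqjobs
tail -f ofile.123456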

Issue: Fluent-Gui Sessions Hung On Viz Stations

A situation may occur where fluent-gui session(s) are hung and cannot be killed, typically due to a network interruption. In such cases the ansysstat command will show obsolete session(s) running on one or more viz stations, however there is no way to reconnect to them. To kill off such rogue fluent-gui instances one needs to first find the PIDs of their respective cortex processes. To do this, log into the affected viz station(s), open a terminal shell and run either:

top -u $USER     or

ps auxw | grep fluent| grep cortex. | grep $USER

From the output, locate the first 3, 4 or 5 digit numbers beside your user name. For example, assume there are 3 sessions hung on viz3-uwaterloo. Running the following ps command shows the PIDs are 2393, 2409 and 3763 respectively:

[roberpj@viz3-uwaterloo:~] ps auxw | grep fluent| grep cortex. | grep $USER
roberpj    2393 23.6  0.1 231904 76596 ?        Rl   Jun17 261:11 /opt/sharcnet/ansys/14.0/v140/fluent/fluent14.0.0/cortex/lnamd64/cortex.14.0.0 -f fluent -newcx (fluent "3ddp -pshmem  -host -alnamd64 -r14.0.0 -t4 -mpi=pcmpi -path/opt/sharcnet/ansys/14.0/v140/fluent -ssh")
roberpj    2409 99.3  0.1 223600 76452 ?        Rl   Jun11 9719:06 /opt/sharcnet/ansys/14.0/v140/fluent/fluent14.0.0/cortex/lnamd64/cortex.14.0.0 -f fluent -newcx (fluent "3d  -alnamd64 -r14.0.0 -path/opt/sharcnet/ansys/14.0/v140/fluent -ssh")
roberpj    3763 23.8  0.1 231900 76848 ?        Rl   Jun17 262:08 /opt/sharcnet/ansys/14.0/v140/fluent/fluent14.0.0/cortex/lnamd64/cortex.14.0.0 -f fluent -newcx (fluent "3ddp -pshmem  -host -alnamd64 -r14.0.0 -t4 -mpi=pcmpi -path/opt/sharcnet/ansys/14.0/v140/fluent -ssh")

To terminate these fluent-gui cortex session PIDs run the following commands:

kill -9 2393
kill -9 2409
kill -9 3763

Very importantly, check to confirm the jobs were actually killed and hence licenses are no longer being used. Do this by running the ansysstat command again. If it turns out the jobs did not die, once again log in to each viz station, open a terminal window and issue the following more powerful command:

pkill -9 -u $USER

This should kill all processes owned by you, including your login shell. Run ansysstat one more time, and if the licenses are still being consumed open a ticket, since root intervention may be required ie) if a file system is hung or the workstation is unresponsive and needs to be rebooted.

Issue: CFX Batch Jobs Crash During Startup on Clusters

Users may see error messages such as "Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 222157044 bytes" followed by "An error has occurred in cfx5solve:". The solution likely involves gradually increasing the mpp beyond the default until the job runs, such as:

On orca try increasing mpp in 1G increments from 4G -> ~16G:
sqsub -r 1h --nompirun -q mpi -n 16 -N 16 --mpp=4G -o ofile.%J cfx whatever.def

If jobs still fail on orca, try hound with 64G, 96G or 128G:
sqsub -r 1h -q threaded -f xeon -n 16 --mpp=64G -o ofile.%J cfx whatever.def 

Issue: Design Modeller GUI Won't Start To Create Geometry In Workbench

When trying to start design modeller by clicking runwb2 -> Cfx -> Geometry on a viz station, one only gets an error message that the mesh editor won't start, even though other guis such as fluent-gui, cfx-gui and icemcfd-gui all start normally from within a tigervnc OR ssh -Y connection. The cause is most likely corruption of your ~/.mw home directory; if necessary the fix can also involve removal of other config directories (which in some cases is known to fix nxclient startup failures) via the following aggressive removal command. First be sure you don't have any open nxclient (tigervnc) sessions!

/bin/rm -rf .gnom* .gt* .gcon* .gvfs .java .pulse .ansys* .cfx* .local .mw .config .qt .Trash

Issue: CFX-GUI Won't Start Since I'm Over Quota Somewhere

It is turning out to be fairly common that when ansys gui programs don't exit cleanly, they leave very large temporary files in directories that start with a dot (such as ~/.cfx). To confirm this, scan all the dot directories in your home directory using the "du" command as shown below. Likely one of them will turn out to have 900M or 1.1G in it. Once you find out which one, delete it:

cd /home/$USER
du -ksh .gnom* .gt* .gcon* .gvfs .java .pulse .ansys* .cfx* .local .mw .config .qt .Trash .cache | egrep "G|M"
1.1G    .Trash
/bin/rm -rf .Trash

If the problem re-occurs then link the offending directory into scratch by doing something like:

mkdir /scratch/$USER/.Trash
ln -s /scratch/$USER/.Trash /home/$USER/.Trash 

Issue: Queue Jobs Won't Start Since I'm Already Running 4 Jobs?

You were likely using one of the ansys gui programs on one or more viz stations and it didn't shut down cleanly, leaving running "rogue" processes that are still locking licenses. To fix this you need to determine which viz stations the "rogue" processes are running on, log in there, and kill them. You can submit a ticket to ask for help doing this, or just do it yourself as follows:

[roberpj@laptop:~] ssh tope.sharcnet.ca
[roberpj@tope:~] module load ansys
[roberpj@tope:~] ansysstat | grep $USER
    roberpj viz11-uwo.sharcnet.ca viz11-uwo.sharcnet.ca (v2011.1010) (license3/1055 3541), start Tue 3/19 23:18
    roberpj viz11-uwo.sharcnet.ca viz11-uwo.sharcnet.ca (v2011.1010) (license3/1055 2816), start Tue 3/19 23:20
    roberpj viz11-uwo.sharcnet.ca viz11-uwo.sharcnet.ca (v2011.1010) (license3/1055 4216), start Wed 3/20 3:28
[roberpj@tope:~] exit
[roberpj@laptop:~] ssh viz11-uwo.sharcnet.ca
[roberpj@viz11-uwo:~] pkill -9 -u $USER

Issue: Fluent-Gui OK Button Freezes Outside Frame Extents

If, when using fluent-gui, the OK button appears outside the fluent-gui application window (while running inside an nxclient session), check your screen resolution settings by clicking 'Configure' when first starting nxclient. More generally, should an application frame or title bar freeze, try pressing ALT+click to drag it.

Issue: Specify CFX Memory Increase On sqsub Command Line

When jobs crash reporting an insufficient memory allocation error, one can specify a memory increase factor when submitting a job to the queue with the sqsub command. The options available can be found by examining the output of the cfx5solve-help command. For instance, to apply a Memory Allocation Factor = 1.2 append "-size 1.2"; to be more specific and apply an Integer Memory Override only, use "-size-ni 1.5x", where the quotes are omitted in both cases. A variety of other size commands are available on the cfx command line depending on the data type:

sqsub -r 6h --nompirun -q mpi -n 16 --mpp=4G -o ofile.%J cfx test.def -size-nl 10X  (X is multiplier)
sqsub -r 6h --nompirun -q mpi -n 16 --mpp=4G -o ofile.%J cfx test.def -size-nl 10M  (M is Mega)
sqsub -r 6h --nompirun -q mpi -n 16 --mpp=4G -o ofile.%J cfx test.def -size-part 10 (factor assumed)
sqsub -r 6h --nompirun -q mpi -n 16 --mpp=4G -o ofile.%J cfx test.def -size 10      (factor assumed)

As discussed in https://www.sharcnet.ca/Software/Fluent14/help/cfx_mod/i1320139.html one must sometimes increase the CDTYPE stack memory factor, where CDTYPE is one of CHAR, DBLE, INTR, LOGL or REAL, indicating the data type for which MAKDAT failed. For instance, specifying a larger -size-nl may be required to improve the LOGL estimate from cfx if the error shown in the following stanza appears. Note however this may only be correctable (as described in the link) by running cfx-gui to "Re-run CFX-Pre and increase the CDTYPE stack memory factor. Use the Workspace Sizes menu under Advanced Control of CFX-Flow Parameters.".

  +--------------------------------------------------------------------+
  |                *** INSUFFICIENT MEMORY ALLOCATED ***               |
  |                                                                    |
  | ACTION REQUIRED : Increase the logical stack memory size.          |
  |                                                                    |
  | Details :                                                          |
  |   Requested space         :            656 words                   |
  |   Current allocated space :           4000 words        <----- Implies  -size-nl 4K
  |   Current used space      :           3352 words                   |
  |   Current free space      :            648 words                   |
  |   Number of free areas    :              2                         |
  +--------------------------------------------------------------------+

 Details of error:-
 ----------------
 Error detected by routine MAKDAT
 CDANAM =  OUTJOB CDTYPE =  LOGL ISIZE =  656
 CRESLT = FULL

Unfortunately, in ansys 14 the -size-nl option may not work, and one must open the simulation in cfx-gui, enter the "partitioner and/or solver" tabs, and set a realistic value such as "Logical memory 200 Kwords" until the error message above goes away. Once successful, output like the following will be reached:

 +--------------------------------------------------------------------+
 |        Memory Allocated for Run  (Actual usage may be less)        |
 +--------------------------------------------------------------------+

 Allocated storage in:    Kwords
                          Words/Node
                          Words/Elem
                          Kbytes
                          Bytes/Node

 Partition | Real       | Integer    | Character| Logical  | Double
 ----------+------------+------------+----------+----------+----------
         1 |  3489649.0 |   763298.3 |  21477.5 |    200.0 |   1231.3
           |     308.11 |      67.39 |     1.90 |     0.02 |     0.11
           |     327.88 |      71.72 |     2.02 |     0.02 |     0.12
           | 13631441.0 |  2981633.8 |  20974.1 |    195.3 |   9619.8  
           |    1232.46 |     269.58 |     1.90 |     0.02 |     0.87  
 ----------+------------+------------+----------+----------+----------

From an actual case run on sharcnet, discussed in tk#21884, the following allocation factors had to be set in both the solver tab AND the partitioner tab. Also shown is the corresponding sqsub command:

Catalogue Size Override = 1.2x
Character Memory Override = 2x
Logical Memory Override = 200k
Memory Allocation Factor = 1.0
Real Memory Override = 2147m

sqsub -r 7d --nompirun -q mpi -n 8 --mpp=8G -o case3.%J cfx real.def -continue-from-file real.bak -size 1.2

For reference, some links relating to CFX memory allocation are shown here:

o CFX Documentation: CFX-Solver Manager User's Guide
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/cfx_solv.html

o 2.2.8. Configuring Memory for the CFX-Solver
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/mgr_memoryconfig.html

o 3.2.1.4. Memory Allocated for the Run
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/i1299644.html#cfxSolvSolvMemo
1 word = 4 bytes, 1 Kword = 1000 words, 1 Kbyte = 1024 bytes (for example, the Real column of the allocation table above: 3489649.0 Kwords x 1000 x 4 bytes / 1024 = 13631441.0 Kbytes)

o 3.2.5.4. Memory Usage Information
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/i1379738.html#cfxSolvSolvMemo1

o Chapter 11: Starting the CFX-Solver from the Command Line
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/i1304872.html

o Chapter 13: CPU and Memory Requirements
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/i1305496.html

o 13.2. Special Partitioner, Solver and Interpolator Executables (-part-large)
https://www.sharcnet.ca/Software/Fluent14/help/cfx_solv/i1305559.html
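
As a worked check of the word/byte conversions noted above against the allocation table: the Real partition lists 3489649.0 Kwords, and 3489649.0 Kwords x 1000 words/Kword x 4 bytes/word = 13,958,596,000 bytes, which divided by 1024 bytes/Kbyte gives the 13631441.0 Kbytes shown in the table. By the same arithmetic the Character and Logical columns appear to use 1 byte per word, and the Double column 8 bytes per word.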

Issue: Unsteady fluent simulation animations

The goal is to generate an animation from a large-scale simulation run on the clusters. The problem is that the Solution Animations defined in the lower Calculation Activities pane are ***NOT*** persistent, i.e. they are not saved into cas files, so jobs submitted on the cluster will not have the commands. The workaround is a three-step process:

1) Define one or more solution variables (such as Vorticity Magnitude) to export into per-iteration .cdat files via the Automatic Export (export-1 CFD Post Compatible) found in the upper Calculation Activities pane, since this information is saveable into a cas file.
2) Run the simulation to completion via the queue on the cluster to generate the .cdat files.
3) Start CFD-Post, highlight the time steps in the Time Step Selector, go to the Animation page and select Quick Animation, set the format and name for the animation file, then click the Play button to start saving the animation frame by frame into an mp4 or mpeg file.

Some relevant links are shown below for more reading:

o Ansys 14.0 - Solution Animation Dialog Box & Animation Sequence Dialog Box
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug_calculation_activities_task_page.html
FLUENT User's Guide -> Menu Reference Guide -> Solve Menu -> Solve/Calculation Activities -> Calculation Activities Task Page

o Ansys 14.0 - Animating the Solution
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug_sec_solve_animate.html
FLUENT User's Guide -> Using the Solver -> Animating the Solution

o Ansys 14.0 - Displaying the Preliminary Solution
https://www.sharcnet.ca/Software/Fluent14/help/flu_tg/x1-220002.14.html
FLUENT Tutorial Guide -> Displaying the Preliminary Solution -> Displaying the Preliminary Solution

o Ansys 14.0 - Graphics and Animations Task Page
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug_graphics_animation_task_page.html
FLUENT Tutorial Guide -> Task Page Reference Guide -> Graphics and Animations Task Page

o Ansys 14.0 - Storing Contour Plot Settings
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug_sec_graphics_contours.html#flu_ug_sec_contours_store
FLUENT User's Guide -> Graphics and Animations Task Page -> Contours Dialog Box -> Storing Contour Plot Settings

o Ansys 14.0 - Text User Interface (TUI) Commands for /solve/animate
https://www.sharcnet.ca/Software/Fluent14/help/flu_tcl/flu_tcl.html
FLUENT User's Guide -> FLUENT Text Command List -> solve/

o Ansys 14.0 - CFD-Post Quick Animation
https://www.sharcnet.ca/Software/Fluent14/help/cfd_post/i1368277.html
CFD-Post User's Guide -> CFD-Post Tools Menu -> Animation -> Quick Animation
In CFD-Post, first select a quantity in the case file, e.g. File -> Data File Quantities -> Vorticity Magnitude

Issue: Ill Effects of Round Brackets

It is strongly recommended that round brackets not be used in cas/dat/.c/jou file names. For instance, instead of flat(plate12).cas use flat[plate12].cas, then resubmit your jobs.
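
Because round brackets are also special characters to the shell, quoting is needed when renaming existing files. A minimal shell sketch using the illustrative file names above:

 # quote the old name so the shell does not interpret the round brackets
 mv 'flat(plate12).cas' 'flat[plate12].cas'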

Issue: Jobs Fail Due To Fork() Call

Jobs run in the queue that use interpreted UDFs may randomly hang or quit during the startup phase with the following error message. The solution is to use the compiled UDF approach instead:

define/user-defined/interpreted-functions "myudf.c" "cpp" 10000 no
cpp -I"/opt/sharcnet/ansys/15.07/v150/fluent/fluent15.0.7/src" -I"/opt/sharcnet/ansys/15.07/v150/fluent/fluent15.0.7/cortex/src" -I"/opt/sharcnet/ansys/15.07$
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          red14 (PID 23984)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[red14:23983] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[red14:23983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[red14:23983] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[red14:23983] 2 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
==============================================================================
Node 7: Process 31979: Received signal SIGSEGV.
==============================================================================
[red14:23983] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:warn-fork
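
The compiled approach can also be scripted in the journal file. The following is a hedged sketch only: it reuses the myudf.c source name from the interpreted example above plus the conventional default library name "libudf", and the exact argument sequence can vary between ANSYS versions, so verify it against the Fluent UDF Manual listed in the References:

 ; build the UDF library from myudf.c, then load it (compiled approach)
 ; "libudf" is the conventional default library name, used here as an assumption
 define/user-defined/compiled-functions compile "libudf" yes "myudf.c" "" ""
 define/user-defined/compiled-functions load "libudf"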

Issue: Job Runs Slowly and Processor Affinity

Jobs should run properly when something like the following message appears in the output:

 > Note: Rank = 7: Process affinity not being set Machine is already loaded. 
Note: Rank = 0: Process affinity not being set Machine is already loaded. 
Note: Rank = 5: Process affinity not being set Machine is already loaded. 
Note: Rank = 6: Process affinity not being set Machine is already loaded. 
Note: Rank = 4: Process affinity not being set Machine is already loaded. 
Note: Rank = 1: Process affinity not being set Machine is already loaded. 
Note: Rank = 2: Process affinity not being set Machine is already loaded. 
Note: Rank = 3: Process affinity not being set Machine is already loaded.

However, if instead something like the following message appears, the job will likely run prohibitively slowly (the cause is currently unknown). In that case one should kill the job and resubmit it. If the problem persists, leave the job running and submit a problem ticket providing output from the 'sqjobs -l jobid' and 'sqjobs -L jobid' commands, which will greatly help speed up troubleshooting.

 > sched_setaffinity() call failed: Invalid argument 
sched_setaffinity() call failed: Invalid argument 
sched_setaffinity() call failed: Invalid argument 
Note: Rank = 0: Process affinity not being set Machine is already loaded. 
sched_setaffinity() call failed: Invalid argument 
Note: Rank = 0: Process affinity not being set Machine is already loaded.
Note: Rank = 0: Process affinity not being set Machine is already loaded.
Note: Rank = 0: Process affinity not being set Machine is already loaded.
Multicore processors detected. Processor affinity set!
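
When filing the ticket, both listings can be captured into a single file to attach. A trivial shell sketch; <jobid> is a placeholder for the actual job number and the output file name is illustrative:

 # hedged sketch: collect both sqjobs listings for the problem ticket
 sqjobs -l <jobid>  > affinity-report.txt
 sqjobs -L <jobid> >> affinity-report.txt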

Issue: Output Files Not Written to Current Directory

The Sample Journal Files above each contain a line starting with file/auto-save/root-name which defines an output filename prefix. It is possible, however, that the cas file overrides this by hardcoding a /target/output/path/filenameprefix. To resolve the conflict follow Steps 1-5 below; output files should then be written into the sqsub job submission directory, typically where the cas and dat files reside.

Step 1) Open the cas file in fluent-gui
Step 2) Click on Solution -> Calculation Activities
Step 3) Double-click Autosave (Every Time Steps)
Step 4) Remove the path and output filename prefix
Step 5) Save the cas file, then resubmit the job on the cluster
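
The prefix can also be confirmed directly in the journal file. A minimal sketch, assuming the illustrative prefix "output"; any plain prefix without a hardcoded path behaves the same way:

 ; hedged journal sketch: a bare root-name (no path) writes autosave
 ; files into the job submission directory
 file/auto-save/root-name "output"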

Getting More Help

For technical questions regarding how to submit fluent or cfx jobs into the queue on SHARCNET clusters, please submit a ticket to the SHARCNET problem tracker, being sure to include the job numbers of any failed jobs and the cluster you were using. If you are having problems with convergence, first run "Solve -> Run Calculation -> Check Case" and make refinements to the mesh if necessary.

For additional support setting up simulations using ANSYS products on topics such as physics and the use of specific simulation models, boundary conditions, modeling assumptions, unexpected and non-physical results, and other "non-IT" ANSYS software issues, please review the relevant software documentation and speak with your Professor/Supervisor first. If your question still cannot be answered, submit a ticket summarizing your problem and request that it be forwarded to the SimuTech Group for consideration. Note that there is also a quota on the maximum number of files a user can have on any file system; currently the limit is 1,000,000.

References

o Ansys 18.2.2 - Documentation
https://www.sharcnet.ca/Software/Ansys/18.2.2/

o Ansys 17.2 - Documentation
https://www.sharcnet.ca/Software/Ansys/17.2/

o Ansys 16.2.3 - Documentation
https://www.sharcnet.ca/Software/Ansys/16.2.3/

o Ansys 15.0.7 - Documentation
https://www.sharcnet.ca/Software/Ansys/15.0.7/

o Ansys 14.0 - Documentation
https://www.sharcnet.ca/Software/Ansys/14.0/


o Ansys 15.0.7 - Design Explorer
https://www.sharcnet.ca/Software/Ansys/15.0.7/en-us/help/wb_dx/dxbook.html

o Ansys 14.0 - Fluent UDF Manual
https://www.sharcnet.ca/Software/Ansys/14.0/help/flu_udf/flu_udf.html

o ANSYS 14.0 - Meshing User's Guide (PDF on the ANSYS website)
https://www1.ansys.com/customer/content/documentation/140/wb_msh.pdf

o ANSYS 14.0 - Command-Line Options and Keywords for cfx5solve
https://www.sharcnet.ca/Software/Ansys/14.0/help/cfx_solv/i1304960.html

o ANSYS 14.0 - Configuring Memory for the CFX-Solver
https://www.sharcnet.ca/Software/Ansys/14.0/help/cfx_solv/mgr_memoryconfig.html


o HPC Computing for Mechanical Simulation using ANSYS 2012
http://www.ansys.com/staticassets/ANSYS/staticassets/event/conference/hpc-for-mechanical-ansys.pdf

o ANSYS ICEM CFD Wikibooks page
https://en.wikibooks.org/wiki/ICEM_CFD

o ANSYS FEA & CFD Webinar Calendar
http://www.simutechgroup.com/Training-Services/fea-cfd-webinar-calendar.html


o See Section: 3. Text User Interface (TUI)
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug.html

o See Section: FLUENT Text Command List, e.g. 3. display/
https://www.sharcnet.ca/Software/Fluent14/help/ai_sinfo/flu_intro.html

o 3.2. Text Prompt System (see note on comma separator)
https://www.sharcnet.ca/Software/Fluent14/help/flu_ug/flu_ug_sec_tui_prompt_system.html

o See 2.4. Running in Batch Mode & 2.4.1. Example: Pressure Calculation on Multiple Files using Batch Mode
https://www.sharcnet.ca/Software/Ansys/17.0/en-us/help/cfd_post/cfd_post.html