{"id":4337,"date":"2022-04-05T14:05:15","date_gmt":"2022-04-05T18:05:15","guid":{"rendered":"https:\/\/labs.icahn.mssm.edu\/minervalab\/?p=4337"},"modified":"2022-04-05T14:05:18","modified_gmt":"2022-04-05T18:05:18","slug":"new-a100-80gb-gpu-nodes","status":"publish","type":"post","link":"https:\/\/labs.icahn.mssm.edu\/minervalab\/new-a100-80gb-gpu-nodes\/","title":{"rendered":"New A100-80GB GPU nodes with 2TB memory are available on Minerva"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;3.22&#8243;][et_pb_row _builder_version=&#8221;3.25&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;3.25&#8243; custom_padding=&#8221;|||&#8221; custom_padding__hover=&#8221;|||&#8221;][et_pb_text _builder_version=&#8221;4.9.0&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; hover_enabled=&#8221;0&#8243; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">We have added 2 new A100-80GB GPU nodes to the LSF queue. 
Each node is equipped with 2<\/span> <span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">TB of memory and 7<\/span> <span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">TB of local NVMe PCIe SSD to provide higher performance than the prior generation.<\/span><\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0<\/span><\/p>\n<p style=\"background: white\"><b><u><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">What are the A100-80GB GPU nodes on Minerva?<\/span><\/u><\/b><\/p>\n<p style=\"background: white\"><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">8 A100 GPUs<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">\u00a0in 2 nodes\u00a0<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 11.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">64 Intel Xeon Platinum 8358 2.6 GHz CPU cores per node,\u00a0<b>2<\/b><\/span><b> <\/b><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">TB memory per node<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">, for a total of 128 CPU cores<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 11.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: 
black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">7.68<\/span><\/b><b> <\/b><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">TB NVMe PCIe SSD\u00a0<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0(7.0 TB usable) per node, which can deliver a sustained read\/write speed of 3.5 GB\/s, in contrast with SATA SSDs, which are limited to 600 MB\/s<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 11.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">4 A100 GPUs per node,\u00a0<b>80 GB of memory for each GPU,\u00a0<\/b>for a total of 320 GB of GPU memory per node<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 11.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">The A100 GPUs are interconnected via NVLink<\/span><\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0<\/span><\/p>\n<p style=\"background: white\"><b><u><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">How to submit jobs to the A100-80GB GPU nodes?<\/span><\/u><\/b><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 10.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 
7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">A100-80GB GPU nodes are available in the GPU queue (use the LSF flag\u00a0<b>&#8220;-q gpu&#8221;<\/b>).<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 10.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">To submit jobs to the A100-80GB GPU nodes, the flag &#8220;<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">-R a10080g<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">&#8221; is required. That is, add\u00a0<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">#BSUB -R a10080g\u00a0<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">to your LSF script or\u00a0<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">-R a10080g<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0to your LSF command line. 
For example, the following requests 1 GPU card, 8 CPUs, and 256 GB of memory for 1 hour:<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"color: #201f1e\">\u00a0<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: #CCCCCC\">bsub -P acc_xxx -q gpu -n 8 -R rusage[mem=32000] -R a10080g -R rusage[ngpus_excl_p=1] -R span[hosts=1] -W 01:00 -Is \/bin\/bash<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"color: #201f1e\">\u00a0<\/span><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 10.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">Note that the gpu queue also contains other GPU nodes with V100 and A100 GPU cards. You can access those resources with the corresponding flags &#8220;-R v100&#8221; or &#8220;-R a100&#8221;. 
If no GPU model flag is specified, your job will start on the first available GPU node.<\/span><\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0<\/span><\/p>\n<p style=\"background: white\"><b><u><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">How to use the SSD on the A100-80GB GPU nodes?<\/span><\/u><\/b><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 10.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">The symlink<b>\u00a0<\/b><\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">\/ssd<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0points to the local\u00a0NVMe SSD storage. You can specify \/ssd in your job script and direct your temporary files there. 
At the end of your job script, <b>please remember to clean up your temporary files.<\/b>\u00a0<\/span><\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0<\/span><\/p>\n<p style=\"background: white\"><b><u><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #201f1e;background: white\">What CUDA version is supported on the A100-80GB GPU nodes?<\/span><\/u><\/b><\/p>\n<p style=\"margin-left: .5in;text-indent: -.25in;background: white\"><span style=\"font-size: 10.0pt;font-family: Symbol;color: black\">\u00b7<\/span><span style=\"font-size: 7.0pt;color: black\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">CUDA 11.x or later is supported on these A100-80GB nodes. Please load the CUDA module with\u00a0<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">ml cuda\/11.1<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0or\u00a0<\/span><b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: red;background: white\">ml cuda<\/span><\/b><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0(cuda\/11.1 is currently the default version).<\/span><\/p>\n<p style=\"background: white\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">\u00a0<\/span><\/p>\n<p class=\"xmsonormal\"><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: black;background: white\">If you have any questions, please send us a ticket at\u00a0<\/span><span style=\"font-size: 11.0pt;font-family: 'Calibri',sans-serif;color: #1155cc;background: white\"><a href=\"mailto:hpchelp@hpc.mssm.edu\" target=\"_blank\" rel=\"noopener\">hpchelp@hpc.mssm.edu<\/a><\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_column type=&#8221;4_4&#8243; 
_builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221;][et_pb_button button_url=&#8221;https:\/\/labs.icahn.mssm.edu\/minervalab&#8221; button_text=&#8221;Back to HPC&#8221; button_alignment=&#8221;center&#8221; _builder_version=&#8221;4.9.0&#8243; _module_preset=&#8221;default&#8221; custom_button=&#8221;on&#8221; button_bg_color=&#8221;#d80b8c&#8221; background_layout=&#8221;dark&#8221;][\/et_pb_button][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We have added 2 new A100-80GB GPU nodes to the LSF queue. Each node is equipped with 2 TB of memory and 7 TB of local NVMe PCIe SSD to provide higher performance over the prior generation. \u00a0 What are the A100-80GB GPU nodes on Minerva? 8 A100 GPUs\u00a0in 2 nodes\u00a0 \u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a064 Intel Xeon Platinum [&hellip;]<\/p>\n","protected":false},"author":600,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-4337","post","type-post","status-publish","format-standard","hentry","category-minerva"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/posts\/4337","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/users\/600"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/comments?post=4337"}],"version-history":[{"count":6,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/posts\/4337\/revisions"}],"predecessor-version"
:[{"id":4344,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/posts\/4337\/revisions\/4344"}],"wp:attachment":[{"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/media?parent=4337"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/categories?post=4337"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labs.icahn.mssm.edu\/minervalab\/wp-json\/wp\/v2\/tags?post=4337"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
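The post shows an interactive `bsub` command; the same request can be written as an LSF batch script. The sketch below is a minimal example combining the flags described in the post (`-q gpu`, `-R a10080g`, `rusage[ngpus_excl_p=1]`, the `/ssd` symlink, and `ml cuda/11.1`); the account name `acc_xxx` is a placeholder as in the original, and the `$TMPDIR` layout under `/ssd` is an illustrative assumption, not a documented Minerva convention.

```shell
#!/bin/bash
# Example LSF batch script for the A100-80GB nodes (sketch based on the
# flags in the announcement; acc_xxx is a placeholder project account).
#BSUB -P acc_xxx                     # project account
#BSUB -q gpu                         # GPU queue
#BSUB -R a10080g                     # target the A100-80GB nodes
#BSUB -n 8                           # 8 CPU cores
#BSUB -R "rusage[mem=32000]"         # 32 GB per core -> 256 GB total
#BSUB -R "rusage[ngpus_excl_p=1]"    # 1 GPU card
#BSUB -R "span[hosts=1]"             # keep all cores on one host
#BSUB -W 01:00                       # 1 hour wall time

ml cuda/11.1                         # load the default CUDA module

# Stage temporary files on the local NVMe SSD via the /ssd symlink.
# (The $USER/$LSB_JOBID subdirectory layout is an assumption.)
TMPDIR=/ssd/$USER/$LSB_JOBID
mkdir -p "$TMPDIR"

# ... run your GPU workload here, writing scratch data to $TMPDIR ...

# Clean up temporary files at the end of the job, as the post requests.
rm -rf "$TMPDIR"
```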