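High Performance Computing Facility

Overview

CPOS IT operates and manages the High Performance Computing Facility (HPCF), which comprises 48 nodes with 6,848 CPU cores, 124 GPUs, 54 TB of memory and a large-capacity all-flash storage system, deployed on a 400 Gb high-speed network.

Target Users

HKU staff and students who need computing services for PanorOmics research.

Service Hours

Service is normally available around the clock, all year round, except during system maintenance.

Serving 200+ researchers

In omics research

600+ software packages/tools

Extensive software library

387,000+ jobs completed

In the past 12 months

High Performance Computing Cluster (HPCF3)

🖥️ HPCF3 GPU trial offer – free trial now open!

➡️ Click here for more details.

Hardware and Access

🖥️ HPCF3 Cluster Overview

Connect to the HPCF3 login node via SSH using the following hostname: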

hpcf3.staging1.cpos.hku.hk

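Click here for more details.

🧰 Operating System and Software

All HPCF3 servers run Rocky Linux 8.
Commonly used bioinformatics software is pre-installed and available to all users.

⚙️ Services

Users submit jobs through the cluster's job scheduler, OpenPBS.
Jobs run on the compute nodes only, as PBS jobs.

➡️ Direct login to the compute nodes is not allowed.

👤 User Account Setup

To apply for a user account, please contact itsupport.cpos@hku.hk.

All account applications are subject to approval by CPOS.
Approved users are given a Linux account for accessing the cluster.
Each account comes with 100 GB of storage by default; additional quota can be requested (subject to availability and charges).
Each account is assigned to a specific individual. Account sharing is strictly prohibited.
Although the storage system is protected by hardware redundancy, users are strongly advised to back up their own data regularly.
Users should be familiar with the Linux/UNIX environment. ITS provides a reference guide.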

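🔐 Cluster Login

Users can connect remotely via SSH using client software such as PuTTY, available at: http://www.chiark.greenend.org.uk/~sgtatham/putty/

Connection Details

To connect to the HPCF3 login node, open an SSH terminal session in your SSH client with the following settings:

Hostname: hpcf3.staging1.cpos.hku.hk
Username:

🔒 Network Requirements

For security reasons, HPCF3 servers are accessible only from within the HKU campus network.

Off-campus users must first connect through HKUVPN.
Details: https://its.hku.hk/services/network-connectivity/hkuvpn/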

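📦 Data Transfer

🪟 Windows Users

Several free secure file transfer clients with a graphical interface are available for Windows, such as WinSCP and FileZilla. With these clients you can drag and drop files between the cluster front-end node and your desktop or laptop.

🍏 Mac / 🐧 Linux Users

On Mac and Linux, the command-line scp client is usually installed with the operating system and can be run from a terminal on your local machine. For example, suppose you are on a Mac, have a local file named myfile, and want to copy it to your home directory on the cluster.

Example:

 scp myfile userid@hpcf3.staging1.cpos.hku.hk:mydir/

This copies myfile into the directory mydir/ under your HPCF3 home directory.

As with any Unix command that takes files or directories as arguments, both the source and the destination can be given as an absolute path (starting with a slash) or a relative path (not starting with a slash). For local files and directories, a relative path is resolved against the current working directory; for remote files and directories, it is resolved against your home directory on the remote host.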

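📡 Bulk Data Transfer Guidelines

To avoid affecting campus network performance:

Transfers of more than 30 GB to or from outside the HKU network should be carried out outside office hours.
For transfers of more than 500 GB, please notify itsupport.cpos@hku.hk so that we can coordinate with Information Technology Services (ITS).

⚠️ If a large data transfer is not reported in advance, ITS may temporarily isolate the cluster servers from the Internet.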
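The two thresholds can be checked before a transfer starts. The sketch below is only an illustration: the function name transfer_advice is hypothetical, sizes are whole gigabytes, and it assumes that a transfer above 500 GB should also follow the off-hours rule for transfers above 30 GB.

```shell
#!/bin/bash
# Hypothetical helper: classify a planned external transfer by its size in GB.
# The 30 GB and 500 GB thresholds come from the guidelines above.
transfer_advice() {
    size_gb=$1
    if [ "$size_gb" -gt 500 ]; then
        echo "notify itsupport.cpos@hku.hk first, then transfer outside office hours"
    elif [ "$size_gb" -gt 30 ]; then
        echo "transfer outside office hours"
    else
        echo "no restriction"
    fi
}

transfer_advice 20
transfer_advice 120
transfer_advice 800
```

The advice applies only to transfers that cross the boundary of the HKU network; transfers inside the campus network are not covered by these guidelines.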

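Charges

Click here for more details.

Charging Policy

👤 HPCF User Account

Each HPCF user account is charged monthly to cover its routine support and maintenance on our cluster.
The billing cycle starts on the 1st of each month.
A new user account is charged from the next billing cycle. For example, if a new account is ready and the user is notified by email on the 28th, no charge applies from the 29th, and charging starts on the 1st of the following month.
Each HPCF user account is charged for a minimum of one month. Shorter-term use is not allowed.
A user account deletion takes effect on the last day of the billing cycle, and charges continue until then.

📦 User Home Data Storage

User home data storage (/home/) is charged monthly according to the allocated disk quota, in units of 100 GB (1 KB = 1000 bytes).
The billing cycle starts on the 1st of each month.
If home storage is set up together with a new user account, it is charged from the next billing cycle, along with the new user account charge.
Unless otherwise specified, each new user account has a default disk quota of 100 GB.
A disk quota increase is made when the user requests it with the PI in copy, and the increased quota is charged from the current billing cycle.
A requested disk quota reduction takes effect on the last day of the billing cycle, and charges continue until then.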
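Because storage is charged on the allocated quota in 100 GB units, the number of billable units follows from the quota alone. The sketch below assumes that a partly filled unit is charged as a whole unit (rounding up), which the policy text does not state explicitly; the function name billable_units is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: number of 100 GB units charged for an allocated quota.
# Assumption: a partial unit is rounded up to a full unit.
billable_units() {
    quota_gb=$1
    echo $(( (quota_gb + 99) / 100 ))
}

billable_units 100   # default quota of a new account
billable_units 250   # a quota between two unit boundaries
```

Note that the input is the allocated quota, not the space actually used.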

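📦 User Group Data Storage

User group data storage (/home/groups/) is charged monthly according to the allocated disk quota, in units of 100 GB (1 KB = 1000 bytes).
The billing cycle starts on the 1st of each month.
A disk quota increase is made when the group coordinator requests it with the PI in copy, and the increased quota is charged from the current billing cycle.
A requested disk quota reduction takes effect on the last day of the billing cycle, and charges continue until then.

Co-location Service

For new co-located equipment, the monthly support fee is charged from the billing cycle following a successful user acceptance test (UAT).

Important Notes

CPOS may adjust the charging rates from time to time according to actual usage and cost-recovery needs. Notice will be given before any change takes effect.

For co-located equipment, all FEO equipment is managed by CPOS. Co-located servers can be integrated into the HPCF cluster and configured as compute nodes, and a dedicated job queue is created for jobs running on them. Please note that CPOS IT centrally manages all system resources within the HPCF (including co-located equipment) and may, from time to time and without prior notice, schedule jobs on any idle resources so that resources are used efficiently for the benefit of the HKU community.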

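Environment Modules

The HPCF3 cluster uses the Environment Modules system to manage centrally installed software. Versions of many commonly used bioinformatics tools are available.
Note: Environment Modules as described here are supported on HPCF3 only, not on HPCF2.

Show available modules

Environment Modules let users load, unload and switch between software environments with simple commands.

To see which software packages and versions are available on HPCF3, run:

 module avail

This command lists all modules that are currently installed and available.

Installed Software

Job Scheduling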

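High Performance Computing Cluster (HPCF2)

Hardware and Access

HPCF2 Cluster

The head node is Omics, which can be reached via SSH at the hostname hpcf2.staging1.cpos.hku.hk.

The HPCF2 cluster consists of 8 compute nodes with the following system configuration:


Server	CPU Brand and Model	No. of CPUs	Cores per CPU	Physical Cores per Server	CPU Threads per Server	Memory (GB)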
hpch01	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch02	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch03	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch04	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch05	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch06	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch07	Intel Xeon E5-2650 v4 2.2GHz	2	12	24	48	256
hpch08	Intel Xeon E5-2683 v4 2.1GHz	2	16	32	64	512
			Total:	200	400	2,304

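Operating System and System Software

All servers in HPCF2 run the CentOS 7 Linux operating system, with major bioinformatics software installed and ready for all users.

Cluster Nodes

Head Node

Services

Compiling and running command-line programs interactively over SSH
Submitting batch jobs to the job queueing system
Transferring files with an SFTP client such as FileZilla

Control Measures

The head node should be used only for the services above; users' analysis work should run on the compute nodes by submitting jobs to the job queueing system. In particular, the head node must not be used for resource-intensive jobs. The following controls are therefore enforced on the head node:

The total CPU usage of all of a user's processes at any time is limited to 400% (for example, four single-threaded jobs each using 100% of a CPU, or one job with four threads each using 100% of a CPU).
The maximum run time of each user task is 10 minutes.
The total main memory used by all of a user's processes at any time is limited to 10 GB.

The head node checks each user against these limits. When a limit is exceeded, the user's processes are terminated in descending order of usage until resource usage falls back within the limits. A notification email with details of the terminated processes is sent to the user automatically.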
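As a rough planning aid, the three limits can be turned into a quick check before running something on the head node. This is a simplified sketch: the function name fits_head_node is hypothetical, it looks at a single planned task, and it ignores anything else the same user is already running (the real limits apply to the user's total usage).

```shell
#!/bin/bash
# Hypothetical helper: does a planned task fit within the head node limits?
# Arguments: number of busy threads, expected minutes, expected memory in GB.
# Limits from the text above: 400% CPU (4 threads at 100%), 10 minutes, 10 GB.
fits_head_node() {
    threads=$1; minutes=$2; mem_gb=$3
    if [ "$threads" -le 4 ] && [ "$minutes" -le 10 ] && [ "$mem_gb" -le 10 ]; then
        echo "head node OK"
    else
        echo "submit to PBS queue"
    fi
}

fits_head_node 1 5 2     # short single-threaded test
fits_head_node 8 5 2     # too many threads for the head node
```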

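Compute Nodes

Services

Users submit PBS jobs from the head node through the job scheduling system to run on the compute nodes. Users cannot log in to the compute nodes directly to run jobs.

User Account Setup

To set up a user account, please contact itsupport.cpos@hku.hk.
Account applications are subject to approval by CPOS.
Once an application is approved, the user is assigned a Linux user account for accessing the cluster.
Each user is initially given a Linux user account with 100 GB of disk space by default. Additional disk space can be requested, subject to availability and charges.
Each user account is for use by the designated person, who is responsible for it. Account sharing is strictly prohibited.
Although the storage system is protected by hardware redundancy, we strongly recommend that you regularly back up important files from the cluster to your local disk.
Users should be familiar with the Linux/UNIX software environment. You may refer to the Unix user guide prepared by ITS.

Cluster Login

Users can connect remotely to the HPCF2 command-line terminal over SSH using an SSH client such as PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/).

To connect to the HPCF2 head node, open an SSH terminal session in your SSH client with the following settings:

 Hostname: hpcf2.staging1.cpos.hku.hk

Please log in with the same user account as in the current HPCF.

Note: For security reasons, only computers on the HKU campus network can connect to the servers. From an off-campus computer, you must connect to HKUVPN first. For details, see https://its.hku.hk/services/network-connectivity/hkuvpn/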

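Data Transfer

Several free secure file transfer clients with a graphical interface are available for Windows, such as WinSCP and FileZilla (http://filezilla-project.org/). With these clients you can drag and drop files between the cluster front-end node and your desktop or laptop.

On a Mac/Linux computer, open the Terminal application and type

 scp myfile userid@hpcf2.staging1.cpos.hku.hk:mydir/

This copies the local file myfile to the mydir/ subdirectory of your home directory on the cluster, keeping the file name myfile. As with any Unix command that takes files or directories as arguments, both the source and the destination can be given as an absolute path (starting with a slash) or a relative path (not starting with a slash). For local files and directories, a relative path is resolved against the current working directory; for remote files and directories, it is resolved against your home directory on the remote host.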
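The difference between relative and absolute paths can be seen by printing the commands side by side. The sketch below only prints the scp commands rather than running them; the helper name scp_cmd, the placeholder account userid and the example directory names are all hypothetical.

```shell
#!/bin/bash
HOST=hpcf2.staging1.cpos.hku.hk
USERID=userid    # placeholder: replace with your own account name

# scp_cmd SOURCE REMOTE_PATH -> prints the scp command instead of running it
scp_cmd() {
    echo "scp $1 ${USERID}@${HOST}:$2"
}

# Relative paths: myfile in the local working directory,
# mydir/ under the home directory on the cluster
scp_cmd myfile mydir/

# Absolute paths on both sides (directory names are made up for illustration)
scp_cmd /tmp/results/myfile /home/groups/mygroup/data/
```

Once the printed command looks right, it can be pasted into the terminal to perform the actual copy.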

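Bulk Data Transfer

To avoid affecting other users of the HKU campus network, HKU Information Technology Services (ITS) recommends that large network transfers (>30 GB) between the cluster and computers outside the HKU network be scheduled outside office hours. If more than 500 GB is to be transferred, notify itsupport.cpos@hku.hk in advance so that we can pass the information to ITS for network traffic planning. Without prior notice, ITS may block our servers from connecting to the Internet.

Decommissioning

HPCF2 will be fully decommissioned by October 2026.

Charges

Click here for more details.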

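Charging Policy

HPCF User Account

Each HPCF user account is charged monthly to cover its routine support and maintenance on our cluster.
The billing cycle starts on the 1st of each month.
A new user account is charged from the next billing cycle. For example, if a new account is ready and the user is notified by email on the 28th, no charge applies from the 29th, and charging starts on the 1st of the following month.
Each HPCF user account is charged for a minimum of one month. Shorter-term use is not allowed.
A user account deletion takes effect on the last day of the billing cycle, and charges continue until then.

User Home Data Storage

User home data storage (/home/) is charged monthly according to the allocated disk quota, in units of 100 GB (1 KB = 1000 bytes).
The billing cycle starts on the 1st of each month.
If home storage is set up together with a new user account, it is charged from the next billing cycle, along with the new user account charge.
Unless otherwise specified, each new user account has a default disk quota of 100 GB.
A disk quota increase is made when the user requests it with the PI in copy, and the increased quota is charged from the current billing cycle.
A requested disk quota reduction takes effect on the last day of the billing cycle, and charges continue until then.

User Group Data Storage

User group data storage (/home/groups/) is charged monthly according to the allocated disk quota, in units of 100 GB (1 KB = 1000 bytes).
The billing cycle starts on the 1st of each month.
A disk quota increase is made when the group coordinator requests it with the PI in copy, and the increased quota is charged from the current billing cycle.
A requested disk quota reduction takes effect on the last day of the billing cycle, and charges continue until then.

Co-location Service

For new co-located equipment, the monthly support fee is charged from the billing cycle following a successful user acceptance test (UAT).

Important Notes

CPOS may adjust the charging rates from time to time according to actual usage and cost-recovery needs. Notice will be given before any change takes effect.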

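Environment Modules

The system uses a mechanism called Environment Modules to manage centrally installed software. Several versions of commonly used bioinformatics software are installed. Note that this feature is available only on the new HPCF2 cluster, not on the old HPCF (statgenpro).

Show available modules

Environment Modules let users switch dynamically between software environments with a few simple commands. To find out which software packages and versions are available on the cluster, use the "module avail" command to list all available module environments: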

 [itsupport@omics ~]$module avail
 
 --------------------------------------------- /software/Modules/modulefiles ----------------------------------------------
  ANNOVAR/2017Jul16             HTSeq/0.9.1            (D)    STAR/2.5.2a               bwa/0.7.12
  BEDTools/2.12.0               MACS/2.0.10-2012.06.06        STAR/2.5.3a        (D)    bwa/0.7.17        (D)
  BEDTools/2.17.0               MACS/2.1.0-2015.04.20  (D)    TrimGalore/0.4.1          cutadapt/1.8.1
  BEDTools/2.27.1        (D)    MUMmer/3.22                   TrimGalore/0.4.5   (D)    cutadapt/1.15     (D)
  BioPerl/1.7.2                 MUMmer/3.23            (D)    Trimmomatic/0.33          idba/1.1.3
  Canu/1.5                      NCBI-blast/2.2.27+            Trimmomatic/0.36   (D)    java/7.0_25
  Canu/1.6               (D)    NCBI-blast/2.7.1+      (D)    VerifyBamID/1.1.2         java/7.0_80
  CellRanger/2.0.1              Oncotator/1.9.6.1             VerifyBamID/1.1.3  (D)    java/8.0_161      (D)
  CellRanger/2.1.0       (D)    PEAR/0.9.10                   bamUtil/1.0.13            java/9.0.4
  DESeq2/1.10.1                 PEAR/0.9.11            (D)    bamUtil/1.0.14     (D)    miniconda2/4.3.31
  DESeq2/1.18.1          (D)    Perl/5.26.1                   bamtools/2.3.0            muTect/1.1.4
  EBSeq/1.9.3                   Picard/2.0.1                  bamtools/2.5.1     (D)    muTect/1.1.5      (D)
  EBSeq/1.18             (D)    Picard/2.17.4          (D)    bcl2fastq/2.19            python2/2.7.14
  FASTX-toolkit/0.0.13.2        QIIME/1.9.1                   bcl2fastq/2.20     (D)    python3/3.6.4
  FASTX-toolkit/0.0.14   (D)    QIIME2/2017.12                bedGraphToBigWig/4        samtools/0.1.18
  FastQC/0.11.2                 R/3.2.5                       bismark/0.14.3            samtools/1.3
  FastQC/0.11.7          (D)    R/3.4.3                (D)    bismark/0.19.0     (D)    samtools/1.6      (D)
  GenomeAnalysisTK/3.5          RNAmmer/1.2                   bowtie/1.0.0              strelka/1.0.15
  GenomeAnalysisTK/3.7          RSEM/1.2.31                   bowtie/1.2.2       (D)    strelka/2.8.4     (D)
  GenomeAnalysisTK/3.8   (D)    RSEM/1.3.0             (D)    bowtie2/2.2.5             tRNAscan-SE/1.3.1
  HOMER/4.9                     SPAdes/3.10.0                 bowtie2/2.3.4      (D)
  HTSeq/0.6.1                   SPAdes/3.11.1          (D)    bwa/0.6.2
  
  -------------------------------------- /opt/Lmod/7.7.14/lmod/lmod/modulefiles/Core ---------------------------------------
  lmod/7.7.14    settarg/7.7.14
  
  Where:
   D:  Default Module

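Load a module

You can then run centrally installed software by loading the corresponding module. For example, to run the BWA aligner, use the "module load" command to enable the bwa module environment:

 [itsupport@omics ~]$ module load bwa
 bwa/0.7.17 is loaded

This loads the default version of the bwa module. You can then use bwa for your data analysis:

 [itsupport@omics ~]$ bwa
 
 Program: bwa (alignment via Burrows-Wheeler transformation)
 Version: 0.7.17-r1188
 Contact: Heng Li
 
 Usage:   bwa <command> [options]
 
 Command: index         index sequences in the FASTA format
          mem           BWA-MEM algorithm
          fastmap       identify super-maximal exact matches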
 ...

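List loaded modules

To list the currently loaded modules, use the "module list" command:

 [itsupport@omics ~]$ module list
 Currently Loaded Modules:
   1) bwa/0.7.17

Unload a module

When you no longer need the software, use the "module unload" command to remove it from the current session:

 [itsupport@omics ~]$ module unload bwa
 bwa/0.7.17 is unloaded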

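Search for available versions of a module

To use a version other than the default, search for the available versions of the module and then specify the one you want:

 [itsupport@omics ~]$module avail bwa
 ------------------------------- /software/Modules/modulefiles -------------------------------
    bwa/0.6.2    bwa/0.7.12    bwa/0.7.17 (D)
 
  Where:
   D:  Default Module

Load a specific version of a module

To load an older version of bwa (0.7.12):

 [itsupport@omics ~]$ module load bwa/0.7.12
 bwa/0.7.12 is loaded

You can now run this older version of bwa:

 [itsupport@omics ~]$ bwa
 
 Program: bwa (alignment via Burrows-Wheeler transformation)
 Version: 0.7.12-r1039
 Contact: Heng Li
 
 Usage:   bwa <command> [options]
 
 Command: index         index sequences in the FASTA format
          mem           BWA-MEM algorithm
 ...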

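Load an alternative version of a module

 [itsupport@omics ~]$ module load bowtie2/2.2.5
 bowtie2/2.2.5 is loaded
 [itsupport@omics ~]$ module load bowtie2/2.3.4
 bowtie2/2.2.5 is unloaded
 bowtie2/2.3.4 is loaded
 The following have been reloaded with a version change:
   1) bowtie2/2.2.5 => bowtie2/2.3.4

Module command quick reference


Command	Description
module list	List the currently loaded modules
module avail	Show the modules available for loading
module avail [name]	Show only the modules for the application called [name]
module keyword [word1] [word2] …	Show available modules matching the search terms
module whatis [module_name]	Show the description of a specific module
module help [module_name]	Show help information for a module
module load [module_name]	Set up your environment according to the modulefile
module load [module_name]/[version]	Load a specific version of a module
module load [module A] [module B] …	Load a list of modules
module unload [module_name]	Undo the settings made by the modulefile
module unload [module A] [module B] …	Unload a list of modules
module swap [module A] [module B]	Unload modulefile A and load modulefile B
module purge	Unload all currently loaded modules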

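Using modules in shell scripts

Here is an example of using modules in a shell script:

 #!/bin/bash
 
 # Clean up the environment first
 module purge
 
 # Load the modules needed by this script
 module load bwa/0.7.12
 
 # Run the data analysis
 bwa mem reference.fa reads1.fq reads2.fq > aligned_pairs.sam

Load a module with prerequisites

Some modules depend on other modules. When you load such a module, the system prompts you to load the prerequisite modules first.

 [itsupport@omics ~]$module load bismark
 Lmod has detected the following error: Cannot load module "bismark/0.19.0"
 because these module(s) are not loaded:
  bowtie2
 
 While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    bismark/0.19.0   /software/Modules/modulefiles/bismark/0.19.0.lua

 [itsupport@omics ~]$module load bowtie2
 bowtie2/2.3.4 is loaded
 [itsupport@omics ~]$module load bismark
 bismark/0.19.0 is loaded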

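Installed Software

Software

The software packages listed below are centrally pre-installed on the HPCF2 cluster and can be used simply by loading them with the module command:


Software Name	Module Name on HPCF2	Installed Versions	Homepage	Description and Features
7-Zip	7-Zip	16.02	http://www.7-zip.org/	File archiver that packs a set of files into a compressed container called an archive
ANNOVAR	ANNOVAR	2020Jun08	http://annovar.openbioinformatics.org/	Annotates genetic variants detected in diverse genomes
BamTools	bamtools	2.3.0 2.5.1	https://github.com/pezmaster31/bamtools	Toolkit for end users working with BAM files
BamUtil	bamUtil	1.0.13 1.0.14	https://genome.sph.umich.edu/wiki/BamUtil	Programs for end users to manipulate BAM/SAM files
bcl2fastq	bcl2fastq	2.19 2.20	https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html	Demultiplexes data and converts BCL files produced by Illumina sequencing systems into standard FASTQ files for downstream analysis
bedGraphToBigWig	bedGraphToBigWig	4	https://www.encodeproject.org/software/bedgraphtobigwig/	Converts bedGraph files to bigWig files
BEDTools	BEDTools	2.12.0 2.17.0 2.27.1	http://bedtools.readthedocs.io/en/latest/	A set of utilities for a wide range of genomic analysis tasks, such as merging and rearranging genomic intervals from multiple files in common genomic file formats (e.g. BAM, BED, GFF/GTF, VCF)
BioPerl	BioPerl	1.7.2	http://bioperl.org/	Open-source Perl tools for bioinformatics, genomics and life sciences
Bismark	bismark	0.14.3 0.19.0 0.20.0	https://www.bioinformatics.babraham.ac.uk/projects/bismark/	A tool for mapping bisulfite-converted sequence reads and determining cytosine methylation states
Bowtie	bowtie	1.0.0 1.2.2 2.2.5 2.3.4 2.3.4.1 2.3.4.3 2.4.2	http://bowtie-bio.sourceforge.net/index.shtml	An ultrafast, memory-efficient short read aligner
Bowtie 2	bowtie2	2.2.5 2.3.4 2.3.4.1 2.3.4.3 2.4.2	http://bowtie-bio.sourceforge.net/bowtie2/index.shtml	An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences
BWA	bwa	0.6.2 0.7.12 0.7.17	http://bio-bwa.sourceforge.net/	Aligns short reads to an indexed reference sequence via the Burrows-Wheeler transform
Canu	Canu	1.5 1.6 1.9	http://canu.readthedocs.io/	A single-molecule sequence assembler for genomes large and small
CellRanger	CellRanger	2.0.1 2.1.0 2.2.0 3.0.2 3.1.0 4.0.0 6.1.2	https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger	A set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices, and perform clustering and gene expression analysis
cutadapt	cutadapt	1.8.1 1.16 2.3 3.4	https://github.com/marcelm/cutadapt	Removes adapter sequences from high-throughput sequencing data
DESeq2	DESeq2	1.10.1 1.18.1	https://bioconductor.org/packages/release/bioc/html/DESeq2.html	Differential gene expression analysis based on the negative binomial distribution
EBSeq	EBSeq	1.9.3 1.18	http://bioconductor.org/packages/release/bioc/html/EBSeq.html	An R package for differential expression analysis of genes and isoforms in RNA-seq data
FastQC	FastQC	0.11.8 0.11.9	https://www.bioinformatics.babraham.ac.uk/projects/fastqc/	A quality control tool for high-throughput sequence data
FASTX-Toolkit	FASTX-toolkit	0.0.14 1.3.0	http://hannonlab.cshl.edu/fastx_toolkit/	A collection of command-line tools for preprocessing short-read FASTA/FASTQ files
Genome Analysis Toolkit (GATK)	GenomeAnalysisTK	3.7 3.8.1.0 4.1.9.0 4.2.0.0	https://software.broadinstitute.org/gatk/	A variety of tools, primarily for variant discovery and genotyping
HOMER	HOMER	4.9 4.11	http://homer.ucsd.edu/homer/	HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for motif discovery and next-generation sequencing analysis
HTSeq	HTSeq	0.6.1 0.9.1	https://htseq.readthedocs.io/en/release_0.9.1/	A Python package that provides infrastructure for processing data from high-throughput sequencing experiments
IDBA	idba	1.1.3	https://github.com/loneknightpy/idba	A basic iterative de Bruijn graph assembler for second-generation sequencing reads
Java	java	8.0_161 9.0.4 10.0.2 11.0.9 12.0.2 13.0.2	https://www.java.com	A general-purpose programming language that is concurrent, class-based and object-oriented, designed to have as few implementation dependencies as possible
MACS	MACS	2.0.10-2012.06.06 2.1.0-2015.04.20 2.1.1-2016.03.09 2.1.2-2019.09.06	http://liulab.dfci.harvard.edu/MACS/	Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites
MUMmer	MUMmer	3.22 3.23 4.0.ob2	http://mummer.sourceforge.net/	A system for rapidly aligning entire genomes, whether complete or in draft form
muTect	muTect	1.1.4 1.1.5	http://archive.broadinstitute.org/cancer/cga/mutect	Identifies somatic point mutations in next-generation sequencing data of cancer genomes
NCBI-blast	NCBI-blast	2.2.27+ 2.7.1+	https://blast.ncbi.nlm.nih.gov/Blast.cgi	BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.
Oncotator	Oncotator	1.9.6.1 1.9.9.0	http://portals.broadinstitute.org/oncotator/	A web application for annotating human genomic point mutations and indels with data relevant to cancer researchers
PEAR	PEAR	0.9.10 0.9.11	https://www.h-its.org/downloads/pear-academic/	An ultrafast, memory-efficient and highly accurate paired-end read merger
Perl	Perl	5.26.1	https://www.perl.org	A high-level, general-purpose, interpreted, dynamic programming language
Picard	Picard	2.17.4 2.18.9 2.25.2	https://broadinstitute.github.io/picard/	Command-line tools for manipulating SAM files
Python 2	python2	2.7.14	https://www.python.org	An interpreted high-level programming language for general-purpose programming
Python 3	python3	3.6.4 3.7.10 3.9.2	https://www.python.org	An interpreted high-level programming language for general-purpose programming
QIIME	QIIME	1.9.1	http://qiime.org/	An open-source bioinformatics pipeline for microbiome analysis from raw DNA sequencing data
QIIME2	QIIME2	2017.12 2019.7	https://qiime2.org/	A microbiome analysis package with a focus on data and analysis transparency
R	R	3.4.3 3.5.1 3.6.1 4.1.0	https://www.r-project.org	A programming language and free software environment for statistical computing and graphics
RNAmmer	RNAmmer	1.2	http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer	Predicts ribosomal RNA genes in full genome sequences
RSEM	RSEM	1.2.31 1.3.0 1.3.3	http://deweylab.github.io/RSEM/	Accurate quantification of gene and isoform expression from RNA-Seq data
SAMtools	samtools	1.6 1.8 1.9 1.11	samtools.sourceforge.net	SAM Tools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format
SPAdes	SPAdes	3.10.0 3.11.1 3.13.0	http://cab.spbu.ru/software/spades/	St. Petersburg genome assembler, an assembly toolkit containing various assembly pipelines
STAR	STAR	2.7.8a 2.7.9a	https://github.com/alexdobin/STAR	RNA-seq aligner
Strelka	strelka	1.0.15 2.8.4 2.9.7	https://github.com/Illumina/strelka	Germline and somatic small variant caller
TrimGalore	TrimGalore	0.4.1 0.4.5	https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/	A wrapper around Cutadapt and FastQC that consistently applies quality and adapter trimming to FastQ files
Trimmomatic	Trimmomatic	0.33 0.36 0.38	http://www.usadellab.org/cms/index.php?page=trimmomatic	A flexible read trimming tool for Illumina NGS data
tRNAscan-SE	tRNAscan-SE	1.3.1	http://eddylab.org/software.html	Detection of tRNA genes in large-scale genome sequences
VerifyBamID	VerifyBamID	1.1.2 1.1.3	https://genome.sph.umich.edu/wiki/VerifyBamID	Verifies whether the reads in a file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples

Note: The list above is not exhaustive. Please contact us to check whether a particular software package is available.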

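Job Scheduling

Job Scheduling System on the HPCF2 Cluster (PBS Pro)

All jobs must be scheduled through the batch job system (PBS Pro). When submitting a job to PBS, specify the resources required, including the queue, number of CPUs, amount of memory and run time. When resources become available, PBS runs the job on an available compute node, subject to the maximum resource limits.

General Job Queues


Queue Name	Max. Processors per Job	Max. Memory per Job (GB)	Max. Running Jobs per User	Max. Queued Jobs per User	Max. Run Time (hours)
small	2	10	18	40	6
small_ext	2	10	6	12	60
medium	12	50	12	25	24
medium_ext	12	50	6	8	60
large	12	120	3	4	84
legacy	12	45	8	16	96
test	24	190	1	1	1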
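Choosing a queue amounts to finding the smallest one whose limits cover the job's processors, memory and run time. The sketch below encodes the limits from the table; the function name pick_queue is hypothetical, and leaving out the legacy and test queues is an assumption of this example, since their intended use is not described here.

```shell
#!/bin/bash
# Queue limits copied from the General Job Queues table, in the form
# name:max_processors:max_memory_gb:max_hours, ordered from smallest to largest.
# Assumption: the legacy and test queues are left out on purpose.
QUEUES="small:2:10:6 small_ext:2:10:60 medium:12:50:24 medium_ext:12:50:60 large:12:120:84"

# pick_queue CORES MEM_GB HOURS -> prints the first queue whose limits fit
pick_queue() {
    cores=$1; mem_gb=$2; hours=$3
    for q in $QUEUES; do
        old_ifs=$IFS; IFS=:
        set -- $q          # split "name:cores:mem:hours" into $1..$4
        IFS=$old_ifs
        if [ "$cores" -le "$2" ] && [ "$mem_gb" -le "$3" ] && [ "$hours" -le "$4" ]; then
            echo "$1"
            return 0
        fi
    done
    echo "none"
    return 1
}

pick_queue 1 2 1
pick_queue 12 90 48
```

A result of "none" means the request exceeds every general queue considered by this sketch.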

Special Queues (On Request)

Some jobs may need more computing resources than the general job queues provide. In that case, please send us the details of your job execution plan and the resources required. Depending on current cluster usage and trends, we may set up a customized job queue with specific computing resources to run your job on a short-term basis.

Job Scripting

PBS Job Directives

Below are some commonly used PBS options in a job command file, which can also be used on the command line with qsub.


PBS Job Directives	Description
#PBS -A acct	Causes the job time to be charged to “acct”.
#PBS -N myJob	Assigns a job name. The default is the name of PBS job script.
#PBS -l nodes=4:ppn=2	The number of nodes and processors per node.
#PBS -l walltime=01:00:00	Sets the maximum wall-clock time during which this job can run. (walltime=hh:mm:ss)
#PBS -l mem=n	Sets the maximum amount of memory allocated to the job.
#PBS -q queuename	Assigns your job to a specific queue.
#PBS -o mypath/my.out	The path and file name for standard output.
#PBS -e mypath/my.err	The path and file name for standard error.
#PBS -j oe	Join option that merges the standard error stream with the standard output stream of the job.
#PBS -M email-address	Sends email notifications to a specific user email address.
#PBS -m	Set email to be sent to the user when:
#PBS -m a	a – the job aborts
#PBS -m b	b – the job begins
#PBS -m e	e – the job ends
#PBS -r n	Indicates that a job should not rerun if it fails.
#PBS -S shell	Sets the shell to use. Make sure the full path to the shell is correct.
#PBS -V	Exports all environment variables to the job.
#PBS -W	Used to set job dependencies between two or more jobs.

Some useful batch job directives to try are the following:


Directive	Function
-S /bin/bash	Specifies which shell to use.
-N JobName	Gives a name to the job. The name will appear in the output of qstat.
-M username@hku.hk	Specifies an email address where notification messages will be sent.
-m abe (or any subset of a, b, and e)	Specifies when an email will be sent: a (abort), b (begin), e (end). See notes on email below.

PBS Job Output/Error Files

You can specify the full file names of the PBS job output/error files:


PBS Job Directives	Description
#PBS -o mypath/my.out	The path and file name for standard output
#PBS -e mypath/my.err	The path and file name for standard error

Or you can specify them with directories:


PBS Job Directives	Description
#PBS -o mypath	The path for standard output. Output file will be generated as, e.g. 123.omics.OU
#PBS -e mypath	The path for standard error. Error file will be generated as, e.g. 123.omics.ER

If neither -o nor -e is specified, the working directory and the default file names below are used:


Output / Error Files	Description
Output File	File with name $PBS_JOBNAME.o$PBS_JOBID would be generated, e.g. myfirstjob.o123
Error File	File with name $PBS_JOBNAME.e$PBS_JOBID would be generated, e.g. myfirstjob.e123

PBS Pro uses the directory from which the job was submitted as the job's working directory, regardless of where the job submission script is located. Note that, by default, the PBS scheduler stores data relative to your home directory, so specifying a full path for each file name is recommended.

Tips:

PBS job variables (such as the commonly used $PBS_JOBID and $PBS_JOBNAME) are NOT resolved in #PBS job directives on the Omics cluster with its new PBS scheduling system, unlike on the statgenpro cluster.
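One way to work around this is to use the variables in the body of the job script, where they are ordinary shell variables at run time. The sketch below is an illustration only: the job name, the logs directory and the fallback values are assumptions, and the fallbacks exist so that the script can also be tried outside PBS.

```shell
#!/bin/bash
#PBS -N myfirstjob
#PBS -q small
#PBS -l nodes=1:ppn=1
#PBS -l mem=2gb
#PBS -l walltime=00:10:00
#PBS -j oe

# The job variables are expanded here, in the script body, not in the
# #PBS lines above. The fallbacks apply only when running outside PBS.
job_id=${PBS_JOBID:-local}
job_name=${PBS_JOBNAME:-myfirstjob}

# Hypothetical log location: a logs/ directory under the submission directory
# (PBS_O_WORKDIR), or under a temporary directory when run outside PBS.
log_dir="${PBS_O_WORKDIR:-${TMPDIR:-/tmp}}/logs"
mkdir -p "$log_dir"
log_file="$log_dir/${job_name}.${job_id}.log"

echo "job started on $(uname -n)" > "$log_file"
echo "$log_file"
```

Submitted with qsub, this writes a per-job log file whose name contains the job name and job ID, which the -o and -e directives cannot do on this cluster.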

Interactive PBS Jobs

PBS is not limited to batch jobs; it also lets users work on the compute nodes interactively when needed. For example, users can work in the development environment provided by R on a compute node and run their jobs there until the walltime expires.

Instead of preparing a submission script, users pass the job requirements directly to the qsub command. For example, the following PBS directives:

 #PBS -l nodes=1:ppn=4 
 #PBS -l mem=2gb 
 #PBS -l walltime=15:00:00 
 #PBS -q small

correspond to the following qsub command:

 qsub -I -q small -l nodes=1:ppn=4,walltime=15:00:00,mem=2gb

The PBS scheduler allocates 4 cores to the user as soon as nodes with the given specifications become available, then automatically logs the user in to one of the compute nodes. Any interactive PBS job (i.e. qsub -I) is logged out after 30 minutes of idle time to free up allocated but unused resources.

Job Management

Submit a batch job

Batch jobs let users submit many jobs at once; the job scheduler queues them and runs them across the cluster when the requested resources become available. Users submit jobs as scripts that contain the instructions for running the job. The output of the job is written to a file for later review. A batch job can do anything that can be typed on the command line.

Here is an example batch script (file name simple.sh):

 #!/bin/bash
 #PBS -l nodes=1:ppn=1
 #PBS -l mem=2gb
 #PBS -l walltime=00:01:00
 #PBS -m ae
 #PBS -N omics-simple
 #PBS -q small

 module load bamtools/2.5.1
 bamtools -v

Then you can submit this job script by:

 $ qsub simple.sh

See “PBS Job Directives” for more about the PBS job parameters that could be specified in the script file.

Check job status

After your job is submitted, use the qstat command to view the status of your job and of the job queues. If resources are available on the compute nodes, your job moves to the R (running) state; if the compute nodes are busy, your job stays in the Q (queued) state until the requested resources become available. Completed jobs are shown in the F (finished) state.

Show running jobs

$ qstat -rn1

Example:

 [itsupport@omics ~]$qstat -rn1
 
 omics: 
                                                              Req'd  Req'd   Elap
 Job ID          Username  Queue    Jobname    SessID NDS TSK Memory Time  S Time
 --------------- --------  -------- ---------- ------ --- --- ------ ----- - -----
 1649.omics      itsupport large    BWA_19NT   121502  1  12   90gb  48:00 R hpch01/0*12

Show queuing/held jobs

$ qstat -i

Example:

 [itsupport@omics ~]$qstat -i
 
 omics: 
                                                              Req'd  Req'd   Elap
 Job ID          Username  Queue    Jobname    SessID NDS TSK Memory Time  S Time
 --------------- --------  -------- ---------- ------ --- --- ------ ----- - -----
 1651.omics      itsupport large    BWA_19T    ---     1  12   90gb  48:00 Q

Show all jobs (including completed jobs)

$ qstat -xan1

Example:

 [itsupport@omics Exome]$qstat -xan1
 
 omics: 
                                                            Req'd  Req'd   Elap
 Job ID          Username  Queue    Jobname    SessID NDS TSK Memory Time  S Time
 --------------- --------  -------- ---------- ------ --- --- ------ ----- - -----
 1648.omics      itsupport  large  BWA_23T     127476   1  12   45gb 48:00 F 00:03 hpch08/0*12
 1649.omics      itsupport  large  BWA_23NT    121502   1  12   90gb 48:00 R 01:12 hpch08/0*12
 1650.omics      itsupport  large  BWA_43T     5940     1  12   45gb 48:00 F 00:01 hpch08/0*12
 1651.omics      itsupport  large  BWA_43NT     --      1  12   90gb 48:00 Q   --   --

Show details of a job

$ qstat -xf JobID

Example:

 [itsupport@omics]$qstat -f 1649
 Job Id: 1649.omics
   Job_Name = BWA_34NT
   Job_Owner = itsupportn@omics
   resources_used.cpupercent = 916
   resources_used.cput = 05:56:56
   resources_used.mem = 59554972kb
   resources_used.ncpus = 12
   resources_used.vmem = 92494740kb
   resources_used.walltime = 01:28:38
   job_state = R
   ...

Check resource usage of a completed job

Function: Show job information and resource usage percentages of a particular job

Usage: myjob JobID

Examples:

   $ myjob 440356
   
   Job information and usage summary of your HPCF job 440356 :
   +-------------------+----------+-------+-------------+------+---------------------+-----------+------+-------+
   | jobid             | username | queue | jobname     | E S  | End Time            | walltime% | mem% | cpu%  |
   +-------------------+----------+-------+-------------+------+---------------------+-----------+------+-------+
   | 440356.statgenpro | kelvin   | large | Large       |    0 | 2016-08-30 04:31:30 |     25.36 | 8.28 | 14.51 |
   +-------------------+----------+-------+-------------+------+---------------------+-----------+------+-------+
   +---------------------+---------------------+----------+----------+------+------+-------+----------+--------+
   | Submit Time         | Start Time          | wtime@   | wtime#   | mem@ | mem# | vmem@ | CPUTime@ | nproc# |
   +---------------------+---------------------+----------+----------+------+------+-------+----------+--------+
   | 2016-08-29 22:26:16 | 2016-08-29 22:26:18 | 06:05:13 | 24:00:00 | 3.31 | 40gb |  4.75 | 10:35:53 |     12 |
   +---------------------+---------------------+----------+----------+------+------+-------+----------+--------+
   E S = Exit Status ; % = usage percentage; # = requested ; @ = used ; mem@/vmem@ in GB ; nproc = number of processors

Only the owner of a job may check its usage:

   $ myjob 123456
   ERROR: You (kelvin) not the owner of the job 123456.

Check resource usage summary of completed job(s)

Function: Display a job summary and resource usage percentages of your recent jobs

Usage: myjobs [-v] [-j] [number]

Optional Parameters:

-v verbose mode, with resource usage data of walltime/memory/CPU requested and used

-j list the last jobs by JobID number (instead of the default, by End Time). Note that the list by JobID may differ from the list by End Time.

number the number of jobs to display (default: 20)

Examples

$ myjobs 5
 
Your last 5 HPCF completed jobs (by End Time):
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
| jobid             | username | queue | jobname                | E S  | End Time            | cpu%   | mem%  | wtime% |
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
| 458952.statgenpro | kelvin   | large | GATK_CK02              |    0 | 2016-10-28 16:45:05 |  69.82 | 10.80 |   0.42 |
| 458860.statgenpro | kelvin   | large | GATK_NG07              |    0 | 2016-10-28 11:19:56 |  53.44 | 10.70 |   0.40 |
| 458853.statgenpro | kelvin   | large | GATK_CK03              |    0 | 2016-10-28 11:13:53 |  69.20 | 10.80 |   0.42 |
| 458862.statgenpro | kelvin   | large | GATK_YJ01              |    0 | 2016-10-28 11:09:05 |  49.90 | 10.30 |   0.24 |
| 458858.statgenpro | kelvin   | large | GATK_KC04              |    0 | 2016-10-28 10:57:39 |  55.96 | 10.30 |   0.11 |
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
E S = Exit Status ; % = usage percentage; wtime = walltime

Verbose mode

$ myjobs -v 5

Your last 5 HPCF completed jobs (by End Time): 
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+----------+--------+-------+-------+----------+-----------+
| jobid             | username | queue | jobname                | E S  | End Time            | cpu%   | mem%  | wtime% | CPUTime@ | CPUno# | mem@  | mem#  | wtime@   | wtime#    |
 +-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+----------+--------+-------+-------+----------+-----------+
| 458952.statgenpro | kelvin   | large | GATK_CK02              |    0 | 2016-10-28 16:45:05 |  69.82 | 10.80 |   0.42 | 00:41:55 |      2 |  1.08 | 10.00 | 00:30:01 | 120:00:00 |
| 458860.statgenpro | kelvin   | large | GATK_NG07              |    0 | 2016-10-28 11:19:56 |  53.44 | 10.70 |   0.40 | 00:31:03 |      2 |  1.07 | 10.00 | 00:29:03 | 120:00:00 |
| 458853.statgenpro | kelvin   | large | GATK_KC03              |    0 | 2016-10-28 11:13:53 |  69.20 | 10.80 |   0.42 | 00:42:17 |      2 |  1.08 | 10.00 | 00:30:33 | 120:00:00 |
| 458862.statgenpro | kelvin   | large | GATK_YJ01              |    0 | 2016-10-28 11:09:05 |  49.90 | 10.30 |   0.24 | 00:16:57 |      2 |  1.03 | 10.00 | 00:16:59 | 120:00:00 |
| 458858.statgenpro | kelvin   | large | GATK_KC04              |    0 | 2016-10-28 10:57:39 |  55.96 | 10.30 |   0.11 | 00:08:55 |      2 |  1.03 | 10.00 | 00:07:58 | 120:00:00 |
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+----------+--------+-------+-------+----------+-----------+
E S = Exit Status ; % = usage percentage ; wtime = walltime ; @ = used ; # = requested ; CPUno = number of processors ; mem@/vmem@/mem# in GB

Display last jobs by JobID numbers

$ myjobs -j 5

Your last 5 HPCF completed jobs (by JobID):
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
| jobid             | username | queue | jobname                | E S  | End Time            | cpu%   | mem%  | wtime% |
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
| 458952.statgenpro | kelvin   | large | GATK_CK02              |    0 | 2016-10-28 16:45:05 |  69.82 | 10.80 |   0.42 |
| 458862.statgenpro | kelvin   | large | GATK_YJ01              |    0 | 2016-10-28 11:09:05 |  49.90 | 10.30 |   0.24 |
| 458860.statgenpro | kelvin   | large | GATK_NG07              |    0 | 2016-10-28 11:19:56 |  53.44 | 10.70 |   0.40 |
| 458858.statgenpro | kelvin   | large | GATK_KC04              |    0 | 2016-10-28 10:57:39 |  55.96 | 10.30 |   0.11 |
| 458857.statgenpro | kelvin   | large | GATK_LR06              |    0 | 2016-10-28 10:54:35 |  55.46 | 10.30 |   0.11 |
+-------------------+----------+-------+------------------------+------+---------------------+--------+-------+--------+
E S = Exit Status ; % = usage percentage; wtime = walltime

Delete a job

$ qdel JobID

Example:

 [itsupport@omics]$qdel 1651

FAQ

How can I speed up file compression with multiple CPU cores?

By default, file compression with 7z uses only a single CPU core, so archiving a large number of files may take considerable time.

Compression can be sped up by letting 7z use multiple CPU cores.

Select the bzip2 algorithm with the "-mm=Bzip2" argument and set the number of threads with the "-mmt=<#THREAD>" argument.

The example command below uses 4 threads.

7za a -mm=Bzip2 -mmt=4 output_zip_file input_file

How do I check how much data storage space I have used?

User may use the “df” command to show the disk quota usage of his/her home folder.

[tmchan@omics ~]$ df -h ~/
Filesystem         Size  Used Avail Use% Mounted on
compellent2:/home 1000G  308G  693G  31% /home
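
df reports the total only. To see which folders take up the space, a sketch like the one below may help. The helper name largest_entries is hypothetical; it relies on the standard du, sort and head utilities and skips hidden files, so its figures can be lower than the df total.

```shell
# List the largest entries directly under a directory (default: your home folder).
# du -sh prints the total size of each entry; sort -rh orders the
# human-readable sizes from largest to smallest.
largest_entries() {
    dir="${1:-$HOME}"
    count="${2:-10}"
    du -sh "$dir"/* 2>/dev/null | sort -rh | head -n "$count"
}
```

For example, largest_entries ~ 5 prints the five largest items in your home folder. On a large home folder the scan may take a while.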

How to log in to HPCF with VS Code?

Our HPCF cluster has a special setup due to our security policy. Please add the following settings to your SSH config file, then try reconnecting to “omics” from VSCode.

Host omics
     HostName omics
     User username
     ProxyCommand ssh -q -W %h:%p username@omics.staging1.cpos.hku.hk

Note: Replace “username” in the settings above with your own HPCF login account name.
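
If you prefer the command line (for example Git Bash, WSL, macOS or Linux), the same settings can be appended with a small script. This is a minimal sketch: the function name add_omics_host is hypothetical, and it assumes your SSH config is at ~/.ssh/config unless you pass another path.

```shell
# Append the "omics" host entry to an SSH config file unless it is already there.
# Arguments: $1 = your HPCF login name, $2 = config file (default: ~/.ssh/config).
add_omics_host() {
    user="$1"
    config="${2:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    if grep -q '^Host omics$' "$config"; then
        echo "Host omics already present in $config"
        return 0
    fi
    cat >> "$config" <<EOF

Host omics
     HostName omics
     User $user
     ProxyCommand ssh -q -W %h:%p $user@omics.staging1.cpos.hku.hk
EOF
    # Keep the config file readable by its owner only.
    chmod 600 "$config"
}
```

After running add_omics_host with your own login name, the command ssh -G omics should print the resolved settings, including the ProxyCommand line.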

Refer to the screenshot below to locate the SSH config file on your PC. You can open the file directly from VSCode:

Why can't my VS Code log in to HPCF?

Please download the portable version of VSCode 1.98.2 from the link below and use this specific version to connect to our HPCF cluster. The portable version is preferred because it does not update itself automatically.

https://update.code.visualstudio.com/1.98.2/win32-x64-archive/stable

filename: VSCode-win32-x64-1.98.2.zip
md5sum: db4fefdae4986ef4ba5ba4524d8488cd

We also recommend disabling the VSCode update function. Go to File > Preferences > Settings and search for “Update”, then:

Uncheck “Update: Enable Windows Background Updates”
Set “Update: Mode” to “none”

How to launch RStudio / Jupyter Notebook on HPCF?

If you have not set up X11 forwarding on your PC before, please follow the steps below:

Install MobaXterm

Download the installation package from the official site: https://mobaxterm.mobatek.net/

After installation, open MobaXterm

Check whether the “X server” option is started. If it is stopped, click the option to start it.

When you see the following, the “X server” is started:

RStudio

Connect to HPCF2 (a.k.a. omics), then check the available RStudio modules with the command below:

ml av rstudio

Type the command below to load the RStudio module (RStudio/1.4.1717 for best compatibility):

ml RStudio/1.4.1717

The message “RStudio/1.4.1717 is loaded” should appear.

Note: You can also load an R module (for example, R/4.2.1) at the same time to run a newer version of R in RStudio. The default R version is 3.5.2.

Type the command below to open RStudio:

rstudio

The messages below may appear in the command window:

Meanwhile, a new RStudio window should open:

Jupyter Notebook

Check the available Jupyter Notebook modules with the command below:

ml av ju

Type the command below to load the Jupyter Notebook module:

ml jupyter/notebook

You should see the messages:

“firefox/latest is loaded

jupyter/notebook/ is loaded”.

Type the command below to open Jupyter Notebook:

jupyter notebook

The messages below may appear in the command window:

Meanwhile, a new Firefox window (with Jupyter Notebook) should open:

After starting the Jupyter Notebook process, use one of the following commands in a new SSH terminal to set up SSH local port forwarding:

ssh -N -f -L localhost:8889:omics:8889 username@hpcf2.staging1.cpos.hku.hk
or
ssh -N -f -L 8889:omics:8889 username@hpcf2.staging1.cpos.hku.hk
or
ssh -N -f -L 127.0.0.1:8889:omics:8889 username@hpcf2.staging1.cpos.hku.hk

(Port 8889 is an example; change it to the actual port on which Jupyter Notebook is listening.)
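
The actual port appears in the URL that Jupyter Notebook prints at startup. On installations that support it, jupyter notebook list also shows the URLs of running servers. The sketch below pulls the port out of such a URL and prints the matching forwarding command. The function names port_from_url and forward_cmd are hypothetical, and the URL used in the example is made up.

```shell
# Extract the port number from a Jupyter server URL,
# e.g. http://localhost:8889/?token=... gives 8889.
port_from_url() {
    printf '%s\n' "$1" | sed -n 's#^https\{0,1\}://[^/:]*:\([0-9][0-9]*\).*#\1#p'
}

# Print the port-forwarding command for a given port and login name.
# The command is only printed here, not executed.
forward_cmd() {
    port="$1"
    user="$2"
    echo "ssh -N -f -L ${port}:omics:${port} ${user}@hpcf2.staging1.cpos.hku.hk"
}

# Example with a made-up URL and the placeholder login name "username".
forward_cmd "$(port_from_url 'http://localhost:8889/?token=example')" username
```

Copy the printed command into a new terminal on your own PC, replacing “username” with your login name.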

How to resolve the SSH host identification warning when connecting to HPCF?

When logging in to HPCF, you might encounter the “WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!” prompt:

This warning means that the host key of our HPCF server was updated during previous server security maintenance. This is expected and safe if you have not logged in to HPCF since Oct 2024.

To fix this, remove the old key from your computer. Open your terminal and run the following command:

ssh-keygen -R hpcf2.staging1.cpos.hku.hk

After running the command, try connecting to the server again:

ssh your_username@hpcf2.staging1.cpos.hku.hk

(Replace your_username with your actual username)

When asked “Are you sure you want to continue connecting (yes/no/[fingerprint])?”, type yes and press Enter. You should then be able to enter your password and log in to HPCF.

Contact Us

2831-5417
itsupport.cpos@hku.hk
