πŸͺ„ Inspiration πŸͺ„

This HPRC

Is indeed not too easy

BAAANG HP-EZ!


High performance research computing (HPRC) refers to the use of advanced computing techniques to solve complex problems beyond the capabilities of standard computers, but using HPRC clusters typically requires expert knowledge. We created HP-EZ to translate complex HPC concepts into intuitive interfaces that allow researchers who are not computing experts to run large-scale computations correctly, efficiently, and with confidence.


πŸ”¨ What it does πŸ”¨

  • Provides a web-based interface for submitting jobs to HPRC clusters without using the command line or SLURM batch files.

  • Allows users to upload their code as a ZIP/TAR file and specify how it should be executed.

  • Uses Gemini 2.0 to analyze the uploaded code files, detect the technologies they use, and auto-suggest appropriate HPC settings such as:

    • Number of nodes
    • CPU cores per node
    • Memory per node
    • Maximum runtime
    • Whether GPUs are needed

and more.

  • Submits the job to the backend running on the HPC cluster, which then generates a SLURM batch script with the appropriate parameters, sets up the code base, and runs the batch script on the user’s behalf.
  • Hides SLURM-specific complexity (partitions, QOS, topology configuration, resource limits) behind a simple, guided form.
  • Displays a real-time dashboard of job states, including queued, running, completed, and failed jobs.
  • Maintains a complete per-user job history, showing all past and current submissions.
  • Provides access to job outputs and logs for debugging, analysis, and result download.
  • Reports clear failure reasons and rejection messages when a job crashes or cannot be scheduled.
  • Enforces and displays cluster resource limits, preventing invalid or excessive job requests.
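To make the flow above concrete, here is a minimal sketch of how a backend could render a SLURM batch script from the guided form's fields. The function and parameter names are illustrative assumptions, not the actual HP-EZ implementation.

```python
# Hypothetical sketch: turn the guided form's settings into a SLURM
# batch script string. Names and defaults are assumptions for
# illustration, not HP-EZ's real code.

def render_slurm_script(job_name, nodes, cores_per_node, mem_gb,
                        hours, use_gpu, entry_command):
    """Build a SLURM batch script from user-facing job settings."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={cores_per_node}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:00:00",
    ]
    if use_gpu:
        # Request a single GPU per node; real clusters vary in syntax.
        lines.append("#SBATCH --gres=gpu:1")
    lines += ["", entry_command]
    return "\n".join(lines)

script = render_slurm_script("demo", nodes=2, cores_per_node=8,
                             mem_gb=32, hours=4, use_gpu=True,
                             entry_command="python main.py")
print(script)
```

The user only ever sees the form fields; the `#SBATCH` directives, partitions, and limits stay hidden behind this translation layer.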

🍭 How we built it 🍭

  • Backend implemented in Python using FastAPI, deployed on a server with a lightweight SLURM-managed cluster.
  • Used Aurora DB to store job details, account data, and session information.
  • Designed and implemented a secure API for job submission, status monitoring, output retrieval, file management, and cluster information access.
  • Implemented a Next.js and React-based frontend that delivers an easy-to-use dashboard for submitting jobs and checking job status.
  • Used the Gemini API through OpenRouter to provide generative AI suggestions.
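The Gemini-via-OpenRouter integration can be sketched as a plain chat-completions request. The model ID and prompt wording below are assumptions for illustration, not our exact configuration; the sketch only builds the request payload rather than sending it.

```python
import json

# Hypothetical sketch of preparing a request to OpenRouter's
# chat-completions endpoint. The model ID and prompt text are
# assumptions, not HP-EZ's actual configuration.

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_suggestion_request(api_key, detected_techs):
    """Assemble an HTTP request asking the model to suggest HPC
    settings for the technologies detected in the uploaded code."""
    prompt = (
        "A project uses: " + ", ".join(detected_techs) + ". "
        "Suggest node count, cores per node, memory per node, "
        "maximum runtime, and whether GPUs are needed, as JSON. "
        "Base each value on the listed technologies rather than "
        "generic defaults."
    )
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": "google/gemini-2.0-flash-001",  # assumed model ID
        "messages": [{"role": "user", "content": prompt}],
    })
    return OPENROUTER_URL, headers, body

url, headers, body = build_suggestion_request("sk-demo", ["PyTorch", "MPI"])
```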

πŸ€Έβ€β™€οΈ Challenges we ran into πŸ€Έβ€β™€οΈ

  • Crafting a generative AI prompt that neither overfits to specific project types nor falls back on generic default values when filling in suggested HPC settings

🌞 Accomplishments that we're proud of 🌞

We are proud of:

  • making a sleek, modern-looking UI
  • incorporating creative use of generative AI through recommendations
  • developing our functionality to work with Grace, the flagship HPRC cluster at A&M

πŸ‘‘ What we learned πŸ‘‘

  • How to connect to and run jobs on SLURM-managed HPC clusters
  • LLM prompting best practices and optimizations

😸 What's next for hp-ez 😸

  • Prebuilt recipes that users can select based on their project type, allowing for more specialized and intuitive job submissions
  • Using AI to automate repetitive tasks such as generating multiple jobs for parameter sweeps
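The parameter-sweep idea can be sketched as expanding a grid of parameters into one job per combination. The function below is a hedged design sketch, not a committed implementation; names are illustrative.

```python
import itertools

# Hypothetical sketch of parameter-sweep automation: expand a grid
# of parameter values into one command line per combination, each of
# which would become its own submitted job.

def expand_sweep(base_command, grid):
    """Return one command string per combination of sweep values."""
    keys = sorted(grid)  # fixed order so output is deterministic
    jobs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        flags = " ".join(f"--{k}={v}" for k, v in zip(keys, values))
        jobs.append(f"{base_command} {flags}")
    return jobs

jobs = expand_sweep("python train.py",
                    {"lr": [0.01, 0.1], "batch": [32, 64]})
# 2 learning rates x 2 batch sizes = 4 jobs
```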

Links for Figma Challenge:

Prototype: https://www.figma.com/proto/bzsCfYvqpGI7duvs9vZybS/hp-ez?node-id=76-353&t=tzLCSAy3RK691nJc-1&scaling=min-zoom&content-scaling=fixed&page-id=0%3A1&starting-point-node-id=76%3A353

Video: https://www.youtube.com/watch?v=yYSQl5-xr5Y
