Distributed Workload Management
This report mainly links to the Wiki pages in this repository, each of which addresses a different aspect of the theme, distributed workload management. You can find and navigate through all of them HERE.
Problem Statement
Through this theme we aim to find the best way to manage workloads in a distributed environment with a micro-services-based architecture, with emphasis on how this would benefit Apache Airavata. This in turn means finding the best way for different micro-services (e.g., Airavata micro-services) to communicate and distribute work. The problem statement and requirements are covered in detail in the main Wiki page.
Possible Solutions
To begin with, we identified a proof-of-concept micro-service-based example on which we could base our design discussions. Thanks to the healthy conversations on the Apache Airavata Dev mailing list, we were able to come up with a design and progressively improve it. We started by evaluating stateful vs. stateless architectures to solve the problem. We then shifted our focus to decentralized vs. centralized state architectures, where the latter gained a lot of traction and favorable feedback.
Drawing on our understanding of the Apache Mesos/Aurora architecture, we were able to apply some of its fundamentals to improve our centralized-state architecture. Each of these “possible solutions” has been documented with pictorial representations in our Wiki pages. The links to each of them are as follows.
- Proof-of-Concept Example : Wiki link
- A stateful design : Wiki link
- A stateless design : Wiki link
- A centralized, Apache Mesos-inspired design : Wiki link
- [KB] Messaging infrastructures : Wiki link
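As a rough illustration of the centralized-state idea only, the sketch below keeps all task state in one store while the workers themselves hold none, so any worker can pick up any pending task. It is not taken from the Wiki pages or from Airavata code: `CentralStateStore`, `Worker`, `drain`, and the `PENDING`/`RUNNING`/`DONE` states are hypothetical names chosen for this example, and a real design would also need a messaging layer, persistence, and failure handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CentralStateStore:
    """Hypothetical single source of truth for task state."""
    tasks: Dict[str, str] = field(default_factory=dict)  # task_id -> state

    def submit(self, task_id: str) -> None:
        self.tasks[task_id] = "PENDING"

    def claim_next(self) -> Optional[str]:
        # Hand out the first pending task and mark it as running.
        for task_id, state in self.tasks.items():
            if state == "PENDING":
                self.tasks[task_id] = "RUNNING"
                return task_id
        return None

    def complete(self, task_id: str) -> None:
        self.tasks[task_id] = "DONE"


class Worker:
    """Stateless worker (assumed): keeps no task state between calls."""

    def __init__(self, name: str, store: CentralStateStore) -> None:
        self.name = name
        self.store = store

    def run_once(self) -> Optional[str]:
        task_id = self.store.claim_next()
        if task_id is None:
            return None
        # Real work (for example, a job submission) would happen here.
        self.store.complete(task_id)
        return task_id


def drain(store: CentralStateStore, workers: List[Worker]) -> List[Tuple[str, str]]:
    """Round-robin over the workers until no pending tasks remain."""
    handled: List[Tuple[str, str]] = []
    progress = True
    while progress:
        progress = False
        for worker in workers:
            task_id = worker.run_once()
            if task_id is not None:
                handled.append((worker.name, task_id))
                progress = True
    return handled


if __name__ == "__main__":
    store = CentralStateStore()
    for task in ("a", "b", "c"):
        store.submit(task)
    print(drain(store, [Worker("w1", store), Worker("w2", store)]))
    print(store.tasks)
```

Because the workers carry no state of their own, adding or removing a worker in this sketch changes only who claims a task, not what the store records about it.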
Solution Evaluations
Each Wiki page has a detailed analysis of its topic. Based on the Apache Airavata mailing list discussions (see here), we evaluated all of the proposed solutions and reached conceptual agreement on the Mesos-inspired design (see figure below).
Conclusion
We have decided to start a proof-of-concept implementation of the finalized design (the Mesos-inspired centralized architecture). We will create GitHub issues targeting the main building blocks of the design, and members are free to take up and work on any component they wish. New Wiki pages will be added as we make progress, including instructions to build/compile, deploy, and run the code.
GitHub Commits
We are in the development phase, and the prototype is not yet complete. So far, I have implemented a JobSubmissionTask (which will be part of a worker service). It uses Apache Aurora to schedule a job on Apache Mesos cloud infrastructure. My commits to GitHub can be tracked here.
Airavata Dev Mailing List Discussions
Below are links to Apache Airavata developer mailing list discussions which I have contributed to.
- Introducing workload distribution theme : LINK
- Define test example, stateful, and stateless design : LINK
- Using a workflow micro-service : LINK
- Centralized vs Decentralized design : LINK
- Final design using centralized-state : LINK
GitHub Issues
I have created GitHub issues to track the progress of the implementation and to resolve any design conflicts through discussion. The issues can be found HERE.