airavata-courses

Distributed Workload Management

This report mainly links to the Wiki pages in this repository, each written to address a different aspect of the theme: distributed workload management. You can find and navigate through all of them HERE.

Problem Statement

Through this theme, we intend to find the best possible solution for managing workloads in a distributed environment with a micro-services based architecture, with emphasis on how this would benefit Apache Airavata. This means finding the best way for different micro-services (e.g., the Airavata micro-services) to communicate and distribute work. The problem statement and requirements are covered in detail on the main Wiki page.

Possible Solutions

To begin with, we identified a proof-of-concept micro-service based example on which we could base our design discussions. Thanks to the healthy conversations on the Apache Airavata Dev mailing list, we were able to come up with a design and progressively improve it. We started by evaluating stateful vs. stateless architectures to solve the problem. We then shifted our focus to decentralized vs. centralized state architectures; the latter gained a lot of traction and favorable feedback.

Thanks to our understanding of the Apache Mesos/Aurora architecture, we were able to incorporate some of its fundamentals into our centralized-state architecture. Each of these "possible solutions" has been documented with pictorial representations in our Wiki pages. The links to each of them are as follows.

Solution Evaluations

Each Wiki page has a detailed analysis of its respective topic. Based on the Apache Airavata mailing list discussions (see here), we evaluated all of the possible solutions and reached conceptual agreement on the Mesos-inspired design (see figure below).

Mesos inspired design

Conclusion

We have decided to start a proof-of-concept implementation of the finalized design (the Mesos-inspired centralized architecture). We will create GitHub issues targeting the main building blocks of the design, and members are free to pick up and work on any component they wish. New Wiki pages will be added as we make progress, including instructions to build/compile, deploy, and run the code.

GitHub Commits

We are in the development phase, and the prototype is not completely ready. Currently, I have implemented a JobSubmissionTask (which will be part of a worker service). This uses Apache Aurora to schedule a Job on Apache Mesos cloud infrastructure. My commits to GitHub can be tracked here.
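The JobSubmissionTask itself is not reproduced in this report. As a rough illustration only, the sketch below shows one way a worker-side task could hand a job to Aurora by shelling out to the Aurora client's `aurora job create` command. The class layout, field names, and the example cluster, role, and file names are all hypothetical and are not taken from the repository; the actual implementation may talk to Aurora differently.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobSubmissionTask:
    """Hypothetical stand-in for the worker-side task; not the repository's code."""

    cluster: str
    role: str
    environment: str
    job_name: str
    config_path: str  # path to a .aurora job configuration file (assumed to exist)

    def job_key(self) -> str:
        # Aurora identifies a job by the key cluster/role/environment/name.
        return f"{self.cluster}/{self.role}/{self.environment}/{self.job_name}"

    def command(self) -> List[str]:
        # `aurora job create <job key> <config file>` asks the Aurora
        # scheduler to launch the job described in the config on Mesos.
        return ["aurora", "job", "create", self.job_key(), self.config_path]

    def run(self, runner: Callable[[List[str]], int] = subprocess.call) -> int:
        # The runner is injectable so the task can be exercised without an
        # Aurora client installed; by default it executes the real command
        # and returns its exit code.
        return runner(self.command())
```

For example, `JobSubmissionTask("devcluster", "www-data", "devel", "hello_world", "hello_world.aurora").command()` yields the argument list for `aurora job create devcluster/www-data/devel/hello_world hello_world.aurora`. Keeping command construction separate from execution makes the task easy to test without a running Mesos cluster.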

Airavata Dev Mailing List Discussions

Below are links to Apache Airavata developer mailing list discussions which I have contributed to.

  • Introducing workload distribution theme : LINK
  • Define test example, stateful, and stateless design : LINK
  • Using a workflow micro-service : LINK
  • Centralized vs Decentralized design : LINK
  • Final design using centralized-state : LINK

GitHub Issues

I have created GitHub issues to track the progress of the implementation and to resolve any design conflicts through discussion. The issues can be found HERE.