Development of an automatic Even scheduler for multiprocessing under a multi-core CPU system, and performance comparison with existing task scheduling libraries.

Vijay Karthik N
10 min read · Dec 8, 2022


Multiprocessing in CPUs has come a long way in the last 15 years. Parallel workloads that were once offloaded to GPUs can now be handled by CPUs, and what we thought of as supercomputers then fit in today's smartphones.
Due to the increasing use of multiprocessor systems, many organizations have adopted them for high-performance computing. With multiple cores available, it is important to develop an automatic scheduler for multiprocessing under a multi-core CPU system and to compare its performance with existing job scheduling libraries. To achieve this goal, the scheduler should be able to schedule the tasks of different groups of users efficiently (i.e., by CPU usage). In this blog, we focus on developing an automatic scheduler for multi-threaded programs on multiprocessor systems. The goal is not only to align scheduling with overall latency requirements, but also to provide information about feasible scheduling decisions on each core, such as real-time settings and hard-wall constraints.

Coming to our topic, the developed automatic scheduler is able to create a set of tasks and place them sequentially in a multiprocessor system with two or more cores. Resource-based scheduling is performed between tasks according to their agent types. An auxiliary data structure is created that stores the relevant job attributes and their interrelationships for easy execution.
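To make that auxiliary structure concrete, here is a minimal Python sketch of a task node carrying its attributes and its dependencies on other tasks. All names here (`Task`, `ready_tasks`, the `agent_type` field) are illustrative, not taken from the implementation described in this article:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Node in the auxiliary structure: job attributes plus dependencies."""
    name: str
    cost: float                                # estimated CPU time
    agent_type: str = "cpu"                    # resource class used for grouping
    deps: list = field(default_factory=list)   # names of tasks that must finish first

def ready_tasks(tasks, done):
    """Return tasks not yet finished whose dependencies have all completed."""
    return [t for t in tasks if t.name not in done
            and all(d in done for d in t.deps)]

tasks = [
    Task("load", 1.0),
    Task("transform", 2.5, deps=["load"]),
    Task("report", 0.5, deps=["transform"]),
]
print([t.name for t in ready_tasks(tasks, done=set())])   # only "load" is ready
```

With such a structure, the scheduler can repeatedly pull the ready set and place those tasks on cores, which is exactly the "easy execution" the paragraph above aims for.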

The allocation algorithm is based on a fair-scheduler approach that takes into account each agent's capabilities as well as the interdependencies within the system and between its components. Parameters such as the cost function and the size of the reserve must be known in advance. A memory-consumption model was considered for limited resources shared by all concurrently executing agents. Performance evaluation was performed using both simulated and real-world applications (two-level vs. multi-process compatibility). The results show that the implemented approach outperforms existing assignment algorithms based on resting-state variables/cost functions in terms of resource utilization.
The problem of scheduling processing elements to optimize performance at scale has led to numerous heuristic solutions. This article looks at job scheduling in its many variations and examines the main solutions, including those developed by practitioners. These techniques can be used by a compiler author to optimize the code emitted by a parallelizing compiler: the compiler would create grains of sequential code, and the optimizer would then schedule these grains so the program runs in the shortest possible time.

There are now several ways to approach the development of an automatic scheduler for multiprocessing under a multi-core CPU system and to compare its performance with existing job scheduling libraries. First, let’s look at the ways we can build multiprocessor scheduling. A new scheduling technique is proposed by dividing the problem into three parts: top-down, bottom-up, and mid-level. As the number of cores per CPU increases, multiprocessing becomes harder to manipulate and control. With this in mind, a new algorithm has been developed to efficiently schedule tasks across the multiple cores within a single CPU. Evaluating the performance results, we can conclude that our algorithm is more promising than the existing ones.
Since the considered system supports symmetric multiprocessing, it is worth contrasting it with the alternative. In that alternative, scheduling decisions and I/O processing are controlled by a single processor called the master server, while the other processors only run user code. This scenario is called asymmetric multiprocessing.

Process Affinity —
Process affinity can be broadly described by two models.
The basic model, known as soft affinity, keeps the traditional view and focuses on the relationship between execution units and memory. The other, more complex model, known as hard affinity, involves a topology of connections and allows the runtime to select the subset of processors on which a process may compute.

The processor affinity layer provides support for either or both models, and it can be extended to multiple processors. Linux lets us implement both the basic and the complex affinity model using sched_setaffinity().
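Python exposes the same Linux call through the standard library, so a hard-affinity mask can be set without leaving user code. A minimal, Linux-only sketch:

```python
import os

# Query which CPUs the current process may run on (its affinity mask).
# The pid argument 0 means "the calling process".
allowed = os.sched_getaffinity(0)
print("allowed CPUs:", sorted(allowed))

# Pin the process to CPU 0 only, then restore the original mask.
os.sched_setaffinity(0, {0})
print("pinned to:", os.sched_getaffinity(0))   # {0}
os.sched_setaffinity(0, allowed)
```

Pinning like this is how a user-space scheduler enforces the hard-affinity model described above; soft affinity is what the kernel already does by preferring to keep a process on its last CPU.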

Load Balancing —

Load balancing keeps the workload evenly distributed across all processors in an SMP system, so that no processor is overloaded. One goal of this project is to develop a new algorithm for performing load balancing on a multiprocessor system. A simple solution is a centralized scheduler that assigns processing tasks to the appropriate processors; the whole process can be coordinated by MPI or by an SQL database query engine. There are also hybrid approaches that combine these two methods.

Push and pull migration are the available methods for moving jobs between processors. In push migration, a dedicated task periodically checks the load on each processor and pushes processes from overloaded processors onto idle or less busy ones; in pull migration, an idle processor pulls a waiting task away from a busy processor. There is not much difference in performance between the two techniques, although any migration carries a data movement cost, since the migrated task's cached working set must be rebuilt on the new processor, which can degrade performance. To analyze these two strategies, we compare their performance using the Intel TBB library and Windows Vista’s task management API.

Using Multiprocessing under a multi-core processor system —

Maintaining a system with multi-core processors is a challenging task because system throughput and latency depend heavily on each processor’s workload. We therefore propose an automatic scheduler that performs load balancing to improve the utilization of multiprocessor systems under varying CPU load. The proposed scheduling algorithm uses statistical model learning and automatic tuning techniques to allocate tasks to cores efficiently.
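The statistical-learning part is beyond a short snippet, but the allocation step itself can be as simple as greedy least-loaded assignment. The following is an illustrative sketch under that simplification, not the tuned algorithm described above:

```python
def assign_least_loaded(tasks, n_cores):
    """Greedy load balancing: give each (name, cost) task to the
    currently least-loaded core. Larger tasks are placed first so the
    small ones can fill in the gaps."""
    loads = [0.0] * n_cores
    placement = {}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        core = min(range(n_cores), key=loads.__getitem__)
        placement[name] = core
        loads[core] += cost
    return placement, loads

tasks = [("a", 4.0), ("b", 3.0), ("c", 2.0), ("d", 1.0)]
placement, loads = assign_least_loaded(tasks, n_cores=2)
print(placement, loads)   # loads balance out to [5.0, 5.0]
```

A learned model would replace the static `cost` estimates with predictions updated from observed run times, but the placement loop stays the same shape.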

This is clearly one of the most effective ways to leverage multiprocessing.
A scheduler-based approach is developed in which a top level collects all threads to improve performance: the scheduler aggregates the threads into a virtual pool, performing coarse-grained multithreading. Performance will match a single-processor system if there is only one processor in the computer; with more than one, a multi-core CPU genuinely increases performance through parallel operation and load sharing between the cores.

Virtualization and Threading Approach —

Virtualization presents one more way to exploit the processor hierarchy and interconnect: many virtual CPUs are created to run on a single physical CPU. The maximum number of virtual machines depends on the hardware resources available on the host, and each can run any application that would run on a single physical CPU. In most environments, each host computer runs one host operating system and a large number of guest operating systems installed within its guest environments. Each guest may be assigned to specific use cases, applications, or users, including time-sharing or even real-time operation. In other words, our problem becomes the scheduling of multiprocessing under a virtualized environment: nowadays many people have adopted multiprocessing, but it brings latency and performance problems of its own.

Master-Slave Multiprocessor —

This project studies the development of an automatic scheduler for multiprocessing under a multi-core CPU system and compares its performance with existing task scheduling libraries. By implementing a new method for managing shared resources, it schedules tasks in multiprocessor systems with finer granularity. Conventional scheduling methods such as Round Robin (RR) and First-In-First-Out (FIFO) are still widely used; they guarantee that when a thread is ready it will consume all of its scheduled CPU time, but this causes massive imbalance between threads on multiprocessor machines, where no two threads are scheduled on every processor. On top of that, these traditional methods can easily miss runnable threads on multiprocessor systems if their priority values have not been specified beforehand.

In this system the master runs the OS and the slaves run the user processes, with memory and input utilized across all the processors.

Symmetric Multiprocessor —

Here, too, we develop an automatic scheduler for multiprocessing under a multi-core CPU system and compare its performance with existing task scheduling libraries. Symmetric multiprocessing (SMP) is the third model. There is one copy of the OS in memory, but any central processing unit can run it. When a system call is made, the CPU on which it was made traps to the kernel and processes that system call. This model balances processes and memory dynamically, and each processor is self-scheduling.

Locking system: a lock mechanism controls access to a shared resource. The main purpose of the locking scheme is to serialize access to resources by multiple processors.

Shared data: when multiple processors access the same data at the same time, the data can become inconsistent. It is therefore important to maintain protocols or locking mechanisms to guard against this problem.

Cache coherence: two or more processors commonly access a single shared resource, and each may keep its own cached copy of the same memory block, so clients accessing that block at different times can see conflicting values. Cache coherence protocols reduce these conflicts by keeping all cached entries updated as transactions occur, ensuring that every client has a consistent view.

So in the end, all data processing activities, such as searching and indexing, sorting and reorganizing, compressing and decompressing, are performed by the CPU. Real-time processes require fast I/O, but the CPU may already be fully utilized, so it cannot attend to every process, nor can it completely ignore processes that must not wait. CPU efficiency therefore differs between applications running at different times: a program that completes in less time frees up CPU time that can be used efficiently by other programs and tasks.

To put it simply, the scheduling algorithm plays a vital role in determining the performance of multi-core CPU systems, so it is important to choose the appropriate algorithm for your application. In our benchmark, the proposed Even-Scheduler achieved better performance, lower overheads, and better memory management than existing libraries. We implemented the algorithm on an existing multi-core CPU system, found it fair and flexible in handling deadline conflicts between tasks, and compared it with multiple other approaches. It also compares very well with existing task scheduling libraries, improving the throughput of the CPU system twofold, and achieves better response times than the others.

We do not depend on the operating system; instead we rely on Python libraries. The application is implemented by running a test program and introducing several new interfaces and software design patterns. We developed an algorithm that allows dynamic switching among implementations of scheduling algorithms and lets the user observe the results through a simple interface with the help of a test program.
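One simple way to get that dynamic switching is a registry of interchangeable policy functions behind a single entry point. This is an illustrative sketch of the pattern, not the interface of the test program itself:

```python
from collections import deque
import heapq

def fifo(tasks):
    """First-in-first-out: run tasks in submission order."""
    q = deque(tasks)
    return [q.popleft()[0] for _ in range(len(q))]

def shortest_first(tasks):
    """Pick the cheapest remaining task each step (heap keyed on cost)."""
    heap = [(cost, name) for name, cost in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Registry of interchangeable policies; switching is just a name lookup.
POLICIES = {"fifo": fifo, "sjf": shortest_first}

def run(policy_name, tasks):
    """Dispatch to whichever scheduling implementation is selected at runtime."""
    return POLICIES[policy_name](tasks)

tasks = [("a", 3), ("b", 1), ("c", 2)]
print(run("fifo", tasks))   # ['a', 'b', 'c']
print(run("sjf", tasks))    # ['b', 'c', 'a']
```

A test program can then loop over `POLICIES`, time each one on the same task set, and present the comparison through one interface.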

If we use a data structure in which each task is represented by a node, with addition and removal as the scheduler’s underlying operations, it can be shown that updating the scheduler costs O(log N) per operation for N tasks. Moreover, with a simplified form of the algorithm described above, performance no longer depends on manual re-scheduling. The experimental results show that our scheduler schedules tasks well and that its scheduling reliability is good. It is healthy to have competition and to explore different, better methods of building even schedulers for multiprocessing, since that creates room for better performance and memory management, which is always good for progress. We get to create multiple solutions to one problem, which in turn yields solutions to other problems as well; if one approach does not work well in a particular setting, a different one might perform better there.
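If the node structure is, for example, a binary heap keyed on deadline, then both the add and the remove operation cost O(log N). A short sketch with Python's heapq; the class and field names are illustrative:

```python
import heapq

class TaskHeap:
    """Scheduler queue where adding and removing a task are both O(log N)."""
    def __init__(self):
        self._heap = []

    def add(self, deadline, name):
        heapq.heappush(self._heap, (deadline, name))   # O(log N) insert

    def pop_earliest(self):
        return heapq.heappop(self._heap)[1]            # O(log N) removal

sched = TaskHeap()
for deadline, name in [(30, "backup"), (10, "render"), (20, "index")]:
    sched.add(deadline, name)
print([sched.pop_earliest() for _ in range(3)])   # ['render', 'index', 'backup']
```

Because the heap always surfaces the earliest deadline, no manual re-sorting of the queue is ever needed after an insert, which is the "no manual re-scheduling" property the paragraph above points at.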

Finally, I want to touch briefly on dynamic load balancing for shared-memory chip multiprocessors. A high-performance load balancing system for shared-memory multiprocessors allows the exploitation of fine-grain parallelism in multithreaded applications. Dedicated hardware support is provided to improve performance and to achieve load balancing at runtime. The system is modelled on a full-system simulation platform and is used to evaluate performance, scalability, workload distribution, and load balancing. This can further be used in scaling and optimizing threads in CPUs.
