# Function Allocation and Bandwidth Reservation for Mixed-critical Adaptive Software Systems # Mahmoud Hussein Abstract-The new Auto SAR adaptive platform makes mixedcritical automotive systems able to adapt themselves in response to hardware and software failures at runtime. However, mapping functions of these automotive systems and reserving bandwidth for them are still major challenges. In this paper, we propose a model-based approach for mapping functions of an automotive system to its hardware nodes and reserving their bandwidth. To do so, an architecture description language for automotive systems (i.e. EAST-ADL) is used to design an embedded system, and to specify its timing requirements. The design model is then used for identifying functions allocation and their bandwidth in different system configurations. To schedule the critical functions of the system, the Earliest Deadline First (EDF) is used, while the Constant Bandwidth Server (CBS) is used for scheduling the non-critical functions. The quality of service for the non-critical functions is determined by their reserved bandwidth. In addition, a Tabu search-based approach is used for mapping the system functions to hardware nodes. Furthermore, there is a temporal isolation between the critical and non-critical functions. Thus, overruns of the non-critical functions do not affect the timing guarantees of the critical functions, and the quality of service for the non-critical functions is maximized. Keywords: mixed-criticality; design space exploration, adaptive system; schedulability analysis; model-based. In recent years, a number of approaches has been proposed for mapping a system's functions to its processing units, and for reserving their bandwidth to ensure that they are going to meet their deadlines at runtime (e.g. [5] [6] [7] [8] [9]). These approaches are aiming at functions mapping and bandwidth reservation for systems that do not have runtime adaptability. However, the new vehicle systems need to adapt themselves in response to hardware and software failures while they are in operation [10] [11]. To cope with system failures, a number of system configurations need to be specified as reactions to these failures. In addition, functions mapping and their bandwidth reservation in each system configuration need to be defined. Consequently, adopting the existing approaches for specifying a system's different configurations, and defining its functions mapping and their bandwidth reservation is difficult and error-prone task, where these approaches have not been proposed for adaptive systems. In this paper, we propose a model-based approach to ease the specification of the different configurations of an adaptive system, and to identify functions mapping and their bandwidth reservation in each configuration. First, we use EAST-ADL (an architecture description language for automotive embedded systems [12]) for creating the system design model. This model captures the system's functionality, timing requirements, and adaptive behavior. To model the adaptive behavior, we adopted the state machine approach [13]. In this machine, the states are corresponding to the system configurations, and the transitions represent adaptations between these configurations. To capture the timing requirements, we have adopted TIMMO (TIMing MOdel) approach [14]. Second, the system design model is used for identifying the functions allocation and their bandwidth in the different system configurations automatically. To Introduction ith the advances in micro-electronics, embedded system engineers are now able to integrate more system functions on a powerful System-on-Chips (see Figure 1) [1]. The automotive industry also benefits from these advances, where the engineers become able to integrate advanced vehicle functions on high performance electronic control units (ECUs).These functions can be classified as critical and non-critical functions. Thus, mixed-criticality concept has been introduced, where vehicle functions have different new Auto SAR adaptive platform makes automotive systems able to adapt themselves at runtime to cope with hardware and software failures [3]. A major challenge is how to map functions of an adaptive systems to its hardware nodes, and to reserve bandwidth for these functions [4]. W critically levels as shown in Figure 1 [2]. In addition, the schedule critical functions, Earliest Deadline First (EDF) [15]is used, while Constant Bandwidth Server (CBS) [16]is used to schedule non-critical functions. Quality of service for the non-critical functions is determined by their reserved bandwidth. In addition, a Tabu searchbased strategy is used to map functions to the hardware nodes [17]. Furthermore, there is a temporal isolation between critical and non-critical functions. Thus, overruns of the non-critical functions do not affect timing guarantees for the critical ones, and the quality of service for the non-critical functions is maximized. To show the applicability of our approach, a case study using it is performed in the context of Safe Adapt project. The remainder of the paper is organized as follows. A short description of related work is given in Section II. Our approach for designing an embedded system, and for functions mapping and their bandwidth reservation is described in Section III. In Section IV, we present our approach implementation. Finally, we conclude the paper in Section V. # II. # Related Work The work introduced in this paper is related to two research areas: designing adaptive systems, and functions mapping and their bandwidth reservation. In the following, we describe the related work from these two angles. # a) Designing Adaptive Systems Rainbow framework provides mechanisms for monitoring the environment, performing the analysis of selecting the required adaptation strategy, and effecting the needed changes to a running system [18]. To capture An approach was introduced by Zhang and Cheng to create formal models of a system behaviour [20]. In this approach, the system adaptive behaviour is separated from its non-adaptive behaviour. This separation makes the system models easier to specify and verify. They used Petri-nets to capture the system's adaptive behaviour, where they use context change as guidance for the transition between system states. The SOCAM (Service-Oriented Context-Aware Middleware) project introduces an architecture for building adaptive systems [21]. It uses a central server to gather context information from distributed context providers. This information is then processed, so that it can be used by the system functionality. MUSIC project is a component-based framework that is used to optimize a system overall utility in response to context changes [22]. They have a quality composition together with the relevant QoS dimensions, and how they are affected when the system is going to change from one configuration to another. The quality of service model is used for selecting a new configuration that has the best utility and is able to cope with the context changes. Heaven et al. have developed an approach to adapt a system in response to environment changes while preserving its high level goals [23]. They use Labelled Transition Systems (LTS) to capture the system states and the environment situations. Andrade et al. have proposed an approach to cope with unanticipated changes of an adaptive system behaviour [24]. They separate the system adaptation from its functionality, and represent the adaptation logic as a set of condition-action rules. These rules are constructed as a component-based system that can be changed at runtime. Morin et al. proposed a technique to handle the exponential growth of the number of configurations that are derived from the system variability [25]. They combine model driven and aspect oriented approaches to cope with the complexity of adaptive software systems. # b) Function Mapping and Bandwidth Reservation A technique that uses Constant Bandwidth Server (CBS) for integrating critical and non-critical functions on the same processor has been introduced by Abeni and Buttazzo [8]. This technique uses scheduling algorithms such as Earliest Deadline First (EDF) or Rate Monotonic (RM) to guarantee meeting the deadlines of critical functions. The non-critical functions are scheduled by a number of servers. A server parameters (i.e., its bandwidth and period) determine the probability of meeting the deadline of a non-critical function (i.e. its quality of service). The approach is also extended to adjust the server parameters by proportional integral derivative (PID) controllers at runtime. The idea behind this is to maximize the QoS of the non-critical functions [26]. Offline and online approaches to derive CBS parameters have been also proposed [27]. They are aiming at increasing the probability of meeting deadlines of the system non-critical functions. A Tabu search-based algorithm has been introduced for performing functions mapping and their bandwidth reservation [28]. This algorithm considers mixed-critical real-time systems that should tolerate transient faults. It uses EDF to schedule critical functions and CBS to schedule non-critical ones. The faults are tolerated through defining check points with rollback recovery mechanism [29]. It also uses the probability density functions for the non-critical functions. Thus, decisions of the mappings and processor bandwidth allocations are improved. the environment to initiate the adaptation process, the system reactions to environment changes, they used a language called Stitch. Sheng et al. have proposed a model-driven approach to ease the development of context-aware web services [19]. In their approach, they consider the system functionality as a single service, and the environment information is used by a set of rules to adapt the service output parameters in response to environment changes. of service (QoS) model that describes the system In the context of systems that have mixed time triggered (TT) and event triggered (ET) functions, an approach has been introduced [30]. This approach schedules the TT functions by static-cyclic scheduling (SCS), while the ET functions are scheduled using fixedpriority scheduling (FPS). It can be also be extended to constrain the TT schedules by following a given partitioning. The problem of mapping and partitioning have been addressed [31]. However, the partitioning means deciding which functions are TT and which are ET. mixed critical real-time functions on distributed embedded architectureshas been introduced [32]. It assumes that the architecture provides spatial and temporal partitioning. Therefore, enough separation between functions are enforced. In temporal partitioning, each function executes in a separate partition. Each partition is also allocated a set of time slots on a processor (where the function is mapped). Time slots of all functions on a processor are also grouped within a major frame that is repeated periodically. The functions are scheduled using static-cyclic scheduling. The approaches discussed above are focusing on functions mapping and bandwidth reservation for embedded systems that do not adapt at runtime. However, new vehicle systems need to adapt themselves in response to hardware and software failures with the support of the new Auto SAR adaptive platform [3][10] [11]. To cope with failures of such systems, a number of system configurations need to be specified as system reactions. In addition, functions mapping and their bandwidth reservation for each configuration need to be identified. Therefore, using the existing approaches for specifying the system's different configurations, and identifying the functions mapping and their bandwidth reservation is difficult and errorprone tasks. # III. # The Proposed Approach To ease the process of mapping functions of an embedded adaptive system to its hardware nodes, and for reserving their bandwidth, we have proposeda twostep process. In the first step, a model for the system is specified. This model includes the system's functionality, timing requirements, and adaptive behavior. In step two, based on the system model, mappings of the system functions in each system configuration and their bandwidth are identified by schedule ability analysis techniques. In the following, we describe the two steps in detail. # a) Modelling Adaptive Embedded System To model a safe adaptive system, three aspects need to be captured: the system functionality, its timing requirements, and its adaptive behavior. The adaptive behavior specifies system reactions to anticipated changes such as hardware failures. The System Functionality: The system functionality consists of functions that interact with each other to meet user requirements (see Figure 2). To model such functionality, an architecture description language for automotive domain (i.e. EAST-ADL [12]) is used. In EAST-ADL, the architecture is modeled at two levels of abstraction: an abstract functional model and a refined in form of a design model. Both levels are modelled as a composite structure that consists of components that interact with each other through functional ports. In our approach, we use cardinality of system components to specify the system variability. A component with cardinality {0 or 1} is optional while cardinality {2} means it has two instances and the system can switch between them at runtime. A cardinality {1} specifies that the component is mandatory and should always exist while the system is in operation (i.e. permanent). # Active function # Inactive function Processing unit (N i ) ? f i Configuration 1 Configuration 2 Adapt A f i I f 1, F f 3 I f 2 A f 4 A Hot replica Cold replica f i H f i C Failed function Non-critical unction f i A f i Critical function f i f 5 A f 1 H f 3 A f 2 C f 6 A ? f 1 A f 3 A f 2 C f 6 A N 1 k=2 S[0] S [1] f i # N j Mapping f i to N j (M i,j ) Multiprocessor Embedded System (S) f 1, I f 3 I f 2 A f 4 A f 5 A Failed to Inactive Hot to Active N 1 k=3 N 1 k=3 N 1 k=2 Fig. 2: A representation of an adaptive embedded system A component may have two instances to increase the system availability and to ensure that the functionality of this component is always provided [33]. These two instances need to be allocated to different processing units, so that a failure does not lead to a total failure of the functionality (e.g. function f 1 in Figure 2). To specify such allocation, the system hardware need to be modelled and the functional allocation can be then defined. Similar to the system functions, the hardware model is specified using EAST-ADL modelling language. It is modelled a composite structure that contains the hardware elements such as processing units, sensors, actuators, etc. The System Timing Requirements: To model the system's timing requirements, we have adopted the technique proposed in TIMMO-2-USE [14] (TIMing MOdel -TOols, algorithms, languages, methodology, and USE cases) project. To specify a timing requirement, events that are associated with a software function or one of its ports are defined. These events are then used for defining timing constraints such as execution time constraint, periodic constraint, reaction constraint, etc. Using TIMMO, the execution time of a function is specified by the Execution Time Constraint concept where the start and the stop events of the function are defined with lower and upper bound for the delay between them (see Figure 3). Similarly, to specify a periodic constraint, an event that is associated with a function is defined. Then, the Periodic Constraint concept is used, where a periodic constraint is specified that references this event. This constraint specifies the inter-arrival time and the period of a function (e.g. 60 milliseconds). # Fig. 3: Representation of timing constraintsof a function f i The System Adaptive Behavior: To adapt the embedded system in response to context changes, we introduce a system management component [34]. This component switches from a system configuration to another in response to an adaptation trigger (e.g. a failure of the function f 1 as shown in Figure 2). Therefore, we need to model adaptation triggers and different runtime configurations (states) of the system. Both represent a runtime system state. To model a system state, we adopted the concept of UML instance specifications where a system design instance (i.e. the system's functionality and hardware platform) is created and configured to specify a system runtime state or an adaptation trigger [35]. In order to model an adaptive behavior of a system, we adopted the state machine approach [13]. This technique makes adaptation policies easy to understand and it is useful for validation and verification purposes. In this machine, states are corresponding to the system configurations, while transitions represent the adaptations between these configurations. Each transition is guarded and triggered by an adaptation event. For example, in response to a function failure during a specific state, the system moves from its state or to a configuration that recovers from this failure as shown in Figure 2. # b) Task Mapping and Bandwidth Reservation To identify functions mapping and bandwidth reservation of an embedded system, the design model is used as a base for doing that (i.e. Step 2). In the following, we first describe the formulation of functions mapping and bandwidth reservation problem. Then, we describe an approach to solve this problem. The problem can be formulated as follows. Given a mixed critical system (S), and a distributed architecture (N) with a maximum number of transient faults (k) (an example system is shown in Figure 2). We are interested in determining a solution (L) consisting of a mapping M (f i ) ? N for each function f i ? S, and a set B containing the bandwidth B i for each function f i . Thus, the deadlines for critical functions are satisfied even in the case of transient faults. In addition, the probability for meeting the deadlines of non-critical functions is maximized. To solve the above problem, we have used a Tabu search-based strategy. Tabu search is an optimization metaheuristic [17]. It explores iteratively solutions in the neighbourhood of current solution to select a solution that minimizes the cost function. The cost function we use (described later) captures schedulability of critical functions and quality of service (QoS) of non-critical functions. The minimization of the cost function aims to improve schedulability of the critical functions and to maximize the QoS for the non-critical functions. Our Tabu search-based strategy is described in Table 1 (an extension for the algorithm described in [28]). The input is an adaptive system model in a form of configurations as described above (S), the hardware nodes (N), and maximum number of iteration for our strategy (MaxI). The output of the algorithm is a number of solutions (L) for the system configurations. The solutions consist of functions mapping M, and the set of bandwidth B for all critical and non-critical functions in each system configuration. J solution L 0 (see Table 1: Line 3-4). In this solution, the functions are allocated randomly to the hardware nodes, and critical tasks are assigned a bandwidth equals to their worst case execution time (WCET) while the noncritical functions assigned a bandwidth equals to their average execution time (AET). The initial solution can be The strategy then starts to search the neighbourhood of the current solution for finding a better solution. New solutions are generated either by changing functions mappings or through increasing/reducing the bandwidth of the non-critical functions. The neighbourhood can be very large. Thus, we only consider a limited number of solution in each iteration. The generated solutions are evaluated by computing their costs. The one with the lowest cost is then selected as current solution (Lines 9-16 in Table 1). This process is repeated until maximum number of iterations is reached or the cost becomes zero. After finding a solution for a system configuration, the strategy is repeated for finding solutions for other system states which is the output of our strategy. One feature of Tabu-search based techniques is the storage of solutions history that are visited (called Tabu List in our strategy). The idea behind this list is to avoid revisiting already explored solutions. The solutions history is initialized in Line 8 of our strategy, and updated with the currently visited solution at Line 16. analysis for critical and non-critical functions needs to be calculated. In the following, we describe these calculations in detail. Schedulability Analysis for Critical Functions: To analyse schedulability of the critical functions and content bandwidth servers for the non-critical functions (describe below), we use a utilization-based test. The utilization of a hardware node N j is computed by Equation 1(below) [36]. In this equation, first, C`i is worst case execution time (WCET) of a critical function f i allocated to the hardware node N j . To compute the WCET while considering failures possibility, we use a number of checkpoints (n i ) together with checkpoints overhead (o i ), error detection overhead (e i ), and the function's execution time as shown in Figure 4-A: C`i = C i + (n i -1) (o i + e i ) + e i [28]. 1 ) ( ) ( : ) ( ) ( : ' < + + ? ? = ? ? = ? = ? = ? j j i j i i j i j i i N R j N f M critical Non t R f i i N f M Critical f R f i i U T B T C (1) Second, T i is the deadline for a critical or a noncritical function f i . Third, B i is the bandwidth that is allocated to a non-critical function. Forth, when a fault occurs during the execution of a function f i , this function has to be restored from a previously saved checkpoint. Also, an execution segment of length (C i /n i ) needs to be executed. Therefore, the utilization needed for recovering from a fault is ((C i /n i ) + e i + m i )/T i , where m i is the time required to recover from the error. For a processing unit N j that can have up to k failures during the system execution, its utilization can be computed following Equation 2 [28]. In this equation, the utilization is determined for recovery critical functions. In addition, the worst-case is the occurrence of the k faults, which is corresponding to largest recovery utilization. i i i i i f F Critical f R f N R T m e n C k U i i i j + + × = ? ? = ) / ( max ) ( ) ( : ? (2) # Normal Execution # Start event Error detection (e i ) Establishing checkpoint (o i ) Error recovery (m i ) Execution with a single fault Execution segment (C i /n i ) A. Normal and faulty executions of a system function f i B. Queuing model for a constant bandwidth server of function f i B i T i C i,k ? Arrival of a job J i,k of function f i # Queue Server End event Fig. 4: Execution of a function f i and a queuing model for its constant bandwidth server # Schedulability Analysis for Non-critical Functions: The schedulability analysis of non-critical functions is probabilities of meeting their deadlines (i.e. their quality of service (QoS)). These probabilities depend on the allocated bandwidths B for them. Because of the temporal isolation property of Constant Bandwidth Server, each non-critical function can be analysed individually. Also, the computation of the system functions' QoS is expensive, and then they are computed and stored to be later used by our strategy. For a non-critical function f i , the QoS (f i ) is defined as the probability of meeting its deadline d i which is P {ft i,k ? rt i,k + d i }. The ft i,k and rt i,k are the finishing and the arrival time of the k th job of the function. To calculate this probability, a CBS that serves the function f i is modelled as a queuing system [37]. The function jobs J i are seen as tokens that need to be served by the server having the capacity B i as shown in Our strategy starts by generating an initial schedulable or not. A schedulable solution is the one that all critical functions meet their deadlines. The initial solution is used as a starting point in finding a better solution for a system configuration (state). However, in the case of it is not the first state, an initial solution is generated that takes into account the previous solution as shown in Table 1: Line 4. To compute the cost of a solution, schedulability Year 2018 A Markov matrix is then built and its steady state probability is computed to determine the probability of meeting the deadline of f i when its allocated bandwidth is B i (for more detail about these calculations, see [26] and [38]). J © 2018 Global Journals In our approach, we consider values of a bandwidth in the interval [AET, WCET]. In the case of Bi?WCET, the deadline will be met in 100% of the cases, while if Bi . Therefore, this configuration is defined as follows: {, , , , , , , , } Fig. 7: Specifying an adaptation trigger and a system response To model the switching between the system configurations in response to the adaptation triggers, a state machine is created as shown in Figure 8. For example, in response to a failure of the steer-by-wire (the adaptation trigger specified on top part of Figure 7), the system adapts from its initial configuration to another that recovers this failure (i.e. the system configuration shown in the bottom part of Figure 7). This system switching is specified using the fourth transition shown in Figure 7 (i.e. "Failure of SBW1"). In this transition, the state of the first instance of the SBW is changed from "Failed" to "Inactive", while the state of the second instance of the SBW is changed from "Hot" to "Active". Based on the timing information specified into the design model (see above), our tool (which implements the strategy described in previous section) finds the allocation and required bandwidth for each system function. An example screenshot of the tool is function. For example the degraded ACC is allocated on TMDP control unit. It also finds the required bandwidth for the degraded ACC which is "53" with QoS of 100% probability of meeting its deadline. The tool also shows how many iterations (e.g. 12) and how long (e.g. 64 millisecond) it takes to find the allocations and to determine the bandwidth reservation. A main feature of our approach is the consideration of all configurations in identifying the function mappings. Thus, our approach improves existing techniques (e.g. Saraswat et al. [28]), where we find allocations that reduce (limit) the changes of function allocation from a configuration to another as shown in Figure 10. Therefore, at runtime, functions re-allocation is reduced or restricted (if possible), while achieving the best quality of service for the non-critical functions. # Conclusion The new AutoSAR adaptive platform makes mixed-critical automotive systems able to adapt themselves at runtime to cope with hardware/software failures. However, mapping functions of these automotive systems and reserving bandwidth for them are still major challenges. In this paper, we proposed a model-based approach for specifying different configurations of an embedded adaptive system, and for defining functions mapping and bandwidth reservation in each system configuration. First, we have used EAST-ADL to create the system design model. This model captures the embedded system functionality, timing requirements, and its adaptive behavior. Second, the system's design model is used as a base for finding functions allocation and for reserving their bandwidth in the different system states (configurations). To schedule the critical functions, the Earliest Deadline First (EDF) is used, while the Constant Bandwidth Server (CBS) is used for scheduling the non-critical functions. Finally, to show our approach applicability, in the context of the Safe Adapt project, a case study has been conducted. As future work, we plan to extend our approach to enable code generation from the system design model, and automatic deployment of the generated system functions to its hardware nodes. Further evaluations will also be carried out to assess the approach robustness by applying it to a number of case studies. # Global 1![Fig.1: Multiprocessor architecture for a system with mixed critical-functions](image-2.png "Fig. 1 :") ![Allocation and Bandwidth Reservation for Mixed-critical Adaptive Software Systems Global Journal of Researches in Engineering ( ) Volume XVIII Issue IV Version I](image-3.png "Function") 1![Tabu search-based strategy for task mapping and bandwidth reservation Input: S: Adaptive system, N: Distributed architecture, MaxI: Maximum number of iterations. configuration C: S[c] do 3: if (c > 0) L 0 = GenerateInitialSolution (L[c-1]); 4: else L i = GenerateInitialSolution(S[c], N); 5: end if 6: L current = L best = L 0 ; 7: Cost best = CostFunction(L 0 ); 8: TabuList = new List; 9: While iter < MaxI or Cost best == 0 do 10: NL = GenerateNeighborhood(L best ); 11: L current = SelectSolution(NL); 12: if Cost Function(L current )