[A-47] ARMv9/v8 - Power State Management Software Architecture (PSCI Architecture)

ver0.1

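Preface

We have now reached the software side of power management. Building on the power states covered in the previous article, this article looks inside the power management software architecture. For finer-grained control, the hardware nodes of an SoC are divided into power domains, and each power domain supports several power modes. The power management system drives a power domain from one power mode to another by having the SCP program the PPU inside that domain. Making the power modes meet the software's needs while still saving power, that is, keeping the SoC at its best energy efficiency, requires OSPM to take part in the power management framework: the software knows its own state best, so it applies its policies and keeps communicating with the SCP so that the SCP can make the right decisions. OSPM currently has three levers for this: voltage, clock, and power state. For CPU nodes these boil down to idle management and DVFS management. We set DVFS aside for now and discuss how Arm implements idle management through power states, together with PSCI, the well-known interface for idle management on the Arm architecture. Before reading on, we recommend going through the earlier articles in this series to pick up the basics:
(1) [V-02] Virtualization Basics - CPU Architecture (AArch64)
(2) [A-03] ARMv8/ARMv9 - Multi-level Cache Architecture
(3) [A-21] ARMv8/v9 - SMMU System Architecture and Functional Overview
(4) [A-25] ARMv8/v9 - GIC System Architecture (Hardware Basis of Interrupts)
(5) [A-38] ARMv8/v9 - Generic Timer System Architecture
(6) [A-41] ARMv9/v8 - Power Management System Architecture
(7) [A-42] ARMv9/v8 - How Power Management Works (SCP Service Overview)
(8) [A-43] ARMv9/v8 - Introduction to the Power Control Framework (PCF Overview)
(9) [A-0x2c] ARMv9/v8 - Power Management Domains (Voltage Domain/Power Domain)
(10) [A-45] ARMv9/v8 - Power Modes
(11) [A-46] ARMv9/v8 - Power States
(12) [V-05] Virtualization Basics - Exception Model (AArch64)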

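Main Text

1.1 Background

The previous article introduced the basic concept of power states in power management. The point to keep in mind is that the SoC, PE-Cores, Clusters, and Devices (the power domains other than the CPU) each have their own power states. Device power states are comparatively simple, just OFF and ON, although SoC and IP vendors may support more states depending on their implementation. Given our focus, we stay on the CPU line and study the power states of the CPU subsystem and the architecture that manages them, as shown in Figure 1-1.

Figure 1-1 SOC Power States Example

1.1.1 Challenges

With the SoC-wide view in place, we zoom in on the CPU subsystem. As discussed earlier, OSPM has two levers for CPU power management: idle management and DVFS management. Today we look at the first one, idle management: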

Idle management is normally under the control of the operating system. In such a case, when a core is idle, Operating System Power Management (OSPM) moves it into a low-power state. Typically, a choice of states is available, with different entry and exit latencies, and different levels of power consumption associated with each state. The state that is used typically depends on how quickly the core is required again. The power states that can be used at any one time might also depend on the activity of other components in a SoC, beside the cores. Each state is defined by the set of components that are clock-gated or power-gated when the state is entered.
A challenge for idle power management is that various operating systems, from various vendors, can be simultaneously executing in an Arm system. It is then necessary to have a method of collaboratively performing power control. For example, if the operating system that is managing power, running at one level of privilege, wants to enter a state that powers-on or off a core, then operating systems at other levels of privilege need to react to this request. Equally, if a core is woken from a power state by a wake-up event, it might be necessary for operating systems running at different levels of privilege to perform actions, such as restoring state. The Power State Coordination Interface (PSCI) specification provides an interface for this purpose.
PSCI leads to a power state change request to an SCP software interface. The SCP firmware acts on that message and manages all hardware level details.
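
The trade-off described above, where deeper states save more power but take longer to enter and exit, can be illustrated with a small selection routine. This is only a sketch under our own assumptions: the `idle_state` structure, the state names, and all latency and residency values are invented for illustration and do not come from any real platform or from the Linux cpuidle code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle-state descriptor; field names are illustrative only. */
struct idle_state {
    const char *name;
    unsigned int exit_latency_us;   /* time needed to resume execution */
    unsigned int min_residency_us;  /* shortest stay that still saves energy */
};

/* Example table ordered from shallowest to deepest. All values are made up. */
static const struct idle_state idle_states[] = {
    { "wfi",             1,    1 },
    { "core-retention",  50,  200 },
    { "core-off",       500, 2000 },
    { "cluster-off",   1500, 6000 },
};

#define NUM_IDLE_STATES (sizeof(idle_states) / sizeof(idle_states[0]))

/*
 * Return the index of the deepest state whose minimum residency fits the
 * predicted idle time and whose exit latency respects the latency limit.
 * State 0 (the shallowest) is the fallback when nothing deeper qualifies.
 */
static size_t select_idle_state(unsigned int predicted_idle_us,
                                unsigned int latency_limit_us)
{
    size_t best = 0;

    assert(NUM_IDLE_STATES > 0);
    for (size_t i = 1; i < NUM_IDLE_STATES; i++) {
        if (idle_states[i].min_residency_us <= predicted_idle_us &&
            idle_states[i].exit_latency_us <= latency_limit_us)
            best = i;
    }
    return best;
}
```

With this table, a core that is expected to be needed again within 100 us stays in the shallowest state, while a long predicted idle period combined with a generous latency limit allows the cluster-level state. A real OSPM governor uses far richer prediction, but the shape of the decision is the same.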

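To summarize the manual's description:
(1) Managing the power states of a power domain is a complex job for OSPM, for three main reasons:
• Interacting with the SCP is tedious and involves a large number of I/O operations. Communication between the CPU and other IP blocks on the bus is hard to program and error-prone, which matters even more for a subsystem as sensitive as power.
• The CPU subsystem contains many power domains, each with many power modes and power states, which makes the power state machine harder to manage.
• The power states of the power domains inside the CPU and the SoC depend on one another hierarchically, which makes it still harder for OSPM to manage the power states of each domain inside the CPU.
(2) From an integration perspective, the market has many operating systems and many chip OEMs, all of which must adapt to Arm-based SoCs. If each went its own way, SoC cost would rise and many stability problems would have to be solved.
(3) As the Arm architecture evolves, the software stack built on it keeps getting more complex, especially after the introduction of virtualization, the security mechanisms, and RME, as shown in Figure 1-2. Consider an OS at EL1 that wants to put a PE-Core into an idle state. Because its privilege is limited, some module in the software system has to take into account the hypervisor running on that PE-Core, as well as the TEE OS and applications in the Secure world.

Figure 1-2 Realm Management Extension

1.1.2 PSCI (Power State Coordination Interface)

To address the power management challenges above, Arm's answer is PSCI: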

PSCI is an interface for PE and system level power management that can be used by OS vendors for supervisory software working at different levels of privilege on an Arm device.
Rich operating systems like Linux and Windows, hypervisors, privileged firmware, and Trusted OS implementations must interoperate when power is being managed. The aim of this standard is to ease the integration between supervisory software from different vendors working at different privilege levels.
This interface standard is aimed at the generalization of code in the following power management scenarios:
• Core idle management.
• Dynamic addition and removal of cores, and secondary core boot.
• System shutdown and reset.
The interface does not cover Dynamic Voltage and Frequency Scaling (DVFS) or device power management (for example, management of peripherals such as GPUs). Arm recommends using Advanced Configuration and Power Interface, or System Control and Management Interface as the standard interface for such features.
The interface is designed so that it can work in conjunction with hardware discovery technologies such as Advanced Configuration and Power Interface (ACPI) and Flattened Device Tree (FDT). It is not a replacement for ACPI or FDT.

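To summarize Arm's approach:
(1) Abstract a set of interfaces, PSCI, for software developers at every level. Note that this means every level (EL1/EL2/EL3): each has defined responsibilities, and as long as everyone implements the interface, power state management of an Arm-based system works.
(2) The interface covers three scenarios: idle management of PE-Cores, hotplug of PE-Cores, and system shutdown and reset. (Once you have worked on a mass-production project, you will find that these are hot spots for hangs and black screens.)
(3) PSCI works together with other interface standards (such as ACPI) rather than replacing them, as shown in Figure 1-3. (We do not go into this here; a dedicated article is planned.)

Figure 1-3 Infrastructure system: example power management software stack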
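
To make the three scenarios concrete, the sketch below groups a few PSCI function IDs by the scenario they serve. The numeric IDs are the ones defined by the PSCI specification; the macro names, the `psci_scenario` enum, and the helper functions are our own illustrative names, not part of any real header.

```c
#include <assert.h>
#include <stdint.h>

/* Function IDs from the PSCI specification (SMC32 and SMC64 variants). */
#define PSCI_FN_VERSION         0x84000000u
#define PSCI_FN_CPU_SUSPEND_32  0x84000001u
#define PSCI_FN_CPU_OFF         0x84000002u
#define PSCI_FN_CPU_ON_32       0x84000003u
#define PSCI_FN_SYSTEM_OFF      0x84000008u
#define PSCI_FN_SYSTEM_RESET    0x84000009u
#define PSCI_FN_CPU_SUSPEND_64  0xC4000001u
#define PSCI_FN_CPU_ON_64       0xC4000003u

/* Hypothetical grouping that mirrors the three scenarios in the text. */
enum psci_scenario {
    SCN_CORE_IDLE,     /* core idle management */
    SCN_CORE_HOTPLUG,  /* dynamic add/remove of cores, secondary boot */
    SCN_SYSTEM,        /* system shutdown and reset */
    SCN_OTHER          /* anything else, for example PSCI_VERSION */
};

static enum psci_scenario psci_scenario_of(uint32_t fid)
{
    switch (fid) {
    case PSCI_FN_CPU_SUSPEND_32:
    case PSCI_FN_CPU_SUSPEND_64:
        return SCN_CORE_IDLE;
    case PSCI_FN_CPU_OFF:
    case PSCI_FN_CPU_ON_32:
    case PSCI_FN_CPU_ON_64:
        return SCN_CORE_HOTPLUG;
    case PSCI_FN_SYSTEM_OFF:
    case PSCI_FN_SYSTEM_RESET:
        return SCN_SYSTEM;
    default:
        return SCN_OTHER;
    }
}

static const char *psci_scenario_name(enum psci_scenario s)
{
    static const char *const names[] = {
        "core idle management",
        "core hotplug and secondary boot",
        "system shutdown and reset",
        "outside the three scenarios",
    };

    assert((unsigned int)s <= SCN_OTHER);
    return names[s];
}
```

Note that nothing in this list touches voltage or frequency, which matches the statement that DVFS and device power management are outside the scope of PSCI.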

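1.2 PSCI Software Architecture

1.2.1 Exception Model

Before explaining the PSCI architecture, one concept needs to be made clear: the Arm exception model, shown in Figure 1-4:

Figure 1-4 Service call routing

Here is the manual's description: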

The name for privilege in AArch64 is Exception level, often abbreviated to EL. The Exception levels are numbered, normally abbreviated and referred to as EL(x), where (x) is a number between 0 and 3. The higher the level of privilege the higher the number. For example, the lowest level of privilege is referred to as EL0.
The architecture does not specify what software uses which Exception level. A common usage model is application code running at EL0, with a rich Operating System (OS) such as Linux running at EL1. EL2 may be used by a hypervisor, with EL3 used by firmware and security gateway code.
For example, Linux can call firmware functions at EL3, using software interface standards, to abstract the intent from the lower-level details for powering on or off a core. This model means the bulk of PE processing typically occurs at EL0/1.
The Arm architecture includes the exception-generating instructions SVC, HVC, and SMC. The purpose of these instructions is solely to generate an exception and enable the PE to move between Exception levels:
• The Supervisor Call (SVC) instruction enables a user program at EL0 to request an OS service at EL1
• The Hypervisor Call (HVC) instruction, available if the Virtualization Extensions are implemented, enables the OS to request hypervisor services at EL2
• The Secure Monitor Call (SMC) instruction, available if the Security Extensions are implemented, enables the Normal world to request Secure world services from firmware at EL3.
When the PE is executing at EL0, it cannot call directly to a hypervisor at EL2 or secure monitor at EL3, as this is only possible from EL1 and higher. The application at EL0 must use an SVC call to the kernel, and have the kernel perform the action to call into higher Exception levels.

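With ARMv8, the architecture introduced the exception model, which divides the CPU's runtime privilege into four levels of increasing privilege. The CPU normally runs at EL0/EL1. When a higher-privilege service is needed, execution crosses Exception levels either passively through a trap or actively by executing an exception-generating instruction (SVC/HVC/SMC). Note that when a hypervisor is implemented at EL2, it can configure SMC instructions executed at EL1 to trap to EL2 (through HCR_EL2.TSC), in which case an SMC issued at EL1 does not reach the EL3 service directly. An earlier article in this series analyzes the Arm exception model in detail, so we do not expand on it here; we recommend reading it first.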
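
The calling rules quoted above can be condensed into a small lookup. This is a simplified model, not architecture-accurate in every corner: the enum and function names are ours, and the only trap modeled is the EL2 control over SMC executed at EL1 (HCR_EL2.TSC).

```c
#include <assert.h>
#include <stdbool.h>

enum exception_level { EL0, EL1, EL2, EL3 };

enum conduit { CONDUIT_NONE, CONDUIT_SVC, CONDUIT_HVC, CONDUIT_SMC };

/*
 * Which exception-generating instruction a caller at `from` would use to
 * request a service at `to`. CONDUIT_NONE means no direct call exists.
 */
static enum conduit conduit_for(enum exception_level from,
                                enum exception_level to)
{
    assert(from <= EL3 && to <= EL3);

    if (to <= from)
        return CONDUIT_NONE;            /* not a call to a higher level */
    if (from == EL0)                    /* EL0 can only reach EL1 via SVC */
        return (to == EL1) ? CONDUIT_SVC : CONDUIT_NONE;
    if (to == EL2)
        return CONDUIT_HVC;             /* EL1 -> EL2 */
    return CONDUIT_SMC;                 /* EL1 or EL2 -> EL3 */
}

/*
 * Where an SMC executed at EL1 is taken: to EL2 if the hypervisor has
 * enabled the SMC trap, otherwise to EL3.
 */
static enum exception_level smc_from_el1_target(bool el2_traps_smc)
{
    return el2_traps_smc ? EL2 : EL3;
}
```

For example, `conduit_for(EL0, EL3)` yields `CONDUIT_NONE`, which reflects the rule that an application must go through the kernel with SVC and let the kernel make the call to a higher Exception level.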

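1.2.2 PSCI Architecture (Power State Management Software Architecture)

With that background, let us look at the software architecture of power state management, shown in Figure 1-5:

Figure 1-5 High-Level PSCI SW ARCH

Note that this is only one example of a PSCI implementation; the details depend on the software design and implementation of the SoC you are working with. The discussion below is based on the PSCI software architecture above: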

The PSCI interface must support interaction at all levels of execution implemented on the device, where multiple levels of supervisory software might be executing. For the caller operating in the Normal world, the interface must forward a message to the PPF. In a system that implements EL2, it must be possible to trap interface calls made by the EL1 kernel context to the hypervisor (EL2). In a system that implements Arm RME, it must be possible to trap interface calls made by the Realm executing at R-EL1, to the RMM executing at R-EL2. The RMM can then decide to forward the call to the Hypervisor executing at NS-EL2. If the hypervisor determines that a change of physical power state is required, it must then be able to use the PSCI interface to inform the PPF.
The conduits available to transfer a message from one Exception level to another depend on the implemented Exception levels and Security states.
Arm systems generally include a power controller, or control logic, that can manage core power. This normally provides interfaces that support several power management functions. Often these include support for transitioning cores, clusters, or a superset into low-power states. In the low-power state, the cores are either fully switched off or in quiescent states where they are not executing code. Arm strongly recommends that the EL3 is responsible for the control of these states. Otherwise, cleanup of the Root and Secure state, including cache clean, is not possible prior to entering the low-power state. Other forms of power management, such as dynamic performance management through voltage and frequency scaling, are not covered by this interface. Arm strongly recommends that all policy in power and performance management is performed in the Normal world. The Normal world has greater visibility of the current use and purpose of a given device. Where the Secure world has performance requirements, Arm recommends that IMPLEMENTATION DEFINED mechanisms are used to communicate those requirements to the Normal world.
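
The forwarding chain described in the quoted text can be traced with a small model. It assumes the simplest case, in which every layer decides that a physical power state change is needed and passes the call on; in practice the hypervisor or the RMM may handle a call without forwarding it. The `psci_agent` enum and both functions are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software agents that can issue or handle a PSCI call in this model. */
enum psci_agent {
    AGENT_NS_EL1_OS,   /* Normal world kernel at EL1 */
    AGENT_REALM_EL1,   /* Realm at R-EL1 */
    AGENT_RMM,         /* Realm Management Monitor at R-EL2 */
    AGENT_HYPERVISOR,  /* hypervisor at NS-EL2 */
    AGENT_PPF          /* privileged platform firmware at EL3 */
};

/* Next handler of a call issued by `from`, assuming it is forwarded. */
static enum psci_agent next_hop(enum psci_agent from, bool el2_implemented)
{
    switch (from) {
    case AGENT_NS_EL1_OS:
        return el2_implemented ? AGENT_HYPERVISOR : AGENT_PPF;
    case AGENT_REALM_EL1:
        return AGENT_RMM;
    case AGENT_RMM:
        return AGENT_HYPERVISOR;
    case AGENT_HYPERVISOR:
    case AGENT_PPF:
    default:
        return AGENT_PPF;
    }
}

/* Record the path from `origin` to the PPF; returns the number of hops stored. */
static size_t trace_psci_path(enum psci_agent origin, bool el2_implemented,
                              enum psci_agent *path, size_t capacity)
{
    size_t n = 0;
    enum psci_agent cur = origin;

    assert(path != NULL && capacity > 0);
    while (n < capacity) {
        path[n++] = cur;
        if (cur == AGENT_PPF)
            break;
        cur = next_hop(cur, el2_implemented);
    }
    return n;
}
```

Tracing from `AGENT_REALM_EL1` gives Realm, RMM, hypervisor, PPF, which is the longest path in this model. Tracing from `AGENT_NS_EL1_OS` on a system without EL2 goes straight to the PPF.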

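Combining this with the manual, we can summarize as follows:
(1) Every level except EL0 has to adapt to the PSCI interface, but that does not mean every level talks to the SCP directly. Arm recommends that direct communication with the SCP is implemented inside EL3. This is easy to understand: software at a lower-privilege ELx that wants to touch power resources must ask software at a higher-privilege ELx, never the other way round. The PSCI adaptation therefore splits naturally into two parts. One part sits at EL1 or EL2: EL1 only has to consider power state transitions from inside its own VM (vCPU: virtual CPU), while EL2 has to consider them across all VMs (pCPU: physical CPU). The other part is the firmware at EL3, which implements the state coordination that PSCI carries and handles the communication with the SCP.
(2) With virtualization, OSPM is implemented at two levels, EL1 and EL2, and the details differ with the type of virtualization, as shown in Figure 1-6:

Figure 1-6 Typical Power management models in virtualization

The manual describes them as follows: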

Physical OSPM: This comprises the software components that select the physical power states.
Virtual OSPM: This is an OSPM that is present in a guest OS running a virtual machine, which selects virtual, rather than physical power states.

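This matches the description in (1) and the software modules in the architecture diagram closely, so we do not expand on it here.
(3) With the Secure world implemented, the PSCI software architecture can be abstracted further, as shown in Figure 1-7:

Figure 1-7 Typical Power management models with Arm RME

Here is the manual's description: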

Many Trusted OS implementations are not SMP-capable. When running on MP devices, they are tied to a single core. Secure Monitor Calls destined for the Trusted OS are only expected to come from that core. The lack of MP support in the OS helps to keep Trusted code simple and small, which in turn aids certification. Trusted OS services are invoked from the Normal world through Rich OS drivers or daemons that are provided with the Trusted OS implementations. The threads associated with these drivers and daemons are normally affinitized to the core used by the Trusted OS.
When Arm RME is implemented, the Realm Security state is present. In this case, Realms access PSCI services through the RMM. The RMM co-ordinates with the Hypervisor in Non-secure EL2 irrespective of the Hypervisor type. The decision for physical power management is made in the same manner as in a system without Realms.

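We discuss this at two levels:
• The Secure world mainly provides trusted services to the Rich OS, so its implementation aims to be simple and efficient. For power management, Arm does not want TrustZone to be heavily involved: the power management logic of the Non-secure world should not be coupled into the Secure world, which only needs to do its job of providing trust.
• The Secure world therefore should not decide power state changes on its own. Arm expects its power state requests to be collected in the Normal world (in the hypervisor in this architecture), decided there together with the other requests, and then sent to the SCP through the EL3 firmware.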
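
The split between virtual and physical OSPM discussed in this section can be sketched as a simple aggregation: each guest requests a state for its vCPU, and the hypervisor derives what it may request for the physical core. The policy shown here, taking the shallowest request among all vCPUs placed on the core, is only one plausible choice that we assume for illustration; PSCI does not mandate it, and the `idle_req` enum and `physical_request` function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Requested states, ordered from shallowest to deepest. */
enum idle_req {
    REQ_RUNNING = 0,    /* the vCPU still has work to do */
    REQ_STANDBY = 1,    /* the vCPU is idle but wants a fast wake-up */
    REQ_POWERDOWN = 2   /* the vCPU accepts being powered down */
};

/*
 * Derive the deepest state the physical core may enter from the virtual
 * requests of the vCPUs assigned to it. A core with no vCPUs can take the
 * deepest state.
 */
static enum idle_req physical_request(const enum idle_req *vcpu_req, size_t n)
{
    enum idle_req result = REQ_POWERDOWN;

    assert(vcpu_req != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (vcpu_req[i] < result)
            result = vcpu_req[i];
    }
    return result;
}
```

Only when the result is deeper than `REQ_RUNNING` would the physical OSPM go on to issue a real PSCI call to the EL3 firmware. A virtual request alone never changes the hardware state.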

1.2.3 PSCI Architecture on Linux

Now let us look at the PSCI architecture on Linux, shown in Figure 1-8:

Figure 1-8 Linux mobile: example power management software stack

Here is the manual's description:

In the Linux kernel, Energy Aware Scheduling (EAS) provides the core scheduling with tight links to core idle and integrated frequency control. EAS is also linked to a thermal management solution using Intelligent Power Allocation (IPA). Finally, both EAS and IPA have linkages to user space performance management interfaces.
An OS agnostic firmware layer includes an implementation of PSCI as outlined in Idle Management. SCMI provides an interface for communication with the SCP firmware. It supports protocols for power and performance control in addition to sensors, such as those for temperature measurements, used by IPA.

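We mention this because later articles will cover some of these details, such as the EAS algorithm, which affects system scheduling. In fact, part of the reason we started writing about power was that our system tuning work involved EAS. Coming back to the PSCI implementation inside Linux, the first thing to look at is cpu_operations, shown in Figure 1-9:

Figure 1-9 cpu_operations struct

cpu_operations is the gateway through which Linux reaches the Arm world and controls the CPUs. Behind it lie the details of the path from PSCI to EL3, as shown in Figures 1-10 through 1-13:

Figure 1-10 cpu_operations assignment

Figure 1-11 Calling the cpu_operations assignment

Figure 1-12 psci_operations

Figure 1-13 psci_operations assignment

These are some of the key code excerpts; interested readers can read the source themselves, so we do not expand on them here.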
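
Since the figures are not reproduced here, the following user-space sketch shows the general idea of an operations table backed by PSCI calls. It is not the kernel's code: `cpu_ops_sketch`, `fake_invoke`, `ops_for_method`, and the other helpers are hypothetical, the conduit is replaced by a function that merely records the call, and only the PSCI function IDs and argument order follow the PSCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSCI_FN_CPU_OFF         0x84000002u
#define PSCI_FN_CPU_SUSPEND_64  0xC4000001u
#define PSCI_FN_CPU_ON_64       0xC4000003u

/* Stand-in for the SMC/HVC conduit: records the call instead of trapping. */
struct fake_call {
    uint32_t fid;
    uint64_t arg0, arg1, arg2;
};

static struct fake_call last_call;

static int fake_invoke(uint32_t fid, uint64_t a0, uint64_t a1, uint64_t a2)
{
    last_call = (struct fake_call){ fid, a0, a1, a2 };
    return 0;  /* PSCI SUCCESS */
}

/* Simplified, hypothetical operations table in the spirit of cpu_operations. */
struct cpu_ops_sketch {
    const char *name;
    int (*cpu_boot)(uint64_t target_mpidr, uint64_t entry_point);
    int (*cpu_suspend)(uint32_t power_state, uint64_t resume_entry);
    int (*cpu_die)(void);
};

static int sketch_psci_boot(uint64_t target_mpidr, uint64_t entry_point)
{
    /* CPU_ON(target_cpu, entry_point_address, context_id) */
    return fake_invoke(PSCI_FN_CPU_ON_64, target_mpidr, entry_point, 0);
}

static int sketch_psci_suspend(uint32_t power_state, uint64_t resume_entry)
{
    /* CPU_SUSPEND(power_state, entry_point_address, context_id) */
    return fake_invoke(PSCI_FN_CPU_SUSPEND_64, power_state, resume_entry, 0);
}

static int sketch_psci_die(void)
{
    /* CPU_OFF takes no arguments; the calling core powers itself down. */
    return fake_invoke(PSCI_FN_CPU_OFF, 0, 0, 0);
}

static const struct cpu_ops_sketch psci_ops_sketch = {
    .name        = "psci",
    .cpu_boot    = sketch_psci_boot,
    .cpu_suspend = sketch_psci_suspend,
    .cpu_die     = sketch_psci_die,
};

/* Pick the table by enable-method name; NULL if the method is unknown here. */
static const struct cpu_ops_sketch *ops_for_method(const char *method)
{
    assert(method != NULL);
    return strcmp(method, "psci") == 0 ? &psci_ops_sketch : NULL;
}
```

The design point is the indirection: the rest of the system only calls through the table, so how a request actually reaches the firmware stays hidden behind the function pointers. In this sketch the string "psci" plays the role of the enable-method that a platform description would supply.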

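Conclusion

This article discussed the PSCI architecture. Starting from the challenges OSPM faces in power state management, we introduced PSCI, the interface Arm proposed in response. Rather than an interface, it is better described as a mechanism: PSCI spans EL1 to EL3, and the division of labor differs at each level. The power states inside each VM are virtualized states, and even the power states of the Secure world are not the final power states of the physical CPU. All requests are ultimately decided by the power service at EL2, sent to the EL3 firmware, and forwarded to the SCP, which then switches the power mode of the physical CPU. At the end we listed the PSCI architecture on Linux and some of the code to help readers build an intuitive feel for the PSCI interface. The next article will cover the operations behind the PSCI interface in detail. That is all for today. Thank you, and please follow, share, and comment.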

Glossary

AP - application processor
OSPM - Operating System Power Management
WFI - Wait For Interrupt
WFE - Wait For Event
DVFS - Dynamic Voltage and Frequency Scaling
SCU - Snoop Control Unit
OPP - Operating Performance Point
PSCI - Power State Coordination Interface
PPU - Power Policy Unit
PCSA - Power Control System Architecture
SoC - System-on-Chip
PCF - Power Control Framework
SCP - System Control Processor
BSP - board support package
SCMI - System Control and Management Interface
EAS - Energy Aware Scheduling
IPA - Intelligent Power Allocation
ACPI - Advanced Configuration and Power Interface
LPI - Low-Power Idle
CPPC - Collaborative Processor Performance Control
PCSM - power control state machine
AOSS - Always-on subsystem
PMIC - Power Management Integrated Circuit
JM - job manager
AON - always on domain
SBSA - Server Base System Architecture
CLK_CTRL - Clock Controller
LPD - Low Power Distributor
LPC - Low Power Combiner
P2Q - P-Channel to Q-Channel Convertor
GPIO - General Purpose IO
RAS - Reliability, Availability, and Serviceability
STR - Suspend to RAM
SMCCC - SMC Calling Convention
RMM - Realm Management Monitor
BMC - board management controller
PPF - Privileged platform firmware
