(Repost) Understanding Storm's in-process message flow (a very good article)

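This article describes in detail the worker processes and executor threads that Apache Storm uses for internal message passing, together with the configuration of their message queues. Understanding how these components interact can help you optimize the performance of your Storm topologies.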

from:http://zhangzhenjj.iteye.com/blog/1937861?utm_source=tuicool

Understanding the Internal Message Buffers of Storm

JUN 21ST, 2013

Table of Contents
  • Internal messaging within Storm worker processes
  • Illustration
  • Detailed description
    • Worker processes
    • Executors
  • Where to go from here
    • How to configure Storm’s internal message buffers
    • How to configure Storm’s parallelism
    • Understand what’s going on in your Storm topology
    • Advice on performance tuning

When you are optimizing the performance of your Storm topologies it helps to understand how Storm’s internal message queues are configured and put to use. In this short article I will explain and illustrate how Storm version 0.8/0.9 implements the intra-worker communication that happens within a worker process and its associated executor threads.

Internal messaging within Storm worker processes

Terminology: I will use the terms message and (Storm) tuple interchangeably in the following sections.

When I say “internal messaging” I mean the messaging that happens within a worker process in Storm, which is communication that is restricted to happen within the same Storm machine/node. For this communication Storm relies on various message queues backed by LMAX Disruptor, which is a high performance inter-thread messaging library.

Note that this communication within the threads of a worker process is different from Storm’s inter-worker communication, which normally happens across machines and thus over the network. For the latter Storm uses ZeroMQ by default (in Storm 0.9 there is experimental support for Netty as the network messaging backend). That is, ZeroMQ/Netty are used when a task in one worker process wants to send data to a task that runs in a worker process on a different machine in the Storm cluster.

So for your reference:

  • Intra-worker communication in Storm (inter-thread on the same Storm node): LMAX Disruptor
  • Inter-worker communication (node-to-node across the network): ZeroMQ or Netty
  • Inter-topology communication: nothing built into Storm, you must take care of this yourself with e.g. a messaging system such as Kafka/RabbitMQ, a database, etc.

If you do not know what the differences are between Storm’s worker processes, executor threads and tasks please take a look at Understanding the Parallelism of a Storm Topology.

Illustration

Let us start with a picture before we discuss the nitty-gritty details in the next section.

Figure 1: Overview of a worker’s internal message queues in Storm. Queues related to a worker process are colored in red, queues related to the worker’s various executor threads are colored in green. For readability reasons I show only one worker process (though normally a single Storm node runs multiple such processes) and only one executor thread within that worker process (of which, again, there are usually many per worker process).

Detailed description

Now that you have had a first glimpse of Storm’s intra-worker messaging setup, we can discuss the details.

Worker processes

To manage its incoming and outgoing messages each worker process has a single receive thread that listens on the worker’s TCP port (as configured via supervisor.slots.ports). The parameter topology.receiver.buffer.size determines the batch size that the receive thread uses to place incoming messages into the incoming queues of the worker’s executor threads. Similarly, each worker has a single send thread that is responsible for reading messages from the worker’s transfer queue and sending them over the network to downstream consumers. The size of the transfer queue is configured via topology.transfer.buffer.size.
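To make the batching role of topology.receiver.buffer.size more concrete, here is a minimal Java sketch of what a receive thread does with the messages it reads. It is not Storm’s implementation (Storm 0.8/0.9 implements this in Clojure, and the executors’ incoming queues are Disruptor ring buffers). The class names Message and ReceiveBatcher, the routing by task id, and the use of java.util.concurrent.BlockingQueue as a stand-in for an executor’s incoming queue are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

// Hypothetical message type: a payload addressed to a destination task.
final class Message {
    final int taskId;
    final String payload;

    Message(int taskId, String payload) {
        this.taskId = taskId;
        this.payload = payload;
    }
}

// Sketch of a worker receive thread's batching (illustrative, not Storm's code).
final class ReceiveBatcher {
    private final int batchSize; // plays the role of topology.receiver.buffer.size
    // Stand-ins for the executors' incoming queues, keyed by task id;
    // every queue element is one batch (a list of messages).
    private final Map<Integer, BlockingQueue<List<Message>>> incoming;
    // Plain ArrayList: only the receive thread touches it, so it is not shared.
    private List<Message> buffer;

    ReceiveBatcher(int batchSize, Map<Integer, BlockingQueue<List<Message>>> incoming) {
        this.batchSize = batchSize;
        this.incoming = incoming;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Called for each message read from the network.
    void onMessage(Message m) {
        buffer.add(m);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Groups the buffered messages by destination task and appends each group
    // to that task's incoming queue as a single element.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        Map<Integer, List<Message>> byTask = new LinkedHashMap<>();
        for (Message m : buffer) {
            byTask.computeIfAbsent(m.taskId, k -> new ArrayList<>()).add(m);
        }
        for (Map.Entry<Integer, List<Message>> e : byTask.entrySet()) {
            putBlocking(incoming.get(e.getKey()), e.getValue());
        }
        buffer = new ArrayList<>(batchSize); // start a fresh local buffer
    }

    // Blocks while the target queue is full, i.e. back-pressure on the receiver.
    private static <T> void putBlocking(BlockingQueue<T> queue, T item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing", e);
        }
    }
}
```

A real receive loop would also have to flush a partially filled buffer when no further messages arrive; in this sketch that is left to an explicit flush() call. Because the buffer is touched only by the receive thread it needs no synchronization, and only the hand-off into the shared executor queues does.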

  • The topology.receiver.buffer.size is the maximum number of messages that are batched together at once for appending to an executor’s incoming queue by the worker receive thread (which reads the messages from the network). Setting this parameter too high may cause a lot of problems (“heartbeat thread gets starved, throughput plummets”). The default value is 8 elements, and the value must be a power of 2 (this requirement comes indirectly from LMAX Disruptor).
import backtype.storm.Config;

// Example: configuring via Java API
Config conf = new Config();
conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16); // default is 8
Note that, in contrast to the other buffer-size parameters described in this article, topology.receiver.buffer.size does not actually configure the size of an LMAX Disruptor queue. Rather, it sets the size of a simple ArrayList that is used to buffer incoming messages, because in this specific case the data structure does not need to be shared with other threads, i.e. it is local to the worker’s receive thread. But because the content of this buffer is used to fill a Disruptor-backed queue (the executors’ incoming queues), it must still be a power of 2. See launch-receive-thread! in backtype.storm.messaging.loader for details.
  • Each element of the transfer queue configured with topology.transfer.buffer.size is actually a list of tuples. The various executor send threads will batch outgoing tuples off their outgoing queues onto the transfer queue. The default value is 1024 elements.
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32); // default is 1024
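Why a power of 2? LMAX Disruptor rejects ring-buffer sizes that are not a power of 2, because it maps an ever-increasing sequence number onto a slot with a bit mask (sequence & (size - 1)) instead of a modulo. The Java sketch below shows the check and the masking; the class PowerOfTwoRing is hypothetical and much simpler than Disruptor’s RingBuffer (no sequencing, no memory-barrier logic).

```java
// Sketch: why a ring buffer wants a power-of-2 size (illustrative, not Disruptor's source).
final class PowerOfTwoRing {
    private final Object[] slots;
    private final long mask; // size - 1; only a valid bit mask when size is a power of 2

    PowerOfTwoRing(int size) {
        if (!isPowerOfTwo(size)) {
            throw new IllegalArgumentException("size must be a power of 2, got " + size);
        }
        this.slots = new Object[size];
        this.mask = size - 1;
    }

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Maps an ever-increasing sequence number to a slot with one AND instead of a modulo.
    int indexFor(long sequence) {
        return (int) (sequence & mask);
    }

    void publish(long sequence, Object value) {
        slots[indexFor(sequence)] = value;
    }

    Object read(long sequence) {
        return slots[indexFor(sequence)];
    }
}
```

With a size of 8 the mask is 7, so sequence 9 lands in slot 1 and sequence 10 in slot 2. With a size such as 12 the mask would be 11 (binary 1011), which can never address slots 4 to 7; that is why such sizes are rejected outright.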

Executors

Each worker process controls one or more executor threads. Each executor thread has its own incoming queue and outgoing queue. As described above, the worker process runs a dedicated worker receive thread that is responsible for moving incoming messages to the appropriate incoming queue of the worker’s various executor threads. Similarly, each executor has its dedicated send thread that moves an executor’s outgoing messages from its outgoing queue to the “parent” worker’s transfer queue. The sizes of the executors’ incoming and outgoing queues are configured via topology.executor.receive.buffer.size and topology.executor.send.buffer.size, respectively.
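The hand-off performed by an executor’s send thread can be sketched as follows. This is an illustrative Java model, not Storm’s code: it assumes java.util.concurrent.BlockingQueue stand-ins for the Disruptor queues and plain String tuples, and ExecutorSendSketch, transferOnce, and the maxBatch cap are hypothetical names. The point is only that tuples sitting in the outgoing queue one per slot end up in the transfer queue as a single list-valued element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Sketch of an executor send thread's hand-off (illustrative, not Storm's code).
final class ExecutorSendSketch {
    // Drains up to maxBatch single tuples from the executor's outgoing queue and
    // appends them to the worker's transfer queue as ONE element (a list of tuples).
    // Returns the number of tuples moved; 0 means nothing was pending.
    static int transferOnce(BlockingQueue<String> outgoing,
                            BlockingQueue<List<String>> transfer,
                            int maxBatch) {
        List<String> batch = new ArrayList<>();
        int moved = outgoing.drainTo(batch, maxBatch);
        if (moved > 0) {
            try {
                transfer.put(batch); // blocks while the transfer queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while enqueuing", e);
            }
        }
        return moved;
    }
}
```

In this model, a transfer queue size of 32 bounds the number of batches waiting for the worker send thread, not the number of tuples.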

Each executor has a single thread that handles the user logic for the spout/bolt (i.e. your application code), and a single send thread which moves messages from the executor’s outgoing queue to the worker’s transfer queue.
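The user-logic thread’s side of an executor can be modeled the same way. Again this is a hypothetical Java sketch, not Storm’s code: ExecutorLoopSketch and processOneBatch are made-up names, BlockingQueue stands in for the Disruptor queues, and it simplifies by assuming each input tuple produces exactly one output tuple (a real bolt may emit zero or many).

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Sketch of one iteration of an executor's user-logic thread (illustrative only).
final class ExecutorLoopSketch {
    // Takes one element (a batch) from the incoming queue, runs the user's
    // spout/bolt logic on each tuple, and appends every result to the outgoing
    // queue individually. Uses poll() so it returns 0 instead of blocking
    // when nothing is queued; a real loop would wait for input.
    static int processOneBatch(BlockingQueue<List<String>> incoming,
                               BlockingQueue<String> outgoing,
                               Function<String, String> userLogic) {
        List<String> batch = incoming.poll();
        if (batch == null) {
            return 0;
        }
        for (String tuple : batch) {
            try {
                outgoing.put(userLogic.apply(tuple)); // one tuple per queue element
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while emitting", e);
            }
        }
        return batch.size();
    }
}
```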

  • The topology.executor.receive.buffer.size is the size of the incoming queue for an executor. Each element of this queue is a list of tuples. Here, tuples are appended in batch. The default value is 1024 elements, and the value must be a power of 2 (this requirement comes from LMAX Disruptor).
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // batched; default is 1024
  • The topology.executor.send.buffer.size is the size of the outgoing queue for an executor. Each element of this queue will contain a single tuple. The default value is 1024 elements, and the value must be a power of 2 (this requirement comes from LMAX Disruptor).
// Example: configuring via Java API
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384); // individual tuples; default is 1024
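The comments in the two snippets above (“batched” versus “individual tuples”) matter when you reason about how much data a queue can hold: the same setting of 16384 bounds the two executor queues differently. The tiny helper below makes that arithmetic explicit. QueueCapacity is a hypothetical name, and the figure of 8 tuples per incoming batch used in the example is an assumption chosen only for illustration, since actual batch sizes depend on the traffic.

```java
// Back-of-the-envelope bound on how many tuples a queue can hold,
// given its slot count and how many tuples fit into one slot.
final class QueueCapacity {
    static long maxTuples(int slots, int maxTuplesPerSlot) {
        if (slots <= 0 || maxTuplesPerSlot <= 0) {
            throw new IllegalArgumentException("slots and tuples per slot must be positive");
        }
        return (long) slots * maxTuplesPerSlot; // long avoids int overflow for large settings
    }
}
```

For instance, maxTuples(16384, 1) gives 16384 tuples for the outgoing queue, while maxTuples(16384, 8) gives 131072 tuples for the incoming queue under the assumed batch size.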

Where to go from here

How to configure Storm’s internal message buffers

The various default values mentioned above are defined in conf/defaults.yaml. You can override these values globally in a Storm cluster’s conf/storm.yaml. You can also configure these parameters per individual Storm topology via backtype.storm.Config in Storm’s Java API.
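backtype.storm.Config is itself a HashMap&lt;String, Object&gt;, which is why the snippets above can simply call conf.put(...). Here is a minimal Java sketch of how the three layers combine, assuming the usual precedence where a topology’s own settings win over storm.yaml, which in turn wins over defaults.yaml. ConfigLayers and resolve are hypothetical names, not part of Storm’s API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of layered configuration: later layers override earlier ones
// (defaults.yaml, then the cluster's storm.yaml, then the topology's Config).
final class ConfigLayers {
    @SafeVarargs
    static Map<String, Object> resolve(Map<String, Object>... layersLowestFirst) {
        Map<String, Object> effective = new HashMap<>();
        for (Map<String, Object> layer : layersLowestFirst) {
            effective.putAll(layer); // keys in later layers replace earlier values
        }
        return effective;
    }
}
```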

How to configure Storm’s parallelism

The correct configuration of Storm’s message buffers is closely tied to the workload pattern of your topology as well as the configured parallelism of your topologies. See Understanding the Parallelism of a Storm Topology for more details about the latter.

Understand what’s going on in your Storm topology

The Storm UI is a good start to inspect key metrics of your running Storm topologies. For instance, it shows you the so-called “capacity” of a spout/bolt. The various metrics will help you decide whether your changes to the buffer-related configuration parameters described in this article had a positive or negative effect on the performance of your Storm topologies. See Running a Multi-Node Storm Cluster for details.
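As far as I understand the Storm UI of this era, the capacity of a bolt approximates the fraction of the measurement window (the last 10 minutes) that its executors spent executing tuples: the executed count times the average execute latency, divided by the window length. Treat the following Java sketch as an assumption about that formula, not as the UI’s own code; BoltCapacity is a hypothetical name.

```java
// Sketch of the "capacity" figure shown in the Storm UI (assumed formula):
// fraction of the measurement window spent executing tuples.
final class BoltCapacity {
    static double capacity(long executed, double avgExecuteLatencyMs, long windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return executed * avgExecuteLatencyMs / (windowSeconds * 1000.0); // ms busy / ms elapsed
    }
}
```

A value close to 1.0 suggests the bolt is busy nearly all the time and is a likely bottleneck, which usually calls for more parallelism for that bolt.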

Apart from that you can also generate your own application metrics and track them with a tool like Graphite. See Installing and Running Graphite via RPM and Supervisord for details. It might also be worth checking out ooyala’s metrics_storm project on GitHub (I haven’t used it yet).
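If you go the Graphite route, Carbon’s plaintext protocol is the simplest way in: one line per data point of the form &lt;metric path&gt; &lt;value&gt; &lt;unix timestamp&gt;, sent over TCP (port 2003 by default). The GraphiteReporter class and the metric path used below are hypothetical, and this is a sketch rather than a production reporter (no batching, reconnects, or retries).

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of pushing one application metric to Graphite
// using its plaintext protocol: "<metric path> <value> <unix timestamp>\n".
final class GraphiteReporter {
    static String formatLine(String path, double value, long epochSeconds) {
        if (path.isEmpty() || path.contains(" ")) {
            throw new IllegalArgumentException("metric path must be non-empty and contain no spaces");
        }
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection to a Carbon listener and writes one line.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }
}
```

For example, formatLine("storm.wordcount.split.executed", 42.0, 1371772800L) yields the line "storm.wordcount.split.executed 42.0 1371772800" followed by a newline, which send(...) can then push to your Carbon host.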

Advice on performance tuning

Watch Nathan Marz’s talk on Tuning and Productionization of Storm.

The TL;DR version is: try the following settings as a first start and see whether they improve the performance of your Storm topology.

conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE,             8);
conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE,            32);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE,    16384);

 

