- Threads at the time of the failure
(gdb) info thread
Id Target Id Frame
8 Thread 0x7f469fa66700 (LWP 31069) "rdk:main" 0x00007f46a1c83027 in pthread_join () from /lib64/libpthread.so.0
7 Thread 0x7f469ea64700 (LWP 31071) "rdk:broker1" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
6 Thread 0x7f469e263700 (LWP 31072) "rdk:broker2" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
5 Thread 0x7f469da62700 (LWP 31073) "rdk:broker3" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
4 Thread 0x7f469d261700 (LWP 31074) "rdk:broker4" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
3 Thread 0x7f469ca60700 (LWP 31075) "rdk:broker5" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
2 Thread 0x7f4697fff700 (LWP 31076) "rdk:broker6" 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
* 1 Thread 0x7f46a1f96f40 (LWP 31045) "test" 0x00007f46a1c83027 in pthread_join () from /lib64/libpthread.so.0
- librdkafka thread stack at the time of the failure
(gdb) t 2
[Switching to thread 2 (Thread 0x7f4697fff700 (LWP 31076))]
#0 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f46a0b5cc5d in poll () from /lib64/libc.so.6
#1 0x0000000000752e8d in rd_kafka_transport_poll (rktrans=rktrans@entry=0x7f4684004e80, tmout=tmout@entry=1) at rdkafka_transport.c:1613
#2 0x0000000000752f0b in rd_kafka_transport_io_serve (rktrans=0x7f4684004e80, timeout_ms=timeout_ms@entry=1) at rdkafka_transport.c:1476
#3 0x0000000000741100 in rd_kafka_broker_serve (rkb=rkb@entry=0x1acc140, abs_timeout=abs_timeout@entry=9681368183471) at rdkafka_broker.c:2555
#4 0x0000000000741597 in rd_kafka_broker_ua_idle (rkb=rkb@entry=0x1acc140, timeout_ms=<optimized out>, timeout_ms@entry=-1) at rdkafka_broker.c:2617
#5 0x0000000000741cff in rd_kafka_broker_thread_main (arg=arg@entry=0x1acc140) at rdkafka_broker.c:3552
#6 0x000000000078f1c7 in _thrd_wrapper_function (aArg=<optimized out>) at tinycthread.c:583
#7 0x00007f46a1c81eb5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f46a0b678fd in clone () from /lib64/libc.so.6
- Code where the failure occurs
The infinite loop occurs in the function rd_kafka_broker_thread_main (simplified below, with the body of rd_kafka_broker_ua_idle shown inline):
// rdkafka_broker.c:3552
// rkb: rdkafka broker
// rkb_refcnt: reference count of rdkafka broker
// #define rd_kafka_broker_terminating(rkb) (rd_refcnt_get(&(rkb)->rkb_refcnt) <= 1)
int rd_kafka_broker_thread_main() {
    // The infinite loop happens in this while loop
    while (!rd_kafka_broker_terminating(rkb)) { // rdkafka_broker.c:3493
        rd_kafka_broker_ua_idle() { // rdkafka_broker.c:3552
            do {
                rd_kafka_broker_serve(rkb, abs_timeout); // rdkafka_broker.c:2617
            } while (!rd_kafka_broker_terminating(rkb)&&...) // Note: the infinite loop is not here
        }
    }
}
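To make the loop condition concrete, here is a minimal self-contained sketch of a loop gated by a reference count. It is a toy model, not librdkafka code: ToyBroker, loop_exits and exits_after are hypothetical names, and the loop is bounded so that the example always finishes, whereas the real thread would keep spinning.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for a broker object (hypothetical, not a librdkafka type).
struct ToyBroker {
    std::atomic<int> refcnt{1};  // the broker thread's own reference

    // Same shape as rd_kafka_broker_terminating(): true once only
    // the thread's own reference is left.
    bool terminating() const { return refcnt.load() <= 1; }
};

// Runs the gated loop for at most max_iters iterations and reports
// whether it exited because terminating() became true.
bool loop_exits(const ToyBroker &b, int max_iters) {
    for (int i = 0; i < max_iters; ++i) {
        if (b.terminating()) return true;
        // A real broker thread would serve I/O (poll) here.
    }
    return false;  // the unbounded loop would still be running
}

// Takes `acquired` extra references, releases `released` of them,
// then checks whether the loop can exit.
bool exits_after(int acquired, int released) {
    ToyBroker b;
    b.refcnt.fetch_add(acquired);
    b.refcnt.fetch_sub(released);
    return loop_exits(b, 1000);
}
```

In this model, exits_after(3, 3) returns true, while exits_after(3, 2) returns false: a single reference that is never released is enough to keep the loop alive.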
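- Root cause
After calling Consumer::consume(), the Message* it returned was never released with delete. As the destructor chain in the next section shows, deleting a message ends by dropping a topic reference in rd_kafka_topic_destroy0; without the delete that reference is never dropped, so presumably the reference counts never fall far enough for rd_kafka_broker_terminating(rkb) to become true. The broker threads then keep looping and the main thread stays blocked in pthread_join.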
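One way to make the release hard to forget is to hand the returned pointer to std::unique_ptr immediately. The sketch below uses hypothetical stand-ins (FakeMessage, fake_consume, g_live_messages) so that it compiles without librdkafka; with the real library the same pattern would wrap the RdKafka::Message* returned by consume().

```cpp
#include <cassert>
#include <memory>

// Number of FakeMessage objects currently alive (for illustration only).
static int g_live_messages = 0;

// Hypothetical stand-in for RdKafka::Message.
struct FakeMessage {
    FakeMessage() { ++g_live_messages; }
    ~FakeMessage() { --g_live_messages; }
};

// Hypothetical stand-in for consume(): returns a heap object that
// the caller owns and must delete.
FakeMessage *fake_consume() { return new FakeMessage(); }

// Consumes n messages. unique_ptr deletes each message at the end of
// its loop iteration, even on early exit or exception.
int consume_with_raii(int n) {
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<FakeMessage> msg(fake_consume());
        // ... handle *msg here ...
    }
    return g_live_messages;  // messages still alive after the loop
}
```

After consume_with_raii(5) no message is left alive. An explicit delete at the end of each iteration works as well; the smart pointer only removes the chance of missing it on some code path.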
- RdKafka::Message
An abstract C++ class (a pure virtual interface); its implementation class is RdKafka::MessageImpl.
// rkm: rdkafka message
MessageImpl::~MessageImpl()
{
    if (free_rkmessage_)
        rd_kafka_message_destroy(const_cast<rd_kafka_message_t *>(rkmessage_));
    if (key_)
        delete key_;
}
// rdkafka_msg.c:808
void rd_kafka_message_destroy (rd_kafka_message_t *rkmessage) {
    rd_kafka_op_t *rko;
    if (likely((rko = (rd_kafka_op_t *)rkmessage->_private) != NULL))
        /* => */ rd_kafka_op_destroy(rko);
    else {
        rd_kafka_msg_t *rkm = rd_kafka_message2msg(rkmessage);
        rd_kafka_msg_destroy(NULL, rkm);
    }
}
// rdkafka_op.c:214
void rd_kafka_op_destroy (rd_kafka_op_t *rko) {
    rd_kafka_msg_destroy(NULL, &rko->rko_u.fetch.rkm);
}
// rdkafka_msg.c:45
void rd_kafka_msg_destroy (rd_kafka_t *rk, rd_kafka_msg_t *rkm) {
    rd_kafka_topic_destroy0(rd_kafka_topic_a2s(rkm->rkm_rkmessage.rkt));
}
- Call stack of delete message
#0 rd_atomic32_sub (v=1, ra=0x7ff61c002b50) at rdatomic.h:86
#1 rd_refcnt_sub0 (R=0x7ff61c002b50) at rd.h:305
#2 rd_kafka_topic_destroy0 (s_rkt=0x7ff61c002b40) at rdkafka_topic.h:127
#3 rd_kafka_msg_destroy (rk=<optimized out>, rkm=0x7ff6200033e0) at rdkafka_msg.c:59
#4 0x000000000072b983 in rd_kafka_op_destroy (rko=0x7ff620003370) at rdkafka_op.c:219
#5 0x00000000006f0dd9 in ~MessageImpl (this=0x1b990d0, __in_chrg=<optimized out>) at rdkafkacpp_int.h:125
#6 RdKafka::MessageImpl::~MessageImpl (this=0x1b990d0, __in_chrg=<optimized out>) at rdkafkacpp_int.h:128
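The call stack shows that destroying a message ends by decrementing a topic reference count. The following toy model illustrates that ownership chain. ToyTopic, ToyMessage and refs_while_outstanding are hypothetical names, and the assumption that each message takes exactly one topic reference when it is created is mine; the stack only shows the release side.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a topic handle with a reference count.
struct ToyTopic {
    std::atomic<int> refcnt{1};  // the application's own reference
};

// Toy stand-in for a consumed message: holds one topic reference
// for as long as it is alive (assumption, see the text above).
class ToyMessage {
public:
    explicit ToyMessage(ToyTopic &t) : topic_(t) { topic_.refcnt.fetch_add(1); }
    // Plays the role of the destroy chain: the last step drops the
    // topic reference.
    ~ToyMessage() { topic_.refcnt.fetch_sub(1); }
    ToyMessage(const ToyMessage &) = delete;
    ToyMessage &operator=(const ToyMessage &) = delete;

private:
    ToyTopic &topic_;
};

// Creates n messages, destroys `destroyed` of them, and returns the
// topic reference count while the remaining messages are still alive.
int refs_while_outstanding(int n, int destroyed) {
    ToyTopic topic;
    std::vector<std::unique_ptr<ToyMessage>> msgs;
    for (int i = 0; i < n; ++i)
        msgs.push_back(std::unique_ptr<ToyMessage>(new ToyMessage(topic)));
    for (int i = 0; i < destroyed && !msgs.empty(); ++i)
        msgs.pop_back();  // runs ~ToyMessage, releasing one reference
    return topic.refcnt.load();
}
```

With every message destroyed the count returns to 1, the application's own reference. Each message left alive keeps it one higher, which is the situation an undeleted Message* creates.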
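This article analyzes a thread failure in librdkafka, focusing on an infinite loop and a memory leak, and covers pthread_join, the rd_kafka_broker_thread_main function, and lifetime management of RdKafka::Message objects.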