Using Non-Basic Types in Flink Broadcast State

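This article shows how to use broadcast state in Flink stateful programming, and in particular how to define and use a HashMap as the broadcast state value when basic types are not enough. A worked example consumes data from two Kafka topics, broadcasts one of them, and joins it against the other stream.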

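Background

Flink stateful programming frequently involves broadcast state. Broadcast state is key-value state, and in everyday use both the key and the value are usually basic types (for example String, Boolean, Byte, Short, Int, Long, Float, Double, Char, Date, Void, BigInteger, BigDecimal, Instant). With String for both K and V, the descriptor is defined as follows: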

MapStateDescriptor<String, String> mapStateDescriptor = new MapStateDescriptor<String, String>("testMapState", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

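In this project, basic types could no longer cover the business scenario. After some investigation, it turned out that other types, such as HashMap, can also be used in broadcast state. When defining the broadcast state descriptor, only the type declaration needs to change: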

MapStateDescriptor<String, HashMap> mapMapStateDescriptor = new MapStateDescriptor<String, HashMap>("testMapMapState", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<HashMap>() {
    @Override
    public TypeInformation<HashMap> getTypeInfo() {
        return super.getTypeInfo();
    }
}));

Since the override above only calls the parent class method, it is unnecessary and can be dropped:

MapStateDescriptor<String, HashMap> mapMapStateDescriptor = new MapStateDescriptor<String, HashMap>("testMapMapState", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<HashMap>() {}));
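The empty `{}` after `new TypeHint<HashMap>()` is what makes this work: it creates an anonymous subclass, and Java keeps the type argument of a generic superclass available through reflection. As far as I understand, Flink's TypeHint relies on this mechanism. The following is a plain-Java sketch of the idea only; `TypeTokenSketch` and `getCaptured` are hypothetical names and are not part of Flink.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;

// Hypothetical stand-in that mimics the idea behind a type hint; this is NOT Flink's TypeHint.
abstract class TypeTokenSketch<T> {
    private final Type captured;

    protected TypeTokenSketch() {
        // getClass() is the anonymous subclass; its generic superclass still carries <T>.
        Type superType = getClass().getGenericSuperclass();
        this.captured = ((ParameterizedType) superType).getActualTypeArguments()[0];
    }

    public Type getCaptured() {
        return captured;
    }

    public static void main(String[] args) {
        // Raw HashMap: only the class itself is captured.
        Type raw = new TypeTokenSketch<HashMap>() {}.getCaptured();
        // Parameterized HashMap: the key and value types are captured as well.
        Type typed = new TypeTokenSketch<HashMap<String, String>>() {}.getCaptured();
        System.out.println(raw.getTypeName());
        System.out.println(typed.getTypeName());
    }
}
```

With the raw `HashMap` used in this article, only the class is captured, so the key and value types of the inner map are not known to the framework.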

Reference, from the official site: Apache Flink 1.12 Documentation: The Broadcast State Pattern

Example

The following example shows a HashMap used as a broadcast state value.

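A Flink DataStream job consumes two Kafka topics, producing two streams. The data formats are:

topic1: {"name":"zhangsan","province":"anhui","city":"hefei"}

topic2: {"province":"anhui","city":"hefei","address":"rongchuang"}

topic1 -> stream1, topic2 -> stream2.

The topic2 data is broadcast. Each topic1 record is joined with the broadcast topic2 data to look up its address (the logic is not rigorous, but it is enough for a functional test). Note that the broadcast handler in the code below also reads a "kind" field ("add" or "delete") from each topic2 message, which the sample above does not show; a message without it is not applied to the state.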
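Before the full job, the shape of the state may be easier to see in isolation: province is the outer key, and the value is a map from city to address. The sketch below uses a plain `HashMap` as a stand-in for Flink's broadcast state and follows the add/delete rule driven by the `kind` field that the full listing reads. `NestedStateSketch`, `applyBroadcast`, and `lookup` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the broadcast state: province -> (city -> address).
class NestedStateSketch {
    private final Map<String, HashMap<String, String>> state = new HashMap<>();

    // Applies one topic2 record. "kind" is compared null-safely,
    // so a record without it leaves the state unchanged.
    public void applyBroadcast(String kind, String province, String city, String address) {
        HashMap<String, String> cities = state.get(province);
        if ("add".equals(kind)) {
            if (cities == null) {
                cities = new HashMap<>();
            }
            cities.put(city, address);
            state.put(province, cities);
        } else if ("delete".equals(kind) && cities != null) {
            cities.remove(city);
            state.put(province, cities);
        }
    }

    // Looks up the address for a topic1 record; null when no matching entry exists.
    public String lookup(String province, String city) {
        HashMap<String, String> cities = state.get(province);
        return cities == null ? null : cities.get(city);
    }

    public static void main(String[] args) {
        NestedStateSketch sketch = new NestedStateSketch();
        sketch.applyBroadcast("add", "anhui", "hefei", "rongchuang");
        System.out.println(sketch.lookup("anhui", "hefei")); // prints rongchuang
        sketch.applyBroadcast("delete", "anhui", "hefei", null);
        System.out.println(sketch.lookup("anhui", "hefei")); // prints null
    }
}
```

In the real job the outer map is the broadcast state itself, so the two methods correspond roughly to `processBroadcastElement` and `processElement`.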

The complete implementation is as follows:

package flinkbroadcasttest;

import flinkbroadcasttest.processfunction.FlinkBroadcastTestProcess;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastConnectedStream;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.HashMap;
import java.util.Properties;

public class FlinkBroadcastTest {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Disable operator chaining for the whole job
        env.disableOperatorChaining();

        String brokers = "kafka-log1.test.xl.com:9092,kafka-log2.test.xl.com:9092,kafka-log3.test.xl.com:9092";
        String topic1 = "0000-topic1";
        String topic2 = "0000-topic2";
        String groupId = "demo";

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        props.setProperty("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");
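        // Properties values should be strings; getProperty() returns null for non-String values
        props.setProperty("max.poll.records", "1000");
        props.setProperty("session.timeout.ms", "90000");
        props.setProperty("request.timeout.ms", "120000");
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "100");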

        FlinkKafkaConsumer<String> consumer1 = new FlinkKafkaConsumer<String>(topic1, new SimpleStringSchema(), props);
        consumer1.setCommitOffsetsOnCheckpoints(true);
        DataStream<String> data1KafkaDataDS = env.addSource(consumer1);

        FlinkKafkaConsumer<String> consumer2 = new FlinkKafkaConsumer<String>(topic2, new SimpleStringSchema(), props);
        consumer2.setCommitOffsetsOnCheckpoints(true);
        DataStream<String> data2KafkaDataDS = env.addSource(consumer2);
        
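        // Descriptor for the broadcast state: province -> HashMap of (city -> address)
        MapStateDescriptor<String, HashMap> mapMapStateDescriptor = new MapStateDescriptor<String, HashMap>("testMapMapState", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<HashMap>() {}));
        // Broadcast the topic2 stream to all parallel instances of the downstream operator
        BroadcastStream<String> broadcast = data2KafkaDataDS.broadcast(mapMapStateDescriptor);
        // Connect the topic1 stream with the broadcast stream and join them in the process function
        BroadcastConnectedStream<String, String> connect = data1KafkaDataDS.connect(broadcast);
        DataStream<String> result = connect.process(new FlinkBroadcastTestProcess());
        result.print();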

        env.execute("FlinkBroadcastTest");
    }
}

package flinkbroadcasttest.processfunction;

import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;
import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

import java.util.HashMap;

public class FlinkBroadcastTestProcess extends BroadcastProcessFunction<String, String, String> {

    // Same name and types as the descriptor passed to broadcast() in the main class
    MapStateDescriptor<String, HashMap> mapMapStateDescriptor = new MapStateDescriptor<String, HashMap>("testMapMapState", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(new TypeHint<HashMap>() {}));

    @Override
    public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
        try {
            ReadOnlyBroadcastState<String, HashMap> broadcastState = ctx.getBroadcastState(mapMapStateDescriptor);
            JSONObject obj = JSON.parseObject(value);
            String name = obj.getString("name");
            String province = obj.getString("province");
            String city = obj.getString("city");
            HashMap hashMap = broadcastState.get(province);
            if (hashMap != null && hashMap.containsKey(city)) {
                String address = hashMap.get(city).toString();
                System.out.println(address);
                JSONObject object = new JSONObject();
                // Write name into the output object (the original wrote it back into the input object)
                object.put("name", name);
                object.put("province", province);
                object.put("city", city);
                object.put("address", address);
                out.collect(object.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void processBroadcastElement(String value, Context ctx, Collector<String> out) throws Exception {
        try {
            BroadcastState<String, HashMap> broadcastState = ctx.getBroadcastState(mapMapStateDescriptor);
            JSONObject obj = JSON.parseObject(value);
            String province = obj.getString("province");
            String city = obj.getString("city");
            String address = obj.getString("address");
            String kind = obj.getString("kind");
            HashMap hashMap = broadcastState.get(province);
            // Compare null-safely: a message without "kind" is skipped instead of throwing
            if ("delete".equals(kind)) {
                if (hashMap != null && hashMap.containsKey(city)) {
                    hashMap.remove(city);
                    broadcastState.put(province, hashMap);
                }
            } else if ("add".equals(kind)) {
                if (hashMap == null) {
                    hashMap = new HashMap();
                }
                hashMap.put(city, address);
                broadcastState.put(province, hashMap);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
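One behavior of the listing above is worth spelling out: `processElement` emits nothing when the broadcast state has no entry for the record's province and city, so a topic1 record that arrives before the matching topic2 record is silently dropped. The two topics are read by independent sources, so I would not assume which side arrives first. The sketch below replays a small event sequence in plain Java to show the effect; `ArrivalOrderSketch`, `replay`, the event layout, and the name "lisi" are all hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrivalOrderSketch {
    // Replays events in the given order. Each event is {stream, province, city, payload}:
    // for "broadcast" events the payload is the address, for "main" events it is the name.
    public static List<String> replay(String[][] events) {
        Map<String, Map<String, String>> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String[] e : events) {
            if ("broadcast".equals(e[0])) {
                // Same effect as an "add" in processBroadcastElement.
                state.computeIfAbsent(e[1], k -> new HashMap<>()).put(e[2], e[3]);
            } else {
                Map<String, String> cities = state.get(e[1]);
                if (cities != null && cities.containsKey(e[2])) {
                    out.add(e[3] + "@" + cities.get(e[2]));
                }
                // Otherwise nothing is emitted, mirroring processElement.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"main", "anhui", "hefei", "zhangsan"},       // arrives before any broadcast data
            {"broadcast", "anhui", "hefei", "rongchuang"},
            {"main", "anhui", "hefei", "lisi"}            // arrives after the broadcast data
        };
        System.out.println(replay(events)); // prints [lisi@rongchuang]
    }
}
```

For a functional test this is acceptable, as the article notes; if early records must not be lost, they would need to be held somewhere until the broadcast side catches up, which is outside the scope of this example.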
