博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Spark User-guide Summary - Streaming
阅读量:6816 次
发布时间:2019-06-26

本文共 1148 字,大约阅读时间需要 3 分钟。

Simple Example

This simple example is word count

from pyspark.sql import SparkSessionfrom pyspark.sql.functions import explodefrom pyspark.sql.functions import splitspark = SparkSession.builder.appName("StructuredNetworkWordCount")\    .getOrCreate()# Create DataFrame with input lines from connection to localhost:9999lines = spark.readStream.format("socket")\    .option("host", "localhost").option("port", 9999).load()# Split the lines into wordswords = lines.select(    explode(        split(lines.value, " ")    ).alias("word"))# Generate running word countwordCounts = words.groupBy("word").count()

Then, use a write stream to output

# Start running the query that prints the running counts to the consolequery = wordCounts.writeStream.outputMode("complete").format("console").start()query.awaitTermination()

Every data item that is arriving on the stream is like a new row being appended to the Input Table. For example, input

w1 w1 w2w2 w1

to the above program will get output as w1: 3, w2: 2 finally

clipboard.png

Note the Result table is real, the Input unbounded table does not exist, the data is discarded after it is used to update

DataFrame API

...

转载地址:http://hpdzl.baihongyu.com/

你可能感兴趣的文章
洛谷3396:哈希冲突——题解
查看>>
Mysql之数据库设计
查看>>
Java Enum
查看>>
method="post" 用户名和密码不显示在网址里
查看>>
LeetCode----8. String to Integer (atoi)(Java)
查看>>
JSP标签
查看>>
Python--day65--母版和继承的基本使用
查看>>
在python 3.6的eclipse中,导入from lxml import etree老是提示,Unresolved import:etree的错误...
查看>>
经纬度计算距离
查看>>
Linux 在添加一个新账号后却没有权限怎么办
查看>>
React 源码剖析系列 - 不可思议的 react diff
查看>>
走近抽象类与抽象方法
查看>>
4. 寻找两个有序数组的中位数
查看>>
React组件开发总结
查看>>
各种符号
查看>>
大道至简,职场上做人做事做管理
查看>>
抗干扰的秘诀:分类、整理与专注
查看>>
Number of Connected Components in an Undirected Graph
查看>>
BZOJ 3143 游走(高斯消元)
查看>>
SpringBoot 配置文件存放位置及读取顺序
查看>>