Spark_02_Quick Start
Enoch

1. Installing Spark

If Spark has not been installed on the system yet, it can be installed with the following steps.

Note that Spark and Hadoop releases must be version-compatible, so pick a Spark build that matches your Hadoop version.

  1. Download the installation package
  2. Upload it to the server and unpack it
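The two steps above can be sketched as shell commands. The version numbers, download URL, and target directory below are assumptions for illustration; check the Spark downloads page for the build that pairs with your Hadoop version.

```shell
# Sketch of the two installation steps, assuming Spark 2.4.3 built against
# Hadoop 2.7 (adjust both to match your cluster's Hadoop version).
SPARK_VERSION=2.4.3
HADOOP_PROFILE=hadoop2.7
PKG="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"

# 1. Download the package (old releases are kept on archive.apache.org)
# wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}"

# 2. Upload it to the server, then unpack (target directory is an assumption)
# tar -zxvf "${PKG}" -C /home/hadoop/

echo "${PKG}"
```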

2. Entering the Spark interactive shell

Start the shell with `spark-shell`; the `--master local[2]` option runs Spark locally with two worker threads.

```
[hadoop@hadoop bin]$ spark-shell --master local[2]
25/03/18 21:55:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop:4040
Spark context available as 'sc' (master = local[2], app id = local-1742306146943).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_341)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

3. Basic functionality demo

```
scala> println("hello world")
hello world

scala> sc.textFile("file:///home/hadoop/word.txt")
res1: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/word.txt MapPartitionsRDD[1] at textFile at <console>:25

scala> res1.count
res3: Long = 3

scala> res1.filter
def filter(f: String => Boolean): org.apache.spark.rdd.RDD[String]

scala> val myTextFile = sc.textFile("file:///home/hadoop/word.txt")
myTextFile: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/word.txt MapPartitionsRDD[3] at textFile at <console>:24

scala> myTextFile.count
res4: Long = 3

scala> val fileLineCount = myTextFile.count
fileLineCount: Long = 3

scala> val containMe = myTextFile.filter( line => line.contains("me") )
containMe: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:25

scala> val collect = containMe.collect()
collect: Array[String] = Array(hello me)
```
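The `filter`/`collect` sequence above illustrates Spark's lazy evaluation: `filter` is a transformation that only builds a new RDD, and no data is read until an action such as `count` or `collect` runs. The same pipeline can be sketched with plain Scala collections as a local analogy (not the RDD API); `LazyPipeline` and the three sample lines are hypothetical, since the transcript only shows that `word.txt` has three lines, one containing "me".

```scala
// Local analogy of the RDD pipeline above, using a Scala Iterator,
// which (like an RDD transformation) is evaluated lazily.
object LazyPipeline {
  def main(args: Array[String]): Unit = {
    // Stand-in for the three lines of word.txt (contents assumed)
    val lines = Seq("hello you", "hello me", "hello world")

    // "Transformation": building the filtered iterator does no work yet
    val containMe = lines.iterator.filter(line => line.contains("me"))

    // "Action": toArray plays the role of collect() and forces evaluation
    val collected = containMe.toArray
    println(collected.mkString(", "))  // prints "hello me"
  }
}
```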

The `sc` object (a SparkContext) is the entry point to the Spark Core (RDD) API, while the `spark` object (a SparkSession) is the entry point to the DataFrame API.
