Spark_02_Quick Start
Enoch

1. Installing Spark

If Spark has not been installed on the system yet, it can be installed with the following steps.

Note that Spark and Hadoop releases must be version-compatible, so pick a Spark build that matches your Hadoop version.

  1. Download the installation package
  2. Upload it to the server and unpack it
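The two steps above can be sketched as shell commands. The version numbers, download URL, and target directory below are assumptions for illustration; check the Spark downloads page for the build that pairs with your Hadoop version.

```shell
# Sketch of the two installation steps, assuming Spark 2.4.3 built against
# Hadoop 2.7 (adjust both to match your cluster's Hadoop version).
SPARK_VERSION=2.4.3
HADOOP_PROFILE=hadoop2.7
PKG="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}.tgz"

# 1. Download the package (old releases are kept on archive.apache.org)
# wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}"

# 2. Upload it to the server, then unpack (target directory is an assumption)
# tar -zxvf "${PKG}" -C /home/hadoop/

echo "${PKG}"
```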

2. Entering the Spark interactive shell

Start the shell with `spark-shell`; the `--master local[2]` option runs Spark locally with two worker threads.

```
[hadoop@hadoop bin]$ spark-shell --master local[2]
25/03/18 21:55:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop:4040
Spark context available as 'sc' (master = local[2], app id = local-1742306146943).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_341)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

3. Basic functionality demo

```
scala> println("hello world")
hello world

scala> sc.textFile("file:///home/hadoop/word.txt")
res1: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/word.txt MapPartitionsRDD[1] at textFile at <console>:25

scala> res1.count
res3: Long = 3

scala> res1.filter
def filter(f: String => Boolean): org.apache.spark.rdd.RDD[String]

scala> val myTextFile = sc.textFile("file:///home/hadoop/word.txt")
myTextFile: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/word.txt MapPartitionsRDD[3] at textFile at <console>:24

scala> myTextFile.count
res4: Long = 3

scala> val fileLineCount = myTextFile.count
fileLineCount: Long = 3

scala> val containMe = myTextFile.filter( line => line.contains("me") )
containMe: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:25

scala> val collect = containMe.collect()
collect: Array[String] = Array(hello me)
```
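The `filter`/`collect` sequence above illustrates Spark's lazy evaluation: `filter` is a transformation that only builds a new RDD, and no data is read until an action such as `count` or `collect` runs. The same pipeline can be sketched with plain Scala collections as a local analogy (not the RDD API); `LazyPipeline` and the three sample lines are hypothetical, since the transcript only shows that `word.txt` has three lines, one containing "me".

```scala
// Local analogy of the RDD pipeline above, using a Scala Iterator,
// which (like an RDD transformation) is evaluated lazily.
object LazyPipeline {
  def main(args: Array[String]): Unit = {
    // Stand-in for the three lines of word.txt (contents assumed)
    val lines = Seq("hello you", "hello me", "hello world")

    // "Transformation": building the filtered iterator does no work yet
    val containMe = lines.iterator.filter(line => line.contains("me"))

    // "Action": toArray plays the role of collect() and forces evaluation
    val collected = containMe.toArray
    println(collected.mkString(", "))  // prints "hello me"
  }
}
```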

The `sc` object (a SparkContext) is the entry point to the Spark Core (RDD) API, while the `spark` object (a SparkSession) is the entry point to the DataFrame API.
