GeoSpark is a distributed geospatial computation engine built on Spark. Compared with traditional ArcGIS, GeoSpark offers better-performing spatial analysis and query services.
Prerequisites
- Ubuntu 18.04
- IDEA
- GeoSpark supports both Java and Scala; this tutorial uses Java.
Installing JDK 8
- Download JDK 8: https://download.oracle.com/otn/java/jdk/8u211-b12/478a62b7d4e34b78b671c754eaaf38ab/jdk-8u211-linux-x64.tar.gz (note: Oracle now requires an account before allowing downloads)
- After downloading and extracting, copy it to `/opt`, then add the following environment variables to `~/.bashrc`:

```bash
export JAVA_HOME=/opt/jdk1.8.0_172  # change this to your JDK directory name
export PATH=${JAVA_HOME}/bin:$PATH
export CLASSPATH=.:/opt/jdk1.8.0_172/lib:/opt/jdk1.8.0_172/lib/dt.jar:/opt/jdk1.8.0_172/lib/tools.jar  # CLASSPATH should not be needed since JDK 8, but it is set here just in case
```
Configuring Scala
- Download Scala 2.12.8: https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz
- After downloading and extracting, copy it to `/opt`, then add the following environment variables to `~/.bashrc`:

```bash
export SCALA_HOME=/opt/scala-2.12.8
export PATH=${SCALA_HOME}/bin:$PATH
```

- Then run `source ~/.bashrc`.
- Run `scala -version`; if you see output similar to the following, the installation succeeded:

```
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
```
Standalone Spark setup
- This configures a standalone, single-machine Spark; no cluster and no Hadoop deployment are required.
- Download Spark 2.4.3: https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.6.tgz
- After downloading and extracting, copy it to your home directory `/home/{user}`, then add the following environment variables to `~/.bashrc`:

```bash
export SPARK_HOME=/home/hwang/spark-2.4.3-bin-hadoop2.6
export SPARK_LOCAL_IP="127.0.0.1"
export PATH=${SPARK_HOME}/bin:$PATH
```
- Then run `spark-shell`; if you see output like the following, the installation succeeded:
```
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1559006613213).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)

scala>
```
GeoSpark
- Open IDEA, create a new Maven project, and edit the `pom.xml` file:

```xml
<properties>
    <scala.version>2.11</scala.version>
    <geospark.version>1.2.0</geospark.version>
    <spark.compatible.version>2.3</spark.compatible.version>
    <spark.version>2.4.3</spark.version>
    <hadoop.version>2.7.2</hadoop.version>
    <!-- compile for running locally from the IDE; change to provided when submitting to a cluster -->
    <dependency.scope>compile</dependency.scope>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-sql_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>geospark-viz_${spark.compatible.version}</artifactId>
        <version>${geospark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datasyslab</groupId>
        <artifactId>sernetcdf</artifactId>
        <version>0.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>${dependency.scope}</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
```
- We create a Spark RDD from a CSV file (`checkin.csv`) with the following contents:

```
-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant
```

- Then we initialize a SparkContext and load the CSV through GeoSpark's PointRDD:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.datasyslab.geospark.enums.FileDataSplitter;
import org.datasyslab.geospark.spatialRDD.PointRDD;

SparkConf conf = new SparkConf();
conf.setAppName("GeoSpark01");
conf.setMaster("local[*]");
// GeoSpark requires Kryo serialization together with its own registrator.
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
JavaSparkContext sc = new JavaSparkContext(conf);

String pointRDDInputLocation = Learn01.class.getResource("checkin.csv").toString();
Integer pointRDDOffset = 0; // the coordinates (longitude, latitude) start at column 0
FileDataSplitter pointRDDSplitter = FileDataSplitter.CSV;
Boolean carryOtherAttributes = true; // keep the trailing attribute column (the POI type, e.g. "hotel")
PointRDD rdd = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes);
```
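- With the points loaded we can already run a spatial operation. The sketch below is a minimal range query based on the GeoSpark 1.2.x `RangeQuery` API; the query window coordinates are invented for illustration, and the JTS package name (`org.locationtech.jts`) assumes GeoSpark 1.2.x (earlier versions used `com.vividsolutions.jts`):

```java
import org.locationtech.jts.geom.Envelope;
import org.datasyslab.geospark.enums.IndexType;
import org.datasyslab.geospark.spatialOperator.RangeQuery;

rdd.analyze();                          // compute the RDD's boundary envelope and record count
rdd.buildIndex(IndexType.RTREE, false); // R-tree on the raw RDD (no spatial partitioning yet)

// Query window in the data's CRS, given as (minX, maxX, minY, maxY).
Envelope queryWindow = new Envelope(-88.4, -88.2, 32.3, 32.4);
// true = count points on the window boundary, true = use the index built above.
long matches = RangeQuery.SpatialRangeQuery(rdd, queryWindow, true, true).count();
System.out.println("points inside the window: " + matches);
```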
- Coordinate system transformation: GeoSpark uses the EPSG standard for coordinate reference systems; the codes can be looked up on the EPSG website: https://epsg.io/

```java
// Coordinate system transformation: reprojects the RDD in place.
String sourceCrsCode = "epsg:4326"; // WGS84 longitude/latitude, in degrees
String targetCrsCode = "epsg:3857"; // Web Mercator, in meters
rdd.CRSTransform(sourceCrsCode, targetCrsCode);
```
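- A quick way to sanity-check the transform is to look at one point afterwards. This sketch assumes the public `rawSpatialRDD` field that GeoSpark's `SpatialRDD` exposes in 1.2.x; for this dataset the transformed coordinates should land roughly around -9.8e6, 3.8e6:

```java
import org.locationtech.jts.geom.Point;

// After CRSTransform the coordinates are Web Mercator meters, not degrees.
Point first = rdd.rawSpatialRDD.take(1).get(0);
System.out.println(first.getX() + ", " + first.getY());
```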
 
