Configuration:
DSE 5.0.6 (See Datastax Cassandra on AWS for Installation Details)
/etc/dse/spark/spark-env.sh
export SPARK_PUBLIC_DNS=<node1_public_ip>
export SPARK_DRIVER_MEMORY="2048M"
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY="4G"
/etc/dse/spark/spark-defaults.conf
spark.scheduler.mode FAIR
spark.cores.max 2
spark.executor.memory 1g
spark.cassandra.auth.username analytics
spark.cassandra.auth.password *****
spark.scheduler.allocation.file /etc/dse/spark/fairscheduler.xml
spark.eventLog.enabled True
#spark.default.parallelism: 3*4cores=12
spark.default.parallelism 12
/etc/dse/spark/fairscheduler.xml
<allocations>
<pool name="default">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>4</minShare>
</pool>
<pool name="admin">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>4</minShare>
</pool>
</allocations>
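A quick way to confirm which pools the allocation file defines is to pull the pool names out of it. This is a minimal sketch, assuming each `<pool name="...">` tag sits on its own line as in the listing above; `list_pools` is a hypothetical helper, not part of DSE or Spark.

```shell
# Hypothetical helper: print the pool names defined in a fair scheduler file.
# Assumes one <pool name="..."> tag per line, as in the listing above.
list_pools() {
  sed -n 's/.*<pool name="\([^"]*\)".*/\1/p' "${1:-/etc/dse/spark/fairscheduler.xml}"
}

# Usage on a node (path taken from spark-defaults.conf above):
#   list_pools /etc/dse/spark/fairscheduler.xml
```

For the file above this should list `default` and `admin`, one per line.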
$ grep initial_spark_worker_resources /etc/dse/dse.yaml
initial_spark_worker_resources: 0.7
With this setting, when you start dse spark or dse spark-sql, the Spark UI shows 3 out of 4 cores allocated (0.7 of 4 cores is 2.8, shown as 3).
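The arithmetic behind that number can be sketched as follows. This is a rough estimate only, assuming the worker's share is simply the fraction multiplied by the machine's core count; how DSE rounds the result is not covered here. `estimate_worker_cores` is a hypothetical helper, not a DSE tool.

```shell
# Hypothetical helper: multiply the resource fraction by the core count.
# Assumption: share = initial_spark_worker_resources * total cores (no rounding applied).
estimate_worker_cores() {
  fraction="$1"
  total="${2:-$(nproc)}"   # defaults to this machine's core count
  awk -v f="$fraction" -v t="$total" 'BEGIN { printf "%.1f\n", f * t }'
}

estimate_worker_cores 0.7 4   # prints 2.8
```

On the 4-core node used here the raw product is 2.8, which lines up with the 3 cores seen in the UI.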
Verification:
$ dse -u cassandra -p ***** client-tool spark version
1.6.2.2
$ dse -u cassandra -p ***** client-tool spark master-address
spark://172.31.12.201:7077
$ dse -u cassandra -p ***** client-tool spark sql-schema
Finding the Spark master:
$ dsetool status
ap-southeast_1_cassandra Workload: Analytics Graph: no Analytics Master: 172.31.1.147
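If you want the master IP in a script, it can be cut out of that status line. A minimal sketch, assuming the output contains `Analytics Master: <ip>` exactly as in the line above (other DSE versions may format it differently); `spark_master_ip` is a hypothetical helper.

```shell
# Hypothetical helper: read `dsetool status` output on stdin and print
# the first IP that follows "Analytics Master: ".
spark_master_ip() {
  sed -n 's/.*Analytics Master: \([0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a live node (assumes dsetool is on PATH):
#   dsetool status | spark_master_ip
echo 'ap-southeast_1_cassandra Workload: Analytics Graph: no Analytics Master: 172.31.1.147' | spark_master_ip
```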
Spark Web UI:
http://<node1>:7080
Now on the application UI's Stages page (port 4040, path /stages/, on the host running the driver), we can see 1 Fair Scheduler Pool, with scheduling mode FAIR.
From an AWS security perspective, it is best practice not to open the Cassandra nodes to the public internet; reach the UIs through a VPN or an SSH tunnel instead.
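For the SSH tunnel option, something like the following would forward the two UI ports mentioned above to your workstation. This is a sketch under assumptions: the login `ec2-user@<node1_public_ip>` is a placeholder, the UIs are assumed to be reachable on `localhost` from the node itself, and `spark_ui_tunnel_cmd` is a hypothetical helper that only prints the command so you can review it first.

```shell
# Hypothetical helper: print an ssh command that forwards the Spark master UI
# (7080) and the application UI (4040) to the same ports on this machine.
# -N means "no remote command", -L sets up a local port forward.
spark_ui_tunnel_cmd() {
  ssh_target="$1"   # placeholder, e.g. ec2-user@<node1_public_ip>
  echo "ssh -N -L 7080:localhost:7080 -L 4040:localhost:4040 $ssh_target"
}

spark_ui_tunnel_cmd 'ec2-user@<node1_public_ip>'
```

With the tunnel running, the UIs would be at http://localhost:7080 and http://localhost:4040 without opening either port in the security group.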
Spark Logs:
/var/log/spark/master/master.log
/var/log/spark/worker/worker.log
Clean up the app* directories under /var/lib/spark/worker/; they accumulate and take up space on the root filesystem.
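The cleanup can be scripted along these lines. A minimal sketch, assuming that any app* directory not modified for a given number of days is safe to remove; the 7-day default and the `clean_spark_work` name are hypothetical. Note that a directory's mtime only changes when entries are added or removed, so run it in list mode first.

```shell
# Hypothetical helper: list (default) or delete app* directories older than N days.
clean_spark_work() {
  work_dir="${1:-/var/lib/spark/worker}"
  days="${2:-7}"
  action="${3:-list}"   # "list" prints candidates, "delete" removes them
  if [ "$action" = "delete" ]; then
    # -prune stops find from descending into directories it is about to remove
    find "$work_dir" -type d -name 'app*' -mtime +"$days" -prune -exec rm -rf {} +
  else
    find "$work_dir" -type d -name 'app*' -mtime +"$days" -prune -print
  fi
}

# Usage:
#   clean_spark_work /var/lib/spark/worker 7 list
#   clean_spark_work /var/lib/spark/worker 7 delete
```

Spark's standalone worker also has its own cleanup settings (spark.worker.cleanup.enabled and related options), which may be worth a look as an alternative to a manual script.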
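Create the event log directory (Spark's default spark.eventLog.dir is /tmp/spark-events) and make it writable:
$ mkdir /tmp/spark-events
$ chmod ugo+rwx /tmp/spark-events
Then enable event logging in /etc/dse/spark/spark-defaults.conf:
spark.eventLog.enabled True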
Tips
Restart the Spark worker alone (useful when the Spark master moves to another node and the worker cannot communicate with the new master):
$ dsetool sparkworker restart