October 23, 2017
Spark and Qlik Integration
Steps:
- Start the Spark Thrift Server on the DataStax cluster
$ dse -u cassandra -p <password> spark-sql-thriftserver start --conf spark.cores.max=4 --conf spark.executor.memory=2G --conf spark.driver.maxResultSize=1G --conf spark.kryoserializer.buffer.max=512M --conf spark.sql.thriftServer.incrementalCollect=true
- In AWS, allow inbound traffic on port 10000 from the Qlik server's security group (Qlik connects to the Spark Thrift Server on this port)
- Install the Simba ODBC Driver for Spark on the Qlik Windows EC2 instance
Create a System DSN with the following settings:
Setting | Value |
---|---|
Spark Server Type: | SparkThriftServer |
Host: | internal-spark-thriftserver-prod-lb-861234576.ap-southeast-1.elb.amazonaws.com (DNS name of the Spark Thrift Server ELB) |
Port: | 10000 |
Database: | avm_analytics |
Authentication Mechanism: | Username |
Thrift Transport: | SASL |
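Before testing the DSN, it can help to confirm that the host and port above are actually reachable from the Qlik instance, since a blocked security group rule looks the same as a driver problem from inside the ODBC dialog. The following is a minimal sketch, assuming Python is available on the Windows instance; the helper name `can_reach` is made up for illustration and is not part of any Qlik or Simba tooling.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only proves that the network path (security group, ELB listener)
    is open. It does not validate the Thrift/SASL handshake or credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call, using the ELB DNS name and port from the DSN settings:
# can_reach("internal-spark-thriftserver-prod-lb-861234576.ap-southeast-1.elb.amazonaws.com", 10000)
```

If this returns `False`, the security group rule or the ELB listener is the first thing to check; if it returns `True` and the DSN test still fails, the problem is more likely in the driver settings.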
- In the Qlik admin UI, go to Data Connections and select the DSN created above; the connection should succeed
- In the Data load editor, run the following script:
LIB CONNECT TO 'Simba Spark(Qlik-sense-administration)';
select txn_id, txn_date from transactions where txn_date >= '2017-06-05' and txn_date < '2017-06-06';
- Watch the resulting Spark job execute in the Spark web UI
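To rule out Qlik itself when debugging, the same query can be run against the System DSN from a small script. This is only a sketch under assumptions: it relies on the third-party `pyodbc` package being installed on the Windows instance, and the DSN name is a placeholder, since the actual DSN name is not given in these steps. The function names `one_day_query` and `fetch_rows` are hypothetical.

```python
from datetime import date, timedelta


def one_day_query(day: date) -> str:
    """Build the one-day query from the steps above as a half-open date range."""
    next_day = day + timedelta(days=1)
    return (
        "select txn_id, txn_date from transactions "
        f"where txn_date >= '{day.isoformat()}' and txn_date < '{next_day.isoformat()}'"
    )


def fetch_rows(dsn: str, sql: str):
    """Run sql through the given System DSN and return all rows.

    Assumes pyodbc is installed; it is imported here so the query builder
    above can be used without it.
    """
    import pyodbc  # third-party package, not part of the standard library

    # autocommit=True avoids asking the driver to open a transaction.
    conn = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Example call; "YourSparkDsn" is a placeholder for the real System DSN name:
# rows = fetch_rows("YourSparkDsn", one_day_query(date(2017, 6, 5)))
```

A job should appear in the Spark web UI for this query just as it does for the Qlik load, which makes it easy to tell whether a failure sits in the ODBC layer or in the Qlik data connection.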