October 23, 2017

Spark and Qlik Integration

Steps:

  • Start Spark Thrift Server on Datastax Cluster
$ dse -u cassandra -p <password> spark-sql-thriftserver start --conf spark.cores.max=4 --conf spark.executor.memory=2G --conf spark.driver.maxResultSize=1G --conf spark.kryoserializer.buffer.max=512M --conf spark.sql.thriftServer.incrementalCollect=true
  • In AWS, allow the Qlik server's security group to access port 10000 on the Spark Thrift Server (Qlik connects to the Thrift Server on this port)
  • Install the Simba ODBC Driver for Spark on the Qlik Windows EC2 instance
    Create a System DSN with the following settings:
Spark Server Type: SparkThriftServer
Host: internal-spark-thriftserver-prod-lb-861234576.ap-southeast-1.elb.amazonaws.com (DNS name of the Spark Thrift Server ELB)
Port: 10000
Database: avm_analytics
Authentication Mechanism: Username
Thrift Transport: SASL
  • Go to Qlik Admin UI -> Data Connections and click on the DSN created above to establish the connection
  • In the Data Editor, enter the following script to execute a query
LIB CONNECT TO 'Simba Spark(Qlik-sense-administration)';
select txn_id,txn_date from transactions where txn_date>='2017-06-05' and txn_date<'2017-06-06';
  • Observe the execution of the Spark job in the Spark Web UI
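
If the DSN test or the Qlik data connection fails, the usual suspect is the security group step: the Qlik host cannot reach port 10000 on the Thrift Server. The following is a minimal sketch of a TCP reachability check that could be run on the Qlik EC2 instance, assuming Python 3 is installed there (not part of the original setup). The function name port_is_reachable and the 5 second default timeout are illustrative choices. It only confirms that the port accepts TCP connections; it does not validate the ODBC or SASL settings.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for troubleshooting; it checks network reachability only,
    not whether the Spark Thrift Server accepts the ODBC login.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, calling port_is_reachable("internal-spark-thriftserver-prod-lb-861234576.ap-southeast-1.elb.amazonaws.com", 10000) from the Qlik instance should return True once the security group rule is in place. A False result points to the security group, the ELB, or the Thrift Server not running, rather than to the DSN settings.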