spark-submit yarn-cluster on Azure HDInsight cannot provide additional jars (like JNR) to my app


I'm having problems with an HDInsight Spark cluster. My application needs two jars:

the first one is JNR (com.github.jnr:jnr-constants:0.9.0, to be exact)
and the other one is JNA (net.java.dev.jna:jna:4.1.0); both are required by JRuby.

The problem is that whenever I run the app I get this error:

    [error] Exception java.lang.NoSuchMethodError: jnr.constants.platform.OpenFlags.defined()Z

and I get the same problem with JNA if I remove the code that calls JNR:

    my.process.check.run.CheckRun$.main(CheckRun.scala:219): [error] Exception java.lang.NoSuchMethodError: com.sun.jna.Platform.is64Bit()Z

(the is64Bit()Z method is not available in JNA v3.5.1)
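A NoSuchMethodError like this usually means an older copy of the jar is shadowing the one bundled with the app. A small diagnostic along these lines (a sketch; on the cluster the class to check would be com.sun.jna.Platform from the stack trace, java.lang.String is used here only so the demo is self-contained) can be dropped into the app to print which jar a class was actually loaded from:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location of the jar a class was loaded from,
    // "bootstrap classloader" for JRE-internal classes,
    // or "not found on classpath" if the class cannot be resolved.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap classloader"
                               : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found on classpath";
        }
    }

    public static void main(String[] args) {
        // On the cluster, pass "com.sun.jna.Platform" to see whether
        // jna-3.5.1.jar or jna-4.1.0.jar wins.
        String target = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(target + " -> " + locate(target));
    }
}
```

If this printed a path ending in jna-3.5.1.jar inside the executors, that would confirm the cluster's copy is taking precedence over the provided one.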

I checked the worker nodes, and they have this:

    myclusteruser@wn0-test:/$ find . -name '*jna*.jar' 2>/dev/null
    ./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
    ./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
    myclusteruser@wn0-test:/$ find . -name '*jnr*.jar' 2>/dev/null
    myclusteruser@wn0-test:/$

On the head node I have this:

    myclusteruser@hn0-test:/$ find . -name '*jna*.jar' 2>/dev/null
    ./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
    ./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
    myclusteruser@hn0-test:/$ find . -name '*jnr*.jar' 2>/dev/null
    myclusteruser@hn0-test:/$
  • I first tried building a "fat jar" with mvn assembly so that everything needed would be included, including the right versions of JNA (4.1.0) and JNR. I ended up with a 129 MB jar but got the same error.
  • I then tried adding the jars with the spark-submit --packages option:

    spark-submit \
      --verbose \
      --packages net.java.dev.jna:jna:4.1.0,com.github.jnr:jnr-constants:0.9.0,org.jruby:jruby:9.0.1.0,com.databricks:spark-csv_2.10:1.4.0 \
      --conf spark.executor.extraClassPath=./ \
      --conf spark.driver.maxResultSize=2g \
      --conf spark.executor.memory=1500m \
      --conf spark.yarn.executor.memoryOverhead=500 \
      --conf spark.executor.instances=2 \
      --conf spark.sql.shuffle.partitions=4 \
      --conf 'spark.executor.extraJavaOptions=-XX:PermSize=512m -XX:MaxPermSize=512m' \
      --conf 'spark.driver.extraJavaOptions=-XX:PermSize=512m -XX:MaxPermSize=512m' \
      --deploy-mode cluster \
      --master yarn-cluster \
      --class my.process.check.run.CheckRun \
      wasb:///checkrun/my-checkrun-1.0.6-SNAPSHOT-jar-with-dependencies.jar \
      --nostdin \
      --nodb \
      --log_level 0

The PermSize options are there to avoid out-of-memory problems, because HDInsight uses Java 7 and not Java 8.

When I look on each worker, I can see my-checkrun-1.0.6-SNAPSHOT-jar-with-dependencies.jar copied into yarn/local/filecache:

    myclusteruser@wn3-test:/$ find . -name '*checkrun*.jar' 2>/dev/null
    ./mnt/resource/hadoop/yarn/local/filecache/10/my-checkrun-1.0.6-SNAPSHOT-jar-with-dependencies.jar

and the folder contains nothing else but that jar.

I can see that spark-submit retrieves the versions of the jars specified in the --packages option and stores them in the local .m2 repository,

then puts them in a temporary wasb (HDFS) folder alongside the Spark conf archive,
which is used as temporary storage during the run.
Inside that archive there is a spark_conf.properties file:

    #Spark configuration.
    #Mon Jul 11 13:18:01 UTC 2016
    spark.executor.memory=1500m
    spark.yarn.submit.file.replication=3
    spark.yarn.jar=local\:///usr/hdp/current/spark-client/lib/spark-assembly.jar
    spark.yarn.executor.memoryOverhead=500
    spark.yarn.driver.memoryOverhead=384
    spark.history.kerberos.keytab=none
    spark.submit.deployMode=cluster
    spark.yarn.secondary.jars=net.java.dev.jna_jna-4.1.0.jar,com.github.jnr_jnr-constants-0.9.0.jar
    spark.yarn.scheduler.heartbeat.interval-ms=5000
    spark.yarn.preserve.staging.files=false
    spark.eventLog.enabled=true
    spark.executor.extraClassPath=./
    spark.yarn.queue=default
    spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
    spark.history.ui.port=18080
    spark.yarn.historyServer.address=hn0-testr.su4ft5rezscepaqpicvo04xrkb.fx.internal.cloudapp.net\:18080
    spark.master=yarn-cluster
    spark.yarn.containerLauncherMaxThreads=25
    spark.executor.cores=2
    spark.yarn.max.executor.failures=3
    spark.yarn.services=
    spark.history.fs.logDirectory=wasb\:///hdp/spark-events
    spark.sql.shuffle.partitions=4
    spark.executor.extraJavaOptions=-XX\:PermSize\=512m -XX\:MaxPermSize\=512m
    spark.executor.instances=2
    spark.app.name=my.process.check.run.CheckRun
    spark.driver.maxResultSize=2g
    spark.history.kerberos.principal=none
    spark.driver.extraJavaOptions=-XX\:PermSize\=512m -XX\:MaxPermSize\=512m
    spark.eventLog.dir=wasb\:///hdp/spark-events

As you can see, the additional jars are listed in the spark.yarn.secondary.jars parameter.
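Since the cluster ships its own jna-3.5.1 on the node classpath (see the find output above), one thing that might be worth trying — a suggestion I have not verified on HDInsight — is Spark's experimental user-classpath-first settings, which tell the driver and executors to prefer the user-supplied jars over the cluster's copies:

```
spark.driver.userClassPathFirst=true
spark.executor.userClassPathFirst=true
```

These can also be passed as --conf flags on the spark-submit command line; note they are marked experimental in the Spark documentation and can cause other conflicts if the user jars shadow classes Spark itself depends on.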

After the run I can find more JNA and JNR jars on the head node (nothing changed on the worker nodes):

    myclusteruser@hn0-test:/$ find . -name '*jnr*.jar' 2>/dev/null
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-netdb/jars/jnr-netdb-1.1.4.jar
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-posix/jars/jnr-posix-3.0.15.jar
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-x86asm/jars/jnr-x86asm-1.0.2.jar
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-enxio/jars/jnr-enxio-0.9.jar
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-unixsocket/jars/jnr-unixsocket-0.8.jar
    ./home/myclusteruser/.ivy2/cache/com.github.jnr/jnr-constants/jars/jnr-constants-0.9.0.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jffi-1.2.9.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-constants-0.9.0.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-enxio-0.9.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-x86asm-1.0.2.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-netdb-1.1.4.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-posix-3.0.15.jar
    ./home/myclusteruser/.ivy2/jars/com.github.jnr_jnr-unixsocket-0.8.jar
    myclusteruser@hn0-test:/$ find . -name '*jna*.jar' 2>/dev/null
    ./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
    ./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
    ./home/myclusteruser/.ivy2/cache/net.java.dev.jna/jna/jars/jna-4.1.0.jar
    ./home/myclusteruser/.ivy2/jars/net.java.dev.jna_jna-4.1.0.jar
  • I also tried removing the jars included via the --packages option, to make sure there was no conflict with them.

Does anyone have an idea of what I should do to make this run using the JNA and JNR jars I provide?

