Debugging an OAC-BDC Connection
Debugging a connection between Oracle Analytics Cloud & Big Data Cloud
My job is pretty varied, but one thing that stays constant is requests to connect OAC to BDC.
Tail OAC logs
Not everyone knows where to find these logs, but just run the command below:
tail -f /u01/data/domain/fmw/user_projects/domains/bi/servers/AdminServer/logs/bi.log
In our case it reveals some cryptic output… but it seems there is a network timeout. That means the issue probably isn't on the OAC side.
####<May 16, 2018 3:28:13,804 PM UTC> <Error> <oracle.bi.tech.services.dataset.StandaloneDatasetService> <aeoacs-bi-1> <bi_server1> <[STANDBY] ExecuteThread: '40' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <0bda8caa-ead8-4bb9-ac04-2173ae17e26c-000faa87> <1526484493804> <[severity-value: 8] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-000000> <Failed to create and convert dataset: javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: Read time out after 300000 millis
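Watching the raw stream means scrolling past a lot of noise, so it can help to filter for the failure signature instead. A minimal sketch (the grep pattern is my own choice; the inlined sample line stands in for the live `bi.log` — in practice, pipe `tail -f` of the path above into the same grep):

```shell
# Surface only timeout/error lines from the OAC admin-server log.
# Sample line stands in for the live log stream here.
sample='<Error> ... java.net.SocketTimeoutException: Read time out after 300000 millis'
printf '%s\n' "$sample" | grep -E 'SocketTimeoutException|Read time out'
```

The 300000 millis (five minutes) is OAC giving up on a response from BDC, which is what points the investigation at the BDC side.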
Tail BDC Logs
tail -f /data/var/log/spark2-thrift/spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-aebigdata-bdcsce-1.out
reveals more cryptic errors… it looks like the JVM ran out of heap space, possibly during a GC?
18/05/16 15:27:42 ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
18/05/16 15:27:43 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: Java heap space
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: Java heap space
18/05/16 15:37:37 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.io.IOException: All datanodes DatanodeInfoWithStorage[100.65.22.122:50010,DS-f8d560be-7098-4529-9b01-7e26b089b17e,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1142)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:904)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:411)
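A quick way to gauge how widespread the memory pressure is: count how often OutOfMemoryError shows up in the Thrift server log. A sketch against an inline excerpt (in practice, point `grep -c` at the log file tailed above):

```shell
# Count OutOfMemoryError occurrences; the inline excerpt stands in for
# the spark2-thrift log file.
excerpt='18/05/16 15:27:43 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: Java heap space
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: Java heap space'
printf '%s\n' "$excerpt" | grep -c 'OutOfMemoryError'
```

As an aside, the ResourceLeakDetector line above tells you how to dig into the netty leak itself: restart the Thrift server with `-Dio.netty.leakDetection.level=advanced` in its JVM options to get the allocation site in the report.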
Tail BDC Spark logs
Logs on BDC are stored in a variety of places…
tail -f /data/var/log/spark2-thrift/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-aebigdata-bdcsce-1.out
reveals the connection AND the subsequent error!
18/05/16 16:18:53 DEBUG Client: IPC Client (2118255842) connection to /100.65.22.122:8010 from blk_1073742054_1231 got value #1156
18/05/16 16:18:53 DEBUG ProtobufRpcEngine: Call: getReplicaVisibleLength took 10ms
18/05/16 16:18:53 DEBUG Client: stopping client from cache: org.apache.hadoop.ipc.Client@6399551e
18/05/16 16:18:53 ERROR FsHistoryProvider: Exception encountered when attempting to load application log hdfs://aebigdata-bdcsce-1.compute-599616642.oraclecloud.internal:8020/spark-history/application_1523503765653_0002.inprogress
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-2031294994-100.65.22.122-1523503732419:blk_1073742054_1231; getBlockSize()=244; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[100.65.22.122:50010,DS-f8d560be-7098-4529-9b01-7e26b089b17e,DISK]]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:428)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:336)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1540)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:300)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:301)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:643)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:460)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$3$$anon$4.run(FsHistoryProvider.scala:352)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/05/16 16:18:53 DEBUG Client: IPC Client (2118255842) connection to /100.65.22.122:8010 from blk_1073742054_1231: closed
18/05/16 16:18:53 DEBUG Client: IPC Client (2118255842) connection to /100.65.22.122:8010 from blk_1073742054_1231: stopped, remaining connections 1
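The `.inprogress` file is the interesting detail here. A quick grep pulls the offending HDFS path out of the FsHistoryProvider error line so it can be inspected directly (sample line inlined; pipe the real log through the same pattern):

```shell
# Extract the HDFS path of the un-loadable Spark event log from the
# FsHistoryProvider error line (sample line stands in for the live log).
err='ERROR FsHistoryProvider: Exception encountered when attempting to load application log hdfs://aebigdata-bdcsce-1.compute-599616642.oraclecloud.internal:8020/spark-history/application_1523503765653_0002.inprogress'
printf '%s\n' "$err" | grep -oE 'hdfs://[^ ]+\.inprogress'
```

"Cannot obtain block length" on an `.inprogress` file usually means the writer died without closing it; the commonly cited fix (Hadoop 2.7+) is `hdfs debug recoverLease -path <that path> -retries 3`.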
Check the last log
vim /data/var/log/spark2-thrift/spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-aebigdata-bdcsce-1.out
More of the GC errors! Okay, corroborating evidence points to the JVM running out of memory. We do have a lot of data, hehe…
18/05/16 19:01:40 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding.decode(StringCoding.java:215)
at java.lang.String.<init>(String.java:463)
at java.lang.String.<init>(String.java:515)
at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:1005)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply_6$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:303)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2378)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2378)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2382)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2382)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2793)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2382)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2358)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
18/05/16 19:01:40 ERROR SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:266)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Thank You StackOverflow
So I asked the interwebs, and they replied "over yonder website doth thy answer live" and pointed me to the link below.
I added spark.sql.thriftServer.incrementalCollect=true to spark2-thrift-sparkconf in Ambari.
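What the property does: by default the Thrift server collects the whole result set into driver memory before handing rows to the client; with incremental collect it streams partition by partition instead, trading some speed for a bounded driver footprint. For reference, the property as added (the Ambari section name is from my setup; yours may differ):

```
# Ambari > Spark2 > Configs > Custom spark2-thrift-sparkconf
spark.sql.thriftServer.incrementalCollect=true
```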
I went ahead and beefed up our spark-env.sh too:
SPARK_EXECUTOR_CORES="2" #Number of cores for the workers (Default: 1)
SPARK_EXECUTOR_MEMORY="2G" #Memory per worker (e.g. 1000M, 2G) (Default: 1G)
SPARK_DRIVER_MEMORY="1080M" #Memory for the driver (e.g. 1000M, 2G) (Default: 512M)
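Before bumping these further, it's worth a back-of-envelope check that the new sizing actually fits the node. A minimal sketch; `node_ram_mb` and `num_executors` are illustrative assumptions, not values from the cluster above:

```shell
# Rough fit check: executors * executor memory + driver memory vs node RAM.
# All figures in MB; node_ram_mb and num_executors are assumed for illustration.
node_ram_mb=15360
executor_mb=2048      # SPARK_EXECUTOR_MEMORY="2G"
driver_mb=1080        # SPARK_DRIVER_MEMORY="1080M"
num_executors=4       # assumption, not from the cluster above
used=$(( num_executors * executor_mb + driver_mb ))
if [ "$used" -le "$node_ram_mb" ]; then
  echo "fits ($used MB of $node_ram_mb MB)"
else
  echo "over by $(( used - node_ram_mb )) MB"
fi
```

If the sum doesn't fit, YARN will refuse to schedule the containers or the OS OOM killer steps in, which looks a lot like the errors above.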