EMR Hadoop-streaming job fails while looking for container_tokens
An attempt to run an EMR streaming job fails with:
2014-10-15 18:36:36,560 ERROR [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Exception.
java.io.IOException: Exception reading /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1413396780703_0003/container_1413396780703_0003_01_000218/container_tokens
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:177)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:744)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:703)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:605)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:98)
Caused by: java.io.FileNotFoundException: /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1413396780703_0003/container_1413396780703_0003_01_000218/container_tokens (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:172)
    ... 4 more
The failure is nondeterministic, but it happens more frequently on large clusters. This is how the cluster is launched:
elastic-mapreduce --create --alive \
  --instance-group master --instance-type m1.large --instance-count 1 \
  --instance-group core --instance-type r3.xlarge --instance-count 200 \
  --hadoop-version "2.4.0" --ami-version "3.2.1" \
  --enable-debugging --json ./emr_config \
  --bootstrap-action 's3://path/to/bootstrap.sh' --bootstrap-name bootstrap
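For reference, here is roughly the same launch expressed with boto 2's EMR API (a sketch only, assuming the old boto library; the region and log URI are placeholders rather than values from the actual setup):

import boto.emr
from boto.emr.instance_group import InstanceGroup
from boto.emr.bootstrap_action import BootstrapAction

# Region is an assumption; substitute the one actually used.
conn = boto.emr.connect_to_region("us-east-1")

jobflow_id = conn.run_jobflow(
    name="streaming cluster",
    ami_version="3.2.1",                   # AMI 3.2.1 bundles Hadoop 2.4.0
    keep_alive=True,                       # --alive
    enable_debugging=True,                 # --enable-debugging
    log_uri="s3://path/to/logs/",          # placeholder; required for debugging
    instance_groups=[
        InstanceGroup(1, "MASTER", "m1.large", "ON_DEMAND", "master"),
        InstanceGroup(200, "CORE", "r3.xlarge", "ON_DEMAND", "core"),
    ],
    bootstrap_actions=[
        BootstrapAction("bootstrap", "s3://path/to/bootstrap.sh", []),
    ],
)
print(jobflow_id)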
The step configuration passed via --json (emr_config) is:
[ { "name": "step name", "actiononfailure": "continue", "hadoopjarstep": { "jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar", "args": [ "-files", "s3://path/to/mapper.py", "-input", "s3://path/to/input/", "-output", "s3://path/to/output/", "-mapper", "mapper.py", "-reducer", "/bin/cat", "-jobconf", "mapreduce.map.java.opts=-xmx22528m", "-jobconf", "mapreduce.map.memory.mb=23424", "-jobconf", "mapreduce.task.timeout=24000000", "-jobconf", "mapreduce.job.maps=200", "-jobconf", "mapreduce.tasktracker.map.tasks.maximum=1", "-jobconf", "mapred.map.tasks.speculative.execution=false" ] } } ]
Does anyone know the source of this problem, or a workaround?
hadoop amazon-web-services hadoop-streaming yarn emr