Python 2.7 - Distributed TensorFlow example doesn't work on TensorFlow 0.9
I'm trying out this TensorFlow distributed tutorial with the same operating system and Python version on my own computer. I create the first script and run it in a terminal, then open another terminal and run the second script, and I get the following error:
    E0629 10:11:01.979187251 15265 tcp_server_posix.c:284] bind addr=[::]:2222: Address already in use
    E0629 10:11:01.979243221 15265 server_chttp2.c:119] No address added out of total 1 resolved
    Traceback (most recent call last):
      File "worker0.py", line 7, in <module>
        task_index=0)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/server_lib.py", line 142, in __init__
        server_def.SerializeToString(), status)
      File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
        self.gen.next()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors.InternalError: Could not start gRPC server
I get a similar error when trying the official distributed tutorial.
EDIT: I tried it on another machine that has the same packages and got the following error log:
    E0629 11:17:44.500224628 18393 tcp_server_posix.c:284] bind addr=[::]:2222: Address already in use
    E0629 11:17:44.500268362 18393 server_chttp2.c:119] No address added out of total 1 resolved
    Segmentation fault (core dumped)
What is the issue?
The problem is that you are using the same port number (2222) for both workers. Each port number can be used by at most one process on a given host; that's what the error "bind addr=[::]:2222: Address already in use" means.
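You can see the underlying operating-system behavior with a few lines of plain Python, independent of TensorFlow (a minimal sketch; port 2222 is just the one from the error above):

    import socket

    # First socket claims the port; this succeeds.
    s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s1.bind(("localhost", 2222))
    s1.listen(1)

    # Second socket tries to claim the same port on the same host; this fails.
    s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s2.bind(("localhost", 2222))
    except socket.error as e:
        print("Second bind failed: %s" % e)  # [Errno 98] Address already in use
    finally:
        s1.close()
        s2.close()

The gRPC server inside tf.train.Server hits exactly this condition when the second worker tries to bind a port the first worker already owns.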
I'm guessing that you either have "localhost:2222" twice in your cluster specification, or you have specified the same task_index for the two tasks.
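As a sketch of a working setup (the script layout and names here are assumptions, since your original scripts aren't shown), give each worker its own port in the cluster specification and a distinct task_index:

    # worker0.py -- hypothetical sketch; ports and names are assumptions
    import tensorflow as tf

    # One entry per task, each on its own port.
    cluster = tf.train.ClusterSpec(
        {"worker": ["localhost:2222", "localhost:2223"]})

    # This process serves the task at index 0, i.e. localhost:2222.
    server = tf.train.Server(cluster, job_name="worker", task_index=0)
    server.join()

The second script (worker1.py in this sketch) would be identical except for task_index=1, so it binds localhost:2223 instead of colliding on port 2222.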
I hope this helps!