TensorFlow: why do only 2 weight values change?


I wrote a simple NN in TensorFlow to actuate a real robotic finger.
The problem is: after an hour of training it seemed to have learned a little bit about which direction to go, but when I look at the weights in TensorBoard, it seems that only 2 values get updated, while all the other values stay around their initialized values. Why?

Here is the code: https://github.com/flobotics/flobotics_tensorflow_controller/blob/master/nodes/listener.py

The loss is decreasing as it should, so it looks good, even if it isn't :)

Here is a link to a picture of the TensorBoard weights.
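For anyone who wants to reproduce that TensorBoard view, here is a minimal sketch of how the weights can be logged, using the old tf.histogram_summary / tf.train.SummaryWriter API that matches the TensorFlow version in my code (the stand-in variables and the /tmp/tf_logs directory are just placeholders):

import tensorflow as tf

# stand-in variables so the sketch runs on its own; in the real code
# these are the "weights" and "biases" of the network shown below
weights = tf.Variable(tf.truncated_normal([2448, 9], mean=0.1, stddev=0.02), name="weights")
biases = tf.Variable(tf.zeros([9]), name="biases")

# histogram summaries make the whole weight matrix visible in TensorBoard
tf.histogram_summary("weights", weights)
tf.histogram_summary("biases", biases)
merged = tf.merge_all_summaries()

session = tf.Session()
writer = tf.train.SummaryWriter("/tmp/tf_logs", session.graph)
session.run(tf.initialize_all_variables())

summary = session.run(merged)
writer.add_summary(summary, 0)  # second argument is the global step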

Edit: I tried to minimize the code down to this, I hope that is OK:

import numpy as np
import tensorflow as tf

num_states = 200 + 200 + 1024 + 1024  # 200 angle-goal degrees, 200 possible joint degrees, 1024 force values, 2 times
num_actions = 9  # 3^2 = 9: one stop-state, 1 different speed left, 1 different speed right, for 2 servos

session = tf.Session()
build_reward_state()

state = tf.placeholder("float", [None, num_states])
action = tf.placeholder("float", [None, num_actions])
target = tf.placeholder("float", [None])

weights = tf.Variable(tf.truncated_normal([num_states, num_actions], mean=0.1, stddev=0.02, dtype=tf.float32, seed=1), name="weights")
biases = tf.Variable(tf.zeros([num_actions]), name="biases")

output = tf.matmul(state, weights) + biases
output1 = tf.nn.relu(output)

readout_action = tf.reduce_sum(tf.mul(output1, action), reduction_indices=1)
loss = tf.reduce_mean(tf.square(target - readout_action))
train_operation = tf.train.AdamOptimizer(0.1).minimize(loss)

session.run(tf.initialize_all_variables())

while True:
    if a == 0:
        # a==0 runs only once at the beginning, a==1,2,3 keep cycling
        state_from_env = get_current_state()  # we get an array of shape (1, 2448)
        last_action = nothing                 # array of (1, 9), e.g. [0,0,1,0,0,0,0,0,0]
        a = 1
    if a == 1:
        # choose a random action or a learned action, array of (1, 9)
        # run the action (move the servo motors)
        # save the action in last_action
        a = 2
    if a == 2:
        # stop the servo motors (so the movements are not continuous)
        a = 3
    if a == 3:
        current_state = get_current_state()  # array of (1, 2448)
        # compute the reward, 1 value
        observations.append((last_state, last_action, reward, current_state))
        if training_time:
            # take a random sample of the observations
            agents_reward_per_action = session.run(output, feed_dict={state: current_states})
            agents_expected_reward.append(rewards[i] + future_reward_discount * np.max(agents_reward_per_action[i]))
            _, result = session.run([train_operation, merged], feed_dict={state: previous_states, action: actions, target: agents_expected_reward})
        # update values
        last_state = current_state
        a = 1
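A little diagnostic sketch that might help here (the active-state indices are made up; the real ones come from the sensors): since the network is a single linear layer, the gradient with respect to a weight row is that row's state entry times the backpropagated error, so every row whose state entry is 0 gets exactly zero gradient. This counts how many of the 2448 rows receive any gradient at all in one step:

import numpy as np
import tensorflow as tf

num_states, num_actions = 2448, 9

state = tf.placeholder("float", [None, num_states])
action = tf.placeholder("float", [None, num_actions])
target = tf.placeholder("float", [None])

weights = tf.Variable(tf.truncated_normal([num_states, num_actions], mean=0.1, stddev=0.02, seed=1))
biases = tf.Variable(tf.zeros([num_actions]))
output1 = tf.nn.relu(tf.matmul(state, weights) + biases)
readout_action = tf.reduce_sum(tf.mul(output1, action), reduction_indices=1)
loss = tf.reduce_mean(tf.square(target - readout_action))

# gradient of the loss with respect to the whole weight matrix
grad_w = tf.gradients(loss, weights)[0]

session = tf.Session()
session.run(tf.initialize_all_variables())

# a fake, mostly-zero state like mine: a few one-hot blocks
s = np.zeros((1, num_states), dtype=np.float32)
s[0, [10, 250, 600, 1500]] = 1.0  # hypothetical active entries
a = np.zeros((1, num_actions), dtype=np.float32)
a[0, 2] = 1.0                     # the action that was taken
g = session.run(grad_w, feed_dict={state: s, action: a, target: [1.0]})

# count the weight rows that received a non-zero gradient
print(np.count_nonzero(np.abs(g).sum(axis=1)))  # prints 4: only the active rows

If the printed count equals the number of non-zero state entries, then only those rows can move in a single update, which would at least explain why most of the matrix stays at its initial values.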

