TensorFlow: only 2 weight values change?
I wrote a simple NN in TensorFlow to actuate a real robotic finger.
The problem is: after an hour of training it seems to have learned a little bit which direction to go, but when I look at the weights in TensorBoard, it looks like only 2 values get updated, while all the other values stay around 0 (close to their initialized values). Why?
Here is the code: https://github.com/flobotics/flobotics_tensorflow_controller/blob/master/nodes/listener.py
The loss is decreasing like it should, so it looks good, even if it isn't :)
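For reference, the weight histograms in TensorBoard are presumably produced by summary ops wired up roughly like this. This is a minimal sketch using the TF 0.x summary API that matches the rest of the code; the log directory and the step counter are placeholders, and the real wiring is in the linked listener.py:

    # Sketch of the TensorBoard wiring, assuming the TF 0.x summary API
    # used by the rest of this code. "/tmp/flobotics_logs" is a placeholder.
    tf.histogram_summary("weights", weights)
    tf.histogram_summary("biases", biases)
    tf.scalar_summary("loss", loss)
    merged = tf.merge_all_summaries()
    writer = tf.train.SummaryWriter("/tmp/flobotics_logs")

    # Inside the training loop, after each train step:
    writer.add_summary(result, step)  # "result" is the merged summary; "step" is a hypothetical iteration counter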
EDIT: I tried to minimize the code to this, I hope that's OK:
    import tensorflow as tf
    import numpy as np

    num_states = 200 + 200 + 1024 + 1024  # 200 degrees angle_goal, 200 possible joint degrees, 1024 force values (2 times)
    num_actions = 9  # 3^2 = 9: one stop state, 1 different speed left, 1 different speed right, for 2 servos

    session = tf.Session()
    build_reward_state()

    state = tf.placeholder("float", [None, num_states])
    action = tf.placeholder("float", [None, num_actions])
    target = tf.placeholder("float", [None])

    weights = tf.Variable(tf.truncated_normal([num_states, num_actions], mean=0.1, stddev=0.02, dtype=tf.float32, seed=1), name="weights")
    biases = tf.Variable(tf.zeros([num_actions]), name="biases")

    output = tf.matmul(state, weights) + biases
    output1 = tf.nn.relu(output)

    readout_action = tf.reduce_sum(tf.mul(output1, action), reduction_indices=1)
    loss = tf.reduce_mean(tf.square(target - readout_action))
    train_operation = tf.train.AdamOptimizer(0.1).minimize(loss)

    session.run(tf.initialize_all_variables())

    a = 0  # state-machine flag: a == 0 runs once at the beginning, then it cycles 1 -> 2 -> 3 -> 1
    while 1 == 1:
        if a == 0:
            state_from_env = get_current_state()  # we get an array of shape (1, 2448)
            last_action = nothing  # array of shape (1, 9), e.g. [0,0,1,0,0,0,0,0,0]
            a = 1
        if a == 1:
            # choose a random action or a learned action, array of shape (1, 9),
            # run the action (move the servo motors) and save it in last_action
            a = 2
        if a == 2:
            # stop the servo motors (so the movements are not continuous)
            a = 3
        if a == 3:
            current_state = get_current_state()  # array of shape (1, 2448)
            # compute the reward, a single value
            observations.append((last_state, last_action, reward, current_state))
            if training_time:
                # draw a random sample of observations, then for each sample i:
                agents_reward_per_action = session.run(output, feed_dict={state: current_states})
                agents_expected_reward.append(rewards[i] + future_reward_discount * np.max(agents_reward_per_action[i]))
                _, result = session.run([train_operation, merged],
                                        feed_dict={state: previous_states, action: actions, target: agents_expected_reward})
            # update values
            last_state = current_state
            a = 1
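To double-check what TensorBoard shows, the weight matrix can also be compared directly before and after training. A small sketch (initial_weights, final_weights and the 1e-4 threshold are illustrative names/values, not from the original code):

    # Sketch: snapshot the weights before the while loop and again after training,
    # then count how many entries actually moved.
    initial_weights = session.run(weights)
    # ... run the training loop ...
    final_weights = session.run(weights)
    delta = np.abs(final_weights - initial_weights)
    print("entries changed by more than 1e-4:", np.sum(delta > 1e-4))
    print("state rows touched:", np.unique(np.nonzero(delta > 1e-4)[0]))

If only a few rows show up, that would be consistent with a sparse state vector: the gradient of tf.matmul(state, weights) with respect to weights is zero in every row where the state input is zero, and the tf.mul(output1, action) mask further restricts each update to the chosen action's column.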