The StrongFit Equation: State vs. Action

January 26, 2018

At its core, the principles that underlie StrongFit are a system of learning. As the system continues to evolve, so do the structures that shape it, and the concept of reinforcement learning offers a powerful analogy for how humans grow and adapt. Though easily buried under an abstract algorithm, the basic lesson of Q-learning is this: success is found in moving beyond our mistakes, and our intention within the present matters. Over the next few weeks, we’ll be breaking down the Q-learning algorithm and what we can learn about human nature from this advancement in machine learning.


Q-learning works by incrementally updating the expected value of each action in each state, based on the reward received and the maximum value available from the state that follows. For every possible state, every possible action is given a value that measures both the immediate reward and the expected future reward from the new state that results from taking that action.
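To make the update concrete, here is a minimal sketch of a single tabular Q-learning step. The state names, action names, reward numbers, and the `alpha` (learning rate) and `gamma` (discount) settings are all assumptions chosen for illustration, not anything prescribed by StrongFit or by a particular library.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value."""
    # Best value currently expected from the state this action leads to.
    best_next = max(q[(next_state, a)] for a in actions)
    # Immediate reward plus discounted expected future reward.
    target = reward + gamma * best_next
    # Move the old estimate a fraction (alpha) of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical actions and states, made up for this example.
actions = ["deadlift", "rest"]
q = defaultdict(float)  # every (state, action) value starts at 0.0

# The same action, taken from two different states, with assumed rewards.
same_action_good_state = q_update(
    q, "aware_of_tension", "deadlift", 1.0, "stronger", actions)
same_action_poor_state = q_update(
    q, "forcing_position", "deadlift", -1.0, "in_pain", actions)

print(same_action_good_state)  # 0.5
print(same_action_poor_state)  # -0.5
```

The value is stored per state-action pair rather than per action, so the same lift ends up with a different value depending on the state it was performed in.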


The point of emphasis here is that action values are measured relative to the state in which they are performed and the state they lead to, not for the actions themselves. This is the fundamental value of principle over method and knowledge over information - the message that memorization is not learning and that the ‘how’ is more important than the ‘what’. The state in which we perform an action is more important than the action itself, and our intention and perception will always be the greater determinant of the outcome.


For the coach, this means that we have to be able to teach in this way as well. Cueing position prescribes action according to what looks right. Yet even a textbook-perfect deadlift has often resulted in a painful experience for the athlete, leading to much frustration for both athlete and coach. On the other hand, teaching an athlete to become aware of tension may look less than perfect at first, but if the value is placed in A) the immediate reward that the athlete does not have to experience pain in order to perform the action and B) the expectation that the athlete will also be stronger and healthier in this newer state of mobility, then the best action has been found.


The same holds true in mindset practice and goal-setting. The point system that we assign to achieving our goals is only as accurate as the motivations behind them and the impact we expect them to have moving forward. If we pursue our goals out of fear or expectation, it does not matter which action we take. Likewise, if the achievement of our goals does not contribute to the greater picture, then we don’t benefit there, either. The reward of a 10 lb PR loses its value when the cost is an injury or another detriment to our well-being, including the sacrifices made for the achievement.


We can’t cheat our way to success, but we can’t earn it through action for the sake of action, either. The best method of learning is to accept where we’re at and identify what we must do to maximize our potential to move toward the greatest reward.