Allan Luders
Francisco Calderón
Eric Schearer
All the Matlab and C files used for this assignment can be downloaded here. The description of each file is included in each individual part of the writeup.
We varied the gains for swing-ups from the stable equilibrium position to the horizontal position. With a constant Kd = 0.001 we varied Kp until we saw unstable low -frequency oscillation. Then using the maximum stable Kp value we varied Kd until we observed the unstable high-frequency oscillation. As in the assignment we found the robot to go unstable around Kp = 0.5 and Kd = 0.01. Below are plots of position and velocity vs. time using these gains. The plots were created using the Matlab files ./data/data_load(‘log23’) and ./data/plots.m.


Part 2: Get the fastest step response with a PD controller
Here we used the maximum Kd = 0.01 and varied Kp starting at 0.5 and decreasing in increments
of 0.05. The fastest swing-up time
keeping within 0.1 radian bounds is 0.28 seconds using Kp
= 0.35 and Kd = 0.01. The position vs. time plot using these gains
is shown below. The rise time
computation was done in Matlab using ./data/data_load(‘log25’) and ./data/riseplot.m.

For the calibration of the encoder we knew the encoder had exactly 1000 tics per revolution, therefore we assumed the calibration coefficient is just:
.
For the tachometer calibration we used the data from a step response with the pendulum starting in its equilibrium position (looking down). For this case the pendulum accelerated and did several complete turns. Next, we differentiated the encoder data to obtain speed in rads/s. Next the data is shown:

After that we plotted both the encoder-based speed versus the tachometer speed in an X-Y plot, next the figure is shown:

After looking at the plot we decided that the relation was very linear therefore we fitted a line using least squares in Matlab. The working principle of this calibration method is that the encoder-based speed is bad at low speeds, but improves at higher speeds. Therefore by fitting a curve the low speed behavior can be estimated from what happens at high speeds assuming it is linear. Also the fitting neutralizes the high frequency noise generated by differentiating the encoder position data.
The result of the fitting is of the form: y=Ax+b, were the A term is the calibration term and b is the offset which was neglected (was very small).
The calibration coefficient found for the tachometer is:
![]()
The calibration was done in Matlab using the script: calibration.m
We first built an analytical model of the robot taking into account the stiction (static friction when the robot is not moving), coulomb friction (constant amount resisting the movement) and viscous friction (proportional to velocity). The resulting system is the following:
, where:

Note: The coulomb friction only acts when the system is moving, while the stiction only acts when the system is still.
Stiction was calculated independently of the other terms by applying a ramped voltage to the pendulum starting in its upright position and continuing to increase the voltage until the pendulum fell from the upright position. We conducted the same experiment three times and averaged the voltage required to achieve an arbitrary (something more than zero) threshold velocity. Below are velocity and voltage vs. time plots for one of the three tests. The average voltage required to overcome stiction in these tests, tstiction = 0.1447 V. These results were found using ./data_load(‘ramp_volt_02_up.txt’) and ./data/static.m.


We made 36 experiments on the real system (plus the data obtained for the controller used in previous sections) by applying several open-loop step voltage inputs and logging the response of the system. We did tests for several voltage magnitudes, for positive and negative inputs, and for starting points in the unstable and stable equilibrium. We then used Matlab to obtain the real parameters of the system, by fitting the model to the real data and trying it to appear equal according to an arbitrary cost function.
Given that we are only asked to find the moment of inertia and friction model, and in order to simplify the problem of finding the best parameters we grouped the mgr term as a single term, and we never explicitly calculate its individual components.
In
first place, we calculated the magnitude of the mgr term by using the
information obtained when controlling the pendulum with the PID controller in
the horizontal position. The torques (in Volt units) involved in maintaining
the horizontal position are only
and stiction, which together have a mean magnitude of ~0.7V. As
the control was performed with a starting point in the stable equilibrium, we
assume that the control applied just the minimum voltage to get into the
equilibrium (therefore, stiction is applying the
maximum possible torque against gravity, in favor of the motor). Therefore we
subtract the value of stiction to ~0.7 in order to
determine mgr.
Once we got this value we made a one-step simulator to obtain the acceleration, velocity and position, given position and velocities in the last cycle, input torque, time step, moment of inertia and friction coefficients.
The acceleration is obtained directly from the formula exposed at the beginning of this section, while velocities and positions are obtained by using a simple integrator:

In order to get the unknown parameters, we used the fminsearch and fminunc functions in Matlab. We defined as score the function equivalent to the sum squared of the differences of positions and velocities between the model and the obtained data, for all the 36 experiments. This is:
![]()
Not always fminsearch and fminunc converge to the exact same solution, therefore in order to determine the “real” parameters we use both results for all 36 experiments, and calculate an inversely-weighted average of the parameters based on the obtained value of the cost function. First, we arbitrarily decide to eliminate all parameters where the cost is greater than 1, as we consider that the result didn’t converge. We also eliminate all parameters which have a negative sign as this is a physically impossible solution. Finally, the value of each parameter is calculated in the following way:

We obtained the following results:
I = 0.0156 [V*s2/rad](moment of inertia respect to the rotation axis)
b = 0.0057 [V*s/rad] (viscous friction coefficient)
Tc =0.1647 [V] (Coulomb)
Below, we graphically show the obtained model by fmisearch for two different experiments (step functions) to show how accurate our model looks:
Rise from stable equilibrium, for an input of -1V:


Fall from unstable equilibrium, for an input of 0.75V:


We implemented the described algorithm in Matlab using the following files:
myopt.m: calls fminsearch and fminunc to get the optimal parameters for 1 data set.
score.m: used by fminsearch and fminunc to determine the cost of the function. This function calls solver.m (one-step simulator) multiple times in order to get the total cost difference between the input parameters and the data.
solver.m: is the actual simulator, executing a discrete step and giving as an output the velocity and position of the system.
commander.m: is the top-level program. For each file in the “data2” directory executes the myopt.m, stores the obtained parameters and obtained costs and calculates the weighted average output.
Since we already had a good non-linear model of the system from part 2, we decided to search for an optimal swing-up trajectory using Dynamic Programming. For this purpose we used the code given for homework 2.
First we modified the dynamics.c file in order to accept the new model. For doing this we replaced the M*G*R*L/2 gravitational constant with the value we obtained from part two. We did the same for the inertia term. We also added the effects of the Coulomb friction and stiction effect to the model.
Once the model was ready, we used the program “linearize-dynamics” provided with the Dynamic Programming files to obtain a linear model around the position of 180°. The A and B matrices where then used with the Matlab function “dlqr” to obtain the K gains and S costs.
The linear model found was:
A =
1.0018 0.0100
0.3561 0.9963
B =
0.0032
0.6410
The output from the dlqr function is (more weight was
given to the R matrix in order to minimize the action usage):
K =
1.1251
0.3272
S =
0.2422
0.0187
0.0187
0.0062
After that we added these values to the dp-subroutines.c file to initialize the cost function and to operate the LQR controller once the solution is near the desired states. Also the actions where limited to [-1, 1], which are the possible voltage values that can be applied.
We started doing dynamic programming with a state penalty of 0.1, and the results where the following:

The trajectory found by DP optimized the score function used which penalizes the action usage and then time to get to the desired position. The total cost was of 0.37 with a state_penalty of 0.1. Since for this homework there is no penalty for the time to get to the desired state we lowered the state_penalty to a level in which the pendulum had the enough incentives to get to desired position before the 10 seconds.
The new lowered state_penalty found was of 0.003, and had the following shape:

Now the DP score was lowered to just 0.1, a much better result. The following plot shows the cost function:

The torque (voltage) policy is the following:

Files: All the dynamic programming files are in the folder dynamic_programming folder.
We planned an optimal trajectory for the system using dynamic programming, as described in part 4. We saved the data in a text file (traj_2_osc file), and implemented a PD controller on the real system to follow the desired trajectory. The PD controller has a time-dependant setpoint which varies with time according to the reference text file. We empirically tuned the controller in order for it to follow the trajectory “appropriately”, which happens when we consistently obtain low scores for the given gains. We defined the score as the sum of the applied inputs (voltages) over the complete run (10 seconds), weighted according to the total time the input was applied. Therefore, the score is defined in the following way:
, with v[t] in Volts
The least score we were able to obtain consistently was 0.1989, for proportional and derivative gains equivalent to Kp=4.5 and Kd=0.019, respectively.
In order to compare our results, we made our best efforts to tune a standard PD controller with a fixed setpoint to the vertical position to obtain the least possible score. The best score we got was 0.579 for proportional and derivative gains equivalent to Kp=0.03 and Kd= 0.005, respectively.
We were able to prove the consistent better performance of our controller, which uses a 34% of the energy required by the PD controller. Naturally, these numbers are empirical results, and in both cases tuning may eventually improve results on both systems. Anyway, although the results are not absolute magnitudes, our planned trajectory provided always significant better scores than the PD.
In order to show how well we tuned the system, we display below how well the real system follows the reference trajectory generated using dynamic programming:

We clearly see that the resulting system follows the reference relatively closely, meaning that we did a descent job tuning the PD, although it might be still be possible to make improvements. We must emphasize that we tuned the PD controller in order to decrease the empirical and not in obtaining a close fit between the reference and real curves. Anyway, it is not a surprise that both coincide.
The following figures show the difference of applied voltages and obtained velocities of the DP model respect to the real system:


The obtained results for velocities and applied voltages confirm the correctness of our model. Although we did not make any control of the angular velocity nor we did apply the expected torques in an open loop, the shape of the obtained curves and the expected values by dynamic programming are pretty much the same. In the case of velocities, the obtained results are almost identical; in the case of the required voltages to make the desired output, they are similar in shape but show differences in the peak magnitudes. The differences are likely caused because there are factors that we didn’t include in the model, e.g. the time response of the power amplifier. There probably is a delay of the power amplifier after it is provided with the commanded voltage, factor that we didn’t consider as we only used step functions to obtain the model parameters.
Programming Code
The described control is implemented in the sample.cc file, which uses the code of Matthew McNaughton as a base.