Rotary Inverted Pendulum + RL Demo
Objective
The objectives of this laboratory experiment are as follows:
Obtain the linear state-space representation of the rotary pendulum system.
Design a controller that balances the pendulum in its upright position using Pole Placement.
Train a controller that balances the pendulum using Reinforcement Learning.
Simulate the closed-loop system to ensure the given specifications are met.
Implement the balance controller on the Quanser Rotary Pendulum system and evaluate its performance.
Equipment
QUBE-Servo 3 Rotary Inverted Pendulum Model
A picture of the QUBE-Servo 3 with the rotary pendulum module is shown in Fig. 1. The components numbered in Fig. 1 are listed in Table 1, and the numerical values of the main system parameters are given in Table 2.

Table 1: QUBE-Servo 3 components (Figure 1)
1. Rotary servo
2. Rotary arm housing
3. Pendulum encoder
4. Rotary arm
5. Pendulum link
Table 2: QUBE-Servo 3 main parameters
Mass of pendulum, $m_p$: 0.024 kg
Total length of pendulum, $L_p$: 0.129 m
Pendulum moment of inertia about center of mass, $J_p$: $3.3282 \times 10^{-5}$ kg·m²
Pendulum viscous damping coefficient as seen at the pivot axis, $D_p$: $5 \times 10^{-5}$ N·m·s/rad
Mass of rotary arm, $m_r$: 0.095 kg
Rotary arm length from pivot to tip, $L_r$: 0.085 m
Rotary arm moment of inertia about its center of mass, $J_r$: $5.7198 \times 10^{-5}$ kg·m²
Rotary arm viscous damping coefficient as seen at the pivot axis, $D_r$: 0.001 N·m·s/rad
Motor armature resistance, $R_m$: 8.4 Ω
Current-torque constant, $k_t$: 0.0422 N·m/A
Back-emf constant, $k_m$: 0.0422 V·s/rad
A. Modeling
Model Convention
A simplified form of the inverted pendulum system used in the development of the mathematical model is shown in Figure 2. The rotary arm pivot is attached to the QUBE-Servo 3 system. The arm has a length of $L_r$, a moment of inertia of $J_r$, and its angle, $\theta$, increases positively when it rotates counterclockwise (CCW). The servo (and thus the arm) should turn in the CCW direction when the control voltage is positive, i.e., $V_m > 0$.
The pendulum link is connected to the end of the rotary arm. It has a total length of $L_p$ and its center of mass is at $L_p/2$. The moment of inertia about its center of mass is $J_p$. The inverted pendulum angle, $\alpha$, is zero when it is perfectly upright in the vertical position and increases positively when rotated CCW.

Nonlinear Equations of Motion
Instead of using classical (Newtonian) mechanics, the Lagrange method is used to find the equations of motion of the system. This systematic method is often used for more complicated systems such as robot manipulators with multiple joints.
Specifically, the equations that describe the motions of the rotary arm and the pendulum with respect to the servo motor voltage, i.e., the dynamics, are obtained using the Euler-Lagrange equation:
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}\right) - \frac{\partial L}{\partial q_i} = Q_i$$
The variables $q_i$ are called generalized coordinates, the variables $Q_i$ are called generalized forces, and $L$ is the Lagrangian (the difference between the kinetic and potential energies of the system). For this system, let
$$q(t) = \begin{bmatrix} \theta(t) & \alpha(t) \end{bmatrix}^T$$
where, as shown in Figure 2, $\theta$ is the rotary arm angle and $\alpha$ is the inverted pendulum angle. The corresponding angular rates are $\dot\theta(t)$ and $\dot\alpha(t)$.
With the generalized coordinates defined, the Euler-Lagrange equations for the rotary pendulum system are
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta}\right) - \frac{\partial L}{\partial \theta} = Q_1, \qquad \frac{d}{dt}\left(\frac{\partial L}{\partial \dot\alpha}\right) - \frac{\partial L}{\partial \alpha} = Q_2$$
The Lagrangian of the system is described by
$$L = T - V$$
where $T$ is the total kinetic energy of the system and $V$ is the total potential energy of the system. Thus the Lagrangian is the difference between a system's kinetic and potential energies.
The generalized forces are used to describe the nonconservative forces (e.g., friction) applied to a system with respect to the generalized coordinates. In this case, the generalized force acting on the rotary arm is
$$Q_1 = \tau - D_r\dot\theta$$
and that acting on the pendulum is
$$Q_2 = -D_p\dot\alpha$$
Our control variable is the input servo motor voltage, $V_m$. Opposing the applied torque $\tau$ is the viscous friction torque, or viscous damping, corresponding to the term $D_r\dot\theta$. Since the pendulum is not actuated, the only force acting on the link is the damping; the viscous damping coefficient of the pendulum is denoted by $D_p$.
Once expressions for the kinetic and potential energies are obtained and the Lagrangian is found, the task is to compute various derivatives to obtain the equations of motion (EOMs). Going through this process yields the nonlinear equations of motion of the Rotary Pendulum, equations (7) and (8).
The torque $\tau$ applied at the base of the rotary arm (i.e., at the load gear) is generated by the servo motor, as described by equation (9):
$$\tau = \frac{k_t\left(V_m - k_m\dot\theta\right)}{R_m}$$
Refer to Table A in the Appendix for the Rotary Servo parameters.
Linearization
Linearization of a nonlinear function about a selected point is obtained by retaining terms up to first order in the Taylor series expansion of the function about that point. For example, the linearization of a two-variable nonlinear function $f(z_1, z_2)$ about the point $(z_1, z_2) = (a, b)$ can be written as
$$f_{lin}(z_1, z_2) = f(a,b) + \left.\frac{\partial f}{\partial z_1}\right|_{(a,b)}(z_1 - a) + \left.\frac{\partial f}{\partial z_2}\right|_{(a,b)}(z_2 - b)$$
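As a quick worked example of this expansion, the MATLAB Symbolic Math Toolbox can carry out the computation; the function and point below are chosen purely for illustration:

% Linearize f(z1,z2) = sin(z1)*z2^2 about (a,b) = (0,1) by first-order Taylor expansion
syms z1 z2
f    = sin(z1)*z2^2;              % example nonlinear function
z0   = [0, 1];                    % linearization point (a, b)
flin = subs(f, [z1 z2], z0) ...
     + subs(jacobian(f, [z1 z2]), [z1 z2], z0) * ([z1; z2] - z0.')
% Returns flin = z1, the linear approximation of f near (0, 1)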
Linearization of Inverted Pendulum Equations
The nonlinear equations of the inverted pendulum system obtained as equations (7) and (8) are linearized about the equilibrium point with the pendulum in the upright position, i.e.,
$$\theta = 0, \quad \alpha = 0, \quad \dot\theta = 0, \quad \dot\alpha = 0, \quad \ddot\theta = 0, \quad \ddot\alpha = 0$$
Linearization of Equation (7) about the above equilibrium state gives equation (10), where, from equation (9), the servo motor torque coefficient is $k_t/R_m$ and the back-emf coefficient is $k_t k_m/R_m$.
Likewise, linearization of equation (8) about the equilibrium point with the pendulum in the upright position gives equation (11), where $g$ is the acceleration due to gravity. Note that the negative sign of the gravity-torque term in equation (11) indicates negative stiffness with the pendulum in the vertically upright position.
Equations (10) and (11) can be arranged in the matrix form
$$M\ddot q + D\dot q + Kq = F u$$
where $q = [\theta \;\; \alpha]^T$, and the mass matrix $M$, damping matrix $D$, and stiffness matrix $K$ follow from the linearized equations (10) and (11).
Defining the state vector $x$, output vector $y$, and control input $u$ as
$$x = [\theta \;\; \alpha \;\; \dot\theta \;\; \dot\alpha]^T, \qquad y = [\theta \;\; \alpha]^T, \qquad u = V_m$$
Equation (12) can be rewritten in state-space form as
$$\dot x = Ax + Bu, \qquad y = Cx + Du$$
where the system state matrix $A$, control matrix $B$, output matrix $C$, and feedthrough matrix $D$ become
$$A = \begin{bmatrix} 0_{2\times2} & I_{2\times2} \\ -M^{-1}K & -M^{-1}D \end{bmatrix}, \quad B = \begin{bmatrix} 0_{2\times1} \\ M^{-1}F \end{bmatrix}, \quad C = \begin{bmatrix} I_{2\times2} & 0_{2\times2} \end{bmatrix}, \quad D = 0_{2\times1}$$
In the equations above, $0_{2\times2}$ is a $2\times2$ matrix of zeros, $0_{2\times1}$ is a $2\times1$ matrix of zeros, and $I_{2\times2}$ is the $2\times2$ identity matrix. Using the parameters of the system listed in Table 2, the numerical values of the linear model matrices $A$ and $B$ can be computed.
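The following sketch shows how these matrices might be assembled numerically, assuming the standard linearized mass, damping, and stiffness matrices for this system; the variable names are ours, and the lab's rotpen_student.mlx script remains the authoritative version:

% Sketch: build the linear state-space model, x = [theta alpha theta_dot alpha_dot]'
mp = 0.024;  Lp = 0.129;  Jp = 3.3282e-5;  Dp = 5e-5;   % pendulum (Table 2)
Lr = 0.085;  Jr = 5.7198e-5;  Dr = 1e-3;                % rotary arm (Table 2)
Rm = 8.4;  kt = 0.0422;  km = 0.0422;  g = 9.81;        % actuator, gravity

% Assumed linearized mass, damping, and stiffness matrices (Equation 12 form)
M  = [mp*Lr^2 + Jr,   -0.5*mp*Lp*Lr;
      -0.5*mp*Lp*Lr,  Jp + 0.25*mp*Lp^2];
Dm = diag([Dr, Dp]);
Ks = [0, 0; 0, -0.5*mp*Lp*g];          % negative stiffness at the upright position
F  = [1; 0];                           % torque enters the arm equation only

A = [zeros(2), eye(2); -(M\Ks), -(M\Dm)];
B = [zeros(2,1); M\F];                 % input = torque tau
A(:,3) = A(:,3) - B*kt*km/Rm;          % back-emf: tau = kt*(Vm - km*theta_dot)/Rm
B = B*kt/Rm;                           % input is now the voltage Vm
C = [eye(2), zeros(2)];  D = zeros(2,1);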
Analysis: Modeling
Download the Lab4_Indv.zip file to your personal device and extract the folder contents.
Open the rotpen_student.mlx live script and run the Modeling section. It will automatically load the parameters required for the state-space representation and subsequently generate the A, B, C, and D matrices required for the upcoming analysis. Refer to Table 2 for additional information regarding the parameters. Note: Representative C and D matrices have already been included. The actuator dynamics have been added to convert your state-space matrices to be in terms of voltage: recall that the input of the state-space model is the torque acting at the servo load gear, but we control the servo input voltage rather than the torque directly. The script uses the voltage-torque relationship given in Equation 9 to transform torque to voltage.
Note down your state-space matrices for your report. Note: You may want to cross-check the state-space matrix with TAs before proceeding to balance control.
Find the open-loop poles of the system. Hint: Use eig(A).
B. Balance Control
Specification
The control design and time-response requirements are:
Specification 1: Damping ratio: $0.6 < \zeta < 0.8$
Specification 2: Natural frequency: 3.5 rad/s $< \omega_n <$ 4.5 rad/s
Specification 3: Maximum pendulum angle deflection: $|\alpha| < 15$ deg
Specification 4: Maximum control effort/voltage: $|V_m| < 10$ V
The necessary closed-loop poles are found from Specifications 1 and 2. The pendulum deflection and control effort requirements (i.e., Specifications 3 and 4) are to be satisfied when the rotary arm is tracking a ±20 degree angle square wave.
Stability
The stability of a system can be determined from its poles ([2]):
Stable systems have poles only in the left-half of the complex plane.
Unstable systems have at least one pole in the right-half of the complex plane and/or poles of multiplicity greater than 1 on the imaginary axis.
Marginally stable systems have at least one non-repeated pole on the imaginary axis and all other poles in the left-half of the complex plane.
The poles are the roots of the system's characteristic equation. From the state-space representation, the characteristic equation of the system can be found using
$$\det(sI - A) = 0$$
where $\det(\cdot)$ denotes the determinant of a matrix, $s$ is the Laplace operator, and $I$ is the identity matrix. The roots of this equation are the eigenvalues of the system matrix $A$.
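For example, the open-loop poles can be computed in MATLAB either as eigenvalues of A or as roots of the characteristic polynomial:

ol_poles = eig(A)        % open-loop poles = eigenvalues of A
c = poly(A);             % coefficients of det(sI - A), highest power first
roots(c)                 % same poles, from the characteristic equation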
Controllability
If the control input $u$ of a system can take each state variable $x_i$, where $i = 1, \ldots, n$, from an initial state to a final state in finite time, then the system is controllable; otherwise, it is uncontrollable ([1]).
Rank Test
The system is controllable if the rank of its controllability matrix
$$\mathcal{C} = \begin{bmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{bmatrix}$$
equals the number of states in the system, i.e.,
$$\mathrm{rank}(\mathcal{C}) = n$$
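In MATLAB, the rank test is a one-liner:

Co = ctrb(A, B);                          % [B, A*B, A^2*B, A^3*B]
controllable = (rank(Co) == size(A,1))    % true if the system is controllable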
Companion Matrix
For a controllable system with an $n \times n$ system matrix $A$ and an $n \times 1$ control matrix $B$, the companion matrices of $A$ and $B$ are
$$\tilde A = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-1} \end{bmatrix}$$
and
$$\tilde B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$
where $a_0, a_1, \ldots, a_{n-1}$ are the coefficients of the characteristic equation of the system matrix $A$ written as
$$s^n + a_{n-1}s^{n-1} + \cdots + a_1 s + a_0 = 0$$
Now define the transformation matrix $T = \mathcal{C}\,W$, where $\mathcal{C}$ is the controllability matrix defined in Equation 16 and
$$W = \begin{bmatrix} a_{n-1} & a_{n-2} & \cdots & a_1 & 1 \\ a_{n-2} & a_{n-3} & \cdots & 1 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_1 & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix}$$
Then
$$\tilde A = T^{-1} A\, T$$
and
$$\tilde B = T^{-1} B$$
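A sketch of this transformation in MATLAB, assuming the bottom-row companion convention shown above (variable names are ours):

a = poly(A);                % [1 a3 a2 a1 a0], characteristic-polynomial coefficients
n = size(A, 1);
W = hankel([a(2:n) 1]);     % rows of shifted coefficients, as defined above
T = ctrb(A, B) * W;         % transformation matrix T = C*W
A_c = T \ A * T             % companion form of A (bottom row = -[a0 a1 a2 a3])
B_c = T \ B                 % companion form of B ([0; 0; 0; 1])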
Pole Placement Theory
If $(A, B)$ is controllable, then pole placement can be used to design the controller. Given the control law $u = -Kx$, the state-space model of equation (13) becomes
$$\dot x = Ax + B(-Kx) = (A - BK)x$$
We can generalize the procedure to design a gain $K$ for a controllable system as follows:
Step 1: Find the companion matrices $\tilde A$ and $\tilde B$. Compute the transformation matrix $T = \mathcal{C}W$.
Step 2: Compute $\tilde K$ to assign the poles of $\tilde A - \tilde B\tilde K$ to the desired locations.
Step 3: Find $K = \tilde K T^{-1}$ to get the feedback gain for the original system $(A, B)$.
Remark 1: It is important to do the conversion; remember that $(A, B)$ represents the actual system, while the companion matrices $\tilde A$ and $\tilde B$ do not.
Remark 2: The entire control design procedure using the pole placement method can be done in MATLAB using the function 'place' or 'acker'. For a selected set of desired closed-loop poles DP, the full state feedback gain matrix is obtained from
>> K = acker(A,B,DP);
Desired Poles
The rotary inverted pendulum system has four poles. As depicted in Figure 3, poles $p_1$ and $p_2$ are the complex conjugate dominant poles, chosen to satisfy the natural frequency, $\omega_n$, and the damping ratio, $\zeta$, given in the specifications. Let the conjugate poles be
$$p_1 = -\sigma + j\omega_d$$
and
$$p_2 = -\sigma - j\omega_d$$
where $\sigma = \zeta\omega_n$ and $\omega_d = \omega_n\sqrt{1 - \zeta^2}$ is the damped natural frequency. The remaining closed-loop poles, $p_3$ and $p_4$, are placed along the real axis to the left of the dominant poles, as shown in Figure 3.
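For illustration, the desired pole set can be computed directly from the chosen damping ratio and natural frequency; the values below are examples, and $p_3 = -30$ and $p_4 = -40$ are the values assigned later in Section B.2:

zeta = 0.7;  wn = 4;                          % example values within the specifications
sigma = zeta*wn;                              % real part of the dominant pair
wd = wn*sqrt(1 - zeta^2);                     % damped natural frequency
p1 = -sigma + 1i*wd;  p2 = -sigma - 1i*wd;    % dominant complex-conjugate poles
DP = [p1, p2, -30, -40];                      % remaining poles on the negative real axis
K = place(A, B, DP)                           % or acker(A, B, DP)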

Simulation Model with Feedback
The feedback control loop that balances the rotary pendulum is illustrated in Figure 4. The reference state is defined as
$$x_d = [\theta_d \;\; 0 \;\; 0 \;\; 0]^T$$
where $\theta_d$ is the desired rotary arm angle. The controller is
$$u = K(x_d - x)$$
Note that if $x_d = 0$, then $u = -Kx$, which is the control used in the pole-placement algorithm.

When running this on the actual system, the pendulum begins in the hanging, downward position. We only want the balance control to be enabled when the pendulum is brought up around its upright vertical position. The controller is therefore
$$u = \begin{cases} K(x_d - x) & \text{if } |\alpha| < \epsilon \\ 0 & \text{otherwise} \end{cases}$$
where $\epsilon$ is the angle about which the controller should engage and $\alpha$ is the pendulum angle. For example, if $\epsilon = 10$ degrees, then the control will begin when the pendulum is within ±10 degrees of its upright position, i.e., when $|\alpha| < 10$ degrees.
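A sketch of this switching logic as a MATLAB function; the names and the 10-degree window are illustrative, and on hardware the same logic is built from Simulink blocks (Section C.1):

% Sketch of Equation 30's switching logic, x = [theta alpha theta_dot alpha_dot]'
function u = balance_control(x, xd, K)
    epsilon = deg2rad(10);        % engage window around upright
    if abs(x(2)) < epsilon        % x(2) = pendulum angle alpha
        u = K*(xd - x);           % balance control near the upright position
    else
        u = 0;                    % output 0 V until the pendulum is raised by hand
    end
end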
Reinforcement Learning
Fundamentals
Although we have found a sufficient model for the Inverted Pendulum system, there are often systems where the plant dynamics are difficult or impossible to model. In this case, we look to model-free control.
Reinforcement learning algorithms, which include model-free algorithms, seek to train an agent. The agent receives observations of the plant system in addition to feedback from a reward function, and outputs a corresponding action according to its policy. The agent may be trained on the physical system in a series of discrete training episodes. Following each episode, the agent updates its policy according to the specific reinforcement learning algorithm in use. A summary of the reinforcement learning training loop is shown in Fig. 5.

Following training, the agent may be deployed on the system to ideally achieve the desired behavior. The particular reinforcement learning algorithm used in this lab is the deep deterministic policy gradient (DDPG) algorithm, a model-free actor-critic algorithm. For more information about the DDPG algorithm in particular, please visit the corresponding Mathworks article.
Reinforcement Learning for the Inverted Pendulum System
To implement the DDPG algorithm, we need a reward function that evaluates the observed performance of the system. A good reward function promotes behavior we desire in the system and penalizes behavior we want to eliminate. In this case, we seek to command the system to track the desired rotary arm angle while keeping the pendulum balanced upright. Additionally, we especially want to avoid the servo angle exceeding the physical limit of the system (with a margin of error), the pendulum angle exceeding the balance control limit, and the input voltage exceeding its limit. Thus, we use the following quadratic reward function
where $w_i$ are weights. This function covers all of the desired behavior we seek from the system. By default, the weights are chosen to reflect the relative importance of each state.
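A sketch of one plausible per-step reward in this spirit; the weights, limits, and penalty below are illustrative placeholders, not the lab's default values:

% Hypothetical per-step reward for the agent (all numbers are placeholders)
function r = rl_reward(theta, theta_d, alpha, Vm)
    w = [1, 5, 0.05];             % weights on arm error, pendulum angle, voltage
    theta_lim = deg2rad(90);      % placeholder servo-angle limit
    alpha_lim = deg2rad(20);      % placeholder balance-control limit
    r = -( w(1)*(theta - theta_d)^2 + w(2)*alpha^2 + w(3)*Vm^2 );
    if abs(theta) > theta_lim || abs(alpha) > alpha_lim
        r = r - 100;              % extra penalty for leaving the safe region
    end
end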
For a model-free application, an engineer would typically train the reinforcement learning model on the physical plant or a digital twin of it. However, doing this during a 2.5-hour lab session is impractical, as it would require repeatedly raising the pendulum for 1000 training episodes. Consequently, we train our reinforcement learning agent in simulation on the state-space model we developed in part A.
B.1 Experiment: Training the Reinforcement Learning Controller
Download the Lab4_Group.zip file to the lab PC and extract the folder contents.
Open the rotpen_group.mlx live script and run the Initialization section. This section generates all necessary parameters for the reinforcement learning training. Ensure that the drop-down menu is configured to 'train new agent.'
Run the Observation and Action Signals section. This section creates the learning environment for the reinforcement learning agent, configures the observation signal read by the agent, and configures the action signal generated by the agent.
Run the Create DDPG Agent section. This section creates and configures the critic and actor neural networks, and compiles these networks into a single agent.
Run the Train/Load Agent section. This section begins the training session. By default, the training session is configured to contain 1000 episodes with a random starting pendulum angle between -20 and 20 degrees.
The s_rotpen_train_rl.slx Simulink diagram should automatically open, in addition to a Reinforcement Learning Training Monitor window. Open the 'alpha' and 'theta' scopes and configure your monitor so you can clearly see all three windows, as in Fig. 6. You should now be able to watch as the reinforcement learning agent trains on your state space model.
Wait for the training to complete. This should take 20-30 minutes. Ensure that the main MATLAB window is open and visible to avoid slowdown due to CPU allocation. Proceed to the next section to continue your pole placement control design and simulation while the reinforcement learning training continues in the background. On training completion, your trained agent will be saved to the current directory as agent_student.mat.

B.2 Experiment: Designing the Pole Placement Controller
Select $\omega_n$ and $\zeta$ from the table below. Be sure to choose a different set of values from your groupmates.
Set 1: $\omega_n$ = 3.75 rad/s, $\zeta$ = 0.65
Set 2: $\omega_n$ = 4 rad/s, $\zeta$ = 0.65
Set 3: $\omega_n$ = 4.25 rad/s, $\zeta$ = 0.65
Set 4: $\omega_n$ = 3.75 rad/s, $\zeta$ = 0.7
Set 5: $\omega_n$ = 4.25 rad/s, $\zeta$ = 0.7
Set 6: $\omega_n$ = 3.75 rad/s, $\zeta$ = 0.75
Set 7: $\omega_n$ = 4 rad/s, $\zeta$ = 0.75
Set 8: $\omega_n$ = 4.25 rad/s, $\zeta$ = 0.75
In the rotpen_student.mlx live script open on your personal device, go to the Pole Placement Balance Control section. Enter the chosen zeta and omega_n values.
Determine the locations of the two dominant poles $p_1$ and $p_2$ based on the specifications and enter their values in the MATLAB live script. Ensure that the other poles are placed at p3 = -30 and p4 = -40. Hint: Use equation 27.
Find the gain K using the predefined Compensator Design MATLAB command K = acker(A,B,DP), which is based on pole-placement design. Note: DP is a row vector of the desired poles found in Step 3.
As a sanity check, if you use a damping ratio of 0.7 and a natural frequency of 4 rad/s, you should get approximately K = [-12 63 -5.5 7].
B.3 Experiment: Simulating the Pole Placement Controller
The s_rotpen_bal.slx SIMULINK diagram shown in Fig. 7 is used to simulate the closed-loop response of the Rotary Pendulum using the state-feedback control described in Balance Control with the control gain K found above. The Signal Generator block generates a 0.1 Hz square wave (with an amplitude of 1). The Amplitude (deg) gain block is used to change the desired rotary arm position. The state-feedback gain K is set in the Control Gain gain block and is read from the MATLAB workspace. The SIMULINK State-Space block reads the A, B, C, and D state-space matrices that are loaded in the MATLAB workspace. The Find State X block contains high-pass filters to find the angular rates of the rotary arm and pendulum.

Ensure you have run the Pole Placement Balance Control section of the rotpen_student.mlx live script, and ensure the gain K you found is loaded in the workspace (type K in the Command Window to check).
Open and run the s_rotpen_bal.slx Simulink model for 10 seconds. The responses in the scopes shown in Fig. 8 were generated using an arbitrary feedback control gain. Note: When the simulation stops, the last 10 seconds of data are automatically saved in the MATLAB workspace to the variables data_theta, data_alpha, and data_Vm.

Figure 8: Balance control simulation

Save the data corresponding to the simulated response of the rotary arm, pendulum, and motor input voltage obtained using your gain K. Note: The time is stored in the data_theta(:,1) vector; the desired and measured rotary arm angles are saved in the data_theta(:,2) and data_theta(:,3) arrays. Similarly, the pendulum angle is stored in the data_alpha(:,2) vector, and the control input is in the data_Vm(:,2) array.
Measure the pendulum deflection and voltage used. Are the given specifications satisfied?
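One way to check Specifications 3 and 4 from the saved data, assuming the angles are logged in radians (verify against your scope settings):

max_alpha = max(abs(data_alpha(:,2)))*180/pi;   % peak pendulum deflection, deg
max_Vm    = max(abs(data_Vm(:,2)));             % peak control voltage, V
fprintf('max |alpha| = %.2f deg (spec: < 15 deg)\n', max_alpha);
fprintf('max |Vm|    = %.2f V   (spec: < 10 V)\n',  max_Vm);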
C. Controller Implementation
C.1 Experiment: Implementing the Pole Placement Controller
In this section, the state-feedback control that was designed and simulated in the previous sections is run on the actual Rotary Pendulum device.
Experiment Setup
The q_rotpen_bal SIMULINK diagram shown in Fig. 9 is used to run the state-feedback control on the Quanser Rotary Pendulum system. The Rotary Pendulum Interface subsystem contains QUARC blocks that interface with the DC motor and sensors of the system. The feedback control developed in the previous section is implemented using a Simulink Gain block.

Continue in the rotpen_group.mlx script.
Go to the Implement Pole Placement on Hardware section and enter the gain K you found in Step 4 of the Designing the Pole Placement Controller experiment (Section B.2).
Run the section. The q_rotpen_bal.slx SIMULINK diagram should open automatically.
As shown in Fig. 9, the SIMULINK diagram is incomplete. Add the necessary blocks from the Simulink library to implement the balance control.
You need to add switching logic to implement Equation 30. Use a Multiport Switch block with 2 data ports and the zero-based contiguous index setting. The output of the Compare To Constant block will be 0 if false and 1 if true. Check your block with your TA.
Ensure that you connect the final signal going into the u(V) terminal of the Rotary Pendulum Interface to the u scope terminal.
Turn ON the QUBE-Servo 3.
Ensure the pendulum is in the hanging down position, with the rotary arm aligned with the 0 marking, and is motionless.
To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.
Press the Connect button under Monitor & Tune and then press Start.
Once it is running, manually bring up the pendulum to its upright vertical position. You should feel the voltage kick-in when it is within the range where the balance control engages. Once it is balanced, the controller will introduce the ±20 degree rotary arm rotation.
The response should look similar to your simulation. Once you have obtained a response, click on the STOP button to stop the controller (data is saved for the last 10 seconds, so stop SIMULINK around 18-19 seconds once the response looks similar to Fig. 10).
CAUTION Be careful, as the pendulum will fall down when the controller is stopped.
Similar to the simulation Simulink model, the response data will be saved to the workspace. Copy and paste the data into your group's folder. Ensure that the data variables have 10 seconds of data saved.

C.2 Experiment: Implementing the Reinforcement Learning Controller
Proceed to the Implement Reinforcement Learning on Hardware section of the rotpen_group.mlx script.
Ensure the 'doPolicy' drop-down box is set to 'true'.
Run the section. The q_rotpen_rl_student.slx Simulink diagram should open automatically.
To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.
Press the Connect button under Monitor & Tune and then press Start.
Once it is running, manually bring up the pendulum to its upright vertical position, as you did in the previous section. You may need to try this a few times.
Observe the behavior of your controller. It is possible that your reinforcement learning controller is unable to balance the pendulum or, if it can, that the control is not very robust. This is OK! These are natural consequences of reinforcement-learning-based control.
D. Swing-Up Demonstration
In this section a nonlinear, energy-based control scheme is developed to swing the pendulum up from its hanging, downward position. The swing-up control described herein is based on the strategy outlined in [3]. Once upright, the control developed to balance the pendulum in the upright vertical position can be used.
Pendulum Dynamics
The dynamics of the pendulum can be redefined in terms of pivot acceleration (see Fig. 11) as

The pivot acceleration, $u$, is the linear acceleration of the pendulum link base. It is proportional to the torque applied to the rotary arm and is expressed as
Control Law based on Lyapunov Function
According to Lyapunov's stability theory, a sufficient condition for asymptotic stability of a nonlinear system about an equilibrium point is that the first time derivative of a selected Lyapunov function $V$ is negative, i.e.,
$$\dot V < 0$$
Given a positive definite Lyapunov function candidate $V > 0$, the sufficient condition for asymptotic stability is therefore $\dot V < 0$.
Swing-up Control
Let us select a candidate Lyapunov function for arriving at the control law as a quadratic function of the difference between the total energy ($E$) and the reference energy ($E_r$) of the pendulum in equilibrium in the upright position, i.e.,
$$V = \tfrac{1}{2}\left(E - E_r\right)^2$$
where the total energy ($E$) is the sum of the pendulum's kinetic and potential energies.
Also, the reference energy of the pendulum in equilibrium in its fully upright position as compared to its fully downward position becomes
Taking the time derivative of Equation 35, we get
$$\dot V = (E - E_r)\,\dot E$$
Taking the time derivative of the total energy $E$, we get
Now, we replace the bracketed term on the right-hand side of Equation 41 using the equation of motion of the pendulum obtained in Equation 31 to get
Substituting Equation 42 in Equation 38, the time rate of change of the selected Lyapunov equation becomes
Now, we need to select the pivot acceleration $u$ such that $\dot V \le 0$ for asymptotic stability. This can easily be achieved by selecting $u$ as
With the above selection of control law for the pivot acceleration, Equation 43 becomes
which guarantees $\dot V \le 0$.
The selected control law (Equation 44) will continuously decrease the difference between the current energy ($E$) and the energy of the pendulum in the vertically upright position ($E_r$). Note that the selected control law is nonlinear: it changes sign with the sign of $\dot\alpha$ and with the sign of $\cos\alpha$ (i.e., when the pendulum crosses the horizontal).
Now, for the quickest change in energy, we may want to use the maximum controller input (acceleration of the pivot), i.e.,
$$u = u_{max}\,\mathrm{sign}\!\left((E - E_r)\,\dot\alpha\cos\alpha\right)$$
but this controller can lead to chattering. Instead, we use
$$u = \mathrm{sat}_{u_{max}}\!\left(\mu\,(E - E_r)\,\mathrm{sign}(\dot\alpha\cos\alpha)\right)$$
where $\mu$ is a tunable controller gain and $\mathrm{sat}_{u_{max}}(\cdot)$ saturates the command at the maximum pivot acceleration $u_{max}$.
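A minimal sketch of this saturated law as a MATLAB function, assuming $\alpha = 0$ corresponds to upright and that the energies and gains are available:

% Sketch of the saturated energy swing-up law (Equation 48 form)
function u = swingup_accel(E, Er, alpha, alpha_dot, mu, u_max)
    u = mu*(E - Er)*sign(alpha_dot*cos(alpha));   % proportional energy control
    u = min(max(u, -u_max), u_max);               % saturate at the max pivot acceleration
end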
Recall that the acceleration of the pendulum pivot is related to the torque applied on the rotary arm through Equation 49. Additionally, from Equation 9 of the balance controller design section, we have the relationship between the applied torque and the motor voltage. The voltage supplied to the rotary base motor is then obtained by combining Equations 49 and 50, where the commanded pivot acceleration is given by Equation 48.
The selected nonlinear control law will swing up the pendulum from the downward position towards the upright position. Once the pendulum is near the upright position, it is balanced around the fully upward position using the linear balance controller.
Combined Balance and Swing-up Control
The energy-based swing-up control can be combined with the balancing control in Equation 29 to obtain a control law that performs the dual tasks of swinging up the pendulum and balancing it. This can be accomplished by switching between the two control systems.
Basically, the same switching implemented for the balance control in Equation 30 is used; only, instead of feeding 0 V when the balance control is not enabled, the swing-up control is engaged. The controller therefore becomes
$$u = \begin{cases} K(x_d - x) & \text{if } |\alpha| < \epsilon \\ u_{swing} & \text{otherwise} \end{cases}$$
where $u_{swing}$ is the swing-up control voltage from Equation 52. The parameter $\epsilon$ in Equation 51 is a user-selected range of the pendulum angle $\alpha$ over which the balance controller becomes active.
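A sketch of the combined controller in the same style as the balance-only logic shown earlier; the conversion of the swing-up acceleration to a voltage via Equations 49-51 is omitted here, and the engage window is an example value:

% Sketch of the combined controller of Equation 51, x = [theta alpha theta_dot alpha_dot]'
function u = combined_control(x, xd, K, u_swing)
    epsilon = deg2rad(10);        % user-selected balance engage range (example)
    if abs(x(2)) < epsilon        % x(2) = pendulum angle alpha
        u = K*(xd - x);           % balance control engages near upright
    else
        u = u_swing;              % energy-based swing-up everywhere else
    end
end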
Experiment: Implementing the Swing up Control
Run the Part D: Swing-Up Demonstration part of the rotpen_group.mlx script.
The q_rotpen_swingup.slx Simulink model should open automatically.
Check that the correct gain K value is loaded in the workspace.
To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.
Press the Connect button under Monitor & Tune and then press Start. The pendulum should be moving back and forth slowly. Gradually increase the swing-up gain $\mu$ until the pendulum goes up; you may do this by increasing the gain slider. When the pendulum swings up to the vertical upright position, the balance controller should engage and balance the link.
After the link swings up and is balanced, wait for ~5 seconds and then stop the SIMULINK model.
OPTIONAL: Save the data_alpha, data_theta, and data_Vm. Ensure that the data variables have 10 seconds of data saved.
Results for Report
A) Modeling
The linear state-space representation of the rotary inverted pendulum system, i.e., the $A$, $B$, $C$, and $D$ matrices (numerical values).
Open-loop poles of the system.
B.2) Pole Placement Controller Design
Chosen $\zeta$ and $\omega_n$ based on the design specifications.
Corresponding locations of the two dominant poles $p_1$ and $p_2$.
Gain vector $K$.
B.3) Pole Placement Controller Simulation
Plots of the commanded position of the rotary arm ($\theta_d$) and the simulated responses of the rotary arm ($\theta$), pendulum ($\alpha$), and motor input voltage ($V_m$) generated using your obtained gain K.
Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.
C.1) Pole Placement Controller Implementation
From Step B.3.10, plots of the commanded position of the rotary arm ($\theta_d$) and the experimental responses of the rotary arm ($\theta$), pendulum ($\alpha$), and motor input voltage ($V_m$) generated using the chosen gain K.
Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.
Questions for Report
A) Modeling
Based on your open-loop poles found in Result A.2, is the system stable, marginally stable, or unstable?
Did you expect the stability of the inverted pendulum to be as determined? Justify your answer.
B.2) Pole Placement Controller Design
For the questions below, calculations and intermediate steps must be shown.
Determine the controllability matrix $\mathcal{C}$ of the system. Is the inverted pendulum system controllable? Hint: Use Equation 17.
Using the open-loop poles, find the characteristic equation of $A$. Hint: The roots of the characteristic equation are the open-loop poles.
Find the corresponding companion matrices $\tilde A$ and $\tilde B$. Hint: For $\tilde A$, use the characteristic equation of $A$ found in Question B.1.2 and Equation 19. For $\tilde B$, use Equation 20.
Determine the controllability matrix $\tilde{\mathcal{C}}$ of the companion system.
Determine the transformation matrix $T$.
Check if $\tilde A = T^{-1}AT$ and $\tilde B = T^{-1}B$ with the obtained matrices.
Using the locations of the two dominant poles, $p_1$ and $p_2$, based on the specifications (Result B.1.1), and the other poles at $p_3 = -30$ and $p_4 = -40$, determine the desired closed-loop characteristic equation. Hint: The roots of the closed-loop characteristic equation are the closed-loop poles.
When applying the control $\tilde u = -\tilde K\tilde x$ to the companion form, $\tilde A$ changes to $\tilde A - \tilde B\tilde K$. Find the gain $\tilde K$ that assigns the poles to their new desired locations. Hint: Use Equation 26 and find the corresponding characteristic equation. Compare this equation with the desired closed-loop characteristic equation found in Question B.1.7 to determine the gain vector $\tilde K$.
Once you have found $\tilde K$, find $K$ using Step 3 in Pole Placement Theory.
Compare the gain vector $K$ calculated using Pole Placement Theory (Question B.1.9) with the gain vector obtained using MATLAB (Result B.1.3).
D) Swing-Up Demonstration
Briefly summarize the swing-up controller experiment and your observations. Did the swing-up control behave as you expected?
References
[1] Norman S. Nise. Control Systems Engineering. John Wiley & Sons, 2008.
[2] TBA.
[3] K. J. Åström and K. Furuta. Swinging up a pendulum by energy control. 13th IFAC World Congress, 1996.