Rotary Inverted Pendulum + RL Demo

Objective

The objectives of this laboratory experiment are as follows:

  1. Obtain the linear state-space representation of the rotary pendulum system.

  2. Design a controller that balances the pendulum in its upright position using Pole Placement.

  3. Train a controller that balances the pendulum using Reinforcement Learning.

  4. Simulate the closed-loop system to ensure the given specifications are met.

  5. Implement the balance controller on the Quanser Rotary Pendulum system and evaluate its performance.

Equipment

QUBE-Servo 3 Rotary Inverted Pendulum Model

A picture of the QUBE-Servo 3 with the rotary pendulum module is shown in Fig. 1. The numbered components in Fig. 1 are listed in Table 1, and the numerical values of the main system parameters are given in Table 2.

Figure 1: QUBE-Servo 3 base with Rotary Pendulum module

Table 1: QUBE-Servo 3 components (Figure 1)

ID#   Component
1     Rotary servo
2     Rotary arm housing
3     Pendulum encoder
4     Rotary arm
5     Pendulum link

Table 2: QUBE-Servo 3 main parameters

Symbol   Description                                                         Value           Unit
m_p      Mass of pendulum                                                    0.024           kg
L_p      Total length of pendulum                                            0.129           m
J_p      Pendulum moment of inertia about center of mass                     3.3282 × 10⁻⁵   kg·m²
B_p      Pendulum viscous damping coefficient as seen at the pivot axis      5 × 10⁻⁵        N·m·s/rad
m_r      Mass of rotary arm                                                  0.095           kg
L_r      Rotary arm length from pivot to tip                                 0.085           m
J_r      Rotary arm moment of inertia about its center of mass               5.7198 × 10⁻⁵   kg·m²
B_r      Rotary arm viscous damping coefficient as seen at the pivot axis    0.001           N·m·s/rad
R_m      Motor armature resistance                                           8.4             Ω
k_t      Current-torque constant                                             0.0422          N·m/A
k_m      Back-emf constant                                                   0.0422          V·s/rad

A. Modeling

Model Convention

A simplified form of the inverted pendulum system used in the development of the mathematical model is shown in Figure 2. The rotary arm pivot is attached to the QUBE-Servo 3 system. The arm has a length of L_r, a moment of inertia of J_r, and its angle, \theta, increases positively when it rotates counterclockwise (CCW). The servo (and thus the arm) should turn in the CCW direction when the control voltage is positive, i.e., V_m > 0.

The pendulum link is connected to the end of the rotary arm. It has a total length of L_p and its center of mass is at its midpoint, L_p/2. The moment of inertia about its center of mass is J_p. The inverted pendulum angle, \alpha, is zero when it is perfectly upright in the vertical position and increases positively when rotated CCW.

Figure 2 Rotary inverted pendulum conventions

Nonlinear Equations of Motion

Instead of using classical (Newtonian) mechanics, the Lagrange method is used to find the equations of motion of the system. This systematic method is often used for more complicated systems such as robot manipulators with multiple joints.

Specifically, the equations that describe the motions of the rotary arm and the pendulum with respect to the servo motor voltage, i.e., the dynamics, are obtained using the Euler-Lagrange equation:

\frac{d}{dt}\frac{\partial L}{\partial \dot{q_i}} - \frac{\partial L}{\partial q_i} = Q_i \qquad \qquad \tag{1}

The variables q_i are called generalized coordinates, the variables Q_i are called generalized forces, and L is the Lagrangian (the difference between the kinetic and potential energies of the system). For this system let

q(t) =\begin{bmatrix} \theta(t) & \alpha(t) \end{bmatrix}^T \qquad \qquad \tag{2}

where, as shown in Figure 2, \theta(t) is the rotary arm angle and \alpha(t) is the inverted pendulum angle. The corresponding angular rates are

\dot{q}(t)=\begin{bmatrix} \displaystyle \frac{d\theta(t)}{dt} & \displaystyle\frac{d\alpha(t)}{dt} \end{bmatrix} ^T \qquad \qquad \tag{3}

With the generalized coordinates defined, the Euler-Lagrange equations for the rotary pendulum system are

\frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = Q_1 \qquad \qquad \tag{4a}
\frac{d}{dt}\frac{\partial L}{\partial \dot{\alpha}} - \frac{\partial L}{\partial \alpha} = Q_2 \qquad \qquad \tag{4b}

The Lagrangian of the system is described by

L = T - V \qquad \qquad \tag{5}

where T is the total kinetic energy of the system and V is the total potential energy of the system. Thus the Lagrangian is the difference between a system's kinetic and potential energies.

The generalized forces Q_i are used to describe the nonconservative forces (e.g., friction) applied to a system with respect to the generalized coordinates. In this case, the generalized force acting on the rotary arm is

Q_1 = \tau - B_\mathrm{r}\dot{\theta} \qquad \qquad \tag{6a}

and acting on the pendulum is

Q_2 = -B_\mathrm{p}\dot{\alpha} \qquad \qquad\tag{6b}

Our control variable is the input servo motor voltage, V_m. Opposing the applied torque is the viscous friction torque, or viscous damping, corresponding to the term B_\mathrm{r}. Since the pendulum is not actuated, the only force acting on the link is the damping. The viscous damping coefficient of the pendulum is denoted by B_\mathrm{p}.

Once expressions for the kinetic and potential energy are obtained and the Lagrangian is found, the remaining task is to compute the various derivatives to obtain the equations of motion (EOMs). After going through this process, the nonlinear equations of motion for the Rotary Pendulum are:

\displaystyle\left (m_pL_r^2 + \frac{1}{4}m_pL_p^2 - \frac{1}{4}m_pL_p^2 \cos^2(\alpha)+J_r \right )\ddot{\theta} - \left (\frac{1}{2}m_pL_pL_r\cos(\alpha) \right)\ddot{\alpha} \\+ \displaystyle\left(\frac{1}{2}m_pL_p^2\sin(\alpha)\cos(\alpha)\right)\dot{\theta}\dot{\alpha} + \left( \frac{1}{2}m_pL_pL_r\sin(\alpha)\right)\dot{\alpha}^2 = \tau - B_r\dot{\theta} \qquad \qquad \tag{7}
\displaystyle{\left(-\frac{1}{2}m_pL_pL_r\cos(\alpha) \right)\ddot{\theta} + \left(J_p + \frac{1}{4}m_pL_p^2 \right) \ddot{\alpha} - \left(\frac{1}{4} m_pL_p^2\cos(\alpha)\sin(\alpha)\right)\dot{\theta}^2} \\ -\frac{1}{2} m_pL_pg\sin(\alpha) = -B_p\dot{\alpha} \qquad \qquad \tag{8}

The torque \tau applied at the base of the rotary arm (i.e., at the load gear) in equation (7) is generated by the servo motor, as described by equation (9). Refer to Table 2 for the rotary servo parameters.

\tau = \displaystyle\frac{k_t(V_m - k_m\dot{\theta})}{R_m} \qquad \qquad \tag{9}

Linearization

Linearization of a nonlinear function about a selected point is obtained by retaining only the terms up to first order in the Taylor series expansion of the function about that point. For example, the linearization of a nonlinear function f(z) of two variables, where

z = \begin {bmatrix} z_1 & z_2\end {bmatrix}^T

about the point

z_0 = \begin {bmatrix} a & b\end {bmatrix}^T

can be written as

f_{lin}(z) = f(z_0) + \frac{\partial f(z)}{\partial z_1}\Bigr|_{z = z_0} (z_1 - a)+\frac{\partial f(z)}{\partial z_2}\Bigr|_{z=z_0}(z_2 - b)
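
As a brief illustrative example (not part of the pendulum model), linearizing f(z) = \sin(z_1)\cos(z_2) about z_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T gives f(z_0) = 0, \frac{\partial f}{\partial z_1}\bigr|_{z_0} = 1, and \frac{\partial f}{\partial z_2}\bigr|_{z_0} = 0, so f_{lin}(z) = z_1.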

Linearization of Inverted Pendulum Equations

The nonlinear equations of the inverted pendulum system obtained as equations (7) and (8) are linearized about the equilibrium point with the pendulum in the upright position, i.e.,

\theta_o = 0, \alpha_o = 0, \dot{\theta}_o = 0, \dot{\alpha}_o = 0, \ddot{\theta}_o = 0, \ddot{\alpha}_o = 0

Linearization of Equation (7) about the above equilibrium state gives

(J_r + m_pL_r^2)\ddot{\theta} - \frac{1}{2}m_pL_pL_r\ddot{\alpha} = kV_m-b\dot{\theta} - B_r\dot{\theta} \quad\quad (10)

where, from equation (9), the servo motor torque coefficient k is

k = \frac{k_t}{R_m}

and the back-emf coefficient b is

b = \frac{k_tk_m}{R_m}

Likewise, linearization of equation (8) about the equilibrium point with the pendulum in the upright position gives

-\frac{1}{2}m_pL_pL_r\ddot{\theta} + (J_p + \frac{1}{4}m_pL_p^2)\ddot{\alpha} - \frac{1}{2}m_pL_pg\alpha= -B_p\dot{\alpha} \quad \quad (11)

where g is the acceleration due to gravity. Note that the negative sign of the gravity torque (the \alpha term) in equation (11) indicates negative stiffness when the pendulum is in the vertically upright position.

Equations (10) and (11) can be arranged in the matrix form as

[M]\begin{bmatrix}\ddot{\theta} \\ \ddot{\alpha} \end{bmatrix} +[D] \begin{bmatrix}\dot{\theta} \\ \dot{\alpha} \end{bmatrix} + [K] \begin{bmatrix}\theta \\ \alpha \end{bmatrix} = \begin{bmatrix}k \\ 0 \end{bmatrix}V_m \quad \quad (12)

where the mass matrix M, damping matrix D, and stiffness matrix K are

[M] = \begin{bmatrix} (J_r + m_pL_r^2) & -\frac{1}{2}m_pL_pL_r \\ -\frac{1}{2}m_pL_pL_r & (J_p + \frac{1}{4}m_pL_p^2) \end{bmatrix}
[D] = \begin{bmatrix} b+B_r & 0 \\ 0 & B_p \end{bmatrix}
[K] = \begin{bmatrix} 0 & 0 \\ 0 & -\frac{1}{2}m_pL_pg \end{bmatrix}

Defining the state vector x, output vector y, and the control input u as

x = \begin{bmatrix} \theta & \alpha & \dot{\theta} & \dot{\alpha}\end{bmatrix}^T , \quad y = \begin{bmatrix} \theta & \alpha \end{bmatrix}^T, \quad u = V_m

Equation (12) can be rewritten in state-space form as

\dot{x} = Ax + Bu \quad \quad (13)
y = Cx + Du \quad \quad (14)

where the system state matrix A, control matrix B, output matrix C, and the input-to-output feedthrough matrix D become

A = \begin{bmatrix} O_{2\times2} & I_{2\times2} \\ -[M]^{-1}[K] & -[M]^{-1}[D] \end{bmatrix}
B = \begin{bmatrix} O_{2\times1} \\ M^{-1} \begin{bmatrix} k \\0 \end{bmatrix} \end{bmatrix}
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
D = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

In the equations above, note that O_{2\times2} is a 2 \times 2 matrix of zeros, O_{2\times1} is a 2 \times 1 matrix of zeros, and I_{2\times2} is a 2 \times 2 identity matrix. Using the parameters of the system listed in Table 2, the linear model matrices A and B can be computed.
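
For reference, a minimal MATLAB sketch of this computation is shown below, using the values from Table 2; the variable names are illustrative and may differ from those used in rotpen_student.mlx.

% Parameters from Table 2 (SI units)
mp = 0.024;  Lp = 0.129;  Jp = 3.3282e-5;  Bp = 5e-5;
mr = 0.095;  Lr = 0.085;  Jr = 5.7198e-5;  Br = 1e-3;
Rm = 8.4;    kt = 0.0422; km = 0.0422;     g  = 9.81;

% Actuator constants from equation (9): k = kt/Rm, b = kt*km/Rm
k = kt/Rm;
b = kt*km/Rm;

% Mass, damping, and stiffness matrices from equation (12)
M    = [Jr + mp*Lr^2,  -0.5*mp*Lp*Lr;
        -0.5*mp*Lp*Lr,  Jp + 0.25*mp*Lp^2];
Dmat = [b + Br, 0; 0, Bp];            % damping matrix (named Dmat to keep D free)
Kst  = [0, 0; 0, -0.5*mp*Lp*g];       % stiffness matrix (named Kst to keep K free)

% State-space matrices, equations (13)-(14)
A = [zeros(2), eye(2); -(M\Kst), -(M\Dmat)];
B = [zeros(2,1); M\[k; 0]];
C = [1 0 0 0; 0 1 0 0];
D = [0; 0];

olPoles = eig(A)                      % open-loop poles (Analysis: Modeling, step 4)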

Analysis: Modeling

  1. Download the Lab4_Indv.zip file to your personal device and extract the folder contents.

  2. Open the rotpen_student.mlx live script and run the Modeling section. It will automatically load the parameters required for the state-space representation and subsequently generate the A, B, C and D matrices required for the upcoming analysis. Please refer to Table 2 for additional information regarding the parameters. Note: The representative C and D matrices have already been included. The actuator dynamics have been added to convert your state-space matrices to be in terms of voltage. Recall that the input of the state-space model is the torque acting at the servo load gear; however, we control the servo input voltage rather than the torque directly. The script uses the voltage-torque relationship given in Equation 9 to transform torque to voltage.

  3. Note down your state-space matrices for your report. Note: You may want to cross-check the state-space matrix with TAs before proceeding to balance control.

  4. Find the open-loop poles of the system. Hint: Use eig(A).

B. Balance Control

Specification

The control design and time-response requirements are:

  • Specification 1: Damping ratio: 0.6 < \zeta < 0.8

  • Specification 2: Natural frequency: 3.5 rad/s < \omega_n < 4.5 rad/s

  • Specification 3: Maximum pendulum angle deflection: |\alpha| < 15 deg

  • Specification 4: Maximum control effort / voltage: |V_m| < 10 V

The necessary closed-loop poles are found from Specifications 1 and 2. The pendulum deflection and control effort requirements (i.e., Specifications 3 and 4) are to be satisfied while the rotary arm is tracking a \pm 20 degree angle square wave.

Stability

The stability of a system can be determined from its poles ([2]):

  • Stable systems have poles only in the left-half of the complex plane.

  • Unstable systems have at least one pole in the right-half of the complex plane and/or poles of multiplicity greater than 1 on the imaginary axis.

  • Marginally stable systems have non-repeated poles on the imaginary axis and all remaining poles in the left-half of the complex plane.

The poles are the roots of the system’s characteristic equation. From the state-space, the characteristic equation of the system can be found using

\mathrm{det}(sI - A) = 0 \qquad \qquad\tag{15}

where \mathrm{det}() is the determinant of a matrix, s is the Laplace operator, and I is the identity matrix. The roots of the characteristic equation are the eigenvalues of the system matrix A.

Controllability

If the control input u of a system can take each state variable, x_i where i = 1 \ldots n, from an initial state to a desired final state in finite time, then the system is controllable; otherwise it is uncontrollable ([1]).

Rank Test: The system is controllable if the rank of its controllability matrix

T = [B \ AB \ A^2B \ ... \ A^{n-1}B] \qquad \qquad \tag{16}

equals the number of states in the system, i.e.

\mathrm{rank}(T) = n \qquad \qquad \tag{17}
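
In MATLAB, the rank test can be carried out with the Control System Toolbox functions ctrb and rank; a brief sketch, assuming A and B are already in the workspace:

T = ctrb(A, B);                  % controllability matrix [B AB A^2B A^3B]
n = size(A, 1);
isControllable = (rank(T) == n)  % true (1) if Equation (17) holds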

Companion Matrix

For a controllable system with an n \times n system matrix A and an n \times 1 control matrix B, the companion matrices of A and B are

\tilde{A} = \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_1 & -a_2 & -a_3 & \dots & -a_n \end{bmatrix} \qquad \qquad \tag{18}

and

\tilde{B} = \begin{bmatrix} 0\\ \vdots\\ 0\\ 1 \end{bmatrix} \qquad \qquad \tag{19}

where a_1, a_2, \ldots, a_n are the coefficients of the characteristic equation of the system matrix A, written as

s^n + a_ns^{n-1} + a_{n-1}s^{n-2} + ...+a_2s + a_1 = 0

Now define W,

W = T\tilde{T}^{-1} \qquad \qquad \tag{20}

where T is the controllability matrix defined in Equation 16 and

\tilde{T} = \begin{bmatrix}\tilde{B} & \tilde{A}\tilde{B} & \cdots & \tilde{A}^{n-1}\tilde{B}\end{bmatrix} \qquad \qquad \tag{21}

Then

W^{-1}AW = \tilde{A} \qquad \qquad \tag{22}

and

W^{-1}B = \tilde{B} \qquad \qquad \tag{23}
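
A hedged MATLAB sketch of Equations (18)-(23) is given below. Note that poly(A) returns the characteristic polynomial coefficients in descending powers of s, so they must be reordered to match the a_1, ..., a_n convention used above; variable names are illustrative.

n      = size(A, 1);
coeffs = poly(A);                 % [1  a_n  a_(n-1)  ...  a_1] (descending powers of s)
a      = fliplr(coeffs(2:end));   % reorder to [a_1  a_2  ...  a_n]

% Companion matrices, Equations (18)-(19)
Atil = [zeros(n-1,1), eye(n-1); -a];
Btil = [zeros(n-1,1); 1];

% Transformation matrix, Equations (20)-(21)
Ttil = ctrb(Atil, Btil);
W    = ctrb(A, B) / Ttil;         % W = T * inv(Ttil)

% Sanity checks, Equations (22)-(23): both norms should be numerically small
norm(W\A*W - Atil)
norm(W\B - Btil)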

Pole Placement Theory

If (A,B) is controllable, then pole placement can be used to design the controller. Given the control law u = -Kx, the state-space model of equation (13) becomes

\dot{x} = Ax + B(-Kx) =(A - BK)x \qquad \qquad \tag{25}

We can generalize the procedure to design a gain K for a controllable (A,B) system as follows:

Step 1: Find the companion matrices \tilde{A} and \tilde{B}. Compute W = T\tilde{T}^{-1}.

Step 2: Compute \tilde{K} to assign the poles of \tilde{A} - \tilde{B}\tilde{K} to the desired locations.

\tilde{A} - \tilde{B}\tilde{K}= \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_1-\tilde{k}_1 & -a_2-\tilde{k}_2 & -a_3-\tilde{k}_3 & \dots & -a_n -\tilde{k}_n \end{bmatrix} \qquad \qquad \tag{26}

Step 3: Find K = \tilde{K}W^{-1} to get the feedback gain for the original system (A,B).

Remark 1: It is important to do the \tilde{K} \rightarrow K conversion. Remember that (A,B) represents the actual system, while the companion matrices \tilde{A} and \tilde{B} do not.

Remark 2: The entire control design procedure using the pole placement method can be done in MATLAB using the function 'place' or 'acker'. For a selected set of desired closed-loop poles DP, the full state feedback gain matrix K is obtained from

>> K = acker(A,B,DP);

Desired Poles

The rotary inverted pendulum system has four poles. As depicted in Figure 3, poles p_1 and p_2 are the complex conjugate dominant poles and are chosen to satisfy the natural frequency, \omega_n, and damping ratio, \zeta, given in the specifications. Let the conjugate poles be

p_1 = -\sigma + j\omega_d \qquad \qquad \tag{27a}

and

p_2 = -\sigma - j\omega_d \qquad \qquad \tag{27b}

where \sigma = \zeta\omega_n and \omega_d = \omega_n \sqrt{1-\zeta^2} is the damped frequency. The remaining closed-loop poles, p_3 and p_4, are placed along the real axis to the left of the dominant poles, as shown in Figure 3.

Figure 3 Desired closed-loop pole locations
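
A short sketch of the pole selection in MATLAB, using example values that satisfy Specifications 1 and 2 (the lab prescribes p3 = -30 and p4 = -40 in Experiment B.2):

zeta = 0.7;  wn = 4;                 % example values within the specification
sigma = zeta*wn;
wd    = wn*sqrt(1 - zeta^2);         % damped frequency, Equation (27)
p1 = -sigma + 1j*wd;
p2 = -sigma - 1j*wd;
p3 = -30;  p4 = -40;                 % as prescribed in Experiment B.2
DP = [p1, p2, p3, p4];
K  = place(A, B, DP)                 % or: K = acker(A, B, DP)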

Simulation Model with Feedback

The feedback control loop that balances the rotary pendulum is illustrated in Figure 4. The reference state is defined as

x_d = [\theta_d \ 0\ 0\ 0]^ \intercal \qquad \qquad \tag{28}

where \theta_d is the desired rotary arm angle. The controller is

u = K(x_d - x) \qquad \qquad \tag{29}

Note that if x_d = 0 then u = -Kx, which is the control law used in the pole-placement algorithm.

Figure 4 State-feedback control loop

When running this on the actual system, the pendulum begins in the hanging, downward position. We only want the balance control to be enabled when the pendulum is brought up around its upright vertical position. The controller is therefore

u = \begin{cases} K(x_d - x) & |x_2| <\epsilon \\ 0 & \text{otherwise} \end{cases} \qquad \qquad \tag{30}

where \epsilon is the angle threshold about the upright position within which the controller engages, and x_2 is the pendulum angle. For example, if \epsilon = 10 degrees, then the control will begin when the pendulum is within ±10 degrees of its upright position, i.e., when |x_2| < 10 degrees.
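
The switching logic of Equation 30 can be written as a small MATLAB function; a sketch, assuming the state ordering x = [theta; alpha; theta_dot; alpha_dot] and that x(2) and epsilon are expressed in the same angular units:

function u = balanceSwitch(x, xd, K, epsilon)
% Equation (30): engage state feedback only near the upright position.
% x(2) is the pendulum angle; epsilon is the engagement threshold.
    if abs(x(2)) < epsilon
        u = K*(xd - x);
    else
        u = 0;
    end
end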

Reinforcement Learning

Fundamentals

Although we have found a sufficient model for the Inverted Pendulum system, there are often systems where the plant dynamics are difficult or impossible to model. In this case, we look to model-free control.

Reinforcement learning algorithms, many of which are model-free, seek to train an agent. The agent receives observations of the plant system in addition to feedback from a reward function, and outputs a corresponding action according to its policy. The agent may be trained on the physical system in a series of discrete training episodes. Following each episode, the agent updates its policy according to the specific reinforcement learning algorithm in use. A summary of the reinforcement learning training loop is shown in Fig. 5.

Figure 5. Reinforcement learning training loop (Mathworks 2025) [2]

Following training, the agent may be deployed on the system, ideally achieving the desired behavior. The particular reinforcement learning algorithm used in this lab is the deep deterministic policy gradient (DDPG) algorithm, a model-free actor-critic algorithm. For more information about the DDPG algorithm in particular, please visit the corresponding Mathworks article.
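
For orientation only, a rough sketch of creating a default DDPG agent with the MATLAB Reinforcement Learning Toolbox is shown below. The observation dimension, action limits, and training options here are assumptions made for illustration; the rotpen_group.mlx script is the authoritative setup for this lab.

% Assumed: the agent observes the four states and outputs one motor voltage.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -5, 'UpperLimit', 5);

agent = rlDDPGAgent(obsInfo, actInfo);        % default actor and critic networks

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 500, ...
    'StopTrainingCriteria', 'EpisodeCount', ...
    'StopTrainingValue', 1000);
% env = rlSimulinkEnv(...);                   % environment is created by the lab script
% trainingStats = train(agent, env, trainOpts);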

Reinforcement Learning for the Inverted Pendulum System

To implement the DDPG algorithm, we need a reward function, which evaluates the observed performance of the system. A good reward function promotes behavior we desire in the system and penalizes behavior we want to eliminate. In this case, we seek to command the system to \theta =0,\,\alpha=0,\,\dot{\theta}=0,\,\dot{\alpha} =0. Additionally, we especially want to avoid the servo angle \theta exceeding the physical limit of the system at \theta_{\text{max, actual}} = 90\degree (with a margin of error such that \theta_{\text{max}} = 60\degree), the pendulum angle \alpha exceeding the balance control limit of \alpha_{\text{max}}=\epsilon=12\degree, and the input voltage exceeding the limit of u_{\text{max}}=5\,\text{V}. Thus, we use the following quadratic reward function

r = -(q_{11}\theta^2+q_{22}\alpha^2+q_{33}\dot{\theta}^2+q_{44}\dot{\alpha}^2+r_{11}u^2)+B(|\theta|>\theta_{\text{max}} \,OR\,|\alpha|>\alpha_{\text{max}}\,OR\,|u|>u_{\text{max}}),\qquad \qquad\quad \tag{31}

where q_{ii}, r_{ii}, and B are weights. This function covers all of the desired behavior we seek from the system. By default, the weights are q_{33}=0<r_{11}=0.1<q_{44}=1<q_{11}=10<q_{22}=20, and B=-100, reflecting the relative importance of each state.
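
A sketch of Equation 31 as a MATLAB function, using the default weights quoted above; the state ordering and the use of radians for the angle limits are assumptions:

function r = pendulumReward(x, u)
% x = [theta; alpha; theta_dot; alpha_dot], u = motor voltage (V)
    q   = [10, 20, 0, 1];  r11 = 0.1;  B = -100;      % default weights from the text
    thetaMax = deg2rad(60);  alphaMax = deg2rad(12);  uMax = 5;

    r = -(q(1)*x(1)^2 + q(2)*x(2)^2 + q(3)*x(3)^2 + q(4)*x(4)^2 + r11*u^2);
    if abs(x(1)) > thetaMax || abs(x(2)) > alphaMax || abs(u) > uMax
        r = r + B;                                    % B is negative: large penalty
    end
end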

For a model-free application, an engineer would typically train the reinforcement learning model on the physical plant system or on a digital twin of the plant. However, doing this during a 2.5-hour lab session is impractical, as it would require repeatedly raising the pendulum for 1000 training episodes. Consequently, we train our reinforcement learning agent in simulation on the state-space model we developed in Part A.

B.1 Experiment: Training the Reinforcement Learning Controller

  1. Download the Lab4_Group.zip file to the lab PC and extract the folder contents.

  2. Open the rotpen_group.mlx live script and run the Initialization section. This section generates all necessary parameters for the reinforcement learning training. Ensure that the drop down menu is configured to 'train new agent.'

  3. Run the Observation and Action Signals section. This section creates the learning environment for the reinforcement learning agent, configures the observation signal read by the agent, and configures the action signal generated by the agent.

  4. Run the Create DDPG Agent section. This section creates and configures the critic and actor neural networks, and compiles these networks into a single agent.

  5. Run the Train/Load Agent section. This section begins the training session. By default, the training session is configured to contain 1000 episodes with a random starting pendulum angle between -20 and 20 degrees.

  6. The s_rotpen_train_rl.slx Simulink diagram should automatically open, in addition to a Reinforcement Learning Training Monitor window. Open the 'alpha' and 'theta' scopes and configure your monitor so you can clearly see all three windows, as in Fig. 6. You should now be able to watch as the reinforcement learning agent trains on your state space model.

  7. Wait for the training to complete. This should take 20-30 minutes. Ensure that the main MATLAB window is open and visible to avoid slowdown due to CPU allocation. Proceed to the next section to continue your pole placement control design and simulation while the reinforcement learning training continues in the background. On training completion, your trained agent will be saved to the current directory as agent_student.mat.

Figure 6. Screen configuration for reinforcement learning training.

B.2 Experiment: Designing the Pole Placement Controller

  1. Select \zeta and \omega_n from the table below. Be sure to choose a different set of values from your groupmates.

    Parameter set     1      2      3      4      5      6      7      8
    ω_n (rad/s)       3.75   4      4.25   3.75   4.25   3.75   4      4.25
    ζ                 0.65   0.65   0.65   0.7    0.7    0.75   0.75   0.75
  2. In the rotpen_student.mlx live script open on your personal device, go to the Pole Placement Balance Control section. Enter the chosen zeta and omega_n values.

  3. Determine the locations of the two dominant poles p1p_1 and p2p_2 based on the specifications and enter their values in the MATLAB live script. Ensure that the other poles are placed at p3 = -30 and p4 = -40. Hint: Use equation 27.

  4. Find gain K using a predefined Compensator Design MATLAB command K = acker(A,B,DP), which is based on pole-placement design. Note: DP is a row vector of the desired poles found in Step 3.

B.3 Experiment: Simulating the Pole Placement Controller

The s_rotpen_bal.slx SIMULINK diagram shown in Fig. 7 is used to simulate the closed-loop response of the Rotary Pendulum using the state-feedback control described in Balance Control with the control gain K found above. The Signal Generator block generates a 0.1 Hz square wave (with an amplitude of 1). The Amplitude (deg) gain block is used to change the desired rotary arm position. The state-feedback gain K is set in the Control Gain gain block and is read from the MATLAB workspace. The SIMULINK State-Space block reads the A, B, C, and D state-space matrices that are loaded in the MATLAB workspace. The Find State X block contains high-pass filters to find the angular rates of the rotary arm and pendulum.

Figure 7: s_rotpen_bal SIMULINK diagram used to simulate the state-feedback control
  1. Ensure you have run the Pole Placement Balance Control section of the rotpen_student.mlx live script. Ensure the gain K you found is loaded in the workspace (type K in the Command Window to check).

  2. Open and run the s_rotpen_bal.slx Simulink model for 10 seconds. The responses in the scopes shown in Fig. 8 were generated using an arbitrary feedback control gain. Note: When the simulation stops, the last 10 seconds of data is automatically saved in the MATLAB workspace to the variables data_theta, data_alpha, and data_Vm.

    Figure 8: Balance control simulation
  3. Save the data corresponding to the simulated response of the rotary arm, pendulum, and motor input voltage obtained using your gain K. Note: The time is stored in the data_theta(:,1) vector, and the desired and measured rotary arm angles are saved in the data_theta(:,2) and data_theta(:,3) arrays, respectively. Similarly, the pendulum angle is stored in the data_alpha(:,2) vector, and the control input is in the data_Vm(:,2) array.

  4. Measure the maximum pendulum deflection and voltage used (a short sketch for extracting these from the saved data follows this list). Are the given specifications satisfied?
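
One way to check Specifications 3 and 4 from the saved workspace variables is sketched below; it assumes the angles are logged in radians (drop the conversion if they are logged in degrees):

maxAlpha = max(abs(data_alpha(:,2))) * 180/pi;   % maximum pendulum deflection (deg)
maxVm    = max(abs(data_Vm(:,2)));               % maximum motor input voltage (V)
fprintf('Max pendulum deflection: %.1f deg (Spec 3: < 15 deg)\n', maxAlpha);
fprintf('Max motor voltage: %.1f V (Spec 4: < 10 V)\n', maxVm);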

C. Controller Implementation

C.1 Experiment: Implementing the Pole Placement Controller

In this section, the state-feedback control that was designed and simulated in the previous sections is run on the actual Rotary Pendulum device.

Experiment Setup

The q_rotpen_bal SIMULINK diagram shown in Fig. 9 is used to run the state-feedback control on the Quanser Rotary Pendulum system. The Rotary Pendulum Interface subsystem contains QUARC blocks that interface with the DC motor and sensors of the system. The feedback developed in the previous section is implemented using a Simulink Gain block.

Figure 9. q_rotpen_bal_student SIMULINK diagram can be used to run balance controller
  1. Continue in the rotpen_group.mlx script.

  2. Go to the Implement Pole Placement on Hardware section and enter the gain K you found in Step 4 of the Designing the Pole Placement Controller experiment (B.2).

  3. Run the section. The q_rotpen_bal.slx SIMULINK diagram should open automatically.

  4. As shown in Figure 9, the SIMULINK diagram is incomplete. Add the necessary blocks from the Simulink library to implement the balance control.

    • You need to add switch logic to implement Equation 30. Use a Multi-port Switch with 2 data ports and the zero-based contiguous setting. The output from the Compare To Constant block will be 0 if false and 1 if true. Check your block with your TA.

    • Ensure that the final signal going into the u (V) terminal of the Rotary Pendulum Interface is also connected to the u scope terminal.

  5. Turn ON the QUBE-Servo 3.

  6. Ensure the pendulum is in the hanging down position, with the rotary arm aligned with the 0 marking, and is motionless.

  7. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  8. Press the Connect button under Monitor & Tune and then press Start.

  9. Once it is running, manually bring up the pendulum to its upright vertical position. You should feel the voltage kick-in when it is within the range where the balance control engages. Once it is balanced, the controller will introduce the ±20 degree rotary arm rotation.

  10. The response should look similar to your simulation. Once you have obtained a response, click on the STOP button to stop the controller (data is saved for the last 10 seconds, so stop SIMULINK around 18-19 seconds once the response looks similar to Fig. 10).

  11. Similar to the simulation Simulink model, the response data will be saved to the workspace. Copy and paste into your group's folder. Ensure that the data variables have 10 seconds of data saved.

Figure 10. Experiment balance control response example

C.2 Experiment: Implementing the Reinforcement Learning Controller

  1. Proceed to the Implement Reinforcement Learning on Hardware section of the rotpen_group.mlx script.

  2. Ensure the 'doPolicy' drop down box is set to 'true'.

  3. Run the section. The q_rotpen_rl_student.slx Simulink diagram should open automatically.

  4. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  5. Press the Connect button under Monitor & Tune and then press Start.

  6. Once it is running, manually bring up the pendulum to its upright vertical position, as you did in the previous section. You may need to try this a few times.

  7. Observe the behavior of your controller. It is possible that your reinforcement learning controller is unable to balance the pendulum, or, if it is, the control is not very robust. This is OK! These are the natural consequences of Reinforcement Learning based control.

D. Swing-Up Demonstration

In this section a nonlinear, energy-based control scheme is developed to swing the pendulum up from its hanging, downward position. The swing-up control described herein is based on the strategy outlined in [3]. Once upright, the control developed to balance the pendulum in the upright vertical position can be used.

Pendulum Dynamics

The dynamics of the pendulum can be redefined in terms of the pivot acceleration A (see Fig. 11) as

J_p \ddot{\alpha} - \frac{1}{2}m_pgL_p\sin{\alpha} = \frac{1}{2}m_pL_pA\cos{\alpha} \quad \quad (32)
Figure 11. Force vector diagram

The pivot acceleration, A, is the linear acceleration of the pendulum link base. The acceleration is proportional to the torque of the rotary arm and is expressed as

\tau = m_rL_rA \quad \quad (33)

Control Law based on Lyapunov Function​

According to Lyapunov's stability theory, a sufficient condition for asymptotic stability of a nonlinear system about an equilibrium point is that the first time derivative of a selected Lyapunov function, V(x), is negative, i.e.,

Given

V(x) > 0 \qquad \forall~x \neq 0 \quad \quad (34)

a sufficient condition for asymptotic stability is

\dot{V}(x) < 0 \qquad \forall~x \neq 0 \quad \quad (35)

Swing-up Control

Let us select, as a candidate Lyapunov function for arriving at the control law, a quadratic function of the difference between the total energy (E) and the reference energy (E_r) of the pendulum in equilibrium in the upright position, i.e.,

V = \frac{1}{2}\left( E-E_r \right )^2 \quad \quad (36)

where the total energy (E) is the sum of the kinetic energy E_{KE} and the potential energy E_{PE}.

E_{KE} = \frac{1}{2}J_p{\dot{\alpha}}^2 \quad \quad (37)
E_{PE} = \frac{1}{2}m_pgL_p(\cos{\alpha} +1) \quad \quad (38)
E = E_{KE} + E_{PE} = \frac{1}{2}J_p{\dot{\alpha}}^2 + \frac{1}{2}m_pgL_p(\cos{\alpha} +1) \quad \quad (39)

Also, the reference energy of the pendulum, i.e., its energy in equilibrium in the fully upright position measured relative to the fully downward position, becomes

E_r = m_pL_pg \quad \quad (40)

Taking the time derivative of Equation 36, we get

\dot{V} = \left( E-E_r \right ) \dot{E} \quad \quad (41)

Taking the time derivative of Equation 39, we get

\dot{E} = \dot{\alpha}\left( J_p \ddot{\alpha} - \frac{1}{2}m_pL_pg\sin{\alpha} \right ) \quad \quad (42)

Now, we replace the bracketed term on the right-hand side of Equation 42 using the equation of motion of the pendulum obtained in Equation 32 to get

\dot{E} = \frac{1}{2}m_pL_pA\dot{\alpha}\cos{\alpha} \quad \quad (43)

Substituting Equation 43 into Equation 41, the time rate of change of the selected Lyapunov function becomes

\dot{V} = \left( E - E_r \right) \dot{E} = \frac{1}{2}m_pL_p \left( E - E_r \right)A \dot{\alpha}\cos{\alpha} \quad \quad (44)

Now, we need to select A such that \dot{V}<0 for asymptotic stability. This can easily be achieved by selecting A as

A = -\left( E - E_r \right) \dot{\alpha}\cos{\alpha} \quad \quad (45)

With the above selection of the control law for the pivot acceleration, Equation 44 becomes

\dot{V} = -\frac{1}{2}m_pL_p \left [ \left( E - E_r \right) \dot{\alpha}\cos{\alpha} \right ]^2 \quad \quad (46)

which guarantees \dot{V} \le 0 (strictly negative whenever \left( E - E_r \right)\dot{\alpha}\cos{\alpha} \neq 0).

The selected control law (Equation 45) will continuously decrease the difference between the current energy (E) and the energy of the pendulum in the vertically upright position (E_r). Note that the selected control law is nonlinear; it changes sign for 90^{\circ}<\alpha<270^{\circ} and when \dot{\alpha}<0.

Now, for the quickest change in energy, we may want to use the maximum controller input (acceleration of the pivot), i.e.,

A = -A_{\text{max}}\text{sign}\left [ \left( E - E_r \right) \dot{\alpha}\cos{\alpha} \right ] \quad \quad (47)

but this controller can lead to chattering. Instead, we use

A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha})) \quad \quad (48)

where \mu is a tunable controller gain.

Recall that the acceleration of the pendulum pivot is related to the torque applied on the rotary arm

\tau = m_rL_rA \quad \quad (49)

Additionally, from Equation 9 of the balance controller design section, we have

\tau = \frac{k_t}{R_m}\left( V_m - k_m\dot{\theta} \right ) \quad \quad (50)

Then, the voltage supplied to the rotary base motor is obtained by combining Equations 49 and 50 as

V_m = \frac{R_mm_rL_rA}{k_t} + k_m\dot{\theta} \quad \quad (51)

where, from Equation 48, A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha}))

The selected nonlinear control law will swing up the pendulum from the downward position towards the upright position. Once the pendulum is near the upright position, it is balanced around the fully upward position using the linear balance controller.

Combined Balance and Swing-up Control

The energy-based swing-up control can be combined with the balancing control in Equation 29 to obtain a control law that performs the dual tasks of swinging up the pendulum and balancing it. This can be accomplished by switching between the two control systems.

Basically, the same switching implemented for the balance control in Equation 30 is used. Only instead of feeding 0 V when the balance control is not enabled, the swing-up control is engaged. The controller therefore becomes

V_m = \begin{cases} K(x_d - x), & \text{if}\ |\alpha| < \epsilon \\ \displaystyle \frac{R_mm_rL_rA}{k_t} + k_m\dot{\theta}, & \text{otherwise} \end{cases} \quad \quad (52)

where A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha}))

The parameter \epsilon in Equation 52 is a user-selected range of \alpha over which the balance controller becomes active.
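
A compact MATLAB sketch of the combined controller in Equation 52 follows. Parameter names are illustrative, the saturation is written out explicitly, and the pendulum angle is assumed to be measured from the upright position in radians, as in the derivation above:

function Vm = swingUpBalance(x, xd, K, p)
% x = [theta; alpha; theta_dot; alpha_dot]; the struct p holds mp, Lp, Jp,
% mr, Lr, Rm, kt, km, g, mu, Amax, and the engagement threshold epsilon.
    alpha = x(2);  theta_dot = x(3);  alpha_dot = x(4);

    if abs(alpha) < p.epsilon
        Vm = K*(xd - x);                                     % balance control
    else
        E  = 0.5*p.Jp*alpha_dot^2 + 0.5*p.mp*p.g*p.Lp*(cos(alpha) + 1);
        Er = p.mp*p.Lp*p.g;                                  % Equations (39)-(40)
        s  = p.mu*(E - Er)*sign(alpha_dot*cos(alpha));
        A  = -max(min(s, p.Amax), -p.Amax);                  % Equation (48)
        Vm = p.Rm*p.mr*p.Lr*A/p.kt + p.km*theta_dot;         % Equation (51)
    end
end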

Experiment: Implementing the Swing up Control

  1. Run the Part D: Swing-Up Demonstration part of the rotpen_group.mlx script.

  2. The q_rotpen_swingup.slx Simulink model should open automatically.

  3. Check if the correct gain K value is loaded onto the workspace.

  4. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  5. Press the Connect button under Monitor & Tune and then press Start. The pendulum should begin moving back and forth slowly. Gradually increase \mu until the pendulum swings up; you may do this by increasing the gain slider. When the pendulum swings up to the vertical upright position, the balance controller should engage and balance the link.

  6. After the link swings up and is balanced, wait for about 5 seconds and then stop the SIMULINK model.

  7. OPTIONAL: Save the data_alpha, data_theta, and data_Vm. Ensure that the data variables have 10 seconds of data saved.

Results for Report

A) Modeling

  1. The linear state-space representation of the rotary inverted pendulum system, i.e., the A, B, C and D matrices (numerical values).

  2. Open-loop poles of the system.

B.2) Pole Placement Controller Design

  1. Chosen \zeta and \omega_n based on the design specifications.

  2. Corresponding locations of the two dominant poles p_1 and p_2.

  3. Gain vector K.

B.3) Pole Placement Controller Simulation

  1. Plots of the commanded position of the rotary arm (\theta_c) and the simulated responses of the rotary arm (\theta), pendulum (\alpha), and motor input voltage (V_m) generated using your obtained gain K.

  2. Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.

C.1) Pole Placement Controller Implementation

  1. From Step C.1.10, plots of the commanded position of the rotary arm (\theta_c) and the experimental responses of the rotary arm (\theta), pendulum (\alpha), and motor input voltage (V_m) generated using the chosen gain K.

  2. Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.

Questions for Report

A) Modeling

  1. Based on your open-loop poles found in Result A.2, is the system stable, marginally stable, or unstable?

  2. Did you expect the stability of the inverted pendulum system to be what you determined? Justify your answer.

B.2) Pole Placement Controller Design

For the questions below, calculations and intermediate steps must be shown.

  1. Determine the controllability matrix T of the system. Is the inverted pendulum system controllable? Hint: Use Equation 17.

  2. Using the open-loop poles, find the characteristic equation of A. Hint: The roots of the characteristic equation are the open-loop poles.

    Instead of using \mathrm{det}(sI - A) = 0, characteristic polynomials can also be found using the MATLAB function poly().

  3. Find the corresponding companion matrices \tilde{A} and \tilde{B}. Hint: For \tilde{A}, use the characteristic equation of A found in Question B.2.2 and Equation 18. For \tilde{B}, use Equation 19.

  4. Determine the controllability matrix \tilde{T} of the companion system.

  5. Determine the transformation matrix W.

  6. Check if \tilde{A} = W^{-1}AW and \tilde{B} = W^{-1}B with the obtained matrices.

  7. Using the locations of the two dominant poles, p_1 and p_2, based on the specifications (Result B.2.2), and the other poles at p_3 = -30 and p_4 = -40, determine the desired closed-loop characteristic equation. Hint: The roots of the closed-loop characteristic equation are the closed-loop poles.

  8. When applying the control u = -\tilde{K}x to the companion form, it changes (\tilde{A}, \tilde{B}) to (\tilde{A}-\tilde{B}\tilde{K}, \tilde{B}). Find the gain \tilde{K} that assigns the poles to their new desired locations. Hint: Use Equation 26 and find the corresponding characteristic equation. Compare this equation with the desired closed-loop characteristic equation found in Question B.2.7 to determine the gain vector \tilde{K}.

  9. Once you have found \tilde{K}, find K using Step 3 in Pole Placement Theory.

  10. Compare the gain vector K calculated using Pole Placement Theory (Question B.2.9) with the gain vector K obtained using MATLAB (Result B.2.3).

D) Swing-Up Demonstration

Briefly summarize the swing-up controller experiment and your observations. Did the swing-up control behave as you expected?

References

[1] Norman S. Nise. Control Systems Engineering. John Wiley & Sons, Inc., 2008.

[2] TBA.

[3] K. J. Åström and K. Furuta. "Swinging up a pendulum by energy control." Proceedings of the 13th IFAC World Congress, 1996.
