Rotary Inverted Pendulum + RL Demo

Objective

The objectives of this laboratory experiment are as follows:

  1. Obtain the linear state-space representation of the rotary pendulum system.

  2. Design a controller that balances the pendulum in its upright position using Pole Placement.

  3. Train a controller that balances the pendulum using Reinforcement Learning.

  4. Simulate the closed-loop system to ensure the given specifications are met.

  5. Implement the balance controller on the Quanser Rotary Pendulum system and evaluate its performance.

Equipment

QUBE-Servo 3 Rotary Inverted Pendulum Model

A picture of the QUBE-Servo 3 with the rotary pendulum module is shown in Fig. 1. The numbered components in Fig. 1 are listed in Table 1, and the numerical values of the main system parameters are given in Table 2.

Figure 1: QUBE-Servo 3 base with Rotary Pendulum module

Table 1: QUBE-Servo 3 components (Figure 1)

ID#   Component
1     Rotary servo
2     Rotary arm housing
3     Pendulum encoder
4     Rotary arm
5     Pendulum link

Table 2: QUBE-Servo 3 main parameters

Symbol   Description                                                         Value           Unit
m_p      Mass of pendulum                                                    0.024           kg
L_p      Total length of pendulum                                            0.129           m
J_p      Pendulum moment of inertia about center of mass                     3.3282 × 10⁻⁵   kg·m²
B_p      Pendulum viscous damping coefficient as seen at the pivot axis      5 × 10⁻⁵        N·m·s/rad
m_r      Mass of rotary arm                                                  0.095           kg
L_r      Rotary arm length from pivot to tip                                 0.085           m
J_r      Rotary arm moment of inertia about its center of mass               5.7198 × 10⁻⁵   kg·m²
B_r      Rotary arm viscous damping coefficient as seen at the pivot axis    0.001           N·m·s/rad
R_m      Motor armature resistance                                           8.4             Ω
k_t      Current-torque constant                                             0.0422          N·m/A
k_m      Back-emf constant                                                   0.0422          V·s/rad

A. Modeling

Model Convention

A simplified form of the inverted pendulum system used in the development of the mathematical model is shown in Figure 2. The rotary arm pivot is attached to the QUBE-Servo 3 system. The arm has a length of L_r, a moment of inertia of J_r, and its angle, \theta, increases positively when it rotates counterclockwise (CCW). The servo (and thus the arm) should turn in the CCW direction when the control voltage is positive, i.e., V_m > 0.

The pendulum link is connected to the end of the rotary arm. It has a total length of L_p and its center of mass is at its midpoint, L_p/2. The moment of inertia about its center of mass is J_p. The inverted pendulum angle, \alpha, is zero when it is perfectly upright in the vertical position and increases positively when rotated CCW.

Figure 2 Rotary inverted pendulum conventions

Nonlinear Equations of Motion

Instead of using classical (Newtonian) mechanics, the Lagrange method is used to find the equations of motion of the system. This systematic method is often used for more complicated systems such as robot manipulators with multiple joints.

Specifically, the equations that describe the motions of the rotary arm and the pendulum with respect to the servo motor voltage, i.e., the dynamics, are obtained using the Euler-Lagrange equation:

\frac{d}{dt}\frac{\partial L}{\partial \dot{q_i}} - \frac{\partial L}{\partial q_i} = Q_i \qquad \qquad \tag{1}

The variables q_i are called generalized coordinates, the variables Q_i are called generalized forces, and L is the Lagrangian (the difference between the kinetic and potential energies of the system). For this system let

q(t) =\begin{bmatrix} \theta(t) & \alpha(t) \end{bmatrix}^T \qquad \qquad \tag{2}

where, as shown in Figure 2, \theta(t) is the rotary arm angle and \alpha(t) is the inverted pendulum angle. The corresponding angular rates are

\dot{q}(t)=\begin{bmatrix} \displaystyle \frac{d\theta(t)}{dt} & \displaystyle\frac{d\alpha(t)}{dt} \end{bmatrix} ^T \qquad \qquad \tag{3}

With the generalized coordinates defined, the Euler-Lagrange equations for the rotary pendulum system are

\frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = Q_1 \qquad \qquad \tag{4a}
\frac{d}{dt}\frac{\partial L}{\partial \dot{\alpha}} - \frac{\partial L}{\partial \alpha} = Q_2 \qquad \qquad \tag{4b}

The Lagrangian of the system is described by

L = T - V \qquad \qquad \tag{5}

where T is the total kinetic energy of the system and V is the total potential energy of the system. Thus the Lagrangian is the difference between a system's kinetic and potential energies.

The generalized forces Q_i are used to describe the nonconservative forces (e.g., friction) applied to a system with respect to the generalized coordinates. In this case, the generalized force acting on the rotary arm is

Q_1 = \tau - B_\mathrm{r}\dot{\theta} \qquad \qquad \tag{6a}

and acting on the pendulum is

Q_2 = -B_\mathrm{p}\dot{\alpha} \qquad \qquad\tag{6b}

Our control variable is the input servo motor voltage, V_m. Opposing the applied torque is the viscous friction torque, or viscous damping, corresponding to the term B_\mathrm{r}. Since the pendulum is not actuated, the only force acting on the link is the damping. The viscous damping coefficient of the pendulum is denoted by B_\mathrm{p}.

Once expressions for the kinetic and potential energy are obtained and the Lagrangian is found, the remaining task is to compute the various derivatives to obtain the equations of motion (EOMs). After going through this process, the nonlinear equations of motion for the Rotary Pendulum are:

\displaystyle\left (m_pL_r^2 + \frac{1}{4}m_pL_p^2 - \frac{1}{4}m_pL_p^2 \cos^2(\alpha)+J_r \right )\ddot{\theta} - \left (\frac{1}{2}m_pL_pL_r\cos(\alpha) \right)\ddot{\alpha} \\+ \displaystyle\left(\frac{1}{2}m_pL_p^2\sin(\alpha)\cos(\alpha)\right)\dot{\theta}\dot{\alpha} + \left( \frac{1}{2}m_pL_pL_r\sin(\alpha)\right)\dot{\alpha}^2 = \tau - B_r\dot{\theta} \qquad \qquad \tag{7}
\displaystyle{\left(-\frac{1}{2}m_pL_pL_r\cos(\alpha) \right)\ddot{\theta} + \left(J_p + \frac{1}{4}m_pL_p^2 \right) \ddot{\alpha} - \left(\frac{1}{4} m_pL_p^2\cos(\alpha)\sin(\alpha)\right)\dot{\theta}^2} \\ -\frac{1}{2} m_pL_pg\sin(\alpha) = -B_p\dot{\alpha} \qquad \qquad \tag{8}

The torque \tau applied at the base of the rotary arm (i.e., at the load gear) in equation (7) is generated by the servo motor, as described by equation (9). Refer to Table 2 for the rotary servo parameters.

\tau = \displaystyle\frac{k_t(V_m - k_m\dot{\theta})}{R_m} \qquad \qquad \tag{9}

Linearization

Linearization of a nonlinear function about a selected point is obtained by retaining only the terms up to first order in the Taylor series expansion of the function about that point. For example, the linearization of a nonlinear function f(z) of two variables, where

z = \begin {bmatrix} z_1 & z_2\end {bmatrix}^T

about the point

z_0 = \begin {bmatrix} a & b\end {bmatrix}^T

can be written as

f_{lin}(z) = f(z_0) + \frac{\partial f(z)}{\partial z_1}\Bigr|_{z = z_0} (z_1 - a)+\frac{\partial f(z)}{\partial z_2}\Bigr|_{z=z_0}(z_2 - b)
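
As a brief illustrative example (not part of the pendulum model), linearizing f(z) = \sin(z_1)\cos(z_2) about z_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T gives f(z_0) = 0, \frac{\partial f}{\partial z_1}\bigr|_{z_0} = 1, and \frac{\partial f}{\partial z_2}\bigr|_{z_0} = 0, so f_{lin}(z) = z_1.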

Linearization of Inverted Pendulum Equations

The nonlinear equations of the inverted pendulum system obtained as equations (7) and (8) are linearized about the equilibrium point with the pendulum in the upright position, i.e.,

\theta_o = 0, \alpha_o = 0, \dot{\theta}_o = 0, \dot{\alpha}_o = 0, \ddot{\theta}_o = 0, \ddot{\alpha}_o = 0

Linearization of Equation (7) about the above equilibrium state gives

(J_r + m_pL_r^2)\ddot{\theta} - \frac{1}{2}m_pL_pL_r\ddot{\alpha} = kV_m-b\dot{\theta} - B_r\dot{\theta} \quad\quad (10)

where, from equation (9), the servo motor torque coefficient k is

k = \frac{k_t}{R_m}

and the back-emf coefficient b is

b = \frac{k_tk_m}{R_m}

Likewise, linearization of equation (8) about the equilibrium point with the pendulum in the upright position gives

-\frac{1}{2}m_pL_pL_r\ddot{\theta} + (J_p + \frac{1}{4}m_pL_p^2)\ddot{\alpha} - \frac{1}{2}m_pL_pg\alpha= -B_p\dot{\alpha} \quad \quad (11)

where g is the acceleration due to gravity. Note that the negative sign of the gravity torque (the \alpha term) in equation (11) indicates negative stiffness when the pendulum is in the vertically upright position.

Equations (10) and (11) can be arranged in the matrix form as

[M]\begin{bmatrix}\ddot{\theta} \\ \ddot{\alpha} \end{bmatrix} +[D] \begin{bmatrix}\dot{\theta} \\ \dot{\alpha} \end{bmatrix} + [K] \begin{bmatrix}\theta \\ \alpha \end{bmatrix} = \begin{bmatrix}k \\ 0 \end{bmatrix}V_m \quad \quad (12)

where the mass matrix M, damping matrix D, and stiffness matrix K are

[M] = \begin{bmatrix} (J_r + m_pL_r^2) & -\frac{1}{2}m_pL_pL_r \\ -\frac{1}{2}m_pL_pL_r & (J_p + \frac{1}{4}m_pL_p^2) \end{bmatrix}
[D] = \begin{bmatrix} b+B_r & 0 \\ 0 & B_p \end{bmatrix}
[K] = \begin{bmatrix} 0 & 0 \\ 0 & -\frac{1}{2}m_pL_pg \end{bmatrix}

Defining the state vector x, output vector y, and the control input u as

x = \begin{bmatrix} \theta & \alpha & \dot{\theta} & \dot{\alpha}\end{bmatrix}^T , \quad y = \begin{bmatrix} \theta & \alpha \end{bmatrix}^T, \quad u = V_m

Equation (12) can be rewritten in state-space form as

\dot{x} = Ax + Bu \quad \quad (13)
y = Cx + Du \quad \quad (14)

where the system state matrix A, control matrix B, output matrix C, and the input-to-output feedthrough matrix D become

A = \begin{bmatrix} O_{2\times2} & I_{2\times2} \\ -[M]^{-1}[K] & -[M]^{-1}[D] \end{bmatrix}
B = \begin{bmatrix} O_{2\times1} \\ M^{-1} \begin{bmatrix} k \\0 \end{bmatrix} \end{bmatrix}
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
D = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

In the equations above, note that O_{2\times2} is a 2 \times 2 matrix of zeros, O_{2\times1} is a 2 \times 1 matrix of zeros, and I_{2\times2} is a 2 \times 2 identity matrix. Using the parameters of the system listed in Table 2, the linear model matrices A and B can be computed.
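
For reference, a minimal MATLAB sketch of this computation is shown below, using the values from Table 2; the variable names are illustrative and may differ from those used in rotpen_student.mlx.

% Parameters from Table 2 (SI units)
mp = 0.024;  Lp = 0.129;  Jp = 3.3282e-5;  Bp = 5e-5;
mr = 0.095;  Lr = 0.085;  Jr = 5.7198e-5;  Br = 1e-3;
Rm = 8.4;    kt = 0.0422; km = 0.0422;     g  = 9.81;

% Actuator constants from equation (9): k = kt/Rm, b = kt*km/Rm
k = kt/Rm;
b = kt*km/Rm;

% Mass, damping, and stiffness matrices from equation (12)
M    = [Jr + mp*Lr^2,  -0.5*mp*Lp*Lr;
        -0.5*mp*Lp*Lr,  Jp + 0.25*mp*Lp^2];
Dmat = [b + Br, 0; 0, Bp];            % damping matrix (named Dmat to keep D free)
Kst  = [0, 0; 0, -0.5*mp*Lp*g];       % stiffness matrix (named Kst to keep K free)

% State-space matrices, equations (13)-(14)
A = [zeros(2), eye(2); -(M\Kst), -(M\Dmat)];
B = [zeros(2,1); M\[k; 0]];
C = [1 0 0 0; 0 1 0 0];
D = [0; 0];

olPoles = eig(A)                      % open-loop poles (Analysis: Modeling, step 4)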

Analysis: Modeling

  1. Download the Lab4_Indv.zip file to your personal device and extract the folder contents.

  2. Open the rotpen_student.mlx live script and run the Modeling section. It will automatically load the parameters required for the state-space representation and subsequently generate the A, B, C and D matrices required for the upcoming analysis. Please refer to Table 2 for additional information regarding the parameters. Note: The representative C and D matrices have already been included. The actuator dynamics have been added to convert your state-space matrices to be in terms of voltage. Recall that the input of the state-space model is the torque acting at the servo load gear; however, we control the servo input voltage rather than the torque directly. The script uses the voltage-torque relationship given in Equation 9 to transform torque to voltage.

  3. Note down your state-space matrices for your report. Note: You may want to cross-check the state-space matrix with TAs before proceeding to balance control.

  4. Find the open-loop poles of the system. Hint: Use eig(A).

B. Balance Control

Specification

The control design and time-response requirements are:

  • Specification 1: Damping ratio: 0.6 < \zeta < 0.8

  • Specification 2: Natural frequency: 3.5 rad/s < \omega_n < 4.5 rad/s

  • Specification 3: Maximum pendulum angle deflection: |\alpha| < 15 deg

  • Specification 4: Maximum control effort / voltage: |V_m| < 10 V

The necessary closed-loop poles are found from Specifications 1 and 2. The pendulum deflection and control effort requirements (i.e., Specifications 3 and 4) are to be satisfied while the rotary arm is tracking a \pm 20 degree angle square wave.

Stability

The stability of a system can be determined from its poles ([2]):

  • Stable systems have poles only in the left-half of the complex plane.

  • Unstable systems have at least one pole in the right-half of the complex plane and/or poles of multiplicity greater than 1 on the imaginary axis.

  • Marginally stable systems have non-repeated poles on the imaginary axis and all remaining poles in the left-half of the complex plane.

The poles are the roots of the system’s characteristic equation. From the state-space, the characteristic equation of the system can be found using

\mathrm{det}(sI - A) = 0 \qquad \qquad\tag{15}

where \mathrm{det}() is the determinant of a matrix, s is the Laplace operator, and I is the identity matrix. The roots of the characteristic equation are the eigenvalues of the system matrix A.

Controllability

If the control input u of a system can take each state variable, x_i where i = 1 \ldots n, from an initial state to a desired final state in finite time, then the system is controllable; otherwise it is uncontrollable ([1]).

Rank Test: The system is controllable if the rank of its controllability matrix

T = [B \ AB \ A^2B \ ... \ A^{n-1}B] \qquad \qquad \tag{16}

equals the number of states in the system, i.e.

\mathrm{rank}(T) = n \qquad \qquad \tag{17}
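
In MATLAB, the rank test can be carried out with the Control System Toolbox functions ctrb and rank; a brief sketch, assuming A and B are already in the workspace:

T = ctrb(A, B);                  % controllability matrix [B AB A^2B A^3B]
n = size(A, 1);
isControllable = (rank(T) == n)  % true (1) if Equation (17) holds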

Companion Matrix

For a controllable system with an n \times n system matrix A and an n \times 1 control matrix B, the companion matrices of A and B are

\tilde{A} = \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_1 & -a_2 & -a_3 & \dots & -a_n \end{bmatrix} \qquad \qquad \tag{18}

and

\tilde{B} = \begin{bmatrix} 0\\ \vdots\\ 0\\ 1 \end{bmatrix} \qquad \qquad \tag{19}

where a_1, a_2, \ldots, a_n are the coefficients of the characteristic equation of the system matrix A, written as

s^n + a_ns^{n-1} + a_{n-1}s^{n-2} + ...+a_2s + a_1 = 0

Now define W,

W = T\tilde{T}^{-1} \qquad \qquad \tag{20}

where T is the controllability matrix defined in Equation 16 and

\tilde{T} = \begin{bmatrix}\tilde{B} & \tilde{A}\tilde{B} & \cdots & \tilde{A}^{n-1}\tilde{B}\end{bmatrix} \qquad \qquad \tag{21}

Then

W^{-1}AW = \tilde{A} \qquad \qquad \tag{22}

and

W^{-1}B = \tilde{B} \qquad \qquad \tag{23}
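
A hedged MATLAB sketch of Equations (18)-(23) is given below. Note that poly(A) returns the characteristic polynomial coefficients in descending powers of s, so they must be reordered to match the a_1, ..., a_n convention used above; variable names are illustrative.

n      = size(A, 1);
coeffs = poly(A);                 % [1  a_n  a_(n-1)  ...  a_1] (descending powers of s)
a      = fliplr(coeffs(2:end));   % reorder to [a_1  a_2  ...  a_n]

% Companion matrices, Equations (18)-(19)
Atil = [zeros(n-1,1), eye(n-1); -a];
Btil = [zeros(n-1,1); 1];

% Transformation matrix, Equations (20)-(21)
Ttil = ctrb(Atil, Btil);
W    = ctrb(A, B) / Ttil;         % W = T * inv(Ttil)

% Sanity checks, Equations (22)-(23): both norms should be numerically small
norm(W\A*W - Atil)
norm(W\B - Btil)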

Pole Placement Theory

If (A,B) is controllable, then pole placement can be used to design the controller. Given the control law u = -Kx, the state-space model of equation (13) becomes

\dot{x} = Ax + B(-Kx) =(A - BK)x \qquad \qquad \tag{25}

We can generalize the procedure to design a gain K for a controllable (A,B) system as follows:

Step 1: Find the companion matrices \tilde{A} and \tilde{B}. Compute W = T\tilde{T}^{-1}.

Step 2: Compute \tilde{K} to assign the poles of \tilde{A} - \tilde{B}\tilde{K} to the desired locations.

\tilde{A} - \tilde{B}\tilde{K}= \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_1-\tilde{k}_1 & -a_2-\tilde{k}_2 & -a_3-\tilde{k}_3 & \dots & -a_n -\tilde{k}_n \end{bmatrix} \qquad \qquad \tag{26}

Step 3: Find K = \tilde{K}W^{-1} to get the feedback gain for the original system (A,B).

Remark 1: It is important to do the \tilde{K} \rightarrow K conversion. Remember that (A,B) represents the actual system, while the companion matrices \tilde{A} and \tilde{B} do not.

Remark 2: The entire control design procedure using the pole placement method can be done in MATLAB using the function 'place' or 'acker'. For a selected set of desired closed-loop poles DP, the full state feedback gain matrix K is obtained from

>> K = acker(A,B,DP);

Desired Poles

The rotary inverted pendulum system has four poles. As depicted in Figure 3, poles p_1 and p_2 are the complex conjugate dominant poles and are chosen to satisfy the natural frequency, \omega_n, and damping ratio, \zeta, given in the specifications. Let the conjugate poles be

p_1 = -\sigma + j\omega_d \qquad \qquad \tag{27a}

and

p_2 = -\sigma - j\omega_d \qquad \qquad \tag{27b}

where \sigma = \zeta\omega_n and \omega_d = \omega_n \sqrt{1-\zeta^2} is the damped frequency. The remaining closed-loop poles, p_3 and p_4, are placed along the real axis to the left of the dominant poles, as shown in Figure 3.

Figure 3 Desired closed-loop pole locations
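
A short sketch of the pole selection in MATLAB, using example values that satisfy Specifications 1 and 2 (the lab prescribes p3 = -30 and p4 = -40 in Experiment B.2):

zeta = 0.7;  wn = 4;                 % example values within the specification
sigma = zeta*wn;
wd    = wn*sqrt(1 - zeta^2);         % damped frequency, Equation (27)
p1 = -sigma + 1j*wd;
p2 = -sigma - 1j*wd;
p3 = -30;  p4 = -40;                 % as prescribed in Experiment B.2
DP = [p1, p2, p3, p4];
K  = place(A, B, DP)                 % or: K = acker(A, B, DP)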

Simulation Model with Feedback

The feedback control loop that balances the rotary pendulum is illustrated in Figure 4. The reference state is defined as

x_d = [\theta_d \ 0\ 0\ 0]^ \intercal \qquad \qquad \tag{28}

where \theta_d is the desired rotary arm angle. The controller is

u = K(x_d - x) \qquad \qquad \tag{29}

Note that if x_d = 0 then u = -Kx, which is the control law used in the pole-placement algorithm.

Figure 4 State-feedback control loop

When running this on the actual system, the pendulum begins in the hanging, downward position. We only want the balance control to be enabled when the pendulum is brought up around its upright vertical position. The controller is therefore

u = \begin{cases} K(x_d - x) & |x_2| <\epsilon \\ 0 & \text{otherwise} \end{cases} \qquad \qquad \tag{30}

where \epsilon is the angle threshold about the upright position within which the controller engages, and x_2 is the pendulum angle. For example, if \epsilon = 10 degrees, then the control will begin when the pendulum is within ±10 degrees of its upright position, i.e., when |x_2| < 10 degrees.
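
The switching logic of Equation 30 can be written as a small MATLAB function; a sketch, assuming the state ordering x = [theta; alpha; theta_dot; alpha_dot] and that x(2) and epsilon are expressed in the same angular units:

function u = balanceSwitch(x, xd, K, epsilon)
% Equation (30): engage state feedback only near the upright position.
% x(2) is the pendulum angle; epsilon is the engagement threshold.
    if abs(x(2)) < epsilon
        u = K*(xd - x);
    else
        u = 0;
    end
end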

Reinforcement Learning

Fundamentals

Although we have found a sufficient model for the Inverted Pendulum system, there are often systems where the plant dynamics are difficult or impossible to model. In this case, we look to model-free control.

Reinforcement learning algorithms, many of which are model-free, seek to train an agent. The agent receives observations of the plant system in addition to feedback from a reward function, and outputs a corresponding action according to its policy. The agent may be trained on the physical system in a series of discrete training episodes. Following each episode, the agent updates its policy according to the specific reinforcement learning algorithm in use. A summary of the reinforcement learning training loop is shown in Fig. 5.

Figure 5. Reinforcement learning training loop (Mathworks 2025) [2]

Following training, the agent may be deployed on the system, ideally achieving the desired behavior. The particular reinforcement learning algorithm used in this lab is the deep deterministic policy gradient (DDPG) algorithm, a model-free actor-critic algorithm. For more information about the DDPG algorithm in particular, please visit the corresponding Mathworks article.
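
For orientation only, a rough sketch of creating a default DDPG agent with the MATLAB Reinforcement Learning Toolbox is shown below. The observation dimension, action limits, and training options here are assumptions made for illustration; the rotpen_group.mlx script is the authoritative setup for this lab.

% Assumed: the agent observes the four states and outputs one motor voltage.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -5, 'UpperLimit', 5);

agent = rlDDPGAgent(obsInfo, actInfo);        % default actor and critic networks

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 500, ...
    'StopTrainingCriteria', 'EpisodeCount', ...
    'StopTrainingValue', 1000);
% env = rlSimulinkEnv(...);                   % environment is created by the lab script
% trainingStats = train(agent, env, trainOpts);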

Reinforcement Learning for the Inverted Pendulum System

To implement the DDPG algorithm, we need a reward function, which evaluates the observed performance of the system. A good reward function promotes behavior we desire in the system and penalizes behavior we want to eliminate. In this case, we seek to command the system to \theta =0,\,\alpha=0,\,\dot{\theta}=0,\,\dot{\alpha} =0. Additionally, we especially want to avoid the servo angle \theta exceeding the physical limit of the system at \theta_{\text{max, actual}} = 90\degree (with a margin of error such that \theta_{\text{max}} = 60\degree), the pendulum angle \alpha exceeding the balance control limit of \alpha_{\text{max}}=\epsilon=12\degree, and the input voltage exceeding the limit of u_{\text{max}}=5\,\text{V}. Thus, we use the following quadratic reward function

r = -(q_{11}\theta^2+q_{22}\alpha^2+q_{33}\dot{\theta}^2+q_{44}\dot{\alpha}^2+r_{11}u^2)+B(|\theta|>\theta_{\text{max}} \,OR\,|\alpha|>\alpha_{\text{max}}\,OR\,|u|>u_{\text{max}}),\qquad \qquad\quad \tag{31}

where q_{ii}, r_{ii}, and B are weights. This function covers all of the desired behavior we seek from the system. By default, the weights are q_{33}=0<r_{11}=0.1<q_{44}=1<q_{11}=10<q_{22}=20, and B=-100, reflecting the relative importance of each state.
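
A sketch of Equation 31 as a MATLAB function, using the default weights quoted above; the state ordering and the use of radians for the angle limits are assumptions:

function r = pendulumReward(x, u)
% x = [theta; alpha; theta_dot; alpha_dot], u = motor voltage (V)
    q   = [10, 20, 0, 1];  r11 = 0.1;  B = -100;      % default weights from the text
    thetaMax = deg2rad(60);  alphaMax = deg2rad(12);  uMax = 5;

    r = -(q(1)*x(1)^2 + q(2)*x(2)^2 + q(3)*x(3)^2 + q(4)*x(4)^2 + r11*u^2);
    if abs(x(1)) > thetaMax || abs(x(2)) > alphaMax || abs(u) > uMax
        r = r + B;                                    % B is negative: large penalty
    end
end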

For a model-free application, an engineer would typically train the reinforcement learning model on the physical plant system or on a digital twin of the plant. However, doing this during a 2.5-hour lab session is impractical, as it would require repeatedly raising the pendulum for 1000 training episodes. Consequently, we train our reinforcement learning agent in simulation on the state-space model we developed in Part A.

B.1 Experiment: Training the Reinforcement Learning Controller

  1. Download the Lab4_Group.zip file to the lab PC and extract the folder contents.

  2. Open the rotpen_group.mlx live script and run the Initialization section. This section generates all necessary parameters for the reinforcement learning training. Ensure that the drop down menu is configured to 'train new agent.'

  3. Run the Observation and Action Signals section. This section creates the learning environment for the reinforcement learning agent, configures the observation signal read by the agent, and configures the action signal generated by the agent.

  4. Run the Create DDPG Agent section. This section creates and configures the critic and actor neural networks, and compiles these networks into a single agent.

  5. Run the Train/Load Agent section. This section begins the training session. By default, the training session is configured to contain 1000 episodes with a random starting pendulum angle between -20 and 20 degrees.

  6. The s_rotpen_train_rl.slx Simulink diagram should automatically open, in addition to a Reinforcement Learning Training Monitor window. Open the 'alpha' and 'theta' scopes and configure your monitor so you can clearly see all three windows, as in Fig. 6. You should now be able to watch as the reinforcement learning agent trains on your state space model.

  7. Wait for the training to complete. This should take 20-30 minutes. Ensure that the main MATLAB window is open and visible to avoid slowdown due to CPU allocation. Proceed to the next section to continue your pole placement control design and simulation while the reinforcement learning training continues in the background. On training completion, your trained agent will be saved to the current directory as agent_student.mat.

Figure 6. Screen configuration for reinforcement learning training.

B.2 Experiment: Designing the Pole Placement Controller

  1. Select \zeta and \omega_n from the table below. Be sure to choose a different set of values from your groupmates.

    Parameter set     1      2      3      4      5      6      7      8
    ω_n (rad/s)       3.75   4      4.25   3.75   4.25   3.75   4      4.25
    ζ                 0.65   0.65   0.65   0.7    0.7    0.75   0.75   0.75
  2. In the rotpen_student.mlx live script open on your personal device, go to the Pole Placement Balance Control section. Enter the chosen zeta and omega_n values.

  3. Determine the locations of the two dominant poles p1p_1 and p2p_2 based on the specifications and enter their values in the MATLAB live script. Ensure that the other poles are placed at p3 = -30 and p4 = -40. Hint: Use equation 27.

  4. Find gain K using a predefined Compensator Design MATLAB command K = acker(A,B,DP), which is based on pole-placement design. Note: DP is a row vector of the desired poles found in Step 3.

B.3 Experiment: Simulating the Pole Placement Controller

The s_rotpen_bal.slx SIMULINK diagram shown in Fig. 7 is used to simulate the closed-loop response of the Rotary Pendulum using the state-feedback control described in Balance Control with the control gain K found above. The Signal Generator block generates a 0.1 Hz square wave (with an amplitude of 1). The Amplitude (deg) gain block is used to change the desired rotary arm position. The state-feedback gain K is set in the Control Gain gain block and is read from the MATLAB workspace. The SIMULINK State-Space block reads the A, B, C, and D state-space matrices that are loaded in the MATLAB workspace. The Find State X block contains high-pass filters to find the angular rates of the rotary arm and pendulum.

Figure 7: s_rotpen_bal SIMULINK diagram used to simulate the state-feedback control
  1. Ensure you have run the Pole Placement Balance Control section of the rotpen_student.mlx live script. Ensure the gain K you found is loaded in the workspace (type K in the Command Window to check).

  2. Open and run the s_rotpen_bal.slx Simulink model for 10 seconds. The responses in the scopes shown in Fig. 8 were generated using an arbitrary feedback control gain. Note: When the simulation stops, the last 10 seconds of data is automatically saved in the MATLAB workspace to the variables data_theta, data_alpha, and data_Vm.

    Figure 8: Balance control simulation
  3. Save the data corresponding to the simulated response of the rotary arm, pendulum, and motor input voltage obtained using your gain K. Note: The time is stored in the data_theta(:,1) vector, and the desired and measured rotary arm angles are saved in the data_theta(:,2) and data_theta(:,3) arrays, respectively. Similarly, the pendulum angle is stored in the data_alpha(:,2) vector, and the control input is in the data_Vm(:,2) array.

  4. Measure the maximum pendulum deflection and voltage used (a short sketch for extracting these from the saved data follows this list). Are the given specifications satisfied?
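
One way to check Specifications 3 and 4 from the saved workspace variables is sketched below; it assumes the angles are logged in radians (drop the conversion if they are logged in degrees):

maxAlpha = max(abs(data_alpha(:,2))) * 180/pi;   % maximum pendulum deflection (deg)
maxVm    = max(abs(data_Vm(:,2)));               % maximum motor input voltage (V)
fprintf('Max pendulum deflection: %.1f deg (Spec 3: < 15 deg)\n', maxAlpha);
fprintf('Max motor voltage: %.1f V (Spec 4: < 10 V)\n', maxVm);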

C. Controller Implementation

C.1 Experiment: Implementing the Pole Placement Controller

In this section, the state-feedback control that was designed and simulated in the previous sections is run on the actual Rotary Pendulum device.

Experiment Setup

The q_rotpen_bal SIMULINK diagram shown in Fig. 9 is used to run the state-feedback control on the Quanser Rotary Pendulum system. The Rotary Pendulum Interface subsystem contains QUARC blocks that interface with the DC motor and sensors of the system. The feedback developed in the previous section is implemented using a Simulink Gain block.

Figure 9. q_rotpen_bal_student SIMULINK diagram can be used to run balance controller
  1. Continue in the rotpen_group.mlx script.

  2. Go to the Implement Pole Placement on Hardware section and enter the gain K you found in Step 4 of the Designing the Pole Placement Controller experiment (B.2).

  3. Run the section. The q_rotpen_bal.slx SIMULINK diagram should open automatically.

  4. As shown in Figure 9, the SIMULINK diagram is incomplete. Add the necessary blocks from the Simulink library to implement the balance control.

    • You need to add switch logic to implement Equation 30. Use a Multi-port Switch with 2 data ports and the zero-based contiguous setting. The output from the Compare To Constant block will be 0 if false and 1 if true. Check your block with your TA.

    • Ensure that the final signal going into the u (V) terminal of the Rotary Pendulum Interface is also connected to the u scope terminal.

  5. Turn ON the QUBE-Servo 3.

  6. Ensure the pendulum is in the hanging down position, with the rotary arm aligned with the 0 marking, and is motionless.

  7. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  8. Press the Connect button under Monitor & Tune and then press Start.

  9. Once it is running, manually bring up the pendulum to its upright vertical position. You should feel the voltage kick-in when it is within the range where the balance control engages. Once it is balanced, the controller will introduce the ±20 degree rotary arm rotation.

  10. The response should look similar to your simulation. Once you have obtained a response, click on the STOP button to stop the controller (data is saved for the last 10 seconds, so stop SIMULINK around 18-19 seconds once the response looks similar to Fig. 10).

  11. Similar to the simulation Simulink model, the response data will be saved to the workspace. Copy and paste into your group's folder. Ensure that the data variables have 10 seconds of data saved.

Figure 10. Experiment balance control response example

C.2 Experiment: Implementing the Reinforcement Learning Controller

  1. Proceed to the Implement Reinforcement Learning on Hardware section of the rotpen_group.mlx script.

  2. Ensure the 'doPolicy' drop down box is set to 'true'.

  3. Run the section. The q_rotpen_rl_student.slx Simulink diagram should open automatically.

  4. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  5. Press the Connect button under Monitor & Tune and then press Start.

  6. Once it is running, manually bring up the pendulum to its upright vertical position, as you did in the previous section. You may need to try this a few times.

  7. Observe the behavior of your controller. It is possible that your reinforcement learning controller is unable to balance the pendulum, or, if it is, the control is not very robust. This is OK! These are the natural consequences of Reinforcement Learning based control.

D. Swing-Up Demonstration

In this section a nonlinear, energy-based control scheme is developed to swing the pendulum up from its hanging, downward position. The swing-up control described herein is based on the strategy outlined in [3]. Once upright, the control developed to balance the pendulum in the upright vertical position can be used.

Pendulum Dynamics

The dynamics of the pendulum can be redefined in terms of the pivot acceleration A (see Fig. 11) as

J_p \ddot{\alpha} - \frac{1}{2}m_pgL_p\sin{\alpha} = \frac{1}{2}m_pL_pA\cos{\alpha} \quad \quad (32)
Figure 11. Force vector diagram

The pivot acceleration, A, is the linear acceleration of the pendulum link base. The acceleration is proportional to the torque of the rotary arm and is expressed as

\tau = m_rL_rA \quad \quad (33)

Control Law based on Lyapunov Function​

According to Lyapunov's stability theory, a sufficient condition for asymptotic stability of a nonlinear system about an equilibrium point is that the first time derivative of a selected Lyapunov function, V(x), is negative, i.e.,

Given

V(x) > 0 \qquad \forall~x \neq 0 \quad \quad (34)

a sufficient condition for asymptotic stability is

\dot{V}(x) < 0 \qquad \forall~x \neq 0 \quad \quad (35)

Swing-up Control

Let us select, as a candidate Lyapunov function for arriving at the control law, a quadratic function of the difference between the total energy (E) and the reference energy (E_r) of the pendulum in equilibrium in the upright position, i.e.,

V = \frac{1}{2}\left( E-E_r \right )^2 \quad \quad (36)

where the total energy (E) is the sum of the kinetic energy E_{KE} and the potential energy E_{PE}.

E_{KE} = \frac{1}{2}J_p{\dot{\alpha}}^2 \quad \quad (37)
E_{PE} = \frac{1}{2}m_pgL_p(\cos{\alpha} +1) \quad \quad (38)
E = E_{KE} + E_{PE} = \frac{1}{2}J_p{\dot{\alpha}}^2 + \frac{1}{2}m_pgL_p(\cos{\alpha} +1) \quad \quad (39)

Also, the reference energy of the pendulum, i.e., its energy in equilibrium in the fully upright position measured relative to the fully downward position, becomes

E_r = m_pL_pg \quad \quad (40)

Taking the time derivative of Equation 36, we get

\dot{V} = \left( E-E_r \right ) \dot{E} \quad \quad (41)

Taking the time derivative of Equation 39, we get

\dot{E} = \dot{\alpha}\left( J_p \ddot{\alpha} - \frac{1}{2}m_pL_pg\sin{\alpha} \right ) \quad \quad (42)

Now, we replace the bracketed term on the right-hand side of Equation 42 using the equation of motion of the pendulum obtained in Equation 32 to get

\dot{E} = \frac{1}{2}m_pL_pA\dot{\alpha}\cos{\alpha} \quad \quad (43)

Substituting Equation 43 into Equation 41, the time rate of change of the selected Lyapunov function becomes

\dot{V} = \left( E - E_r \right) \dot{E} = \frac{1}{2}m_pL_p \left( E - E_r \right)A \dot{\alpha}\cos{\alpha} \quad \quad (44)

Now, we need to select A such that \dot{V}<0 for asymptotic stability. This can easily be achieved by selecting A as

A = -\left( E - E_r \right) \dot{\alpha}\cos{\alpha} \quad \quad (45)

With the above selection of the control law for the pivot acceleration, Equation 44 becomes

\dot{V} = -\frac{1}{2}m_pL_p \left [ \left( E - E_r \right) \dot{\alpha}\cos{\alpha} \right ]^2 \quad \quad (46)

which guarantees \dot{V} \le 0 (strictly negative whenever \left( E - E_r \right)\dot{\alpha}\cos{\alpha} \neq 0).

The selected control law (Equation 45) will continuously decrease the difference between the current energy (E) and the energy of the pendulum in the vertically upright position (E_r). Note that the selected control law is nonlinear; it changes sign for 90^{\circ}<\alpha<270^{\circ} and when \dot{\alpha}<0.

Now, for the quickest change in energy, we may want to use the maximum controller input (acceleration of the pivot), i.e.,

A = -A_{\text{max}}\text{sign}\left [ \left( E - E_r \right) \dot{\alpha}\cos{\alpha} \right ] \quad \quad (47)

but this controller can lead to chattering. Instead, we use

A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha})) \quad \quad (48)

where \mu is a tunable controller gain.

Recall that the acceleration of the pendulum pivot is related to the torque applied on the rotary arm

\tau = m_rL_rA \quad \quad (49)

Additionally, from Equation 9 of the balance controller design section, we have

\tau = \frac{k_t}{R_m}\left( V_m - k_m\dot{\theta} \right ) \quad \quad (50)

Then, the voltage supplied to the rotary base motor is obtained by combining Equations 49 and 50 as

V_m = \frac{R_mm_rL_rA}{k_t} + k_m\dot{\theta} \quad \quad (51)

where, from Equation 48, A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha}))

The selected nonlinear control law will swing up the pendulum from the downward position towards the upright position. Once the pendulum is near the upright position, it is balanced around the fully upward position using the linear balance controller.

Combined Balance and Swing-up Control

The energy-based swing-up control can be combined with the balancing control in Equation 29 to obtain a control law that performs the dual tasks of swinging up the pendulum and balancing it. This can be accomplished by switching between the two control systems.

Basically, the same switching implemented for the balance control in Equation 30 is used. Only instead of feeding 0 V when the balance control is not enabled, the swing-up control is engaged. The controller therefore becomes

V_m = \begin{cases} K(x_d - x), & \text{if}\ |\alpha| < \epsilon \\ \displaystyle \frac{R_mm_rL_rA}{k_t} + k_m\dot{\theta}, & \text{otherwise} \end{cases} \quad \quad (52)

where A = -\text{sat}_{A, \text{max}}(\mu \left( E-E_r \right ) \text{sign}(\dot{\alpha}\cos{\alpha}))

The parameter \epsilon in Equation 52 is a user-selected range of \alpha over which the balance controller becomes active.
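
A compact MATLAB sketch of the combined controller in Equation 52 follows. Parameter names are illustrative, the saturation is written out explicitly, and the pendulum angle is assumed to be measured from the upright position in radians, as in the derivation above:

function Vm = swingUpBalance(x, xd, K, p)
% x = [theta; alpha; theta_dot; alpha_dot]; the struct p holds mp, Lp, Jp,
% mr, Lr, Rm, kt, km, g, mu, Amax, and the engagement threshold epsilon.
    alpha = x(2);  theta_dot = x(3);  alpha_dot = x(4);

    if abs(alpha) < p.epsilon
        Vm = K*(xd - x);                                     % balance control
    else
        E  = 0.5*p.Jp*alpha_dot^2 + 0.5*p.mp*p.g*p.Lp*(cos(alpha) + 1);
        Er = p.mp*p.Lp*p.g;                                  % Equations (39)-(40)
        s  = p.mu*(E - Er)*sign(alpha_dot*cos(alpha));
        A  = -max(min(s, p.Amax), -p.Amax);                  % Equation (48)
        Vm = p.Rm*p.mr*p.Lr*A/p.kt + p.km*theta_dot;         % Equation (51)
    end
end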

Experiment: Implementing the Swing up Control

  1. Run the Part D: Swing-Up Demonstration part of the rotpen_group.mlx script.

  2. The q_rotpen_swingup.slx Simulink model should open automatically.

  3. Check if the correct gain K value is loaded onto the workspace.

  4. To build the model, click the down arrow on Monitor & Tune under the Hardware tab and then click Build for monitoring. This generates the controller code.

  5. Press the Connect button under Monitor & Tune and then press Start. The pendulum should begin moving back and forth slowly. Gradually increase \mu until the pendulum swings up; you may do this by increasing the gain slider. When the pendulum swings up to the vertical upright position, the balance controller should engage and balance the link.

  6. After the link swings up and is balanced, wait for about 5 seconds and then stop the SIMULINK model.

  7. OPTIONAL: Save the data_alpha, data_theta, and data_Vm. Ensure that the data variables have 10 seconds of data saved.

Results for Report

A) Modeling

  1. The linear state-space representation of the rotary inverted pendulum system, i.e., the A, B, C and D matrices (numerical values).

  2. Open-loop poles of the system.

B.2) Pole Placement Controller Design

  1. Chosen \zeta and \omega_n based on the design specifications.

  2. Corresponding locations of the two dominant poles p_1 and p_2.

  3. Gain vector K.

B.3) Pole Placement Controller Simulation

  1. Plots of the commanded position of the rotary arm (\theta_c) and the simulated responses of the rotary arm (\theta), pendulum (\alpha), and motor input voltage (V_m) generated using your obtained gain K.

  2. Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.

C.1) Pole Placement Controller Implementation

  1. From Step C.1.10, plots of the commanded position of the rotary arm (\theta_c) and the experimental responses of the rotary arm (\theta), pendulum (\alpha), and motor input voltage (V_m) generated using the chosen gain K.

  2. Are Design Specifications 3 and 4 satisfied? Justify using the measured maximum pendulum deflection and motor input voltage values.

Questions for Report

A) Modeling

  1. Based on your open-loop poles found in Result A.2, is the system stable, marginally stable, or unstable?

  2. Did you expect the stability of the inverted pendulum system to be what you determined? Justify your answer.

B.2) Pole Placement Controller Design

For the questions below, calculations and intermediate steps must be shown.

  1. Determine the controllability matrix T of the system. Is the inverted pendulum system controllable? Hint: Use Equation 17.

  2. Using the open-loop poles, find the characteristic equation of A. Hint: The roots of the characteristic equation are the open-loop poles.

    Instead of using \mathrm{det}(sI - A) = 0, characteristic polynomials can also be found using the MATLAB function poly().

  3. Find the corresponding companion matrices \tilde{A} and \tilde{B}. Hint: For \tilde{A}, use the characteristic equation of A found in Question B.2.2 and Equation 18. For \tilde{B}, use Equation 19.

  4. Determine the controllability matrix \tilde{T} of the companion system.

  5. Determine the transformation matrix W.

  6. Check if \tilde{A} = W^{-1}AW and \tilde{B} = W^{-1}B with the obtained matrices.

  7. Using the locations of the two dominant poles, p_1 and p_2, based on the specifications (Result B.2.2), and the other poles at p_3 = -30 and p_4 = -40, determine the desired closed-loop characteristic equation. Hint: The roots of the closed-loop characteristic equation are the closed-loop poles.

  8. When applying the control u = -\tilde{K}x to the companion form, it changes (\tilde{A}, \tilde{B}) to (\tilde{A}-\tilde{B}\tilde{K}, \tilde{B}). Find the gain \tilde{K} that assigns the poles to their new desired locations. Hint: Use Equation 26 and find the corresponding characteristic equation. Compare this equation with the desired closed-loop characteristic equation found in Question B.2.7 to determine the gain vector \tilde{K}.

  9. Once you have found \tilde{K}, find K using Step 3 in Pole Placement Theory.

  10. Compare the gain vector K calculated using Pole Placement Theory (Question B.2.9) with the gain vector K obtained using MATLAB (Result B.2.3).

D) Swing-Up Demonstration

Briefly summarize the swing-up controller experiment and your observations. Did the swing-up control behave as you expected?

References

[1] Norman S. Nise. Control Systems Engineering. John Wiley & Sons, Inc., 2008.

[2] TBA.

[3] K. J. Åström and K. Furuta. "Swinging up a pendulum by energy control." Proceedings of the 13th IFAC World Congress, 1996.
