About VoLTE POLQA Testing


VoLTE POLQA testing measures real speech quality in a VoLTE environment by means of Perceptual Objective Listening Quality Analysis (POLQA). The measurements are based on the ITU-T P.863 algorithm, which compares a reference signal X(t) with a degraded signal Y(t), where Y(t) is the result of passing X(t) through a communication system. The result is a MOS (Mean Opinion Score) value in the range of 1-5: the higher the number, the better the speech quality. X(t) and Y(t) are portions of speech in raw PCM format with a sampling rate of 8 or 16 kHz, and the communication system (SUT) is one or more LTE components and possibly an IMS network.

 

A two-arm testing example is shown below:

[Figure: two-arm testing example]

POLQA is enabled in a manner similar to Perceptual Evaluation of Video Quality (PEVQ). The Voice frame is similar to the Video frame in “QoM”.

“Enable (POLQA)” and “Enable (VMAF)” are not mutually exclusive: both controls can be ON at the same time.

The RTP session on behalf of UE1 generates voice traffic (X(t) -> RTP packets), and UE2 collects the received RTP packets and transforms them into Y(t). If some of the packets are lost or not delivered on time, the corresponding portion of speech (20, 40, or 60 milliseconds) in the Y(t) signal is filled in with “Comfort Noise”. Finally, the QoM of the Y(t) signal is evaluated against X(t) by means of the POLQA algorithm. The result of the evaluation is a MOS value in the range of 1-4.5 if the narrow-band (300 Hz to 3400 Hz) listening quality scale is used (MOS-LQOn), or in the range of 1-4.75 if the superwide-band (50 Hz to 14000 Hz) listening quality scale is used (MOS-LQOsw).
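The two listening quality scales described above can be captured in a small lookup. The following is a minimal Python sketch for illustration only; the names `MOS_SCALES` and `check_mos` are hypothetical and are not part of Landslide or of any POLQA library.

```python
# MOS bounds per listening quality scale, as stated in the text above.
# MOS_SCALES and check_mos are hypothetical names used for illustration.
MOS_SCALES = {
    "MOS-LQOn": (1.0, 4.5),    # narrow-band, 300 Hz to 3400 Hz
    "MOS-LQOsw": (1.0, 4.75),  # superwide-band, 50 Hz to 14000 Hz
}


def check_mos(scale: str, mos: float) -> bool:
    """Return True if the MOS value lies inside the range of the given scale."""
    low, high = MOS_SCALES[scale]
    return low <= mos <= high
```

A value such as 4.6 would be plausible on the superwide-band scale but out of range on the narrow-band scale.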

 

Because Landslide acts behind the Base Station (eNodeB), it is likely (but not guaranteed) that the Y(t) voice signal will not be distorted by analog or any other means. However, as the reference signal X(t) passes through a truly digital communication system such as the SGW and PGW, some of the RTP packets carrying portions of X(t) may be lost or may reach their final destination late. Typically, in the Landslide VoLTE test environment, POLQA is applied to a Y(t) signal with potentially missing 20/40/60 millisecond portions of speech. To create the conditions for packet loss or late arrival, POLQA measurements should be combined with a massive RTP packet load. Only some of the RTP sessions are chosen to perform POLQA measurements (POLQA sessions). For example, out of 40K RTP sessions in a VoLTE test, 100 POLQA sessions can be chosen.
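The idea of designating a small subset of RTP sessions as POLQA sessions can be sketched as follows. This is only an illustration: the text does not state how Landslide picks the sessions, so the random selection and the name `choose_polqa_sessions` are assumptions.

```python
import random
from typing import List


def choose_polqa_sessions(total_sessions: int, polqa_sessions: int,
                          seed: int = 0) -> List[int]:
    """Pick distinct RTP session indices to be measured with POLQA.

    Random selection is an assumption made for this sketch; the actual
    selection policy is not described in the text.
    """
    if polqa_sessions > total_sessions:
        raise ValueError("cannot select more POLQA sessions than RTP sessions")
    rng = random.Random(seed)  # seeded so that a run is repeatable
    return sorted(rng.sample(range(total_sessions), polqa_sessions))


# 100 POLQA sessions out of 40K RTP sessions, as in the example above.
selected = choose_polqa_sessions(40_000, 100)
```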

 

POLQA measurements should be applicable to both one-arm and two-arm testing.

 

In two-arm testing, the SUT (SGW and/or PGW) is a truly digital communication system in which the only impairments are RTP packets that are lost or arrive late at the destination. To create this condition, POLQA measurements should be combined with intensive RTP voice traffic: some of the UEs receiving RTP traffic should be designated to perform POLQA measurements.

 

There are two VoLTE two-arm testing environments:

·         Nodal vs Nodal (UE vs UE)

·         Nodal vs IMS Node (IMS simulates a landline phone or a mobile device of another vendor)

 

In one-arm testing the X(t) signal can be distorted by analog means. In these test cases the combination of high RTP traffic and POLQA measurements may not be necessary.

There are two VoLTE one-arm testing environments where X(t) can be distorted.

 

 

The POLQA-specific tests are based only on the RTP traffic generated/processed by the “rtpvoice” DMF. Landslide supports POLQA calculations for all the CODECs listed in the “rtpvoice” DMF.

Parameters from “rtpvoice DMF” | RTP Voice:

·         Frames per RTP packet: Any number of frames per RTP packet is supported for the CODECs that Landslide supports.

·         Jitter Buffer Size: There is no limit on the Jitter Buffer Size (JBS); this value is under full user control. If an RTP packet reaches the receiving side with a delay greater than the JBS value, the packet is ignored as having arrived too late, and comfort noise is applied to the corresponding time slot (20/40/60 milliseconds, or whatever the frame duration is).
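The Jitter Buffer Size rule above can be sketched as a simple classification of packets. This is a hedged illustration, not Landslide's implementation; the function name `classify_packets` and the convention of marking a lost packet with `None` are assumptions.

```python
from typing import Dict, Optional


def classify_packets(delays_ms: Dict[int, Optional[float]],
                     jitter_buffer_ms: float) -> Dict[int, str]:
    """Decide, per RTP sequence number, what fills the time slot in Y(t).

    delays_ms maps a sequence number to the packet delay in milliseconds,
    or to None if the packet never arrived (an assumed convention).
    """
    result = {}
    for seq, delay in delays_ms.items():
        if delay is None or delay > jitter_buffer_ms:
            # Lost, or later than the JBS value: the packet is ignored.
            result[seq] = "comfort_noise"
        else:
            result[seq] = "played"
    return result


slots = classify_packets({0: 5.0, 1: 80.0, 2: None, 3: 60.0},
                         jitter_buffer_ms=60.0)
```

In this sketch a packet whose delay equals the JBS value is still played, because the rule only rejects delays greater than the JBS value.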

X(t) and Y(t) Signals as input for POLQA

 

POLQA (ITU-T P.863) measures speech quality by comparison of a reference signal X(t) with a degraded signal Y(t) where the Y(t) is the result of passing X(t) through a communication subsystem.

 

X(t) and Y(t) are vectors of PCM samples. For telephony, the sampling rate can be 8 kHz, 16 kHz, or 48 kHz. Only the OPUS and EVS codecs support a sampling rate of 48 kHz.
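To relate the sampling rates above to the 20/40/60 millisecond speech portions mentioned elsewhere in this document, the number of PCM samples in one portion can be computed as in the following small sketch (the function name `samples_per_frame` is hypothetical).

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of PCM samples in a speech portion of frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000


# Samples per 20 ms portion for each sampling rate named in the text.
FRAME_SIZES_20MS = {rate: samples_per_frame(rate, 20)
                    for rate in (8000, 16000, 48000)}
```

For example, a 20 ms portion holds 160 samples at 8 kHz, 320 samples at 16 kHz, and 960 samples at 48 kHz.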

 

X(t) is produced from WAVE files (see “rtpvoice” DMF) by stripping off the WAVE header and, if necessary, converting from the 8-bit A-law or µ-law speech representation to 16-bit samples.
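The header stripping and sample conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Landslide's code: it handles only 16-bit linear PCM (WAVE format tag 1) and 8-bit µ-law (format tag 7), leaves A-law out for brevity, and the function names are hypothetical. The µ-law expansion follows the usual G.711 decoding steps.

```python
import struct

ULAW_BIAS = 0x84  # bias used by G.711 mu-law companding


def ulaw_to_pcm16(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code word to a 16-bit linear sample."""
    code = ~code & 0xFF  # code words are stored bit-inverted
    magnitude = (((code & 0x0F) << 3) + ULAW_BIAS) << ((code >> 4) & 0x07)
    magnitude -= ULAW_BIAS
    return -magnitude if code & 0x80 else magnitude


def wave_to_pcm16(wav: bytes) -> list:
    """Strip the RIFF/WAVE header and return the samples as 16-bit integers."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt_tag = bits = None
    offset = 12
    while offset + 8 <= len(wav):
        chunk_id = wav[offset:offset + 4]
        (size,) = struct.unpack_from("<I", wav, offset + 4)
        body = wav[offset + 8:offset + 8 + size]
        if chunk_id == b"fmt ":
            fmt_tag, _, _, _, _, bits = struct.unpack_from("<HHIIHH", body)
        elif chunk_id == b"data":
            if fmt_tag == 1 and bits == 16:   # already 16-bit linear PCM
                count = len(body) // 2
                return list(struct.unpack("<%dh" % count, body[:2 * count]))
            if fmt_tag == 7 and bits == 8:    # 8-bit G.711 mu-law
                return [ulaw_to_pcm16(b) for b in body]
            raise ValueError("unsupported format tag %r" % fmt_tag)
        offset += 8 + size + (size & 1)       # chunks are word-aligned
    raise ValueError("no data chunk found")


# Usage: a tiny in-memory mu-law WAVE file with three samples.
fmt = struct.pack("<HHIIHH", 7, 1, 8000, 8000, 1, 8)
payload = bytes([0xFF, 0x00, 0x80])
riff_body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
             + b"data" + struct.pack("<I", len(payload)) + payload + b"\x00")
demo_wav = b"RIFF" + struct.pack("<I", len(riff_body)) + riff_body
demo_samples = wave_to_pcm16(demo_wav)
```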

 

X(t) is encoded by the codec and then packetized into an RTP stream (a set of RTP packets). One side of the RTP session transmits the packets according to the RTP schedule (every 20, 40, or 60 milliseconds for the AMR and AMR-WB codecs). The other side of the RTP session receives the RTP packets, decodes each payload to PCM, and collects these small portions of speech in the jitter buffer. If all the packets of X(t) are received, the POLQA algorithm is applied. Otherwise, the jitter buffer algorithm waits for the configured Jitter Buffer Size for the speech portions that have not yet arrived. After that time, all the missing portions in Y(t) are filled in with “Comfort Noise”, and then the POLQA algorithm is applied.
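The assembly of Y(t) from the received portions can be sketched as follows. This is an illustration only: real comfort noise generation is codec-specific, so the low-amplitude random samples used here are a stand-in, and the names `assemble_degraded_signal` and `noise_level` are assumptions.

```python
import random
from typing import Dict, List


def assemble_degraded_signal(frames: Dict[int, List[int]], frame_count: int,
                             samples_per_frame: int, noise_level: int = 30,
                             seed: int = 0) -> List[int]:
    """Build Y(t) from decoded frames, filling missing frames with noise.

    frames maps a frame index to its decoded PCM samples; indices that are
    absent represent packets that were lost or arrived too late.
    """
    rng = random.Random(seed)
    signal: List[int] = []
    for index in range(frame_count):
        frame = frames.get(index)
        if frame is None:
            # Stand-in for comfort noise: low-amplitude random samples.
            frame = [rng.randint(-noise_level, noise_level)
                     for _ in range(samples_per_frame)]
        signal.extend(frame)
    return signal


# Three 20 ms frames at 8 kHz; the middle frame never arrived.
received = {0: [100] * 160, 2: [200] * 160}
y_signal = assemble_degraded_signal(received, frame_count=3,
                                    samples_per_frame=160)
```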

 

The ITU-T P.863 algorithm is known to produce accurate results for X(t) signals that meet the following requirements:

 

·         Each reference speech file should consist of two or more sentences separated by a gap of at least 1 s but not more than 2 s

·         The minimum amount of active speech in each file should be 3 s

·         Reference speech files should have sufficient leading and trailing silence intervals to avoid clipping of the speech signal, e.g., 200 ms of silence each

 

That is, the minimal recommended length of the reference speech X(t) is 3 + 1 + 2 × 0.2 = 4.4 seconds.
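The requirements listed above can be checked mechanically once the sentence boundaries of a reference file are known. The following Python sketch assumes the boundaries are given as (start, end) pairs in milliseconds; the function name `check_reference_file` is hypothetical, and the 200 ms silence threshold is the example value from the list, not a hard limit.

```python
from typing import List, Tuple


def check_reference_file(sentences_ms: List[Tuple[int, int]],
                         total_ms: int) -> List[str]:
    """Return the violated requirements (an empty list if the file qualifies)."""
    problems = []
    if len(sentences_ms) < 2:
        problems.append("fewer than two sentences")
    # Gap between the end of one sentence and the start of the next.
    for (_, end), (start, _) in zip(sentences_ms, sentences_ms[1:]):
        gap = start - end
        if not 1000 <= gap <= 2000:
            problems.append("gap of %d ms is outside 1000..2000 ms" % gap)
    active = sum(end - start for start, end in sentences_ms)
    if active < 3000:
        problems.append("only %d ms of active speech" % active)
    if sentences_ms:
        leading = sentences_ms[0][0]
        trailing = total_ms - sentences_ms[-1][1]
        if leading < 200 or trailing < 200:
            problems.append("leading or trailing silence shorter than 200 ms")
    return problems


# The shortest file that qualifies: 200 + 1500 + 1000 + 1500 + 200 = 4400 ms.
minimal_ok = check_reference_file([(200, 1700), (2700, 4200)], total_ms=4400)
```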

 

In the name of the POLQA speech files

The TDF WAVE file library should meet this requirement. There are four POLQA-specific speech files in the TDF Library:

Even in the ideal situation, with no RTP packets lost or arriving late, the reference signal X(t) and the degraded signal Y(t), which is encoded and then decoded by the AMR-NB or AMR-WB codec, are not equal. The tables below show the ideal MOS values for the AMR and AMR-WB codecs, swept over the codec bit rate, for the 6- and 8-second speech files from the POLQA TDF library.

BrEnglish_NB_f1s4_f1s2_6s.wav

Codec Bit Rate (kbps) | MOS-LQOn (Narrowband)
4.75 | 3.690
5.15 | 3.812
5.90 | 3.798
6.70 | 3.964
7.40 | 4.092
7.95 | 4.114
10.20 | 4.213
12.20 | 4.335

 

BrEnglish_WB_f1s4_f1s2_6s.wav

Codec Bit Rate (kbps) | MOS-LQOn (Narrowband) | MOS-LQOsw (Superwideband)
6.60 | 3.637 | 3.341
8.85 | 3.999 | 3.676
12.65 | 4.296 | 3.981
14.25 | 4.368 | 4.121
15.85 | 4.370 | 4.114
18.25 | 4.446 | 4.182
19.85 | 4.481 | 4.230
23.05 | 4.500 | 4.214
23.85 | 4.473 | 4.502
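One way to use the ideal values is as a baseline: the difference between the ideal MOS for a bit rate and the MOS measured under load indicates how much quality was lost to late or missing packets rather than to the codec itself. The sketch below copies the MOS-LQOn values of BrEnglish_NB_f1s4_f1s2_6s.wav from the first table above; the names `IDEAL_MOS_LQON_NB` and `mos_loss` are hypothetical, and this comparison is an illustration rather than a documented Landslide feature.

```python
# Ideal MOS-LQOn per codec bit rate (kbps) for BrEnglish_NB_f1s4_f1s2_6s.wav,
# copied from the narrow-band table in this document.
IDEAL_MOS_LQON_NB = {
    4.75: 3.690, 5.15: 3.812, 5.90: 3.798, 6.70: 3.964,
    7.40: 4.092, 7.95: 4.114, 10.20: 4.213, 12.20: 4.335,
}


def mos_loss(bit_rate_kbps: float, measured_mos: float) -> float:
    """Difference between the ideal MOS and a measured MOS at one bit rate."""
    return round(IDEAL_MOS_LQON_NB[bit_rate_kbps] - measured_mos, 3)


# A hypothetical measurement of 4.0 at 12.20 kbps.
loss_at_12_2 = mos_loss(12.20, 4.0)
```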

 

 

 

To make POLQA possible, turn ON the “Reserve Resources To Perform (POLQA)” control in “TS Configuration”. When the control is ON, the cores are redistributed according to Table 3.

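Data Gen Performance | Requested License | Number of TS Processes | Number of cores per TS process | Core Available
Legacy | C100 Standard | 1 | 1 Main + 1 Receive + 1 POLQA | 12
Legacy | C100 Performance | 3 | 1 Main + 1 Receive + 1 POLQA | 8
Legacy | C100 Extreme | 3 | 1 Main + 1 Receive + 1 POLQA | 8
Max Mode | C100 Standard | 1 | 1 Main + 1 Receive + 1 POLQA | 9
Max Mode | C100 Performance | 3 | 1 Main + 3 Receive + 1 POLQA | 0
Max Mode | C100 Extreme | 3 | 1 Main + 3 Receive + 1 POLQA | 0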