About VoLTE VMAF Testing


 

VMAF (Video Multi-method Assessment Fusion) is a Full Reference (FR) tool that calculates MOS and auxiliary metrics for video data transmitted over WiFi or 4G/5G mobile networks. It calculates the metrics by applying the Reference (original) video file (signal) and the Degraded video signal to the VMAF algorithm. This means that the Reference file must be known to the side receiving the video traffic. A two-armed testing model is therefore expected: either two Landslide (LS) emulated subscribers (if a SIP proxy is used), or one subscriber plus one endpoint (if LS emulates a VoLTE Nodal and an IMS Node/Endpoint).

The entire VMAF Calculation Model is shown below:

For simplicity, the flowchart shows UE1 as the data transmitter and UE2 as the side that receives the video data and performs the VMAF calculation. The scheme is fully symmetrical, however: both UE1 and UE2 can transmit, receive, and perform VMAF quality measurements.

    

The .AVI (rawvideo) format provides a more accurate estimation of video quality. If the rtpvideo DMF provides encoded video, Landslide decodes it to a rawvideo representation and uses that as the Reference Video Signal, which slightly lowers the quality metrics. Using the encoded format still makes sense, however, because it reduces the input video file size significantly (by a factor of at least 50) compared with the rawvideo format. For video files of higher resolution (VGA, HD, and beyond), the encoded format is the only option, because the rawvideo size exceeds 100 MB even for VGA: a rawvideo file of 640x480 at 25 fps with a duration of 8.8 seconds is 194 MB, and for 1080i/1080p the rawvideo file is almost 600 MB. With files of that size, the TAC reports would become oversized and unmanageable.
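The size figures above can be approximated with simple arithmetic. The following is a minimal sketch, not part of Landslide: the function name raw_video_bytes is hypothetical, and the pixel layout is an assumption, since the text does not state it. With 24 bits per pixel (packed RGB), the 640x480, 25 fps, 8.8 second clip comes to 202,752,000 bytes (about 193 MiB), which is close to the 194 MB quoted; a yuv420p layout (12 bits per pixel) would be half of that.

```shell
# raw_video_bytes WIDTH HEIGHT BITS_PER_PIXEL FPS DURATION_MS
# Prints the approximate payload size in bytes of an uncompressed clip.
# Container overhead is ignored; BITS_PER_PIXEL is an assumption
# (24 for packed RGB, 12 for yuv420p).
raw_video_bytes() {
    # Whole frames in the clip: fps * duration, duration given in ms.
    # Bytes per frame: width * height * bits_per_pixel / 8.
    echo $(( $1 * $2 * $3 / 8 * ( $4 * $5 / 1000 ) ))
}

# Example: VGA, 24 bits per pixel, 25 fps, 8.8 seconds
# raw_video_bytes 640 480 24 25 8800
```

The duration is passed in milliseconds so that the calculation stays in integer shell arithmetic.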

Video file format

For VMAF quality measurement, Landslide will support the following video formats:

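# | Format | Codec | Comment
1 | rawvideo | N/A | Applicable to CIF (352x288) and QCIF (176x144) video formats only. It is used as the Reference Video Signal.
2 | encoded | H.263 | The video file is decoded to rawvideo format, which is then used as the Reference Video Signal.
3 | encoded | H.264 | The video file is decoded to rawvideo format, which is then used as the Reference Video Signal.
4 | encoded | H.265 | The video file is decoded to rawvideo format, which is then used as the Reference Video Signal.
5 | encoded | VP8 | The video file is decoded to rawvideo format, which is then used as the Reference Video Signal.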

 

In contrast to PEVQ, the Media TDF for VMAF can be configured with either a rawvideo file or an encoded one. The file qqq.mp4 is shown as an example of an encoded file.

VMAF Reference and Degraded/Distorted Video signal presentation.

The VMAF algorithm takes the Reference and Degraded/Distorted files in YUV format as its input. YUV is a flavor of rawvideo; the implementation uses it internally and does not expose it to the GUI or to any other level.

 

VMAF & FFMPEG

The Landslide PEVQ implementation makes extensive use of FFMPEG (FFmpeg - Wikipedia) for video encoding/decoding and packetization/depacketization. To make this possible, FFMPEG is built with support for the specific video codecs.

The VMAF library must also be added to FFMPEG through the same integration mechanism. As a result, the FFMPEG executable becomes a universal tool that performs not only video data transformation but also VMAF quality measurements.
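Whether a given FFMPEG build includes the VMAF library can be checked from its filter listing. The sketch below is an illustration only: has_libvmaf is a hypothetical helper name, and it simply searches the output of `ffmpeg -filters` (read from standard input) for the libvmaf filter.

```shell
# has_libvmaf: reads a filter listing on stdin and succeeds (exit 0)
# if a filter named exactly "libvmaf" appears in it.
has_libvmaf() {
    grep -Eq '(^|[[:space:]])libvmaf([[:space:]]|$)'
}

# Typical use against a real build (not executed here):
#   ffmpeg -hide_banner -filters | has_libvmaf && echo "libvmaf available"
```

Reading the listing from standard input keeps the check independent of where the FFMPEG executable is installed.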

VoLTE or SIP capable Test Cases

GUI Provisioning

The following GUI components are involved:

1) CPU core reservation for VMAF calculation: “Test Server | Configuration” - About Test Server Configuration

2) “System Status”, which reports “Remaining VMAF ECs” - About the System Status Window

3) The actual VMAF enabler - Media Qos Settings

 

TC Nodal side:

IMS Node side:

Additional Quality Metrics are introduced to give the customer more control over the trade-off between VMAF execution time and the types of metrics calculated:

VMAF Resource Reservation

POLQA and VMAF share the same Resource Reservation mechanism, which allocates CPU cores for POLQA and/or VMAF calculation. - About Test Server Configuration

 

 

System Status

System Status shall include “Remaining VMAF ECs” - About the System Status Window

 

Measurements

VMAF measurements are added to L5-7 Client | Video (traf) and L5-7 Server | Video (host): 

 

Name | Variable (prefix = traf, host) | Highlighting | Value Range | Description
RTP VMAF-MOS Successes | prefixRtpVideoVmafMosSuccess | | 0 to 2^64-1 | Counter of successful measurements.
RTP VMAF-MOS Failures | prefixRtpVideoVmafMosFailed | Yes | 0 to 2^64-1 | Counter of failed measurements.
RTP VMAF-MOS Overloaded | prefixRtpVideoVmafMosOverloaded | Yes | 0 to 2^64-1 | Counter of measurements that could not be performed because of an overload condition: the VMAF calculation for the previous video stream has not completed, but the current stream is fully collected and ready for VMAF.
RTP VMAF-MOS Average | prefixRtpVideoVmafMosValue | | 0.0-100.0 | Average/Min/Max MOS: 0.0 is worst; 100.0 is best. Reported to GUI as (int64) (float-value*1000).
RTP VMAF-MOS Minimum | prefixRtpVideoVmafMosMin | | 0.0-100.0 | See RTP VMAF-MOS Average.
RTP VMAF-MOS Maximum | prefixRtpVideoVmafMosMax | | 0.0-100.0 | See RTP VMAF-MOS Average.
RTP VMAF-MOS Duration Average (s) | prefixRtpVideoVmafMosTimeValue | | 0.0-100.0 | Average/Min/Max time (sec) consumed by the calculation of the VMAF MOS and supplemental metrics. Reported to GUI as (int64) (float-value*1000).
RTP VMAF-MOS Duration Minimum (s) | prefixRtpVideoVmafMosTimeMin | | 0.0-100.0 | See RTP VMAF-MOS Duration Average (s).
RTP VMAF-MOS Duration Maximum (s) | prefixRtpVideoVmafMosTimeMax | | 0.0-100.0 | See RTP VMAF-MOS Duration Average (s).
RTP VMAF PSNR-Y Average (dB) | prefixRtpVideoVmafPsnrYValue | | 0.0-50.0 | Average/Min/Max PSNR-Y (Luma). Bad: below 30.0; Poor: 30.0-33.0; Fair: 33.0-38.0; Good/Excellent: 38.0 and above. Reported to GUI as (int64) (float-value*1000). 8-bit depth is presumed in the PSNR-Y calculation. Usage of 12/16-bit depth may expand the value range to 100 dB.
RTP VMAF PSNR-Y Min (dB) | prefixRtpVideoVmafPsnrYMin | | 0.0-50.0 | See RTP VMAF PSNR-Y Average (dB).
RTP VMAF PSNR-Y Max (dB) | prefixRtpVideoVmafPsnrYMax | | 0.0-50.0 | See RTP VMAF PSNR-Y Average (dB).
RTP VMAF PSNR-Cb Average (dB) | prefixRtpVideoVmafPsnrCbValue | | 0.0-50.0 | Average/Min/Max PSNR-Cb (U - Blue Chroma). Bad: below 30.0; Poor: 30.0-33.0; Fair: 33.0-38.0; Good/Excellent: 38.0 and above. Reported to GUI as (int64) (float-value*1000). 8-bit depth is presumed in the PSNR-Cb calculation. Usage of 12/16-bit depth may expand the value range to 100 dB.
RTP VMAF PSNR-Cb Min (dB) | prefixRtpVideoVmafPsnrCbMin | | 0.0-50.0 | See RTP VMAF PSNR-Cb Average (dB).
RTP VMAF PSNR-Cb Max (dB) | prefixRtpVideoVmafPsnrCbMax | | 0.0-50.0 | See RTP VMAF PSNR-Cb Average (dB).
RTP VMAF PSNR-Cr Average (dB) | prefixRtpVideoVmafPsnrCrValue | | 0.0-50.0 | Average/Min/Max PSNR-Cr (V - Red Chroma). Bad: below 30.0; Poor: 30.0-33.0; Fair: 33.0-38.0; Good/Excellent: 38.0 and above. Reported to GUI as (int64) (float-value*1000). 8-bit depth is presumed in the PSNR-Cr calculation. Usage of 12/16-bit depth may expand the value range to 100 dB.
RTP VMAF PSNR-Cr Min (dB) | prefixRtpVideoVmafPsnrCrMin | | 0.0-50.0 | See RTP VMAF PSNR-Cr Average (dB).
RTP VMAF PSNR-Cr Max (dB) | prefixRtpVideoVmafPsnrCrMax | | 0.0-50.0 | See RTP VMAF PSNR-Cr Average (dB).
RTP VMAF SSIM Average | prefixRtpVideoVmafSsimValue | | -1.0 to +1.0 | Average/Min/Max SSIM: 1.0 indicates perfect similarity; 0.0 indicates no similarity; -1.0 indicates perfect anti-correlation. All other values fall within this range. Reported to GUI as (int64) (float-value*1000).
RTP VMAF SSIM Min | prefixRtpVideoVmafSsimMin | | -1.0 to +1.0 | See RTP VMAF SSIM Average.
RTP VMAF SSIM Max | prefixRtpVideoVmafSsimMax | | -1.0 to +1.0 | See RTP VMAF SSIM Average.
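Because the floating-point measurements are reported to the GUI as integers scaled by 1000, a consumer of these values has to divide by 1000 to recover the original number. The following sketch shows that conversion together with the PSNR rating bands listed in the table. The function names gui_to_float and psnr_rating are hypothetical, and treating each band's lower bound as inclusive is an assumption, since the table does not say which band a boundary value such as 33.0 belongs to.

```shell
# gui_to_float RAW
# Converts a GUI-reported integer (float value * 1000) back to a decimal value.
gui_to_float() {
    awk -v raw="$1" 'BEGIN { printf "%.3f\n", raw / 1000 }'
}

# psnr_rating DB
# Maps a PSNR value in dB to the rating bands from the table.
# Assumption: the lower bound of each band is inclusive.
psnr_rating() {
    awk -v db="$1" 'BEGIN {
        if (db < 30.0)      print "Bad"
        else if (db < 33.0) print "Poor"
        else if (db < 38.0) print "Fair"
        else                print "Good/Excellent"
    }'
}

# Example: a reported PSNR-Y value of 35500 corresponds to 35.500 dB
# psnr_rating "$(gui_to_float 35500)"
```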

 

 

Performance/capacity

 

Actual Capacity/Performance of the system depends on:

The actual number of concurrent UEs that can perform VMAF will be known once the implementation is complete and a VMAF benchmark can be run.

Keep in mind the following rule: if the video file duration is Tvideo and the VMAF calculation time is Tvmaf, then 'Number of UEs' = Tvideo / Tvmaf is the number of UEs that can be served without the Overload condition being reported (see the description of the RTP VMAF-MOS Overloaded OM).

If overloading is acceptable, the number of UEs performing VMAF can be increased.

For example: if Tvideo = 9 seconds and Tvmaf = 3 seconds, then 'Number of UEs' = 9 / 3 = 3.
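The rule above can be expressed as a small helper. This is only a sketch: the function name max_ues_without_overload is hypothetical, and rounding the ratio down to a whole number of UEs is an assumption for the case where Tvideo is not an exact multiple of Tvmaf.

```shell
# max_ues_without_overload TVIDEO_SECONDS TVMAF_SECONDS
# Prints Tvideo / Tvmaf rounded down to a whole number of UEs.
# Fails with exit status 1 if Tvmaf is not positive.
max_ues_without_overload() {
    awk -v tvideo="$1" -v tvmaf="$2" 'BEGIN {
        if (tvmaf <= 0) exit 1
        print int(tvideo / tvmaf)
    }'
}

# Example from the text: Tvideo = 9 s, Tvmaf = 3 s
# max_ues_without_overload 9 3
```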

 

FFMPEG is a universal tool that performs all the required video data transformations and also produces the VMAF MOS score along with the auxiliary metrics (PSNR, SSIM).

Assume for simplicity:

Encoding of Reference Raw Video File (.AVI)

ffmpeg -i Ref.avi -pix_fmt yuv420p -vcodec h264 -y Ref.mp4

Packetization of Reference Encoded file

ffmpeg -v trace -i Ref.mp4 -f rtp -vcodec copy -an "rtp://127.0.0.1:7000?pkt_size=1024" 1>Ref.sdp

This packetization method assumes that the Landslide application listens on 127.0.0.1:7000 and receives the video data in the form of RTP packets.

Transformation of received RTP packets to Degraded Encoded file.

ffmpeg -v trace -stdin -threads 1 -buffer_size 500000 -reorder_queue_size 40 \
       -protocol_whitelist "file,crypto,udp,rtp" -i Ref.sdp \
       -pix_fmt yuv420p -local_rtpport 7000 -local_rtcpport 7001 \
       -y Deg.mp4 2>/dev/null

On the receive side, RTP packets are sent to 127.0.0.1:7000 after the Jitter Buffer delay. Once all the RTP packets that constitute the video file have been sent, the Landslide application sends an RTCP BYE to FFMPEG to signal completion. Deg.mp4 is the result of this step.

 

Decoding of Degraded encoded file to rawvideo form (.AVI) 

ffmpeg -v trace -i Deg.mp4 -vcodec rawvideo -pix_fmt yuv420p -y Deg.avi 2>/dev/null

After this, the final step of getting the VMAF score can be performed.

Getting VMAF score and optional metrics.

# The libvmaf filter takes the degraded (distorted) stream as its first input
# and the reference stream as its second input.
ffmpeg -i Ref.avi -i Deg.avi -lavfi \
     "[0:v]trim=start=0.080:duration=8.000,setpts=PTS-STARTPTS[reference]; \
      [1:v]trim=start=0.000:duration=8.000,setpts=PTS-STARTPTS[degraded]; \
      [degraded][reference]libvmaf='log_fmt=xml:log_path=vmafMeas.xml:feature=name=psnr|name=ssim'" \
     -f null -

Per-frame results as well as averaged values are written to the vmafMeas.xml file.