FFmpeg/libavcodec
Martin Storsjö f43079e11c aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
number of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32-bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0     87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0    354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5   1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0     72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0    256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7   1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9   8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0     57.7
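
As a rough check of the "around 1.4x" figure, the ratios can be recomputed
from the table above. This is only a sketch: the numbers are copied from the
table, and the names `a53_runtimes` and `speedups` are made up for
illustration.

```python
# Runtimes copied from the Cortex A53 table: (ARM, AArch64).
a53_runtimes = {
    "vp9_inv_adst_adst_4x4_add_neon": (90.0, 87.7),
    "vp9_inv_adst_adst_8x8_add_neon": (400.0, 354.7),
    "vp9_inv_adst_adst_16x16_add_neon": (2526.5, 1827.2),
    "vp9_inv_dct_dct_4x4_add_neon": (74.0, 72.7),
    "vp9_inv_dct_dct_8x8_add_neon": (271.0, 256.7),
    "vp9_inv_dct_dct_16x16_add_neon": (1960.7, 1372.7),
    "vp9_inv_dct_dct_32x32_add_neon": (11988.9, 8088.3),
    "vp9_inv_wht_wht_4x4_add_neon": (63.0, 57.7),
}


def speedups(table):
    """Return {name: first column / second column} for every row."""
    return {name: old / new for name, (old, new) in table.items()}


if __name__ == "__main__":
    for name, ratio in speedups(a53_runtimes).items():
        print(f"{name:34s} {ratio:.2f}x")
```

The 16x16 and 32x32 rows, which are the ones processed in 8 pixel wide
slices, come out at roughly 1.4x to 1.5x; the 4x4 and 8x8 rows gain much less.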

The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
                                A57 gcc-5.3   neon
vp9_inv_adst_adst_4x4_add_neon:       152.2   60.0
vp9_inv_adst_adst_8x8_add_neon:       948.2  288.0
vp9_inv_adst_adst_16x16_add_neon:    4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon:         153.0   58.6
vp9_inv_dct_dct_8x8_add_neon:         789.2  180.2
vp9_inv_dct_dct_16x16_add_neon:      3639.6  917.1
vp9_inv_dct_dct_32x32_add_neon:     20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon:          91.0   49.8

The asm is around a factor of 3-4 faster than C on the Cortex A57, and the
asm is around 30-50% faster on the A57 compared to the A53.
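
The two comparisons above can be recomputed from the tables in this message.
This is a sketch with made-up names (`a57`, `a53_neon`, `c_vs_neon`,
`a53_vs_a57`); note that the A57 numbers come from a slightly older version of
the patch, so the cross-CPU ratio is only approximate.

```python
# Cortex A57 table: (C built with gcc-5.3, NEON).
a57 = {
    "vp9_inv_adst_adst_4x4_add_neon": (152.2, 60.0),
    "vp9_inv_adst_adst_8x8_add_neon": (948.2, 288.0),
    "vp9_inv_adst_adst_16x16_add_neon": (4830.4, 1380.5),
    "vp9_inv_dct_dct_4x4_add_neon": (153.0, 58.6),
    "vp9_inv_dct_dct_8x8_add_neon": (789.2, 180.2),
    "vp9_inv_dct_dct_16x16_add_neon": (3639.6, 917.1),
    "vp9_inv_dct_dct_32x32_add_neon": (20462.1, 4985.0),
    "vp9_inv_wht_wht_4x4_add_neon": (91.0, 49.8),
}

# AArch64 NEON column of the Cortex A53 table.
a53_neon = {
    "vp9_inv_adst_adst_4x4_add_neon": 87.7,
    "vp9_inv_adst_adst_8x8_add_neon": 354.7,
    "vp9_inv_adst_adst_16x16_add_neon": 1827.2,
    "vp9_inv_dct_dct_4x4_add_neon": 72.7,
    "vp9_inv_dct_dct_8x8_add_neon": 256.7,
    "vp9_inv_dct_dct_16x16_add_neon": 1372.7,
    "vp9_inv_dct_dct_32x32_add_neon": 8088.3,
    "vp9_inv_wht_wht_4x4_add_neon": 57.7,
}


def c_vs_neon(table):
    """Ratio of C runtime to NEON runtime on the A57, per function."""
    return {name: c / neon for name, (c, neon) in table.items()}


def a53_vs_a57(a53, a57_table):
    """Ratio of A53 NEON runtime to A57 NEON runtime, per function."""
    return {name: a53[name] / a57_table[name][1] for name in a53}


if __name__ == "__main__":
    asm_gain = c_vs_neon(a57)
    cpu_gain = a53_vs_a57(a53_neon, a57)
    for name in a57:
        print(f"{name:34s} C/NEON {asm_gain[name]:.2f}x  "
              f"A53/A57 {cpu_gain[name]:.2f}x")
```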

This is an adapted cherry-pick from libav commit
3c9546dfafcdfe8e7860aff9ebbf609318220f29.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00