Martin Storsjö
f43079e11c
aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger
number of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x for these transforms compared to the
32 bit version.

The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM   AArch64
vp9_inv_adst_adst_4x4_add_neon:       90.0      87.7
vp9_inv_adst_adst_8x8_add_neon:      400.0     354.7
vp9_inv_adst_adst_16x16_add_neon:   2526.5    1827.2
vp9_inv_dct_dct_4x4_add_neon:         74.0      72.7
vp9_inv_dct_dct_8x8_add_neon:        271.0     256.7
vp9_inv_dct_dct_16x16_add_neon:     1960.7    1372.7
vp9_inv_dct_dct_32x32_add_neon:    11988.9    8088.3
vp9_inv_wht_wht_4x4_add_neon:         63.0      57.7

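As a quick cross-check of the "around 1.4x" figure, the per-function ratios can be recomputed from the Cortex A53 numbers above. This is only an illustrative sketch: the names `A53_RUNTIMES` and `speedups` are made up for this example and are not part of the patch.

```python
# Runtimes copied from the Cortex A53 table above, as (ARM, AArch64) pairs.
# The dictionary and function names are hypothetical, for illustration only.
A53_RUNTIMES = {
    "vp9_inv_adst_adst_4x4_add_neon": (90.0, 87.7),
    "vp9_inv_adst_adst_8x8_add_neon": (400.0, 354.7),
    "vp9_inv_adst_adst_16x16_add_neon": (2526.5, 1827.2),
    "vp9_inv_dct_dct_4x4_add_neon": (74.0, 72.7),
    "vp9_inv_dct_dct_8x8_add_neon": (271.0, 256.7),
    "vp9_inv_dct_dct_16x16_add_neon": (1960.7, 1372.7),
    "vp9_inv_dct_dct_32x32_add_neon": (11988.9, 8088.3),
    "vp9_inv_wht_wht_4x4_add_neon": (63.0, 57.7),
}


def speedups(runtimes):
    """Return {name: baseline / candidate} for each (baseline, candidate) pair."""
    return {name: base / cand for name, (base, cand) in runtimes.items()}


if __name__ == "__main__":
    for name, ratio in speedups(A53_RUNTIMES).items():
        print(f"{name:34s} {ratio:.2f}x")
```

With these numbers the 16x16 and 32x32 ratios fall between roughly 1.38 and 1.48, while the 4x4 and 8x8 transforms, which do not benefit from the wider slices, stay between roughly 1.02 and 1.13.
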
The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
A57                                gcc-5.3      neon
vp9_inv_adst_adst_4x4_add_neon:      152.2      60.0
vp9_inv_adst_adst_8x8_add_neon:      948.2     288.0
vp9_inv_adst_adst_16x16_add_neon:   4830.4    1380.5
vp9_inv_dct_dct_4x4_add_neon:        153.0      58.6
vp9_inv_dct_dct_8x8_add_neon:        789.2     180.2
vp9_inv_dct_dct_16x16_add_neon:     3639.6     917.1
vp9_inv_dct_dct_32x32_add_neon:    20462.1    4985.0
vp9_inv_wht_wht_4x4_add_neon:         91.0      49.8

The asm is around 3-4x faster than C on the Cortex A57, and around
30-50% faster on the A57 than on the A53.

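The C-vs-asm factor can be recomputed from the Cortex A57 numbers in the same way. Again this is just a sketch; `A57_RUNTIMES` and `c_vs_neon_ratio` are hypothetical names introduced for the example, and the figures are for the slightly older version of the patch mentioned above.

```python
# Runtimes copied from the Cortex A57 table above, as (gcc-5.3 C, NEON) pairs.
# The dictionary and function names are hypothetical, for illustration only.
A57_RUNTIMES = {
    "vp9_inv_adst_adst_4x4_add_neon": (152.2, 60.0),
    "vp9_inv_adst_adst_8x8_add_neon": (948.2, 288.0),
    "vp9_inv_adst_adst_16x16_add_neon": (4830.4, 1380.5),
    "vp9_inv_dct_dct_4x4_add_neon": (153.0, 58.6),
    "vp9_inv_dct_dct_8x8_add_neon": (789.2, 180.2),
    "vp9_inv_dct_dct_16x16_add_neon": (3639.6, 917.1),
    "vp9_inv_dct_dct_32x32_add_neon": (20462.1, 4985.0),
    "vp9_inv_wht_wht_4x4_add_neon": (91.0, 49.8),
}


def c_vs_neon_ratio(runtimes):
    """Return {name: c_runtime / neon_runtime} for each table row."""
    return {name: c / neon for name, (c, neon) in runtimes.items()}


if __name__ == "__main__":
    for name, ratio in c_vs_neon_ratio(A57_RUNTIMES).items():
        print(f"{name:34s} {ratio:.2f}x")
```

On these figures the 8x8 and larger transforms land between roughly 3.3x and 4.4x, while the 4x4 transforms are lower, between roughly 1.8x and 2.6x.
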
This is an adapted cherry-pick from libav commit
3c9546dfafcdfe8e7860aff9ebbf609318220f29.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00