From: Dave Rodgman
Date: Wed, 5 Dec 2018 00:14:23 +0000 (+1100)
Subject: lib/lzo: tidy-up ifdefs
X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=fec83a3c62d7fff4f6af248b8f98c1e15457a427;p=users%2Fwilly%2Flinux.git

lib/lzo: tidy-up ifdefs

Patch series "lib/lzo: performance improvements", v4.

This series introduces performance improvements for lzo.

The previous version of this patchset is here:
https://lkml.org/lkml/2018/11/21/625

This version tidies up the ifdefs as per Christoph's comment (although
certainly more could be done, this is at least a bit more consistent
with normal kernel coding style).

On 23/11/2018 2:12 am, Sergey Senozhatsky wrote:
>> The graph below shows the weighted round-trip throughput of lzo, lz4 and
>> lzo-rle, for randomly generated 4k chunks of data with varying levels of
>> entropy. (To calculate weighted round-trip throughput, compression performance
>> is emphasised to reflect the fact that zram does around 2.25x more compression
>> than decompression.
>
> Right. The number is data dependent. Not all swapped out pages can be
> compressed; compressed pages that end up being >= zs_huge_class_size() are
> considered incompressible and stored as it.
>
> I'd say that on my setups around 50-60% of pages are incompressible.

So, just to give a bit more detail: the test setup was a Samsung
Chromebook Pro, cycling through 80 tabs in Chrome. With lzo-rle, only
5% of pages increased in size, and 90% of pages compress to 75% of
original size (or better). Mean compression ratio was 41%. Importantly
for lzo-rle, there are a lot of low-entropy pages where it can do well:
in total about 20% of the data is zeros forming part of a run of 4 or
more bytes.

As a quick summary of the impact of these patches on bigger chunks of
data, I've compared the performance of four different variants of lzo
on two large (~40 MB) files.
The numbers show round-trip throughput in MB/s:

Variant         | Low-entropy | High-entropy
Current lzo     |  242        | 157
Arm opts        |  290        | 159
RLE             |  876        | 151
Arm opts + RLE  | 1150        | 181

So both the Arm optimisations (8,16-byte copy & CTZ patches), and the
RLE implementation make a significant contribution to the overall
performance uplift.

This patch (of 8):

Modify the ifdefs in lzodefs.h to be more consistent with normal kernel
macros (e.g., change __aarch64__ to CONFIG_ARM64).

Link: http://lkml.kernel.org/r/20181127161913.23863-2-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman
Cc: Herbert Xu
Cc: David S. Miller
Cc: Nitin Gupta
Cc: Richard Purdie
Cc: Markus F.X.J. Oberhumer
Cc: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Sonny Rao
Cc: Greg Kroah-Hartman
Cc: Matt Sealey
Signed-off-by: Andrew Morton
Signed-off-by: Stephen Rothwell
---

diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 4edefd2f540c..497f9c9f03a8 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -15,7 +15,7 @@
 #define COPY4(dst, src)	\
 		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
-#if defined(__x86_64__)
+#if defined(CONFIG_X86_64)
 #define COPY8(dst, src)	\
 		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
 #else
@@ -25,12 +25,12 @@
 #if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
 #error "conflicting endian definitions"
-#elif defined(__x86_64__)
+#elif defined(CONFIG_X86_64)
 #define LZO_USE_CTZ64	1
 #define LZO_USE_CTZ32	1
-#elif defined(__i386__) || defined(__powerpc__)
+#elif defined(CONFIG_X86) || defined(CONFIG_PPC)
 #define LZO_USE_CTZ32	1
-#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
+#elif defined(CONFIG_ARM) && (__LINUX_ARM_ARCH__ >= 5)
 #define LZO_USE_CTZ32	1
 #endif
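
Background note, not part of the patch: the LZO_USE_CTZ32 and
LZO_USE_CTZ64 macros selected in the hunk above indicate that a
count-trailing-zeros operation is worth using on the target. A common
use of such an operation in a compressor is to find how many leading
bytes of two buffers match by XORing a whole word and counting the
trailing zero bits. The following is a minimal user-space sketch of that
idea, assuming a little-endian host and the GCC/Clang builtin
__builtin_ctzll(). The names load_u64() and match_len() are hypothetical
and are not taken from lib/lzo.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's get_unaligned(): a memcpy-based
 * load that is safe at any alignment in portable user-space C.
 */
static uint64_t load_u64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Hypothetical helper: number of leading bytes (at most max) that are
 * identical in a and b.
 *
 * Assumption: little-endian host, so the lowest-addressed byte is the
 * least significant byte of the loaded word and the first differing
 * byte is found from the lowest set bit of the XOR.
 */
static size_t match_len(const unsigned char *a, const unsigned char *b,
			size_t max)
{
	size_t n = 0;

	/* Compare eight bytes per iteration while a full word remains. */
	while (n + 8 <= max) {
		uint64_t diff = load_u64(a + n) ^ load_u64(b + n);

		if (diff)
			return n + (size_t)__builtin_ctzll(diff) / 8;
		n += 8;
	}

	/* Byte-wise tail for the remaining (fewer than eight) bytes. */
	while (n < max && a[n] == b[n])
		n++;

	return n;
}
```

On a big-endian host the same approach would need a count-leading-zeros
operation instead, which is one reason a header of this kind checks the
endianness macros before enabling the CTZ paths.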