From: Nishanth Menon Date: Wed, 7 Jul 2021 20:48:40 +0000 (-0500) Subject: scripts/spdxcheck.py: Strictly read license files in utf-8 X-Git-Tag: dma-mapping-5.15~146^2~4 X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=40751c6c9bea6a5cfede7c61ee5f3cb1ab857029;p=users%2Fhch%2Fdma-mapping.git scripts/spdxcheck.py: Strictly read license files in utf-8 Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license") unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text While python will barf at it with: FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128) Traceback (most recent call last): File "scripts/spdxcheck.py", line 244, in spdx = read_spdxdata(repo) File "scripts/spdxcheck.py", line 47, in read_spdxdata for l in open(el.path).readlines(): File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128) While it is indeed debatable if 'Licensor.' used in the license file needs unicode quotes, instead, force spdxcheck to read utf-8. Reported-by: Rahul T R Signed-off-by: Nishanth Menon Reviewed-by: Thomas Gleixner Link: https://lore.kernel.org/r/20210707204840.30891-1-nm@ti.com Signed-off-by: Jonathan Corbet --- diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py index 3e784cf9f401..ebd06ae642c9 100755 --- a/scripts/spdxcheck.py +++ b/scripts/spdxcheck.py @@ -44,7 +44,7 @@ def read_spdxdata(repo): continue exception = None - for l in open(el.path).readlines(): + for l in open(el.path, encoding="utf-8").readlines(): if l.startswith('Valid-License-Identifier:'): lid = l.split(':')[1].strip().upper() if lid in spdx.licenses: