update_test_checks.py is reusing the same TMP name for overlapping values #45296

qcolombet · 2020-05-15T23:22:34Z


Bugzilla Link	45951
Resolution	FIXED
Resolved on	Jun 15, 2021 03:41
Version	trunk
OS	All
Attachments	Reproducer
CC	@gregbedwell,@MaskRay,@LebedevRI,@RKSimon,@rotateright

Extended Description

I found a bug where the update script will reuse the same temporary name for two variables that overlap.

Essentially the test case looks like:
tmp1 = …
%1 = …
= %1
= %tmp1

And update_test_checks.py will assign TMP1 to both %1 and %tmp1.

To reproduce:
update_test_checks.py --opt-binary opt wrong_update.orig.ll

Result:
The generated REGEX will have something like:
; CHECK-NEXT: [[TMP1:%.]] = call float @eggs(<4 x float> getelementptr inbounds ([2 x <4 x float>], [2 x <4 x float>]* @global.5, i32 0, i32 0), <4 x float>* @global)
<...>
; CHECK-NEXT: [[TMP1:%.]] = call i32 @wibble(i16 2, i16 1, i16 -1, i16 0, i16 20, i16 0, i16 0)
; CHECK-NEXT: [[TMP4:%.]] = bitcast i32 [[TMP1]] to <4 x i8>
<...>
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0

This is wrong, since the second TMP1 use (happening on the definition of TMP9) should use the value from the first TMP1.

LebedevRI · 2020-05-16T09:09:12Z

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

RKSimon · 2020-05-16T09:40:38Z

At the very least we need to warn+bailout when there is clash in the use of %tmp* names

qcolombet · 2020-05-16T17:30:56Z

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

If that’s a prerequisite, couldn’t the script run it?

qcolombet · 2020-05-19T18:55:33Z

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

BTW, how is this supposed to work if the test case was instnamered but the optimization we're testing creates implicit variables.
Won't we have the same blending of named vs. implicit variables?

rotateright · 2020-05-20T12:56:14Z

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

BTW, how is this supposed to work if the test case was instnamered but the
optimization we're testing creates implicit variables.
Won't we have the same blending of named vs. implicit variables?

Yes, this has been terrible from the start, and it's probably my fault. Roman's suggestion in comment 1 includes manually doing "s/%tmp/%t/".

Here's a one-line fix that would reduce potential conflict practically to zero and might encourage better test writing. The only downside is noise when updating an existing test file, but that can be overcome with an NFC pre-commit.

diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..630bd091dab 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -221,7 +221,7 @@ IR_VALUE_RE = re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

var = 'TMP' + var

var = 'NAMELESS' + var
var = var.replace('.', '')
var = var.replace('-', '')
return var.upper()

qcolombet · 2020-05-20T17:42:47Z

diff --git a/llvm/utils/UpdateTestChecks/common.py
b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..630bd091dab 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -221,7 +221,7 @@ IR_VALUE_RE =
re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

var = 'TMP' + var

var = 'NAMELESS' + var
var = var.replace('.', '')
var = var.replace('-', '')
return var.upper()

If we want to avoid noise, couldn't we check if the name is already used and if so bump the digit?
Note: I haven't look at the script so no idea how practical that is.

rotateright · 2020-05-21T13:30:17Z

If we want to avoid noise, couldn't we check if the name is already used and
if so bump the digit?
Note: I haven't look at the script so no idea how practical that is.

That would still cause some noise and potential confusion because then the FileCheck names could get out-of-sync with the IR names; for example, "[[TMP1]]" corresponds to "%0".

Here's a minimal example to show the problem:

define i8 @blam(i8 %x) {
; CHECK-LABEL: @blam(
; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP0:%.]] = add i8 [[X:%.]], 1
; CHECK-NEXT: [[TMP0:%.*]] = sub i8 [[X]], 1
; CHECK-NEXT: ret i8 [[TMP0]] <-- wrong!
;
bb:
%0 = add i8 %x, 1
%tmp0 = sub i8 %x, 1
ret i8 %0
}

The root problem is that FileCheck variable names have a naming limitation that is not in LLVM IR - FileCheck names can't start with a digit:
"string variable names can be formed with the regex [a-zA-Z_][a-zA-Z0-9_]*"

So we need to fix that in FileCheck or work around it in the script.

A quick look at the script didn't turn up an obvious solution for me. The script is sequentially parsing the file, but anonymous IR variable names don't necessarily occur in sequential order even in relatively "normal" IR files (consider phis and loops):

define void @small_tc(float* %A, float* %B) {
; CHECK-LABEL: @small_tc(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[TMP2:%.]]
; CHECK: tmp2:
; CHECK-NEXT: [[INDVARS_IV:%.]] = phi i64 [ 0, [[ENTRY:%.]] ], [ [[TMP2:%.]], [[TMP2]] ]
; CHECK-NEXT: [[TMP0:%.]] = load float, float [[A:%.]], align 4
; CHECK-NEXT: [[TMP1:%.]] = load float, float* [[B:%.]], align 4
; CHECK-NEXT: [[ADD:%.]] = fadd fast float [[TMP0]], [[TMP1]]
; CHECK-NEXT: store float [[ADD]], float* [[A]], align 4
; CHECK-NEXT: [[TMP2]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.]] = icmp eq i64 [[TMP2]], 8
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.]], label [[TMP2]]
; CHECK: for.end:
; CHECK-NEXT: ret void
;
entry:
br label %tmp2

tmp2:
; NAME CONFLICT BETWEEN NOT-YET-DEFINED VALUE AND LABEL HERE
%indvars.iv = phi i64 [ 0, %entry ], [ %2, %tmp2 ]
%0 = load float, float* %A, align 4
%1 = load float, float* %B, align 4
%add = fadd fast float %0, %1
store float %add, float* %A, align 4
%2 = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %2, 8
br i1 %exitcond, label %for.end, label %tmp2

for.end:
ret void
}

That also means we can't error out on a name conflict. The script would have to gain knowledge about IR semantics that it doesn't currently have. Even a warning would be too noisy IMO because it would fire on standard patterns like the loop above even if there was no "%tmp2" variable?

So I'd go with the hack and live with reduced name collision risk. :)
I can leave a code comment at least and link back to this bug report if someone wants to fix this better.

rotateright · 2020-05-21T14:19:01Z

To make this slightly less awful, we can add a warning as Simon suggested:

diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..a2e9787253c 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -218,10 +218,12 @@ SCRUB_IR_COMMENT_RE = re.compile(r'\s*;.*')

spaces, commas, paren, or end of the string

IR_VALUE_RE = re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

+NAMELESS_PREFIX = "NAMELESS"
+

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

var = 'TMP' + var

var = NAMELESS_PREFIX + var
var = var.replace('.', '')
var = var.replace('-', '')
return var.upper()
@@ -243,6 +245,8 @@ def genericize_check_lines(lines, is_analyze, vars_seen):

into defs, and variables we have seen into uses.

def transform_line_vars(match):
var = match.group(2)

if NAMELESS_PREFIX.lower() in var.lower():

 warn("Change IR value name '%s' to prevent possible conflict with scripted FileCheck name." % (var,))

if var in vars_seen:
rv = get_value_use(var)
else:

rotateright · 2020-05-26T19:40:53Z

I almost fell back into the 'tmp' variable trap on a recent commit. I'll post the patch for review for more feedback.

rotateright · 2020-05-26T19:58:43Z

https://reviews.llvm.org/D80584

rotateright · 2020-05-31T14:57:48Z

Avoid the name collision with:
bfdc255

rotateright · 2021-06-15T10:41:04Z

https://reviews.llvm.org/D80584

Avoid the name collision with:
https://github.com/llvm/llvm-project/commit/
bfdc255

For the record, we effectively reverted the above patch with:
https://reviews.llvm.org/rGe5b8772756737e41cb1e8ee1a5a33cb3d8a25be6

But we kept a warning from the script and updated -instnamer to reduce odds for conflict. See the review for details.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update_test_checks.py is reusing the same TMP name for overlapping values #45296

update_test_checks.py is reusing the same TMP name for overlapping values #45296

qcolombet commented May 15, 2020

LebedevRI commented May 16, 2020

RKSimon commented May 16, 2020

qcolombet commented May 16, 2020

qcolombet commented May 19, 2020

rotateright commented May 20, 2020

qcolombet commented May 20, 2020

Create a FileCheck variable name based on an IR name.

rotateright commented May 21, 2020

rotateright commented May 21, 2020

rotateright commented May 26, 2020

rotateright commented May 26, 2020

rotateright commented May 31, 2020

rotateright commented Jun 15, 2021

update_test_checks.py is reusing the same TMP name for overlapping values #45296

update_test_checks.py is reusing the same TMP name for overlapping values #45296

Comments

qcolombet commented May 15, 2020

Extended Description

LebedevRI commented May 16, 2020

RKSimon commented May 16, 2020

qcolombet commented May 16, 2020

qcolombet commented May 19, 2020

rotateright commented May 20, 2020

Create a FileCheck variable name based on an IR name.

qcolombet commented May 20, 2020

Create a FileCheck variable name based on an IR name.

rotateright commented May 21, 2020

rotateright commented May 21, 2020

spaces, commas, paren, or end of the string

Create a FileCheck variable name based on an IR name.

into defs, and variables we have seen into uses.

rotateright commented May 26, 2020

rotateright commented May 26, 2020

rotateright commented May 31, 2020

rotateright commented Jun 15, 2021