Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update_test_checks.py is reusing the same TMP name for overlapping values #45296

Closed
qcolombet opened this issue May 15, 2020 · 12 comments
Closed
Labels
bugzilla Issues migrated from bugzilla llvm-tools All llvm tools that do not have corresponding tag

Comments

@qcolombet
Copy link
Collaborator

Bugzilla Link 45951
Resolution FIXED
Resolved on Jun 15, 2021 03:41
Version trunk
OS All
Attachments Reproducer
CC @gregbedwell,@MaskRay,@LebedevRI,@RKSimon,@rotateright

Extended Description

I found a bug where the update script will reuse the same temporary name for two variables that overlap.

Essentially the test case looks like:
tmp1 = …
%1 = …
= %1
= %tmp1

And update_test_checks.py will assign TMP1 to both %1 and %tmp1.

To reproduce:
update_test_checks.py --opt-binary opt wrong_update.orig.ll

Result:
The generated REGEX will have something like:
; CHECK-NEXT: [[TMP1:%.]] = call float @​eggs(<4 x float> getelementptr inbounds ([2 x <4 x float>], [2 x <4 x float>]* @​global.5, i32 0, i32 0), <4 x float>* @​global)
<...>
; CHECK-NEXT: [[TMP1:%.]] = call i32 @​wibble(i16 2, i16 1, i16 -1, i16 0, i16 20, i16 0, i16 0)
; CHECK-NEXT: [[TMP4:%.
]] = bitcast i32 [[TMP1]] to <4 x i8>
<...>
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0

This is wrong, since the second TMP1 use (happening on the definition of TMP9) should use the value from the first TMP1.

@LebedevRI
Copy link
Member

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

@RKSimon
Copy link
Collaborator

RKSimon commented May 16, 2020

At the very least we need to warn+bailout when there is clash in the use of %tmp* names

@qcolombet
Copy link
Collaborator Author

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

If that’s a prerequisite, couldn’t the script run it?

@qcolombet
Copy link
Collaborator Author

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

BTW, how is this supposed to work if the test case was instnamered but the optimization we're testing creates implicit variables.
Won't we have the same blending of named vs. implicit variables?

@rotateright
Copy link
Contributor

This is expected.
One is supposed to run -instnamer on testcases beforehand,
and do s/%tmp/%t/.

BTW, how is this supposed to work if the test case was instnamered but the
optimization we're testing creates implicit variables.
Won't we have the same blending of named vs. implicit variables?

Yes, this has been terrible from the start, and it's probably my fault. Roman's suggestion in comment 1 includes manually doing "s/%tmp/%t/".

Here's a one-line fix that would reduce potential conflict practically to zero and might encourage better test writing. The only downside is noise when updating an existing test file, but that can be overcome with an NFC pre-commit.

diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..630bd091dab 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -221,7 +221,7 @@ IR_VALUE_RE = re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

  • var = 'TMP' + var
  • var = 'NAMELESS' + var
    var = var.replace('.', '')
    var = var.replace('-', '
    ')
    return var.upper()

@qcolombet
Copy link
Collaborator Author

diff --git a/llvm/utils/UpdateTestChecks/common.py
b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..630bd091dab 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -221,7 +221,7 @@ IR_VALUE_RE =
re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

  • var = 'TMP' + var
  • var = 'NAMELESS' + var
    var = var.replace('.', '')
    var = var.replace('-', '
    ')
    return var.upper()

If we want to avoid noise, couldn't we check if the name is already used and if so bump the digit?
Note: I haven't look at the script so no idea how practical that is.

@rotateright
Copy link
Contributor

If we want to avoid noise, couldn't we check if the name is already used and
if so bump the digit?
Note: I haven't look at the script so no idea how practical that is.

That would still cause some noise and potential confusion because then the FileCheck names could get out-of-sync with the IR names; for example, "[[TMP1]]" corresponds to "%0".

Here's a minimal example to show the problem:

define i8 @​blam(i8 %x) {
; CHECK-LABEL: @​blam(
; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP0:%.]] = add i8 [[X:%.]], 1
; CHECK-NEXT: [[TMP0:%.*]] = sub i8 [[X]], 1
; CHECK-NEXT: ret i8 [[TMP0]] <-- wrong!
;
bb:
%0 = add i8 %x, 1
%tmp0 = sub i8 %x, 1
ret i8 %0
}

The root problem is that FileCheck variable names have a naming limitation that is not in LLVM IR - FileCheck names can't start with a digit:
"string variable names can be formed with the regex [a-zA-Z_][a-zA-Z0-9_]*"

So we need to fix that in FileCheck or work around it in the script.

A quick look at the script didn't turn up an obvious solution for me. The script is sequentially parsing the file, but anonymous IR variable names don't necessarily occur in sequential order even in relatively "normal" IR files (consider phis and loops):

define void @​small_tc(float* %A, float* %B) {
; CHECK-LABEL: @​small_tc(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[TMP2:%.]]
; CHECK: tmp2:
; CHECK-NEXT: [[INDVARS_IV:%.
]] = phi i64 [ 0, [[ENTRY:%.]] ], [ [[TMP2:%.]], [[TMP2]] ]
; CHECK-NEXT: [[TMP0:%.]] = load float, float [[A:%.]], align 4
; CHECK-NEXT: [[TMP1:%.
]] = load float, float* [[B:%.]], align 4
; CHECK-NEXT: [[ADD:%.
]] = fadd fast float [[TMP0]], [[TMP1]]
; CHECK-NEXT: store float [[ADD]], float* [[A]], align 4
; CHECK-NEXT: [[TMP2]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.]] = icmp eq i64 [[TMP2]], 8
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.
]], label [[TMP2]]
; CHECK: for.end:
; CHECK-NEXT: ret void
;
entry:
br label %tmp2

tmp2:
; NAME CONFLICT BETWEEN NOT-YET-DEFINED VALUE AND LABEL HERE
%indvars.iv = phi i64 [ 0, %entry ], [ %2, %tmp2 ]
%0 = load float, float* %A, align 4
%1 = load float, float* %B, align 4
%add = fadd fast float %0, %1
store float %add, float* %A, align 4
%2 = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %2, 8
br i1 %exitcond, label %for.end, label %tmp2

for.end:
ret void
}

That also means we can't error out on a name conflict. The script would have to gain knowledge about IR semantics that it doesn't currently have. Even a warning would be too noisy IMO because it would fire on standard patterns like the loop above even if there was no "%tmp2" variable?

So I'd go with the hack and live with reduced name collision risk. :)
I can leave a code comment at least and link back to this bug report if someone wants to fix this better.

@rotateright
Copy link
Contributor

To make this slightly less awful, we can add a warning as Simon suggested:

diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index a3fca4905f3..a2e9787253c 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -218,10 +218,12 @@ SCRUB_IR_COMMENT_RE = re.compile(r'\s*;.*')

spaces, commas, paren, or end of the string

IR_VALUE_RE = re.compile(r'(\s+)%([\w.-]+?)([,\s()]|\Z)')

+NAMELESS_PREFIX = "NAMELESS"
+

Create a FileCheck variable name based on an IR name.

def get_value_name(var):
if var.isdigit():

  • var = 'TMP' + var
  • var = NAMELESS_PREFIX + var
    var = var.replace('.', '')
    var = var.replace('-', '
    ')
    return var.upper()
    @@ -243,6 +245,8 @@ def genericize_check_lines(lines, is_analyze, vars_seen):

into defs, and variables we have seen into uses.

def transform_line_vars(match):
var = match.group(2)

  • if NAMELESS_PREFIX.lower() in var.lower():
  •  warn("Change IR value name '%s' to prevent possible conflict with scripted FileCheck name." % (var,))
    
    if var in vars_seen:
    rv = get_value_use(var)
    else:

@rotateright
Copy link
Contributor

I almost fell back into the 'tmp' variable trap on a recent commit. I'll post the patch for review for more feedback.

@rotateright
Copy link
Contributor

@rotateright
Copy link
Contributor

Avoid the name collision with:
bfdc255

@rotateright
Copy link
Contributor

https://reviews.llvm.org/D80584

Avoid the name collision with:
https://github.com/llvm/llvm-project/commit/
bfdc255

For the record, we effectively reverted the above patch with:
https://reviews.llvm.org/rGe5b8772756737e41cb1e8ee1a5a33cb3d8a25be6

But we kept a warning from the script and updated -instnamer to reduce odds for conflict. See the review for details.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm-tools All llvm tools that do not have corresponding tag
Projects
None yet
Development

No branches or pull requests

4 participants