21148 – Induction Variable Simplification Pass promotes i32 operations to i64

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 21148 - Induction Variable Simplification Pass promotes i32 operations to i64

Summary: Induction Variable Simplification Pass promotes i32 operations to i64

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Tom Stellard

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-10-03 09:13 PDT by Tom Stellard
Modified:	2016-03-24 20:17 PDT (History)
CC List:	6 users (show)

See Also:
Fixed By Commit(s):

Attachments
Test Case (1.06 KB, text/plain) 2014-10-03 09:13 PDT, Tom Stellard	Details
Test case with only one pointer (956 bytes, text/plain) 2014-10-03 09:26 PDT, Tom Stellard	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Tom Stellard 2014-10-03 09:13:10 PDT

Created attachment 13130 [details]
Test Case

I've run into the issue where the Induction Variable Simplification Pass will
promote i32 operations to i64, which negatively impacts performance on AMDGPU targets.

The problem occurs when the induction variable is used for address calculations of pointers with different sizes.   For example:


kernel void test(global int *out, int a) {

  private int array[16];
...
  for (int i = 0; i < 16; i++)
    out[i * 3 + a] = array[i];
}

Here global pointers are 64-bit and private pointers are 32-bit.  In this case the indvars pass tries to promote the induction variable to 64 bits which causes the address calculations for the out pointer to be 64-bit operations.

I've attached an LLVM IR test case the demonstrates the issue.

Comment 1 Tom Stellard 2014-10-03 09:26:48 PDT

Created attachment 13131 [details]
Test case with only one pointer

I looked into this a little more, and it doesn't require two different sized pointers to hit this bug.  Here is a test case that uses a single 64-bit pointer.

Comment 2 Andrew Trick 2014-10-24 17:04:24 PDT

Sorry, I meant to respont to this long ago.

If you only really care about 64 vs 32 bit multiply, I think this can
be fixed without a target hook, because it will be generally good for
any architecture.

In isSimpleIVUser, just check if the SCEV expression is a multiply with more than one non-constant operand.

Just run LLVM test suite on x86 to verify no significant regressions.

 I hope that works...

Comment 3 Jingyue Wu 2014-10-27 14:59:31 PDT

Hi Andrew, 

I think checking if the SCEV is a multiply would work for my cases, and seems to work for Tom's example. It can be a quick work around. 

However, the general problem behind the scene still remains in the long term. Just think about a slight variation of my example:

__attribute__((global)) void foo(int n, int *output) {
  for (int i = 0; i < n; i += 3) {                    
    output[i + 2] = i + 4;                            
  }                                                   
}                                                     

"i" is not involved in any multiply, but widening it to i64 is still a bad idea for the NVPTX backend. We may still want the target-hook approach in a long term. 

Jingyue

Comment 4 Andrew Trick 2014-10-27 15:04:21 PDT

If your target never wants to widen induction variables, then a target hook makes perfect sense.

Checking for multiply by non-constant makes sense in general.

They're both good fixes.

Comment 5 Jingyue Wu 2014-10-27 15:08:16 PDT

Agreed. Are you sending out the patch soon? Sounds like it's ready already. Thanks for working on this!

Comment 6 Andrew Trick 2014-10-27 15:27:39 PDT

No, I don't have a patch. I think either fix would be easy though.

Comment 7 Tom Stellard 2014-10-28 00:38:31 PDT

(In reply to comment #4)
> If your target never wants to widen induction variables, then a target hook
> makes perfect sense.
> 

I would prefer adding this target hook, because I don't think we ever want to widen the induction variable for R600.

I'm curious though, in what scenarios is it beneficial to widen induction variables?


> Checking for multiply by non-constant makes sense in general.
> 
> They're both good fixes.

Comment 8 Jingyue Wu 2014-11-13 13:19:39 PST

Hi Tom, 

r221799 fixed the issue on the NVPTX side by having IndVarSimplify check the cost of the widen operation. Are you going to take it over and implement the getArithmeticInstrCost interface in AMDGPUTargetTransformInfo? I would be happy to do that but I don't have specific knowledge on the cost model of AMDGPU. 

Thanks,
Jingyue

Comment 9 Matt Arsenault 2016-03-24 20:17:26 PDT

Should be fixed by r264376