Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY. #20

Open
wza13 opened this issue May 13, 2024 · 7 comments

@wza13

wza13 commented May 13, 2024

D:\Users\12719\anaconda3\python.exe D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py
 20%|██ | 20/100 [00:01<00:06, 12.66it/s, mse_loss=nan, reg_loss=nan]
Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY.

Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
 20%|██ | 20/100 [00:02<00:08, 9.82it/s, mse_loss=nan, reg_loss=nan]
Traceback (most recent call last):
  File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 35, in <module>
    test_mul()
  File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 29, in test_mul
    optimizer.step(closure)
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\optim\optimizer.py", line 459, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\optim\lbfgs.py", line 320, in step
    orig_loss = closure()
                ^^^^^^^^^
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 18, in closure
    y = kan(x, update_grid=(i % 20 == 0))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 272, in forward
    layer.update_grid(x)
  File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 210, in update_grid
    self.spline_weight.data.copy_(self.curve2coeff(x, unreduced_spline_output))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 131, in curve2coeff
    solution = torch.linalg.lstsq(
               ^^^^^^^^^^^^^^^^^^^
RuntimeError: false INTERNAL ASSERT FAILED at "C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\BatchLinearAlgebra.cpp":1538, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library.

@LIWEIDENG0830

Hi bro, did you solve this problem? I get the same output when running test_simple_math.py.

@Indoxer

Indoxer commented May 13, 2024

Sounds like KindXiaoming/pykan#170. Changing the driver in the code may help.

@LIWEIDENG0830

Hi Indoxer, thanks for your kind help! It looks like the same problem as in pykan. However, I tried changing the driver in lstsq to solution = torch.linalg.lstsq(A, B, driver='gelsy').solution and running on CPU, and it does not work in my situation.
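
For context: the message names SGELSY, the single-precision LAPACK routine behind the 'gelsy' driver, and 'gelsy' is already the default driver torch.linalg.lstsq uses for CPU inputs, so passing driver='gelsy' on CPU selects the same routine. A minimal sketch of trying each CPU driver on a small, well-conditioned problem follows; the matrices are made up for illustration and are not taken from kan.py.

```python
import torch

torch.manual_seed(0)

# Illustrative full-rank least-squares problem: find X minimizing ||A @ X - B||.
A = torch.randn(8, 3)
X_true = torch.randn(3, 2)
B = A @ X_true

# Drivers accepted for CPU tensors; 'gelsy' is the CPU default.
cpu_drivers = ["gelsy", "gelsd", "gelss", "gels"]

solutions = {}
for driver in cpu_drivers:
    solutions[driver] = torch.linalg.lstsq(A, B, driver=driver).solution

# Report whether each driver recovers the known solution.
for driver, X in solutions.items():
    print(driver, torch.allclose(X, X_true, atol=1e-4))
```

If every driver fails in the same way on the real data, the inputs are a likelier cause than the driver choice.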

@Xu-backup

This is because the learning rate in that example is too high (lr = 1), so B turns to NaN during training. Lowering it may fix the problem.
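
A minimal sketch of that suggestion, assuming the test uses torch.optim.LBFGS (as the traceback indicates). A small nn.Linear model stands in for the KAN, and lr=0.1 and the toy data are illustrative choices, not values from the repository. The closure also stops as soon as the loss is no longer finite, so a NaN surfaces before it reaches later code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy regression data standing in for the test's inputs: y = x1 * x2.
x = torch.rand(256, 2) * 2 - 1
y = x[:, :1] * x[:, 1:]

model = nn.Linear(2, 1)  # stand-in for the KAN model (assumption)
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)  # lower than lr=1

def closure():
    # LBFGS re-evaluates the loss through this closure on every inner iteration.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    if not torch.isfinite(loss):
        raise FloatingPointError("loss became non-finite")
    loss.backward()
    return loss

initial_loss = closure().item()
for _ in range(20):
    optimizer.step(closure)
final_loss = closure().item()
print(initial_loss, final_loss)
```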

@LIWEIDENG0830

Okkkk. Thanks Xu. It works!

@boxaio

boxaio commented May 14, 2024

The error above happened while updating the grid, so how is it related to B blowing up?

@Xu-backup

I haven't actually found out why it happens. But I found B = y.transpose(0, 1) in the code, and y turns to NaN first, so maybe something is being divided by a number close to 0 somewhere. With a high lr you can easily end up with abnormal parameters.
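
One way to test this explanation is to check the inputs right before the least-squares call. The sketch below uses a hypothetical wrapper (checked_lstsq is not part of efficient-kan or PyTorch) that raises a readable error when A or B contains NaN or Inf, instead of handing them to the backend library.

```python
import torch

def checked_lstsq(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Solve min ||A @ X - B|| after verifying that both inputs are finite.

    Hypothetical debugging helper; not part of efficient-kan or PyTorch.
    """
    for name, t in (("A", A), ("B", B)):
        finite = torch.isfinite(t)
        if not finite.all():
            bad = (~finite).sum().item()
            raise ValueError(f"{name} has {bad} non-finite entries")
    return torch.linalg.lstsq(A, B).solution

torch.manual_seed(0)
A = torch.randn(6, 3)
X_true = torch.randn(3, 1)
B = A @ X_true

X = checked_lstsq(A, B)  # finite inputs are solved normally

B_bad = B.clone()
B_bad[0, 0] = float("nan")  # mimic a target that has turned NaN
try:
    checked_lstsq(A, B_bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the check fires inside curve2coeff, the NaN originates upstream of lstsq, which would be consistent with the learning-rate explanation.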
