Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY. #20

wza13 · 2024-05-13T09:09:51Z

D:\Users\12719\anaconda3\python.exe D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py
20%|██ | 20/100 [00:01<00:06, 12.66it/s, mse_loss=nan, reg_loss=nan]
Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY.

Intel oneMKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
20%|██ | 20/100 [00:02<00:08, 9.82it/s, mse_loss=nan, reg_loss=nan]
Traceback (most recent call last):
File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 35, in <module>
test_mul()
File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 29, in test_mul
optimizer.step(closure)
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\optim\optimizer.py", line 459, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\optim\lbfgs.py", line 320, in step
orig_loss = closure()
^^^^^^^^^
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\PycharmProjects\efficient-kan\tests\test_simple_math.py", line 18, in closure
y = kan(x, update_grid=(i % 20 == 0))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 272, in forward
layer.update_grid(x)
File "D:\Users\12719\anaconda3\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 210, in update_grid
self.spline_weight.data.copy_(self.curve2coeff(x, unreduced_spline_output))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Users\12719\PycharmProjects\efficient-kan\src\efficient_kan\kan.py", line 131, in curve2coeff
solution = torch.linalg.lstsq(
^^^^^^^^^^^^^^^^^^^
RuntimeError: false INTERNAL ASSERT FAILED at "C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\BatchLinearAlgebra.cpp":1538, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library.

LIWEIDENG0830 · 2024-05-13T12:28:52Z

Hi bro, did you solve this problem? I get the same output when running test_simple_math.py.

Indoxer · 2024-05-13T17:26:30Z

Sounds like KindXiaoming/pykan#170. Changing the driver passed to torch.linalg.lstsq in the code may help.

LIWEIDENG0830 · 2024-05-14T03:14:57Z

Hi Indoxer, thanks for your kind help! It looks like the same problem as in pykan. However, I tried changing the driver in lstsq to solution = torch.linalg.lstsq(A, B, driver='gelsy').solution and running on CPU, and it does not work in my situation.
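For context on why this attempt changes nothing: on CPU, torch.linalg.lstsq already defaults to driver='gelsy', which matches the SGELSY routine named in the error, so passing it explicitly selects the same backend call. The sketch below uses a hypothetical helper, checked_lstsq (not part of efficient-kan or PyTorch), that rejects non-finite inputs before they reach the LAPACK backend. The matrices are made-up illustration data.

```python
import torch


def checked_lstsq(A, B, driver=None):
    """Hypothetical helper: refuse NaN/Inf inputs before calling lstsq.

    driver=None lets PyTorch pick its default ('gelsy' on CPU).
    """
    if not torch.isfinite(A).all() or not torch.isfinite(B).all():
        raise ValueError(
            "lstsq input contains NaN or Inf; inspect the training step "
            "that produced it before calling the solver"
        )
    return torch.linalg.lstsq(A, B, driver=driver).solution


# Small, well-conditioned illustration problem (assumed data).
A = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = torch.tensor([[1.0], [2.0], [3.0]])

# Passing 'gelsy' explicitly is equivalent to the CPU default.
solution = checked_lstsq(A, B, driver="gelsy")
```

With a guard like this, a NaN in B would surface as a readable ValueError rather than as the oneMKL parameter error.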

Xu-backup · 2024-05-14T04:09:52Z

This is because the learning rate is too high (lr = 1) in that example, so B turns to NaN during training. Lowering it may fix the problem.
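A minimal sketch of this suggestion, assuming a stand-in torch.nn.Linear model instead of the KAN from test_simple_math.py. The model, the data, and the value lr=0.1 are illustrative assumptions, not taken from the repository. The closure also stops as soon as the loss becomes non-finite, so the problem is reported at the step where it first appears.

```python
import torch

torch.manual_seed(0)

# Stand-in model and data (assumptions for illustration only).
model = torch.nn.Linear(2, 1)
x = torch.rand(64, 2)
y = x[:, :1] * x[:, 1:]  # target: product of the two inputs

# Lower learning rate than the lr = 1 mentioned above.
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)


def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Fail early instead of letting NaN flow into later computations.
    if not torch.isfinite(loss):
        raise FloatingPointError("loss became NaN/Inf; try a lower lr")
    loss.backward()
    return loss


for _ in range(5):
    # LBFGS.step returns the loss from its first closure evaluation.
    final_loss = optimizer.step(closure)
```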

LIWEIDENG0830 · 2024-05-14T04:18:50Z

Okkkk. Thanks Xu. It works!

boxaio · 2024-05-14T13:34:14Z

The above error happened when updating the grid, so how is this related to the explosion of B?

Xu-backup · 2024-05-14T14:18:49Z

I have not actually found out why it happens. But I found B = y.transpose(0, 1) in the code, and y turns NaN first, so maybe somewhere a value is divided by a number close to 0. With a high lr you can easily end up with abnormal parameters.
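To illustrate the mechanism guessed at here (this is only a toy example with made-up tensors, not the actual computation in kan.py): a single 0/0 division yields NaN, and the NaN survives the transpose that builds B, so the solver later receives a non-finite right-hand side.

```python
import torch

# Made-up values; one entry is 0/0 to mimic a degenerate division.
numerator = torch.tensor([[1.0, 0.0], [2.0, 3.0]])
denominator = torch.tensor([[1.0, 0.0], [1.0, 1.0]])

y = numerator / denominator   # the 0/0 entry becomes NaN
B = y.transpose(0, 1)         # transposing does not remove the NaN

has_nan = torch.isnan(B).any().item()
nan_positions = torch.isnan(B).nonzero().tolist()
```

Checking torch.isnan on y right before the transpose would show whether the NaN already exists at that point or is introduced later.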
