Eureka! AgitatedDove14 I managed to reproduce on Ubuntu (but not on Windows):
Not every run gets stuck; sometimes only 1 in 10 runs hangs.
https://github.com/maor121/clearml-bug-reproduction
AgitatedDove14 Also, I found out that adding pool.join() after pool.close() seems to solve the issue in the minimal example.
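For reference, a minimal sketch of that workaround (the worker function, pool size, and ClearML project/task names here are illustrative, not the exact code from the reproduction repo):

```python
from multiprocessing import Pool

from clearml import Task


def work(i):
    # placeholder workload standing in for the real task
    return i * i


if __name__ == "__main__":
    task = Task.init(project_name="debug", task_name="pool-join-workaround")

    pool = Pool(processes=4)
    results = pool.map(work, range(10))

    pool.close()
    # Waiting for the worker processes to actually exit before the script
    # ends is what seems to avoid the hang in the minimal example.
    pool.join()

    print(results)
```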
Hi AgitatedDove14
It appears that /tmp was not cleared, and in addition we upload many large artifacts through clearml.
I am not sure whether /tmp was left uncleared by clearml or by pytorch, since both seem to use the tmp folder for storing files. In any case, my error was generated by PyTorch:
https://discuss.pytorch.org/t/num-workers-in-dataloader-always-gives-this-error/64718
The /tmp was full, and pytorch tried moving from /tmp to a local directory, which in our setup is a network NFS drive, hence the...
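In case it helps anyone hitting the same /tmp-full wall, here is a sketch of how one might check free space in the temp directory and point it at a larger local disk via the standard TMPDIR mechanism (the /data/tmp path is just an example); whether pytorch/clearml pick this up depends on when they resolve their temp dir, so this is an assumption, not a verified fix:

```python
import os
import shutil
import tempfile

# How much space is left where temp files currently go?
tmp_dir = tempfile.gettempdir()
usage = shutil.disk_usage(tmp_dir)
print(f"{tmp_dir}: {usage.free / 1e9:.1f} GB free")

# Redirect temp files to a larger local disk. This must happen before
# libraries create their temp files; tempfile caches the resolved
# directory, so reset it to force re-resolution.
os.environ["TMPDIR"] = "/data/tmp"  # hypothetical larger local disk
tempfile.tempdir = None
print("new temp dir:", tempfile.gettempdir())
```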