
True, the Hyperparameters tab does show the environment (sorry, my bad). The repo information and the uncommitted changes don't show up for me 😞. What version did you use?
Try updating to 1.1.0?
ah nice, I'll try `auto_connect_frameworks` (probably with `{'joblib': False}`? - we don't use scikit-learn)
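For anyone reading along, this is roughly what I have in mind, as a minimal sketch. `Task.init` does take an `auto_connect_frameworks` argument that can be a dict; the helper names `framework_hooks` and `start_task` are made up for illustration, and I'm assuming (per the docs, not verified here) that frameworks left out of the dict stay enabled.

```python
def framework_hooks(disabled=("joblib",)):
    """Build the dict passed as Task.init(auto_connect_frameworks=...).

    Only the listed frameworks are switched off; the assumption is that
    keys missing from the dict keep their default (enabled) behaviour.
    """
    return {name: False for name in disabled}


def start_task(project_name, task_name):
    """Hypothetical wrapper: needs clearml installed and a configured server."""
    from clearml import Task  # imported lazily so the helper above works without clearml

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        auto_connect_frameworks=framework_hooks(),
    )
```

Calling `start_task("my project", "my experiment")` would then create the task with the joblib hook disabled; both names are placeholders.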
Yeah, I experienced the same issue. Training stops / freezes at the end of the 10th or 15th epoch. Using `pytorch_lightning` as well.
I apologize 😳 . Turns out I have all those things. Again, my apologies 😞 .
Not sure how to check that tbh. Does this help:
root@aea5d96a8ed3:/usr/agent# clearml-agent --version
CLEARML-AGENT version 1.0.0
Would be nice to display this info maybe somewhere in here:
Hi again. After looking into the matter a little bit, I realise I'd have liked having the option of using a `StoreManager` ABC which I would implement myself using whatever storage provider I happen to use and whatever package versions happen to support it. To put it differently, instead of you implementing managers for gcs, azure, aws, etc, it would be a much nicer alternative (for me, and I suspect eventually for you too) for clearml's store manager to wrap whatever object the user pr...
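To make the idea concrete, here is a rough sketch of the kind of interface I mean. Everything in it is hypothetical: the `StoreManager` ABC, its `upload` / `download` methods and the `LocalDirStore` backend are my own names, not clearml's actual API. A real backend would call the storage provider's SDK where this one copies files.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class StoreManager(ABC):
    """Hypothetical interface a user would implement for their own storage."""

    @abstractmethod
    def upload(self, local_path: str, remote_key: str) -> str:
        """Store a local file under remote_key and return its stored location."""

    @abstractmethod
    def download(self, remote_key: str, local_path: str) -> str:
        """Fetch remote_key into local_path and return local_path."""


class LocalDirStore(StoreManager):
    """Toy backend that uses a directory as the 'remote' store."""

    def __init__(self, root: str):
        self.root = Path(root)

    def upload(self, local_path: str, remote_key: str) -> str:
        target = self.root / remote_key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)
        return str(target)

    def download(self, remote_key: str, local_path: str) -> str:
        shutil.copyfile(self.root / remote_key, local_path)
        return str(local_path)
```

The library would then only depend on the two abstract methods, and pinning a particular storage SDK version would be the user's problem rather than the library's.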
Hi Jake, thanks for the reply. I've tried the account key method, works fine - but unfortunately clearml expects an old version of `azure-storage-blob` (<2.1), which is incompatible with the recent versions (^12.). Any clues on how we could work around this one? Thanks again.
` radu on vm-aimbrain-01 in experiments/runners/all via 🐍 v3.8.5 via C volt
❯ git ls-remote --get-url github
github
radu on vm-aimbrain-01 in experiments/runners/all via 🐍 v3.8.5 via C volt
❯ git ls-remote --get-url
fatal: No remote configured to list refs from.
radu on vm-aimbrain-01 in experiments/runners/all via 🐍 v3.8.5 via C volt
❯ git --version
git version 2.17.1 `
there is - it's called "github"
Hi AgitatedDove14 , I deleted everything in /opt/clearml as per the docs. Should I delete anything else?
Hi @<1523701087100473344:profile|SuccessfulKoala55> ,
thanks for the pointers.
I didn't know that the plot data is stored in elasticsearch. Good to know. It relates to the rest of my questions in that I want to understand where everything is saved, all the parts of my experiments. The plots are actually the most important part, since I have direct access to the artifacts I save (like, say, models) but not to the plot data which helps me compare and rank experiments. I mention tensorboard be...
I didn't add that to the script since the effect is persistent (i.e. it only needs to be done once, right?) In any case, I checked that multiple times and it was as expected.
Must be something else foul at play here..
The UI shows the log as is (and as pasted above). In the console I'm getting correct output (a single tqdm progress line):
` [2021-09-17 13:29:51,860][pytorch_lightning.utilities.distributed][INFO] - GPU available: True, used: True
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - TPU available: False, using: 0 TPU cores
[2021-09-17 13:29:51,862][pytorch_lightning.utilities.distributed][INFO] - IPU available: False, using: 0 IPUs
[2021-09-17 13:29:51,866][pytorch_ligh...
Hi SweetBadger76 , thanks, I think I've made it work. The main point of confusion was between dealing with different types of `Task` objects (i.e. `clearml.backend_api.services.v2_13.tasks.Task` returned by `get_all`, which don't have any of those methods).
Interestingly, `set_parameters` didn't just work as expected, I had to flatten the dicts myself (which clearml apparently does on its own when I call `set_parameters` on a new task).
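In case it helps someone else, the flattening I did was along these lines. This is a minimal sketch: `flatten_params` is my own helper, not a clearml function, and the `/` separator is an assumption based on the `section/name` keys I see in the Hyperparameters tab.

```python
def flatten_params(nested, parent_key="", sep="/"):
    """Flatten a nested dict into a single-level dict with joined keys.

    {"General": {"lr": 0.1}} becomes {"General/lr": 0.1}.
    Empty dicts are kept as values instead of being dropped.
    """
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts, carrying the key prefix along.
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

The flattened dict is what I then passed to `set_parameters` on the existing task.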
Thank you all. 🙏
This is what the links to the artifacts look like (the part I blurred out is the last part of the secret, which is working fine since the task was able to upload those correctly to storage, I can check that):
Also I just tried the pytorch-lightning `RichProgressBar` (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Hi Martin, it is a tqdm parameter (the default `ProgressBar` in pytorch lightning is unfortunately relying on tqdm). This is from the tqdm docs:
dynamic_ncols : bool, optional
    If set, constantly alters `ncols` and `nrows` to the
    environment (allowing for window resizes) [default: False].
nrows : int, optional
    The screen height. If specified, hides nested bars outside this
    bound. If unspecified, attempts to use environment...
Sorry to ping you @<1523701087100473344:profile|SuccessfulKoala55> , can you offer any ideas on the two questions from my reply (about the correct web app cloud access and the correct way to specify a blob storage in the `clearml.conf` file)? Thanks 🙏