This is what I get with mprof on the snippet above (I killed the program once the progress bar reached 100%, otherwise it hangs trying to upload all the figures)
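For context, the profile comes from memory_profiler's mprof CLI; roughly this invocation (a minimal sketch — train.py is a placeholder name for the snippet):
```bash
pip install memory_profiler
mprof run train.py   # records memory usage over time while the script runs
mprof plot           # renders the recorded memory curve
```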
on /data or /opt/clearml? these are two different disks
AgitatedDove14 I think it’s on me to take the pytorch distributed example in the clearml repo and try to reproduce the bug, then pass it over to you 🙂
Oh, and also use the colors of the series. That would be a killer feature. Then I would simply need to match the color of the series to the name to check the tags
AgitatedDove14 yes! I now realise that the ignite event callbacks seem not to be fired (I tried to print a debug message from a custom Events.ITERATION_COMPLETED handler and I cannot see it logged)
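For reference, this is roughly the kind of handler I mean (a minimal sketch; the engine and step function are placeholders):
```python
from ignite.engine import Engine, Events

def train_step(engine, batch):
    # placeholder training step
    return 0.0

trainer = Engine(train_step)

@trainer.on(Events.ITERATION_COMPLETED)
def debug_message(engine):
    # this print never shows up when the callback is not fired
    print(f"iteration {engine.state.iteration} completed")

# trainer.run(data, max_epochs=1) would then trigger the handler every iteration
```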
even if I move the GitHub workers internally, where they could have access to the prod server, I am not sure I would like that, because it would pile up unnecessary test data in the prod server
Here is the data disk (/opt/clearml) on the left and the OS disk on the right
SuccessfulKoala55 Here is the trains-elastic error
The only thing that changed is the new auth.fixed_users.pass_hashed field, which I don’t have in my config file
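If it helps, this is where I'd expect the field to sit in the server config (a hedged sketch; the username, password and name values are placeholders):
```
auth {
    fixed_users {
        enabled: true
        # new field: set to true when the passwords below are stored hashed
        pass_hashed: false
        users: [
            { username: "jane", password: "change_me", name: "Jane Doe" }
        ]
    }
}
```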
(by console you mean in the dashboard right? or the terminal?)
No, I want to launch the second step after the first one is finished and all its artifacts are uploaded
Not sure about that, I think you guys solved it with your PipelineController implementation. I would need to test it before giving any feedback 🙂
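If I understand the API correctly, something like this should do it (a hedged sketch; project and task names are placeholders), since a step listed in parents only starts after its parent task completes:
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="0.1")
pipe.add_step(name="step1",
              base_task_project="examples", base_task_name="step 1")
# step2 is scheduled only once step1 has completed (artifacts uploaded)
pipe.add_step(name="step2", parents=["step1"],
              base_task_project="examples", base_task_name="step 2")
pipe.start()
```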
so that any error arising from communication with the server could be tested
as it's also based on pytorch-ignite!
I am not sure to understand, what is the link with pytorch-ignite?
We're in the brainstorming phase on the best approaches to integrate; we might pick your brain later on
Awesome, I'd be happy to help!
Yes, unfortunately I would like to update all references to the old bucket… I think I’ll simply delete the old S3 bucket, wait for its name to become available again, recreate it on the other AWS account, and move the data there. This way I don’t have to mess with clearml data - I am afraid of doing something wrong and losing data
From https://discuss.pytorch.org/t/please-help-me-understand-installation-for-cuda-on-linux/14217/4 it looks like my assumption is correct: there is no need for cudatoolkit to be installed, since the wheels already contain all the CUDA/cuDNN libraries required by torch
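A quick sanity check you can run inside the environment (minimal sketch; on a CPU-only build some of these return None):
```python
import torch

# the wheel ships its own CUDA runtime, so these work without a system cudatoolkit
print(torch.version.cuda)              # CUDA runtime version bundled in the wheel
print(torch.backends.cudnn.version())  # bundled cuDNN version
print(torch.cuda.is_available())       # True if a compatible driver is present
```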
Ok, so what worked for me in the end was:
```python
from omegaconf import OmegaConf

# read_yaml is my own helper that loads the YAML file into a dict
config = task.connect_configuration(read_yaml(conf_path))
cfg = OmegaConf.create(config._to_dict())
```
I came up with the same code, thanks for the fast answer (yes having a setter for that would be cool!)
no, one worker (trains-agent-1) "forgets" from time to time the current experiment it is running and picks up another experiment on top of the one it is currently running
might be worth documenting 😄
So it seems like it doesn't copy /root/clearml.conf and it doesn't pass the environment variables (CLEARML_API_HOST, CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY)
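A hedged workaround sketch until this is sorted out: mount the config and forward the variables into the container explicitly (the image and entrypoint names are placeholders):
```bash
docker run --rm \
  -v /root/clearml.conf:/root/clearml.conf \
  -e CLEARML_API_HOST="$CLEARML_API_HOST" \
  -e CLEARML_API_ACCESS_KEY="$CLEARML_API_ACCESS_KEY" \
  -e CLEARML_API_SECRET_KEY="$CLEARML_API_SECRET_KEY" \
  my-image python my_script.py
```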