Sure thing! This feature is all you guys; ask and you shall receive 🙂
Yes it does, but those files must be committed to begin with. Basically, think of it as the 'git diff' output being stored, and then the agent applies it.
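Roughly like this, as a conceptual sketch only (not the actual ClearML code): the diff of tracked files is captured, stored with the task, and re-applied by the agent on top of the same commit.
```
import subprocess

# Conceptual sketch only, not the actual ClearML implementation:
# capture the uncommitted changes of tracked files as a plain diff...
diff_text = subprocess.run(
    ["git", "diff", "HEAD"], capture_output=True, text=True, check=True
).stdout

# ...the diff text is stored alongside the task; on the worker, the agent
# checks out the same commit and applies the stored diff on top of it.
with open("uncommitted_changes.patch", "w") as f:
    f.write(diff_text)
subprocess.run(["git", "apply", "uncommitted_changes.patch"], check=True)
```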
GiganticTurtle0 is there any git redundancy on your network? Maybe you could configure a fallback server?
If the manual execution (i.e. from PyCharm) was working, it should have stored it on the Pipeline Task.
WickedGoat98 give me a minute, I'm not sure it is not ClearML related
You could change infrastructure or hosting, and then your data would be associated with the wrong URL
Yeah that makes sense, so have it on a specific DNS name? (This is usually the case with K8s deployments.)
It does work about 50% of the time
EcstaticGoat95 what do you mean by "works about 50% of the time"? Do you mean it hangs the other 50%?
LOL EnormousWorm79 you should have a "do not show again" option, no?
Is it mentioned anywhere in the docs that clearml-agent needs to be installed from the system Python? If not, I suggest it gets added.
You are right, I will check and fix if not 🙂
Thank you so much for helping.
My pleasure
let me check
directly from the UI from the services queue?
Spin the agent with --services-mode
it will keep pulling jobs from the queue and spinning them up (BTW, it will only start the next job after the first one has finished the env setup, and you must be running in --docker mode 🙂)
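For reference, a minimal sketch of launching it from Python; it is just a thin wrapper around the CLI, and the queue name "services" is only the usual default here:
```
import subprocess

# Sketch: start an agent that serves the "services" queue in services mode.
# Equivalent to running
#   clearml-agent daemon --queue services --docker --services-mode
# directly from a shell.
subprocess.Popen(
    ["clearml-agent", "daemon", "--queue", "services", "--docker", "--services-mode"]
)
```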
What probably happens is that torch is first installed by the trains-agent, then it installs the other packages; they require a different version, so pip automatically replaces it.
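One way around it, as a sketch (the version string is only an example), is to pin torch on the task before Task.init so the agent installs that exact version:
```
from clearml import Task

# Sketch: pin torch explicitly so the agent installs this exact version and the
# other packages cannot pull in a different one. "1.13.1" is just an example.
Task.add_requirements("torch", "1.13.1")
task = Task.init(project_name="examples", task_name="pinned torch run")  # placeholder names
```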
CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of it. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers, you just need to make sure the machine has enough memory to support it.
Hi PompousParrot44
Well this kind of control is tricky. If you don't mind processes "fighting over CPU" you can just spin up two trains-agents in CPU mode. It will work as long as they have different TRAINS_WORKER_NAME values, as in the sketch below.
The other option (might be a bit of overkill) is to use K8s, which will set the CPU % for the entire agent.
What do you think?
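Something along these lines, as a sketch (the queue name "cpu_queue" and the worker names are placeholders):
```
import os
import subprocess

# Sketch: two CPU-only trains-agents on the same machine, each registered as a
# separate worker via its own TRAINS_WORKER_NAME.
for worker_name in ("cpu-worker-1", "cpu-worker-2"):
    env = dict(os.environ, TRAINS_WORKER_NAME=worker_name)
    subprocess.Popen(
        ["trains-agent", "daemon", "--queue", "cpu_queue", "--cpu-only"],
        env=env,
    )
```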
TrickySheep9 you mean custom containers in clearml-session for remote development ?
Hi LazyTurkey38
What do you mean the git repo is not recognized? When execute_remotely exits, you should see on the Task a reference to the git repo with the exact commit ID you have locally checked out. Do you see it under the Execution tab?
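For reference, a minimal sketch of the flow (project, task, and queue names are placeholders):
```
from clearml import Task

# Sketch: Task.init captures the repo URL, commit ID and uncommitted diff from
# the local checkout; execute_remotely() then stops the local run and enqueues
# the task, so the Execution tab should show that exact commit.
task = Task.init(project_name="examples", task_name="remote debug")  # placeholder names
task.execute_remotely(queue_name="default")  # placeholder queue
```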
Hi @<1673501379764686848:profile|VirtuousSeaturtle4>
What I don't get is that the example does not refer to a bucket path. What bucket path should I specify?
you mean to store data?
ZanyPig66 is this reproducible? This sounds like a bug. What's the TB version and OS you are using?
Is this example working for you (i.e. do you see debug images)?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SAAS solution ?
My question is if there is an easy way to track gradients similar to
wandb.watch
@<1523705099182936064:profile|GrievingDeer61> not at the moment, but should be fairly easy to add.
Usually torch examples just use TB as the default logging, which goes directly to ClearML
, but this is a great idea to add
Could probably go straight to the next version 🙂
wdyt?
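In the meantime, a rough sketch of a manual workaround (the helper name and the "gradients" title are made up; it just reports per-parameter gradient norms as scalars after backward()):
```
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="gradient tracking")  # placeholder names
logger = task.get_logger()

def watch_gradients(model: torch.nn.Module, iteration: int) -> None:
    # Hypothetical helper, wandb.watch-style: report the gradient norm of every
    # parameter as a scalar, one series per parameter name.
    for name, param in model.named_parameters():
        if param.grad is not None:
            logger.report_scalar(
                title="gradients", series=name,
                value=param.grad.norm().item(), iteration=iteration,
            )
```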
They could; the problem is that by the time you set them, they have already been read into the variables.
Maybe we should make it lazily loaded; it would also speed up the import.
Basically the idea is that you do not need to configure the Experiment manually; it is created when you actually develop the code and run/debug it, or you have the CLI take everything from your machine and populate it
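i.e. something as small as this at the top of your script is enough (names are placeholders); the repo, commit, diff, installed packages and console output are picked up automatically:
```
from clearml import Task

# Sketch: a single Task.init call; the experiment is created automatically the
# moment you run or debug the script locally.
task = Task.init(project_name="examples", task_name="my experiment")  # placeholder names
```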
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8)