Hmm, I just tested on the community version and it seems to work there. Let me check with the frontend guys. Can you verify it works for you on https://app.community.clear.ml/ ?
Hi ExcitedCat13
Sure, download the plugin from the git repo (Install instructions in the repo).
Regarding remote debugging, are you referring to SSH?
The plugin itself is designed to make sure that when you work on a remote machine with PyCharm, clearml will log the local git repo and changes (as the .git folder is not synced to the remote machine)
WittyOwl57 could it be the EC2 instance is too small (i.e. not enough storage / memory) ?
I understand I can change the docker image for a component in the pipeline, but for the
it isn't possible.
You can always do Task.current_task().connect() from the pipeline function itself to connect more configuration arguments. You basically add them via the function itself: all the pipeline logic function arguments become pipeline arguments, it's kind of neat :) Regarding docker, the idea is that you use a very basic python docker (the default for services) queue for all...
StickyBlackbird93 the agent is supposed to solve for the correct version of pytorch based on the Cuda in the container. Sounds like for some reason it fails? Can you provide the log of the Task that failed? Are you running the agent in docker-mode , or inside a docker?
That's the right place but
like you would use hydra --override, which in your case I think it should be "accelerator.gpu" ,
You can also change `allow_omegaconf_edit` in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change `allow_omegaconf_edit` then the edit in the UI is ignored)
SmarmySeaurchin8 regarding (2)
I'm not sure the current visualization supports it. I mean we can put "{}", but that would imply you can edit it, which then we have to support. Possible but weird, and this is why:
task.connect({'a': {}, 'b': {'nested': 'value'}})
will become
'a' = '{}'
'b/nested' = 'value'
But then if you edit to:
'a' = '{'nested': 'value'}'
'b/nested' = 'value'
you have two different ways of presenting the same type of structure...
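To make the ambiguity concrete, here is a minimal sketch in plain Python. The `flatten` helper is hypothetical and only mimics the 'parent/child' key layout described above; it is not the clearml implementation.

```python
def flatten(d, prefix=""):
    """Flatten a nested dict into 'parent/child' keys (hypothetical helper)."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Non-empty dicts are expanded into nested keys
            flat.update(flatten(value, name))
        else:
            # Leaves, including empty dicts, are stored as strings
            flat[name] = str(value)
    return flat


original = flatten({"a": {}, "b": {"nested": "value"}})

# Simulate a user editing 'a' in the UI so it holds a dict literal
edited = dict(original)
edited["a"] = "{'nested': 'value'}"
```

After the edit, 'a' carries its nested value as one string while 'b' carries the same shape as a separate 'b/nested' key, so the same structure ends up stored in two different ways.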
Hi DepressedChimpanzee34
if you try to extend it more then the width of the column to the right, it doesn't do anything..
You mean outside of the window? or are you saying you cannot extend it?
Just verifying, we are talking about the latest version of clearml-server ?
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
Hi PanickyBee11
You mean this is not automatically logged? do you have a callback that logs it in HF?
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
TensorBoard data is stored wherever you point your file-writer to :)
What trains is doing is, while tensorboard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
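A toy sketch of the "in-flight" idea, in case it helps. Everything here is an assumption for illustration: `ToyWriter`, `mirror_to_server`, and `sent_to_server` are made-up names, and this is not how trains is actually implemented, just the general pattern of wrapping a writer so each call is forwarded as well as written.

```python
import functools


class ToyWriter:
    """Stand-in for a TensorBoard file writer (hypothetical)."""

    def __init__(self):
        self.on_disk = []

    def add_scalar(self, tag, value, step):
        # The writer keeps doing its normal job: "writing to disk"
        self.on_disk.append((tag, value, step))


sent_to_server = []  # stand-in for what the server would store


def mirror_to_server(method):
    """Wrap a writer method so every call is also forwarded."""

    @functools.wraps(method)
    def wrapper(self, tag, value, step):
        sent_to_server.append({"tag": tag, "value": value, "step": step})
        return method(self, tag, value, step)

    return wrapper


# Patch the writer once; user code stays unchanged
ToyWriter.add_scalar = mirror_to_server(ToyWriter.add_scalar)

writer = ToyWriter()
writer.add_scalar("loss", 0.5, step=1)
```

The user code only talks to the writer, yet the same data point lands in both places.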
Wait, how do I reproduce it on community server? Maybe it has something to do with number of columns ? Or whether it is already wider than the screen? What's your browser / OS ?
Hi WittyOwl57
That's actually how it works (the original idea/design was borrowed from libcloud), basically you need to create a Drive, then the storage manager will use it.
Abstract class here:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L51
Is this what you had in mind ?
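Roughly, the pattern looks like the sketch below. Note that `Driver`, `InMemoryDriver`, and `ToyStorageManager` are hypothetical names and the method set is an assumption, so check the abstract class linked above for the real interface.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical minimal driver interface (not the clearml one)."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def download(self, key: str) -> bytes: ...


class InMemoryDriver(Driver):
    """Example driver that stores objects in a dict."""

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        self._objects[key] = data

    def download(self, key):
        return self._objects[key]


class ToyStorageManager:
    """Picks a driver by URL scheme and delegates to it."""

    def __init__(self):
        self._drivers = {}

    def register(self, scheme, driver):
        self._drivers[scheme] = driver

    def _resolve(self, url):
        scheme, _, key = url.partition("://")
        return self._drivers[scheme], key

    def put(self, url, data):
        driver, key = self._resolve(url)
        driver.upload(key, data)

    def get(self, url):
        driver, key = self._resolve(url)
        return driver.download(key)


manager = ToyStorageManager()
manager.register("mem", InMemoryDriver())
manager.put("mem://bucket/file.txt", b"hello")
payload = manager.get("mem://bucket/file.txt")
```

The manager never needs to know how a backend stores data, it only needs a driver registered for the scheme.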
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything.
Sure man :) no rush, I appreciate the gesture regardless of the outcome
Many thanks!
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility :)
Please do :)
Hi ItchyJellyfish73
The behavior should not have changed.
"force_repo_requirements_txt" was always a "catch all option" to set a behavior for an agent, but should generally be avoided
That said, I think there was an issue with v1.0 (clearml-server) where clearing the "Installed Packages" did not actually clear it, but set it to empty.
It sounds like the issue you are describing.
Could you upgrade the clearml-server and test?
Hi GiddyTurkey39
First, yes you can just edit the "installed packages" section and add any missing package (this is equal to requirements.txt)
I wonder why trains failed to detect the "bigquery" package in the first place... Any thoughts?
With offline mode,
Later, if you need to, you can actually import the execution (including artifacts etc.); you just need the zip file it creates when you are done.
OK, so if I've got, like, 2x16GB GPUs ...
You could do:
clearml-agent daemon --queue "2xGPU_32gb" --gpus 0,1
Which will always use the two GPUs for every Task it pulls
Or you could do:
clearml-agent daemon --queue "1xGPU_16gb" --gpus 0
clearml-agent daemon --queue "1xGPU_16gb" --gpus 1
Which will have two agents, one per GPU (with 16gb per Task it runs)
Or
clearml-agent daemon --queue "2xGPU_32gb" "1xGPU_16gb" --gpus 0,1
Which will first pull Tasks from the "2xGPU_32gb" qu...
Set it on the PID of the agent process itself (i.e. the clearml-agent python process)
like this.. But when I am cloning the pipeline and changing the parameters, it is running on default parameters, given when pipeline was 1st run
Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example ?
Hi JitteryCoyote63
experiments logs ...
You mean the console outputs ?
Thanks!
Hmm from here : None
Could it be you do not have privileges to the resource, or that you did not provide credentials ?
Did that autoscaler work before ?
ArrogantButterfly10 could it be that in the "base task" of the pipeline step, you do not have any hyper-parameter? (I mean the Task that the pipeline clones and is supposed to set new hyperparameters for...)
Hi PerplexedGoat65
it appears, in a practical sense, this means to mount the second drive, and then bind them in ClearML's configuration
Yes, the entire data folder (reason is, if you lose it, you lose all the server storage / artifacts)
Also, thinking about Docker and slower access speed for Docker mounts and such,
If the host OS is linux, you have nothing to worry about, speed will be the same.
Well done man!
Could I just build it and log these parameters using task.set_parameters() so that I call task.get_parameters() later?
Instead of manually calling set/get, you call task.connect(some_dict_or_object), it does both:
When running manually (i.e. without an agent) it logs the keys/values on the Task,
when running with an agent, it takes the values from the backend (Task) and sets them on the dict/object
Make sense ?
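If it helps, here is a toy model of that two-way behavior. `ToyTask` is a made-up stand-in written for this example, not the clearml class; with the real library you would just call task.connect(params) on your Task.

```python
class ToyTask:
    """Toy stand-in for a Task; NOT the clearml implementation."""

    def __init__(self, backend_params=None, running_with_agent=False):
        self.backend_params = dict(backend_params or {})
        self.running_with_agent = running_with_agent

    def connect(self, params):
        if self.running_with_agent:
            # Agent run: values stored on the backend override the code
            for key in params:
                if key in self.backend_params:
                    params[key] = self.backend_params[key]
        else:
            # Manual run: values from the code are logged on the task
            self.backend_params.update(params)
        return params


# Manual run: the values in code get logged
manual = ToyTask()
manual_params = manual.connect({"lr": 0.01, "epochs": 10})

# Agent run: the value edited on the backend wins over the code
remote = ToyTask(
    backend_params={"lr": 0.001, "epochs": 10},
    running_with_agent=True,
)
remote_params = remote.connect({"lr": 0.01, "epochs": 10})
```

The same script runs in both modes without changes, which is the reason to prefer connect over separate set/get calls.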
Hi ItchyHippopotamus18
The iteration reporting is automatically detected if you are using tensorboard, matplotlib, or explicitly with trains.Logger
I'm assuming there were no reports, so the monitoring falls back to reporting every 30 seconds, where the iterations are "seconds from start" (the thing is, this is a time series, so you have to have an X axis...)
Make sense ?
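The fallback can be pictured like this. It is only a sketch: `x_axis_value` is a hypothetical helper, not a trains function, and the real monitoring logic may differ in details.

```python
def x_axis_value(last_reported_iteration, start_time, now):
    """Pick the X value for a monitoring sample (hypothetical helper).

    Use the last iteration the experiment reported; if nothing was
    ever reported, fall back to whole seconds since the start.
    """
    if last_reported_iteration is not None:
        return last_reported_iteration
    return int(now - start_time)


# An experiment that reported iteration 120
with_reports = x_axis_value(120, start_time=1000.0, now=1090.0)

# No reports at all: the X axis becomes "seconds from start"
without_reports = x_axis_value(None, start_time=1000.0, now=1090.0)
```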