Yes, this is definitely the issue: the agent assumes the docker user is "root".
Let me check something
😞 It's working as expected for me...
That said I tested on Linux & pip,
Any specific requirements to test with? From the log I see this is conda on Windows. Are you using the base conda env or a venv inside conda?
Hi RoundMosquito25
How did you spin up the agent (what's the cmd line? Is it in docker mode or venv mode?)
From the console it looks like the pip installation inside the container (based on the log, this is what I assume) is stuck?!
Maybe it's the Azure upload that has a weird size bug?!
Hmm, yes, it should create the queue if it's missing (btw you could work around that and create it in the UI). Any chance you can open a GitHub issue in the clearml helm chart repo so we do not forget?
This topic is about the issue with reporting a configuration that has a string with a backslash inside a tuple.
So the encoding itself is done YAML style, and based on your example \b has to be encoded as \\b because this is string encoding, just like \n will become "new line".
Makes sense?
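A minimal sketch of the escaping behaviour described above. It uses Python's stdlib json module as a stand-in (an assumption for illustration, not what ClearML runs internally), since JSON strings and YAML double-quoted strings share the \\, \b and \n escapes. The path is a made-up example.

```python
import json

# Hypothetical Windows-style path; the raw string keeps each backslash literal.
raw_path = r"C:\data\bin"

# Encoding doubles every literal backslash so the value survives a round trip.
encoded = json.dumps(raw_path)
decoded = json.loads(encoded)

# Without the raw prefix, "\b" is a single backspace control character,
# which the encoder writes out as the two-character escape \b.
backspace = "\b"
encoded_backspace = json.dumps(backspace)

print(encoded)               # the backslashes appear doubled in the encoded text
print(decoded == raw_path)   # decoding restores the original string
print(len(backspace), encoded_backspace)
```

So a doubled backslash in the stored configuration is the encoded form of one literal backslash, not a corrupted value.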
Hi @<1797800418953138176:profile|ScrawnyCrocodile51>
Will the docker container / disk space (really, I am more interested in the dataset that is downloaded by the task) get automatically cleaned up?
Yes, the agent is running the container with --rm 🙂
if they're mission critical, but rather the clearml cache folder?
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
My internet traffic looks weird. I think this is because tensorboard logs too much data on each batch and ClearML sends it to the server. How can I fix it? My training speed decreased by 5-6 times.
BTW: ComfortableShark77 the network traffic is sent in a background process, so it should not affect the processing time, no?
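To illustrate the point about background sending, here is a generic sketch (not ClearML's actual implementation) of a training loop handing reports to a queue that a worker thread drains. The names `start_background_sender` and `slow_send` are hypothetical, and the sleep stands in for a network round trip.

```python
import queue
import threading
import time


def start_background_sender(send, max_pending=1000):
    """Start a daemon thread that drains a queue and calls send(item)."""
    pending = queue.Queue(maxsize=max_pending)

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: stop the worker
                pending.task_done()
                break
            send(item)
            pending.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return pending, thread


sent = []


def slow_send(item):
    time.sleep(0.01)  # stand-in for a network round trip
    sent.append(item)


pending, thread = start_background_sender(slow_send)

start = time.perf_counter()
for step in range(20):
    pending.put(("loss", step))  # only enqueues; the network wait happens in the worker
enqueue_seconds = time.perf_counter() - start

pending.put(None)  # ask the worker to stop once the queue is drained
thread.join()
print(f"enqueued 20 reports in {enqueue_seconds:.4f}s, sent {len(sent)}")
```

With this pattern the loop only pays the cost of enqueueing, unless the queue fills up, at which point `put` blocks. That is one way heavy per-batch reporting could still slow training down.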
DeliciousBluewhale87 out of curiosity, what do you mean by "deployment functionality"? Is it model serving?
HandsomeCrow5 Seems like the right place would be in the artifacts, as a summary of the experiment (as opposed to ongoing reporting), is that the case?
If it is, then in the Artifacts tab clicking on the artifact should open another tab with your summary, which sounds like what you were looking for (with the exception of the preview thumbnail 🙂)
DeliciousBluewhale87 and is it working?
Sure thing, and I agree it seems unlikely to be an issue 🙂
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
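A rough sketch of the "delete them X hours after they were created" idea. It assumes file modification time is a good enough proxy for creation time, and the helper name `delete_files_older_than` is hypothetical. The demo runs against a temporary folder rather than a real mapped folder.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_files_older_than(folder, max_age_hours):
    """Delete regular files under folder whose mtime is older than max_age_hours."""
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    # Materialize the listing first so deleting does not disturb the iteration.
    for path in list(Path(folder).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo on a throwaway folder: one file backdated by two days, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.bin"
    new_file = Path(tmp) / "new.bin"
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"y")
    two_days_ago = time.time() - 48 * 3600
    os.utime(old_file, (two_days_ago, two_days_ago))

    removed = delete_files_older_than(tmp, max_age_hours=24)
    removed_names = [p.name for p in removed]
    remaining_names = sorted(p.name for p in Path(tmp).iterdir())

print(removed_names, remaining_names)
```

Something like this could run from a scheduled job on the host, with the age threshold chosen to be longer than the time a Task needs the files at startup.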
Worker just installs by name from pip, and it installs not my package!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that (1) is not enough, as pip will first try to find the package in the public pip repository, and only then in the private one. To avoid that, in your code you can point directly to an https of your package` Ta...
Hmm that is odd.
Can you verify with the latest from GitHub?
Is this reproducible with the pipeline example code?
Hi @<1571308003204796416:profile|HollowPeacock58>
I'm assuming this is the ARM support fix (i.e. you are running on a new Mac) we released in one of the last clearml-agent versions. Could you update to the latest clearml-agent?
pip3 install clearml-agent==1.6.0rc2
Seems like settings on the clearml-server disappeared (specifically default queue tag?!)
Hi HealthyStarfish45
Funny just today I had a similar discussion on slurm:
https://allegroai-trains.slack.com/archives/CTK20V944/p1603794531453000
Anyhow, when you say "[scale up agents]" are you referring to a machine constantly running an agent pulling jobs from the queue, where the machine itself (aka the resource) is managed as a slurm job?
parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)
I think providing nargs='+' assumes the type is a list. Nonetheless, we should be able to support it. Could you please add a GitHub issue so we do not forget?
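A small sketch of the mismatch discussed above, using only the stdlib argparse module. The `--dataset_std` argument is a hypothetical addition that shows a list default for comparison.

```python
import argparse

parser = argparse.ArgumentParser()
# Same shape as the argument above: nargs="+" combined with a scalar default.
parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)
# Hypothetical variant with a list default, so the type is the same either way.
parser.add_argument("--dataset_std", type=float, nargs="+", default=[0.5])

defaults = parser.parse_args([])
given = parser.parse_args(["--dataset_mean", "0.4", "0.5", "--dataset_std", "0.2"])

print(type(defaults.dataset_mean).__name__, defaults.dataset_mean)
print(type(given.dataset_mean).__name__, given.dataset_mean)
print(defaults.dataset_std, given.dataset_std)
```

With the scalar default, the value is a float when the flag is omitted and a list when it is passed. Using a list default keeps the type consistent, which may also work around the issue until it is fixed.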
On a side note, is there any way to automatically give more meaningful names to the running docker containers?
What do you mean by that? Running where? And where will you see them?
Yes, it recreates the venv (or fetches it from cache). If you need your dataset, use the Dataset class (it will cache it persistently, so no need to re-download).
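A minimal sketch of fetching a dataset through the Dataset class, assuming the clearml package is installed and configured against a server. The wrapper name `get_dataset_folder` and the project/dataset names in the usage note are hypothetical.

```python
def get_dataset_folder(dataset_project, dataset_name):
    """Return the path of a local, cached, read-only copy of a ClearML dataset."""
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    # The first call downloads into the ClearML cache; later calls should reuse it.
    return dataset.get_local_copy()
```

Inside the task code this would be called as, for example, `get_dataset_folder("my_project", "my_dataset")`, and the returned folder used as the data root.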
DeliciousBluewhale87 not in the open-source version, for some reason it is not passed 😞
Could you explain the use case ?