CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of it. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers, you just need to make sure the machine has enough memory to support it.
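On the docker-compose side that would be something along these lines (a sketch only; the gunicorn-related variable names here are assumptions from memory, so double-check them against your clearml-server version before relying on them):
```
# docker-compose override for the apiserver service (variable names assumed, verify for your version)
services:
  apiserver:
    environment:
      CLEARML_USE_GUNICORN: "1"       # run the API server with gunicorn workers (assumed name)
      CLEARML_GUNICORN_WORKERS: "8"   # number of worker processes, size it to the machine's RAM (assumed name)
```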
Are they expanded in the "api_server" ? (I verified on a linux machine, same error, the env in the api_server is not being resolved)
actually no it is not, alpine is Not a good baseline, it is very slim and missing a ton of stuff.
I would use bullseye or slim (depending on how many aux things you need on the container)
https://hub.docker.com/_/python/tags?page=1&name=bullseye
https://hub.docker.com/_/python/tags?page=1&name=slim-bullseye
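e.g. in your Dockerfile (the tag is just an example, pick the python version you need):
```
FROM python:3.9-slim-bullseye
```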
Hmm, let me see if you can somehow "signal" to the subprocess that it should not use the main process Task. (btw: are you forking or spawning a subprocess?)
Hi UnsightlyHorse88
Hmm, try adding to your clearml.conf file:
agent.cpu_only = true
if that does not work try adding to the OS environment:
export CLEARML_CPU_ONLY=1
DisturbedWorm66 it does, I think there is an example here:
https://github.com/allegroai/nvidia-clearml-integration/tree/main/tlt
Yep, everything (both conda and pip)
Hi CloudySwallow27
Is there a way to still use the auto_connect but limit the amount of debug imgs?
Basically you can set the number of images it will store for you (per title/series combination). The way it works, it rotates the image names, essentially overwriting old images (the UI is aware and will only show the last X of them)
See here on setting it:
https://github.com/allegroai/clearml/blob/81de18dbce08229834d9bb0676446a151046e6a7/docs/clearml.conf#L32
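For reference, in clearml.conf it would look something like this (a sketch; I'm assuming the key is sdk.metrics.file_history_size, the linked line has the exact name for your version):
```
sdk {
    metrics {
        # number of debug images kept per title/series combination
        file_history_size: 5
    }
}
```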
Hi JitteryCoyote63 ,
The easiest would probably be to list the experiment folder, and delete its content.
I might be missing a few things but the general gist should be:
```
from trains.storage import StorageHelper

h = StorageHelper('s3://my_bucket')
files = h.list(prefix='s3://my_bucket/task_project/task_name.task_id')
for f in files:
    h.delete(f)
```
Obviously you should have the right credentials 🙂
default is clearml data server
Yes the default is the clearml files server, what did you configure it to ? (e.g. should be something like None )
So you are saying it ignored everything after the bucket's "/" ?
Hi OddShrimp85
If you pass output_uri=True to Task.init, it will upload the model automatically, or, as you said, manually with the OutputModel class
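Something like this (a minimal sketch, the project/task names are just placeholders):
```python
from clearml import Task

task = Task.init(
    project_name='examples',  # placeholder
    task_name='train',        # placeholder
    output_uri=True,          # upload model snapshots to the default files server
)
```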
Thanks TrickyRaccoon92
I think it's about time we remove the survey link anyhow 🙂
I'll make sure it happens ...
Hi BoredHedgehog47
You mean like EFS for caching ?
Sure LazyTurkey38 here's a nice hack for that:
```
# code here
task.execute_remotely(queue_name=None, clone=False, exit_process=False)

# patch the Task and actually send it for execution
if Task.running_locally():
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
```
You can also clear the git diff by passing "diff": ""
wdyt?
ConvolutedSealion94 if you do bash:
cd ~/work/repo/code/
git status
what are you getting ?
(obviously if you have dependencies, they will be installed before, and then the correct torch will be installed over the previous version)
GreasyLeopard35 from the implementation:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L215
Which basically returns "self.base" (default 10) to the power of the selected value: 10**-3 = 0.001
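Just to make the math concrete (an illustration only, not the actual clearml code):
```python
# the range is defined in exponent space, so a sampled value is base ** exponent
base = 10
exponent = -3
value = base ** exponent  # 10**-3 == 0.001, always positive -- hence the question below
```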
So how would I get a negative value ?
ClearML automatically gets these reported metrics from TB; since you mentioned "see the scalars", I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?
do you have docker installed on all slurm agent/worker machines ?
Docker support?
Hi EnviousStarfish54
I remember this feature request, let me check where it stands..
And if you could also update the docs with all the env vars possible to set up, it would be awesome!
Yes, I'll pass it on, that is a good point
Thanks! Yes, this could be great !
Could you please open a GitHub issue, so we remember to update the feature ?
Do you have a roadmap which includes resolving things like this
Security SSO etc. is usually out of scope for the open-source platform as it really makes the entire thing a lot harder to install and manage. That said I know that on the Enterprise solution they do have SSO and LDAP support and probably way more security features. I hope it helps 🙂
this issue on when trying to set up on our remote machines
You mean setting up the trains-server on remote machine?
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
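If you do want the object stored, you can upload it as an artifact explicitly, something like (a minimal sketch, names are placeholders):
```python
from clearml import Task

task = Task.current_task()  # or the task returned by Task.init()
my_results = {'accuracy': 0.93, 'epochs': 10}  # placeholder object
task.upload_artifact(name='results', artifact_object=my_results)  # serialized and uploaded for you
```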
ExcitedFish86 that said, if running in docker mode you can actually pass it on a Task basis with:
-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python
as an additional docker container argument on the Task "Execution" tab itself.
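If you prefer to set it from code rather than the UI, something along these lines should work (assuming a recent clearml version where Task.set_base_docker accepts docker_arguments):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='train')  # placeholder names
# pass the env var as an extra docker argument for the agent's container
task.set_base_docker(
    docker_image='python:3.9-bullseye',  # placeholder image
    docker_arguments='-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python',
)
```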
but I was wondering if there's any limitation in creating an image with a non root user to use as the actual worker?
SarcasticSquirrel56 non-root pods (containers) are fully supported,
I would recommend using the latest agent RC (that simplified a few things):
clearml-agent==1.4.0rc3
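As a reference, a minimal non-root worker image could look something like this (a generic docker sketch, not clearml-specific):
```
FROM python:3.9-slim-bullseye
RUN useradd --create-home clearmluser
USER clearmluser
# install for the non-root user and make sure ~/.local/bin ends up on PATH
RUN pip install --user clearml-agent==1.4.0rc3
ENV PATH="/home/clearmluser/.local/bin:${PATH}"
```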
I see... because the problem would be with permissions when creating artifacts to store in the "/shared" folder
You mean as output target for artifacts ?
especially for datasets (for th...