You can set torch to be installed last:
post_packages: ["horovod", "torch"]
This will make sure the trains-agent installs the version you specified in the "installed packages" section last.
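For reference, this key lives under the agent's package_manager section of the agent configuration file. A minimal sketch, assuming the standard config layout (the exact file name depends on your agent version):
agent {
    package_manager {
        # packages to install after all other required packages
        post_packages: ["horovod", "torch"]
    }
}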
Hi SubstantialElk6
We try to push a fix the same day a HIGH CVE is reported. That said, since the external API interface is relatively far away from the DBs / OS, and since, as a rule of thumb, authorized users are trusted (they basically inherit the agent's code-execution rights, so they have to be), it is an exception to have a CVE that actually affects the system. I think even this high-profile one does not actually have an effect on the system, as even if ELK were susceptible (which it is not), only authorized users co...
Hi SkinnyPanda43
cannot schedule new futures after interpreter shutdown
This seems like a strange exception...
What's the setup here? Jupyter notebook? How is the interpreter shut down?
I want to be able to access the data just avoid reporting the experiment results
Yes, you are correct :)
If you just want to skip the logging you can always add an if around the Task.init call, no?
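Something along these lines, as a minimal sketch (the REPORT_TO_CLEARML switch is just a hypothetical example, use whatever condition fits your setup):
import os
from clearml import Task

# hypothetical switch: only create and report the experiment when explicitly enabled
report_results = os.environ.get("REPORT_TO_CLEARML", "0") == "1"

task = None
if report_results:
    task = Task.init(project_name="examples", task_name="my experiment")

# the rest of the code (data access, training, ...) runs the same either way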
LazyTurkey38 configuration pushed to github :)
Hmm could it be this is on the "helper functions" ?
you can also increase the limit here:
https://github.com/allegroai/clearml/blob/2e95881c76119964944eaa0289549617e8afeee9/docs/clearml.conf#L32
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! a pipeline is at the end a Task, you can take the pipeline ID and clone and enqueue it
from clearml import Task

pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task interface.
wdyt?
First that is awesome to hear PanickyFish98 !
Can you send the full exception? You might be on to something...
2. Actually we thought of it, but could not find a use case, can you expand?
3. I'm not sure I follow, do you mean you expect the first execution to happen immediately?
Hi DepressedChimpanzee34
if you try to extend it more than the width of the column to the right, it doesn't do anything...
You mean outside of the window? or are you saying you cannot extend it?
Just verifying, we are talking about the latest version of clearml-server ?
Thanks @<1694157594333024256:profile|DisturbedParrot38> !
Nice catch.
Could you open a github issue so that at least we output a more informative error?
If this is the case then the easiest is:
from clearml.backend_api.session.client import APIClient

client = APIClient()
res = client.events.get_task_plots(task="<task-id>")
We should definitely have a nice interface :)
Also, on the ClearML dashboard, I can see the
clearml-agent
log:
Is the clearml-agent running in docker mode ?
Hi @<1691258563357315072:profile|ColorfulKitten60>
I think we need some context for this question :)
The difference is whether you are only supplying "minute" or you are also passing hour/day etc.
See the examples:
Every 15 minutes:
add_task(task_id='1235', queue='default', minute=15)
Every hour, on minute 20 of the hour (i.e. 00:20, 01:20 ...):
add_task(task_id='1235', queue='default', hour=1, minute=20)
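For context, a rough sketch of wiring these calls into the clearml TaskScheduler (treat it as an outline only; the exact keyword names, e.g. task_id vs schedule_task_id, can differ between clearml versions):
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# re-enqueue the referenced task every 15 minutes
scheduler.add_task(task_id='1235', queue='default', minute=15)
# start the scheduler loop (this script itself usually runs on the services queue)
scheduler.start()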
I see now, give me a minute I'll check
Hi @<1704304350400090112:profile|UpsetOctopus60>
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm
Just use the helm charts. It's the easiest
Is there an easy way to add a docker argument in the python script?
On the task itself in the UI you can edit the docker arguments and add any missing flags
(task.set_base_docker will do the same from code)
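For example, a minimal sketch, assuming a recent clearml version where set_base_docker accepts separate image/arguments parameters (older versions take a single command string):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker args example")
# container image and extra docker run flags used when an agent executes the task in docker mode
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_arguments="--ipc=host --shm-size=8g",
)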
You can also edit the configuration and always add this flag:
None
no, using the system drivers
You might need to play around a bit; it might be something like StorageHelper.get('gs://bucket') and then helper.list('folder/*')
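Roughly like this, as a sketch (assuming the bucket credentials are already set up in clearml.conf; bucket and folder names are placeholders):
from clearml.storage.helper import StorageHelper

# helper bound to the GCS bucket, using the credentials from clearml.conf
helper = StorageHelper.get('gs://my-bucket')
# list objects under a prefix - exact wildcard behavior may vary by version
files = helper.list('folder/*')
print(files)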
Let me know what worked :)
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
Any chance they try to store the TensorBoard output in this folder? This could lead to "No such file or directory: 'runs'" if one process is deleting it while the other is trying to access it, or similar scenarios
This means that in your "Installed packages" you should see the line:
Notice that this is not a pypi artifactory (i.e. a server to add to the extra index url for pip), this is a direct pip install from a git repository, hence it should be listed in the "installed packages".
If this is the way the package was installed locally, you should have had this line in the installed packages.
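For illustration, such a line usually follows pip's VCS requirement format (repository and package names below are placeholders):
git+https://github.com/<org>/<repo>.git@<branch-or-commit>#egg=<package-name>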
The clearml agent should take care of the authentication for you (specifically here, it should do nothing).
If ...
Thanks ShakyJellyfish91 ! please let me know what you come up with, I would love for us to fix this issue.
Change it to add_missing_installed_packages=False here, and see if you still end up with the git diff:
https://github.com/allegroai/clearml/blob/1f82b0c4010799be6157f5c845c7f6ac48e71c0c/clearml/backend_interface/task/populate.py#L158
SubstantialElk6
The ~<package name with the first letter dropped> == a.b.c line is a known conda/pip temporary-install issue (a leftover from a previous package install).
The easiest way is to find the site-packages folder and delete the package, or create a new virtual environment
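If it helps, a quick way to locate the current interpreter's site-packages folder (just a sketch):
import sysconfig

# path to the site-packages folder of the active (virtual) environment
print(sysconfig.get_paths()["purelib"])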
BTW:
pip freeze will also list these broken packages
PricklyRaven28 basically this is the issue:
python -m fastai.launch <script>
There are multiple copies of the script running, but they are Not aware of one another.
are you getting any reporting from the different GPUs? I'm assuming there is a hidden OS environment variable that signals the "master" node, so all processes can communicate with it. This is what we should automatically capture. There is a workaround for fastai.launch that is probably similar to this one:
Hi @<1526371965655322624:profile|NuttyCamel41>
I think that the only way to actually get huge number of api calls is with a lot of machines.
For example, regardless of the amount of console logs you print, it will only be a single call, as these are packaged every 2-10 seconds. The same goes for metric reporting etc.
On the free tier you can already test the amount of API calls; I think the mechanism is exactly the same
fyi: I would put this question in the channel
Yep it is the scale :) and yes, it should appear once you upgrade
I think I found something,
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?
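In case it helps, a quick way to check (assuming boto3 is the package in question):
import boto3

print(boto3.__version__)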