Is it possible to get the folder with the artifacts/models?
You can directly get the artifacts/models URL and then deduce the folder:
```python
from clearml import Task

task = Task.get_task('my_task_id')
print(task.artifacts['my artifact'].url)
```
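For example, deducing the containing folder could be as simple as stripping the last path component (a minimal sketch; assumes the URL is path-like):
```python
import os

# the folder is everything up to the artifact file name
folder = os.path.dirname(task.artifacts['my artifact'].url)
print(folder)
```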
That should work 🙂
BTW, you might play around with "clearml-agent execute --id <task_id_here>"
This will basically clone the code, create a venv with the python packages, apply uncommitted changes and run the actual code. This could be a replacement for your bash script. (Notice it means you need to clone the Task in the UI first; then you can change parameters, then run the agent manually in SLURM and it will take the params from the UI.)
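If you prefer to do the cloning step from code instead of the UI, something like this should work (a sketch; the task id and the new name are placeholders):
```python
from clearml import Task

# clone an existing task so its parameters can be edited before execution
source = Task.get_task(task_id='my_task_id')
cloned = Task.clone(source_task=source, name='clone for SLURM run')
print(cloned.id)  # pass this id to: clearml-agent execute --id <id>
```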
Where did you add the Task.init call ?
Hi @<1544853721739956224:profile|QuizzicalFox36>
Sure, just change the ports in the docker-compose
Any updates on the trigger and schedule docs?
I think examples are already pushed, docs still in progress.
BTW: pipeline v2 examples are also out:
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/full_custom_pipeline.py
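For a flavor of what the scheduler interface looks like (a minimal sketch based on the examples above; the task id and queue names are placeholders):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# re-launch an existing task every day at 07:30 on the 'default' queue
scheduler.add_task(
    schedule_task_id='my_task_id',
    queue='default',
    hour=7,
    minute=30,
)
scheduler.start_remotely(queue='services')
```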
Why do you ask? is your server sluggish ?
Maybe the only thing to worry about is making sure the IP address is stable, so if k8s replaces the node, you do not have to reconfigure the clients 🙂
Hi JuicyDog96
The easiest way at the moment (apologies for the lack of RestAPI documentation, it is coming :) is actually the code (full docstring docs):
https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
You can access it all with an easy Pythonic interface, for example:
```python
from trains.backend_api.session.client import APIClient

client = APIClient()
tasks = client.tasks.get_all()
```
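get_all also accepts server-side filters; for instance (a sketch; the filter values are placeholders based on the v2.8 services linked above):
```python
# only completed tasks, newest first
tasks = client.tasks.get_all(
    status=['completed'],
    order_by=['-last_update'],
)
print([t.name for t in tasks])
```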
So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
```
ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device
```
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker running out of space on the main disk (`/var/`) where it stores all the images and temp file systems
This will cause your code to fail, as any runtime change to the container file system will raise this out-of-disk-space error
If you set the package_manager to poetry then it will only use the lock files
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L53
If you clear the "Installed Packages" section, it will just use the "requirements.txt" in the repository itself.
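If you want the equivalent of clearing that section from code, something like this might work (a sketch; assumes a clearml version that has `Task.set_packages`, and the task id is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id='my_task_id')
# an empty list clears "Installed Packages", so the agent falls back to
# the repository's requirements.txt (or poetry lock files, per agent config)
task.set_packages([])
```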
What's the specific use case, and the problem we are trying to solve?
I can't seem to find a difference between the two, why would matplotlib get listed and pandas does not... Any other package that is missing?
BTW: as an immediate "hack", add the following before your Task.init call:
```python
Task.add_requirements("pandas")
```
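In context it would look something like this (a sketch; the project/task names are placeholders):
```python
from clearml import Task

# must be called before Task.init for the requirement to be registered
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="my task")
```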
SmarmySeaurchin8 check the logs, maybe you can find something there
Yep, this will run the pipeline controller itself on the clearml-server (or any other machine running clearml-agent services mode)
you can also check
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
Which will stop a local execution of a Task and re-launch it on a remote machine
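Usage is roughly (a sketch; the queue name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# stops the local run here and enqueues the task for a remote agent
task.execute_remotely(queue_name="default", exit_process=True)
```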
Where do you store those ?
Hi ReassuredTiger98
So let's assume we call:
```python
logger.report_image(title='training', series='sample_1', iteration=1, ...)
```
And we report every iteration (keeping the same title.series names). Then in the UI we could iterate back on the last 100 images (back in time) for this title / series.
We could also report a second image with:
```python
logger.report_image(title='training', series='sample_2', iteration=1, ...)
```
which means that for each one we will have 100 past images to review ( i.e. same ti...
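Putting it together, the reporting loop would look something like this (a sketch; the image file paths are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="image reporting")
logger = task.get_logger()

for i in range(100):
    # same title/series every iteration; the UI keeps the recent history per series
    logger.report_image(title='training', series='sample_1', iteration=i, local_path='sample_1.png')
    logger.report_image(title='training', series='sample_2', iteration=i, local_path='sample_2.png')
```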
Hi @<1523701868901961728:profile|ReassuredTiger98>
Anyone here with any idea why my service tasks get aborted when going to sleep?
I think I understand the issue: clearml==1.4.0
try running with the latest clearml (1.10.x)
It will keep pinging the backend "I'm alive", so the backend does not think this process is dead (which I suspect is what happened; after 2 hours the backend basically set the Task to aborted because it "thought" it was killed)
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting ? which clearml version are you using ?!
also tried manually adding `leap==0.4.1` in the task UI, which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like `Task.get_task("task_id_here").get_base_docker()` ?
Now a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and allow users to more easily create Tasks from code.
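For reference, the decorator interface looks roughly like this (a sketch; the names, project, and values are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data'])
def step_one():
    # each component runs as its own Task, no manual Task.init needed
    return [1, 2, 3]

@PipelineDecorator.pipeline(name='my pipeline', project='examples', version='1.0')
def my_pipeline():
    data = step_one()
    print(data)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    my_pipeline()
```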
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload ...
For setting up the trains-server I would recommend the docker-compose; it is very easy to set up, and you just need a single fixed compute instance. Details: https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
With regards to the "low prio clusters", are you asking how they could be connected with the trains-agent, or whether running code that uses trains will work on them?
```python
param = {'arg': value}
task.connect(param, section='new section')
# create pipeline here
pipeline
```
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumption
I'm assuming from the leftover processes?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
Hi AbruptCow41
I just want them to be able to write in them without them appearing either in their clearml.conf or in their environment variables.
So where would they put them ? (or is it pre baked into the docker?)
But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
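For comparison, the sys.path approach would be something like this (a sketch; `sibling_lib` and the relative layout are hypothetical):
```python
import os
import sys

# make the sibling package importable from this script's location
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
import sibling_lib  # hypothetical sibling package
```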
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes, I see your point, that does make sense (btw, removing the type hints will solve the issue)
Regardless, let me make sure this is solved
EFS get downloaded to the k8 pod local volume?
EFS is an Amazon service that mounts a persistent FS into EC2 instances. I believe they have support for k8s as a service as well, which would make it kind of like a PV, only as a service.
Does that make sense ?