Yes. When a container is executed, the agent creates a new venv that inherits from the system-wide installed packages, but it cannot inherit from (or "understand" that there is) an existing venv, or where it is located.
Correct, you can pass it as keys on the "task_filter" argument, e.g.: Task.get_tasks(..., task_filter={'status': ['failed']})
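For example, a quick sketch (the project name here is just a placeholder):
```python
from clearml import Task

# fetch all failed tasks in a project and print them
# ('my_project' is a hypothetical project name)
failed_tasks = Task.get_tasks(
    project_name='my_project',
    task_filter={'status': ['failed']},
)
for t in failed_tasks:
    print(t.id, t.name)
```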
LazyTurkey38 configuration pushed to github :)
Python3.8 I can quickly check, give me a minute
GloriousPenguin2 could you open a GitHub issue on it? Just making sure this will actually get fixed 🙂
With pleasure 😊
As long as you import clearml in the main script, it should work. Regarding the Nvidia container, it should not interfere with any running processes; the only issue is the memory limit. BTW, any reason not to spin up an agent on a dedicated machine? What is the GPU used for on the clearml-server machine?
MuddySquid7 I might have found something, and this is very, very odd: it seems it will Not upload any new images past the history size, which is very odd considering the number of users actively using this feature...
Do you want to try a hack to see if it solves your issue?
Okay found it, ElegantCoyote26 the step name is changed but the Task name remains the same ... 😞
I'll make sure we fix it on the next version
CooperativeFox72 this is indeed sad news 😞
When you have the time, please see if you can send a code snippet to reproduce the issue. I'd like to have it fixed
I think I found something relating to the issue on the subprocess not logging. Let me check if we can share something quickly
Yep, this will run the pipeline controller itself on the clearml-server (or any other machine running clearml-agent in services mode)
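Something along these lines (a minimal sketch; the project, pipeline and step names are placeholders):
```python
from clearml import PipelineController

# build a pipeline and enqueue the controller itself on the 'services' queue,
# so it runs on the clearml-server (or any machine running clearml-agent in services mode)
pipe = PipelineController(name='my pipeline', project='examples', version='1.0.0')
pipe.add_step(
    name='step_1',
    base_task_project='examples',       # hypothetical project holding the step's base task
    base_task_name='step 1 base task',  # hypothetical base task name
)
pipe.start(queue='services')
```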
You can also check
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
which will stop a local execution of a Task and re-launch it on a remote machine.
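Roughly like this (a minimal sketch; the project, task and queue names are assumptions):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')

# ... local setup / debug code can run up to this point ...

# stop the local run here and enqueue the Task for a remote agent to execute
task.execute_remotely(queue_name='default', exit_process=True)

# anything below this line will only run on the remote machine
```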
UpsetTurkey67 are you saying there is a symlink in the original repository, and when it copies it, the symlink breaks?
AbruptHedgehog21 the bucket and the full link are registered on the model object itself; you can see them in the UI, under the Models tab. The only thing you actually need to pass is the credentials. Make sense?
Hi
It works if I don't specify the project name and just give the task name
But now it searches for it globally, which is not very stable:
Let me check why it fails to find the project...
Hi @<1709015393701466112:profile|ScatteredPeacock14>
I get 3 tasks created in total. Any ideas?
Could it be an old instance of the same Task?
Notice the for loop starts from 1, so it does include the master node.
Hi JuicyDog96
The easiest way is:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.projects.get_all()
You can just run it from a python console and check what you are getting.
Full API is https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
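For example, from a Python console (a quick sketch, assuming the credentials in clearml.conf are already set up):
```python
from trains.backend_api.session.client import APIClient

client = APIClient()
for project in client.projects.get_all():
    # each returned entity exposes the fields from the API response, e.g. id and name
    print(project.id, project.name)
```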
This is already part of the docker-compose file,
https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
DeliciousBluewhale87
You could also just upload the data (i.e. do not call close). Then you will be able to change it later; obviously this will make it harder to track.
BTW: clearml-data stores delta changes, so if you only change a few files it will only store those.
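A minimal sketch of the "upload but do not close" flow (dataset project/name and the path are placeholders):
```python
from clearml import Dataset

# create a dataset version, add files and upload the content,
# but skip finalize()/close so the dataset stays editable
ds = Dataset.create(dataset_project='examples', dataset_name='my_dataset')
ds.add_files('/path/to/data')
ds.upload()
# note: ds.finalize() is intentionally not called here,
# so files can still be added or removed later
```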
Is there a way to capture uncommitted changes with Task.create just like Task.init does? Actually, I would like to populate the repo, branch and packages automatically...
You can pass a local repo path to Task.create, and I "think" it will also store the uncommitted changes.
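Something like this (a sketch; the paths and names are placeholders, and the uncommitted-changes part is my assumption):
```python
from clearml import Task

# create a Task from a local repository clone;
# repo/branch are taken from the local checkout, and (I think)
# the uncommitted diff is stored as well
task = Task.create(
    project_name='examples',        # hypothetical
    task_name='created from repo',  # hypothetical
    repo='/path/to/local/repo',     # local repository path
    branch='main',                  # assumption
    script='my_script.py',          # entry point inside the repo
)
```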
I start my main task like this: python my_script.py --myarg "myargs". How are the arguments captured?
At runtime when argparse is called.
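i.e. something like this (a sketch; the argparse binding kicks in once Task.init has been called):
```python
import argparse
from clearml import Task

# Task.init patches argparse, so the arguments are logged
# automatically when parse_args() runs at runtime
task = Task.init(project_name='examples', task_name='argparse capture')  # hypothetical names

parser = argparse.ArgumentParser()
parser.add_argument('--myarg', type=str, default='myargs')
args = parser.parse_args()
```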
You can use ` clea...
Hmm yeah I can see why...
Now that I think about it, at least in theory the second process that torch creates should inherit from the main one, and as such Task.init is basically "ignored"
Now I wonder why your first version of the code did not work?
Could it be that we patched the argparser on the subprocess and that we should not have?
So this can be translated to
CLEARML__SDK__AZURE__STORAGE__CONTAINERS__0__ACCOUNT_NAME=abcd
My only point is, if we have no force_git_ssh_port
or force_git_ssh_user
we should not touch the SSH link (i.e. less chance of us messing with the original URL if no one asked us to)
All in all, seems like it will be fairly easy to add JupyterHub to clearml-session, and that would solve your issue, no?
(and it seems that, from an implementation perspective, this will not be a lot of work)
wdyt?
Hi RotundHedgehog76
I think it should work out of the box; in the end both spin up Jupyter notebooks, which is what clearml interacts with. Are you getting any errors?
Hi @<1729309120315527168:profile|ShallowLion60>
ClearML in our case is installed on k8s using the helm chart (version: 7.11.0)
It should be done "automatically", I think there is a configuration var in the helm chart to configure that.
What URLs are you seeing now, and what should be there?
DeliciousBluewhale87
Upon ssh-ing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there..
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (do they point to the correct clearml-server config?)
Hi @<1797800418953138176:profile|ScrawnyCrocodile51>
Will the docker container / disk space (really I am more interested in the dataset that is downloaded by the task) get automatically cleaned up?
Yes, the agent is running the container with --rm
🙂