Hi AgitatedDove14, this isn't the issue. With or without specifying the queue, I get this error when I use the "Create version", as opposed to the "Init version".
I wonder whether this is an issue with using the Create version together with execute_remotely().
Hi SuccessfulKoala55
I set it to True some time ago already.
Well, you need to make sure the agent runs with the same settings. Since these are in the agent section, they won't affect your locally running SDK.
Hi DeliciousBluewhale87
You can achieve the same results programmatically with Task.create
https://github.com/allegroai/clearml/blob/d531b508cbe4f460fac71b4a9a1701086e7b6329/clearml/task.py#L619
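A minimal sketch of that approach, assuming a Git repository URL, script path, requirements file and project/task names that are all hypothetical placeholders. It passes the exact requirements.txt through the requirements_file argument of Task.create (check that your clearml version's Task.create accepts it) and then enqueues the draft with Task.enqueue, because a manually created task never calls execute_remotely(). The clearml import sits inside the function, so the argument-building helper can be checked without a server.

```python
from typing import Any, Dict, Optional


def build_create_kwargs(
    project_name: str,
    task_name: str,
    repo: str,
    script: str,
    requirements_file: str,
    branch: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the Task.create() arguments in one place (all values are caller-supplied)."""
    return {
        "project_name": project_name,
        "task_name": task_name,
        "repo": repo,
        "branch": branch,  # None lets clearml use the repository's default branch
        "script": script,
        # Use this exact requirements file instead of auto-detected packages.
        "requirements_file": requirements_file,
    }


def create_and_enqueue(queue_name: str, **create_kwargs: Any):
    """Create a draft task and push it to an agent queue (needs a ClearML server)."""
    from clearml import Task  # lazy import so build_create_kwargs works without clearml

    task = Task.create(**create_kwargs)
    # A manually created task is only a draft; enqueue it so an agent picks it up.
    Task.enqueue(task, queue_name=queue_name)
    return task
```

Usage would look like create_and_enqueue("clearmlQueue", **build_create_kwargs("examples", "train_it", "https://github.com/your-org/your-repo.git", "train_it.py", "requirements.txt")), where every argument except the queue from this thread is a placeholder.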
I just downloaded the logs from the failed task. It seems I have set agent.package_manager.system_site_packages: true in the agent as well.
Do this by mapping a clearml.conf file to the agent.
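For the k8s setup in this thread, one option is to keep the agent's settings in a clearml.conf mounted into the agent pod (for example from a ConfigMap; the mount itself is not shown). The sketch below only renders the agent section with system_site_packages enabled, under the assumption that you merge it into the agent's full clearml.conf, which still needs its api section and credentials. The output path is up to you.

```python
from pathlib import Path

# HOCON fragment for the *agent's* clearml.conf; it has no effect on the local SDK.
AGENT_SECTION = """\
agent {
    package_manager {
        # Let the task's venv see packages already installed in the pod's Python
        system_site_packages: true
    }
}
"""


def write_agent_section(path: str) -> Path:
    """Write the agent section to `path` so it can be merged into the mounted clearml.conf."""
    target = Path(path)
    target.write_text(AGENT_SECTION)
    return target
```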
Using clearml-task, I am able to pass in the exact requirements.txt file, but I am not sure how to accomplish that when using python train_it.py with the execute_remotely() option.
AgitatedDove14
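With the Task.init + execute_remotely() flow, one option is Task.add_requirements(), which has to be called before Task.init(). A minimal sketch, assuming a simple requirements.txt that contains only plain names or name==version pins: other specifiers, URLs and pip options such as -r are skipped here, not handled. The project and task names are placeholders; clearmlQueue is the queue mentioned in this thread.

```python
from typing import Iterable, List, Optional, Tuple


def parse_pinned_requirements(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (name, version) pairs for plain 'name' or 'name==version' lines only."""
    parsed = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r / -e
        if "==" in line:
            name, version = (part.strip() for part in line.split("==", 1))
            parsed.append((name, version))
        elif not any(op in line for op in ("<", ">", "!", "~", "@", ";")):
            parsed.append((line, None))  # unpinned: no version passed to clearml
    return parsed


def init_with_requirements(req_path: str, queue_name: str = "clearmlQueue"):
    from clearml import Task  # lazy import; requires a configured ClearML server

    with open(req_path) as fh:
        for name, version in parse_pinned_requirements(fh):
            # add_requirements() only takes effect if called before Task.init()
            Task.add_requirements(name, version)
    task = Task.init(project_name="examples", task_name="train_it")  # hypothetical names
    # Locally this stops the process and enqueues the task; on the agent it does nothing,
    # so the return below is only reached when running remotely.
    task.execute_remotely(queue_name=queue_name)
    return task
```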
Hi AgitatedDove14, attached are my create version and init version for comparison.
When I enqueue both the init and create versions into my clearmlQueue, the create version doesn't seem to execute at all.
It just logs "2021-05-26 16:02:13,053 - clearml - WARNING - Terminating local execution process" and reports that it completed successfully.
The above screenshot is from my local settings. My agents run in the k8s cluster (in a pod).
The warning just lets you know the current process stopped and the task is being launched on a remote machine.
What am I missing? Is the agent failing to run the job that you created manually?
(Notice that when creating a job manually, there is no "execute_remotely"; you just enqueue it, since it is not actually "running locally".)
Makes sense?
Hi DeliciousBluewhale87,
In your agent configuration file, set agent.package_manager.system_site_packages: true
That's in the agent, not in your local settings when running the task, right?
Notice that in your execute_remotely() call you did not specify a queue to put the current Task into.
What it does is stop the currently running code and put the newly created task into the specified queue. If you do not specify a queue, it will just abort it and wait for you to manually enqueue it.
To solve it: task.execute_remotely(queue_name='my_queue')
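A slightly more defensive variant of that fix, as a sketch: a hypothetical helper, require_queue, refuses an empty queue name before the task is even created, so it is never left waiting for a manual enqueue. 'my_queue' comes from the reply above; the project and task names are placeholders.

```python
from typing import Optional


def require_queue(queue_name: Optional[str]) -> str:
    """Fail fast instead of letting execute_remotely() leave the task unqueued."""
    if not queue_name or not queue_name.strip():
        raise ValueError("execute_remotely() needs a queue_name, otherwise the task is not enqueued")
    return queue_name.strip()


def run(queue_name: Optional[str] = "my_queue") -> None:
    from clearml import Task  # lazy import; needs clearml and a reachable server

    queue = require_queue(queue_name)  # validate before creating the task
    task = Task.init(project_name="examples", task_name="train_it")  # hypothetical names
    # Locally: stops this process and enqueues the task. On the agent: does nothing.
    task.execute_remotely(queue_name=queue, exit_process=True)
    # Everything below runs only on the agent.
    print("training on", task.name)
```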