I followed https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450 and now it seems to be setting up properly
btw SuccessfulKoala55 the parameter is not documented in https://allegro.ai/clearml/docs/docs/references/clearml_ref.html#sdk-development-worker
Hi PompousParrot44 , you could have a Controller task running in the services queue that periodically schedules the task you want to run
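For illustration, a minimal sketch of such a controller using the ClearML SDK (project, task, and queue names are placeholders, and the one-hour interval is arbitrary):

```
import time
from clearml import Task

# Controller task meant to sit in the "services" queue
controller = Task.init(
    project_name="examples",
    task_name="periodic scheduler",
    task_type=Task.TaskTypes.controller,
)

# Template task to re-run periodically (assumed to already exist)
template = Task.get_task(project_name="examples", task_name="my recurring job")

while True:
    # Clone the template and enqueue the clone for an agent to pick up
    cloned = Task.clone(source_task=template, name="my recurring job (scheduled)")
    Task.enqueue(cloned, queue_name="default")
    time.sleep(60 * 60)  # wait an hour before scheduling the next run
```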
Thanks SuccessfulKoala55 ! So CLEARML_NO_DEFAULT_SERVER=1 by default, right?
yes, exactly: I run python my_script.py, the script executes, creates the task, calls task.execute_remotely(exit_process=True) and returns to bash. Then, in the bash console, after some time, I see some messages being logged from clearml
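For reference, a minimal sketch of that flow (project, task, and queue names are placeholders):

```
from clearml import Task

task = Task.init(project_name="examples", task_name="my_script")

# Stop local execution, enqueue the task for an agent, and return to the shell
# immediately because of exit_process=True
task.execute_remotely(queue_name="default", exit_process=True)

# Everything below only runs on the agent that picks the task up
print("running remotely")
```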
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)
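In clearml.conf that would look roughly like this (a sketch - the value and its units are illustrative, so check the shipped default configuration for the real ones):

```
# flush period for carriage-return (\r) style console output, e.g. progress bars
sdk.development.worker.console_cr_flush_period = 10
```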
The task I cloned from is not the one I thought
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
, causing it to unregister from the server (and thus not remain there).
Do you mean that the agent actively notifies the server that it is going down? Or does the server infer that the agent is down after a timeout?
The clean up service is awesome, but it would require having another agent running in services mode on the same machine, which I would rather avoid
Also, maybe we are not on the same page - by clean up, I mean killing a detached subprocess on the machine executing the agent
SuccessfulKoala55 I want to avoid writing creds in plain text in the config file
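One way to keep them out of clearml.conf is to pass them in programmatically, e.g. from environment variables, using Task.set_credentials (a sketch; the environment variable names are arbitrary):

```
import os
from clearml import Task

# Read credentials from the environment instead of a plaintext config file;
# must be called before Task.init()
Task.set_credentials(
    api_host=os.environ["MY_CLEARML_API_HOST"],   # e.g. "https://api.clear.ml"
    key=os.environ["MY_CLEARML_ACCESS_KEY"],
    secret=os.environ["MY_CLEARML_SECRET_KEY"],
)

task = Task.init(project_name="examples", task_name="no plaintext creds")
```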
Something was triggered - you can see the CPU usage ramping up right when the instance became unresponsive. Maybe a merge operation from ES (Elasticsearch)?
my agents are all 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
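For reference, one way to pin a specific package version into a task's recorded requirements is Task.add_requirements before Task.init (a sketch; project and task names are placeholders):

```
from clearml import Task

# Must be called before Task.init() so it ends up in the task's installed packages
Task.add_requirements("trains", "0.16rc2")

task = Task.init(project_name="examples", task_name="pinned trains version")
```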
Will from clearml import Task raise an error if no clearml.conf exists? Or only when features that actually require the server (such as Task.init) are called?
Sure! Here are the relevant parts:
` ...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):
...
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 ...
How exactly is the clearml-agent killing the task?
Ok AgitatedDove14 SuccessfulKoala55 I made some progress in my investigation:
I can exactly pinpoint the change that introduced the bug, it is the one changing the endpoint "events.get_task_log", min_version="2.9"
In the Firefox console > Network tab, I can edit an events.get_task_log request and change the URL from …/api/v2.9/events.get_task_log to …/api/v2.8/events.get_task_log (to use the endpoint "events.get_task_log", min_version="1.7" ) and then all the logs are ...
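The same A/B check can also be scripted instead of editing the request in the browser's Network tab; a rough sketch (the server address, task ID, and bearer token are placeholders, and the minimal request body is an assumption):

```
import requests

SERVER = "http://localhost:8008"  # placeholder api-server address
TASK_ID = "<task id>"             # placeholder task ID
TOKEN = "<auth token>"            # placeholder, e.g. obtained via auth.login

# Compare the two endpoint versions for the same task
for version in ("2.8", "2.9"):
    resp = requests.post(
        f"{SERVER}/api/v{version}/events.get_task_log",
        json={"task": TASK_ID},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(version, resp.status_code, len(resp.text))
```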
Hi SuccessfulKoala55 , super, that's what I was looking for
SuccessfulKoala55 I found the issue thanks to you: I changed the domain a bit but didn't update the apiserver.auth.cookies.domain setting - I did that, restarted, and now it works. Thanks!
The task is created using Task.clone() yes
AgitatedDove14 , my "uncommitted changes" ends with...
if __name__ == "__main__":
    task = clearml.Task.get_task(clearml.config.get_remote_task_id())
    task.connect(config)
    run()
from clearml import Task
Task.init()