I think clearml-agent tries to execute /usr/bin/python3.6 to start the task, instead of using the python that was used to start clearml-agent
The file /tmp/.clearml_agent_out.j7wo7ltp.txt does not exist
I execute the clearml-agent this way:
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached
in clearml.conf:
agent.package_manager.system_site_packages = true
agent.package_manager.pip_version = "==20.2.3"
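If the agent still falls back to the system interpreter, one thing I might try (just an assumption on my side, not verified) is pinning the interpreter explicitly in the agent section of clearml.conf:
agent {
    # assumption: force the agent to use the conda env's interpreter
    # instead of falling back to the system python (e.g. /usr/bin/python3.6)
    python_binary: "/home/machine/miniconda3/envs/py36/bin/python3"
}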
this is the last line, same as before
AgitatedDove14 So I'll just replace task = clearml.Task.get_task(clearml.config.get_remote_task_id()) with Task.init() and wait for your fix
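Something along these lines (untested; the project and task names are just placeholders):
from clearml import Task

# before:
# task = clearml.Task.get_task(clearml.config.get_remote_task_id())

# temporary workaround: let Task.init() attach to the task instead,
# so Task.current_task() is populated as usual
task = Task.init(project_name="my_project", task_name="my_task")  # placeholder names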
Yes, not sure it is connected either, actually. To make it work, I had to disable both venv caching and set use_system_packages to off, so that it reinstalls the full env. I remember that we already discussed this problem but I don't remember what the outcome was; I was never able to make it update the private dependencies based on the version. But this is most likely a problem from pip that is not clever enough to parse the tag as a semantic version and check whether the installed package ma...
AgitatedDove14 So what you are saying is that since I have trains-server 0.16.1, I should use trains>=0.16.1? And what about trains-agent? Only version 0.16 is released atm, this is the one I use
yes, in setup.py I have:
..., install_requires=[
    "my-private-dep @ git+ ",
    ...
], ...
AppetizingMouse58 the events_plot.json template is missing the plot_len declaration, could you please give me the definition of this field? (reindexing with dynamic: strict fails with: "mapping set to strict, dynamic introduction of [plot_len] within [_doc] is not allowed")
Sorry both of you, my problem was actually lying somewhere else (both buckets are in the same region) - thanks for your time!
trains-elastic container fails with the following error:
in the controller, I want to upload an artifact and start a task that will query that artifact, and I want to make sure the artifact exists when the task tries to retrieve it
not really, because it is in the middle of the controller task; there are other things to be done afterwards (retrieving results, logging new artifacts, creating new tasks, etc.)
nvm, the bug might be on my side. I will open an issue if I find an easily reproducible example
awesome
Maybe then we can extend task.upload_artifact?
def upload_artifact(..., wait_for_upload: bool = False):
    ...
    if wait_for_upload:
        self.flush(wait_for_uploads=True)
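In the meantime, something like this in the controller should enforce the ordering (untested; the artifact, template task id, and queue below are placeholders, and I'm assuming flush(wait_for_uploads=True) blocks until pending uploads finish):
from clearml import Task

controller = Task.current_task()

# upload the artifact the child task will need
controller.upload_artifact(name="dataset", artifact_object="/tmp/dataset.csv")  # placeholder

# block until all pending uploads are done before launching the child task
controller.flush(wait_for_uploads=True)

# only now clone and enqueue the task that retrieves the artifact
child = Task.clone(source_task="template-task-id", name="consumer")  # placeholder id
Task.enqueue(child, queue_name="default")                            # placeholder queue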
yes, done! Is there something more to take into account than what I shared?
The simple workaround I imagined (not tested) at the moment is to sleep 2 minutes after closing the task, to keep the clearml-agent busy until the instance is shut down:
self.clearml_task.mark_stopped()
self.clearml_task.close()
time.sleep(120)  # prevent the agent from picking up new tasks
and saved locally, which is why the second task, not executed on the same machine, cannot access the file
Thanks a lot, I will play with that!
I tried removing type=str but I got the same problem
I checked the commit date on the branch, went to all experiments, and scrolled until I found the experiment
Mmmh unfortunately not easily… I will try to debug deeper today. Is there a way to resume a task from code, to debug locally? Something like replacing Task.init with Task.get_task, so that Task.current_task is the same task as the output of Task.get_task
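Roughly what I'm after (untested; I'm assuming continue_last_task accepts an existing task id, and the id and names are placeholders):
from clearml import Task

# assumption: pass the id of the task to debug, so the local process
# attaches to that task instead of creating a new one
task = Task.init(
    project_name="my_project",     # placeholder
    task_name="debug-resume",      # placeholder
    continue_last_task="abc123",   # placeholder task id
)

# then Task.current_task() should be that same task
assert Task.current_task().id == task.id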
AgitatedDove14 Up! I would like to know if I should wait for the next release of trains or if I can already start implementing Azure support
Oh, I wasn't aware of that new implementation, was it introduced silently? I don't remember reading it in the release notes! To answer your question: no, for GCP I used the old version, but for Azure I will use this one, and maybe send a PR if the code is clean
Thanks for the hint, I'll check the paid version, but I'd first like to understand how much effort it would take to fix the current situation myself
It worked like a charm! Awesome, thanks AgitatedDove14!
So I changed ebs_device_name = "/dev/sda1", and now I correctly get the 100GB EBS volume mounted on /. All good
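For reference, roughly what an entry in the autoscaler's resource_configurations could look like with that change (instance type, zone and AMI id are placeholders; the key names follow the aws_autoscaler example as I understand it):
resource_configurations = {
    "aws4cpu": {
        "instance_type": "m5.xlarge",          # placeholder
        "is_spot": False,
        "availability_zone": "us-east-1b",     # placeholder
        "ami_id": "ami-0123456789abcdef0",     # placeholder
        "ebs_device_name": "/dev/sda1",        # must match the AMI's root device for the size to apply
        "ebs_volume_size": 100,                # GB
        "ebs_volume_type": "gp2",
    },
}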