I am using 0.17.5, it could be either a bug in ignite or indeed a delay in the send. I will try to build a simple reproducible example to understand the cause
yes -> but I still don't understand why the post_packages didn't work, could be worth investigating
Sure 🙂 Opened https://github.com/allegroai/clearml/issues/568
Nice, the preview param will do 🙂 Btw, I love the new docs layout!
AgitatedDove14 awesome! by "include it all" do you mean wizard for azure and gcp?
Isn't it overkill to run a whole Ubuntu 18.04 just to run a dead simple controller task?
AgitatedDove14 any chance you found something interesting? 🙂
Hi AnxiousSeal95 , I hope you had nice holidays! Thanks for the update! I discovered h2o when looking for ways to deploy dashboards with apps like streamlit. Most likely I will use either streamlit deployed through clearml or h2o as standalone if ClearML won't support deploying apps (which is totally fine, no offense there 🙂 )
I found it, the filter actually has to be an iterable:
`Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type=["training"]))`
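As a side note, a minimal sketch of the filter shape (pure Python, no server needed; the `Task.get_tasks` call itself is from the clearml SDK and is only shown in a comment):

```python
# The task_filter values must be iterables (e.g. lists), not bare strings.
bad_filter = dict(type="training")     # a bare string: not accepted as a filter value
good_filter = dict(type=["training"])  # an iterable: works, per the message above

assert isinstance(bad_filter["type"], str)
assert isinstance(good_filter["type"], list)

# The filter would then be passed as:
#   Task.get_tasks(project_name="my-project", task_name="my-task",
#                  task_filter=good_filter)
```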
with 1.1.1 I get `User aborted: stopping task (3)`
Configuration:
```
{
  "resource_configurations": {
    "v100": {
      "instance_type": "g4dn.2xlarge",
      "availability_zone": "us-east-1a",
      "ami_id": "ami-05e329519be512f1b",
      "ebs_device_name": "/dev/sda1",
      "ebs_volume_size": 100,
      "ebs_volume_type": "gp3",
      "key_name": "key.name",
      "security_group_ids": [
        "sg-asd"
      ],
      "is_spot": false,
      "extra_configura...
```
AgitatedDove14 ok, but this happens in my local machine, not in the agent
but if the task is now running on an agent, isn't it a possible source of conflict? I would expect that after calling Task.enqueue(exit=True), the local task is closed and no processes related to it are running
yes, so it does exit the local process (at least, the command returns), but another process is still running in the background and is logging things from time to time, such as: `ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start`
yes, exactly: I run `python my_script.py`, the script executes, creates the task, calls `task.execute_remotely(exit_process=True)`
and returns to bash. Then, in the bash console, after some time, I see some messages being logged from clearml
Alright, thanks for the answer! Seems legit then 🙂
AgitatedDove14 Yes exactly! it is shown in the recording above
I also tried setting `ebs_device_name = "/dev/sdf"` — didn't work
AgitatedDove14 Unfortunately no, I already had the problem before using the function. I added it hoping it would fix the issue but it didn't
Is there one?
No, I rather wanted to understand how it worked behind the scenes 🙂
The latest RC (0.17.5rc6) moved all logging into a separate subprocess to improve speed with PyTorch dataloaders
Thatβs awesome!
I also discovered https://h2oai.github.io/wave/ last week, would be awesome to be able to deploy it in the same manner
What is latest rc of clearml-agent? 1.5.2rc0?
Yes, but I am not certain how: I just deleted the /data folder and restarted the server
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
but most likely I need to update the perms of /data as well
So I created a symlink from /opt/trains/data -> /data
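For reference, the symlink setup described above can be sketched in Python, using temporary directories in place of the real paths (/data on the external disk, /opt/trains/data as the link):

```python
import os
import tempfile

# Temp dirs stand in for the real paths used in the messages above.
tmp = tempfile.mkdtemp()
external_data = os.path.join(tmp, "data")      # stands in for /data
os.makedirs(external_data)

link_path = os.path.join(tmp, "trains_data")   # stands in for /opt/trains/data
os.symlink(external_data, link_path)

# Writes through the link land on the external disk.
with open(os.path.join(link_path, "probe.txt"), "w") as f:
    f.write("ok")

assert os.path.islink(link_path)
assert os.path.exists(os.path.join(external_data, "probe.txt"))
```

(Permissions on the target directory still matter, as noted above: the server processes must be able to read and write /data itself.)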
Ok, now I would like to copy from one machine to another via scp, so I copied the whole /opt/trains/data folder, but I got the following errors: