I created a snapshot of both disks
I will try to isolate the bug; if I can, I will open an issue in trains-agent 🙂
Now I'm curious, what did you end up doing ?
In my repo I maintain a bash script to set up a separate python env. Then in my task I spawn a subprocess and I don't pass the env variables, so that the subprocess properly picks up the separate python env
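Roughly, the pattern looks like this (a minimal sketch; the interpreter path and script name are just placeholders, not my actual setup):
```python
import subprocess

# Placeholder: interpreter of the separate env created by the repo's bash setup script
SEPARATE_ENV_PYTHON = "/path/to/separate_env/bin/python"

# Deliberately do NOT inherit os.environ: variables set by the parent task
# (PYTHONPATH, VIRTUAL_ENV, ...) would otherwise leak into the subprocess.
# A minimal PATH is enough for the separate env's interpreter to be used.
subprocess.run(
    [SEPARATE_ENV_PYTHON, "worker_script.py"],   # worker_script.py is a placeholder
    env={"PATH": "/usr/local/bin:/usr/bin:/bin"},
    check=True,
)
```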
The task with id a445e40b53c5417da1a6489aad616fee is not aborted and is still running
So actually I don't need to play with this limit, I am OK with the default for now
CostlyOstrich36 good enough, I will fall back to sorting by updated, thanks!
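Something like this is what I have in mind (a hedged sketch; assuming `Task.get_tasks` with a `task_filter` `order_by` on the backend's `last_update` field is the right way to sort by "updated", and the project name is a placeholder):
```python
from clearml import Task

# Fetch tasks sorted by last update time, newest first.
# "examples" is a placeholder project name; "-last_update" assumes the backend
# exposes the "updated" timestamp under the last_update field.
tasks = Task.get_tasks(
    project_name="examples",
    task_filter={"order_by": ["-last_update"]},
)
for t in tasks:
    print(t.id, t.name)
```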
the api-server shows when starting:
```
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic host
clearml-apiserver | [2021-07-13 11:09:34,552] [9] [INFO] [clearml.es_factory] Using override elastic port 9200
...
clearml-apiserver | [2021-07-13 11:09:38,407] [9] [WARNING] [clearml.initialize] Could not connect to ElasticSearch Service. Retry 1 of 4. Waiting for 30sec
clearml-apiserver | [2021-07-13 11:10:08,414] [9] [WARNING] [clearml.initia...
```
I am using 0.17.5, it could be either a bug in ignite or indeed a delay on the send. I will try to build a simple reproducible example to understand the cause
yes -> but I still don't understand why the post_packages didn't work, could be worth investigating
Sure 🙂 Opened https://github.com/allegroai/clearml/issues/568
Nice, the preview param will do 🙂 btw, I love the new docs layout!
AgitatedDove14 awesome! by "include it all" do you mean wizard for azure and gcp?
Isn't it overkill to run a whole ubuntu 18.04 just to run a dead simple controller task?
AgitatedDove14 any chance you found something interesting? 🙂
Hi AnxiousSeal95, I hope you had a nice holiday! Thanks for the update! I discovered h2o when looking for ways to deploy dashboards with apps like streamlit. Most likely I will use either streamlit deployed through clearml, or h2o standalone if ClearML won't support deploying apps (which is totally fine, no offense there 🙂)
with 1.1.1 I get `User aborted: stopping task (3)`
Configuration:
```
{
    "resource_configurations": {
        "v100": {
            "instance_type": "g4dn.2xlarge",
            "availability_zone": "us-east-1a",
            "ami_id": "ami-05e329519be512f1b",
            "ebs_device_name": "/dev/sda1",
            "ebs_volume_size": 100,
            "ebs_volume_type": "gp3",
            "key_name": "key.name",
            "security_group_ids": [
                "sg-asd"
            ],
            "is_spot": false,
            "extra_configura...
```
AgitatedDove14 ok, but this happens on my local machine, not in the agent
but if the task is now running on an agent, isn't it a possible source of conflict? I would expect that after calling Task.enqueue(exit=True), the local task is closed and no processes related to it are running
yes, so it does exit the local process (at least, the command returns), but another process is still running in the background and is logging things from time to time (such as: `ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start`)
yes, exactly: I run `python my_script.py`, the script executes, creates the task, calls `task.remote_execute(exit_process=True)` and returns to bash. Then, in the bash console, after some time, I see some messages being logged from clearml
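For context, this is roughly the pattern I mean (a minimal sketch; assuming the SDK's `Task.execute_remotely` is the call in question, and the project, task and queue names are placeholders):
```python
from clearml import Task

# Placeholder project/task names; Task.execute_remotely is the assumed SDK call.
task = Task.init(project_name="examples", task_name="remote run")

# Enqueue the task for an agent and exit the local process once it is scheduled.
task.execute_remotely(queue_name="default", clone=False, exit_process=True)

# Anything below this line only runs on the agent, never locally.
print("running remotely on the agent")
```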
Alright, thanks for the answer! Seems legit then 🙂
I also tried setting `ebs_device_name = "/dev/sdf"` - didn't work
AgitatedDove14 Unfortunately no, I already had the problem before using the function; I added it hoping it would fix the issue, but it didn't
Is there one?
No, I rather wanted to understand how it worked behind the scenes 🙂
The latest RC (0.17.5rc6) moved all logs into a separate subprocess to improve speed with pytorch dataloaders
That's awesome!
I also discovered https://h2oai.github.io/wave/ last week, would be awesome to be able to deploy it in the same manner
What is the latest RC of clearml-agent? 1.5.2rc0?
Yes, but I am not certain how: I just deleted the /data folder and restarted the server