Reputation
Badges 1
979 × Eureka!I can probably have a python script that checks if there are any tasks running/pending, and if not, run docker-compose down to stop the clearml-server, then use boto3 to trigger the creating of a snapshot of the EBS, then wait until it is finished, then restarts the clearml-server, wdyt?
Nice, the preview param will do π btw, I love the new docs layout!
The task is created using Task.clone() yes
Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
basically:
` from trains import Task
task = Task.init("test", "test", "controller")
task.upload_artifact("test-artifact", dict(foo="bar"))
cloned_task = Task.clone(task, name="test", parent=task.task_id)
cloned_task.data.script.entry_point = "test_task_b.py"
cloned_task._update_script(cloned_task.data.script)
cloned_task.set_parameters(**{"artifact_name": "test-artifact"})
Task.enqueue(cloned_task, queue_name="default") `
Here is the minimal reproducable example.
Run test_task_a.py - It will register a dummy artifact, create a new task, set a parameter in that task and enqueue it test_task_b will try to retrieve parameter from parent task and fail
AgitatedDove14 I cannot confirm at 100%, the context is different (see previous messages) but it could be the same bug behind the scene...
What is weird is:
Executing the task from an agent: task.get_parameters() returns an empty dict Calling task.get_parameters() from a local standalone script returns the correct properties, as shown in web UI, even if I updated them in UI.So I guess the problem comes from trains-agent?
Thanks for your inputs, I will try that! For completion, here is how I retrieve the parameters:
` from trains import Task
task = Task.init("test", "test")
parent_task = Task.get_task(task.parent)
task.get_logger().report_text(task.get_parameters())
artifact_name = task.get_parameter("General/artifact_name")
artifact = parent_task.artifacts[artifact_name].get() `
So in my minimal reproducable example, it does work π€£ very frustrating, I will continue searching for that nasty bug
very cool, good to know, thanks SuccessfulKoala55 π
Hi SuccessfulKoala55 , super thatβs what I was looking for
Indeed, I actually had the old configuration that was not JSON - I converted to json, now works π
AgitatedDove14 yes but I don't see in the docs how to attach it to the logger of the earlystopping handler
Very nice! Maybe we could have this option as a toggle setting in the user profile page, so that by default we keep the current behaviour, and users like me can change it π wdyt?
and I didn't have this problem before because when cu117 wheels were not available, the agent was trying to get the wheel with the closest cu version and was falling back to 1.11.0+cu115, and this one was working
no it doesn't! 3. They select any point that is an improvement over time
Thanks!3. I don't know, I never used Highcharts π
I am not using hydra, I am reading the conf with:config_dict = read_yaml(conf_yaml_path) config = OmegaConf.create(task.connect_configuration(config_dict))
But I am not sure it will connect the parameters properly, I will check now
Doing it the other way around works:
` cfg = OmegaConf.create(read_yaml(conf_yaml_path))
config = task.connect(cfg)
type(config)
<class 'omegaconf.dictconfig.DictConfig'> `
but then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict()
?
Why not task.connect_configuration(read_yaml(conf_path))
simply?
I mean what is the benefit of returning ProxyDictPostWrite
instead of a dict?
Same, it also returns a ProxyDictPostWrite
, which is not supported by OmegaConf.create
I mean, inside a parent, do not show the project [parent] if there is nothing inside
correct, you could also use
Task.create
that creates a Task but does not do any automagic.
Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
Because it lives behind a VPN and github workers donβt have access to it
No worries! I asked more to be informed, I don't have a real use-case behind. This means that you guys internally catch the argparser object somehow right? Because you could also simply use sys argv to find the parameters, right?
Some more context: the second experiment finished and now, in the UI, in workers&queues tab, I see randomlytrains-agent-1 | - | - | - | ... (refresh page) trains-agent-1 | long-experiment | 12h | 72000 |