basically:
` # test_task_a.py: register a dummy artifact, clone the task to run test_task_b.py, and enqueue it
from trains import Task

task = Task.init("test", "test", "controller")
task.upload_artifact("test-artifact", dict(foo="bar"))

# clone the current task, point it at test_task_b.py and pass the artifact name as a parameter
cloned_task = Task.clone(task, name="test", parent=task.task_id)
cloned_task.data.script.entry_point = "test_task_b.py"
cloned_task._update_script(cloned_task.data.script)
cloned_task.set_parameters(**{"artifact_name": "test-artifact"})
Task.enqueue(cloned_task, queue_name="default") `
Here is the minimal reproducible example.
Run test_task_a.py: it will register a dummy artifact, create a new task, set a parameter in that task and enqueue it. test_task_b.py will then try to retrieve the parameter from the parent task and fail.
AgitatedDove14 I cannot confirm it 100%, the context is different (see previous messages), but it could be the same bug behind the scenes...
What is weird is:
- Executing the task from an agent: task.get_parameters() returns an empty dict.
- Calling task.get_parameters() from a local standalone script returns the correct properties, as shown in the web UI, even if I updated them in the UI.
So I guess the problem comes from trains-agent?
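To be clear about the standalone check I mean, here is a rough sketch (the task ID is a placeholder copied from the web UI):
` from trains import Task

# fetch the same task by its ID from a local, standalone script
t = Task.get_task(task_id="<task-id-from-the-ui>")  # placeholder ID
print(t.get_parameters())  # here it returns the parameters as shown in the web UI `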
Thanks for your inputs, I will try that! For completeness, here is how I retrieve the parameters:
` # test_task_b.py: retrieve the parameter and the artifact registered by the parent task
from trains import Task

task = Task.init("test", "test")
parent_task = Task.get_task(task.parent)
task.get_logger().report_text(task.get_parameters())
artifact_name = task.get_parameter("General/artifact_name")
artifact = parent_task.artifacts[artifact_name].get() `
So in my minimal reproducible example, it does work 🤣 very frustrating, I will continue searching for that nasty bug
very cool, good to know, thanks SuccessfulKoala55!
Hi SuccessfulKoala55, super, that's what I was looking for!
Indeed, I actually had the old configuration that was not JSON. I converted it to JSON and now it works.
Very nice! Maybe we could have this option as a toggle setting in the user profile page, so that by default we keep the current behaviour and users like me can change it. Wdyt?
No, it doesn't!
3. They select any point that is an improvement over time
Thanks!
3. I don't know, I never used Highcharts.
I am not using Hydra, I am reading the conf with:
` config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict)) `
But I am not sure it will connect the parameters properly, I will check now
Doing it the other way around works:
` cfg = OmegaConf.create(read_yaml(conf_yaml_path))
config = task.connect(cfg)
type(config)
# -> <class 'omegaconf.dictconfig.DictConfig'> `
but then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict()?
Why not simply task.connect_configuration(read_yaml(conf_path))?
I mean, what is the benefit of returning a ProxyDictPostWrite instead of a dict?
Same, it also returns a ProxyDictPostWrite, which is not supported by OmegaConf.create
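Just to show the workaround sketch I mean ("conf.yaml" is a placeholder path, and read_yaml here is just plain PyYAML loading; the _to_dict() call is the one mentioned above):
` import yaml
from omegaconf import OmegaConf
from trains import Task

def read_yaml(path):
    # plain YAML load, returns a regular dict
    with open(path) as f:
        return yaml.safe_load(f)

task = Task.init("test", "test")
raw_cfg = task.connect_configuration(read_yaml("conf.yaml"))  # comes back as a ProxyDictPostWrite
config = OmegaConf.create(raw_cfg._to_dict())  # plain dict, which OmegaConf.create accepts `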
I mean, inside a parent project, do not show the [parent] project if there is nothing inside it.
Because it lives behind a VPN and GitHub workers don't have access to it.
No worries! I asked more to be informed, I don't have a real use case behind it. This means that you guys internally catch the argparser object somehow, right? Because you could also simply use sys.argv to find the parameters, right?
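Just to illustrate the difference I have in mind (a minimal sketch, not the trains internals):
` import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
args = parser.parse_args()

print(sys.argv)    # raw strings only, e.g. ['train.py', '--lr', '0.1']
print(vars(args))  # typed, named values with defaults, e.g. {'lr': 0.1} `
So catching the parser object gives names, types and defaults, while sys.argv is just the raw strings.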
Some more context: the second experiment finished and now, in the UI, in the Workers & Queues tab, I randomly see:
` trains-agent-1 | - | - | - | ... `
(refresh page)
` trains-agent-1 | long-experiment | 12h | 72000 | `
Why is it required in the case where boto3 can figure them out itself within the EC2 instance?
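For context, this is the kind of check I have in mind (a sketch; boto3 resolves credentials by itself: env vars, config files, then the instance metadata / IAM role):
` import boto3

session = boto3.Session()
print(session.get_credentials())                    # non-None when the instance profile provides credentials
print(session.client("sts").get_caller_identity())  # shows which account/role is actually used `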
it actually looks like I don't need such a high number of files opened at the same time
because at some point it introduces too much overhead I guess
mmmh it fails, but if I connect to the instance and execute ulimit -n, I do see 65535, while the tasks I send to this agent fail with:
` OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png' `
and from the task itself, I run:
` import subprocess
print(subprocess.check_output("ulimit -n", shell=True)) `
which gives me in the logs of the task: b'1024'
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work.
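A sketch of one way to check and raise the soft nofile limit from inside the task process itself (assuming the hard limit is already 65535, as ulimit -n over ssh suggests):
` import resource

# current soft/hard nofile limits of this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)  # e.g. 1024 65535

# raise the soft limit up to the hard limit, for this process only
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard)) `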