Eureka! Now I can do nvcc --version and I get: Cuda compilation tools, release 10.1, V10.1.243
awesome 🙂
Maybe then we can extend task.upload_artifact?
def upload_artifact(..., wait_for_upload: bool = False):
    ...
    if wait_for_upload:
        self.flush(wait_for_uploads=True)
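For context, a minimal sketch of how the proposed flag could be used from the caller's side (wait_for_upload is the suggested addition, not an existing argument of upload_artifact):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")

# With the proposed flag, the call would block until the artifact is
# actually stored, instead of returning while the upload runs in the
# background.
task.upload_artifact(
    name="predictions",
    artifact_object="predictions.csv",
    wait_for_upload=True,  # proposed parameter, not in the current API
)
```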
I killed both trains-agents and restarted one to have a clean start. This way it correctly spins up docker containers for services tasks. So the bug probably appears when an error occurs while setting up a task: the agent cannot go back to the main task afterwards. I would need to do some tests to validate that hypothesis though
ok, what is the 3.8 release? a server release? how does this number relate to the numbers above?
AgitatedDove14 Yes, I have xpack security disabled, as in the link you shared (note that it's xpack.security.enabled: "false", with quotes around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
That would be awesome 🙂
Is there one?
No, I rather wanted to understand how it works behind the scenes 🙂
The latest RC (0.17.5rc6) moved all logging into a separate subprocess to improve speed with PyTorch dataloaders
That’s awesome!
there is no error from this side, I think the aws autoscaler just waits for the agent to connect, which will never happen since the agent won’t start because the userdata script fails
I think this is because this API is not available in elastic 5.6
If I remove security_group_ids and just keep subnet_id in the configuration, it is not taken into account (the instances are created in the default subnet)
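Just to illustrate what I would expect to happen under the hood (a rough boto3 sketch, not the autoscaler's actual code; all IDs are made up): when only SubnetId is passed, the instance should still land in that subnet.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# If only SubnetId is given (no SecurityGroupIds), EC2 still launches the
# instance in that subnet, so I would expect the same from the autoscaler.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # hypothetical AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",        # hypothetical subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],  # hypothetical security group
)
```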
AgitatedDove14 I was able to redirect the logger by doing so:
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
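For reference, the same redirect as a self-contained sketch (assuming early_stopping.logger is a standard logging.Logger, as in pytorch-ignite's EarlyStopping handler; replacing its debug/info methods is a quick hack rather than a proper logging handler, and the score function below is just a placeholder):

```python
import logging

from clearml import Task
from ignite.engine import Engine
from ignite.handlers import EarlyStopping

task = Task.init(project_name="examples", task_name="early stopping logs")
clearml_logger = task.get_logger().report_text

def score_function(engine: Engine) -> float:
    # Higher is better for EarlyStopping, so negate the loss
    return -engine.state.metrics["loss"]

trainer = Engine(lambda engine, batch: None)  # placeholder training step
early_stopping = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)

# Route the handler's log messages to the ClearML console log
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
```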
Would be very cool if you could include this use case!
Would adding an ILM (index lifecycle management) policy be an appropriate solution?
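If so, a rough sketch of what I had in mind (the endpoint, policy name and 30-day retention are made up, and ILM requires Elasticsearch 6.6 or newer):

```python
import requests

ES_URL = "http://localhost:9200"  # hypothetical Elasticsearch endpoint

# Delete indices 30 days after they enter the delete phase;
# the retention period here is only illustrative.
policy = {
    "policy": {
        "phases": {
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            }
        }
    }
}

resp = requests.put(f"{ES_URL}/_ilm/policy/clearml-events-cleanup", json=policy)
resp.raise_for_status()
print(resp.json())
```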
mmmh it fails, but if I connect to the instance and execute ulimit -n, I do see 65535, while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
which gives me in the logs of the task:
b'1024'
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work
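For what it's worth, the same check (and a possible workaround) can also be done from inside the task with the standard resource module; raising the soft limit only works up to the hard limit the container was started with:

```python
import resource

# Soft/hard limits for the maximum number of open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard}")

# Try to raise the soft limit to the hard limit (e.g. 1024 -> 65535,
# if the container's hard limit allows it)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("new nofile:", resource.getrlimit(resource.RLIMIT_NOFILE))
```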
Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
ok, thanks SuccessfulKoala55 !
here is the function used to create the task:
` def schedule_task(parent_task: Task,
task_type: str = None,
entry_point: str = None,
force_requirements: List[str] = None,
queue_name="default",
working_dir: str = ".",
extra_params=None,
wait_for_status: bool = False,
raise_on_status: Iterable[Task.TaskStatusEnum] = (Task.TaskStatusEnum.failed, Task.Ta...
And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that
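In case it helps the discussion, this is the rank-0-only pattern I had in mind (a sketch using the standard torch.distributed rank check, with Task.init as a stand-in for attaching the ClearMLLogger; I don't know yet whether that is the recommended approach):

```python
import torch.distributed as dist
from clearml import Task

def is_rank_zero() -> bool:
    # Before/without distributed init, treat the process as rank 0
    return not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0

task = None
if is_rank_zero():
    # Only the main process reports to ClearML in this sketch;
    # the same guard would apply to attaching a ClearMLLogger.
    task = Task.init(project_name="examples", task_name="ddp training")
```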
Very good job! One note: in this version of the web-server, the experiment type logos are all blank; what was the reason for changing them? Having a color code in the logos helps a lot to quickly check the nature of the different experiment tasks, doesn't it?
Hi NonchalantHedgehong19, thanks for the hint! What should be the content of the requirements file then? Can I specify my local package inside? How?
I did change the replica setting on the same index, yes; I reverted it back from 1 to 0 afterwards
yes, what happens in the case of installation from pip wheel files?
Sure 🙂 Opened https://github.com/allegroai/clearml/issues/568
Alright, I have a follow-up question then: I used the param --user-folder "~/projects/my-project", but any change I make is not reflected in this folder. I guess I am in the docker space, but this folder is not linked to the folder on the machine. Is it possible to do so?