The parent task is a data_processing task, therefore I retrieve it so that I can then do data_processed = parent_task.artifacts["data_processed"]
GrumpyPenguin23 yes, it is the latest
AgitatedDove14, what I was looking for was: parent_task = Task.get_task(task.parent)
Yes, actually that's what I am doing, because I have a task C depending on tasks A and B. Since a Task cannot have two parents, I retrieve one task ID (task A) as the parent ID and the other one (ID of task B) as a hyper-parameter, as you described 🙂
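For context, a minimal sketch of that pattern, assuming a hypothetical "General/task_b_id" hyper-parameter and a "data_processed" artifact on task A:
```python
from clearml import Task

task = Task.current_task()

# Task A is the real parent, so it can be fetched directly
parent_task = Task.get_task(task.parent)
data_processed = parent_task.artifacts["data_processed"].get()

# Task B's ID is passed in as a hyper-parameter instead (the name is illustrative)
task_b_id = task.get_parameters()["General/task_b_id"]
task_b = Task.get_task(task_id=task_b_id)
```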
AgitatedDove14 In my case I'd rather have it under the "Artifacts" tab because it is a big JSON file
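For reference, a hedged sketch of attaching a large JSON file so it lands under the Artifacts tab (the artifact name and file path are made up):
```python
from clearml import Task

task = Task.current_task()
# A file path (or a dict) passed to upload_artifact shows up under the task's Artifacts tab
task.upload_artifact(name="predictions", artifact_object="/path/to/big_file.json")
```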
No, they have different names - I will try to update both agents to the latest versions
Thanks! I will investigate further, I am thinking that the AWS instance might have been stuck for an unknown reason (becoming unhealthy)
Hi, /opt/clearml is ~40 MB, /opt/clearml/data is ~50 GB
Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e. I suspect that the loss, which is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that
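A rough sketch of what such a count query could look like; the index pattern ("events-training_stats_scalar-*") and the field names ("task", "metric") are assumptions about the ClearML Elasticsearch mapping, so double-check them against your actual indices:
```python
import json
import requests

# Assumed ES endpoint and index/field names -- adjust to your deployment
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"task": "<TASK_ID>"}},   # placeholder task ID
                {"term": {"metric": "loss"}},      # the suspected series
            ]
        }
    }
}
resp = requests.get(
    "http://localhost:9200/events-training_stats_scalar-*/_count",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
print(resp.json()["count"])
```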
I am running on bare metal, and CUDA seems to be installed at /usr/lib/x86_64-linux-gnu/libcuda.so.460.39
Hi SuccessfulKoala55, yes it's for the same host/bucket - I'll try with a different browser
did you try with another availability zone?
AgitatedDove14 I was able to redirect the logger by doing so:
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
I think we should switch back and have a configuration to control which mechanism the agent uses, wdyt?
That sounds great!
I can probably have a Python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of an EBS snapshot, waits until it is finished, and then restarts the clearml-server, wdyt?
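Something along these lines maybe; the compose directory, the volume ID, and the task status filter are assumptions, not a tested recipe:
```python
import subprocess
import boto3
from clearml import Task

# Only back up when nothing is running or queued (filter syntax is an assumption)
busy = Task.get_tasks(task_filter={"status": ["in_progress", "queued"]})
if not busy:
    # Stop the clearml-server stack (assuming the compose file lives in /opt/clearml)
    subprocess.run(["docker-compose", "down"], cwd="/opt/clearml", check=True)

    ec2 = boto3.client("ec2")
    snap = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",  # placeholder EBS volume ID
        Description="clearml-server data backup",
    )
    # Wait until the snapshot is completed before bringing the server back up
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    subprocess.run(["docker-compose", "up", "-d"], cwd="/opt/clearml", check=True)
```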
Something was triggered, you can see the CPU usage starting right when the instance became unresponsive - maybe a merge operation from ES?
I have a mental model of the clearml-agent as a module to spin my code somewhere, and the Python version running my code should not depend on the Python version running the clearml-agent (especially for experiments running in containers)
Could you please point me to the relevant component? I am not familiar with TypeScript unfortunately 🙂
As a quick fix, can you test with auto refresh (see the top right button with the pause sign you have on the video)?
That doesn't work unfortunately
That said, v1.3.1 is already out, with what seems like a fix:
So you mean 1.3.1 should fix this bug?
(Btw the instance listed in the console has no name, is it normal?)
Sorry, I didn't get that 🙂
SuccessfulKoala55 Could you please point me to where I could quickly patch that in the code?