CostlyOstrich36 yes, when I scroll up, a new events.get_task_log is fired and the response doesn't contain any log (but it should)
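For what it's worth, a small sketch (the task id is a placeholder) of querying events.get_task_log directly through the API client, to check whether the server returns log entries outside of the UI scroll; the `events` field on the response is assumed here:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Hypothetical task id; events.get_task_log is the same endpoint the UI calls
res = client.events.get_task_log(task="<task_id>")
print(len(res.events))  # assuming the response exposes an `events` list
```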
In the comparison the problem will be the same, right? If I choose last/min/max values, it won't tell me the corresponding values for the other metrics. I could switch to graphs, group by metric and look manually for the corresponding values, but that quickly becomes cumbersome as the number of compared experiments grows
So it looks like it tries to register a batch of 500 documents
now I can do nvcc --version
and I get: Cuda compilation tools, release 10.1, V10.1.243
Hi CostlyOstrich36 , I mean insert temporary access keys
So when I create a task using `task = Task.init(project_name=config.get("project_name"), task_name=config.get("task_name"), task_type=Task.TaskTypes.training, output_uri="s3://my-bucket")` locally, the artifact is correctly logged remotely, but when I create the task remotely (from an agent) the artifact is logged locally (on the agent machine, not on s3)
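For reference, a minimal sketch of the local call with placeholder names; on the agent machine a default destination can also be set via `sdk.development.default_output_uri` in its clearml.conf, assuming your agent version honours that key:

```python
from clearml import Task  # trains users would import Task from trains instead

task = Task.init(
    project_name="my_project",          # placeholder
    task_name="my_task",                # placeholder
    task_type=Task.TaskTypes.training,
    output_uri="s3://my-bucket",        # no surrounding spaces in the URI
)
```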
So `previous_task` actually ignored the `output_uri`
thanks for your help!
My use case is: in a spot instance marked for termination after 2 mins by aws, I want to close a task and prevent the clearml-agent from picking up a new task afterwards.
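A rough sketch of that flow, assuming the standard SDK and agent CLI (this is not a tested spot-termination handler):

```python
import subprocess

from clearml import Task

# Mark the running task as stopped so the server records a clean stop
task = Task.current_task()
if task is not None:
    task.mark_stopped()

# Stop the local agent daemon so it does not pull another task before
# the spot instance is reclaimed
subprocess.run(["clearml-agent", "daemon", "--stop"], check=False)
```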
The cloning is done in another task, which has the argv parameters I want the cloned task to inherit from
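For illustration, a hedged sketch of cloning from a template task and overriding its argv-style parameters (project, task and parameter names are placeholders):

```python
from clearml import Task

template = Task.get_task(project_name="my_project", task_name="template_task")
cloned = Task.clone(source_task=template, name="cloned_task")
# argparse-connected arguments live under the "Args/" section
cloned.set_parameters({"Args/learning_rate": 0.001})
Task.enqueue(cloned, queue_name="default")
```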
The weird thing is that the second experiment started immediately, correctly in a docker container, but failed with User aborted: stopping task (3) at some point (while installing the packages). The error message is surprising since I did not do anything. And then all following experiments are queued to the services queue and stuck there.
Ok, this I cannot locate
Yes, it did spin two instances for the same task
continue_last_task is almost what I want; the only problem with it is that it will start the task even if the task is completed
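One workaround sketch, assuming the previous run can be looked up by project/name and that a matching task exists: only pass continue_last_task when it is not already completed.

```python
from clearml import Task

prev = Task.get_task(project_name="my_project", task_name="my_task")  # placeholders
resume = prev is not None and prev.get_status() not in ("completed", "published")
task = Task.init(
    project_name="my_project",
    task_name="my_task",
    continue_last_task=resume,
)
```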
So get_registered_artifacts() only works for dynamic artifacts, right? I am looking for a download_artifacts() which allows me to retrieve static artifacts of a Task
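A minimal sketch for the static-artifact case, assuming the artifact was uploaded with upload_artifact (task id and artifact name are placeholders):

```python
from clearml import Task

t = Task.get_task(task_id="<task_id>")
local_path = t.artifacts["my_artifact"].get_local_copy()  # downloads the stored file
print(local_path)
```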
Hi AgitatedDove14 , coming by after a few experiments this morning:
Indeed torch 1.3.1 does not support cuda 11. I tried with 1.7.0 and it worked, BUT trains was not able to pick the right wheel when I updated the torch requirement from 1.3.1 to 1.7.0: it downloaded the wheel for cuda version 101, even though the agent correctly reported the cuda version (111) in the experiment log. I then replaced torch==1.7.0 with the direct https link to the torch wheel for cuda 110, and that worked (I also tried specifyin...
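For reference, a hedged example of such a pin for a cu110 build (assuming the public PyTorch wheel index is reachable): `pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html`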
Thanks TimelyPenguin76 and AgitatedDove14 ! I would like to delete artifacts/models related to the old archived experiments, but they are stored on s3. Would that be possible?
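Purely as a sketch of one possible approach, assuming the SDK exposes the archived tasks and their output models, that boto3 credentials are configured, and with placeholder names; double-check what it matches before deleting anything:

```python
from urllib.parse import urlparse

import boto3
from clearml import Task

s3 = boto3.client("s3")
archived = Task.get_tasks(project_name="my_project",
                          task_filter={"system_tags": ["archived"]})
for t in archived:
    # output models of each archived task; artifacts could be handled the same way
    for model in t.get_models().get("output", []):
        parsed = urlparse(model.url)  # e.g. s3://my-bucket/path/model.pkl
        if parsed.scheme == "s3":
            s3.delete_object(Bucket=parsed.netloc, Key=parsed.path.lstrip("/"))
```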
Thanks SuccessfulKoala55
AgitatedDove14 Yes, I have xpack security disabled, as in the link you shared (note that it's xpack.security.enabled: "false", with quotes around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
That's how I would do it; maybe the folks from allegro-ai can come up with a better approach
in the UI the value is the correct one (not empty, a string)
my agents are all on 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
I am running on bare metal, and cuda seems to be installed at /usr/lib/x86_64-linux-gnu/libcuda.so.460.39