Wait, nvm. I just tried it again and now it worked.
Mhhm, now conda env creation takes forever, probably because it is resolving conflicts. At least that is what happened when I tried to install my environment manually.
Is this not something completely different?
This will just change the way the local repository is analyzed, but nothing about the agent.
It could be that the clearml-server misbehaves either while the cleanup is ongoing or even afterwards.
Okay, it seems like deletion just takes some time and is only reflected in the WebUI later. So when I try to delete again, a deletion process seems to be running in the background already.
Maybe deletion happens "async" and is not reflected in parts of clearml? It seems that if I try to delete often enough, at some point it is successful.
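If repeated attempts eventually succeed, the manual retrying can be wrapped in a small helper. This is only a sketch: `retry_until_done` and its parameters are hypothetical names, and the `action` callable stands in for whatever delete call is actually used (it is assumed to return True once the deletion went through). Nothing here is part of the clearml API.

```python
import time
from typing import Callable


def retry_until_done(
    action: Callable[[], bool],
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `action` until it returns True or the attempts run out.

    The wait doubles after every failed attempt (delay, 2*delay, 4*delay, ...).
    `sleep` is injectable so the helper can be exercised without real waiting.
    """
    for attempt in range(attempts):
        if action():
            return True
        # Do not wait after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)
    return False
```

Usage would look like `retry_until_done(lambda: my_delete(task_id))`, where `my_delete` is a placeholder for the real deletion call.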
I created a GitHub issue because the problem with the slow deletion still exists: https://github.com/allegroai/clearml/issues/586#issue-1142916619
SuccessfulKoala55 So what happens is that whenever the cleanup_service runs (or right after it), clearml throws these kinds of errors.
[root@dc01deffca35 elasticsearch]# curl
`
{
"cluster_name" : "clearml",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 10,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_nu...
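For reading a health response like the one above: a "yellow" status in Elasticsearch means all primary shards are active but some replica shards are unassigned, which is expected on a single data node because a replica cannot live on the same node as its primary. The sketch below encodes that reading; `explain_health` is a hypothetical helper, and the dictionary only repeats the fields visible in the truncated output.

```python
def explain_health(health: dict) -> str:
    """Give a rough, human-readable reading of a cluster health response."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    data_nodes = health.get("number_of_data_nodes", 0)

    if status == "green":
        return "all shards assigned"
    if status == "yellow" and data_nodes == 1 and unassigned > 0:
        # Single-node case: replicas have nowhere to be allocated.
        return "primaries active; replicas have no second node to go to"
    if status == "yellow":
        return "primaries active; some replicas unassigned"
    if status == "red":
        return "at least one primary shard is unassigned"
    return "unknown status"


# Only the fields that are visible in the pasted output.
health = {
    "status": "yellow",
    "number_of_data_nodes": 1,
    "active_primary_shards": 10,
    "active_shards": 10,
    "unassigned_shards": 10,
}
print(explain_health(health))
```

So the yellow status by itself does not point to a problem with the cleanup; it is the normal state of a single-node setup with replicas configured.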
I restarted it after I got the errors, because as everyone knows, turning it off and on usually works 😄
Yea, the clearml-data is immutable, but not the underlying data if I just store a pointer to some location.
When I add the file to the repo it works fine, just like you said.
I will try again tomorrow. It's getting late! Thank you for helping so far!
Mhhm, then maybe it is not clear 😂 to me how clearml.Task is meant to be used. I thought of it as a container for all the information regarding a single experiment, which is reflected on the server side and thereby in the WebUI. Now I init() a Task and it shows up in the WebUI. I thought that after initialization I could still update the task to my liking, i.e. that it serves as documentation of my experiment.
Then I could also do this:
# My custom very special use case
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
await task.synchronize()
await run_code_analysis()
task.add_requirement("myreq")
await task.synchronize()
I think I still don't get how clearml is supposed to work/be used. Why wouldn't the following work currently?
Example:
` task = Task.init(...)
if not running_remotely:
    task_dict = task.export_task()
    requirements = task_dict["script"]["requirements"]["pip"].splitlines()
    requirement_torch = [r for r in requirements if r.startswith("torch==")]
    requirements.remove(requirement_torch[0])
    requirements.append("torch >= 1.8.1")
    task_dict["script"]["requirements"]["pip"] = "\n"....
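The string manipulation in the example above can be isolated into a pure function, which makes it easy to check independently of clearml. This is a minimal sketch: `replace_pin` is a hypothetical helper, not part of clearml, and unlike the snippet above it drops every exact pin of the package and does not fail when no pin is present. Writing the result back to the task is deliberately left out.

```python
def replace_pin(pip_requirements: str, package: str, new_spec: str) -> str:
    """Replace exact pins ("package==...") in a pip requirements string.

    All lines that start with "<package>==" are dropped and `new_spec`
    is appended at the end. Other packages are left untouched, including
    ones that merely share a prefix (e.g. "torchvision" for "torch").
    """
    lines = pip_requirements.splitlines()
    kept = [line for line in lines if not line.startswith(f"{package}==")]
    kept.append(new_spec)
    return "\n".join(kept)


# Made-up requirements, for illustration only.
original = "numpy==1.19.5\ntorch==1.7.0\ntorchvision==0.8.1"
print(replace_pin(original, "torch", "torch >= 1.8.1"))
```

The returned string has the same newline-separated shape as the value read from `task_dict["script"]["requirements"]["pip"]`.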
And how do I specify this in the `output_uri`? The default file server is specified by passing `True`. How would I specify to use the second?