What exactly does this mean? The environment is set after the script is started?
Or maybe a different question: what is not Artifacts and Models? Debug samples (or anything else the Logger class creates)?
Also, is it not possible to use multiple file servers? E.g. log tasks to different S3 buckets without changing clearml.conf?
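Something like this per-task setting is what I had in mind (just a sketch; the bucket names are placeholders):
```python
from clearml import Task

# Point each task's output at a different bucket instead of editing
# clearml.conf globally (credentials still come from the usual config).
task_a = Task.init(
    project_name="project-a",
    task_name="experiment-1",
    output_uri="s3://bucket-a/clearml",  # artifacts/models for this task go here
)
```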
Thank you very much, good to know!
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
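For now I do it client-side, roughly like this (a sketch; project, metric and series names are placeholders):
```python
from statistics import mean

from clearml import Task

# Pull the reported scalars of several finished experiments and aggregate
# the last value of one series manually.
tasks = Task.get_tasks(project_name="my_project", task_name="my_experiment")

last_values = []
for t in tasks:
    scalars = t.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}
    series = scalars.get("accuracy", {}).get("val", {})
    if series.get("y"):
        last_values.append(series["y"][-1])

if last_values:
    print("mean:", mean(last_values), "max:", max(last_values))
```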
So it definitely seems to be a problem with docker and not with clearml. However, I do not get why it works for you but not on any of my machines (all Ubuntu 20.04 with docker 20.10).
No. Here is a better example. I have two types of workstations: type X can execute tasks of type A and B; type Y can only execute tasks of type B. This could be the case if, for example, type X workstations have more VRAM, newer drivers, etc.
I have two queues. Queue A and Queue B. I submit tasks of type A to queue A and tasks of type B to queue B.
Here is what can happen:
Enqueue the first task of type B. Workstations of type X will run this task. Enqueue the second task of type A. Workstation ...
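For context, the enqueueing side looks roughly like this (sketch; task ids and queue names are placeholders):
```python
from clearml import Task

# Type-A tasks only fit type-X workstations, type-B tasks fit both.
task_a = Task.get_task(task_id="<task-A-id>")
task_b = Task.get_task(task_id="<task-B-id>")

Task.enqueue(task_a, queue_name="queue_a")  # pulled only by type-X agents
Task.enqueue(task_b, queue_name="queue_b")  # pulled by type-X and type-Y agents
```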
[2021-05-07 10:53:00,566] [9] [WARNING] [elasticsearch] POST ` [status:N/A request:60.061s]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", lin...
So I just tried again, but with manual deletion via the Web UI.
That I understand. But I think old pip versions will sometimes fail to resolve a package; probably not the case the other way around.
Yes, from the documentation:
Creates a new Task (experiment) if:
The Task never ran before. No Task with the same task_name and project_name is stored in ClearML Server.
The Task has run before (the same task_name and project_name), and (a) it stored models and/or artifacts, or (b) its status is Published, or (c) it is Archived.
A new Task is forced by calling Task.init with reuse_last_task_id=False.
Otherwise, the already initialized Task object for the same task_nam...
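In code terms, I understand it like this (just a sketch with placeholder names):
```python
from clearml import Task

# Re-uses the previously initialized (unfinished, unpublished) task with the
# same project/task name, per the conditions quoted above:
task = Task.init(project_name="my_project", task_name="my_experiment")

# ...or force a brand-new task every time:
# task = Task.init(project_name="my_project", task_name="my_experiment",
#                  reuse_last_task_id=False)
```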
Wow, thank you very much. And how would I bind my code to a task? Should I still use Task.init, so it just uses the file it is called in as the entry point, or should I create a task using Task.create and specify the script?
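The two options as I understand them (sketch; project names and script paths are placeholders):
```python
from clearml import Task

# Option 1: call Task.init from inside the training script; the calling
# file becomes the task's entry point.
task = Task.init(project_name="my_project", task_name="training-run")

# Option 2: register a script as a draft task without running it
# (e.g. from a separate "definition" script).
draft = Task.create(
    project_name="my_project",
    task_name="training-run",
    script="train.py",
)
```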
I am wondering where to put my experiment logic so that it gets lazily executed and not at task definition time (i.e., how do I get my experiment logic into get_task_experiment() without running it?).
Alright, that's unfortunate. But thank you very much!
With remote execution it is command="[...]", but locally it is command='train', as it is supposed to be.
But here is the funny thing:
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
Installs the GPU build.
SuccessfulKoala55 I just had the issue again. The logs show nothing of interest. It looks like OOM to me, but I will test this again with a much larger swap, so the server only slows down but does not kill anything. Unfortunately, the kernel logs also do not show much (maybe I have my server logs misconfigured, I am no expert).
What is interesting though is that docker showed that only my nginx, minio and docker-registry containers had exited, while all the clearml containers were still running. I restarted ...
Any idea why deletion of artifacts on my second fileserver does not work?
fileserver_datasets:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver-datasets
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver-datasets:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8082:8081"
ClearML successfu...
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
Mhhm, good hint! Unfortunately, I cannot see anywhere in the logs when the server creates a delete request.
Okay, thanks for explaining!
Yea, but doesn't this feature make sense on a task level? If I remember correctly, some dependencies will sometimes require different pip versions, and dependencies are defined per task.
Can you tell me how to create tasks correctly? PipelineController.add_step takes the task id / task name, but I would rather just define a function that returns the task directly, since the base task may not already be on the clearml-server.
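What I am aiming for is roughly this (sketch; names and script path are placeholders, and the exact PipelineController constructor arguments may differ between clearml versions):
```python
from clearml import Task
from clearml.automation import PipelineController

def make_base_task() -> Task:
    # Create the step's base task as a draft, without running it.
    return Task.create(
        project_name="my_project",
        task_name="step-1-base",
        script="step1.py",
    )

pipe = PipelineController(
    name="my-pipeline",
    project="my_project",
    version="0.0.1",
)
pipe.add_step(name="step_1", base_task_id=make_base_task().id)
pipe.start()
```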
Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: it seems like there is no way to define that a Task requires docker support from an agent, right?
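Per task, the closest I found looks like this (sketch; the image name is a placeholder), but it only takes effect if the agent runs in --docker mode anyway:
```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="gpu-training")
# Request a specific docker image for this task; an agent running in
# docker mode will use it when executing the task.
task.set_base_docker("nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu20.04")
```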
And how do I specify this fileserver as the output_uri?
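Something like this is what I had in mind (just a sketch; the host is a placeholder, the port matches the compose snippet above):
```python
from clearml import Task

# Send this task's artifacts/models to the second fileserver
# (exposed on port 8082 in the compose snippet above).
task = Task.init(
    project_name="my_project",
    task_name="dataset-upload",
    output_uri="http://my-clearml-host:8082",
)
```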