DefeatedOstrich93 can you verify lightning is actually only stored once?
GentleSwallow91 notice this part:
Hi Martin. Sorry - missed your reply.
Yeap, I am aware that docker_internal_mounts is inside the agent section.
'-v', '/tmp/ssh-XXXXXXnfYTo5/agent.8946:/tmp/ssh-XXXXXXnfYTo5/agent.8946', '-e', 'SSH_AUTH_SOCK=/tmp/ssh-XXXXXXnfYTo5/agent.8946',
It is creating a copy of the ssh folder and setting the SSH_AUTH_SOCK env to it. You can just map the entire ssh folder automatically by un-setting SSH_AUTH_SOCK before running the agent:
SSH_AUTH_SOCK= clearml-agent ...
These are the prerequisites for the docker service installed on the host machine (where the agent is running).
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
Could not locate channel name 'gg_clearml'
CheerfulGorilla72 these are the permissions:
https://github.com/allegroai/clearml/blob/427b98270cc846b5d7e4af49f9732e3eb8d7d3ae/examples/services/monitoring/slack_alerts.py#L13
channels:join channels:read chat:write
My use case is when I have a merge request for a model modification: I need to provide several pieces of information for our Quality Management System, one of which is to show that the experiment is a success and the model has some improvement over the previous iteration.
Sounds like a good approach 🙂
Obviously I don't want the reviewer to see all ...
Maybe publish the experiment and move it to a dedicated folder? Then even if they see all other experiments, they are under "development" p...
Hi @<1523703472304689152:profile|UpsetTurkey67>
I circumvented the problem by putting timestamp in task name, but I don't think this is necessary.
Just pass reuse_last_task_id=False to Task.init, it will never try to reuse them 🙂
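Something like this (a minimal sketch, project/task names are placeholders):

from clearml import Task

# reuse_last_task_id=False forces a brand-new task on every run,
# so there is no need to put a timestamp in the task name
task = Task.init(
    project_name="examples",    # placeholder
    task_name="my_experiment",  # placeholder
    reuse_last_task_id=False,
)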
TenseOstrich47 it's based on a free "index", so the first index not in use will be captured. But if you remove agents, the order will change, e.g. if you take down worker #1, the next worker you spin up will be #1 because that index is no longer taken.
So could it be that pip install --no-deps .
is the missing piece?
what happens if you add to the installed packages "/opt/keras-hannd" ?
Ok, so it doesn't follow the exact same rules as Task.init?
Correct
I was afraid all the logs and outputs of a hyperparameter optimization task would be deleted just because no artifacts were created.
Should not happen 🙂
Okay, the type is inferred from the default value of the function step itself, that means that both:
data_frame = step_one(pickle_url, extra=1337)
and
data_frame = step_one(pickle_url, 1337)
will pass extra as int.
That said, if the default value of the argument is missing, it will revert to str.
In order to use the type hints as casting hints, we actually need to improve task.connect to support the type casting (the values are stored as strings).
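For reference, a minimal sketch of such a step with the decorator syntax (the body is illustrative, not from this thread):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data_frame"])
def step_one(pickle_url, extra=1337):
    # "extra" defaults to an int, so the controller casts the incoming
    # value to int whether it is passed positionally or by keyword;
    # with no default value it would revert to str
    import pandas as pd  # imports go inside the component body
    data_frame = pd.read_pickle(pickle_url)
    return data_frame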
Ohh, the controller task itself holds the artifacts ?
I think the main difference is that I see value in having access to the raw format within the cloud vendor, and not only having it as an archive
I see, it does make sense.
Two options, one, as you mentioned use the ClearML StorageManager to upload the files, then register them as external links with Dataset.
Two, I know the enterprise tier has HyperDatasets, which are essentially what you describe, with version control over the "metadata" and "raw storage" on the GCP, including the ab...
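For option one, a minimal sketch (bucket path and file names are made up):

from clearml import Dataset, StorageManager

# upload the raw file yourself so it keeps its native format on GCS
remote_url = StorageManager.upload_file(
    local_file="data/sample.csv",                # hypothetical local file
    remote_url="gs://my-bucket/raw/sample.csv",  # hypothetical bucket path
)

# then register the uploaded copy as an external link on the dataset
dataset = Dataset.create(dataset_name="raw_data", dataset_project="examples")
dataset.add_external_files(source_url=remote_url)
dataset.upload()
dataset.finalize()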
I thought about the fact that maybe we need to write everything in one place
It will be in the same place, under the main Task
Should work out of the box
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
I think in the pipeline it was the original default, but it turns out for a lot of users this was not their default use case ...
Anyhow you can also pass repo="."
which will load + detect the repo in the execution environment and automatically fill it in
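e.g. a minimal sketch with the decorator syntax:

from clearml import PipelineDecorator

# repo="." means: detect the repository of the current working directory
# at execution time and register it for the component
@PipelineDecorator.component(repo=".")
def my_component():
    print("runs with the auto-detected repo")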
it will only do that if the OOM killer is enabled
true, but you will still get OOM (I believe). I think the main issue is that even from inside the container, when you query the memory, you see the entire machine's memory... I'm not sure what we can do about that
The full docker-compose logs?
Actually unless you specifically detached the matplotlib automagic, any plt.show() will be automatically reported.
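For example, a minimal sketch (project/task names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib demo")

plt.plot([1, 2, 3], [4, 5, 6])
plt.title("auto-reported plot")
plt.show()  # captured by the matplotlib automagic and reported to the task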
BTW: which clearml version are you using ?
(I remember there was a change in the last one, or the one before, deferring the config loading until it is accessed)
Quick update, I found the issue, working on a fix 🙂
ShortElephant92 yep, this is definitely enterprise feature 🙂
But you can configure user/pass on the open source, and even store the passwords hashed if you need.
I will TIAS, but maybe it's worthwhile to also mention whether it has to be an absolute path or if a relative path is fine too!
Good point! (absolute but you can use ~, and I "think" also $ENV )
however setting up the interpreter on PyCharm is different on mac for some reason, and the video just didn't match what I see
MiniatureCrocodile39 Are you running on a remote machine (i.e. PyCharm + remote ssh) ?
I am logging debug images via Tensorboard (via the add_image function), however apparently these debug images are not collected within the fileserver,
ZanyPig66 what do you mean not collected to the file server? are you saying the TB add_image is not automatically uploading images? or that you cannot access the files on your files server?
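For context, a minimal sketch of the flow being discussed (assuming PyTorch's SummaryWriter, placeholder names):

import numpy as np
from torch.utils.tensorboard import SummaryWriter
from clearml import Task

task = Task.init(project_name="examples", task_name="tb debug images")
writer = SummaryWriter()

# add_image calls are intercepted by the TB automagic binding and the
# images are uploaded as debug samples (by default to the fileserver)
img = np.random.randint(0, 255, (3, 64, 64), dtype=np.uint8)
writer.add_image("debug/sample", img, global_step=0)
writer.close()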
Hi CrookedWalrus33
the python version is auto-detected and registered at "manual execution" time (i.e. when you run your code on your machine).
That said, this is a suggestion for the agent, and only if it can actually find the matching Python version will it use it; otherwise it will use whatever is available (i.e. look through the PATH environment for a matching pythonX.Y executable).
The easiest way to support this would be to just make sure the python binary's path is added to the PATH env.
Does...
Hi DrabCockroach54
... and no logs for python script.
what do you mean by "no logs" , is it clearml logs? or k8s pod logs ?
Ohh I see now, okay there are two entries on an artifact: the actual artifact (a link to a file somewhere) and the text preview of the artifact. I think the "preview" is the issue
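A minimal sketch of the two pieces (the preview string is just an illustration, names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact preview")

# each artifact stores the object itself (a link to a file somewhere)
# plus a text preview shown in the UI; the preview can be set explicitly
task.upload_artifact(
    name="results",
    artifact_object={"accuracy": 0.9},
    preview="accuracy: 0.9",
)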