Any chance you can open a GitHub issue so we do not forget this feature?
Hi JealousParrot68
This is the same as:
https://clearml.slack.com/archives/CTK20V944/p1627819701055200
and,
https://github.com/allegroai/clearml/issues/411
There is something odd happening in the files-server: it replaces the header (i.e. it guesses the content of the stream), and this breaks the download (what happens is that the clients automatically un-gzip the CSV).
We are working on a hotfix for the issue (BTW: if you are using object storage / shared folders, this will not happen).
Hmm, good question. I'm actually not sure you can pass 24GB (this is not a limit on the GPU memory; I think it affects the memblock size).
Could it be that clone has to be False? (I assume the reasoning is the cloning feature)
With the k8s glue going, I want to finally look at clearml-session and how people are using it.
If used with the k8s glue, you will have to run the glue with --ports-mode. Then clearml-session will know how to connect to the container itself, since at runtime the container registers the gateway + port for the clearml-session client to connect to.
TrickySheep9 you mean custom containers in clearml-session for remote development?
Still, this issue inside a child thread was not detected as a failure, and the training task ended up "completed". This error now happens with the Task.init inside the if __name__ == "__main__": block, as seen above in the code snippet.
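Some background on why a crash in a child thread can still end as "completed": in plain Python, an unhandled exception inside a threading.Thread is only reported through threading.excepthook, while the main thread, and therefore the process exit code, carries on. The sketch below is general Python (3.8+), not ClearML's own mechanism; train_step and run_training are hypothetical stand-ins for the training code. It records worker exceptions and turns them into a non-zero exit code.

```python
import threading

# Exceptions raised inside worker threads, collected by the hook below.
thread_errors = []


def record_thread_exception(args: threading.ExceptHookArgs) -> None:
    """Replacement for threading.excepthook that remembers the exception."""
    thread_errors.append(args.exc_value)


def train_step() -> None:
    # Hypothetical training work that fails inside the child thread.
    raise RuntimeError("failure inside child thread")


def run_training() -> int:
    """Run train_step in a worker thread and return a process exit code."""
    thread_errors.clear()
    previous_hook = threading.excepthook
    threading.excepthook = record_thread_exception
    try:
        worker = threading.Thread(target=train_step)
        worker.start()
        worker.join()
    finally:
        # Restore the default hook so other threads are unaffected.
        threading.excepthook = previous_hook
    # Without this check the script would still exit with code 0.
    return 1 if thread_errors else 0
```

In a script you would end with sys.exit(run_training()) inside the if __name__ == "__main__": block, so the failure shows up in the exit code instead of being swallowed.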
I'm not sure I follow. The error seems to be an issue in your internal code; does that mean ClearML works as expected?
I'm wondering why this is the case, as Docker best practices do indicate we should use a non-root user in production images.
The Docker image for the service-agent is not running as root?
Hi SuperficialGrasshopper36
You are definitely onto a bug.
It seems that with the new Poetry version we fail to set the target venv (basically Poetry decides for itself), and from that point on the actual code is not executed inside the correct venv.
Could you please open a GitHub issue?
I want to make sure this will be addressed.
BTW: I suspect this is the main issue:
https://github.com/python-poetry/poetry/issues/2179
I wonder if this hack would work
Assume you upload an artifact/model to 's3://storage.yandexcloud.net:443/clearml-models' (notice the port is added). Would that trigger a popup in the UI?
Also, what happens if you add the credentials manually in the profile page?
Wait, with the port it does not work?
Notice that since this is an external S3 service, you have to specify the port so it knows this is not AWS S3 but a different, compatible service.
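To make the role of the port concrete, here is a small parsing sketch that mirrors the rule described above. split_s3_url is a hypothetical helper, not ClearML's actual URL handling: with an explicit host:port the first path segment is read as the bucket on an S3-compatible endpoint; without a port the netloc is read as an AWS bucket name.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> dict:
    """Split an s3:// URL into endpoint, bucket and key (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    path = parsed.path.lstrip("/")
    if parsed.port is None:
        # No port: AWS-style URL, the netloc is the bucket name itself.
        return {"endpoint": None, "bucket": parsed.netloc, "key": path}
    # Explicit port: treat host:port as an S3-compatible endpoint and
    # take the bucket from the first path segment.
    bucket, _, key = path.partition("/")
    return {"endpoint": f"{parsed.hostname}:{parsed.port}", "bucket": bucket, "key": key}
```

For example, split_s3_url("s3://storage.yandexcloud.net:443/clearml-models") yields the endpoint "storage.yandexcloud.net:443" and the bucket "clearml-models", whereas dropping ":443" would make "storage.yandexcloud.net" look like an AWS bucket name.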
Hi ClumsyElephant70
Any idea how to get the credentials in there?
How about mapping it into the Docker container with -v?
You can set it here:
https://github.com/allegroai/clearml-agent/blob/0e7546f248d7b72f762f981f8d9033c1a60acd28/docs/clearml.conf#L137
extra_docker_arguments: ["-v", "/host/folder/cred.json:/gcs/cred.json"]
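If you generate the agent configuration from a script, a tiny helper like the hypothetical volume_args below turns host/container path pairs into the -v arguments used by the extra_docker_arguments setting; json.dumps happens to produce the same bracketed, double-quoted list syntax as the config line.

```python
import json
from typing import Dict, List


def volume_args(mounts: Dict[str, str]) -> List[str]:
    """Turn {host_path: container_path} into ["-v", "host:container", ...]."""
    args: List[str] = []
    for host_path, container_path in mounts.items():
        args += ["-v", f"{host_path}:{container_path}"]
    return args


def extra_docker_arguments_line(mounts: Dict[str, str]) -> str:
    """Render the clearml.conf line for the given mounts (hypothetical helper)."""
    return "extra_docker_arguments: " + json.dumps(volume_args(mounts))
```

Calling extra_docker_arguments_line({"/host/folder/cred.json": "/gcs/cred.json"}) reproduces the config line shown above.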
Let me check what the subsampling threshold is.
SubstantialElk6 This seems to be the issue:
cp: failed to access '/root/default_clearml.conf': Permission denied
clearml_agent: ERROR: Could not find task id=024a421c0e174650a1c7ff64af756c26 (for host: )
Notice that it seems it just cannot read the clearml.conf, wdyt?
Hmm, I think the issue is here (the docker command mount):
'-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf'
Hi SubstantialElk6
I can't see that it was removed; could you send the full log?
I suspect it failed to create one on the host and then mount it into the Docker container.
Hi SubstantialElk6
Could you test with the latest RC6?
pip install clearml==0.17.5rc6
Yes, that sounds like the issue, is the file actually there ?
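One quick way to answer "is the file actually there?" is to run a check like the sketch below inside the container (or on the host) against the path from the error, e.g. /root/default_clearml.conf. check_conf_file is a hypothetical helper. Two things to keep in mind: os.access reports on the current user, so run it as the same user the agent runs as; and with docker run -v, a host path that does not exist is created as an empty directory, which then shows up as "not a file" inside the container.

```python
import os


def check_conf_file(path: str) -> str:
    """Classify a config path as 'missing', 'not a file', 'not readable' or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        # e.g. a bind mount whose host file did not exist appears as a directory
        return "not a file"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


if __name__ == "__main__":
    for candidate in ("/root/default_clearml.conf", "/root/clearml.conf"):
        print(candidate, "->", check_conf_file(candidate))
```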
Hi @<1715900788393381888:profile|BitingSpider17>
Notice that you need __ (a double underscore) in place of each "." of the clearml.conf key,
so agent.docker_internal_mounts.sdk_cache
becomes CLEARML_AGENT__AGENT__DOCKER_INTERNAL_MOUNTS__SDK_CACHE
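The naming rule can be sketched as a one-line conversion. This is an assumption generalized from the single example above: prefix CLEARML_AGENT__, upper-case the key, replace every "." with "__", and leave single underscores as they are. conf_key_to_env is a hypothetical helper name.

```python
ENV_PREFIX = "CLEARML_AGENT__"


def conf_key_to_env(key: str) -> str:
    """Map a dotted clearml.conf key to its environment-variable name."""
    # "." becomes "__"; existing single underscores are kept as-is.
    return ENV_PREFIX + key.upper().replace(".", "__")
```

For instance, conf_key_to_env("agent.docker_internal_mounts.sdk_cache") gives the variable name quoted above.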
Hey IntriguedRat44,
Is this what you are after?
https://github.com/allegroai/trains/issues/181
Long story short, work in progress.
BTW: are you referring to manual execution or trains-agent?
Hi @<1720249421582569472:profile|NonchalantSeaanemone34>
Is it possible to read data directly from server w/o using get_local_copy()?
Do you mean an artifact? What does "directly" mean here?
Let me know if you managed to get it working, then we can see if we can detect it automatically.
I see.
You can get the offline folder programmatically and then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
task.get_offline_mode_folder()
You can also create a soft link to the offline folder (if you are working on a Linux machine):
ln -s myoffline_folder ~/.trains/cache/offline
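A minimal sketch of both options in plain Python, assuming offline_folder is the path returned by task.get_offline_mode_folder(). export_offline_folder and link_offline_folder are hypothetical helpers: the first copies the folder and also packs the same content as a zip (either can then be handed to the import function), the second is the Python equivalent of the ln -s command.

```python
import os
import shutil
import tempfile
from typing import Optional, Tuple


def export_offline_folder(offline_folder: str, dest_dir: Optional[str] = None) -> Tuple[str, str]:
    """Copy an offline session folder and zip the same content.

    Returns (copied_folder, zip_path).
    """
    dest_dir = dest_dir or tempfile.mkdtemp(prefix="offline_export_")
    name = os.path.basename(os.path.normpath(offline_folder))
    copied_folder = os.path.join(dest_dir, name)
    shutil.copytree(offline_folder, copied_folder)
    # make_archive appends ".zip" to the base name and returns the archive path
    zip_path = shutil.make_archive(copied_folder, "zip", root_dir=offline_folder)
    return copied_folder, zip_path


def link_offline_folder(offline_folder: str, link_path: str) -> None:
    """Python equivalent of: ln -s <offline_folder> <link_path> (Linux/macOS)."""
    os.symlink(os.path.abspath(offline_folder), link_path)
```

For the cache location from the command above, link_path would be os.path.expanduser("~/.trains/cache/offline"); like ln -s, os.symlink fails if something already exists at that path, so move any existing folder aside first.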
Hi SmugTurtle78
Unfortunately, there is no actual filtering for these logs, because they are so important for debugging and visibility. I have to ask: what's the use case for removing some of them?