
You need to use tf.summary.image and not summary_ops_v2.image
Fixed on main branch (see github issue), RC later today
The image needs to be in the range [0, 1] and not [0, 255] (matplotlib and TensorBoard can handle either one)
Is there code to reproduce?
So it seems to get the "hint" from the type:
This will work:
tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
wdyt, should it actually check min/max and manually cast it ?
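For illustration, a minimal sketch of what such a check could look like (a hypothetical report_image helper, not ClearML's actual implementation):

import numpy as np
import tensorflow as tf

def report_image(name, img, step):
    # tf.summary.image expects float values in [0, 1] or uint8 in [0, 255];
    # if we get floats that look like 0-255, scale them back to [0, 1]
    img = np.asarray(img)
    if img.dtype != np.uint8 and img.max() > 1.0:
        img = (img / 255.0).astype(np.float32)
    tf.summary.image(name, img, step=step, max_outputs=10)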
ReassuredTiger98 I think it is using moviepy for the encoding... No?
Actually no, it is not. Alpine is not a good baseline; it is very, very slim and missing a ton of stuff.
I would use bullseye or slim (depending on how many auxiliary things you need in the container)
https://hub.docker.com//python/tags?page=1&name=bullseye
https://hub.docker.com//python/tags?page=1&name=slim-bullseye
Hi OutrageousSheep60
Is there a way to instantiate a clearml-task while providing it a Dockerfile that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by adding the "dockerfile" script as --docker_bash_setup_script.
Notice of course that this is an actual bash script, not a Dockerfile, so no need for the "RUN" prefix.
wdyt?
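For example, something along these lines (a sketch; setup.sh is a hypothetical script holding the commands that would have been RUN lines in your Dockerfile):

# setup.sh - runs inside the base container before the task starts
apt-get update && apt-get install -y git
pip install -r extra_requirements.txt

# then point clearml-task at it
clearml-task --project examples --name my_task \
    --repo https://github.com/user/repo.git --script train.py \
    --docker python:3.9-bullseye \
    --docker_bash_setup_script setup.sh \
    --queue default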
And actually the Slack thing is a good workaround for this, since people can just comment easily
Any reference for similar integration between Slack and other platforms ?
I'm thinking maybe the easiest way is a Slack bot you can @ with a task id?
Could it be in a Python atexit handler?
Hi @<1631102016807768064:profile|ZanySealion18>
I'm using SSH for authentication; however, known_hosts doesn't seem to be passed to the docker so it prompts for authentication/fingerprint. Any ideas?
Hmm it is supposed to automatically mount your ~/.ssh folder into the docker to solve for that.
First try to set force_git_ssh_protocol: true
None
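For reference, this is roughly where that setting lives in clearml.conf (agent section):

agent {
    # clone repositories over SSH instead of converting to HTTPS
    force_git_ssh_protocol: true
}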
If that does not help...
Added -v /home/uname/.ssh:/root/.ssh and it resolved the issue. I assume this is some sort of a bug then?
That is supposed to be automatically mounted. Having SSH_AUTH_SOCK defined means the agent mounts the SSH_AUTH_SOCK socket instead, so that the container can access it.
Try running with SSH_AUTH_SOCK undefined and keep force_git_ssh_protocol.
(no need to manually add the .ssh mount it will do that for you)
Wait, is "SSH_AUTH_SOCK" defined on the host? it should auto mount the SSH folder as well?!
DAG which gets scheduled at a given interval and
Yes, that is exactly what will be part of the next iteration of the controller/service
An example achieving what I propose would be greatly helpful
Would this help?
from trains.automation import TrainsJob

job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()

job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()
So first, yes, I totally agree. This is why clearml-serving
has a dedicated statistics module that creates histograms over time; then we push it into Prometheus and connect Grafana to it for dashboards and alerts.
To be honest, I would just use it instead of reporting manually, wdyt?
Hi @<1724960475575226368:profile|GloriousKoala29>
Is there a way to aggregate the results, such as defining an iteration as the accuracy of 100 samples
Hmm, I'm assuming what you actually want is to store it with the actual input/output and a score, is that correct?
Thanks SolidSealion72 !
Also, I found out that adding "pool.join()" after pool.close() seem to solve the issue in the minimal example.
This is interesting, I'm pretty sure it has something to do with the subprocess not "closing" properly (or too fast or something)
Let me see if I can reproduce
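For context, the pattern being discussed is roughly this (a minimal sketch, not the actual reproduction code):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(square, range(10))
    pool.close()
    pool.join()  # explicitly waiting for the workers to exit seems to avoid the hang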
GreasyPenguin14
In the process MyProcess other processes are created via a ProcessPoolExecutor.
Hmm that is interesting, the sub-process has an additional ProcessPoolExecutor inside it ?
GrittyKangaroo27 if you can help with reproducible code that will be great (or any insight on reproducing the issue)
EcstaticGoat95 I can see the experiment but I cannot access the notebook (I get "Binder inaccessible")
Is this the exact script as here? https://clearml.slack.com/archives/CTK20V944/p1636536308385700?thread_ts=1634910855.059900&cid=CTK20V944
It does work about 50% of the time
EcstaticGoat95 what do you mean by "work about 50%" ? do you mean the other 50% it hangs ?
GreasyPenguin14 GrittyKangaroo27 the new release contains a fix, could you verify it solves the issue in your scenario as well? (There is now a smart timeout to detect the inconsistent state, which means the close/exit procedure might be delayed (10 sec) instead of hanging in these specific rare scenarios.)
SolidSealion72 EcstaticGoat95 I'm hoping the issue is now resolved 🤞
Can you verify with:
pip install git+
Hmm, so if I understand what's going on: convert_test.py needs to have the test.json.
Since it creates the test.json but does not call git add on it, the test.json will not be part of the git diff,
hence missing when executing remotely by the agent.
If test.json is relatively small (i.e. not 10s of MB) you could store it as configuration on the Task. For example:
local_copy_of_test_json = task.connect_configuration('/path/to/test.json', name='test config')
print(...
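A minimal sketch of that full pattern (project/task names and paths are placeholders):

from clearml import Task
import json

task = Task.init(project_name='examples', task_name='convert_test')
# locally this registers the file content on the Task; when executed remotely
# by the agent it returns a path to a local copy fetched from the server
local_copy_of_test_json = task.connect_configuration('/path/to/test.json', name='test config')
with open(local_copy_of_test_json) as f:
    test_data = json.load(f)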
I mean, can you install it with something like:
pip install git+
Basically the agent will install the main repository, and any git submodules. But it cannot install multiple repositories, as the directory structure might become too complicated.
wdyt?
Hi ConvolutedChicken69
but when running the script it only clones the repo the clearml task is on, how can it get the other repo also?
Do you have a wheel or a git you can install it from ?
Is there a way to do this using ssh keys?
the .ssh of the host machine should be automatically mounted, you can force it by setting force_git_ssh_protocol: true
None
It is still not working for me. Are you using Linux, windows or macos?
It should work for Linux, Mac, and Windows. What are you using?
Hi @<1801424298548662272:profile|ConvolutedOctopus27>
I am getting errors related to invalid git credentials. How do I make sure that it's using credentials from local machine?
configure the git_user/git_pass (app key) inside your clearml.conf on the machine with the agent:
None
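For example (values are placeholders; use a personal access token / app key as the password):

agent {
    git_user: "my_git_username"
    git_pass: "my_personal_access_token"
}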
Hi @<1523711619815706624:profile|StrangePelican34>
You can either report on the Model itself:
None
or you can force it on the Task:
task = Task.get_task("task id here")
task.mark_started(force=True)
task.get_logger().report_scalar(...)
task.mark_completed(force=True)
Could it be that this is the callback that causes it?
None
Hi @<1707565838988480512:profile|MeltedLizard16>
Maybe I'm missing something, but just add to your YOLO code:
from clearml import Dataset
my_files_folder = Dataset.get("dataset_id_here").get_local_copy()
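and then point the training call at that folder, for example (a sketch assuming an Ultralytics YOLO setup with a data.yaml inside the dataset; both are assumptions about your code):

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
# data.yaml is assumed to live inside the downloaded dataset folder
model.train(data=f'{my_files_folder}/data.yaml', epochs=10)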
what am I missing?