CostlyOstrich36 did you manage to reproduce it?
I tried conda w/ python3.9 on a clean Windows VM, and it worked as expected...
And can you see your Prometheus in your Grafana?
Hi JumpyPig73, I think it was synced to GitHub. You can already test with: `pip install git+https://github.com/allegroai/clearml.git`
Check the examples on the github page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
Hi VexedCat68
Could it be the python version is not the same? (this is the only reason not to find a specific python package version)
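A quick way to check that hunch is to run the same snippet on both machines and compare the output. This is a minimal sketch using only the standard library; the package name `pip` is just a placeholder (not from this thread), so replace it with the package that failed to resolve:

```python
import sys
from importlib import metadata


def describe_environment(package_name):
    """Return the interpreter version and the installed version of a package.

    The package version is None when the package is not installed.
    """
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    try:
        package_version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        package_version = None
    return {
        "python": python_version,
        "package": package_name,
        "version": package_version,
    }


if __name__ == "__main__":
    # "pip" is a placeholder package name, swap in the one you care about
    print(describe_environment("pip"))
```

If the `python` values differ between the two machines, that would explain why a pinned package version cannot be found on one of them.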
AstonishingWorm64 can you share the full log (In the UI under Results/Console there is a download button)?
Hi JitteryCoyote63
The new pipeline is almost ready for release (0.16.2),
It actually contains this exact scenario support.
Check out the example, and let me know if it fits what you are looking for:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Hmm I think you are correct:
`:param auto_create: Create new dataset if it does not exist yet`
it should have created it, this seems like a bug, I'll make sure to pass along 🙂
Are you saying that in the UI you do not see "confusion matrix" at all, only on the GS bucket ?
Actually what my service do is to collect stdout/stderr from the Docker socket
That's exactly how the agent works, it cannot really filter it, it logs everything by default for full visibility ...
Hi @<1697056701116583936:profile|JealousArcticwolf24> just saw the reply
Image looks okay?! What is the query? Basically I'm trying to understand if Grafana is connected to Prometheus, and if Prometheus has any data in it
Secondly, just to make sure, the kafka service should be able to connect directly to the container running the actual inference
Noooooooooo, it is still working 🙂
StraightDog31 how did you get these ?
It seems like it is coming from matplotlib, no?
the error for uploading is weird
wait, are you still getting this error?
(Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac
Where is the code (the agent) running? A GCP instance? Your machine?
LuckyRabbit93 We do!!!
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically you can just run the following (it will do everything automatically) (replace the <TASK_ID_HERE> with the actual one)
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
... the one for the last epoch and not the best one for that experiment,
well
Now we realized there is an option to use "min_global" on the sign, is this what we need?
Yes 🙂 (or max_global)
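To make the difference concrete, here is a plain-Python sketch of how I understand the sign options: "min"/"max" look at the last reported value of the metric, while "min_global"/"max_global" look at the best value reported at any point in the experiment. This is only an illustration, not ClearML's implementation, and the metric values are made up:

```python
def objective_value(reported_values, sign):
    """Pick the value an optimizer would compare for one experiment.

    reported_values: metric values in the order they were reported.
    sign: one of "min", "max", "min_global", "max_global".
    """
    if not reported_values:
        raise ValueError("no values were reported")
    if sign in ("min", "max"):
        # Only the last reported value (e.g. the last epoch) is considered
        return reported_values[-1]
    if sign == "min_global":
        # Best (lowest) value over the whole run
        return min(reported_values)
    if sign == "max_global":
        # Best (highest) value over the whole run
        return max(reported_values)
    raise ValueError("unknown sign: {}".format(sign))


if __name__ == "__main__":
    validation_loss = [0.9, 0.4, 0.6]  # made-up values, best epoch is the second
    print(objective_value(validation_loss, "min"))
    print(objective_value(validation_loss, "min_global"))
```

With a loss that bottoms out mid-training and then rises again, "min" would rank the experiment by the last epoch, while "min_global" would rank it by its best epoch.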
Hi @<1687653458951278592:profile|StrangeStork48>
secrets manager per se,
Quick question, are you running the trains-server over http or https ?
Great!
I'll make sure the agent outputs the proper error 🙂
Weird issue, I'll make sure we fix compatibility with python 3.9
I think what you are looking for is clearml-agent daemon
https://clear.ml/docs/latest/docs/clearml_agent
https://clear.ml/docs/latest/docs/getting_started/video_tutorials/agent_remote_execution_and_automation
Notice the error code: `Action failed <400/401: tasks.create/v1.0 (Invalid project id: id=first_attempt)>`
If that is the case, the project ID is incorrect (the project ID is not the project name)
Kind of as it tries to do "apt-get install"...
what did you have in mind ?
FreshKangaroo33 you can:
`from time import time`
`from datetime import datetime`
`Task.query_tasks(..., task_filter=dict(started=['<{}'.format(datetime.utcfromtimestamp(time()))]))`
I think this should work
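If you want to check what the filter looks like before sending it, here is a small sketch that only builds the `task_filter` dict, with no server needed. The helper name `started_before` is made up for illustration, and the `'<timestamp'` syntax is simply copied from the snippet above rather than verified against the API:

```python
from datetime import datetime, timezone
from time import time


def started_before(timestamp):
    """Build a task_filter dict matching tasks started before a UTC timestamp.

    timestamp: seconds since the epoch (e.g. the result of time()).
    """
    # Convert to a naive UTC datetime so it prints as 'YYYY-MM-DD HH:MM:SS[.ffffff]'
    utc_time = datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(tzinfo=None)
    return dict(started=["<{}".format(utc_time)])


if __name__ == "__main__":
    task_filter = started_before(time())
    print(task_filter)
    # With clearml installed and configured, the dict would be passed as:
    # Task.query_tasks(..., task_filter=task_filter)
```

This also avoids `datetime.utcfromtimestamp`, which is deprecated in recent Python versions.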
We are here if you need further help 🙂
Hi @<1687643893996195840:profile|RoundCat60>
Are you running on AWS ?