ReassuredTiger98 can you send the full log?
Also, what's the clearml-agent version?
fyi: we fixed an issue where the default order of the conda repositories caused pytorch to be installed from conda-forge instead of the pytorch repo, making it the CPU version instead of the GPU version:
This is the correct conda repo order:
https://github.com/allegroai/clearml-agent/blob/cb6bdece39751eaef975287609b8bab603f116e5/docs/clearml.conf#L66
A simple file transfer test gives me approximately 1 Gbit/s transfer rate between the server and the agent, which is to be expected from the 1 Gbit/s network.
Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (you can test it with simple toy artifact upload code)
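Something like this minimal toy sketch should do (project/task names and the payload size are arbitrary placeholders):
from clearml import Task
import time
import numpy as np
task = Task.init(project_name="debug", task_name="artifact upload speed test")
data = np.random.rand(10_000_000)  # ~80 MB of random data as a toy artifact
start = time.time()
# wait_on_upload=True blocks until the upload completes, so the timing is meaningful
task.upload_artifact(name="toy_blob", artifact_object=data, wait_on_upload=True)
elapsed = time.time() - start
print(f"upload took {elapsed:.2f} seconds (~{data.nbytes / 1e6 / elapsed:.1f} MB/s)")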
Hi GreasyRaven35
You should set the output_uri in Task.init, it will auto-upload the model and register the remote location URL:
task = Task.init(..., output_uri=True)
You can also specify a target bucket, if you configured credentials (e.g. output_uri="s3://bucket")
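For example (project/task names and the bucket are just placeholders):
from clearml import Task
# auto-upload output models to the default files server
task = Task.init(project_name="examples", task_name="training", output_uri=True)
# or, with credentials configured in clearml.conf, upload to your own bucket:
# task = Task.init(project_name="examples", task_name="training", output_uri="s3://my-bucket/models")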
Tried context provider for Task?
I guess that would only make sense inside notebooks?!
That said, you might have accessed the artifacts before any of them were registered
Just making sure, after the pipe object is created, you can call Task.current_task(), is that correct?
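i.e. something like this minimal check (pipeline/project names are placeholders):
from clearml import Task
from clearml.automation import PipelineController
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# once the controller object exists, its Task should already be registered
controller_task = Task.current_task()
print(controller_task.id if controller_task else "no task registered yet")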
PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.
WickedGoat98 what's the clearml version you are using?
481.2130692792125 seconds
This is very slow.
It makes no sense, it cannot be the network (this is basically an http post, and I'm assuming both machines are on the same LAN, correct?)
My guess is the filesystem on the clearml-server... Are you having any other performance issues ?
(I'm thinking HD degradation, which could lead to slow write speeds, which would affect the Elastic/Mongo as well)
Hi @<1707565838988480512:profile|MeltedLizard16>
Maybe I'm missing something, but just add to your YOLO code:
from clearml import Dataset
my_files_folder = Dataset.get(dataset_id="dataset_id_here").get_local_copy()
what am I missing?
(since you are using venv mode, if the cuda is not detected at startup time, it will not install the GPU version, as it has no CUDA support)
btw: I'm assuming that args is not the ArgParser object, as the ArgParser is automatically "connected"?
Notice Optuna will do TPE & hyperband Bayesian optimization to find the best combination
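For reference, a minimal sketch of plugging in the Optuna optimizer (the base task id, parameter name and metric title/series are placeholders you need to adjust):
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna
optimizer = HyperParameterOptimizer(
    base_task_id="base_task_id_here",
    hyper_parameters=[UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=2,
)
optimizer.start_locally()  # or optimizer.start() to launch on a queue
optimizer.wait()
optimizer.stop()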
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that explains how you create the pipeline ?
My bad, I wrote refresh and then edited it to the correct "reload" 🙂
Can you fix locally, just to verify ?
I cannot test it at the moment, hence my question.
JuicyFox94 any chance you can blindly approve ?
Merged, is it working for you now?
Yes! Thanks so much for the quick turnaround
My pleasure 🙂
BTW: did you see this (it seems like the same bug?!)
https://github.com/allegroai/clearml-helm-charts/blob/0871e7383130411694482468c228c987b0f47753/charts/clearml-agent/templates/agentk8sglue-configmap.yaml#L14
in order to work with ssh cloning, one has to manually install openssh-client in the docker image, it looks like
Correct, you have to have SSH inside the container so that git can use it.
You can always install it with the following setup inside your agent's clearml.conf:
extra_docker_shell_script: ["apt-get install -y openssh-client", ]
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L145
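i.e. inside the agent section of clearml.conf (a sketch, adjust the packages to your needs):
agent {
    # shell commands executed inside the docker container before the task starts
    extra_docker_shell_script: ["apt-get install -y openssh-client"]
}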
Hi MelancholyElk85
I have a strong deja vu feeling. Credentials are OK. How do I solve this? If you need the full log, how do I share it without sharing private information? I'm fed up with this shit
Is this coming from the agent ?
I don't think so. It is solved by installing openssh-client in the docker image, or by adding a deploy token to the cloning URL in the web UI
You can also have the token (token==password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19
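Something along these lines in clearml.conf (the user/token values are placeholders):
agent {
    # use the deploy token as the default git credentials for https cloning
    git_user: "gitlab+deploy-token-1234"
    git_pass: "your_deploy_token_here"
}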
I execute the clearml-session with the --docker flag.
This is to control the docker image the agent will spin up for you (think of the dev environment you want to work in, like the nvidia pytorch container that already has everything you need)
DilapidatedDucks58 I see ...
This might be more complicated than one would imagine. A simple solution might be to store a snapshot of the values every time we reach a new maximum; a quick hack might be to add it as text on one of the task's parameters or properties (that we can later add to the table as a custom column).
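Something along these lines, assuming metric_value and best_so_far are whatever you already track in your training loop:
from clearml import Task
task = Task.current_task()
# `metric_value` / `best_so_far` come from your own training loop (placeholders here)
if metric_value > best_so_far:
    best_so_far = metric_value
    # store the snapshot as a user property, which can later be added as a custom column
    task.set_user_properties(best_metric=str(best_so_far))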
wdyt?
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; this is the bottleneck of the upload speed of your machine, regardless of what the target is (file-server, S3, etc...)
Not sure I follow, you mean to launch it on the Kubernetes cluster from the ClearML UI?
(like the clearml-k8s-glue ?)