
First let's try to test if everything works as expected, since the 405 really feels odd to me here. Can I suggest following one of the examples start to end to test the setup, before adding your model?
DefiantHippopotamus88 you are sending the curl to the wrong port, it should be 9090 (based on what I remember from the unified docker compose) in your setup
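For example, a quick sanity check against port 9090 (just a sketch; the endpoint name "test_model" and the payload are placeholders for whatever you registered with clearml-serving):
import requests

# placeholder endpoint name and payload - replace with your registered model endpoint and its expected input
url = "http://127.0.0.1:9090/serve/test_model"
resp = requests.post(url, json={"x": [[1.0, 2.0, 3.0]]})
print(resp.status_code, resp.text)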
EnviousPanda91 this seems like a specific issue with the clearml-task CLI, could that be?
Can you send a full clearml-task command line to test?
Yes, you are too quick for the resource monitoring 🙂
What's the error you are getting?
SubstantialElk6 could you post the "Installed packages" section under the Execution tab of this specific Task?
Check the log to see exactly where it downloaded torch from. Just making sure it used the right repository and did not default to the pip index (PyPI), where it might have gotten a CPU-only version...
See if this helps
Please send the full log, I just tested it here, and it seems to be working
BTW, is it cheaper than an EC2 instance? Why not use the AWS autoscaler?
doing some extra "services"
what do you mean by "services"? (from the system perspective, any Task that is executed by an agent running in "services-mode" is a service; there are no actual limitations on what it can do 🙂 )
MinuteGiraffe30 if you run the following command while your current directory is where your code is, what are you getting?
$ git ls-remote --get-url origin
Thanks @<1569496075083976704:profile|SweetShells3> for bumping it!
Let me check where it stands, I think I remember a fix...
Hi ShakyJellyfish91
It seems clearml is using a single connection, which makes the download take a long time
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does max_connections=10 mean 10 concurrent connections?
ssh: Could not resolve hostname : Name or service not known
@<1542316991337992192:profile|AverageMoth57> so is this the main issue? This seems unrelated to the Gerrit thing, just a missing .ssh configuration on the agent machine, is that correct?
Hi LazyLeopard18
I remember someone deploying, specifically on the Azure k8s service (can't remember now what they call it).
What exactly is the feedback you are after?
YEYYYYYYyyyyyyyyyyyyyyyyyy
GiddyTurkey39 do you mean to delete them from the server?
YummyFish22 can you point to the huggingface example you are using?
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the NVIDIA Docker runtime component
I think you have it on the Workers and Queues page; when you click on the worker you get its details
Hi PanickyLion56
Yep, savefig also works. You can also do:
from clearml import Logger
Logger.current_logger().report_matplotlib_figure(title="My Plot Title", series="My Plot Series", iteration=10, figure=plt)
https://github.com/allegroai/clearml/blob/0c5d12b830987aa9bb8d44d81e92ff9198008f29/examples/frameworks/matplotlib/matplotlib_example.py#L25
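For a fuller, self-contained sketch (the project/task names and the plot data are just placeholders):
from clearml import Task, Logger
import matplotlib.pyplot as plt

task = Task.init(project_name="examples", task_name="matplotlib manual reporting")  # placeholder names
plt.plot([1, 2, 3], [10, 20, 15])  # any figure will do
Logger.current_logger().report_matplotlib_figure(
    title="My Plot Title", series="My Plot Series", iteration=10, figure=plt
)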
Hmm that is odd, but at least we have a workaround 🙂
What's the matplotlib backend?
Hi BoredSquirrel45
as of today, my required packages aren't being recognized in cloned
Are you saying you are editing the code directly in the cloned Task, then enqueueing the Task, and the agent does not "auto recognize" the package?
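If that's the case, one workaround (just a sketch; the package name and project/task names are placeholders) is to declare the requirement explicitly before Task.init in the script:
from clearml import Task

# explicitly declare a package the auto-detection might have missed (must be called before Task.init)
Task.add_requirements("pandas")  # placeholder package name
task = Task.init(project_name="examples", task_name="explicit requirements")  # placeholder names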
Hi SubstantialElk6
I think you are absolutely correct, it seems the glue pops all the arguments, when in fact it should maybe process them and convert the --env/-e
What do you think?
Also, I assume if these are the default arguments they should actually be part of the k8s apply.yaml template, no?
Hi @<1533619725983027200:profile|BattyHedgehong22>
Can you elaborate? What do you mean by a params file?
Is this something like:
Task.current_task().connect_configuration('my_conf.json', name="my conf file")
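For context, a minimal end-to-end sketch of that call (file name and project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="config file example")  # placeholder names
# attach a local configuration file to the Task; it shows up under the Task's CONFIGURATION section
config_path = task.connect_configuration("my_conf.json", name="my conf file")
# when executed by an agent, config_path points to the local copy fetched from the server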
Sure, in that case, wait until tomorrow, when the github repo is fully synced
main clearml repo?
Yep that sounds right 🙂 thank you!
Nice, that seems to be the issue. Any chance you can open a GitHub issue, so we do not lose track of it?
Hi AverageBee39
It seems the JSON is corrupted, could that be?