as others will also be running the same scripts from their own local development machine
Which would mean trains will update the installed packages, no?
This is why I was inquiring about the requirements.txt file.
My apologies, of course this is supported :)
If you have no "installed packages" (i.e. the field is empty in the UI), the trains-agent will revert to installing the requirements.txt from the git repo itself, then it...
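A minimal sketch of that fallback decision, assuming a hypothetical helper `resolve_requirements` (this is not trains-agent's actual code). It only illustrates the rule described above: use the Task's "installed packages" when the field has content, otherwise fall back to the repository's requirements.txt. The file contents are passed in as strings to keep the sketch self-contained.

```python
from typing import List, Optional


def _parse(text: str) -> List[str]:
    """Keep non-empty lines that are not comments."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def resolve_requirements(installed_packages: str,
                         repo_requirements_txt: Optional[str]) -> List[str]:
    # Hypothetical helper: prefer the Task's "installed packages" section.
    packages = _parse(installed_packages)
    if packages:
        return packages
    # Field is empty: fall back to requirements.txt from the git repo, if any.
    if repo_requirements_txt is not None:
        return _parse(repo_requirements_txt)
    return []  # nothing to install
```

With an empty "installed packages" field, `resolve_requirements("", "torch\npandas\n")` returns the repository's list, `["torch", "pandas"]`.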
Hi @JumpyRaven4
Which clearml-serving version are you running?
This happens even though all the pods are healthy and the endpoints are processing correctly.
The serving pods are supposed to ping "I'm alive", and that should verify the serving control plane is alive.
Could it be that no requests are being served?
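To make the liveness idea concrete, here is a toy model, assuming a hypothetical `LivenessTracker` class and an arbitrary timeout (neither is clearml-serving's real implementation or default). The control plane remembers each pod's last "I'm alive" ping and treats the pod as alive only if that ping is recent. It shows why the question above matters: if pings were only sent while requests are processed (an assumption, not something confirmed here), a healthy but idle pod could fall outside the window.

```python
import time
from typing import Dict, Optional


class LivenessTracker:
    """Toy model (hypothetical): track the last ping time per serving pod."""

    def __init__(self, timeout_sec: float = 600.0) -> None:
        # Assumed timeout; the real value in clearml-serving may differ.
        self.timeout_sec = timeout_sec
        self._last_ping: Dict[str, float] = {}

    def ping(self, instance_id: str, now: Optional[float] = None) -> None:
        # Called by a serving pod to report "I'm alive".
        self._last_ping[instance_id] = time.time() if now is None else now

    def is_alive(self, instance_id: str, now: Optional[float] = None) -> bool:
        last = self._last_ping.get(instance_id)
        if last is None:
            return False  # never pinged
        current = time.time() if now is None else now
        return current - last <= self.timeout_sec
```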
Maybe we should add an environment variable for it? (I'm not sure we should disable SSL for all S3 connections, so perhaps we could somehow specify the MinIO host that should use HTTP.)
multiple machines and reporting to the same task.
Out of curiosity, how do you launch it on multiple machines?
reporting to the same task.
So the "funny" thing is, they all report on top of each other (overwriting)...
In order for them to report individually, you might need multiple Tasks (i.e. one per machine).
Maybe we could somehow prefix the CPU/network etc. metrics with the rank? Or should it be a different "title"? wdyt?
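A toy illustration of the overwrite problem and the rank-prefix idea. A plain dict stands in for the Task's scalar storage, keyed by (title, series, iteration); the `report` function and the `rankN/` prefix are hypothetical and not ClearML's API (the real logger call is `Logger.report_scalar(title, series, value, iteration)`).

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (title, series, iteration)


def report(store: Dict[Key, float], title: str, series: str,
           iteration: int, value: float, rank: Optional[int] = None) -> None:
    # Hypothetical: prefixing the title with the machine rank gives each
    # machine a distinct key, so samples no longer overwrite each other.
    if rank is not None:
        title = f"rank{rank}/{title}"
    store[(title, series, iteration)] = value
```

Two machines reporting `("cpu", "usage", 0)` without a rank leave a single entry (the last writer wins); with `rank=0` and `rank=1` both values survive.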
Oh, if this is the case, then by all means push it into your Task's docker_setup_bash_script.
It does not seem to need to run after the git clone; the only part I can see is setting the PYTHONPATH to the additional repo you are pulling, and that should work.
The main hurdle might be passing credentials to git, but if you are using SSH it should be transparent.
wdyt?
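A sketch of what such a setup script could contain, assuming a hypothetical helper `build_setup_script`, a hypothetical SSH repo URL, and a hypothetical clone directory. It clones the extra repository and prepends it to PYTHONPATH, one string per bash line.

```python
from typing import List


def build_setup_script(extra_repo_url: str, clone_dir: str) -> List[str]:
    """Hypothetical helper: bash lines that clone an extra repository and
    put it on PYTHONPATH. Each string is one line of the setup script."""
    return [
        # SSH URLs avoid passing git credentials explicitly.
        f"git clone --depth 1 {extra_repo_url} {clone_dir}",
        # ${{...}} in the f-string renders as a literal ${PYTHONPATH}.
        f'export PYTHONPATH="{clone_dir}:${{PYTHONPATH}}"',
    ]
```

The resulting lines could then be handed to `task.set_base_docker(docker_image=..., docker_setup_bash_script=...)`, which, to my knowledge, accepts a string or a list of lines. Whether an `export` in the setup script carries over into the Task's own process is an assumption worth verifying on your agent.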
Should be fairly easy to add, no?
Okay, let me check if we can reproduce; definitely not the way it is supposed to work :)
I was just wondering if, instead of using local subprocesses, several agents could serve the same purpose (running several pipelines concurrently).
Wouldn't --services-mode (i.e. multiple simultaneous Tasks on the same agent) solve the issue?
(BTW: if you set the pipeline component target queue to "services", this is exactly what will happen.)
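For reference, a small sketch that assembles the agent command line for a services-mode agent listening on the "services" queue. The helper name `services_agent_cmd` is hypothetical; the `daemon`, `--queue`, `--services-mode` and `--docker` options are existing clearml-agent flags, but check `clearml-agent daemon --help` for your version.

```python
from typing import List


def services_agent_cmd(queue: str = "services", docker: bool = False) -> List[str]:
    # Hypothetical helper: a single agent in services mode pulls Tasks from
    # `queue` and runs each one in its own process, so several pipeline
    # controllers can run concurrently on the same machine.
    cmd = ["clearml-agent", "daemon", "--queue", queue, "--services-mode"]
    if docker:
        cmd.append("--docker")  # run each Task inside a container
    return cmd
```

The default call corresponds to the shell command `clearml-agent daemon --queue services --services-mode`.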
ShinyPuppy47, the code that is being launched, does it call Task.init()?
Hmm interesting ...
Any chance you could open a GitHub issue with this feature suggestion?
If we have some support, we could accelerate the implementation.
We abuse the object description here to store the desired file path.
LOL, yep, that would work. I'm assuming you have some infrastructure library that does this hack for you, but it's a really cool way around it :)
And last but not least, for dictionary for example, it would be really cool if one could do:
Hmm, what you will end up with now is the following behaviour: my_other_config['bar'] will hold a copy of my_config. If you clone the Task and change "my_config" it will hav...
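A plain-Python sketch of what "holds a copy" means here, using `copy.deepcopy` as a stand-in for however the Task actually serializes the dictionary (an assumption, not ClearML's code); the keys `lr` and `epochs` are hypothetical. A copy is a snapshot taken when it is stored, so later edits to `my_config` do not flow into it.

```python
import copy

# Hypothetical configuration values for illustration only.
my_config = {"lr": 0.1, "epochs": 10}

# "bar" is stored as a snapshot of my_config, not as a reference to it.
my_other_config = {"bar": copy.deepcopy(my_config)}

# Simulate editing "my_config" afterwards: only my_config changes.
my_config["lr"] = 0.01

print(my_other_config["bar"]["lr"])  # 0.1
```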
But this gives me an idea: I will try to check whether the notebook is considered trusted. Perhaps it isn't, and that causes issues?
This is exactly what I was thinking (communication with the Jupyter service is done over HTTP to localhost, and sometimes AV/firewall software will block it, a false-positive detection I assume).
Is it also possible to specify different user/api_token for different hosts? For example I have a github and a private gitlab that I both want to be able to access.
ReassuredTiger98, my apologies, I just realized you can use ~/.git-credentials for that. The agent will automatically map the host's .git-credentials into the docker :)
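A sketch of how the file can hold one entry per host, assuming a hypothetical helper `git_credentials_lines` and hypothetical host names and users. git's `store` credential helper reads one URL per line in the form `https://user:token@host`; special characters are percent-encoded so they do not break the URL.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote


def git_credentials_lines(creds: Dict[str, Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: one ~/.git-credentials line per host.

    `creds` maps a host name to (username, token). Characters such as "@"
    or ":" in the user or token are percent-encoded.
    """
    return [
        f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}"
        for host, (user, token) in creds.items()
    ]
```

Writing these lines to ~/.git-credentials and enabling the helper with `git config --global credential.helper store` lets git pick the matching entry per host, e.g. one for github.com and one for a private GitLab.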
Hi @SmallCamel52
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world ?
but when the dependencies are installed, the git creds are not taken into account
I have to admit, we missed that use case :)
A quick fix will be to use git ssh, which is system wide.
but I now want to switch to git auth using a Personal Access Token for security reasons
Smart move :)
As for the git repo credentials, we can always add them when you are using user/pass. I guess that would be the behavior you are expecting, unless the domain is different...
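A sketch of the "same domain only" rule for dependencies installed from git, assuming a hypothetical helper `add_git_credentials` (not the agent's actual code). It embeds user/token into a pip-style `git+https://` URL only when the URL's host matches the host the credentials belong to, and leaves everything else untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_git_credentials(url: str, host: str, user: str, token: str) -> str:
    """Hypothetical helper: inject credentials only for the matching host."""
    prefix = ""
    if url.startswith("git+"):  # pip requirement form, e.g. git+https://...
        prefix, url = "git+", url[len("git+"):]
    parts = urlsplit(url)
    # Skip non-https URLs, other domains, and URLs that already carry creds.
    if parts.scheme != "https" or parts.hostname != host or "@" in parts.netloc:
        return prefix + url
    netloc = f"{quote(user, safe='')}:{quote(token, safe='')}@{parts.netloc}"
    return prefix + urlunsplit(parts._replace(netloc=netloc))
```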
Okay, I'll dig into it :)
trains-agent should be deployed to GPU instances, not the trains-server.
The purpose of the trains-agent is to let you send jobs to an instance (in most cases, a GPU instance).
The "trains-server" is a control plane, basically telling the agent what to run (by storing the execution queues and tasks). Makes sense?
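A toy model of that split, assuming hypothetical `ControlPlane` and `Agent` classes (this is not trains' API). The server side only stores queues of Task ids and never executes anything; the agent, running on the GPU machine, pulls the next Task id and executes it locally.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class ControlPlane:
    """Toy stand-in for the server: stores queues of Task ids only."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}

    def enqueue(self, queue: str, task_id: str) -> None:
        self.queues.setdefault(queue, deque()).append(task_id)

    def pop(self, queue: str) -> Optional[str]:
        q = self.queues.get(queue)
        return q.popleft() if q else None


class Agent:
    """Toy worker on a (typically GPU) machine: pulls and runs Tasks."""

    def __init__(self, server: ControlPlane, queue: str,
                 run: Callable[[str], None]) -> None:
        self.server, self.queue, self.run = server, queue, run

    def poll_once(self) -> bool:
        task_id = self.server.pop(self.queue)
        if task_id is None:
            return False  # nothing queued
        self.run(task_id)  # execution happens on the agent's machine
        return True
```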
It's only on this specific local machine that we're facing this truncated download.
Yes, that's what the log says; makes sense.
Seems like this still doesn't solve the problem. How can we verify this setting has been applied correctly?
Hmm, exec into the container? What did you put in clearml.conf?
Hi @SaltySpider22
Basically, you need to put all of these files into a repository, which is always a good practice.
The reason is that the pipeline (and, for that matter, any Task on the system) can store either a single script or a git reference, but not multiple scripts.
Okay we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout: 300
Dot (.) notation works as well as {} nesting.
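clearml.conf is HOCON, where `sdk.http.timeout: 300` and `sdk { http { timeout: 300 } }` describe the same setting. The sketch below mimics that equivalence in plain Python with a hypothetical `expand_dotted` helper; it is an illustration only, not the HOCON parser ClearML uses (which also handles quoting, includes, and merge rules).

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"sdk.http.timeout": 300} into {"sdk": {"http": {"timeout": 300}}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate sections as needed, reusing existing ones.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```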
ngrok to connect to the remote server at the office?
That makes sense, I guess this is the equivalent of using a VPN, from that point onward clearml-session can directly access the remote machine, right?
Oh, in that case add --remote-gateway <external_ip>. It will connect to the provided address instead of the local one. (You can also add --public-ip, which will automatically resolve the public IP of the server.)
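A small sketch that assembles the clearml-session command using only the two flags mentioned above; the helper `session_cmd` is hypothetical, and treating the flags as alternatives is my assumption. Check `clearml-session --help` for the exact argument handling in your version. The example IP is from a documentation-only range.

```python
from typing import List, Optional


def session_cmd(external_ip: Optional[str] = None,
                public_ip: bool = False) -> List[str]:
    # Hypothetical helper built from the flags named above.
    cmd = ["clearml-session"]
    if external_ip:
        # Connect to this address instead of the server's local one.
        cmd += ["--remote-gateway", external_ip]
    elif public_ip:
        # Let clearml-session resolve the server's public IP itself.
        cmd.append("--public-ip")
    return cmd
```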