It has to be alive so all the "child nodes" could report to it....
Many thanks LazyLeopard18! 🙂
BeefyHippopotamus73 this error seems like it is coming from boto3. Are you sure the credentials are properly configured and that you have read permissions?
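If it helps, here is a minimal sanity check (bucket and key names are hypothetical) to verify the same credentials can read the object outside of clearml:
```
import boto3

# use the same access key / secret that clearml.conf points at
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)
# raises a botocore ClientError (403/404) if credentials or permissions are wrong
s3.head_object(Bucket="my-bucket", Key="path/to/object")
```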
ReassuredTiger98 All that said, how about opening an Issue on GitHub (feature request)? If we get a bit of support from users, we could definitely add it.
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?
Ohh, so you are saying you can store it properly, but only editing in the UI is limited? (Maybe this is just a UI thing.)
How so? Installing a local package should work, what am I missing?
I think we should open a GitHub Issue and get some more feedback; maybe we should just add support on the backend side?
Thanks GiganticTurtle0!
I will try to reproduce with the example you provided. Regardless, I already took a look at the code, and I'm pretty sure I know what the issue is. We will be pushing a few fixes after the weekend; I'm hoping this one will be included as well 🙂
Hi UpsetTurkey67
repository discovery stores the github repo in the form:
...
while for others
git@github.com:...
Yes, that depends on how they locally cloned the repo (via SSH or user/pass/token).
Interestingly, in the former case the ssh config is ignored and cloning the repository breaks on the worker
If you have passed git user/pass to the agent it should use them, not SSH. How did you configure the agent?
BTW: get_tasks has a project_name argument, I would just use it 🙂
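Something along these lines (the project name is just a placeholder):
```
from clearml import Task

# filter server-side by project instead of fetching everything and filtering locally
tasks = Task.get_tasks(project_name="my-project")
```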
GrotesqueDog77 one issue with this design: in order to run a sub-component, the call must be done from the parent component. Does that make sense?
```
def step_one(data):
    return data

def step_two(path):
    model = path  # placeholder: whatever this step actually computes
    return model

def both_steps():
    path = step_one("stuff")
    return step_two(path)

def pipeline():
    both_steps()
```
Which would make `both_steps` a component and `step_one` and `step_two` sub-components.
wdyt?
Looking at this example here, it looks like it only works with tasks:
Aha! A Pipeline is a Task 🙂 (a specific type of Task, but nonetheless a Task)
Just use the pipeline ID, and make sure you push it into the services queue. Voila!
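For example, a minimal sketch (the pipeline ID is a placeholder):
```
from clearml import Task

# push the existing pipeline Task into the "services" queue by ID
Task.enqueue(task="<pipeline-task-id>", queue_name="services")
```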
JitteryCoyote63 instead of _update_requirements, call the following before Task.init:
```
Task.add_requirements('torch', '1.3.1')
Task.add_requirements('git+...')
```
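Putting it together, a minimal sketch (project/task names are just for illustration):
```
from clearml import Task

# add_requirements only takes effect if called *before* Task.init
Task.add_requirements('torch', '1.3.1')
task = Task.init(project_name='examples', task_name='requirements demo')
```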
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script, which means the next time the instance is booted the bash script will be executed.
In the meantime, you can SSH into the running instance and change the ulimit manually, wdyt?
EnviousStarfish54 could you send the conda / pip environment?
Maybe that's the diff between machines A and B?
OK - the issue was the firewall rules that we had.
Nice!
But now there is an issue with the "Setting up connection to remote session"
OutrageousSheep60 this is just a warning, basically saying we are using the default signed SSH server key (it has nothing to do with the random password, just the identifying key being used for the remote SSH session).
Bottom line, I think you have everything working 🙂
Let me take a look, what's the clearml-server version and clearml python version?
Hi UnsightlySeagull42
Could you test with the latest RC? `pip install clearml==1.0.4rc0`
Also could you provide some logs?
Hi John. Sort of. It seems that archiving pipelines does not also archive the tasks that they contain, so ...
This is correct; the rationale is that the components (i.e. Tasks) might be used (or already be in use) as cached steps ...
Hi @<1523701868901961728:profile|ReassuredTiger98>
The sdk.development.default_output_uri is used for Artifacts and Models. Debug samples (or anything else the Logger class creates) will use the api.files_server.
On the Task itself, you have the "output destination" (in the Execution tab), which would override the "output_uri" on a Task level.
Does that make sense?
I'll make sure we add the reference somewhere on GitHub
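In the meantime, a minimal sketch (the bucket URI is hypothetical):
```
from clearml import Task

# output_uri controls where artifacts and models are uploaded;
# debug samples still go to api.files_server
task = Task.init(
    project_name="examples",
    task_name="output destination demo",
    output_uri="s3://my-bucket/artifacts",
)
```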
Hi @<1624579015031394304:profile|JitterySeal56>
... and credentials in the clearml.conf file on the client side, but I have the restriction of AWS keys expiring each hour
This means that you need to configure an IAM role on your client machine; the data never goes through the server, it is uploaded directly from the dev machine to the S3 bucket.
You can, however, just store the data on your clearml files server ...
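Something like this (assuming output_uri=True falls back to the files server when no default_output_uri is configured):
```
from clearml import Task

# output_uri=True uploads artifacts/models to the files server
# (or sdk.development.default_output_uri if one is configured)
task = Task.init(project_name="examples", task_name="demo", output_uri=True)
```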
(this only works for PyTorch, because they have different wheels for different CUDA versions)
And is the step actually "queued", or is it only "queued" in the pipeline state (i.e. the visualization did not update)?