
Thanks! A follow-up question - can I make the steps in the pipeline use the latest commit in the branch?
Hey, I tried doing that but sadly it doesn't seem to work. As suggested by the ECR docs, I added
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ECR URI>
to the extra_vm_bash_script in the config file. I even added a docker pull, which I think worked (because it took much longer for the instances to spin up), but I still got the same error message 😞 Is there any way to debug these sessions through clearml? Thanks!
AgitatedDove14 is there any update on the open issue you talked about before? I think it's this one: https://github.com/allegroai/clearml/issues/214
ok, hopefully last question on this subject 🙂
I want to use Jenkins for some pipelines. What I would like to do is have one set of credentials saved on Jenkins. Then, whenever a user triggers a pipeline, that user would be marked as the task's user.
If I understand the options you suggested, I'll currently need either to (1) have some mapping between users and their credentials and have all the credentials saved on Jenkins; or, (2) have each user manually add 2 environment varia...
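A rough sketch of option (1), assuming the standard CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY environment variables; the BUILD_USER_ID variable, the mapping, and the key values are hypothetical placeholders:
` # Rough sketch of option (1): one credentials store on Jenkins, mapped per triggering user.
# BUILD_USER_ID, the mapping, and the key values are placeholders, not a confirmed setup.
import os
from clearml import Task

USER_CREDENTIALS = {
    "alice": ("<alice access key>", "<alice secret key>"),
    "bob": ("<bob access key>", "<bob secret key>"),
}

triggering_user = os.environ.get("BUILD_USER_ID", "alice")
access_key, secret_key = USER_CREDENTIALS[triggering_user]

# Point the SDK at the triggering user's credentials before Task.init,
# so the created task is attributed to that user.
os.environ["CLEARML_API_ACCESS_KEY"] = access_key
os.environ["CLEARML_API_SECRET_KEY"] = secret_key

task = Task.init(project_name="jenkins-pipelines", task_name="triggered run") `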
Sure, redacted most of the params as they are sensitive:
` run_experiment {
    base_task_id = "478cfdae5ed249c18818f1c50864b83c"
    queue = null
    parents = []
    timeout = null
    parameters {
        # Redacted the parameters
    }
    executed = "d1d361d1059c4f0981200f59d7683773"
}
segment_slides {
    base_task_id = "ae13cc979855482683474e9d435895bb"
    queue = null
    parents = ["run_experiment"]
    timeout = null
    parameters {
        Args/param = """
        [
        #...
python -m script.as.a.module first_arg second_arg --named_arg value <- something like that
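As a rough illustration (module and argument names here are made up), a module run that way could look like the sketch below, with ClearML picking up the argparse arguments under the task's Args section:
` # my_package/my_module.py -- hypothetical module, run as: python -m my_package.my_module first second --named_arg value
import argparse
from clearml import Task

def main():
    # Task.init first, so ClearML auto-connects the argparse arguments to the task
    task = Task.init(project_name="examples", task_name="module entry point")

    parser = argparse.ArgumentParser()
    parser.add_argument("first_arg")
    parser.add_argument("second_arg")
    parser.add_argument("--named_arg", default=None)
    args = parser.parse_args()

    print(args.first_arg, args.second_arg, args.named_arg)

if __name__ == "__main__":
    main() `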
right, of course 🙂 so just to make sure I'm running it correctly: I ran python aws_autoscaler.py --run on my laptop and I see the Task on ClearML. Then I took a completed task, cloned it and enqueued it to the queue defined on the autoscaler. That should spin up an instance, right? (it currently doesn't, and I'm not sure where to debug)
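For reference, a minimal sketch of the clone-and-enqueue part, using the standard Task.get_task / Task.clone / Task.enqueue calls; the task id and queue name are placeholders:
` # Minimal sketch: clone a completed task and push it to the queue the autoscaler listens on.
from clearml import Task

source = Task.get_task(task_id="<completed task id>")
cloned = Task.clone(source_task=source, name="autoscaler test clone")
Task.enqueue(cloned, queue_name="<autoscaler queue name>") `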
yeah, maybe as an option in the Task.init
so no magic "username" key? 😛
Hey AgitatedDove14 thanks, that works! The docker is now up and running, great success.
I have a follow-up, maybe you can help debug. Now for some reason git clone
doesn't work through the agent, but if I log in to the machine myself and run the same command that fails in the log, it works. The error I see is:
` cloning: git@gitlab.com:<repo_path>
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Reposito...
And for some reason this clone is marked as completed. Not sure why, as it failed
legit, I was thinking only about task tracking, less about user-based credentials. good point
Hooray! That works AND the feature works!
Quick follow-up question: is there any way to abort a pipeline and all of the tasks it ran?
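A rough sketch of one possible approach, assuming the pipeline's child tasks can be looked up by their parent field via Task.query_tasks (that filter key is an assumption, not a confirmed recipe):
` # Rough sketch: stop a pipeline controller task and every task it spawned.
# Looking up children by the 'parent' field is an assumption here.
from clearml import Task

pipeline = Task.get_task(task_id="<pipeline controller task id>")
child_ids = Task.query_tasks(task_filter={"parent": pipeline.id})

for child_id in child_ids:
    Task.get_task(task_id=child_id).mark_stopped()

pipeline.mark_stopped() `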
No, I use an SSH connection, which worked with the regular clearml-agent; we prefer to work with SSH instead of creating a git user.
So apparently the NVIDIA AMI https://aws.amazon.com/marketplace/pp/prodview-e7zxdqduz4cbs doesn't have the aws-cli installed. So I install it in the extra_vm_bash_script, and now it wants a configuration. Is there any way to get that from the ENV vars you create? Do you think I should create my own AMI just for this?
I have access to the machine using SSH from my computer.
There doesn't seem to be any other error in the debug mode.
` Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel `
I did not. I see that there's a field for extra_trains_conf, but couldn't find clear documentation on how to use it. Is it just a reference to a trains_conf (maybe clearml_conf?)?
I just want to use auth0 (which we already use in the company) in order to manage the users...
I looked there, but couldn't find it. I'm currently experimenting with your free hosted server
Sounds promising, any ETA for the next version?
Also, I tried the continue_pipeline option; it didn't work as it couldn't parse the previous step that ran:
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
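For context, that kind of reference typically comes from a parameter_override in add_step. A sketch written against a recent PipelineController API; the ids, project, queue, and parameter names are placeholders:
` # Sketch: wiring the previous step's output-model URL into the next step via parameter_override.
# Ids, project and queue names are placeholders.
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(name="run_experiment", base_task_id="<run_experiment base task id>")
pipe.add_step(
    name="segment_slides",
    parents=["run_experiment"],
    base_task_id="<segment_slides base task id>",
    # take the URL of the last output model produced by run_experiment
    parameter_override={"Args/param": "${run_experiment.models.output.-1.url}"},
)
pipe.start(queue="services") `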
yeah, totally. Are there any services OOB like this?
nope, only port 22 is open for SSH. Is there any way to set that as the port for clearml-session?
when I ran the script it autogenerated the YAML, so should I manually copy it to the remote services agents?
Is there an option to do this from a pipeline, from within the add_step method? Can you link a reference to cloning and editing a task programmatically?
Nope, it works well for the pipeline when I don't choose to continue_pipeline
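As for the programmatic clone-and-edit route, a minimal sketch; the parameter name and queue are placeholders:
` # Minimal sketch: clone an existing task, edit one of its parameters, and enqueue the clone.
from clearml import Task

base = Task.get_task(task_id="<base task id>")
cloned = Task.clone(source_task=base, name="edited clone")
cloned.set_parameter("Args/param", "new value")  # placeholder parameter name
Task.enqueue(cloned, queue_name="default") `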