what should I paste here to diagnose it?
can't remember, I just restarted everything so I don't have this info now
I don't have ifconfig
-_- why isn't there a link to the source in the docs?
So once I enqueue it, it is up? The docs say I can configure the queues that the autoscaler listens to (in order to spin up instances) inside the autoscale task - I wanted to make sure that this config has nothing to do with where the autoscale task itself was enqueued
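To illustrate what I mean, roughly (the structure and names here are my guess from the aws_autoscaler example, not the exact schema):

```python
# my rough mental model - names are placeholders, not the real schema:
configurations = {
    "queues": {
        # queues the autoscaler WATCHES: when tasks are pending here it
        # spins up instances of the mapped resource, up to the max count
        "gpu_queue": [("aws_gpu_resource", 3)],
        "cpu_queue": [("aws_cpu_resource", 5)],
    },
    # "resource_configurations": {...},  # ami_id, instance_type, etc.
}
# ...which I assume is completely independent of the queue the autoscaler
# task ITSELF is enqueued into (e.g. "services") - that's what I want to confirm
```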
First of all, I wasn't aware that was an option - but I think it's preferable to be able to do it through the command line, because I'm developing the pipeline to be executed remotely, but for debugging I run it locally.
Using what you showed I can obviously write it, delete it once it's ready, and rewrite it when I'm debugging or adding features - but DX-wise I think it would be nicer to be able to trigger this functionality through the command line, something like the sketch below
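A minimal sketch of what I have in mind (assuming `Task.execute_remotely` is the right call for this; project/queue names are placeholders):

```python
from argparse import ArgumentParser

from trains import Task

parser = ArgumentParser()
parser.add_argument(
    "--remote", action="store_true",
    help="enqueue the pipeline for an agent instead of running it locally",
)
args = parser.parse_args()

task = Task.init(project_name="my_project", task_name="pipeline")
if args.remote:
    # locally this stops execution here and enqueues the task;
    # when the agent later runs the task, execution continues past this point
    task.execute_remotely(queue_name="default")

# ...the actual pipeline code goes here
```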
sorry I think it trimmed it
I tried what you said in the previous response, setting sdk.aws.s3.key and sdk.aws.s3.secret to the ones in my MinIO. Yet when I try to download an object, I get the following:
```
>>> result = manager.get_local_copy(remote_url="s3://*******:9000/test-bucket/test.txt")
2020-10-15 13:24:45,023 - trains.storage - ERROR - Could not download s3://*****:9000/test-bucket/test.txt , err: SSL validation failed for https://*****:9000/test-bucket/test.txt [SSL: WRONG_VERSION_NU...
```
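From the error it looks like the SDK is speaking HTTPS to MinIO's plain-HTTP port (that's what `WRONG_VERSION_NUMBER` usually means). So I'm guessing the global key/secret isn't enough and I need a per-host credentials entry with `secure: false` - something like this in trains.conf (my reading of the docs, not verified):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my-minio-host:9000"  # placeholder for my MinIO endpoint
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false  # MinIO here serves plain HTTP, not HTTPS
                }
            ]
        }
    }
}
```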
and when looking at the running task, I still see the credentials
I guess there aren't many tensorflowers running agents around here, if this wasn't brought up already
Will try this out and report
I assume it has nothing to do with my client version
and the machine I have is on CUDA 10.2.
I also tried nvidia/cuda:10.2-base-ubuntu18.04, which is the latest
AgitatedDove14 since this is a powerful feature, I think it should be documented. I'm at a point where I want to use the AWS autoscaler and I'm not sure how.
I see in the docs that I need to supply the access+secret keys, which are associated with an IAM user, but nowhere does it say what permissions this IAM user needs in order to execute (see my guess below).
Also, using the name "AWS Autoscaler" immediately suggests that behind the scenes, trains uses the https://docs.aws.amazon.com/autoscaling/ec2/userguide/wha...
If the credentials don't have access to the autoscaling service, obviously it won't work
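For example, if behind the scenes it actually just drives EC2 directly via boto3, I'd guess a minimal policy looks something like this (pure speculation on my part - exactly the kind of thing the docs should spell out), whereas if it really uses the AWS Auto Scaling service it would need `autoscaling:*` permissions instead:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "ec2:TerminateInstances",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        }
    ]
}
```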
but remember, it also didn't work with the default one (nvidia/cuda)
(I'm working with Maria)
essentially, what Maria is saying is that when she has a script with uncommitted changes and executes it remotely, the script that actually runs on the remote machine doesn't include the uncommitted changes
e.g.: her `git status` is clean, she makes some changes to `script.py` and executes it remotely. What gets executed remotely is the original `script.py` and not the modified version she has locally
AgitatedDove14
So I couldn't kill the services agent myself (permission denied, I'm not sudo). What I did is `docker-compose down`, commented out only the `GOOGLE_APPLICATION_CREDENTIALS` environment variable from the clearml services agent service, and brought the docker-compose up again. I enqueued the Cleanup Service and now it works. Really weird - it looks like setting `GOOGLE_APPLICATION_CREDENTIALS` causes an error even though I'm 100% sure it is not used for storag...
🤔 is the "installed packages" part editable? good to know
Isn't it a bit risky to manually change a package version? What if it isn't compatible with the rest?