
Eureka! Gotcha, I didn't think of an external server since Service Containers are part of GitHub's offering. I'll consider that
it will return a Config object, right?
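To make sure we're talking about the same thing, here is a minimal sketch of what I mean (assuming `task.connect_configuration` is the call in question; the project/task names and config values are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config-demo")  # hypothetical names

# When passed a dict, connect_configuration() hands back the configuration
# object (when running remotely, the values come from the backend instead).
config = task.connect_configuration({"batch_size": 32, "lr": 0.001})
print(config["batch_size"])
```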
Continuing on this line of thought... Is it possible to call task.execute_remotely
on a CPU-only machine (a data scientist's laptop, for example) and have the agent that fetches this task run it using a GPU? I'm asking because it is mentioned that it replicates the running environment of the task creator... which is exactly what I'm trying not to do 😄
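Something like this minimal sketch is what I have in mind (the queue name is hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote-run")  # hypothetical names

# Stop local execution here and enqueue the task for an agent to pick up.
# The agent servicing "gpu-queue" would then run it on its own (GPU) hardware.
task.execute_remotely(queue_name="gpu-queue", exit_process=True)

# From here on, the code only runs on the agent's machine.
```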
Yes, I'll prepare something and send
whatttt? I looked at config_obj and didn't find any set method
AgitatedDove14
So nope, this doesn't solve my case. I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is once all tasks are over, to collect all those "my_dataframe" arti...
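Roughly, this is what I'm trying to do once the pipeline finishes (a sketch only; the project name is hypothetical, and I'm assuming the artifacts can be fetched through `Task.get_tasks`):
```python
from clearml import Task
import pandas as pd

# Gather the "my_dataframe" artifact from every child task in the project.
tasks = Task.get_tasks(project_name="my_project")  # hypothetical project name
frames = []
for t in tasks:
    artifact = t.artifacts.get("my_dataframe")
    if artifact is not None:
        frames.append(artifact.get())  # downloads and deserializes the dataframe

combined = pd.concat(frames) if frames else pd.DataFrame()
```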
Committing that notebook with changes solved it, but I wonder why it failed
Do I need to copy this AWS scaler task to every project I want to have auto-scaling on? What does it mean to enqueue the AWS scaler?
and in the UI configuration I didn't understand where permission management comes into play
The Trains docs at no point mention what I should do on the AWS side... so I'm not sure at what point I should encounter this wizard
I'm going to play with it a bit and see if I can figure out how to make it work
how do I run this wizard? Is this wizard Trains' or AWS's?
I mean I don't get how all the pieces add up
Okay, so let me get this straight
The autoscaling is basically an ever-running task (let's say on the services queue). Now, the actual auto-scaling behavior and which queues exist have nothing to do with that, and are configured in the autoscaler task?
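If that's right, I'd picture the configuration inside the autoscaler task looking roughly like this (a sketch only; the structure and every name here are my assumptions based on the AWS autoscaler example script, not a confirmed schema):
```python
# Assumed structure: each resource describes an EC2 machine type, and each
# queue maps to the resource (and max instance count) that should serve it.
resource_configurations = {
    "gpu_machine": {
        "instance_type": "g4dn.xlarge",
        "ami_id": "ami-0123456789abcdef0",   # hypothetical AMI
        "availability_zone": "us-east-1b",
        "ebs_device_name": "/dev/sda1",
        "ebs_volume_size": 100,
    },
}

queues = {
    "gpu_queue": [("gpu_machine", 3)],  # up to 3 machines serving this queue
}
```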
it's like ps + grep together 😄
it seems that only the packages imported in the script are getting installed
```
alabaster==0.7.12
appdirs==1.4.4
apturl==0.5.2
attrs==21.2.0
Babel==2.9.1
bcrypt==3.1.7
blinker==1.4
Brlapi==0.7.0
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
chrome-gnome-shell==0.0.0
clearml==1.0.5
click==8.0.1
cloud-sptheme==1.10.1.post20200504175005
cloudpickle==1.6.0
colorama==0.4.3
command-not-found==0.3
cryptography==2.8
cupshelpers==1.0
cycler==0.10.0
Cython==0.29.24
dbus-python==1.2.16
decorator==4.4.2
defer==1.0.6
distlib==0.3.1
distro==1.4.0
distro-info===0.23ubuntu1
doc...
```
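If the issue is that the agent only installs what it detects from the script's imports, something like this might force extra packages in (a sketch; as I understand it Task.add_requirements must be called before Task.init, and the package name here is just an example):
```python
from clearml import Task

# Explicitly add a package that the import scan would miss.
# Must be called before Task.init(); a version pin is optional.
Task.add_requirements("pandas")

task = Task.init(project_name="examples", task_name="requirements-demo")  # hypothetical names
```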
I'll check the version tomorrow. About the current_task call, I tried it before and after - same result
TimelyPenguin76, this can safely be set to s3://, right?
Is there a more elegant way to find the process to kill? Right now I'm doing pgrep -af trains, but if I have multiple agents, I will never be able to tell them apart
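For illustration, here is what I'm doing now translated to Python (a sketch using psutil; matching on "trains" in the command line is the same heuristic as the pgrep above, so it has the same ambiguity):
```python
import psutil

# Equivalent of `pgrep -af trains`: print the PID and full command line
# of every process whose command line mentions "trains".
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "trains" in cmdline:
        print(proc.info["pid"], cmdline)
```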
Especially from the standpoint of a team leader or any other kind of supervision (or anyone viewing the experiment who is not the code author): when looking at an experiment, you want to see the actual code
Or should I change all three of them?
Worth mentioning: nothing changed before we executed this. It worked before, and now, after the update, it breaks
I think a good idea would be to add, to the error message shown when the clearml agent fails due to an import error, a suggestion to try again with pip freeze
AgitatedDove14 since this is a powerful feature, I think it should be documented. I'm at a point where I want to use the AWS autoscaler and I'm not sure how.
I see in the docs that I need to supply the access+secret keys, which are associated with an IAM user, but nowhere does it say what permissions this IAM user needs in order to execute.
Also, the name "AWS Autoscaler" immediately suggests that behind the scenes, trains uses the https://docs.aws.amazon.com/autoscaling/ec2/userguide/wha...
even though I apply append