no this is from the task execution that failed
Oh, I get it. I thought it was only a UI issue... but it actually doesn't send it O_O
AgitatedDove14 just a reminder if you missed this question 😄
Okay, so if my Python script imports some other scripts I've written, must I use git?
logger.report_table(title="Inference Data", series="Inference Values", iteration=0, table_plot=inference_table)
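For context, a minimal sketch of how a call like that is typically wired up (this assumes the trains SDK and pandas; the project/task names and the DataFrame contents are made up for illustration):

import pandas as pd
from trains import Task

task = Task.init(project_name="examples", task_name="table reporting")
logger = task.get_logger()

# Any pandas DataFrame can be passed as the table to report
inference_table = pd.DataFrame({"sample": ["a", "b"], "score": [0.91, 0.47]})
logger.report_table(title="Inference Data", series="Inference Values", iteration=0, table_plot=inference_table)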
Okay so that is a bit complicated
In our setup, the DSes don't really care about agents; the agents are managed by our MLOps team.
So essentially, the use case looks like this:
A data scientist wants to execute some CPU-heavy task. The MLOps team supplied him with a queue name, and the data scientist knows that when he needs something heavy he pushes it there - the DS doesn't know anything about where it is executed; the execution environment is fully managed by the MLOps team.
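A hedged sketch of that workflow from the DS side (the queue name cpu_queue is an assumption; this uses the trains SDK's execute_remotely):

from trains import Task

task = Task.init(project_name="examples", task_name="heavy cpu job")
# Stop executing locally and re-enqueue this task on the managed queue;
# whichever agent serves that queue picks it up and runs it there.
task.execute_remotely(queue_name="cpu_queue", exit_process=True)

# ...the heavy CPU work below this line only runs on the agent's machine...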
is it possible to access the children tasks of the pipeline from the pipeline object?
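One possible approach (an assumption, not a confirmed API guarantee: pipeline steps are created as child tasks of the pipeline controller, so filtering on the parent field should find them):

from trains import Task

pipeline_task = Task.get_task(task_id="<pipeline-task-id>")  # placeholder id
children = Task.get_tasks(task_filter={"parent": pipeline_task.id})
for child in children:
    print(child.id, child.name)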
Mmm, maybe; let's see if I get this straight:
A static artifact is a one-time-upload object, while a dynamic artifact is an object I can change during the experiment -> either way, this results at the end of the experiment in an object saved under a given name, regardless of whether it was dynamic or not?
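To make the distinction concrete, a hedged sketch using the trains SDK (all names here are illustrative): upload_artifact is the one-shot "static" flavor, while register_artifact is the "dynamic" one that keeps syncing an object while the experiment runs:

import pandas as pd
from trains import Task

task = Task.init(project_name="examples", task_name="artifacts demo")

# Static: uploaded once, stored as-is under the given name
task.upload_artifact(name="results", artifact_object={"accuracy": 0.93})

# Dynamic: a registered DataFrame that trains re-syncs as it changes;
# the last synced state is what ends up stored under the name
stats = pd.DataFrame({"epoch": [0], "loss": [1.0]})
task.register_artifact(name="train_stats", artifact=stats)
stats.loc[len(stats)] = [1, 0.8]  # the update is picked up automatically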
I have a single IAM, my question is what kind of permissions I should associate with the IAM so that the autoscaler task will work
the output above is what the agent has, it seems... obviously on my machine I have it installed
Maybe something similar to Docker, where I could name each of my trains agents and then refer to them by name, something like
trains-agent daemon --name agent_1 ...
Then trains-agent stop/start
I've dealt with this earlier today: I set up 2 agents, one for each GPU on a machine, and after editing configurations I wanted to restart only one of them (because the other was working) - and then I realized I didn't know which one to kill
this is the df -h output
I'll just exclude .cfg files from the deletion. My question is how to recover: must I recreate the agents, or is there another way?
TimelyPenguin76
Do I need to copy this AWS scaler task to every project I want to have auto-scaling on? What does it mean to enqueue the AWS scaler?
So dynamic and static are basically the same thing, except that with dynamic I can edit the artifact while running the experiment?
Second, why would it be overwritten if I run a different run of the same experiment? As I saw, each object is stored under a directory with the task ID, which is unique per run, so I assume I won't be overwriting artifacts which are saved under the same name in different runs (regardless of static or dynamic)
it will return a Config object right?
Is there anything specific in the logs we're looking for? Because if I just dump them, it will take me a while to verify that no sensitive data or naming is there
Couldn't find any logic to which tasks fail and why... all the lines are exactly the same, only with different parameters
Increased to 20, let's see how long it will last 🙂
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the docker-compose.yml file.
If you followed the instructions in the docs, you should find that file at /opt/trains/docker-compose.yml. You will see that there are multiple services (apiserver, elasticsearch, redis, etc.), and in each there might be a section called ports which states the port mapping.
The number on the left is the host port (the one exposed to the outside), and the number on the right is the port inside the container; for example, 8080:8080 maps host port 8080 to the container's port 8080.
when I specify --packages, I should manually list them all, no?
Is there a way to do so without touching the config - directly through the Task object?