Is there a more elegant way to find the process to kill? Right now I'm doing pgrep -af trains but if I have multiple agents, I will never be able to tell them apart
sorry I think it trimmed it
I want to collect the dataframes from the red tasks, and display them in the pipeline task
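For illustration, a minimal sketch of one way to do that, assuming each step task uploaded its DataFrame as an artifact (the artifact name, task IDs and table title below are placeholders; older installs import from `trains` instead of `clearml`):

```python
from clearml import Task

# Sketch only: the artifact name ("data") and step task IDs are placeholders.
pipeline_task = Task.current_task()  # the pipeline/controller task
logger = pipeline_task.get_logger()

for step_id in ["<step_task_id_1>", "<step_task_id_2>"]:
    step_task = Task.get_task(task_id=step_id)
    df = step_task.artifacts["data"].get()  # load the uploaded pandas DataFrame
    # render the DataFrame as a table plot on the pipeline task
    logger.report_table(title="step dataframes", series=step_task.name,
                        iteration=0, table_plot=df)
```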
I think you are talking about separate problems - the "WARNING DIFF IS TOO LARGE" is only a UI issue, meaning you can't see the diff in the UI - correct me if I'm wrong about this
Maria seems to be saying that the execution FAILS when she has uncommitted changes, which is not the expected behavior - am I right, Maria?
(I'm working with Maria)
essentially, what Maria is saying is that when she has a script with uncommitted changes and executes it remotely, the script that actually runs on the remote machine does not include the uncommitted changes
e.g.:
Her git status is clean; she makes some changes to script.py and executes it remotely. What gets executed remotely is the original script.py and not the modified version she has locally
The only way to change it is to convert apiserver_conf to a dictionary object (as_plain_ordered_dict()) and edit it
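For illustration, a sketch of that pattern, assuming apiserver_conf is a pyhocon ConfigTree (which is what exposes as_plain_ordered_dict()); the HOCON content and keys below are placeholders:

```python
from pyhocon import ConfigFactory

# Sketch only: stand-in for the real apiserver_conf; section/key names are placeholders.
apiserver_conf = ConfigFactory.parse_string('auth { some_setting = "old" }')

conf_dict = apiserver_conf.as_plain_ordered_dict()  # plain, editable dict copy
conf_dict["auth"]["some_setting"] = "new_value"     # edit the dict copy
```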
Could be. My point is that, in general, the ability to attach a named scalar (without an iteration/series dimension) to an experiment is valuable and basic when you want to track a metric across different experiments
the Task object has a method called Task.execute_remotely
Look it up here:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
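For reference, a minimal sketch of how it's typically called (the queue name is illustrative; on newer installs the import is `from clearml import Task`):

```python
from trains import Task  # newer versions: from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# Stop local execution here and enqueue this task for an agent to run;
# exit_process=True ends the local process once the task is enqueued.
task.execute_remotely(queue_name="default", exit_process=True)
```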
So just to be clear - the file server has nothing to do with the storage?
AgitatedDove14 permanent. I want to start with a CLI interface that allows me to add users to the trains server
Gotcha, I didn't think of an external server since Service Containers are part of GitHub's offering; I'll consider that
I might. I'll look at the internals later because at a glance I didn't really get the logic inside get_local_copy ... the if clause there ends with if ... not cached_file: return cached_file, which from reading doesn't make much sense
is it possible to access the child tasks of the pipeline from the pipeline object?
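Not sure this is the only way, but here is a sketch that should work, assuming the pipeline controller's task ID is known (placeholder below) and that step tasks point back to it via their parent field:

```python
from clearml import Task  # older versions: from trains import Task

pipeline_task = Task.get_task(task_id="<pipeline_task_id>")  # placeholder ID
# child/step tasks reference the controller task through their `parent` field
children = Task.get_tasks(task_filter={"parent": pipeline_task.id})
for child in children:
    print(child.id, child.name, child.get_status())
```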
I set it to true and restarted my agent
I don't know, I'm the one asking the question 😄
AgitatedDove14 I still can't get it to work... I couldn't figure out how I can change the clearml version in the runtime of the Cleanup Service, as I'm not in control of the agent that executes it
To be clearer - how do I refrain from using the built-in file server altogether and use MinIO for all storage needs?
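For illustration, a sketch of one common approach, assuming a MinIO bucket is reachable at the host/port below and the matching key/secret are configured under sdk.aws.s3 in clearml.conf (trains.conf on older versions); the host, port, bucket and project/task names are placeholders:

```python
from clearml import Task  # older versions: from trains import Task

task = Task.init(
    project_name="examples",
    task_name="minio storage",
    # uploads from this task (artifacts, models) go to MinIO instead of the file server
    output_uri="s3://my-minio-host:9000/my-bucket",
)
```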
yeah, but I see it gets enqueued to the default queue, and I don't know what that is connected to
If I execute this task using python .....py, will it execute on the machine I ran it on?
and also in the extra_vm_bash_script variables, I have them under export TRAINS_API_ACCESS_KEY and export TRAINS_API_SECRET_KEY
AgitatedDove14 all I did was create this metric as "last", then turn on "max" and "min", and then turn them off
I can't reproduce it now but:
I restarted the services and it didn't help. I deleted the columns, created them again after a while, and that helped
to fix it, I excluded this var entirely from the docker-compose
I'm not, I just want to be very precise and concise about them when I do ask... but bear with me, it's coming 🙂
the worst part of debugging this is waiting for the docker to install tensorflow each time over and over again 😞
Even assuming it suspects me, why doesn't the captcha prove my innocence? Isn't that what it's for? O_O
When you are inside a project, the search bar searches for experiments
so if you want to search inside a specific project, go to that project and use the search bar; if you want to search across everything, go to the project called "All Experiments" and search there
