Reputation
Badges 1
25 × Eureka!Hi GiddyTurkey39
Are you referring to an already executed Task or the current running one?
(Also, what is the use case here? is it because the "installed packages are in accurate?)
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm where do you have your Task.init ?
(btw: what's the use case of a flask app tracking?)
Then I deleted those workers,
How did you delete those workers? the autoscaler is supposed to spin the ec2 instances down when they are idle, in theory there is no need for manual spin down.
Hi PerplexedWalrus3
you should get something like the following on the console :ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a 2021-07-25 13:59:09 ClearML results page:
2021-07-25 13:59:16
ExcitedFish86
How do I set the config for this agent? Some options can be set through env vars but not all of themΒ
Hmm okay if you are running an agent inside a container and you want it to spin "sibling" containers, you need to se the following:
mount the docker socket to the container running the agent itself (as you did), basically adding " --privileged -v /var/run/docker.sock:/var/run/docker.sock
" Allow the host to mount cache and configuration from the host into the siblin...
No I mean configure the files_server
in the clearml.conf
JitteryCoyote63 did you use the new implementation:
https://github.com/allegroai/trains/blob/master/trains/automation/aws_auto_scaler.py
Hmm, I still wonder what is the "correct" answer for most people, is empty string in argparse redundant anyhow? will someone ever use it?
I have to admit, I haven't had the time π
Trying to get pip to be twice as fast π€
https://github.com/pypa/pip/pull/8215
Please keep pinging me, I would really like to follow on it.
Hi @<1523701066867150848:profile|JitteryCoyote63>
I found a memory leak
in
Logger.report_matplotlib_figure
Are you sure this is not Matplotlib leak but the Logger's fault ? I'm trying to think how we could create such a mem leak
wdyt?
Hi WackyRabbit7
First always check the functions on the Task object, they are the most straight forward access to the system.
Then if you need general purpose API calls, currently they are only documented in the doc-string of the API schema (that said it should be quite documented)
You can check all the endpoints https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
And finally if you want to easily use the RestAPI :
` from trains.backend_api.session.client impo...
Hi MysteriousBee56 ,
Yes this is permissions issue, the docker creates all folders as root (as it is the root user running inside the docker), Then when you execute in venv mode, you are running it from your user, which obviously cannot change root created folders.
MysteriousBee56 , The agent is not running on the "server" it's running on its machine.
The server just reflects the fact he agent is up..
To actually take it down you need to SSH (or connect to that machine) and stop the actual trains-agent process.
What is exactly the scenario you had in mind?
Hi @<1566596960691949568:profile|UpsetWalrus59>
Could it be the two experiments have the exact name ?
(I sounds like a bug in the UI, but I'm trying to make sure, and also understand how to reproduce)
What's your clearml-server version ?
JitteryCoyote63 I meant to store the parent ID as another "hyper-parameter" (under its own section name) not the data itself.
Makes sense ?
Well that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init
and when you are done call Task.close
. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent
which could help you spin those experiments on as many machines as you l...
So that means your home folder is always mapped to ~/ on any machine you ssh to ?
C will be submitted to a different queue and I donβt care as much
Is there a way to define βtask affinityβ in this way?
Hi RoughTiger69 ,
when you say Task affinity, you mean, I want C to be executed next to A/B ? Affinity as a concept doesn't really exist, it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one the the queues (in theory you might be able to programmtically control the Queue of C), wdyt?
Hi JumpyDragonfly13 , just making sure, do you have an agent running on a remote machine ?
Can you have a direct TCP connection to the remote machine (the default port it will use is 10022)
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if running with docker compose you can pass an argument to the clearml triton container an use shared mem. You can do the same with the helm chart
PompousParrot44 these are the default plotly colors. You can change any of the layout properties with the
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/logger.py#L600
well that depends on you, what did you write there to know it is the best one ? file name ? added some metric ?
Hi ColossalAnt7 , I think we run into it on a few dockers, I believe the bug was fixed in the latest trains-agent
RC. Could you verify please ?
(with matplotlib 3.2+ I get no warning, let me check with 3.1)
some dependencies will sometimes require different pip versions.
none π maybe setuptools, but not pip
version
(pip is just a utility to install packages, it will not be a dependency of one)
Thank you, I would love to make sure we fix it
Hi UnsightlySeagull42
But now I need the hyperparameters in every python file.
You can always get the Task from anywhere?main_task = Task.current_task()
is no agent listening to the "k8s_scheduler"
There should not be one, this is purely "virtual" , so users understand the k8s cluster is spinning their pod (sometimes it takes time, imagine EKS etc. , just visibility)
unfortunately I can't get info from the cluster
You should be able the pod in the cluster no?!
What's the Task Info panel say, can you share a screen shot ?
Ok no it only helps if as far as I don't log the figure.
you mean if you create the natplotlib figure and no automagic connect you still see the mem leak ?