
I still do not get why this leads to some 0.5 values when in my plot there should only be 0 and 1.
Smart sub-sampling (a lowpass filter is applied first, i.e. averaging over a window)
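A minimal sketch of why that happens, assuming a simple moving-average window (the numbers are illustrative):

import numpy as np

signal = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)  # a strictly 0/1 series
window = 2
# averaging over a window before sub-sampling: a 0 next to a 1 averages to 0.5
smoothed = np.convolve(signal, np.ones(window) / window, mode='valid')
print(smoothed)  # [0.5 1.  0.5 0.5 0.5 0.  0.5]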
Hi EagerOtter28
I think the replacement should happen here:
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/clearml_agent/helper/repo.py#L277
Where would I put these credentials? I don't want to expose them in the logs as environmental variable or hard code them.
Hi GleamingGrasshopper63
So basically you need a vault, to store those credentials...
Unfortunately the open-source version does not contain vault support, but the paid tiers scale/enterprise do.
There you can have an environment variable defined in the vault; each time the agent runs your code, it will pull the variable from the vault and set it on your process. wdyt?
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like Task.get_task("task_id_here").get_base_docker()?
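A quick sketch (the task ID is a placeholder):

from clearml import Task

task = Task.get_task(task_id="task_id_here")  # placeholder ID
print(task.get_base_docker())  # the docker image/arguments configured for this task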
Oh, a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
Hi, I was expecting to see the container rather than the actual physical machine.
It is the container; it tunnels directly into it (or that's how it should be).
SSH port 10022
where is it persisted? if I have multiple sessions I want to persist, is that possible?
On the file server; yeah, it should support that. You can specify --continue-session to continue a previously used one.
Notice it does delete older "snapshots" (i.e. previous workspaces) when you are continuing a session (use --disable-session-cleanup to disable it).
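For example (the session ID is a placeholder):

clearml-session --continue-session <previous_session_id> --disable-session-cleanup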
@<1699955693882183680:profile|UpsetSeaturtle37> can you try with the latest clearml-session (0.14.0) I remember a few improvements there
The remote machine is in Azure behind the load-balancer; we are using docker images, so we're connecting directly to pods.
yeah, an LB in the middle might be introducing SSH hiccups. First upgrade to the latest clearml-session; it better configures the SSH client/server to support longer connection timeouts. If that does not work, try --keepalive=true
Le...
Thank you for saying! 🙂
worker nodes are bare metal and they are not in k8s yet
By default the agent will use 10022 as the initial starting port for running the sshd that will be mapped into the container. This has nothing to do with the host machine's sshd. (I'm assuming the agent is running in docker mode)
yes, I see no more than 114 plots in the list on the left side in full-screen mode. Just checked, and the behavior exists on both Safari and Chrome
Let me check with the front-end guys 🙂
Hi NastyFox63
What do you mean not all of them are shown?
Do they have different series/titles? Are they plots or scalars? How are you reporting them?
Let me rerun the code and check
Thanks for checking NastyFox63
I double-checked with both front- and backend; there should not be any limit...
Could you maybe provide a toy demo to reproduce the issue?
I think I found something, let me test my theory
Okay, this is odd: the request returned exactly 100 out of 100.
It seems not all of them were reported?!
Could you post the toy code, I'll check what's going on.
that does make more sense 🙂
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL, I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (a 2-line integration; see the sketch after this list). At least you will have visibility into everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
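For reference, the 2-line integration mentioned above is usually just importing and initializing a Task (project/task names here are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="sagemaker job")  # everything from here on is auto-logged to ClearML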
but I have no idea what's behind 1, 2 and 3 compared to the first execution
This is why I would suggest multiple experiments, since each one will store all the arguments (and I think these arguments are somehow being lost).
wdyt?
I think it would make sense to have one task per run, to make the comparison of hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it? I want to make sure we solve this issue 🙂
So the naming is a by-product of the many TBs created (one per experiment); if you add different naming to the TB files, then that is what you'll be seeing in the UI. Make sense?
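A minimal sketch of what that means, assuming PyTorch's SummaryWriter (the log-dir name is illustrative):

from torch.utils.tensorboard import SummaryWriter

# each experiment writes its own event files; the log_dir / file naming is
# what distinguishes the runs in the UI
writer = SummaryWriter(log_dir="runs/experiment_lr_0.01")
writer.add_scalar("loss", 0.5, global_step=1)
writer.close()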
Hi SmallDeer34
Hmm, I'm not sure you can; the code will by default use rglob with the last part of the path as the wildcard selection 🙂
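i.e., something along these lines (the path and pattern are illustrative):

from pathlib import Path

# a path like "data/models/*.pt" is split so the last segment becomes
# the wildcard pattern for a recursive glob
folder, pattern = Path("data/models"), "*.pt"
matched = list(folder.rglob(pattern))  # every "*.pt" anywhere under the folder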
You can of course manually create a zip file...
How would you change the interface to support it?
GloriousPanda26 Are you getting multiple Tasks or is it a single Task ?
Programmatically, before importing the package, set os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'
BTW: What's the use case for doing so?
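A minimal sketch, assuming the old trains package naming:

import os

# must be set before the package is imported, so the alternate config is picked up
os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'

from trains import Task
task = Task.init(project_name='examples', task_name='custom config file')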
thanks for helping again
My pleasure :)
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message in a reply to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically logs in the thread of the initial message.
To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access GPUs
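For example (the queue name is a placeholder):

clearml-agent daemon --queue default --docker --cpu-only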
Nice!
Hi @<1528908687685455872:profile|MassiveBat21>
However, no useful template is created for downstream executions - the source code template is all messed up,
Interesting, could you provide the code that is "created", or even better some way to reproduce it? It sounds like some sort of bug, or maybe missing feature support.
My question is - what is a best practice in this case to be able to run exported scripts (python code not made availa...