'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration for how to serve the model, i.e. the size/type of the lower (input) layer and the output layer.
You can either store the configuration on the creating Task, as is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
Or you can provide it as standalone file when registering the mo...
Thank you WackyRabbit7, please feel free to remind me if it slips away during my night time (yes, I do sleep, contrary to common belief :))
clearml doesn't do any "magic" in regard to this for tensorflow, pytorch etc, right?
No, and if you have an idea on how, that would be great.
Basically the problem is that there is no "standard" way to know which layer is in/out
Hi NonsensicalSeaanemone47
I'm assuming you mean k8s as the compute cluster?
If so, then yes, clearml adds priority scheduling on top of your existing k8s cluster. It also allows you to reuse images: k8s spins up the base container image, and then inside the container the agent sets up the environment of the experiment (clones the code, applies the diff, installs missing python packages etc.)
It also gives visibility into the executed pods.
Does that make sense?
DilapidatedDucks58
all our workers went down after starting the slack bot, is it expected?)
Oh dear... I can't see any connection... What is the last log you have there?
Has anyone done this exact use case - updates to datasets triggering pipelines?
Hi TrickySheep9 seems like this is following a different thread, am I missing something?
ElegantKangaroo44 my bad, I missed the nuance in the description
There seems to be an issue in the web ui -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the full view selection. It's on the to-do list to add this ability, but it is still not implemented...
ElegantKangaroo44 definitely a bug, will be fixed in 0.15.1 (release in a week or so)
https://github.com/allegroai/trains/issues/140
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
Actually it doesn't matter (systemd and init.d are different ways to spin up services on different linux distros). You can pick whichever seems more convenient for you and is supported by the linux you are running (in most cases both are).
Try adding this environment variable: export TRAINS_CUDA_VERSION=0
JitteryCoyote63 good news
It is not a trains-server error but a trains validation error, so it is easily fixed and deployed
Please do, just so it won't be forgotten (it won't, but for the sake of transparency)
Hi @<1689446563463565312:profile|SmallTurkey79>
This call sets the requirements of an existing (already created) Task. Since the Task was just created, it waits for the automatic package detection before overriding it.
What you want is "Task.force_requirements_env_freeze" (notice it is class level, and it needs to be called before Task.init)
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
Is it being used to ssh to the instance?
It is used by the SSH client so it "knows" the SSH server. Does that make sense?
Hi @<1734020162731905024:profile|RattyBluewhale45>
What's the clearml agent version? And could you verify with the latest RC?
Lastly, how are you running the agent, docker mode? What's the base container?
Makes sense
Just make sure you configure the git user/pass in the docker-compose so the agent has your credentials for the repo clone.
I assume so. Datasets are kind of agnostic to the data itself; for the Dataset it's basically a file hierarchy
ShakyJellyfish91 what exactly are you passing to Task.create?
Could it be you are only passing script= and leaving repo=None?
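As a rough sketch of the question above, the snippet below collects the Task.create arguments in one place and flags the case where script= is given while repo= stays None. The helper names, project/task names, script path and repository URL are all hypothetical placeholders; only the Task.create keyword names (project_name, task_name, repo, branch, script) are taken from the clearml SDK.

```python
from typing import Any, Dict, Optional


def build_task_create_kwargs(
    project_name: str,
    task_name: str,
    script: str,
    repo: Optional[str] = None,
    branch: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for Task.create (hypothetical helper)."""
    if repo is None:
        # This is the situation asked about above: script= set, repo= left as None.
        print("warning: script= was given but repo= is None")
    return {
        "project_name": project_name,
        "task_name": task_name,
        "repo": repo,
        "branch": branch,
        "script": script,
    }


def create_task(**kwargs: Any):
    """Call Task.create; needs clearml installed and a configured server."""
    # Imported lazily so the helper above can be used without clearml.
    from clearml import Task

    return Task.create(**kwargs)


kwargs = build_task_create_kwargs(
    project_name="examples",  # hypothetical project name
    task_name="remote run",  # hypothetical task name
    script="train.py",  # hypothetical script path inside the repo
    repo="https://github.com/your-org/your-repo.git",  # hypothetical URL
    branch="main",
)
# task = create_task(**kwargs)  # uncomment on a machine with clearml configured
```

Printing kwargs before the call is a quick way to answer "what exactly are you passing".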
1724924574994 g-s:gpu1 DEBUG WARNING:root:Could not lock cache folder /root/.clearml/venvs-cache: [Errno 9] Bad file descriptor
You have an issue with your OS / mount. Specifically, "/mnt/clearml/" is the base folder for all the cached stuff, and it fails to create the lock files there. Either use a local folder or try to understand what the issue is with the host machine's /mnt/ mounts (because it looks like a network mount)
(Not sure it actually has that information)
DeliciousBluewhale87 not on the opensource, for some reason it is not passed
Could you explain the use case ?
Great to hear it got solved. BTW network drives are supported but you have to make sure the mount file system supports locks (NFS does)
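One way to check whether a folder supports locks is to try taking one. The snippet below is a minimal sketch, assuming a POSIX host (it uses the standard-library fcntl module); the function name is made up for illustration, and it is not how the agent itself locks its cache.

```python
import fcntl
import os
import tempfile


def supports_file_locks(folder: str) -> bool:
    """Return True if an exclusive POSIX lock can be taken on a file in `folder`."""
    try:
        # Create a throwaway probe file inside the folder being checked.
        fd, path = tempfile.mkstemp(prefix=".lock_probe_", dir=folder)
    except OSError:
        # Folder is missing or not writable, so locking cannot work there.
        return False
    try:
        # Non-blocking exclusive lock, released again right away.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)


# "/mnt/clearml" is the mount mentioned in the thread; adjust to your own path.
print(supports_file_locks("/mnt/clearml"))
```

If this prints False for the cache mount but True for a local folder, the mount (or its lock options) is the likely culprit.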
SubstantialElk6 it seems the auto resolve of pytorch cuda failed.
What do you have in the "installed packages" section?
CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?
We'd be using https in production
Nice
@<1687653458951278592:profile|StrangeStork48> , I was reading this thread trying to understand what exactly is the security concern/fear here, and I'm not sure I fully understand. Any chance you can elaborate ?
I think your "files_server" is misconfigured somewhere, I cannot explain how you ended up with this broken link...
Can you check the clearml.conf on the machines, or the env vars?
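A quick way to do that check on each machine is sketched below. It assumes the usual CLEARML_* environment variable names and the default ~/clearml.conf location (overridable with CLEARML_CONFIG_FILE); the helper names are made up, and the file scan is a naive text search, not a real HOCON parse.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

# Environment variables that can override the server addresses or config file.
ENV_VARS = (
    "CLEARML_FILES_HOST",
    "CLEARML_API_HOST",
    "CLEARML_WEB_HOST",
    "CLEARML_CONFIG_FILE",
)


def env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the ClearML-related variables that are set in the environment."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in ENV_VARS if name in env}


def files_server_lines(conf_text: str) -> List[str]:
    """Return non-comment lines that mention files_server (naive text scan)."""
    return [
        line.strip()
        for line in conf_text.splitlines()
        if "files_server" in line and not line.strip().startswith("#")
    ]


print("env overrides:", env_overrides())
conf_path = Path(os.environ.get("CLEARML_CONFIG_FILE", "~/clearml.conf")).expanduser()
if conf_path.is_file():
    print(conf_path, "->", files_server_lines(conf_path.read_text()))
else:
    print("no config file found at", conf_path)
```

Comparing the output across the machines that logged the broken link should show where the files_server value diverges.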