okay this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue ?
Hi TrickyRaccoon92
Yes, please update me once you can. I would love to be able to reproduce the issue so we can fix it for the next RC 🙂
now I can't download either of them
would be nice if the address of the artifacts (state and zips) was assembled on the fly and not hardcoded into the DB.
The idea is this is fully federated, the server is not actually aware of it, so users can manage multiple storage locations in a transparent way.
if you have any tips on how to fix it in the MongoDB that would be great ...
Yes, that should be similar, but the links would be in the artifacts property on the Task object.
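For example, a minimal sketch of reading those links from a Task (the task ID here is a placeholder):
from clearml import Task

# fetch the task and inspect its artifacts (task ID is hypothetical)
task = Task.get_task(task_id='aabbcc112233')
for name, artifact in task.artifacts.items():
    print(name, artifact.url)  # the stored link, e.g. the files-server / S3 address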
not exactly...
As we can't create keys in our AWS due to infosec requirements
Hmmm
It may have been killed or evicted or something after a day or 2.
Actually the ideal setup is to have a "services" pod running all these services on a single pod, with clearml-agent --services-mode. This Pod should always be on and pull jobs from a dedicated queue, for example:
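A minimal sketch of the agent launch, assuming a dedicated "services" queue:
clearml-agent daemon --queue services --services-mode --docker --detached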
Maybe a nice way to do that is to have the single Task serialize itself, then have a Pod run the Task every X hours and spin it down.
So I would like to know what it sends to the server to create the task/pipeline, ...
will my datasets be stored on the same machine that hosts the clearml server?
By default yes, they will be stored on the files-server (but you can change it; this is an argument for both the CLI and the python interface)
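For example, a minimal sketch with the python interface (the S3 bucket is a placeholder):
from clearml import Dataset

# create a dataset and upload it to external storage instead of the files-server
ds = Dataset.create(dataset_name='my-dataset', dataset_project='examples')
ds.add_files('/path/to/data')
ds.upload(output_url='s3://my-bucket/datasets')  # omit output_url to use the files-server
ds.finalize()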
Ohh now I see, yes that makes sense, and I was able to reproduce it, thanks!
Hi BoredHedgehog47 I'm assuming the nginx on the k8s ingress is refusing the upload to the files-server
JuicyFox94 wdyt?
Hi FlutteringWorm14
Is there some way to limit that?
What do you mean by that? are you referring to the Free tier ?
So could it be that pip install --no-deps .
is what's missing?
what happens if you add "/opt/keras-hannd" to the installed packages?
set the following:
CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is, it will automatically mount the host's .ssh into the container, so that if you are using SSH to clone git you have credentials. In your case it also mounts the configuration, hence the failure to login.
I will make sure we add it to the configuration file, so it is more visible
Hi ShortElephant92
No, this is opt-in, so other than checking for updates once in a while, no traffic at all
Hi DepressedChimpanzee34
if you try to extend it more than the width of the column to the right, it doesn't do anything...
You mean outside of the window? or are you saying you cannot extend it?
Just verifying, we are talking about the latest version of clearml-server ?
Ohh, the controller task itself holds the artifacts ?
I double checked with the guys this issue was fixed in 1.14 (of clearml server). It should be released tomorrow / weekend
Is ClearML combined with DataParallel or DistributedDataParallel officially supported / should that work without many adjustments?
Yes, it is supported, and should work.
If so, would it be started via python ... or via torchrun ... ?
Yes, it should; hence the request for a code snippet to reproduce the issue you are experiencing.
What about remote runs, how will they support the parallel execution?
Supported. You should see in the "script entry" something like "-m torch.di...
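For reference, a minimal sketch of one common pattern (the project/task names are placeholders; torchrun sets the RANK environment variable):
import os
from clearml import Task

# create the Task on rank 0 only; the other DDP workers skip it
if os.environ.get('RANK', '0') == '0':
    task = Task.init(project_name='examples', task_name='ddp-train')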
btw: you can also do cron
for that:
@reboot sleep 60 && clearml-agent daemon ...
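(@reboot runs the command once when the machine boots; the sleep 60 just gives networking time to come up before the agent starts)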
Worker just installs it by name from pip, and what it installs is not my package!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf ? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that (1) is not enough, as pip will first try to search for the package in the pip repository, and only then in the private one. To avoid that, in your code you can point directly to an https URL of your package. Ta...
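For reference, a sketch of the relevant clearml.conf section (the repository URL is a placeholder):
agent {
    package_manager {
        # extra PyPI-compatible repositories the agent passes to pip
        extra_index_url: ["https://my.private.pypi/simple"]
    }
}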
GrievingTurkey78 sure, aws autoscaler can do that:
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
Can my request be made into a new feature, so that we can tag the same type of graphs under one main tag?
Sure, open a Git Issue :)
SpotlessFish46 unless all the code is under the "uncommitted changes" section, what you have is a link to the git repo + commit id
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
Nice!
is trainsConfig
a pure text blob ?
What's the error you are getting ?
(open the browser web developer, see if you get something on the console log)
Ohh I see, could you copy paste what you put there (instead of the secret and key *** will do 🙂 )
OddAlligator72 sure thing 🙂
This should sort it out:
Task.init('examples', 'train', continue_last_task=True)
If you want to continue a specific Task:
continue_last_task='task_id_here'
Getting the previous model:
last_checkpoint = task.models['output'][-1]
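Putting it together, a minimal sketch (the task ID is a placeholder):
from clearml import Task

# continue a previously executed Task and grab its latest output model
task = Task.init('examples', 'train', continue_last_task='task_id_here')
last_checkpoint = task.models['output'][-1]
print(last_checkpoint.url)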
What do you think?
Probably not the case the other way around.
Actually the other way around: the new pip version uses a new package dependency resolver that can conclude that a previous package setup is not supported (because of version conflicts) even though it worked...
It is tricky: pip is trying to get better at resolving package dependencies, but it means that old resolutions might not work, which would mean old environments cannot be restored (or end up as "broken" envs). This is the main reason not to move to p...
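If you need the old behaviour, a sketch of pinning the agent's pip version in clearml.conf (the exact bound is just an example):
agent {
    package_manager {
        # keep the old resolver by pinning pip below 20.2
        pip_version: "<20.2"
    }
}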
If you are using the "default" queue for the agent, notice you might need to run the agent with --services-mode
to allow for multiple pipeline components on the same machine
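Something along these lines (queue name per your setup):
clearml-agent daemon --queue default --services-mode --docker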