Sorry, I mean a vault on the clearml-server holding the credentials per user; the agent then pulls them based on the user, and it is transparent from the user's perspective
Ohh, then use the AWS autoscaler, basically it's what you want: it spins up an EC2 instance and sets an agent there, and if the EC2 goes down (for example if it is a spot instance), it will spin it up again automatically with the running Task on it.
wdyt?
As I suspected, from your log:
```
agent.package_manager.system_site_packages = false
```
This is exactly the cause of the missing tensorflow (basically it creates a new venv inside the docker, but without the flag on, it does not inherit the docker's preinstalled packages).
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections there that are either credentials or differ from the defaults...
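For reference, a minimal sketch of what the relevant part of the glue's clearml.conf could look like (only this key matters here; everything else can stay at the defaults):
```
agent {
    package_manager {
        # inherit the packages preinstalled inside the docker image
        system_site_packages: true
    }
}
```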
feature request: tell me what gets passed along each edge of the pipeline graph
Nice! Please feel free to add it to the GH issue 🙂
Hmm yes that is odd, let me see if I can reproduce
Actually it cannot be deferred; long story short, when the agent is running the same code we have to verify and pass arguments at import time. I have to wonder: I'm expecting the env variables to be preset (i.e. previously set for the entire environment), so how come they are manually set inside the code (and wouldn't that break when running with an agent)?
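To illustrate what I mean (hypothetical variable name), the expectation is something like:
```python
# the variable is preset for the entire environment, e.g. the shell
# (or the agent's environment) that launches the script already did:
#   export MY_SETTING=value
import os

# so by import time it is already visible, no need to set it in code:
print(os.environ.get("MY_SETTING"))
```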
Also, I just wanted to say thanks for the tool! I'm managing a small data science practice and it's going to be really nice to have a view of all of the experiments we've got and know our GPU utilization, all without having to give every data scientist access to each box where the workflows are run. Incredibly stoked.
♥ ❤ ♥
BTW: StickyMonkey98 if you feel like writing a few examples I think it will be easy to push into the docs, so that at least we improve iteratively...
Yes (mine isn't and it is working 🙂)
Hi IrritableGiraffe81
Yes it deploys all ClearML (including web).
ClearML-serving, unfortunately, is a bit more complicated to spin up, as it needs actual compute nodes.
That said, we are working on making it a lot easier 🙂
Any updates on the trigger and schedule docs?
I think examples are already pushed, docs still in progress.
BTW: pipeline v2 examples are also out:
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/full_custom_pipeline.py
Hi LazyTurkey38
What do you mean the git repo is not recognized? After execute_remotely exits, you should see on the Task a reference to the git repo with the exact commit ID you pulled locally. Do you see it under the Execution tab?
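For reference, a minimal sketch of the flow I'd expect (project/queue names are assumptions):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# stops the local run and enqueues the Task; the Task should now hold a
# reference to the git repo + the commit ID detected locally
task.execute_remotely(queue_name="default", exit_process=True)
```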
Hi PompousParrot44
You can check the cleanup service example.
It sleeps for 24 hours then spins up and does its thing.
You can always launch these service tasks on the services queue; its purpose is to run such services on the trains-server as additional CPU services. They will also be registered as service nodes, so you have visibility into which services are running.
In order to clone a task and wait for its completion, use TrainsJob: https://github.com/allegroai/trains/blob/65a4a...
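In case it helps, a minimal sketch of the same idea with the plain Task API (project/queue names are placeholders):
```python
from clearml import Task

# clone an existing task, enqueue it, and block until it finishes
template = Task.get_task(project_name="examples", task_name="template task")
cloned = Task.clone(source_task=template, name="cloned run")
Task.enqueue(cloned, queue_name="default")
cloned.wait_for_status(
    status=(Task.TaskStatusEnum.completed, Task.TaskStatusEnum.failed)
)
```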
Hi ExuberantBat52
I am trying to execute a pipeline remotely,
How are you creating your pipeline? And are you referring to an issue with the pipeline logic, or is it a component that needs that repo installed?
Hi SteadySeahorse58
ValueError: Could not find queue named "services"
Did you set up an agent / auto-scaler? Where will the pipeline and its components be running?
quick video of the search not working
Thank you! This is very helpful, passing along to the front-end guys 🙂
and Ctrl-F (of the browser) doesn't work, as the lines below are not loaded (even when you scroll, it will remove the other lines no longer visible, so you can't Ctrl-F them)
Yeah, that's because they are added lazily
But I do not have anything linked correctly since I rely on conda installing cuda/cudnn for me
From the log it installed:
```
cudatoolkit==11.1.1
```
based on the CUDA it found on the host machine (agent.cuda_version = 110).
But for some reason it installed pytorch from the conda "pytorch" channel without cuda support.
FranticWhale40 this one: None
What's the error you are getting ?
Maybe the configuration file changed?
None
The logic is: if the name and project are the same, there are no artifacts/models, and the Task was created less than 72 hours ago, reuse the Task.
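If you want to opt out of that behavior, a quick sketch (the argument exists on Task.init):
```python
from clearml import Task

# always create a new Task, skipping the 72h same-name/same-project reuse
task = Task.init(
    project_name="examples",
    task_name="my experiment",
    reuse_last_task_id=False,
)
```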
```
Train Data Params/a = {}
Train Data Params/b = ...
```
Then maybe we could "hack" it so that if you edit it in the UI like so:
```
Train Data Params/a = {'new': 'value'}
Train Data Params/b = ...
```
you end up with:
```
param = {'a': {'new': 'value'}, 'b': ...}
```
What do you think?
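For context, a minimal sketch of how such a dict gets connected in the first place (names and values are just examples):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params sketch")

param = {'a': {}, 'b': 123}
# connected under the "Train Data Params" section; when an agent runs the
# Task, edits made in the UI are reflected back into this dict
task.connect(param, name='Train Data Params')
```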
JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC:
```
pip install clearml==1.2.0rc1
```
Regarding not finding the Hydra-core package: what do you have listed under Execution: "Installed Packages"? (It should have auto-detected that you are importing hydra and listed it there.)
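A quick sketch of what I'd expect to work with the RC (hypothetical script):
```python
import fire
from clearml import Task

def train(lr: float = 0.001, epochs: int = 10):
    # once Task.init is called, the fire CLI arguments should be
    # captured automatically by clearml
    Task.init(project_name="examples", task_name="fire example")
    print(lr, epochs)

if __name__ == "__main__":
    fire.Fire(train)
```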
Is there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?
Hi VexedCat68
Currently, when you create datasets with clearml-data, it has to repackage your files, i.e. upload them. That said, we have received numerous requests for "registering data", and we are looking into it.
Here are the main technical hurdles we are facing, and I would love to get your perspective:
If the data is not available locally, we cannot calculate the hash of the conten...
Let me try to add some color to this package analysis process.
Basically clearml will try to statically analyze the code (i.e. look for import/from statements), then list the detected packages in pip requirements.txt format under "Installed Packages".
When running inside a conda environment, it will check which packages were installed via "conda install" (instead of pip install) and mark them internally. This process ensures that when the clearml-agent is running with the conda package manager, it "knows" whic...
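If the static analysis ever misses a package, my understanding is you can force an entry before Task.init, e.g.:
```python
from clearml import Task

# force a package into "Installed Packages" before the analysis runs
Task.add_requirements("tensorflow")
task = Task.init(project_name="examples", task_name="forced requirement")
```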
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said I think that what we have now should be quite stable. Plan is to have the RC available right after the weekend.
You can do that programmatically: clone the pipeline Task (a pipeline is also a Task) and change the Args section of that Task. wdyt?
Example:
None
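Something along these lines (project, queue, and parameter names are hypothetical):
```python
from clearml import Task

# clone the pipeline controller Task and override its Args section
pipeline = Task.get_task(project_name="examples", task_name="my pipeline")
cloned = Task.clone(source_task=pipeline, name="pipeline with new args")
cloned.update_parameters({"Args/my_param": "new value"})
Task.enqueue(cloned, queue_name="services")
```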
Thanks EnviousStarfish54, we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under trains, at least until they are moved (hopefully end of Sept).
Where did you add the Task.init call ?
I think the ClearmlLogger is kind of deprecated...
Basically all you need is Task.init at the beginning; the default tensorboard logger will be caught by clearml.
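A minimal sketch (plain tensorboard SummaryWriter, torch assumed; names are placeholders):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tensorboard auto-log")

writer = SummaryWriter()
for step in range(10):
    # these scalars are picked up by clearml automatically
    writer.add_scalar("loss", 1.0 / (step + 1), step)
```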
Or can I enable agent in this kind of local mode?
You just built a local agent