Reputation
Badges 1
25 × Eureka!The fact is that I use docker for running clearml server both on Linux and Windows.
My question was on running the agent, is it running with --docker flag, i.e. docker mode
Also, just forgot to note, that I'm running clearml-agent and clearml processes in virtual environment - conda environment on Windows and venv on Linux.
Yep that answers my question above π
Does it make any sense to chdngeΒ
system_site_packages
Β toΒ
true
Β if I r...
Hi @<1687643893996195840:profile|RoundCat60>
You mean the clearml-server AMI ?
I also wonder is there any specific reason to store previous versions ?
@<1523702932069945344:profile|CheerfulGorilla72> use the following bucket name when you are configuring your files/output uri
s3://<iphere>:<porthere>/<bucket_here>
From there everything should work as expected
Hi BroadMole98 ,
what's the current setup you have? And how do you launch jobs to Snakemake?
-rw------- 1 1000 1000 0 Feb 28 23:41 config
Hi @<1572395184505753600:profile|GleamingSeagull15>
Is there an official place to report bugs and add feature requests for the app.clear.ml website?
GitHub issues is usually the place, or the
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
Really appreciate asking! it is always hard to keep track π
Hi EnviousStarfish54
You mean the console output ? if that's the case, the Task.init call will monkey patch the sys.stdout/sys.stderr to report to clearml as well as the console
Hi SweetGiraffe8
could you try with the latest RCpip install 0.17.5rc2
GiganticTurtle0 , let me add some background. The idea is that at some point you had your code running on your machine (when developing it for example),
when you actually executed the code itself in development, you call 'task.init' (to track the development process for example). This Task.init call, did the analysis of the code and python package dependencies and stored in on the Task. Then when you clone the Task, it already lists all the python packages your code directly imports (see "In...
Hi CheekyElephant36
First you need to run it once on your machine, once this is done (only a few steps is enough), you can one it and enqueue it. Then to actually connect the aws autoscaler (the part that spins machines and runs tasks) go to applications and select the aqs autoscaler.
Btw i think the next video will be about YOLO + autoscaler
Thanks MagnificentSeaurchin79 !
Let me check what's the status with this one, could it be the same as this one?
https://github.com/allegroai/clearml/issues/322
VexedCat68 both are valid. In case the step was cached (i.e. already executed) the node.job will be None, so it is probably safer to get the Task based on the "executed" field which stores the Task ID used.
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the Exact same instances you can have them runing simultaneously (think multi node training), that said each one should "know" not to report over the others, because of course it will overwrite the reports.
Back to your point on multiple agents:
You cannot have two Tasks in the same queue, that means that a single agen...
Ad1. yes, think this is kind of bug. Using _task to get pipeline input values is a little bit ugly
Good point, let;s fix it π
new pipeline is built from scratch (all steps etc), but by clicking "NEW RUN" in GUI it just reuse existing pipeline. Is it correct?
Oh I think I understand what happens, the way the pipeline logic is built, is that the "DAG" is created the first time the code runs, then when you re-run the pipeline step it serializes the DAG from the Task/backend.
Th...
What's the matplotlib version ? and python version?
Hi MiniatureCrocodile39
I would personally recommend the ClearML show π
https://www.youtube.com/watch?v=XpXLMKhnV5k
https://www.youtube.com/watch?v=qz9x7fTQZZ8
What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?
I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?
Also I just tried the pytorch-lightningΒ
RichProgressBar
Β (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Yey!
Apologies on the typo ;)
There is also a global "running_remotely" but it's not on the task
and do you have import tensorflow in your code?
Hi RipeGoose2
Are you continuing the Task, i.e. passing Task.init(..., continue_last_task=True)
Hi @<1523701797800120320:profile|SteadySeagull18>
...the job -> requeue it from the GUI, then a different environment is installed
The way that it works is, in the "originating" (i.e. first manual) execution only the directly imported packages are listed (no derivative packages that re required by the original packages)
But when the agent is reproducing the job, it creates a whole clean venv for the experiment, installs the required packages, then pip resolves the derivatives, and ...
SmarmyDolphin68 What's the matplotlib version ? and python version?
hmm, yes it should create the queue if it's missing (btw you could work around that and create it in the UI). Any chance you can open a github issue in the clearml helm chart repo so we do not forget ?
I aborted the task because of a bug on my side
π
Following this one, is treating abort as failed a must feature for the pipeline (in your case) or is it sort of a bug in your opinion ?
you mean to spin a pod with the agent inside it (daemon in services mode).
Or connect the services queue to the k8s cluster (i.e. define the pod template that uses cpu with not a lot of ram)?
Update us if it solved the issue (for increased visibility)