yes, or (because I deployed clearml using helm in kubernetes) from the same machine, but multiple pods (tasks).
Oh, now I see. Long story short, no 😞 The correct way of doing that is for every node/pod to create its own dataset,
then when you are done, you create a new version with the X datasets that you created as parents, the newly created version is just "meta" it basically tells the system how to combine the previously generated datasets (i.e. no data is actually re-uploa...
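A minimal sketch of that last step, assuming each pod has already created and finalized its own dataset and you have collected their IDs. The helper name `merge_datasets` is hypothetical, and calling `upload()` before `finalize()` on a version that adds no files is an assumption about what the SDK expects:

```python
def merge_datasets(parent_ids, name, project):
    """Create a "meta" dataset version whose parents are the per-pod datasets."""
    # Imported lazily so this file can be loaded where clearml is not installed.
    from clearml import Dataset

    merged = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=list(parent_ids),  # IDs of the datasets created by each pod
    )
    merged.upload()    # no files were added here, so no data should be re-sent
    merged.finalize()  # close the version so it can be consumed
    return merged.id
```

Something like `merge_datasets(pod_dataset_ids, "combined", "my_project")` would then be run once, after all pods are done.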
Hmm so you are saying you have to be logged out to make the link work? (I mean pressing the link will log you in and then you get access)
I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
Use the HPO, it is basically doing the same thing with a more sophisticated algorithm (BOHB):
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
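For a plain parameter sweep, a rough sketch along the lines of the linked example could look like this. The parameter names (`General/batch_size`, `General/lr`), the metric title/series and the queue name are placeholders, not values from this thread, and `build_grid_sweep` is a hypothetical helper:

```python
def build_grid_sweep(base_task_id, queue="default"):
    """Build (but do not start) a grid sweep that runs two jobs in parallel."""
    # Imported lazily so this file can be loaded where clearml is not installed.
    from clearml.automation import (
        DiscreteParameterRange,
        GridSearch,
        HyperParameterOptimizer,
    )

    return HyperParameterOptimizer(
        base_task_id=base_task_id,  # the template Task that gets cloned per job
        hyper_parameters=[
            DiscreteParameterRange("General/batch_size", values=[32, 64]),
            DiscreteParameterRange("General/lr", values=[0.001, 0.01]),
        ],
        objective_metric_title="validation",  # placeholder metric title
        objective_metric_series="loss",       # placeholder metric series
        objective_metric_sign="min",
        optimizer_class=GridSearch,           # exhaustive sweep, no smart search
        execution_queue=queue,
        max_number_of_concurrent_tasks=2,     # parallelism across agents
    )
```

You would then call `start()`, `wait()` and `stop()` on the returned object, as the linked example does after its own `Task.init` call.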
For example - how would this task-based example be done with pipelines?
Sure, you could do something like:
` from clearml import Pi...
Could you right click on the failed experiment, select reset and send it again for execution?
Could that error be a random network issue ?
(Basically this seems like a generic network error not actually related to the trains-agent)
Is the trains-agent running in docker mode or venv mode?
Hi StraightCoral86
When I run an experiment using Task.create(),
Use Task.init 🙂
Task.create is meant to create an external Task (i.e. Job) in the system, not to auto-generate a job from the running code. Make sense?
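To make the difference concrete, here is a small sketch of the two calls side by side. The helper names and all argument values are hypothetical; only `Task.init` and `Task.create` themselves come from the discussion above:

```python
def track_current_run(project, name):
    """Track the code that is running right now (what you usually want)."""
    # Imported lazily so this file can be loaded where clearml is not installed.
    from clearml import Task

    return Task.init(project_name=project, task_name=name)


def register_external_job(project, name, repo, script):
    """Register a job for code that lives elsewhere, without running it here."""
    from clearml import Task

    return Task.create(
        project_name=project,
        task_name=name,
        repo=repo,      # git repository holding the code to run
        script=script,  # entry point inside that repository
    )
```

The first call hooks into the current process; the second only creates an entry in the system that an agent could later execute.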
If I checkout/download dataset D on a new machine, it will have to download/extract 15GB worth of data instead of 3GB, right? At least I cannot imagine how you would extract the 3GB of individual files out of zip archives on S3.
Yes, I'm not sure there is an interface to extract only partial files from the zip (although worth checking).
I also remember there is a GitHub issue with uploading a 50GB dataset, and the bottom line is, we should support setting chunk size, so that we can uploa...
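On the partial-extraction question: Python's own `zipfile` module can read individual members out of an archive it has seekable access to. The sketch below only shows that on an in-memory archive; whether ClearML exposes anything like it, or can do it against S3 without downloading the whole file, is not established here. Both function names are hypothetical:

```python
import io
import zipfile


def build_demo_archive():
    """Create a small in-memory zip with two members, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.txt", "alpha")
        archive.writestr("b.txt", "beta")
    return buffer.getvalue()


def extract_members(zip_bytes, wanted):
    """Return {name: content} for only the requested members of the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Only the requested members are decompressed; the rest are skipped.
        return {name: archive.read(name) for name in wanted}
```

For example, `extract_members(build_demo_archive(), ["b.txt"])` reads `b.txt` without decompressing `a.txt`.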
SweetGiraffe8 Works when I'm using plotly...
Can you please copy paste the code with the plotly, it's probably something I'm missing
Maybe we should add it to Storage Manager? What do you think?
Hi BroadMole98 ,
what's the current setup you have? And how do you launch jobs to Snakemake?
GiganticTurtle0 , let me add some background. The idea is that at some point you had your code running on your machine (when developing it for example),
when you actually executed the code itself in development, you called 'Task.init' (to track the development process for example). This Task.init call did the analysis of the code and Python package dependencies and stored it on the Task. Then when you clone the Task, it already lists all the Python packages your code directly imports (see "In...
What's the matplotlib version ? and python version?
SmarmyDolphin68 What's the matplotlib version ? and python version?
Update us if it solved the issue (for increased visibility)
It should be autodetected, and listed in the installed packages with something like:
keras-contrib @git+https://www.github.com/keras-team/keras-contrib.git
Is this what you are seeing?
If not, you can add it manually with:
Task.add_requirements('git+ ')
Task.init(...)
Notice that it has to be called before Task.init
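A small sketch of that ordering, wrapped in a hypothetical helper (`init_with_extra_requirement`) so the two calls cannot be swapped by accident. The project and task names are placeholders:

```python
def init_with_extra_requirement(requirement, project, name):
    """Register an extra requirement, then start tracking the run."""
    # Imported lazily so this file can be loaded where clearml is not installed.
    from clearml import Task

    # Must run before Task.init, otherwise the requirement is not picked up.
    Task.add_requirements(requirement)
    return Task.init(project_name=project, task_name=name)
```

With the package from above this would be called as `init_with_extra_requirement('git+https://www.github.com/keras-team/keras-contrib.git', 'my_project', 'my_task')`.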
each of it gets pushed as a separate Model entity right?
Correct
But there’s only one unique model with multiple different version of it
Do you see multiple lines in the Model repository? (Every line is an entity.) Basically, if you store it under the same local file, it will override the model entry (i.e. reuse it and update the file itself); otherwise you are creating a new model. By "version", do you mean progress in time?
You mean the Task already exists, or you want to create a Task from the code?
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
I'm assuming you are looking for the AWS autoscaler, spinning EC2 instances up/down and running daemons on them.
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler
If this is the case, then we do not change the matplotlib backend
Also
I've attempted converting the mpl image to PIL and use report_image to push the image, to no avail.
What are you getting? error / exception ?
BTW: is this on the community server or self-hosted (aka docker-compose)?
I think you can force it to be started, let me check (I'm pretty sure you can on an aborted Task).
TenseOstrich47 this looks like Elasticsearch is out of space...
Hi BattyCrocodile47 and ObedientDolphin41
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2 lines integration). At least you will have visibility to everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
compression=ZIP_DEFLATED if compression is None else compression
wdyt?
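In isolation, that default could behave like the sketch below. It uses only the standard `zipfile` module; `make_zip` is a hypothetical helper, not the actual function being patched:

```python
import io
import zipfile
from zipfile import ZIP_DEFLATED, ZIP_STORED


def make_zip(files, compression=None):
    """Zip {name: bytes} in memory, defaulting to DEFLATE when nothing is given."""
    compression = ZIP_DEFLATED if compression is None else compression
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

Callers that pass nothing get compressed archives, while an explicit `compression=ZIP_STORED` is still respected.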
Hi ObnoxiousStork61
Is it possible to report ie. validation scalars but shifted by 1/2 iteration?
No 😞 these are integers
What's the reason for the shift?
I'm also curious 🙂
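Since iterations have to be integers, one possible workaround (not something proposed in this thread, just a sketch) is to double the iteration scale so that validation points land between training points. The helper name is hypothetical:

```python
def interleaved_iteration(step, is_validation=False):
    """Map a training step to an integer iteration on a doubled scale.

    Training step k is reported at 2*k and the validation that follows it
    at 2*k + 1, which plays the role of "k + 1/2" using integers only.
    """
    return 2 * step + (1 if is_validation else 0)
```

The returned value would be passed as the `iteration` argument when reporting scalars, at the cost of the x-axis showing twice the real step count.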