Hi SteadySeagull18
...the job -> requeue it from the GUI, then a different environment is installed
The way it works is: in the "originating" (i.e. first, manual) execution, only the directly imported packages are listed (not the derivative packages that are required by the original packages)
But when the agent reproduces the job, it creates a whole clean venv for the experiment, installs the required packages, then pip resolves the derivatives, and ...
SmarmyDolphin68 What's the matplotlib version? And the python version?
I aborted the task because of a bug on my side
🙂
Following up on this one: is treating abort as failed a must-have feature for the pipeline (in your case), or is it more of a bug in your opinion?
Update us if it solved the issue (for increased visibility)
It should be autodetected, and listed in the installed packages with something like:
```
keras-contrib @ git+https://www.github.com/keras-team/keras-contrib.git
```
Is this what you are seeing?
If not, you can add it manually with:
```python
Task.add_requirements('git+ ')
Task.init(...)
```
Notice: call it before Task.init
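A minimal sketch of that pattern (the project/task names and the git URL here are placeholders):
```python
from clearml import Task

# must be called before Task.init so the requirement is recorded
# (URL is a placeholder for your actual git-hosted package)
Task.add_requirements("git+https://github.com/keras-team/keras-contrib.git")

task = Task.init(project_name="examples", task_name="git-requirement")
```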
each of them gets pushed as a separate Model entity, right?
Correct
But there's only one unique model, with multiple different versions of it
Do you see multiple lines in the Model repository? (every line is an entity) Basically, if you store it under the same local file it will override the model entry (i.e. reuse it and update the file itself); otherwise you are creating a new model. "Version" will then progress over time?
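A minimal sketch of the difference, assuming PyTorch auto-logging is active (names and paths are placeholders):
```python
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="model-versions")
net = torch.nn.Linear(4, 2)

# same local path -> the existing Model entity is reused, its file updated
torch.save(net.state_dict(), "model.pt")
torch.save(net.state_dict(), "model.pt")

# a new path -> a second, separate Model entity is created
torch.save(net.state_dict(), "model_v2.pt")
```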
You mean the Task already exists, or you want to create a Task from the code?
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
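For reference, spinning two agents on the same GPU would look something like this (queue names are placeholders):
```bash
# both agents see GPU 0; its memory is shared, not partitioned
clearml-agent daemon --queue queue_a --gpus 0 --detached
clearml-agent daemon --queue queue_b --gpus 0 --detached
```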
I'm assuming you are looking for the AWS autoscaler, spinning EC2 instances up/down and running daemons on them.
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler
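If I remember correctly, the example script can be launched locally with something like this (check the script's argparse for the exact flags):
```bash
# runs the configuration wizard and starts the autoscaler
python aws_autoscaler.py --run
```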
Hi ElegantKangaroo44,
This is basically the average number of experiments running, the number of projects, and the number of users. I think that's about it; nothing like Google Analytics. It is mainly aimed at giving some idea of how large the usage is. Sounds reasonable?
MelancholyElk85 that looks great, let me see how quickly we can push it (I think 1.1.5 needs to be pushed very soon, I'll check if we can have it before 🙂)
Yep, basically this will query the Task and get the last one:
https://github.com/allegroai/clearml/blob/ca70f0a6f6d52054a095672dc087390fabf2870d/clearml/task.py#L729
Notice task_filter allows you to do all sorts of filtering
https://github.com/allegroai/clearml/blob/ca70f0a6f6d52054a095672dc087390fabf2870d/clearml/task.py#L781
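A minimal sketch of what such a query could look like (project/task names are placeholders):
```python
from clearml import Task

# fetch the most recently updated completed task matching the name
tasks = Task.get_tasks(
    project_name="examples",
    task_name="my-training-task",
    task_filter={
        "status": ["completed"],
        "order_by": ["-last_update"],  # newest first
    },
)
latest_task = tasks[0] if tasks else None
```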
Hmm, so yes, that is true: if you are changing the bucket values you will have to manually adjust them in Grafana as well. I wonder if there is a shortcut here; the data is stored in Prometheus, and I would rather avoid deleting old data. Wdyt?
If this is the case, then we do not change the matplotlib backend
Also:
...I've attempted converting the mpl image to PIL and using report_image to push the image, to no avail.
What are you getting? An error / exception?
Hi HappyDove3
Are you passing it this way?
```python
task.upload_artifact(name="my artifact", artifact_object=np.eye(3, 3))
```
https://github.com/allegroai/clearml/blob/5953dc6eefadcdfcc2bdbb6a0da32be58823a5af/examples/reporting/artifacts.py#L38
BTW: is this on the community server or self-hosted (aka docker-compose)?
I think you can force it to be started, let me check (I'm pretty sure you can on an aborted Task).
I see, so in theory you could call add_step with a pipeline parameter (i.e. pipe.add_parameter etc.); see the sketch below.
But currently the implementation is such that if you are starting the pipeline from the UI
(i.e. rerunning it with a different argument), the pipeline DAG is deserialized from the Pipeline Task (the idea being that one could control the entire DAG externally without changing the code)
I think a good idea would be to actually allow the pipeline class to have an argument saying always create from cod...
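A minimal sketch of the add_parameter / add_step combination (all names here are placeholders):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0.0")

# pipeline-level parameter, editable from the UI when rerunning
pipe.add_parameter(name="dataset_url", default="s3://bucket/data.csv")

# reference it in a step via parameter_override
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train-base-task",
    parameter_override={"Args/dataset_url": "${pipeline.dataset_url}"},
)
pipe.start()
```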
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
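For example, something like this in /etc/hosts (the IP and hostname are placeholders):
```
# /etc/hosts: give the server a stable name so links survive a server move
192.168.1.42    clearml-server
```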
TenseOstrich47 this looks like elasticsearch is out of space...
Hi BattyCrocodile47 and ObedientDolphin41
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2-line integration). At least you will have visibility into everything that is running/failing 🙂
- A SageMaker job is a container, which means for ...
```python
compression = ZIP_DEFLATED if compression is None else compression
```
wdyt?
Hi SmugOx94
Hmm, are you creating the environment manually, or is it done by Task.init?
(Basically Task.init will store the entire conda environment, and if the agent is working with the conda package manager it will use it to restore it)
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L50
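The relevant setting in the agent's clearml.conf looks roughly like this (a sketch; see the linked file for the full context):
```
agent {
    package_manager {
        # pip, conda, or poetry; conda makes the agent restore conda envs
        type: conda
    }
}
```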
Hi ObnoxiousStork61
Is it possible to report e.g. validation scalars, but shifted by 1/2 iteration?
No 🙂 these are integers
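For reference, a minimal sketch; report_scalar takes an integer iteration (names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-demo")
logger = task.get_logger()

# iteration must be an int; there is no "half iteration" on the x-axis
logger.report_scalar(title="loss", series="validation", value=0.12, iteration=1)
```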
What's the reason for the shift?
I'm also curious 🙂
BTW: I think it was fixed in the latest trains package as well as the clearml package