BTW: any reason not to allow this flexibility ?
Yes, that seems to be the case. That said, they should have different worker IDs, agent-0 and agent-1 ...
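If I remember correctly, you can also set the worker ID explicitly with an environment variable when launching each agent (a sketch; TRAINS_WORKER_ID for trains-agent, CLEARML_WORKER_ID in the newer clearml-agent, and the queue name is just an example):
TRAINS_WORKER_ID=agent-0 trains-agent daemon --queue default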
What's your trains-agent version ?
you can also just create a venv and run the tests there (with the latest python package) ?
Hi ShakyJellyfish91
It seems clearml is using a single connection, and that takes a long time to download
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does max_connections=10
mean 10 concurrent connections ?
after generating a fresh set of keys
when you have a new set, copy-paste them directly into the 'clearml.conf' (should be at the top, can't miss it)
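For reference, the top of the file should look something like this (a sketch; the server URLs depend on your deployment and the keys below are placeholders):
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR-ACCESS-KEY"
        "secret_key" = "YOUR-SECRET-KEY"
    }
}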
Ohh then YES!
the Task will be closed by the process that created it, and since that process lives inside the Jupyter notebook kernel, the Task is considered running for as long as the kernel is
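If you want the Task to be marked as finished without shutting down the kernel, you can close it explicitly (a minimal sketch):
from clearml import Task

# close the current Task explicitly instead of waiting for the
# notebook kernel process to exit
task = Task.current_task()
task.close()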
Hi VirtuousFish83 ,
Is it throwing an exception? Are you seeing the plot in the UI but the title is incorrect?
JitteryCoyote63 the new wizard was pushed, you can check it out here:
https://github.com/allegroai/trains/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
BTW: the next release, which should include it all, is next week (hopefully :))
Hi @<1664079296102141952:profile|DangerousStarfish38>
You mean spin the agent on multiple Windows machines? Yes that is supported, I think that it is limited to venv (i.e. not docker) mode, but other than that should work out of the box
BTW updating the values in grafana is basically configuring the heatmap graph, so it is fairly easy to do, just not automatic
You mean like a name of the artifact ?
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are working and we still "need" an additional instance.
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3-node Task, the Task clones itself (Task B) and enqueues the clone on a "very high priority" queue. Task A waits until Task B is ru...
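Something along these lines, in terms of the SDK calls involved (a rough sketch; the queue name is just an example):
from clearml import Task

# Task A clones itself and enqueues the clone (Task B),
# then blocks until Task B is actually picked up and running
task_a = Task.current_task()
task_b = Task.clone(source_task=task_a)
Task.enqueue(task_b, queue_name='very_high_priority')
task_b.wait_for_status(status=(Task.TaskStatusEnum.in_progress,))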
New RC hopefully solves it @<1643060801088524288:profile|HarebrainedOstrich43> could you check if it works for you now?
pip install clearml==1.14.0rc0
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time

task.flush(wait_for_uploads=True)  # make sure previous uploads are done
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
Okay
Try to reset the experiment and resend it for execution; let me know if you still get the error, and if you do, could you send a screen grab of the Execution tab? Trains supports either a git repo or standalone code (jupyter), but not a mixture of the two. This means that if you want to run the jupyter/colab, the cloning will have to be part of the notebook itself (as you already have it). That said, due to the way CoLab works, Trains will log your execution history (as opposed to the entire jupy...
If you edit the requirements to have
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
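If you want to control the number of workers yourself, something like this should work (a sketch, assuming your clearml version exposes max_workers on get_local_copy; project/name are placeholders, and I believe it defaults to the number of logical cores):
from clearml import Dataset

ds = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')
local_path = ds.get_local_copy(max_workers=8)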
wdyt?
Now I'm curious what's the workaround ?
WittyOwl57 I think this is a great idea, can you open a feature issue on GitHub so this is not forgotten ?
BTW: regardless, if you have time for the new azure package upgrade, it would be great 🙂 this has been on our to-do list for a while, but since not a lot of users complained it got pushed ...
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
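For example, setting it in the agent's environment before launching (the value is just an example):
export CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=2.0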
Hi @<1729309120315527168:profile|ShallowLion60>
ClearML in our case is installed on k8s using the helm chart (version: 7.11.0)
It should be done "automatically", I think there is a variable in the helm chart to configure that.
What URLs are you seeing now, and what should be there?
Hi @<1529271085315395584:profile|AmusedCat74>
ClearML Scheduler where it doesn't reuse the task
What do you mean by doesn't reuse the Task, do you mean you want each time the scheduler is launched to basically overwrite the previous run ?
From creating the event to actually sending it ... 30 min sounds like enough "time"...
Hi EagerOtter28
I think the replacement should happen here:
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/clearml_agent/helper/repo.py#L277
Hi SuperficialGrasshopper36
/home/ubuntu/.clearml/venvs-builds.1/3.8/task_repository/repository_name/.venv
This is the problem, they should not be installed there, it should be in /home/ubuntu/.clearml/venvs-builds.1/3.8/
Could you post the poetry.lock file? Maybe it is something there?
What are the poetry and clearml-agent versions ?
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
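For example (a sketch; the queue name is a placeholder):
clearml-agent daemon --queue default --docker --gpus all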