Hi DepressedChimpanzee34
Why do you need to have the configuration added manually? Isn't the clearml.conf easier? If not, I think OS environment variables are easier, no? I ran the above code and everything worked with no exception/warning... What does the try/except solve, exactly?
Hi RotundSquirrel78
Could those be the example experiments?
Are you running your own server, or is it the SaaS free tier server?
can you get the agent to execute the task in the current conda env, without setting up a new environment?
Wouldn't that break easily? Is this a way to avoid dockers, or is it a specific use case?
is there any other way to get a task from the queue running locally in the current conda env?
You mean including cloning the code etc., but not installing any python packages?
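If it helps, a rough sketch of what I mean (the CLEARML_AGENT_SKIP_PIP_VENV_INSTALL variable and daemon flags are from the agent docs as I remember them; please verify against your clearml-agent version):
```
import os
import subprocess
import sys

# point the agent at the current interpreter and skip venv/package installation
os.environ["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = sys.executable
# pull tasks from the "default" queue and run them in this (conda) environment
subprocess.run(
    ["clearml-agent", "daemon", "--queue", "default", "--foreground"],
    check=True,
)
```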
WittyOwl57
To get task IDs use (e.g. all the tasks of a specific project):
```
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})
```
Then per task:
```
for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this is to enable forcefully updating parameters post execution
    t.mark_started(force=True)
    # update hyper-parameters ...
```
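The elided part presumably writes the merged parameters back; a minimal sketch of that continuation, assuming the standard `set_parameters` / `mark_stopped` SDK calls (continuing inside the loop above):
```
    # push the merged parameters back to the task
    t.set_parameters(task_param)
    # return the task to a closed state after the forced edit
    t.mark_stopped(force=True)
```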
I'm not sure TB supports confusion matrices regardless; from anywhere in your code you can do:
```
from trains import Task
Task.current_task().get_logger().report_confusion_matrix(...)
```
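For completeness, a minimal sketch of a full call (the matrix values and labels here are made up for illustration):
```
import numpy as np
from trains import Task

# rows are true labels, columns are predictions (illustrative values)
matrix = np.array([[50, 3], [7, 40]])
Task.current_task().get_logger().report_confusion_matrix(
    title="confusion matrix",
    series="validation",
    iteration=0,
    matrix=matrix,
    xlabels=["pred 0", "pred 1"],
    ylabels=["true 0", "true 1"],
)
```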
And can I store models with no attachment to tasks?
Assuming you have the Model ID:
```
model = InputModel(model_id='aabbcc')
local_file_or_folder = model.get_weights()
```
Is this what you are looking for?
Okay found it, ElegantCoyote26: the step name is changed but the Task name remains the same ... 🙂
I'll make sure we fix it on the next version
SillyPuppy19 are you aborting the experiment, or are you trying to protect against a crash? Is it callback functionality you are looking for?
What you actually specified is `torch`; the @ is a kind of pip remark, pip will not actually parse it 🙂
Use only the link: https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl
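If it helps, pip will happily install a bare wheel URL; a sketch (the URL is the one from above):
```
import subprocess
import sys

# install the CUDA 10.0 torch wheel directly from the link
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl"],
    check=True,
)
```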
LovelyHamster1
Also, you can use pip freeze instead of the static code analysis; on your development machines set:
```
detect_with_pip_freeze: true
```
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/docs/clearml.conf#L169
NICE! MoodyCentipede68 this is awesome 🙂
So it seems to get the "hint" from the type. This will work:
```
tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
```
wdyt, should it actually check min/max and manually cast it?
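Re the min/max check, a sketch of what that manual cast could look like (`to_uint8` is a hypothetical helper, not part of TB or ClearML; the random batch is a stand-in for the `ex` above):
```
import numpy as np
import tensorflow as tf

def to_uint8(img):
    # normalize by the observed min/max, then cast so the dtype "hint" is explicit
    img = np.asarray(img, dtype=np.float32)
    img = (img - img.min()) / max(float(img.max() - img.min()), 1e-8)
    return (img * 255).astype(np.uint8)

ex = np.random.rand(4, 28, 28, 1)  # stand-in image batch, shape [k, h, w, c]
with tf.summary.create_file_writer("logs").as_default():
    tf.summary.image('toy255', to_uint8(ex), step=0, max_outputs=10)
```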
I see...
Currently (and this will change soon) the entire delta is stored in a single file, so there is no real way to download a "subset" of the data, only a parent version 🙂
Let's say that this small dataset has an ID ...
Yes this would be exactly the way to do so:
```
from clearml import Dataset

# task = Task.init(...) created earlier
param = {'dataset': small_train_dataset_id_here}
task.connect(param)
dataset_folder = Dataset.get(param['dataset']).get_local_copy()
```
... Locally it will use the `small_train_dataset_id_here`, then whe...
Are you suggesting just taking the `read_and_process_file` function out of the `read_dataset` method?
Yes π
As for the second option, you mean create the task in the `__init__` method of the NetCDFReader class?
correct
It would be a great idea to make the Task picklable.
Adding that to the next version's to-do list 🙂
PanickyMoth78
Is it limited to accounts? (
unfortunately, yes 🙂, but I'm sure sales will be able to hook you up ...
BattyLizard6 to my knowledge the main issue with fractional GPUs is that there is no real restriction on GPU memory allocation (with the exception of MIG slices, which are limited in other ways).
Basically one process/container can consume the maximum GPU RAM on the allocated card (this also includes the http://run.ai fractional solution, at least from what I understand).
This means that developer A can allocate memory such that developer B on the same GPU will start getting out-of-memory errors
(Notice in a...
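As an aside, the closest thing to a software-side cap I know of is cooperative and per-process; a sketch assuming PyTorch >= 1.8 (which is exactly the limitation described above, since only the process that opts in is capped):
```
import torch

# cap this process at ~50% of device 0's memory; other processes are unaffected,
# so this is a cooperative convention, not an enforced fractional-GPU guarantee
torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```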
Hi SubstantialElk6
No need for that, you can use the helm chart (or spin them up once with kubectl); then they take care of scheduling by themselves.
You can also use the k8s glue (basically spinning kubernetes pods automatically for you, based on the Tasks that you push into the ClearML queue)
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
In short, two possible deployments:
Static k8s pod running the agent (then the agent runs all the experiments inside t...
Yea I know, I reported this
LOL, apologies, these days it's a miracle I still remember my login passwords 🙂
I think EmbarrassedSpider34 is correct.
When you pass the requirements to clearml-task, the agent, depending on how it was configured (conda / pip), will actually do the installation.
That said, maybe it is worth adding support for providing the env.yml in the CLI?
(Notice that adding specific channels needs to be configured on the agent; they are not stored per Task)
AlertCamel57 wdyt?
In order to facilitate the multiple credentials one must use the ClearML SDK, obviously.
Yes π
Oh that makes sense. This depends on how you set up the clearml k8s glue (because the resource allocation is done by k8s). A good hack to limit the number of containers per GPU is to set a RAM limitation per pod; then k8s will know to limit the number of pods on the same GPU machine.
wdyt?
Oh I see
but now I'm confused: if this is from code, why aren't you copying the Pipeline ID from the UI?
Regarding the query, it should be something like:
```
task_to_schedule = Task.get_task(project_name='MyProject/.pipelines/PipelineName', task_name='PipelineName')
```
what's the error/reply ?
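And once you have that task object, a sketch of one way to re-schedule it (`Task.clone` and `Task.enqueue` are standard SDK calls; the 'services' queue name is just an example):
```
from clearml import Task

task_to_schedule = Task.get_task(
    project_name='MyProject/.pipelines/PipelineName', task_name='PipelineName'
)
# clone the pipeline controller and push the copy into an execution queue
cloned = Task.clone(source_task=task_to_schedule)
Task.enqueue(cloned, queue_name='services')
```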
Nice π
@<1523710674990010368:profile|GreasyPenguin14> for future reference, the agent part in the clearml.conf is only created when you call clearml-agent init (no need for it for the python SDK). Full default configuration is here: None
no, I just commented it out and it worked fine
Yeah, we should add a comment saying "optional" because it looks as if you need to have it there if you are using Azure.
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm, where do you have your Task.init?
(btw: what's the use case of tracking a flask app?)
Then I deleted those workers,
How did you delete those workers? The autoscaler is supposed to spin the EC2 instances down when they are idle; in theory there is no need for manual spin-down.