There is no way to filter on long types? I can’t believe it
Yes, it works now! Yay!
and saved locally, which is why the second task, not executed on the same machine, cannot access the file
That’s how I would do it, maybe the guys from allegro-ai can come up with a better approach 👍
Ok thanks! And for this?
Would it be possible to support such a use case? (i.e. have the clearml-agent set up a different python version when a task needs it?)
Sure, it’s because of a very annoying bug that I shared in this https://clearml.slack.com/archives/CTK20V944/p1648647503942759 , which I haven’t been able to solve so far.
I’m not sure you can downgrade that easily ...
Yea that’s what I thought, that’s a bit of a pain for me now, I hope I can find a way to fix the bug somehow
That’s why I said “not really” 😄
Thanks for your input TenseOstrich47, I am now considering using a secret manager, I guess that’s the best option. I can move the secrets wherever I need them to be to make it work 🙂
Hi SmugDolphin23, thanks for the input! Will try now, but that seems hacky: to get it working I have to specify python3.8 in two places:
once in the agent config file (agent.default_python is already python3.8, but it seems to be ignored) + by making sure it is available (using the python:3.8 docker image). Is there a way to prevent this redundancy? I.e. if I want to change the python version, can I control it from a single place?
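For reference, this is roughly what I mean by the two places in clearml.conf (a minimal sketch; the layout and values are assumed from my setup, it is not a single-place solution):
` agent {
    # interpreter the agent should use when building the task environment
    default_python: "3.8"

    # docker image that actually provides that interpreter in docker mode
    default_docker {
        image: "python:3.8"
    }
} `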
SuccessfulKoala55 They do have the right file path, e.g.: https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png
Try to spin up an instance of that type manually in that region to see if it is available
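For example, something along these lines with boto3 (a rough sketch; the region, AMI and instance type are placeholders, and it assumes AWS credentials are already configured):
` import boto3

# placeholders - use the region / AMI / instance type your autoscaler is configured with
ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-xxxxxxxx",
    InstanceType="g4dn.xlarge",
    MinCount=1,
    MaxCount=1,
)
# if the type is not available in that region, the call typically fails
# with an error such as InsufficientInstanceCapacity `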
Thanks! Unfortunately still not working, here is the log file:
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I’d maintain a client python package that takes care of the pre/post processing of each request, so that I only send the raw data to the inference service and I post process the raw output of the model returned by the inference service. But I understand why it might be desirable for the users to have these steps happening on the server. What is challenging in this context? Defining how t...
I am already trying with the latest pip 😞
AgitatedDove14 I eventually found a different way of achieving what I needed
And now that I restarted the server and went back into the project where I initially deleted the archived experiments, some of them are still there - I will leave them alone, too scared to do anything now 😄
AgitatedDove14 So what you are saying is that since I have trains-server 0.16.1, I should use trains>=0.16.1? And what about trains-agent? Only version 0.16 is released atm, and that’s the one I use
AgitatedDove14 yes, but I don't see in the docs how to attach it to the logger of the EarlyStopping handler
super, thanks SuccessfulKoala55 !
I didn’t use ignite callbacks, for future reference:
` from ignite.engine import Events
from ignite.handlers import EarlyStopping

early_stopping_handler = EarlyStopping(...)

def log_patience(_):
    # report how much patience the early stopping handler has used so far
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)
engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience) `
Ok, this I cannot locate
So previous_task actually ignored the output_uri
SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of 320M documents in events-training_debug_image-xyz . I would now like to understand which experiments contain most of the documents, so I can delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?
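Something like this terms aggregation is what I have in mind (a rough sketch; it assumes the event documents store the experiment ID in a "task" field and that ES is reachable on localhost, and the index name is the placeholder from above):
` import requests

# count documents per experiment with a terms aggregation
# the "task" field name and the localhost URL are assumptions - adjust to your deployment
query = {
    "size": 0,
    "aggs": {
        "docs_per_experiment": {
            "terms": {"field": "task", "size": 50, "order": {"_count": "desc"}}
        }
    },
}
resp = requests.post(
    "http://localhost:9200/events-training_debug_image-xyz/_search",
    json=query,
)
for bucket in resp.json()["aggregations"]["docs_per_experiment"]["buckets"]:
    print(bucket["key"], bucket["doc_count"]) `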
ubuntu18.04 is actually 64MB, I can live with that 😛