
clearml-agent repo please 🙂
Hi AstonishingSwan80 , what do you mean by "ec2 API"?
Hi TroubledJellyfish71
What do you have listed on the Task's execution "installed packages" section ? (of the original Task) ?
How did it end up with an http link of pytorch ?
Usually it would be torch==1.11
...
EDIT:
I'm assuming the original Task was executed on a Mac M1, what are you getting when calling pip freeze ?
And where is the agent running ? (and is it venv or docker mode?)
Hi @<1541954607595393024:profile|BattyCrocodile47>
Has anyone used ClearML for this use case?
you mean as experiment management / model registry / data? I think this is the bread&butter of clearml 🙂
regarding the other options on the list, I think most of them are alternatives to metaflow, not covering the parts you mentioned, no?
Can I change the parameters before executing the draft task
Yes you can, after you clone the experiment everything becomes editable, so you can edit the config in the UI.
For example, let's assume I have config.yml, and in my code I do:
my_file = task.connect_configuration('config.yml')
with open(my_file, 'rt') as f:
    ...
Then after I clone it in the UI and edit the configuration, when it is executed remotely, my_file
will contain the content of the configuration as s...
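Putting the pieces together, a minimal sketch of that flow (project/file names here are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")  # illustrative names

# Running locally this returns the path to the local config.yml;
# when the cloned task runs remotely via an agent, it returns a local copy
# containing whatever was edited in the UI.
my_file = task.connect_configuration('config.yml')

with open(my_file, 'rt') as f:
    print(f.read())
```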
I'm with you on this one 🙂 it's better to make a company-wide decision on these things and not allow too much flexibility (just two options to choose from, and it should be enough, I think)
The other way will not work, because if you start with "pip" you cannot fail ... (if you fail, it's at runtime, which is too late)
UnevenDolphin73 sounds great, any chance you can open a git issue on clearml-agent repo for this feature request ?
Oh no 😞 I wonder if this is connected to:
Any chance the logger is running (or you have) from a subprocess ?
Hi JoyousElephant80
Another possibility would be to run a process somewhere that periodically polls ClearML Server for tasks that have recently finished
this is the easiest way to implement what you are after, and have full control over the logic itself.
Basically you inherit from the Monitor class
And implement the callback function:
https://github.com/allegroa...
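A rough sketch of what that might look like (hedged: this follows the SlackMonitor example in the clearml repo; exact method names, and the type of the object passed to the callback, may differ between clearml versions):
```python
from clearml.automation.monitor import Monitor


class FinishedTaskMonitor(Monitor):
    # Callback invoked for every task the monitor detects as recently finished.
    # Note: depending on the clearml version, `task` may be a lightweight API
    # object rather than a full clearml.Task instance.
    def process_task(self, task):
        print("Task finished:", task.id, getattr(task, "name", ""))
        # trigger your own logic here (webhook, Slack message, etc.)


if __name__ == "__main__":
    monitor = FinishedTaskMonitor()
    # Poll the ClearML server periodically; check the Monitor docstring for the
    # exact units/semantics of pool_period in your version.
    monitor.monitor(pool_period=60.0)
```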
I pull all the parameters, and then manually filter on the HP keys (manually=I have to plug them in, they are not part of optimizer object)
So is this an improvement to the optimizer._get_child_tasks_ids(...) interface ?
e.g. return a structure like:
[
  {
    'id': task_id,
    'hp1': value, 'hp2': value, 'hp3': value,
    'objective': dict(title='title', series='series', value=42),
  },
]
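For reference, the "manual" approach described above could look roughly like this (a hedged sketch; the HP key names are placeholders you would have to plug in yourself):
```python
from clearml import Task


def child_task_hparams(optimizer_task_id, hp_keys=("General/hp1", "General/hp2")):
    # List the optimizer's child tasks and keep only the HP keys we care about.
    # hp_keys are placeholders - they are not derived from the optimizer object.
    children = Task.get_tasks(task_filter={"parent": optimizer_task_id})
    rows = []
    for child in children:
        params = child.get_parameters()  # flat dict, e.g. {"General/hp1": "0.1", ...}
        rows.append({"id": child.id, **{k: params.get(k) for k in hp_keys}})
    return rows
```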
That is odd ...
Could you open a GitHub issue?
Is this on any upload, how do I reproduce it ?
So clearml server already contains an authentication layer (JWT Token), and you do have a full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...
TrickySheep9
you are absolutely correct 🙂
I guess the thing that's missing from offline execution is being able to load an offline task without uploading it to the backend.
UnevenDolphin73 you mean, as in getting the Task object from it?
(This might be doable, the main issue would be the metrics / logs loading)
What would be the use case for the testing ?
odd message though ... it should have said something about boto3
Yes, sorry, that wasn't clear 🙂
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific python version it will first try to find this one (i.e. which python3.7), then use it to create the new venv
As a result -> Could the agent maybe also output the virtualenv version used ...
is there a built-in programmatic way to adjust development.default_output_uri ?
How about setting it directly in your Task.init(output_uri='...') call?
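i.e. something along these lines (the destination URI below is just a placeholder):
```python
from clearml import Task

# output_uri overrides development.default_output_uri for this task,
# controlling where models/artifacts are uploaded
task = Task.init(
    project_name="examples",
    task_name="output uri demo",
    output_uri="s3://my-bucket/clearml",  # placeholder destination
)
```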
Hi ColossalAnt7
Following on SuccessfulKoala55 answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mount the configuration file (the one holding the user/pass) into the pod from a persistent volume .
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
Hi LovelyHamster1 ,
you mean totally ignore the "installed packages" section, and only use the requirements.txt ?
E.g. I'm creating a task using clearml.Task.create, and often it doesn't get the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically, is there a reason not to use Task.init ?
HugeArcticwolf77 you can add --services-mode
to the agent, and it will basically keep on spinning Tasks in parallel (unfortunately the open source version does not include a way to limit it to a maximum of concurrent Tasks)
Notice you should be able to override them in the UI (under the Args section)
I think we added it somewhere in 0.14, anyhow I just checked the Logger doc, it is there now 🙂
Remove this from your startup script:
#!/bin/bash
there is no need for that, it actually "marks out" the entire thing
Hi @<1655744373268156416:profile|StickyShrimp60>
My hydra OmegaConf configuration object is not always being picked up, and I am unable to consistently reproduce it.
... I am using clearml v1.14.4,
Hmm, how can we reproduce it? What are you seeing when it does "miss" the hydra config, i.e. are you seeing any Hydra section at all? How are you running the code (manually, or with an agent)?
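If it helps, a minimal repro script could look something like this (assuming a conf/config.yaml next to the script; all names here are illustrative):
```python
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task


# version_base requires hydra>=1.2; drop it for older hydra versions
@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Task.init is expected to pick up the Hydra/OmegaConf config automatically
    task = Task.init(project_name="examples", task_name="hydra repro")
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```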
JitteryCoyote63
So there will be no concurrent cached files access in the cache dir?
No concurrent creation of the same entry 🙂 It is optimized...
This is very odd ... let me check something
Do you happen to know if there are any plans for an implementation with the logger variable, so that if needed it would be possible to write to different tables?
CheerfulGorilla72 what do you mean by "an implementation with the logger variable"? pytorch-lightning defaults to the TB logger, which clearml will automatically catch and log into the clearml-server, and you can always add additional logs with the clearml interface Logger.current_logger().report_???
What am I mis...
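For example, one of the Logger report_* calls mentioned above (report_scalar shown here; the titles and values are illustrative):
```python
from clearml import Logger

# report an extra scalar alongside whatever pytorch-lightning / TB already logs
Logger.current_logger().report_scalar(
    title="my_extra_metric",
    series="validation",
    value=0.93,
    iteration=10,
)
```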