Oh :) task.get_parameters_as_dict()
task.connect is two-way, it does everything for you:
base_params = dict(param1=123, param2='text')
task.connect(base_params)
print(base_params)
If you run this code manually, then print shows exactly what you initialized base_params with. But when the agent is running it, it will take the values from the UI (including casting to the correct type), so print will show the values/types from the UI.
Make sense?
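To make the "casting to the correct type" part concrete, here is a rough emulation in plain Python. This is not ClearML's implementation; apply_ui_values and the ui_values dict are hypothetical names, and I am assuming the UI hands values back as strings.

```python
def apply_ui_values(params, ui_values):
    """Overwrite entries of `params` in place with values coming from a UI.

    Hypothetical emulation: each UI value is assumed to arrive as a string
    and is cast to the type of the default it replaces.
    """
    for key, raw in ui_values.items():
        if key not in params:
            continue  # ignore keys the code never declared
        default = params[key]
        if isinstance(default, bool):
            # bool("False") is True, so booleans need explicit parsing
            params[key] = raw.strip().lower() in ("true", "1", "yes")
        elif default is None:
            params[key] = raw  # no type to cast to, keep the string
        else:
            params[key] = type(default)(raw)
    return params


base_params = dict(param1=123, param2='text')
apply_ui_values(base_params, {"param1": "456"})
print(base_params)
```

Because the dictionary is updated in place, the code that follows keeps using base_params and never needs to know whether the values came from the script or from the UI.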
No worries, let's assume we have:
base_params = dict(
    field1=dict(param1=123, param2='text'),
    field2=dict(param1=123, param2='text'),
    ...
)
Now let's just connect field1:
task.connect(base_params['field1'], name='field1')
That's it :)
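If you want to connect more than one sub-dictionary, the same call can be put in a small loop. A minimal sketch, assuming `task` is the object returned by Task.init(); connect_sections is a hypothetical helper, not part of ClearML.

```python
def connect_sections(task, base_params, section_names):
    """Connect only the chosen sub-dictionaries, each under its own name.

    `task` is assumed to be a ClearML Task; only task.connect(dict, name=...)
    is used, exactly as in the one-liner above.
    """
    for section in section_names:
        task.connect(base_params[section], name=section)
    return base_params


# Usage (needs a configured ClearML setup, so it is left commented out):
# from clearml import Task
# task = Task.init(project_name="examples", task_name="partial connect")
# connect_sections(task, base_params, ["field1"])
```

Sections that are not listed are simply left alone and stay plain Python dictionaries.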
Just wanted to know how many people are actively working on clearml.
Probably 30+ :)
ReassuredTiger98 are you afraid of a lack of support? Or are you offering some (it is always welcome)?
We are always looking for additional talented people :) DM me...
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to /etc/security/limits.conf
Yep, seems like an Elastic memory issue, but I think the helm chart takes care of it.
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
I'm sorry, wrong line reference.
I'm assuming the error is due to a missing ulimit:
try adding 16777216 to both the soft and hard ulimit
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58
Hi ReassuredTiger98
When clearml is running inside the docker the installed packages of the WebUI get updated.
Yes, this is by design, so the agent can always reproduce the exact python environment.
(Internally, the original requirements are also stored, but they are not available in the UI.)
What exactly is the use case here? Wouldn't it make sense to reproduce the entire working environment when you clone the executed Task?
clearml will register conda packages that cannot be installed if clearml-agent is configured to use pip. So although it is nice that a complete package list is tracked, it makes it cumbersome to rerun the experiment.
Yes, mixing conda & pip is not supported by clearml (or by conda or pip, for that matter).
Even Python package version numbers might not exist on both.
We could add a flag not to update back the pip freeze, it's an easy feature to add. I'm just wondering about the exact use case.
preinstalled in the environment (e.g. nvidia docker). These packages may not be available via pip, so the run will fail.
Okay, that's the part that I'm missing: how come the packages existed in the first run and are missing in the cloned Task? I'm assuming the agents are configured basically the same (i.e. docker mode with the same network access). What did I miss here?
Hi ReassuredTiger98
Could you send the logs of both runs?
(I'm not sure if this is a bug or some misconfiguration, but the scenario should have worked...)
Then, when run a second time, the task will contain the requirements of the (conda) environment from the first run.
What you see in the log under "Summary - installed python packages:" will be exactly what is updated on the Task. But it does not contain the "ruamel_yaml_conda" package, this is what I cannot get...
But I did find this part:
ERROR: conda 4.10.1 requires ruamel_yaml_conda>=0.11.14, which is not installed.
Which points to conda needing this package and then failing to i...
ReassuredTiger98 both are running with pip as the package manager. I thought you mentioned conda as the package manager, no?
agent.package_manager.type = pip
Also, the failed execution is looking for "ruamel_yaml_conda", but it is nowhere to be found in the original one?! How is that possible?
Hi SpicyLion54
The -f flag is not very stable for pip (and cannot be added in requirements.txt). The ClearML agent will automatically find the correct torch (from the torch repository) based on the CUDA version it detects at runtime.
This means it automatically translates torch==1.8.1 and will pull from the correct repo based on the torch support table.
Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secret pairs for the same user)
after generating a fresh set of keys
When you have a new set, copy-paste them directly into 'clearml.conf' (they should be at the top, can't miss it).
Could it be that you have old OS environment variables overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
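A quick way to check for stale environment variables is to list the ClearML-related ones that are set in the shell running your code. A minimal sketch; find_clearml_overrides is a hypothetical helper, and the variable list is my assumption of the ones most likely to shadow the conf file.

```python
import os

# Environment variables assumed to take precedence over clearml.conf
CLEARML_ENV_VARS = (
    "CLEARML_CONFIG_FILE",
    "CLEARML_API_HOST",
    "CLEARML_WEB_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def find_clearml_overrides(environ=None):
    """Return the names (never the values) of ClearML variables that are set."""
    environ = os.environ if environ is None else environ
    return [name for name in CLEARML_ENV_VARS if environ.get(name)]


print("ClearML variables set in this shell:", find_clearml_overrides())
```

Only the names are printed, so the output is safe to paste into the thread without leaking a secret key. If the list is not empty, unset those variables and try again.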
time.sleep(time_sleep)
You should not call time.sleep in async functions, it should be asyncio.sleep.
See if that makes a difference
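To illustrate why it matters, here is a small standalone sketch (the coroutine names and the 0.2 second delay are made up for the example). With asyncio.sleep both coroutines wait at the same time; with time.sleep the first one would block the event loop and the total would be roughly the sum of the delays.

```python
import asyncio
import time


async def wait_and_report(name, delay):
    # asyncio.sleep suspends only this coroutine and hands control back
    # to the event loop; time.sleep(delay) here would block everything.
    await asyncio.sleep(delay)
    return name


async def run_both(delay=0.2):
    start = time.monotonic()
    # Both coroutines sleep concurrently
    results = await asyncio.gather(
        wait_and_report("a", delay),
        wait_and_report("b", delay),
    )
    return results, time.monotonic() - start


results, elapsed = asyncio.run(run_both())
print(results, f"{elapsed:.2f}s")
```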
Hi UnsightlyShark53, just a quick FYI, you can also log the entire config file (config.json).
This will be stored as the model configuration, and you can see it in the input/output models under the artifacts tab.
See the example here. You can pass either the path to the configuration file or the dictionary itself after you have loaded the JSON, whichever is more convenient :)
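For reference, something along these lines should work. A minimal sketch, assuming `task` comes from Task.init() and that task.connect_configuration is the call in question; log_config is a hypothetical wrapper.

```python
import json


def log_config(task, config_path, name="config"):
    """Load a JSON config file and register it on the task.

    Assumes `task` exposes connect_configuration(configuration, name=...).
    Passing `config_path` itself instead of the dict is the other option.
    """
    with open(config_path) as f:
        config = json.load(f)
    # Use the returned object from here on, so values edited in the UI
    # are picked up when an agent runs the task.
    return task.connect_configuration(config, name=name)


# Usage (needs a configured ClearML setup, so it is left commented out):
# from clearml import Task
# task = Task.init(project_name="examples", task_name="log config")
# config = log_config(task, "config.json")
```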
ContemplativeGoat37
1. It seems the DNS resolving to the server fails? (Temporary failure in name resolution)
2. Is this running on an agent, or manually?
3. "clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###" Is this you manually aborting the Task, or is it aborting itself due to the connectivity?
4. What are the clearml/clearml-agent versions?
Are there any references (vlog/blog) on deploying a real-time model and building a continuous training pipeline in ClearML?
Something along the lines of this one ?
https://clear.ml/blog/creating-a-fully-automatic-retraining-loop-using-clearml-data/
Or this one?
https://www.youtube.com/watch?v=uNB6FKIi8Wg
DilapidatedDucks58 I'm assuming clearml-server 1.7 ?
I think both are fixed in 1.8 (due to be released either next week or the one after).
You will have to build your own docker image based on that docker file, and then update the docker compose
Hi SweetShells3
Are you using the standard docker-compose? Are you using the default elastic container?
What exactly changed ?