AgitatedDove14 I have now tested with a real experiment. It works, but I saw two issues:
1. It first doesn't detect torch, downloads it, but then says that it is already installed, so it doesn't install it.
2. One of the dependencies of my repository is another repository (repo-2 in the logs). Both my repositories require numpy. When installing the first repository, it says Requirement already satisfied: numpy in /home/workeruser/.local/lib/python3.6/site-packages. Correct. But then it says `...
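When pip reports "Requirement already satisfied" for a path under ~/.local, the package is coming from the per-user site directory rather than from the environment the agent built. The sketch below is a quick way to confirm where a module would be imported from. It uses only the standard library and also runs on Python 3.6, the worker's version. `where_installed` and `is_in_user_site` are hypothetical helper names, not part of ClearML or pip.

```python
import importlib.util
import site
from pathlib import Path
from typing import Optional


def where_installed(module_name: str) -> Optional[Path]:
    """File a top-level module would be imported from, or None if not found / built-in."""
    spec = importlib.util.find_spec(module_name)  # does not import the module itself
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve()


def is_in_user_site(path: Path) -> bool:
    """True if `path` lives under the per-user site-packages (~/.local/lib/pythonX.Y/site-packages)."""
    user_site = Path(site.getusersitepackages()).resolve()
    try:
        path.resolve().relative_to(user_site)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    loc = where_installed("numpy")
    if loc is None:
        print("numpy is not importable from this interpreter")
    else:
        print(f"numpy is imported from {loc} (user site: {is_in_user_site(loc)})")
```

Running it with the interpreter the agent uses shows whether numpy resolves to the user site or to the agent's environment.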
Hi @SuccessfulKoala55, I was able to find the issue: I was creating a queue and a worker subprocess that were not properly cleaned up.
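For reference, here is a generic pattern for tearing down a `multiprocessing` queue and its worker subprocess: send a sentinel, join the worker (terminating it only as a last resort), then close the queues and join their feeder threads. This is a minimal standard-library sketch, not the code from the original issue. `worker`, `run_jobs` and the doubling "job" are hypothetical placeholders.

```python
import multiprocessing as mp

SENTINEL = None  # tells the worker to stop

# "fork" keeps this sketch self-contained on Linux; elsewhere use the default
# start method and keep the entry point under `if __name__ == "__main__":`
CTX = mp.get_context("fork")


def worker(jobs, results) -> None:
    # Consume jobs until the sentinel arrives, then return so the process exits
    for item in iter(jobs.get, SENTINEL):
        results.put(item * 2)


def run_jobs(items):
    jobs, results = CTX.Queue(), CTX.Queue()
    proc = CTX.Process(target=worker, args=(jobs, results))
    proc.start()
    try:
        for item in items:
            jobs.put(item)
        # Read all results before joining, so the worker never blocks on a full pipe
        out = [results.get() for _ in items]
    finally:
        jobs.put(SENTINEL)
        proc.join(timeout=10)
        if proc.is_alive():  # last resort: do not leave an orphan process behind
            proc.terminate()
            proc.join()
        for q in (jobs, results):
            q.close()        # no more puts from this process
            q.join_thread()  # wait for the feeder thread to flush
    return out


if __name__ == "__main__":
    print(run_jobs([1, 2, 3]))  # → [2, 4, 6]
```

The `try`/`finally` matters: the worker and the queues are released even if collecting results raises.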
For the moment this is what I would be inclined to believe
Sure! Opened https://github.com/allegroai/clearml/issues/568
Still investigating: task.data.last_iteration is correct (equal to engine.state["iteration"]) when I resume the training.
I am not using hydra, I am reading the conf with:
config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict))
Also tried task.get_logger().report_text(str(task.data.hyperparams))
-> AttributeError: 'Task' object has no attribute 'hyperparams'
As to why: this is part of the pipeline I described in a previous message. Task B requires an artifact from task A, so I pass the name of the artifact as a parameter of task B, so that B knows which artifact from A it should retrieve.
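The hand-off described above can be modelled without a server. Task B receives the producing task's ID and the artifact name as parameters and looks the artifact up by name. In ClearML the lookup would roughly be `Task.get_task(task_id=...).artifacts[name].get()`, and the producer side `task.upload_artifact(name, obj)`. The in-memory `ARTIFACT_STORE`, `TaskBParams` and the task functions below are hypothetical stand-ins, so treat this as a sketch of the pattern rather than ClearML code.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical stand-in for artifact storage: task_id -> {artifact_name: object}
ARTIFACT_STORE: Dict[str, Dict[str, Any]] = {}


def task_a(task_id: str) -> None:
    # Producer: registers its output under a known artifact name
    ARTIFACT_STORE.setdefault(task_id, {})["train_split"] = [1, 2, 3]


@dataclass
class TaskBParams:
    # Parameters of task B: which task produced the input, and under which name
    source_task_id: str
    artifact_name: str


def task_b(params: TaskBParams) -> int:
    # Consumer: fetches the artifact named in its parameters, failing loudly if absent
    artifacts = ARTIFACT_STORE.get(params.source_task_id, {})
    if params.artifact_name not in artifacts:
        raise KeyError(
            f"task {params.source_task_id!r} has no artifact {params.artifact_name!r}"
        )
    return sum(artifacts[params.artifact_name])


task_a("task-a-001")
print(task_b(TaskBParams("task-a-001", "train_split")))  # → 6
```

Passing both the source task ID and the artifact name keeps task B independent of how task A was launched.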
Ha, nice! Where can I find the original ClearML mapping template so that I can copy and adapt it?
Also, I can simply delete the /elastic_7 folder; I don't use it anymore (I have a remote ES cluster). In that case, I guess I would have enough space?
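Whether deleting a folder frees enough space can be checked before acting. Sum the sizes of the files under the directory and add them to the free space that `shutil.disk_usage` reports for its volume. This is a minimal standard-library sketch. `dir_size_bytes` and `free_after_delete` are hypothetical helper names, and the result is approximate because file sizes are not the same as allocated disk blocks.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Total size of regular files under `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def free_after_delete(path: str) -> int:
    """Approximate free bytes on the volume holding `path` if `path` were deleted."""
    return shutil.disk_usage(path).free + dir_size_bytes(path)
```

For example, `free_after_delete("/path/to/elastic_7") / 1024**3` gives the estimate in GiB. The path is a placeholder for wherever the folder lives on the server.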
Sorry, I refreshed the page and it's gone.
On the cloned experiment, which is created in draft mode by default, you can change the commit to point to either a specific commit or the latest commit of the branch.
Well, no luck: using matplotlib.use('agg') in my training codebase doesn't solve the memory leak.
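Switching the backend did not help, so one way to narrow down where the memory goes is to compare `tracemalloc` snapshots taken a number of iterations apart. The allocation sites that keep growing are the leak candidates. This is a minimal standard-library sketch. `train_step` is a hypothetical placeholder that leaks on purpose. Note that `tracemalloc` only sees memory allocated through Python's allocators, so memory allocated directly by C extensions may not show up.

```python
import tracemalloc

_leak = []  # hypothetical leak: grows on every step


def train_step(step: int) -> None:
    # Placeholder for one training iteration that (wrongly) keeps a reference around
    _leak.append(bytearray(100_000))


def top_growth(steps: int = 20, limit: int = 3):
    """Run `steps` iterations and return the allocation sites that grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for step in range(steps):
        train_step(step)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to sorts by the absolute size difference, biggest first
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth():
        print(stat)
```

In a real run, `train_step` would be replaced by the actual iteration, and the printed file and line numbers point at where the growth is allocated.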
Could you please point me to the relevant component? Unfortunately, I am not familiar with TypeScript.
Hi SuccessfulKoala55, here it is: https://github.com/allegroai/clearml-server/issues/100
Yes, but I am not certain how: I just deleted the /data folder and restarted the server
What is weird is:
- Executing the task from an agent: task.get_parameters() returns an empty dict.
- Calling task.get_parameters() from a local standalone script returns the correct properties, as shown in the web UI, even if I updated them in the UI.
So I guess the problem comes from trains-agent?
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
Sure, where can I find this file?
I wouldn't do it: it is less code to maintain on your side, and honestly, too much auto-magic makes it difficult for the user to control the environment (i.e., to understand what happens behind the scenes). I am not sure what switching back would solve: here the wheel should have been correct; it's just that the card's architecture is incompatible.
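The "wheel is correct but the card is not" situation can be checked programmatically. PyTorch exposes the architectures a build was compiled for (`torch.cuda.get_arch_list()`) and the GPU's compute capability (`torch.cuda.get_device_capability()`). The helper below sketches the matching logic in simplified form and is kept torch-free so it runs anywhere. It assumes arch strings look like `sm_86` or `compute_86`, treats `sm_XY` as usable on the same major version with an equal or higher minor, and treats `compute_XY` (PTX) as JIT-compilable on any equal or newer capability. Arch-specific variants such as `sm_90a` are ignored for simplicity.

```python
import re
from typing import Iterable, Optional, Tuple

_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)$")


def parse_arch(arch: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """'sm_86' -> ('sm', (8, 6)); None for strings this sketch does not handle."""
    m = _ARCH_RE.match(arch)
    if m is None:
        return None
    digits = m.group(2)
    return m.group(1), (int(digits[:-1]), int(digits[-1]))


def build_supports(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """Rough check whether a build compiled for `arch_list` can run on `capability`."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, (a_major, a_minor) = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary (SASS) compatible within the same major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for equal or newer GPUs
    return False
```

With torch installed, the call would be `build_supports(torch.cuda.get_arch_list(), torch.cuda.get_device_capability(0))`.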
I did that recently - what are you trying to do exactly?
If the reporting is done in a subprocess, I can imagine that the task.set_initial_iteration(0) call is only effective in the main process, not in the subprocess used for reporting. Could that be the case?
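The hypothesis can be illustrated with plain `multiprocessing`. A value changed in the main process after a subprocess has started is not seen by that subprocess, because the child works on its own copy of the module state. In this sketch `INITIAL_ITERATION` is a hypothetical global standing in for whatever `task.set_initial_iteration(0)` changes internally. It does not show how ClearML actually implements its reporting. The `fork` start method is used so that the child inherits the parent's state at the moment it starts (Linux/macOS only).

```python
import multiprocessing as mp

# Hypothetical stand-in for an "initial iteration" setting held in module state
INITIAL_ITERATION = 100


def reporter(conn) -> None:
    # Runs in the child: reports the value of its own copy of the global
    conn.send(INITIAL_ITERATION)
    conn.close()


def value_seen_by_child(set_before_start: bool) -> int:
    global INITIAL_ITERATION
    INITIAL_ITERATION = 100
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    if set_before_start:
        INITIAL_ITERATION = 0  # the child is forked after this, so it inherits 0
    proc = ctx.Process(target=reporter, args=(child_conn,))
    proc.start()
    if not set_before_start:
        INITIAL_ITERATION = 0  # too late: the child already has its own copy
    value = parent_conn.recv()
    proc.join()
    return value


if __name__ == "__main__":
    print(value_seen_by_child(set_before_start=True))   # → 0
    print(value_seen_by_child(set_before_start=False))  # → 100
```

If the reporting subprocess behaves like this, the offset would have to be set before it starts, or passed to it explicitly.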
I now have a different question: when installing torch from wheel files, am I guaranteed to get the corresponding CUDA library and cuDNN together?
Not of the ES cluster; I only created a backup of the clearml-server instance disk. I didn't think there could be a problem with ES...
Yeah, again, I am trying to understand what I can do with what I have. I would like to be able to export, as an environment variable, the location of the Python environment the agent installs into. That way, an app I use inside the Task can use the Python packages installed by the agent, and I can easily control the packages using ClearML.
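One way to hand the agent's environment to another app started from inside the Task is to read it from the running interpreter (`sys.executable`, `sys.prefix`) and pass it down through environment variables. This is a minimal standard-library sketch. `AGENT_PYTHON` and `AGENT_VENV` are hypothetical variable names, and the sketch assumes the app honours `PATH` or whichever variables you choose.

```python
import os
import subprocess
import sys
from typing import Dict, List


def agent_env() -> Dict[str, str]:
    """Copy of os.environ that points child processes at this interpreter's environment."""
    env = dict(os.environ)
    env["AGENT_PYTHON"] = sys.executable  # hypothetical name: interpreter path
    env["AGENT_VENV"] = sys.prefix        # hypothetical name: environment root
    # Put the interpreter's directory first so `python`/`pip` resolve to this environment
    env["PATH"] = os.path.dirname(sys.executable) + os.pathsep + env.get("PATH", "")
    return env


def run_app(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with the agent environment exported; raises if the command fails."""
    return subprocess.run(cmd, env=agent_env(), check=True, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_app([sys.executable, "-c", "import os; print(os.environ['AGENT_VENV'])"])
    print(result.stdout.strip())
```

Because the values come from the interpreter that is actually running the Task, they follow whatever environment the agent created, with nothing hard-coded.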
Yes, exactly: I run python my_script.py, the script executes, creates the task, calls task.remote_execute(exit_process=True) and returns to bash. Then, after some time, I see some messages from clearml being logged in the bash console.
yes that makes sense, I will do that. Thanks!
It actually looks like I don't need such a high number of files open at the same time.
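To compare how many files a process actually holds open with its limit, the standard library is enough on Linux. `resource.getrlimit(resource.RLIMIT_NOFILE)` returns the soft and hard limits, and `/proc/self/fd` lists the open descriptors. This is a Linux-only sketch. `open_fd_count` and `nofile_limits` are hypothetical helper names.

```python
import os
import resource
from typing import Optional, Tuple


def open_fd_count() -> Optional[int]:
    """Number of file descriptors currently open in this process (Linux only)."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None  # not Linux, or /proc is not mounted
    # The directory listing typically includes the descriptor used to read it
    return len(os.listdir(fd_dir))


def nofile_limits() -> Tuple[int, int]:
    """(soft, hard) limit on open files; resource.RLIM_INFINITY means unlimited."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open: {open_fd_count()}  soft limit: {soft}  hard limit: {hard}")
```

Sampling `open_fd_count()` during a run shows the peak number of open files, which can then be compared with the soft limit.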