Reputation
Badges 1
981 × Eureka!In all the steps I want to store them as artifacts to s3 because it’s very convenient.
The last step should merge them all, ie. it needs to know all the other artifacts of the previous steps
So I want to be able to visualise it quickly as a table in the UI and be able to download it as a dataframe, which of report_media or artifact is better?
I am now trying with agent.extra_docker_arguments: ["--network='host'", ] instead of what I shared above
No, they have different names - I will try to update both agents to the latest versions
SuccessfulKoala55 I was able to make it work with use_credentials_chain: true in the clearml.conf and the following patch: https://github.com/allegroai/clearml/pull/478
Ok yes, I get it, this info is also available at the very beginning of the logs, where the agent logs the full docker run command, this docker_cmd is a shorter version?
I have no idea what's going on
Now, I know the experiments having the most metrics. I want to downsample these metrics by 10, ie only keep iterations that are multiple of 10. How can I query (to delete) only the documents ending with 0?
Hi AgitatedDove14 , I upgraded to 1.3.1 and the bug of missing logs in the console is still there… 😞
I made another recording so that you can understand what it is about:
I enqueue a task the task starts, the logs shown in the console are very sparse I scroll up and down to try to fetch missing logs, without success I download the logs, open the file and there I see the full logs
Hi DeterminedCrab71 Version: 1.1.1-135 • 1.1.1 • 2.14
Thanks for the clarification SuccessfulKoala55 ! A follow-up question:
I would like to install several packages (opencv, numpy, torch) in the system-site-packages so that they are available in each experiment (to reduce setup time of the experiments). Installing them globally via
(docker was install with sudo snap install docker )
AgitatedDove14 Yes I have the xpack security disabled, as in the link you shared (note that its xpack.security.enabled: "false" with brackets around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
haa got it, I am on a self hosted server, that’s why I don’t see it
Hi CostlyOstrich36 , this weekend I took a look at the diffs with the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/ http://log_events_iterator.py apiserver/bll/event/ http://events_iterator.py apiserver/config/default/services/_mongo.conf apiserver/database/model/ http://base.py apiserver/services/ http://events.pyI suspect that one of these changes might be responsible ...
Well actually I do see many errors like that in the browser console:
So two possible cases for trains-agent-1: either:
It picks a new experiment -> show randomly one of the two experiments in the "workers" tab no new experiment in default queue to start -> show randomly no experiment or the one that it is running
by mistake I have two agents started in one machine
Answering myself: Yes, Task.set_base_docker RTFM!!!
You mean it will resolve by itself in the following days or should I do something? Or there is nothing to do and it will stay this way?
AgitatedDove14 I have a machine with two gpus and one agent per GPU. I provide the same trains.conf to both agents, so they use the same directory for caching venvs. Can it be problematic?
(BTW: it will work with elevated credentials, but probably not recommended)
What does that mean? Not sure to understand
So I need to have this merging of small configuration files to build the bigger one
CostlyOstrich36 I updated both agents to 1.1.2 and still go the same problem unfortunately. Since I can download the full log file from the Web UI, I guess the agents are reporting correctly?
Could it be that the elasticsearch does not return all the requested logs when it is queried from the WebUI to display it in the console?
Now that I think about it, I remember that on the changelog of the clearml-server 1.2.0 the following is listed:
` Fix UI Workers & Queues and Experiment Table pages ...