Any idea why deletion of artifacts on my second fileserver does not work?
```
fileserver_datasets:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver-datasets
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver-datasets:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8082:8081"
```
ClearML successfu...
When I select many experiments, it will only delete some and show an error message that some could not be deleted. But if I only select a few, everything works fine.
Currently, my solution is to create an "agent-git" account; users can give read access to this account, which the clearml-agent then uses to clone. However, I find access tokens to be a better solution. Unfortunately, clearml-agent removes the token from the git URL.
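For completeness, this is roughly how such a token can be passed through the agent's clearml.conf instead of baking it into the URL (account name and token below are just placeholders):
```
# ~/clearml.conf on the agent machine -- values are placeholders
agent {
    git_user: "agent-git"
    git_pass: "my-personal-access-token"   # token used in place of a password
}
```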
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.
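For reference, the setting in question lives under agent.package_manager in clearml.conf; a minimal sketch:
```
# ~/clearml.conf on the agent machine
agent {
    package_manager {
        # if true, virtualenvs created by the agent also see the system python packages
        system_site_packages: false
    }
}
```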
I just tried the environment setup steps that clearml-agent does, locally but with my environment.yml instead of the one that ClearML generates.
And how do I specify this in the output_uri ? The default file server is specified by passing True . How would I specify to use the second?
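As far as I understand, output_uri also accepts a full URL string instead of True, and there is a config-level default as well; a sketch, with the host and port being an assumption based on the compose file above:
```
# ~/clearml.conf on the machine creating the task -- URL is an assumption
sdk {
    development {
        # default upload destination for models/artifacts, e.g. the second fileserver
        default_output_uri: "http://my-server:8082"
    }
}
```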
It seems like this is a bug however or is something like this to be expected? There shouldn't be files that are not shown in the WebUI..?
Artifact Size: 74.62 MB
Thanks for answering, but I still do not get it. file_history_size decides how many past files are shown? So if file_history_size=100 and I have 1 image/iteration and ran 1000 iterations, I will see images for iteration 900-1000?
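For context, the setting I mean is under sdk.metrics in clearml.conf; as I understand it, it limits how many debug-sample files are kept per metric/variant. A sketch with the value from my example:
```
# ~/clearml.conf
sdk {
    metrics {
        # number of debug-sample files kept per metric/variant before old ones are recycled
        file_history_size: 100
    }
}
```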
@<1576381444509405184:profile|ManiacalLizard2> Thank you, but afaik this only works locally and not if you run your task on a clearml-agent!
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist and it will work.
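For reference, a sketch of how a MinIO endpoint can be declared in clearml.conf so that s3:// URLs resolve (host and keys are placeholders):
```
# ~/clearml.conf -- placeholder credentials
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my_minio_instance:9000"   # non-AWS endpoint
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    secure: false                     # plain http
                    multipart: false
                }
            ]
        }
    }
}
```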
I don't think so. It is related to the issue with the clearml-server I posted in the other thread. Essentially the clearml-server hangs, then I restart it with docker-compose down && docker-compose up -d and the experiments sometimes show as running, but on the clearml-agents I see that actually nothing is running, or they show as aborted.
I know that usually clearml-agents do not abort on server restart and just continue.
At least when you use docker containers the agent will reuse the existing python environment.
```
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - /opt/clearml/data/fileserver:/mnt/fileserver
  depends_on:
    - redis
    - mongo
    - elasticsearch
    - fileserver
    - fileserver_datasets
  environment:
    CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
    CLEARML_...
```
Okay, thank you anyways. I was just asking because I thought I had seen such a setting before. Must have been something different.
I think in the paid version there is this configuration vault, so that the user can pass their own credentials securely to the agent.
Unfortunately, I do not know that. Must be before October 2021 at least. I know I asked here how to use the preinstalled version and AgitatedDove14 helped me get it to work. But I cannot find the old thread 😕
Thank you for answering. So your suggestion would be similar to VexedCat68's first idea, right?
Yes, that looks alright. Similar to before. Local execution works.
Thanks for your help again. I will just use detect_with_conda_freeze: true . Seems like a perfect solution for me!
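For anyone finding this later, the flag goes under sdk.development in clearml.conf; a minimal sketch (as I understand it, it records the packages from the active conda environment instead of a pip freeze):
```
# ~/clearml.conf on the machine where the task is created
sdk {
    development {
        # detect installed packages via conda rather than pip freeze
        detect_with_conda_freeze: true
    }
}
```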
Makes sense, but this means that we are not able to tell clearml-agent where to save on a per-task basis? I see the output_destination set correctly in clearml web interface, but as you say, clearml-agent always uses its api.fileserver ?
AnxiousSeal95 This bug seems to be affecting me. I just tried forcing clearml-agent to install clearml-agent==1.4.1 in the docker and now it works.
Btw: clearml-agent uses pip install clearml-agent -U to install clearml-agent in the docker. However, in my opinion it should use the version that the host machine is running to execute clearml-agent, instead of the newest one.
The debug samples? or the artifacts/models?
Both.
Yes, change the Task's output destination in the UI (or programmatically)
This has no effect. I am not able to change the files_server, e.g. I cannot change from None to None
If my files_server is None , it will always look there no matter what I set as output destination.
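For reference (the URLs above were redacted, placeholders here), the files_server I mean is the one in the api section of clearml.conf:
```
# ~/clearml.conf -- URLs are placeholders
api {
    web_server: "http://my-server:8080"
    api_server: "http://my-server:8008"
    files_server: "http://my-server:8081"   # where uploads end up in my case
}
```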
Or there should be an early error for trying to run conda based tasks on pip agents
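By "pip agents" I mean agents configured like this (a minimal sketch of the relevant clearml.conf section):
```
# ~/clearml.conf on the agent machine
agent {
    package_manager {
        type: pip   # or conda -- a conda-based task needs a conda agent
    }
}
```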
clearml==0.17.4
```
task dca2e3ded7fc4c28b342f912395ab9bc pulled from a238067927d04283842bc14cbdebdd86 by worker redacted-desktop:0
Running task 'dca2e3ded7fc4c28b342f912395ab9bc'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.vjg4k7cj.txt', '/tmp/.clearml_agent_out.vjg4k7cj.txt'
Current configuration (clearml_agent v0.17.1, location: /tmp/.clearml_agent.us8pq3jj.cfg):
agent.worker_id = redacted-desktop:0
agent.worker_name = redacted-desktop
agent.force_git_ssh...
```