Hi @<1576381444509405184:profile|ManiacalLizard2> , is there a specific reason you're running the agent inside a docker container instead of running the agent in docker mode, which would make it spin up a container itself?
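For reference, docker mode is just a flag on the agent daemon; a minimal sketch (the queue name and image are placeholders):
clearml-agent daemon --queue default --docker nvidia/cuda:11.8.0-base-ubuntu22.04  # spins up a container per task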
StaleButterfly40 , it looks like there might be a good solution for your request. In the previous link I provided, there is a parameter 'continue_last_task' that should work for you 🙂
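For example, a minimal sketch (project/task names are placeholders):
from clearml import Task

# reuse and continue the previously executed task instead of creating a new one
task = Task.init(project_name='examples', task_name='my task', continue_last_task=True)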
Is there a specific reason you would want them executed on the same machine? Cache?
From the error you provided, it looks like virtualenv isn't installed in the environment.
Hi @<1558986867771183104:profile|ShakyKangaroo32> , can you please elaborate on what is happening? So you're taking an existing task that finished and forcing it back into a 'started' state, occasionally writing things to it, and then later 'reviving' it again? And due to this some artifacts appear to be missing?
Hi @<1523701260895653888:profile|QuaintJellyfish58> , I think what you're referring to is caching steps - if nothing changed code- or configuration-wise, then the outputs from the previous pipeline run are reused. Is that what you're looking for?
I think you need to pass some pythonic object to torch.save(), as described in its documentation:
https://pytorch.org/docs/stable/generated/torch.save.html
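For example, something along these lines (the model is just an illustration):
import torch

model = torch.nn.Linear(4, 2)
# torch.save() expects a pythonic object such as a state dict, not a raw path or file handle
torch.save(model.state_dict(), 'model.pt')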
@<1601023807399661568:profile|PompousSpider11> , if you know it needs to be installed, I guess you could even inject it as a bash script if running in docker mode, or just have it already pre-installed. Regardless, clearml-agent will attempt to resolve torch installs.
Hi @<1615881718445641728:profile|EnchantingSeaturtle2> , what version of clearml are you using? Are you running the server yourself or using the community server?
@<1523701260895653888:profile|QuaintJellyfish58> , check the cache_executed_step parameter in pipeline steps.
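A minimal sketch (project/task names are placeholders):
from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0.0')
# if the step's code and configuration are unchanged, reuse the cached outputs
pipe.add_step(
    name='step_one',
    base_task_project='examples',
    base_task_name='step 1 task',
    cache_executed_step=True,
)
pipe.start()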
What are the Elasticsearch, Mongo, and apiserver versions in the docker compose? The backup will only work in this scenario when they are exactly the same between the two systems.
Hi @<1853608151669018624:profile|ColossalSquid53> , if there is no connectivity to the clearml server, your python script will run regardless. clearml will cache all logs/events and then flush them once connectivity to the server is restored.
Hello MotionlessCoral18 ,
Can you please add a log with the failure?
Please try like Kirill mentioned. Also please note that there is no file target in the snippet you provided 🙂
Hi GorgeousMole24 , you can certainly compare across different projects.
Simply go to "all projects" and select the two experiments there (you can search for them at the top right to find them easily)
Hi @<1625303791509180416:profile|ExasperatedGoldfish33> , I would suggest trying pipelines from decorators. This way you can have very easy access to the code.
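A minimal decorator-based sketch (names and values are placeholders):
from clearml import PipelineDecorator

@PipelineDecorator.component(cache=True)
def double(x):
    return x * 2

@PipelineDecorator.pipeline(name='my pipeline', project='examples', version='1.0.0')
def run_pipeline():
    print(double(21))

if __name__ == '__main__':
    PipelineDecorator.run_locally()  # debug the whole pipeline locally
    run_pipeline()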
Hi @<1668427989253099520:profile|DisgustedSquid10> , unfortunately the open source version has no programmatic user API; you can, however, remotely access your server and edit the user file live.
If user management is key, the enterprise version has full SSO integration, including RBAC and, of course, API access.
VexedCat68 , I don't think such an example exists, but if you create one it would be great if you opened a PR for the open source 🙂
Hi @<1523701842515595264:profile|PleasantOwl46> , you can use users.get_all to fetch them.
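A minimal sketch using the API client:
from clearml.backend_api.session.client import APIClient

client = APIClient()
# fetch all users registered on the server
for user in client.users.get_all():
    print(user.id, user.name)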
Hi @<1576381444509405184:profile|ManiacalLizard2> , it will be part of the Task object. It should be part of the task.data.runtime attribute.
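For example (the task id is a placeholder):
from clearml import Task

task = Task.get_task(task_id='<task-id>')
# runtime properties are stored on the task's backend data object
print(task.data.runtime)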
How are you trying to 'target' the file in the code?
Also, how many GPUs are you trying to run on?
ScaryBluewhale66 , please look in:
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
The relevant section for you is auto_connect_frameworks
The usage would be along these lines:
Task.init(..., auto_connect_frameworks={'matplotlib': False})
Meaning that you should configure your host as follows: host: "somehost.com:9000"
2024-02-08 11:23:52,150 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
This looks unrelated to the hotfix; it looks like you misconfigured something and are therefore failing to write to S3.
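The S3 credentials usually go under the sdk.aws.s3 section of clearml.conf; a minimal sketch (values are placeholders):
sdk {
    aws {
        s3 {
            key: "<access-key>"
            secret: "<secret-key>"
        }
    }
}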
I'm not sure I understand your second request. Can you please elaborate on the exact process you're thinking of?
Clone task via UI -> Edit a config section in UI -> Enqueue it to a queue -> Worker picks it up and starts running the task -> Task is finished
What am I missing here?
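For reference, the same flow can also be done via the SDK; a rough sketch (project/task/queue names are placeholders):
from clearml import Task

template = Task.get_task(project_name='examples', task_name='my task')
cloned = Task.clone(source_task=template, name='my task (clone)')
# edit the cloned task's parameters here if needed, then enqueue it for a worker to pick up
Task.enqueue(task=cloned, queue_name='default')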
Can you please elaborate on what you're trying to do and what is failing?
SparklingElephant70 , can you please provide the full log of the run? You can download it through the webapp 🙂
Hi @<1585441179091079168:profile|ColossalArcticwolf5> , can you provide a log of the run?
Hi @<1810483998732849152:profile|NonsensicalDuck81> , as long as you have docker installed, yes 🙂