Hi @<1597762318140182528:profile|EnchantingPenguin77> , you can set this in the docker extra arguments section of the task
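For example, that field takes raw docker run arguments; something like this (flags and values are just placeholders):
```
--env MY_VAR=1 --shm-size=8g
```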
Do you have the email associated with the account, or the workspace ID?
Hi @<1714451218161471488:profile|ClumsyChimpanzee54> , the Azure autoscaler is available only in the Scale/Enterprise plan. It functions the same as the GCP/AWS autoscalers: basically scaling from 0 up to as many machines as configured, then automatically spinning them all down once the workload is over, like you described
How did you configure the files_server in clearml.conf?
Also, how many GPUs are you trying to run with?
Hi RattyLouse61, how are you adding users? Are you adding them as fixed users in one of the configuration files?
ReassuredTiger98, BitterLeopard33, I think I've encountered this 4 GB HTTP limit before. It should be fixed in the next SDK release 🙂
Hi @<1570583227918192640:profile|FloppySwallow46> , please don't @ the entire channel for help 🙂
If a task is pending, it means no agent has picked it up yet. Maybe the agent is unavailable or the process crashed. Check in that direction
From the error you provided, it looks like virtualenv isn't installed in the environment
default_output_uri is for artifacts & models while files_server is for debug samples and plots (if they are files)
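In clearml.conf those map to two separate keys, e.g. (URLs are placeholders for your own endpoints/bucket):
```
api {
    # debug samples and plots (when they are files) are uploaded here
    files_server: "http://localhost:8081"
}
sdk {
    development {
        # artifacts and models are uploaded here
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```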
Hi @<1597762318140182528:profile|EnchantingPenguin77> , can you please add the full log?
How are you building the pipeline?
Hi ReassuredTiger98,
I think it's something that was logged during the initial run; the clearml-agent then simply recreates the environment 🙂
Hi SoreDragonfly16, what is your usage when saving/loading those files? You can mute both the save/load messages, but not each one separately.
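If it's the framework auto-logging, a minimal sketch of muting it via Task.init (assuming PyTorch; swap in whichever framework you use):
```python
from clearml import Task

# Disabling the framework's auto-logging mutes both the save and the
# load messages together; they cannot be muted individually
task = Task.init(
    project_name="examples",
    task_name="muted model logging",
    auto_connect_frameworks={"pytorch": False},
)
```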
Also, do you see all these files as input models in UI?
2024-02-08 11:23:52,150 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access
This looks unrelated to the hotfix; it looks like you misconfigured something and that's why writing to S3 fails
You can use the API to call tasks.get_by_id and get that specific information. In the response it sits in data.tasks.0.completed
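For example, with the Python APIClient (the task ID is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Fetch the task; 'completed' holds the completion timestamp
task = client.tasks.get_by_id(task="<your-task-id>")
print(task.data.completed)
```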
It's unrelated. Are you running the example and no scalars/plots are showing?
Hi @<1649221402894536704:profile|AdventurousBee56> , I'm not sure I understand. Can you add the full log and explain step by step what's happening?
Usually the location is on the file server at /opt/clearml/data/fileserver
The address seems strange, is this the hostname?
It seems like a mix between hostname and IP?
Well not really
Please elaborate 🙂
Hi @<1523701553372860416:profile|DrabOwl94> , can you check if there are some errors in the Elastic container?
Hi @<1523701062857396224:profile|AttractiveShrimp45> , can you please add the configuration of your HPO app and the log?
What versions of clearml-agent & clearml are you using? Is it a self hosted server?
Can you try it with clearml==1.6.0 please?
Also, can you list the exact commands you ran?
Hi @<1795626098352984064:profile|SoggyElk61> , is it possible you have multiple environments?
ShallowGoldfish8, I think the best approach would be storing them as separate datasets per day, and then having a "grand" dataset that includes all days, adding new days as you go (see the sketch below).
What do you think?
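Something along these lines, a sketch with hypothetical names/paths:
```python
from clearml import Dataset

# One dataset per day
day_ds = Dataset.create(
    dataset_name="data-2024-02-08",
    dataset_project="daily-data",
)
day_ds.add_files("/data/2024-02-08")
day_ds.upload()
day_ds.finalize()

# A "grand" dataset that lists the daily datasets as parents,
# extended with each new day as you go
grand = Dataset.create(
    dataset_name="all-days",
    dataset_project="daily-data",
    parent_datasets=[day_ds.id],  # append new day IDs here over time
)
grand.upload()
grand.finalize()
```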
Are you running in docker mode? You could maybe use another docker image that has python in it.
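For example, when starting the agent (queue name and image are just placeholders):
```
clearml-agent daemon --queue default --docker python:3.10-slim
```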
Hi AttractiveShrimp45. Did you input the min value as 0, the max value as 1, and the step as 1?
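For reference, a minimal sketch of why that matters (parameter name is hypothetical): with min 0, max 1 and step 1, a discrete integer range only ever yields the two values 0 and 1:
```python
from clearml.automation import UniformIntegerParameterRange

# min 0, max 1, step 1 -> the optimizer can only ever sample 0 or 1
param = UniformIntegerParameterRange(
    "General/my_param", min_value=0, max_value=1, step_size=1
)
print(param.to_list())  # enumerates every valid value: just 0 and 1
```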