BoredGoat1
Hmm, that means it should have worked with Trains as well.
Could you run the attached script, see if it works?
DeliciousBluewhale87 Yes I think so, do notice that you might end up with a maximum of 12 pods.
You can also do the following with max 10 nodes (notice --queue can always take a list of queues; it will pull from them based on the order of the queues):
python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10
My bad, I wrote refresh and then edited it to the correct "reload"
BattyLion34 the closest I can think of is the monitoring class, which can easily be extended.
Datasets are a type of Task, so we can monitor a project and trigger an action when we see a change in number of Tasks/Datasets that are completed.
Monitoring class:
https://github.com/allegroai/clearml/blob/master/clearml/automation/monitor.py
Monitoring example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
I think a dataset monitoring example wil...
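The "trigger an action when the number of completed Tasks/Datasets changes" idea can be sketched without any ClearML calls. This is not the Monitor class from the linked file: CountChangeMonitor, fetch_count and on_change are hypothetical names, and in practice fetch_count would be whatever query returns the number of completed Tasks/Datasets in the project.

```python
from typing import Callable, Optional


class CountChangeMonitor:
    """Hypothetical helper: fires a callback when a polled count changes."""

    def __init__(self, fetch_count: Callable[[], int],
                 on_change: Callable[[int, int], None]) -> None:
        self._fetch_count = fetch_count  # assumed: returns the completed count
        self._on_change = on_change      # called with (previous, current)
        self._last: Optional[int] = None

    def poll_once(self) -> bool:
        """Poll once; return True if the callback was triggered."""
        current = self._fetch_count()
        previous, self._last = self._last, current
        # The first poll only records a baseline, so nothing is triggered.
        if previous is None or previous == current:
            return False
        self._on_change(previous, current)
        return True
```

Calling poll_once() from a loop with a sleep between iterations would give the periodic behaviour; the polling interval is left to the caller.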
Thanks DeliciousKoala34, I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, which is why it is re-downloading (I'll make sure the agent can sort it out; because this is Nvidia's version, in reality it should be a perfect match)
Hi JitteryCoyote63 a few implementation details on the services-mode, because I'm not certain I understand the issue.
The docker-agent (running in services mode) will pick a Task from the services queue, then it will set up the docker for it, spin it up, and make sure the Task starts running inside the docker (once it is running inside the docker you will see the service Task registered as an additional node in the system, until the Task ends). Once that happens, the trains-agent will try to fetch the...
yes ...
What's your use case for passing an empty dict? (meaning, how would one use it later?)
Are there any services OOB like this?
On the open-source side, I can't recall any, but it would probably be easy to write. The paid tier might have an offering though, not sure
PYTHONPATH is still not working as expected
Inside your code, if you do:
import os
print("PYTHONPATH", os.environ["PYTHONPATH"])
what are you getting?
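A slightly more defensive variant of that check, as a minimal sketch: os.environ["PYTHONPATH"] raises KeyError when the variable is not set at all, so this version uses .get() and also prints sys.path, which is what the interpreter actually searches. describe_pythonpath is a hypothetical helper name.

```python
import os
import sys


def describe_pythonpath() -> str:
    # .get() avoids a KeyError when PYTHONPATH is not set at all
    value = os.environ.get("PYTHONPATH")
    return "<not set>" if value is None else value


print("PYTHONPATH", describe_pythonpath())
# sys.path is the list of locations actually searched on import
print("sys.path", sys.path)
```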
I reached over 1M API calls in about one week using clearml-serving
Oh that makes sense now
If I remember correctly, adding an additional model to a single clearml-serving instance should not actually change the number of API calls; they are mostly affected by the number of clearml-serving instances / containers and not by the number of models.
remote repository's lock file.
Which file is that? The poetry lock or the internal VCS lock (the agent itself)?
How do you run the clearml-agent in docker mode?
clearml-agent --docker
See here:
https://clear.ml/docs/latest/docs/clearml_agent#docker-mode
Maybe the configuration file changed?
The logic is if the name and project are the same, and there are no artifacts/models, and the last time it was created was under 72 hours, reuse the Task
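The reuse rule described above can be written as a small predicate. This is only an illustration of the stated conditions, not the library's code: should_reuse_task, its parameters and REUSE_WINDOW are hypothetical names.

```python
from datetime import datetime, timedelta

# 72 hours, as stated in the message above
REUSE_WINDOW = timedelta(hours=72)


def should_reuse_task(same_name_and_project: bool,
                      has_artifacts_or_models: bool,
                      created_at: datetime,
                      now: datetime) -> bool:
    """Return True only when all three stated conditions hold."""
    return (
        same_name_and_project
        and not has_artifacts_or_models
        and (now - created_at) < REUSE_WINDOW
    )
```

If the goal is to always get a fresh Task regardless of this rule, Task.init accepts reuse_last_task_id=False.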
Check the links that are generated in the UI when you upload an artifact or model
but I'd prefer to have a new instance deployed for each new experiment and that it also terminates when no new experiments are queued
I'm not objecting, just wondering about the rationale behind the decision
Back to the AWS autoscaler:
Basically, if you have the services-agent running on your cluster, it will just run the aws-autoscaler for you
The idea of the services-agent is to run logic/monitoring Tasks such as the aws autoscaler. Notice that services-mode means multiple jobs per...
GreasyPenguin14 what's the clearml version you are using, and which OS & Python?
Notice this happens on the "connect_configuration" call, which seems to be made after the Task was closed. Could that be the case?
Makes sense
Just make sure you configure the git user/pass in the docker-compose so the agent has your credentials for the repo clone.
By your description it seems to make no difference whether I added the files via sync or add, since I will have to create a new dataset either way.
Sync is designed to take a local folder/s and add/remove files from a dataset based on the local changes (it does that automatically based on file existence / content)
The changes (i.e. added files) are uploaded as delta changes relative to the parent version, which means we are not always uploading all files.
Add on the other hand means you...
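To make the "delta relative to the parent version" idea concrete, here is a minimal sketch that compares a local folder against a parent manifest by file existence and content hash. It does not use or reproduce the Dataset implementation; hash_folder, sync_delta and the path-to-hash manifest layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def hash_folder(root: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def sync_delta(parent: Dict[str, str],
               local: Dict[str, str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) paths relative to the parent."""
    added = set(local) - set(parent)
    removed = set(parent) - set(local)
    # Present on both sides but with different content
    modified = {p for p in set(local) & set(parent) if local[p] != parent[p]}
    return added, removed, modified
```

Only the added and modified files would need uploading in such a scheme, which is why a sync does not always transfer every file.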
PompousBeetle71 BTW: if you remove the type=str from the argparse, it will do what you want: None will stay None (instead of ''), and all other values will be of type str, as this is always the default
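The argparse side of this is easy to check in isolation. The sketch below uses plain argparse only (it does not reproduce the conversion to '' that happens when the argument is declared with type=str and connected to a Task); the --name option is a hypothetical example.

```python
import argparse

parser = argparse.ArgumentParser()
# No type= given: argparse keeps command-line values as str by default
parser.add_argument("--name", default=None)

missing = parser.parse_args([])                    # option not passed
given = parser.parse_args(["--name", "resnet"])    # option passed

print(repr(missing.name), repr(given.name))
```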
Hmm CourageousLizard33, it seems you stumbled on a weird bug.
This piece of code only tries to get the username of the current UID, but you are running inside a docker and probably set the environment UID, while there is no "actual" user with that UID in /etc/passwd, so it cannot resolve it.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
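For readers hitting the same error, the failure mode can be reproduced and guarded against with the standard library alone. This is a sketch of one possible fallback, not the attached fix: resolve_username is a hypothetical helper, and falling back to the numeric UID is an assumption about what is acceptable.

```python
import os
import pwd  # Unix only


def resolve_username(uid: int) -> str:
    """Return the username for uid, or the UID as text if it has no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No /etc/passwd entry for this UID, e.g. a container started
        # under an arbitrary numeric user
        return str(uid)


print(resolve_username(os.getuid()))
```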
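from clearml import Task
task = Task.get_task('task_id_here')
task.mark_started(force=True)  # force the Task back into a running state
task.upload_artifact(..., wait_on_upload=True)  # block until the upload finishes
task.mark_completed()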
Hi SarcasticSparrow10 , so yes it does, this is more efficient when using pytorch loaders, and in some other situations.
To disable it, add to your clearml.conf:
sdk.development.report_use_subprocess = false
2. Interesting error, maybe we can revert to "thread mode" if running under a daemon. (I have to admit, I'm not sure why python has this limitation, let me check it...)
Why do you ask? Is your server sluggish?
How did you install trains?
pip install git+
If I access the dataset on the same location directly it works fine:
Wait, I'm confused, how is the dataset there? Did it download the dataset?
Are you saying this line, for example, will fail? (assuming you actually have a dataset by that name)
data_path = Dataset.get(dataset_name="002_Datenset_MASAM_for_fintuning", alias="002_Datenset_MASAM_for_fintuning").get_local_copy()
this topic is about the issue with reporting a configuration with a string inside a tuple that has backslash
So the encoding itself is done YAML style, and based on your example, \b has to be encoded as \\b because this is string encoding, just like \n will become a "new line"
Makes sense?
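The same escaping behaviour can be seen with ordinary Python string literals, which is a close analogy rather than the YAML-style encoder itself. The path value below is a made-up example.

```python
path_escaped = "models\\best"  # backslash written as \\ : stays a backslash
path_raw = r"models\best"      # raw string: backslash kept as-is
path_broken = "models\best"    # \b is read as a single backspace character

print(path_escaped == path_raw)
print(len(path_raw), len(path_broken))
```

So a value that should contain a literal backslash followed by "b" has to be written with the backslash doubled (or otherwise escaped) before it is encoded.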
Is there any option to remove the example projects?
So sorry just realized I missed your message
Yes, but I'm not sure it will have an effect, see here
Why does the memory usage of Elasticsearch still stay at 32 GB after removing experiments?
Did you restart the server after removing the experiments?