Ohh, if this is the case and this is a constant stream of inference results, then yes, you should push it into some stream-supported DB.
Simple SQL tables would work, but for actual scale I would push into a Kafka stream, then pull it (serially) somewhere else and push into a DB.
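Something like this minimal sketch, assuming the kafka-python package and a broker on localhost; the topic name and the DB helper are placeholders, not from this thread:
```python
# producer side: push each inference result to the stream as it is produced
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("inference-results", {"request_id": "abc123", "score": 0.97})
producer.flush()

# consumer side (elsewhere): a single consumer pulls serially and writes into the DB
consumer = KafkaConsumer(
    "inference-results",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    insert_into_db(message.value)  # hypothetical DB insert helper
```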
p.s. you should remove this line 🙂 `extra_index_url: ["git@github.com:salimmj/xxxx"]`
Can you copy the "Installed Packages" here, and point to the package causing the issue?
I wonder if using our own containers, which should have most of the deps, will work better than a simpler container.
Why not, it's transparent, just run in `--docker` mode and provide a default docker image if the Task doesn't specify one.
Should I map the poetry cache volume to a location on the host?
Yes, this will solve it! (maybe we should have that automatically if using poetry as package manager)
Could you maybe add a GitHub issue, so we do not forget?
Meanwhile you can add the mapping here:
https://github.com/allegroai/clearml-agent/blob/bd411a19843fbb1e063b131e830a4515233bdf04/docs/clearml.conf#L137
`extra_docker_arguments: ["-v", "/mnt/cache/poetry:/root/poetry_cache_here"]`
Hi StrangePelican34, you mean poetry as the package manager of the agent? The venvs cache will only work for pip and conda; poetry handles everything internally :(
Hi JealousParrot68
no need for decorators, you can just pass the function via `schedule_function=<function goes here>` (see the sketch below the links)
🙂
See scheduler here
https://github.com/allegroai/clearml/blob/8708967a5ef4d8529a1a5ea417672e3ebbb258d7/clearml/automation/scheduler.py#L485
And triggers here:
https://github.com/allegroai/clearml/blob/8708967a5ef4d8529a1a5ea417672e3ebbb258d7/clearml/automation/trigger.py#L193
https://github.com/allegroai/clearml/blob/8708967a5ef4d8529a1a5ea417672e3ebbb258d7/clea...
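For reference, a minimal sketch of the `schedule_function` flow; parameter and method names are taken from the linked scheduler.py as I recall them, so verify against your clearml version:
```python
from clearml.automation import TaskScheduler

def on_schedule():
    # any plain callable works; no decorator needed
    print("periodic job running")

scheduler = TaskScheduler()
# run the function every day at 10:30 (assumed semantics of hour/minute here)
scheduler.add_task(schedule_function=on_schedule, hour=10, minute=30)
# launch the scheduler itself as a Task on the services queue
scheduler.start_remotely(queue="services")
```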
Hi GracefulDog98
Any guess why the password is "incorrect" for me?
Basically the clearml-session CLI needs to be able to access (SSH into) the host (clearml-agent) machine,
is that possible?
Hi JealousParrot68
You mean by artifact names?
`parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)`
I think providing `nargs='+'` assumes the type is a list. Nonetheless, we should be able to support it. Could you please add a GitHub issue so we do not forget?
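To illustrate the ambiguity, a standalone argparse snippet (not ClearML-specific):
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)

# the default stays a scalar float...
print(parser.parse_args([]))                                # Namespace(dataset_mean=0.5)
# ...but any value passed on the command line becomes a list of floats
print(parser.parse_args(["--dataset_mean", "0.4", "0.5"]))  # Namespace(dataset_mean=[0.4, 0.5])
```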
on a side note, is there any way to automatically give more meaningful names to the running docker containers?
What do you mean by that? Running where? And where will you see them?
and sometimes there are hanging containers or containers that consume too much RAM.
Hmmm yes, but can't you see it in the ClearML dashboard?
unless I explicitly add a container name in the container arguments, it will have a random name,
it would be great if we could set default container name for each experiment (e.g., experiment id)
Sounds like a great feature, with little implementation work 🙂 Can you add a GitHub issue on clearml-agent?
Which means there will be at least multiple published model entries of the same model over time?
Only the specific one will be published (not all the Models the Task created)
I think you are correct, it seems like it is missing the requirements for boto/azure/google (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages, and relaunch it.
That said, your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever is inside the folder); then Triton should be able to run it (if the config.pbtxt is correct).
`m = OutputMo...`
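The snippet above is cut off; for the folder-packaging idea, a rough sketch (project/task names are placeholders, and the method names are from the clearml SDK as I recall them, worth double-checking):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="serving", task_name="package model")
model = OutputModel(task=task, framework="pytorch")
# package the entire model directory (config.pbtxt included) and upload it as one archive
model.update_weights_package(weights_path="./model_dir")
```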
The issue is upload reporting for HTTP uploads (object storage will report upload progress). Basically the HTTP upload is a POST with urllib, which does not support upload callbacks for progress reporting. If you have an idea here, we will gladly add it (as you mentioned, it can be quite annoying to have to open the network manager to verify the upload is progressing).
Thanks LethalCentipede31, I think (3) is the most stable solution (as it doesn't require adding another package, and should work on any python version / OS)
This is actually what we do for downloads.
Do you know if there is a minimum required python requests version?
RoundMosquito25 do notice the agent is pulling the code from the remote repo, so you do need to push the local commits; the uncommitted changes, though, clearml will apply for you. Make sense?
What does the folder structure look like, and where are the "package" and the entry script?
If the only issue is this line: `task.execute_remotely(..., exit_process=True)`
It has to finish the static analysis of the entire repository (which usually happens in the background, but now we have to wait for it). If the repo is large, this could actually take 20 sec (depending on the CPU/drive of the machine itself)
it will constantly try to resend logs
Notice this happens in the background, in theory you will just get stderr messages when it fails to send but the training should continue
How can I make it such that any update to the upstream database
What do you mean "upstream database"?
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
Hmm I wonder, can you try with these lines before?
```python
Task._report_subprocess_enabled = False
frameworks = {'tensorboard': True, 'pytorch': False}
Task.init(...)
```
ScantMoth28 it should work. I think the default deployment also has an NGINX reverse proxy on it, switching from `http://clearml-server.domain.com/api` to `http://api.clearml-server.domain.com`
Hi DullCamel78
Hi everyone! Has anyone tried running `aws_autoscaler.py` without docker?
Well, generally, since this is a remote machine, the easiest way to control the environment is with containers, hence the default use case. In theory you can change it to use venv, but then of course you're somewhat limited with the different drivers/cuda/python environments.
performance under docker is 10% lower than on bare metal
add to your extra docker args:
`extra_docker_arguments: ["...`
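The snippet above is also cut off; as an illustration only (these particular flags are my suggestion, not from the original message), host networking/IPC are the usual candidates for closing the performance gap, in the same conf style as above:
```
extra_docker_arguments: ["--ipc=host", "--network=host"]
```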
Are you trying to upload an artifact post execution ?
I found "scheduler" on allegroai github, is it something related to the case I want to make?
MoodyCentipede68 it is exactly what you are looking for 🙂
Do notice that you need to make sure you have your services queue configured and running for that to work 🙂
If possible, I would like to altogether prevent the fileserver and write everything to S3 (without needing every user to change their config)
There is no current way to "globally" change the default files server (I think this is part of the enterprise version, alongside vault etc.).
What you can do is use an OS environment variable to override the conf file: `CLEARML_FILES_HOST="s3://..."`
PricklyRaven28 wdyt?
Now I suspect what happened is it stayed on another node, and your k8s never took care of that
the issue moving forward is if we restart the pod we will have to manually update that again.
Can't you map the nginx configuration file? (making the changes persistent across pods)
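A minimal sketch of one way to do that with a ConfigMap (all names and the mount path are assumptions, adapt to your clearml-server deployment):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: clearml-nginx-conf
data:
  default.conf: |
    # customized nginx configuration goes here
---
# fragment to merge into the webserver pod spec:
#   volumes:
#     - name: nginx-conf
#       configMap:
#         name: clearml-nginx-conf
#   volumeMounts (on the nginx container):
#     - name: nginx-conf
#       mountPath: /etc/nginx/conf.d/default.conf
#       subPath: default.conf
```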