BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version 🙂
Hi WearyLeopard29
Yes 🙂 this is exactly how it should work
WickedGoat98 the agent itself can be executed on bare metal, no need to set up Docker for it (although that is fully supported)
Specifically, the docker-compose runs the agent in services mode, i.e. for lightweight CPU tasks such as running pipelines.
If the agent is running on a GPU, the easiest way is to run it on bare metal
(since you are using venv mode, if CUDA is not detected at startup time, it will not install the GPU version, as there is no CUDA support)
GrievingTurkey78 did you open the 8008 / 8080 / 8081 ports on your GCP instance (I have to admit I can't remember where exactly in the admin panel you do that, but I can assure you it is there :)
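(As a side note, a quick way to sanity-check those ports from outside is a plain TCP connect; a minimal sketch in stdlib Python, where the IP address is a placeholder:)

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Try a TCP connection; True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage against your instance's public IP (placeholder address):
#   for port in (8008, 8080, 8081):
#       print(port, is_port_open("203.0.113.10", port))
```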
task._wait_for_repo_detection()
You can use the above to wait until the repository & packages are detected
(If this is something users need, we should probably make it a "public function" )
Hi BlandPuppy7 , is this Trains-related? Are you trying to integrate it and need help?
In theory yes; in practice you will be using the same docker image for all the services, and they will never interfere with one another. And you have the option to do more sophisticated stuff, like mapping the file-server data for a cleanup service (should be out in a few days :)), so it is a balance. Also remember that, relatively speaking, dockers are quite lightweight; this is not like saying a VM per service...
GiganticTurtle0 fix was just pushed to GitHub 🙂 pip install git+
It works. However, it still sometimes takes a strangely long time for the agent to pick up (or process) the next task, even if it is only "Hello World".
The agent checks every 2-5 seconds if there is a new Task to be launched; could that be it?
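(For illustration only, not the actual agent code, the polling behavior looks roughly like the sketch below; the worst-case pickup latency is one full interval plus environment setup time:)

```python
import time

def poll_for_task(fetch_next_task, interval=5.0, max_wait=60.0):
    """Toy version of an agent poll loop: check the queue every
    `interval` seconds until a task appears or max_wait is exceeded."""
    waited = 0.0
    while waited <= max_wait:
        task = fetch_next_task()
        if task is not None:
            return task
        time.sleep(interval)
        waited += interval
    return None
```

Usage: `poll_for_task(lambda: queue.pop() if queue else None)` returns the next queued task name, or `None` if nothing arrives in time.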
ChubbyLouse32 and this works when running the Python code directly, but not when the agent is running it?
On the same machine ?
Hi ShallowArcticwolf27
However, the AMI for version 0.16.1 has the following docker-compose file
I think we moved the docker-compose yaml when we upgraded from trains to clearml. Any reason you are installing the old docker-compose?
However, I have not yet found a flexible solution other than ssh-agent forwarding.
And is it working?
Hi PanickyMoth78
My local clearml.conf file has agent's git_user and git_pass defined as in my
In order for the autoscaler to access your git, you have to provide the git user/token in the wizard
The component agent's log has:
Executing task id [90de043e354b4b28a84d5cc0788fe63c]: repository = branch = version_num =
Hmm, how does the decorator of the component look? Meaning, did you specify a repo/branch/commi...
okay, let me know if it works
IrritableGiraffe81 could it be the pipeline component is not importing pandas inside the function? Notice that a function decorated as a pipeline component becomes stand-alone; this means that if you need pandas, you need to import it inside the function. The same goes for all the rest of the packages used.
When you are running with run_locally or debug_pipeline you are using your local env, as opposed to the actual pipeline, where a new env is created inside the repo.
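(The "imports inside the function" rule can be illustrated with plain Python; this is a hypothetical component, not the real ClearML decorator:)

```python
# Sketch of the "stand-alone function" rule: everything the component
# needs, including its imports, must live inside the function body,
# because the pipeline ships and executes it in a fresh environment.
def process_step(csv_text):
    import csv, io  # imported inside, so it travels with the function
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows

print(process_step("a,b\n1,2"))  # → [['a', 'b'], ['1', '2']]
```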
Can you send the Entire p...
Hmm, so this is kind of a hack for ClearML AWS autoscaling?
and every instance is running an agent? or a single Task?
If there is a new issue, I will let you know in a new thread
Thanks! I would really like to understand what is the correct configuration
so does the container install anything,
The way the agent works with dockers:
1. Spin up the docker.
2. Install the base stuff inside the docker (like git, and make sure it has python etc.).
3. Create a new venv inside the docker, inheriting everything from the docker's system-wide python packages; this means that if you have the "object_detection" package installed, it will be available inside the new venv.
4. Install the specific python packages your Task requires (inside the venv).
This allows you to over...
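(The venv-inheritance step corresponds to Python's system-site-packages option; a minimal sketch using the stdlib venv module, not the agent's actual implementation:)

```python
import venv

# A venv created with system_site_packages=True can see packages that
# are already installed in the docker's system python, so a pre-baked
# "object_detection" package would be importable without reinstalling.
builder = venv.EnvBuilder(system_site_packages=True, with_pip=True)
# builder.create("/path/inside/docker/venv")  # (path is illustrative)
print(builder.system_site_packages)  # → True
```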
Great!
I'll make sure the agent outputs the proper error 🙂
So when the agent fires up it gets the hostname, which you can then get from the API,
I think it does something like "getlocalhost", a python function that is OS-agnostic
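(In Python the OS-agnostic lookup would be socket.gethostname(); that this is what the agent uses is my assumption:)

```python
import socket

# OS-agnostic way to get the machine's hostname from Python;
# works the same on Linux, macOS, and Windows.
hostname = socket.gethostname()
print(hostname)
```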
Then by default it is the free space on the home folder (~/.clearml) that is missing
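(You can check the free space on that filesystem from Python, assuming the default ~/.clearml cache location:)

```python
import shutil
from pathlib import Path

# The default ClearML cache lives under the home folder (~/.clearml);
# check how much free space that filesystem has left.
cache_dir = Path.home()  # ~/.clearml sits on the same filesystem
usage = shutil.disk_usage(cache_dir)
print(f"free: {usage.free / 1e9:.1f} GB of {usage.total / 1e9:.1f} GB")
```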
data["encoded_lengths"]
This makes no sense to me, data is a numpy array, not a pandas frame...
if in the "installed packages" I have all the packages installed from the requirements.txt, then I guess I can clone it and use "installed packages"
After the agent finishes installing the "requirements.txt", it will put the entire "pip freeze" back into the "installed packages"; this means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen, as we cannot expect everyone to constantly freeze versions)
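(Roughly, that "pip freeze" step is equivalent to the sketch below; this is an approximation, not the agent's internal code:)

```python
import subprocess
import sys

# Capture the fully resolved environment, i.e. what gets stored back
# into "installed packages" after requirements.txt is installed.
result = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True,
)
frozen = result.stdout
print(frozen.splitlines()[:5])  # a few pinned "package==version" lines
```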
My problem...
but the logger info is missing.
What do you mean? Can I reproduce it ?
BTW: The code sample you shared is very similar to how you create pipelines in ClearML, no?
(also could you expand on how you create the Kedro node? From the face of it, it looks like another function in the repo, but I have a feeling I'm missing something)
Hi SubstantialElk6 I believe you just need to use clearml 1.0.5, and make sure you are passing the correct OS environment to the agent
It might be broken for me; as I said, the program works without the offline mode, but with offline mode it gets interrupted and shows the results from above.
How could I reproduce this issue ?
But there might be another issue in between of course - any idea how to debug?
I think I missed this one, what exactly is the issue ?
Hi @<1683648242530652160:profile|ApprehensiveSeaturtle9>
I send a request to the endpoint but the model is never unloaded (the GPU memory keeps increasing when I infer with a new model).
They are not unloaded after the request is done. see discussion here: None
You can however remove the model from the serving session (but I do not think this is what you meant)
I'm assuming you want to run multiple models on a single GPU with not en...