some dependencies will sometimes require different pip versions.
None 🙂 maybe setuptools, but not the pip version
(pip is just a utility to install packages, it will not be a dependency of one)
CrookedWalrus33
Force SSH git authentication; it will auto-mount the .ssh folder from the host into the docker container
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
Hi QuaintJellyfish58
You can always set it inside the function with `Task.current_task().output_uri = "s3://"`. I have to ask, I would assume the agents are pre-configured with `default_output_uri` in the clearml.conf, so why would you need to set it manually?
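For reference, a minimal sketch of setting it from inside the running function (the bucket path and function name are placeholders, not your actual setup):
```python
from clearml import Task

def training_step():
    # grab the Task the agent created for this run and point its
    # artifact/model storage at a bucket (placeholder path)
    task = Task.current_task()
    task.output_uri = "s3://my-bucket/artifacts"
```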
Remove this from your startup script:
#!/bin/bash
there is no need for that, it actually "marks out" (comments out) the entire thing
Hi PricklyGiraffe97
Is this related?
https://clear.ml/blog/increase-huggingface-triton-throughput-by-193/
We actually added a specific call to stop the local execution and continue remotely, see it here: https://github.com/allegroai/trains/blob/master/trains/task.py#L2409
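A minimal sketch of how the call is typically used (project/task/queue names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# everything above runs locally; the call below stops the local process
# and re-launches this task on an agent listening on the queue
task.execute_remotely(queue_name="default", exit_process=True)
```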
In the installed packages section it includes
pywin32 == 303
even though that is not in my requirements.txt.
So for some reason it is being detected (meaning your code base actually imports it in code)
But you can just remove it, either by manually editing the cloned Task (right click → reset, then you can edit the section), or via code with `Task.ignore_requirements("pywin32")` before `task = Task.init(...)`
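Roughly like this (project/task names are placeholders; note `ignore_requirements` must be called before `Task.init`):
```python
from clearml import Task

# tell clearml not to add pywin32 to the auto-detected requirements
Task.ignore_requirements("pywin32")
task = Task.init(project_name="examples", task_name="no pywin32")
```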
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will not trigger clearml (it is just too generic to automagically log)
Hi IntriguedRat44
You can make it log offline (i.e. into a local folder/zip) by calling `Task.set_offline(True)`. You can also set the environment variable `TRAINS_OFFLINE_MODE=1`. You could also just skip the `Task.init` call 🙂
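Something along these lines (a minimal sketch; project/task names are placeholders, and in the trains-era package the import was `from trains import Task`):
```python
from clearml import Task

Task.set_offline(True)  # must be called before Task.init
task = Task.init(project_name="examples", task_name="offline run")
# ... training code; everything is logged into a local folder/zip ...
```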
Does that help?
I called task.wait_for_status() to make sure the task is done
This is the issue, I will make sure `wait_for_status()` calls `reload()` at the end, so when the function returns you have the updated object
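In the meantime, a workaround sketch (assuming `task` is the Task object you are waiting on):
```python
task.wait_for_status()     # blocks until the task reaches a final state
task.reload()              # refresh the local object from the server
print(task.get_status())   # now reflects the updated state
```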
GiddyTurkey39 Okay, can I assume "Installed packages" contains the packages you need?
If so, you can set up trains-agent on a machine (see instructions on the github)
And then clone the experiment, and enqueue it into the "default" queue (or any other queue your agent is connected to)
https://github.com/allegroai/trains-agent
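The same clone-and-enqueue flow can also be done from code; a minimal sketch using the current clearml package (project/experiment names are placeholders):
```python
from clearml import Task

template = Task.get_task(project_name="examples", task_name="my experiment")
cloned = Task.clone(source_task=template)   # creates an editable draft copy
Task.enqueue(cloned, queue_name="default")  # an agent on "default" will pick it up
```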
Hi AttractiveWoodpecker16
I think this is the correct channel for that question.
(any chance you can move your thread there?)
Specifically, just email billing@clear.ml and they will cancel (no need to worry about the beginning of the month, just explain and they will not charge you for November)
EDIT: I know they are working on making it a one-click action in the UI; the main limitation is what happens with data that was stored above the free-tier threshold, anyhow I think the next version will sort that out as well.
tf datasets is able to handle batch downloading quite well.
SubstantialElk6 I was not aware of that, I was under the impression tf dataset is accessed on a file level, no?
@<1699955693882183680:profile|UpsetSeaturtle37> good progress. Regarding the error: 0.15.0 is supposed to be out tomorrow, and it includes a fix for that one.
BTW: can you run with `--debug`?
Hi LazyTurkey38
What do you mean the git repo is not recognized? When `execute_remotely` returns, you should see on the Task a reference to the git repo with the exact commit ID you have locally pulled. Do you see it under the Execution tab?
Yes, including this. (There was a fix to an issue with trains-agent and disabling frameworks; it is already part of 0.16.3)
GreasyPenguin14 yes there is 🙂
https://github.com/allegroai/clearml/issues/209
Set the environment variable `CLEARML_NO_DEFAULT_SERVER=1`
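A quick sketch of setting it from Python (it can also simply be exported in the shell):
```python
import os

# when set, clearml raises an error instead of silently falling back
# to the public demo server when no server is configured
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"
```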
I'm just curious about how the trains server on different nodes communicates about the task queue
We start manually: we tell the agent to just execute the task (notice we never enqueued it); if all goes well we will get to the multi-node part 🙂
You can try calling `task._update_repository()`. I'm still trying to figure out how to reproduce it...
Yes 🙂
BTW: do you guys do remote machine development (i.e. Jupyter / vscode-server) ?
MysteriousBee56 okay, look for the folder `~/.trains/vcs_cache`; you will find the git repo there, just overwrite the content with your local copy
HappyDove3 where are you running the code?
(the upload is done in the background, but it seems the python interpreter closed?!)
You can also wait for the upload: `task.upload_artifact(name="my artifact", artifact_object=np.eye(3,3), wait_on_upload=True)`
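As a self-contained sketch (project/task names are placeholders):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")
# wait_on_upload=True blocks until the artifact is fully uploaded,
# so the script can safely exit right after
task.upload_artifact(name="my artifact", artifact_object=np.eye(3, 3), wait_on_upload=True)
```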
PompousParrot44 Enterprise license pricing is usually custom-tailored to the size of the company and based on usage. If you are interested, feel free to leave your details in the "contact us" form on the website, and someone from sales will contact you shortly after.
Hi ShortElephant92
This isn't an issue if the user is using a Service Account JSON Key,
Are you saying that when you are using GS python sdk directly it works?
For context, the google cloud storage SDK allows authorized user credentials.
ClearML actually uses the google python SDK; the JSON is just a way to pass the credentials to the google SDK. I'm not sure it has to point to a "service account"? Where did that requirement come from?
is it from here ` Service account info was n...
That's with the key at
/root/.ssh/id_rsa
You mean inside the container that the autoscaler spun up?
Notice that by default the agent will mount the host's .ssh over the existing .ssh inside the container; if you do not want this behavior you need to set `agent.disable_ssh_mount: true` in clearml.conf
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi-core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
But the `git apply` failed; the error message is "xxx already exists in working directory" (xxx is the name of the untracked file)
DefeatedOstrich93 what's the clearml-agent version?