
CourageousLizard33 Are you using the docker-compose to setup the trains-server?
Hmm, maybe we should add a check once the download is done, comparing the expected file size with the actual file size, and if they differ, re-download?
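Something along these lines maybe (just a rough sketch, not the actual clearml download code; the helper name and retry count are made up):
import os
import requests

def download_with_size_check(url, local_path, max_retries=3):
    # Download the file, then compare the reported Content-Length with what landed on disk
    for _ in range(max_retries):
        response = requests.get(url, stream=True)
        response.raise_for_status()
        expected_size = int(response.headers.get("Content-Length", -1))
        with open(local_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)
        # If the server did not report a size we cannot verify, so accept the file
        if expected_size < 0 or os.path.getsize(local_path) == expected_size:
            return local_path
        # Size mismatch: loop again and re-download
    raise RuntimeError("downloaded file size does not match the expected size")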
Hmm BitterStarfish58 what's the error you are getting?
Any chance you are over the free tier quota?
BitterStarfish58 could you open a GitHub issue on it? I really want to make sure we support it (and I think it should not be very difficult)
I'm not sure the files-server supports "continue" from last position...
Thanks BitterStarfish58 !
I think it would make sense to have one task per run to make the comparison on hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it? I want to make sure we solve this issue 🙂
AstonishingWorm64
You can turn on the venv cache, it will just handle its own full env caching 🙂
See here:
https://github.com/allegroai/clearml-agent/blob/4f7407084d1900a79d455570c573e60f40208742/docs/clearml.conf#L100
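In clearml.conf it looks roughly like this (uncomment/set the path entry to enable it; check the linked file for the exact keys and defaults):
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # set this path to enable virtual environment caching
        path: ~/.clearml/venvs-cache
    }
}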
Okay, that makes sense.
best_diabetes_detection
is different from your example curl -X POST "None"
notice the "best_mage_diabetes_detection"?
JitteryCoyote63 see if upgrading the packages as they suggest somehow fixes it.
I have the feeling this is the same problem (the first error might be trains masking the original error)
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
Try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Hi ImpressionableRaven99
Yes, it is 🙂
Call this one before Task.init, and it will run offline (at the end of the execution, you will get a link to the local zip file of the execution):
Task.set_offline(True)
Then later you can import it to the system with:
Task.import_offline_session('./my_task_aaa.zip')
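Putting the two together, roughly (the project/task names and the zip path are just placeholders; the actual zip path is printed at the end of the offline run):
from clearml import Task

# Run fully offline: nothing is sent to the server, everything is stored locally
Task.set_offline(True)
task = Task.init(project_name='examples', task_name='offline run')
# ... your training code, logging, artifacts, etc. ...
task.close()
# at this point the console prints the path to the offline session zip

# Later, from a machine that can reach the server, import the zip
Task.import_offline_session('./my_task_aaa.zip')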
As long as you import clearml on the main script, it should work. Regarding the Nvidia container, it should not interfere with any running processes, the only issue is memory limit. BTW any reason not to spin an agent on a dedicated machine? What is the GPU used for on the clearml server machine?
Why can we even change the pip version in the clearml.conf?
LOL mistakes learned the hard way 🙂
Basically, too many times in the past pip versions were a bit broken, which is fine if they are used manually and users can reinstall a different version, but horrible when you have an automated process like the agent, so we added a "freeze version" option to give greater control. Makes sense?
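In clearml.conf it's under the agent's package_manager section, something like this (the exact default value changes between agent versions):
agent {
    package_manager: {
        # specific pip version the agent will use inside its virtual environments
        pip_version: "<20.2"
    }
}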
@<1523702932069945344:profile|CheerfulGorilla72> use the following bucket name when you are configuring your files/output uri
s3://<ip_here>:<port_here>/<bucket_here>
From there everything should work as expected
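In clearml.conf that would look roughly like this (key/secret are your storage credentials; secure/multipart depend on your setup):
sdk {
    development {
        default_output_uri: "s3://<ip_here>:<port_here>/<bucket_here>"
    }
    aws {
        s3 {
            credentials: [
                {
                    # non-AWS endpoint (e.g. minio), host is "<ip>:<port>"
                    host: "<ip_here>:<port_here>"
                    key: "<access_key_here>"
                    secret: "<secret_key_here>"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}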
I see, by default it will look for requirements.txt in the root of the repo (the actual repo).
That said, in code you can specify the requirements.txt:
Task.force_requirements_env_freeze(requirements_file='repo/project-a/requirements.txt')
task = Task.init(...)
Notice, you need to call it prior to the Task.init call
Damn, okay I'll make sure we fix the order.
Could you verify the ~= works as intended (if the order is correct)?
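(For reference, something like "numpy ~= 1.21.0" in a requirements.txt is a "compatible release" specifier, i.e. >=1.21.0 but <1.22; the package name here is just an example.)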
Thanks ContemplativePuppy11 !
How would you pass data/args between one step of the pipeline to another ?
Or are you saying the pipeline class itself stores all the components ?
SubstantialElk6 feel free to tweet them on their very inaccurate comparison table 🙂
print(requests.get(url='
ColorfulBeetle67 you might need to configure use_credentials_chain
see here:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/docs/clearml.conf#L85
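Enabling it in clearml.conf looks roughly like this:
sdk {
    aws {
        s3 {
            # let boto3 resolve credentials (env vars, shared credentials file, IAM role, etc.)
            use_credentials_chain: true
        }
    }
}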
Regarding the Token, I did not find any reference to "AWS_SESSION_TOKEN" in the clearml code, my guess is it is used internally by boto?!
When you set up the pod, make sure you mount the clearml local cache folder to the PV
basically /root/.clearml/cache/
Not intentional! When I launched the AMI it was running an older version
I think this is exactly the reason they decided to change the location 🙂 so you will have to upgrade manually; the reasoning is that we changed directory names (and maybe a few more things)
Yes: shut down the current docker compose, curl the new docker-compose file, rename the folder, and spin it up again. Full instructions here:
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_aws_ec2_ami.html#upgrading