But I am starting to wonder whether it would be easier to just change sys.path in the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
Then the type hints are not removed from helper and the code immediately crashes when run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
EFS gets downloaded to the k8s pod's local volume?
EFS is an Amazon service that mounts a persistent FS into EC2 instances. I believe they have k8s support as well, which would make it kind of like a PV, only as a service.
Does that make sense ?
So what will you query ?
Any chance you can open a GitHub issue so we do not forget this feature ?
it means it should work in
~/clearml.conf
no?
Yes exactly
I was hoping to be able to set the default server-wide
I think this kind of server-wide default is not supported in the open-source version.
But in most cases, setting it up on the clearml-agents is probably the important thing. BTW: you can also set it via the OS environment variable CLEARML_DEFAULT_OUTPUT_URI.
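Something like this should do it (just a sketch; the bucket URL and project/task names are placeholders, and it assumes the environment variable is set before Task.init is called):
import os
# placeholder destination; on the agents you would set this in the machine's environment instead
os.environ["CLEARML_DEFAULT_OUTPUT_URI"] = "s3://my-bucket/models"
from clearml import Task
task = Task.init(project_name="examples", task_name="default output uri demo")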
but I'd prefer to have a new instance deployed for each new experiment and that it also terminates when no new experiments are queued
I'm not objecting, just wondering about the rationale behind the decision 🙂
Back to the AWS autoscaler:
Basically if you have the services-agent running on your cluster, it will just run the aws-autoscaler for you 🙂
The idea of the services-agent is to run logic/monitoring Tasks such as the AWS autoscaler. Notice that services-mode means multiple jobs per...
I mean to reduce the API calls without reducing the scalars that are logged, e.g. by sending less frequent batched updates.
Understood,
In my current trials I am using up the API calls very quickly though.
Why would that happen?
The logging is already batched (meaning one API call for a bunch of reports)
Could it be lots of console lines?
BTW you can set the flush period to 30 sec, which would automatically collect and batch the API calls
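For example, something along these lines (a sketch; project/task names are placeholders):
from clearml import Task
task = Task.init(project_name="examples", task_name="batched reporting")
# collect reports for ~30 seconds and send them as a single batched API call
task.get_logger().set_flush_period(30)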
https://github.com/allegroai/clearml/blob/25df5efe7...
When exactly are you getting this error ?
JitteryCoyote63 are you calling
my_task.output_uri = "s3://my-bucket"
in the code itself?
Why not with Task.init(output_uri=...)?
Also, since this is running remotely there is no need for that; use Execution -> Output -> Destination and put it there, it will do everything for you 🙂
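For reference, a minimal sketch of passing it directly to Task.init (the bucket and names are placeholders):
from clearml import Task
task = Task.init(
    project_name="examples",
    task_name="s3 output demo",
    output_uri="s3://my-bucket",  # models/artifacts are uploaded here
)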
Hi @<1523702652678967296:profile|DeliciousKoala34>
What's the clearml-server version you are working with?
Can you check with the latest RC?
pip3 install clearml==1.9.2rc2
Sure thing :)
BTW could you maybe PR this argument (marked out) so that we know for next time?
Hi @<1523702786867335168:profile|AdventurousButterfly15>
Make sure you pass output_uri=True in Task.init
It will automatically upload your model to the file server. You can also configure it in clearml.conf, look for default_output_uri
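Something like this (just a sketch; project/task names are placeholders):
from clearml import Task
# output_uri=True uploads model checkpoints to the default file server
# (or to default_output_uri if you set it in clearml.conf)
task = Task.init(project_name="examples", task_name="auto model upload", output_uri=True)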
Hi @<1551376687504035840:profile|StraightSealion9>
The AWS Autoscaler will create a new instance when you enqueue a task to the relevant queue.
Does that mean that you were able to enqueue a Task and have it launch on the remote EC2 machine ?
- Maybe we should add an option, archive components as well ...
repeat it until they are all dead 🙂
Hi @<1523701504827985920:profile|SubstantialElk6>
I would split the first stage into two: the first one passing data to the others, the second one as "monitoring". Wdyt?
Notice the parents argument when creating a new Dataset
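Something along these lines (a sketch; the dataset names/project are placeholders, and in the SDK the argument is called parent_datasets):
from clearml import Dataset
parent = Dataset.get(dataset_project="examples", dataset_name="base dataset")
child = Dataset.create(
    dataset_name="base dataset v2",
    dataset_project="examples",
    parent_datasets=[parent.id],  # build on top of the existing dataset
)
child.add_files("new_files/")
child.upload()
child.finalize()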
The main issue is that the model itself is stored on your files server, which is/was configured to " None ". This means you cannot access it from anywhere other than the actual machine (i.e. inside a container this is not accessible).
Change your configuration (i.e. clearml.conf):
files_server: http://<Local_IP>:8081
Then rerun the example (importantly, re-run the training so a new model is generated and registered under the new address, with the IP). It should work...
Think I will have to fork and play around with it
NICE! (BTW: if you manage to get it working I'll be more than happy to help push the PR)
Maybe the quickest win is to store just the .py as model ?
Well I guess you can say this is definitely not a self-explanatory line 🙂
but it is actually asking whether we should extract the code; think of it as:
if extract_archive and cached_file:
    return cls._extract_to_cache(cached_file, name)
Thanks MortifiedDove27! Let me see if I can reproduce it. If I understand the difference, it's the Task.init in a nested function, is that it?
BTW what's the hydra version? Python, and OS?
just to check. Does the k8s glue install torch by default?
SubstantialElk6 what do you mean the glue installs torch ?
The glue will take a Task from the queue and create a k8s job (basically using the same docker image, and inside the docker, running the agent to execute the requested Task). Where would "torch" come into play?
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs" right ?
Oh yes, you probably have a sort or filter applied there :)
Hi UnevenDolphin73
I cannot initialize a task before loading the file, but the docs for
connect_configuration
Yes, that's basically the problem. You have to decide where the main driver is.
If you are executing the code "manually" (i.e. not via the agent) then there is no problem, obviously you have the local file and you can use it to load the "project name" etc, then you just call Task.connect_configuration to log the content.
If you are running the same code via the agent...
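A rough sketch of that flow (the file name, config keys and task names are placeholders):
import json
from clearml import Task
task = Task.init(project_name="examples", task_name="config demo")
# running locally this logs the content of config.json; under the agent it returns
# a path to the configuration that was stored on the server
config_path = task.connect_configuration("config.json", name="my config")
with open(config_path) as f:
    config = json.load(f)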
HealthyStarfish45 this sounds very cool! How can I help?