@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can take the parts you need from the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
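For illustration, a minimal sketch, assuming an existing PipelineController named pipe and a Task object step_base_task (the specific override keys are just examples):
exported = step_base_task.export_task()
# pick only the parts you need, flattened with '.' for task_overrides
overrides = {
    'script.repository': exported['script']['repository'],
    'script.branch': exported['script']['branch'],
}
pipe.add_step(name='my_step', base_task_id=step_base_task.id, task_overrides=overrides)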
it seems like each task is set up to run on a single pod/node based on attributes like GPU memory, OS, number of cores, and worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod actually runs the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
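For example, a minimal sketch of the per-pod side, assuming PyTorch DDP and that the k8s manifest injects the standard MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables (that wiring is an assumption, not something ClearML sets up for you):
import torch.distributed as dist

# every replica runs this exact code; 'env://' reads rank/world-size from the pod environment
dist.init_process_group(backend='nccl', init_method='env://')
print(f'rank {dist.get_rank()} of {dist.get_world_size()}')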
Hi FierceHamster54
Dataset download is already multi-threaded.
But yes, get_local_copy() is thread/process safe.
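For example, a minimal sketch of calling it concurrently (the dataset project/name here are hypothetical):
from concurrent.futures import ThreadPoolExecutor
from clearml import Dataset

ds = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')
# concurrent calls are safe, and all resolve to the same cached local copy
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(lambda _: ds.get_local_copy(), range(4)))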
Not sure on the cause, but if you do:
import multiprocessing as mp
mp.set_start_method('fork', force=True)
there is no semaphore leakage.
And having a PDF is easier/better than sharing a link to the results page?
If you edit the requirements to include:
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user wide using the clearml Vault feature (again enterprise, sorry 😞 )
Simply record the type of each argument when you store it, and keep it in the database, unbeknownst to the user. What do you say?
This is now supported, but then you still need to flatten the dict.
Maybe we can just support "empty_dict/new_value = 42" if the original was "empty_dict = {}"
WDYT?
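To make the idea concrete, a minimal sketch of that kind of flattening, using '/' as the separator (following the example above):
def flatten(d, prefix=''):
    # {'empty_dict': {'new_value': 42}} -> {'empty_dict/new_value': 42}
    flat = {}
    for k, v in d.items():
        key = f'{prefix}/{k}' if prefix else k
        if isinstance(v, dict) and v:
            flat.update(flatten(v, key))
        else:
            flat[key] = v
    return flat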
So the thing with IAM roles: they are designed to give AWS instances "automatic" permissions (based on the IAM role). They are not actually designed to generate a key/secret, as I think the lifetime is by default relatively short. Since the actual request to S3 comes from the client browser (i.e. outside of the AWS cluster), the IAM role cannot apply, and you have to provide the key/secret. The easiest way is to generate S3 keys regardless of the IAM roles, to be used with the clients (sp...
How are you starting the agent?
but this gives me an idea, I will try to check if the notebook is considered trusted; perhaps it isn't, and that causes the issue?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
Hmm yes this is exactly what should not happen 🙂
Let me check it
Seems like everything is in order. Can you curl to the API/web/files server?
JitteryCoyote63 are you suggesting it happens?
(obviously it should not 🙂)
... these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this should not be like that, right?
Helper functions are not "components"; they are actually files that will be accessible when running the component itself.
am I missing something?
That is a bit odd, but SSH keys have to have specific chmod flags (e.g. 600) for them to work (security issues)
What was the error?
CleanWhale17 what is "Online-Training Support (for Dataset Shifts)"?
Thanks!
fyi: This section is not necessary if you have a clearml.conf file in ~/
Task.set_credentials(
    api_host=" ",
    web_host=" ",
    files_host=" ",
    key='********************',
    secret='***********************'
)
Let me check the code for a min
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
Ohh... so would it make sense to add "helper_functions" so that a function will be available in the step's context?
Or maybe we need a new way to support a "standalone" decorator?! Currently, to actually "launch" the function step, you have to call it from the "pipeline" main logic function, but, at least in theory, one could do without the Pipeline itself.....
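To make that concrete, a minimal sketch of how the "helper_functions" idea could look with the current decorator interface (the function names are hypothetical):
from clearml import PipelineDecorator

def normalize(values):
    # helper made available inside the step's execution context
    top = max(values) if values else 0
    return [v / top for v in values] if top else values

@PipelineDecorator.component(return_values=['result'], helper_functions=[normalize])
def preprocess(values):
    return normalize(values)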
The easiest way is to pass an entire trains.conf file.
Our server is deployed on a kube cluster. I'm not too clear on how Helm charts etc. work.
The only thing that I can think of is that something is not right with the load balancer on the server, so maybe some requests coming from an instance on the cluster are blocked ...
Hmm, saying that out loud, it actually could be?! Try adding the following line to the end of the clearml.conf on the machine running the agent:
api.http.default_method: "put"
Thank you WackyRabbit7, please feel free to remind me if it slips away during my night time (yes I do sleep, contrary to common belief :))
Your git execution needs this file, just like your machine does, to know where the server is and how to authenticate. You have to manually pass it to your git action.
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion
LOL no worries 🙂
Basically the git & python analysis can take some time (I mean, it can take a minute on a large repository!)
And we wanted to make sure Task.init returns quickly (it already has to authenticate with the server that slows it down, and a few more things)
The easiest way is to have the code analysis run in the background since usually there is no interaction ...
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I do not think they log more than that?!
(what happens if you use TB?)
The -m src.train is just the entry script for the execution; all the rest is taken care of by the Configuration section (whatever you pass after it will be ignored if you are using argparse, as it auto-connects with ClearML)
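For reference, a minimal sketch of that auto-connect behavior (the project/task names are placeholders):
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='train')
parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)
# the parsed arguments are auto-logged into the task's Configuration section
args = parser.parse_args()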
Make sense?
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)