
Hmm I wonder, can you try with this line before?
Task._report_subprocess_enabled = False
frameworks = {'tensorboard': True, 'pytorch': False}
Task.init(...)
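Put together, a minimal sketch of how that could look (assuming the frameworks dict is meant to be passed via auto_connect_frameworks, and that the internal _report_subprocess_enabled flag is set before Task.init is called; project/task names are placeholders):

from clearml import Task

# disable reporting from sub-processes before the task is created (internal flag)
Task._report_subprocess_enabled = False

# only auto-connect TensorBoard; skip PyTorch auto-logging
task = Task.init(
    project_name='examples',
    task_name='framework whitelist test',
    auto_connect_frameworks={'tensorboard': True, 'pytorch': False},
)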
We’d be using https in production
Nice 🙂
@<1687653458951278592:profile|StrangeStork48> , I was reading this thread trying to understand what exactly is the security concern/fear here, and I'm not sure I fully understand. Any chance you can elaborate ?
How does deferred_init affect the process?
It defers all the networking and stuff to the background (usually the part that might slow down the Task initialization process)
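For reference, a minimal sketch of using it (assuming a clearml version that exposes the deferred_init argument on Task.init; names are placeholders):

from clearml import Task

# defer the server handshake and other networking to a background thread,
# so Task.init() returns almost immediately
task = Task.init(
    project_name='examples',
    task_name='deferred init test',
    deferred_init=True,
)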
Also, is there a way of specifying a blacklist instead of a whitelist of features?
BurlyPig26 you can whitelist per framework and file name, for example:
task = Task.init(..., auto_connect_frameworks={'pytorch': '*.pt', 'tensorflow': ['*.h5', '*.hdf5']})
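If what you are after is closer to a blacklist, a rough sketch based on the boolean dict form shown earlier in this thread would be to explicitly turn off only the frameworks you do not want (the framework names below are illustrative):

from clearml import Task

# anything not listed keeps its default (enabled) behaviour;
# the listed frameworks are explicitly disabled
task = Task.init(
    project_name='examples',
    task_name='framework blacklist test',
    auto_connect_frameworks={'matplotlib': False, 'joblib': False},
)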
What am I missing ?
RobustRat47 I think you have to use the latest clearml package for that (1.6.0)
Or can it also be right after Task.init() ?
That would work as well 🙂
sounds good, CheerfulGorilla72 could I ask you to open a github issue and suggest it? just so we do not forget ?
Programmatically, before importing the package, set os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'
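A minimal sketch of the order of operations (the config path is the one from above, the project/task names are placeholders):

import os

# must be set before trains is imported, otherwise the default config file is picked up
os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'

from trains import Task

task = Task.init(project_name='examples', task_name='custom config file')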
BTW: What's the use case for doing so?
thanks for helping again
My pleasure :)
But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said I do not understand how come a single Task does not utilize the CPU, I was under the impression it is run...
Hmm SuccessfulKoala55 any chance the nginx http was pushed to v1.1 on the latest cloud helm chart?
WackyRabbit7 this is funny, it is not ClearML providing this offering
some generic company grabbed the open-source and put it there, which they should not 🙂
the first runs perfectly fine,
Just making sure, running in an agent?
the second crashes
Running inside the same container as the first one ?
If this is the case, why not have the stream process call the REST API, then move forward with the result? This way it scales out of the box. The main "conceptual" difference is that the REST API is used internally, and the upside is that the event-stream processing becomes part of the application layer, not tied to the compute cost of the model. wdyt?
It manages the scheduling process, so there is no need to package your code or worry about building dockers etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs in the execution queue and the limit of your budget (obviously spinning down machines that are idle)
By default SSH server is not running in a lot of scenarios (k8s for example, Windows, MacOS)...
Also SoreDragonfly16 could you test if the issue exists with trains==0.16.2rc0 ?
Are you sure you added the pytorch channel in clearml.conf ?
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L64
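For reference, the relevant agent section of clearml.conf looks roughly like this (the exact keys and default channel list may differ, see the linked file for the up-to-date defaults):

agent {
    package_manager {
        # extra conda channels the agent resolves packages from
        conda_channels: ["pytorch", "conda-forge", "defaults"]
    }
}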
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on, in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope if you will).
Try adding
from src.testagentcomponent import step_one
in the global pipeline script as well (not just inside the function), see the sketch below.
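A minimal sketch of what the pipeline controller script could look like (the module and function names are taken from the snippet above, everything else is assumed):

# pipeline controller script
from clearml.automation.controller import PipelineDecorator

# module-level import, so the pipeline logic is "aware" of the component
from src.testagentcomponent import step_one

@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='0.1')
def pipeline_logic():
    # the component can now be resolved when the pipeline logic runs
    result = step_one()
    print(result)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    pipeline_logic()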
Getting the last checkpoint can be done via:
Task.get_task(task_id='aabbcc').models['output'][-1]
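To actually fetch the checkpoint file locally, something like this should work ('aabbcc' is a placeholder task id; get_local_copy downloads the file, or reuses the local cache):

from clearml import Task

# most recent output model registered on the task
last_checkpoint = Task.get_task(task_id='aabbcc').models['output'][-1]

# download (or reuse from cache) the checkpoint and print its local path
print(last_checkpoint.get_local_copy())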
DepressedChimpanzee34 the \\ character will almost always be converted into \ because otherwise it will not support \t or \n etc.
What I'm looking for here is some logic that will allow us not to break backwards compatibility on the one hand, but will still allow you to have something like a "first\second" entry.
WDYT? any ideas? (I really want to make sure we fix it as soon as possible)
I simplified the code, just so I could test it, this one seems to work, feel free to add the missing argparser parts :)
from argparse import ArgumentParser
from trains import Task

model_snapshots_path = 'mnt/trains'

task = Task.init(project_name='examples', task_name='test argparser', output_uri=model_snapshots_path)
logger = task.get_logger()


def main(args):
    print('Got args: %s' % args)


if __name__ == '__main__':
    parent_parser = ArgumentParser(add_help=False)
    parent_parser....
ClumsyElephant70
Can you manually run the same command?
['python3.6', '-m', 'virtualenv', '/home/user/.clearml/venvs-builds/3.6']
Basically:
python3.6 -m virtualenv /home/user/.clearml/venvs-builds/3.6
or
pip install -U trains
My bad I wrote refresh and then edited it to the correct "reload" 😞
My apologies, let me rephrase:
if you are using pip as the package manager and not running in docker mode, trains-agent
cannot touch the cuda/cudnn drivers (actually the .so libraries).
If you want to verify, you can check echo $LD_LIBRARY_PATH
Yeah, but I still need to update the links in the clearml server
yes... how many are we talking about here?
But every agent is a different pod, so I do not know how to properly share the folder with images.
Can I conclude Kubernetes is running the agents?