Hi SmallGiraffe94
I think it now has to be a semantic version (like Python packages, for example)
This is so that the auto version increment can bump to the next one automatically.
Maybe adding the date as a tag would make sense? What do you think?
Or maybe in the description field
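If you go the tag route, a minimal sketch of adding today's date as a tag (assuming the clearml package; the project/task names are placeholders):
```
from datetime import date
from clearml import Task

task = Task.init(project_name='examples', task_name='tagged run')
# tag the task with today's date, e.g. '2024-01-31'
task.add_tags([date.today().isoformat()])
```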
Well, from 2 to 30 sec is a factor of 15; I think this is a good start 🙂
trains[azure] gives you the ability to do the following:
```
from trains import StorageManager

my_local_cached_file = StorageManager.get_local_copy('azure://bucket/folder/file.bin')
```
This means you do not have to manually download files and maintain the local cache; the StorageManager will do that for you.
If you do not need that ability, there is no need to install trains[azure]; you can just install trains
Unfortunately, we haven't had the time to upgrade to the Azure storage v...
MagnificentSeaurchin79 YEY!!!!
Very cool!
Do you feel like making it public? I have a feeling a lot of people will appreciate it; this is very useful 🙂
Hmm StrangePelican34
Can you verify you call Task.init before TB is created? (Basically at the start of everything.)
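i.e. something like this ordering, a minimal sketch (assuming clearml, with a PyTorch SummaryWriter standing in for whatever creates your TB logger):
```
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Task.init first, so the TensorBoard hooks are installed
task = Task.init(project_name='examples', task_name='tb test')

# only then create the TB writer
writer = SummaryWriter('runs/tb_test')
writer.add_scalar('loss', 0.5, 0)
writer.close()
```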
In any case, do you have any suggestions on how I could at least hack tqdm to make it behave? Thanks
I think I know what the issue is: it seems tqdm is emitting a terminal escape sequence rather than a plain CR; this is the 1b 5b 41 (ESC [ A, cursor-up) sequence I see in the binary log.
Let me see if I can hack something for you to test 🙂
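In the meantime, a possible thing to test; this sketch only uses standard tqdm parameters, though whether it avoids the escape sequence in your setup is an assumption:
```
import sys
from tqdm import tqdm

# fixed-width, plain-ASCII output on stdout tends to produce
# simple '\r' line updates instead of cursor-movement escapes
for _ in tqdm(range(100), file=sys.stdout, ncols=80, ascii=True):
    pass
```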
Basically two options: spin the clearml-k8s-glue as a k8s service.
This service takes clearml jobs and creates k8s jobs on your cluster.
The second option is to spin agents inside pods statically; then inside the pods the agents work in venv mode.
I know the enterprise edition has more sophisticated k8s integration where the glue also retains the clearml scheduling capabilities.
https://github.com/allegroai/clearml-agent/#kubernetes-integration-optional
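Roughly, the two options look like this (a sketch: k8s_glue_example.py is the example entry point from the clearml-agent repo linked above, and the queue names are placeholders):
```
# Option 1: the k8s glue running as a service, pulling jobs from a queue
python k8s_glue_example.py --queue k8s_scheduler

# Option 2: a static agent inside each pod, working in venv mode
clearml-agent daemon --queue default
```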
AbruptHedgehog21 what exactly do you store as a Model file? Is it a pickled Python object?
BroadMole98 Awesome, can't wait for your findings 🙂
ScaryKoala63
When it fails, what's the number of files you have in /home/developer/.clearml/cache/storage_manager/global/ ?
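For example (plain shell, nothing ClearML-specific):
```
ls /home/developer/.clearml/cache/storage_manager/global/ | wc -l
```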
OddShrimp85
the Task ID is a UUID generated by the backend server; there is no real way to force it to have a specific value 🙂
BattyLion34
if I simply clone the nntraining stage and run it in the default queue, everything goes fine.
When you compare the Task you cloned manually and the Task created by the pipeline, what's the difference?
Yes MuddySquid7, it automatically detects it (regardless of you uploading the DataFrame as an artifact).
How are you saving the DataFrame?
(It will auto-log any joblib.dump call; is that it?)
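i.e. a sketch of the pattern that should be picked up automatically (file name and DataFrame content are placeholders):
```
import joblib
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='df autolog')

df = pd.DataFrame({'a': [1, 2, 3]})
# the joblib save call is what the automatic logging hooks into
joblib.dump(df, 'my_dataframe.pkl')
```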
Hi CooperativeFly2
is it possible to create multiple trains-agent workers per gpu
Yes you can. That said, memory cannot actually be shared between GPU processes (GPU time is obviously shared), so you have to be careful with the Tasks actually being executed in parallel.
For instance:
```
TRAINS_WORKER_NAME=host_a trains-agent daemon --gpus 0 --queue default
TRAINS_WORKER_NAME=host_b trains-agent daemon --gpus 0 --queue default
```
Correct, the serving Task ID is the clearml-serving session. It is the instance that holds all the information of this specific setup and its models.
I simplified the code just so I could test it. This one seems to work; feel free to add the missing argparser parts :)
```
from argparse import ArgumentParser
from trains import Task

model_snapshots_path = 'mnt/trains'

task = Task.init(project_name='examples', task_name='test argparser', output_uri=model_snapshots_path)
logger = task.get_logger()

def main(args):
    print('Got args: %s' % args)

if __name__ == '__main__':
    parent_parser = ArgumentParser(add_help=False)
    parent_parser....
```
Hi LudicrousParrot69
Not sure I follow, is this pyfunc running remotely?
Or are you looking for interfacing with previously executed Tasks?
Are you also adding those metrics to the experiment table as extra columns ?
This is very odd ... let me check something
Your Git execution needs this file, just like your machine does, to know where the server is and how to authenticate. You have to manually pass it to your Git action.
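If shipping the file itself is awkward, the usual alternative is passing the same information through ClearML's standard environment variables (the values below are placeholders):
```
export CLEARML_API_HOST="https://api.your-server.com"
export CLEARML_WEB_HOST="https://app.your-server.com"
export CLEARML_FILES_HOST="https://files.your-server.com"
export CLEARML_API_ACCESS_KEY="<access_key>"
export CLEARML_API_SECRET_KEY="<secret_key>"
```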
but maybe hyperparam aborts in those cases?
From the hyperparameter optimizer's perspective, it will be trying to optimize toward the global minimum, basically "ignoring" the last value reported. Does that make sense?
I understand I can change the docker image for a component in the pipeline, but for the
it isn't possible.
You can always call Task.current_task().connect() from the pipeline function itself to connect more configuration arguments. Anything you add via the function itself works: all the pipeline logic function arguments become pipeline arguments, it's kind of neat 🙂 Regarding docker, the idea is that you use a very basic python docker (the default for the services queue) for all...
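For example, a minimal sketch of connecting an extra configuration dict from inside a pipeline component (the dict and its values are placeholders):
```
from clearml import Task

def my_pipeline_step(batch_size: int = 32):
    # configuration beyond the function signature can still be connected
    extra_config = {'warmup_steps': 100, 'clip_grad': 1.0}
    Task.current_task().connect(extra_config, name='extra_config')
```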
Hi DisgustedDove53
Unfortunately, SSO in general is not part of the open-source version (the integration is way too complex and would cause too many security issues).
On the paid tier there is full SSO integration, including SAML. I'm pretty sure it also has a permission system on top, so you can control visibility / access inside the clearml platform.
First I would check the CLI command; it will basically prefill it for you:
https://clear.ml/docs/latest/docs/apps/clearml_task
Specifically to your question, working directory "." is the root of the git repo
But I would avoid adding it manually; use the CLI, and it will either ask you to provide the info or take the git repo details from the local copy
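For example, a sketch of a clearml-task invocation (flag names per the docs linked above; the repo, script and queue are placeholders):
```
clearml-task --project examples --name remote_run \
  --repo https://github.com/user/repo.git --branch main \
  --script train.py --queue default
```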
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said, I think what we have now should be quite stable. The plan is to have the RC available right after the weekend.
