ScantMoth28 where are you seeing this warning?
Hi SubstantialElk6
You are uploading an artifact; a good use case for a numpy artifact would be a feature table.
If you want to upload an image, use either report_media or report_image, or upload the PIL image as an artifact.
What do you think?
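For illustration, a minimal sketch of the two options (project/task names and the local image path are placeholders):
```
from clearml import Task
from PIL import Image
import numpy as np

task = Task.init(project_name="examples", task_name="artifact vs image")  # placeholder names

# a numpy feature table is a good fit for an artifact
features = np.random.rand(100, 10)
task.upload_artifact(name="feature_table", artifact_object=features)

# an image is better reported through the Logger
img = Image.open("sample.png")  # placeholder local file
task.get_logger().report_image(title="inputs", series="sample", iteration=0, image=img)
```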
Can you post them, I think there is something there that prevents the update (i.e. pip related).
For example: packagename @ git+https://... will be translated by pip to:
If packagename is already installed, do nothing; if it is not installed, use git+https://... to install it.
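As a concrete (hypothetical) requirements.txt entry, assuming a package called mypackage hosted on GitHub:
```
mypackage @ git+https://github.com/example/mypackage.git
```
With this line, pip only falls back to the git URL when mypackage is not already present in the environment.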
UnevenDolphin73 I have a suspicion we have a few terms mixed:
Hyperparameters:
These are essentially key/value pairs.
When you call Task.connect(dict_with_params), ClearML will flatten the dict and you end up with key/value pairs.
Configuration objects:
These are actually blobs of text, which the UI will show as-is.
When you call my_local_file = Task.connect_configuration("path/to/config/file", name=name),
the entire content of the config file is stored on the Task object itself.
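A short sketch of the difference, assuming placeholder project/task names and a config file at "path/to/config/file":
```
from clearml import Task

task = Task.init(project_name="examples", task_name="params vs config")  # placeholder names

# hyperparameters: a dict connected with Task.connect() is flattened into key/value pairs
params = {"lr": 0.001, "batch_size": 32}
task.connect(params)

# configuration object: the whole file is stored as a text blob on the Task, shown as-is in the UI
config_path = task.connect_configuration("path/to/config/file", name="my_config")
```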
Back to the use case, instead ...
Are they ephemeral, or are they later used by other Tasks, executions, etc.?
For example: configuration files are specific to an execution, and someone will edit them.
Initial weights files are something that multiple executions might need, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions/tasks, etc.
It seems like you treat these files as "configurations", is that right?
I would like to bypass this behavior because my code has a need for a specific version of PyTorch.
DilapidatedCow43 you will get exactly the pytorch version you need, but compiled for the CUDA version that is installed (the pytorch people actually maintain multiple versions based on different CUDA versions)
Hi PungentLouse55
it depends on the trains-server version you are running.
If the trains-server version is >= 0.16, you have to add the "Args/" prefix. If you are running an older version, you should not add any prefix.
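A hedged sketch of overriding an argparse parameter on a cloned Task (the task ID, parameter name, and queue name are placeholders):
```
from clearml import Task

# clone an existing experiment and override one of its argparse parameters
cloned = Task.clone(source_task="<task_id>")

# trains-server >= 0.16: argparse values live under the "Args/" section
cloned.set_parameter("Args/batch_size", 64)
# older servers: the same parameter would be addressed without the prefix
# cloned.set_parameter("batch_size", 64)

Task.enqueue(cloned, queue_name="default")
```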
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message on a reply to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically logs in the thread of the initial message.
To fix this I had to spin the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access GPUs
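For reference, the full daemon command would look something like this (the queue name is a placeholder):
```
clearml-agent daemon --queue default --docker --cpu-only
```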
Nice!
And what is --storage s3//:inference?
if you are using minio it should be something like None
Notice you have to specify the IP:port otherwise it thinks it is an AWS endpoint
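So for a local MinIO instance the value would look something like this (host, port, and bucket name are placeholders):
```
--storage s3://127.0.0.1:9000/inference
```
The explicit IP:port in the URI is what tells the SDK this is a MinIO endpoint rather than AWS S3.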
Just making sure I understand, basically the same argparse support we already have, but for python-fire (which is the ability to automatically log the arguments, and then change them when executed by trains-agent), correct?
If this is the case, are you familiar with the implementation of python-fire? What I'm looking for is where exactly the parsing happens, so we could patch it and log/override values.
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script, which means the next time the instance is booted you will have the bash script executed.
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
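A minimal sketch of what could go into extra_vm_bash_script, assuming the goal is to raise the open-files limit on boot (the limit values are placeholders):
```
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
```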
GreasyPenguin14 makes total sense.
In that case I would say variants of the accuracy make sense to me. I would suggest: title='trains', series='accuracy/day' and title='trains', series='accuracy/night'
Regarding hierarchy, from the implementation perspective a unique identifier is always the combination of title/series (or in other words metric/variant); introducing another level is a system-wide change.
This means it might be more challenging than expected ...
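A hedged sketch of the suggestion above (project/task names and the reported values are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="accuracy variants")  # placeholder names
logger = task.get_logger()

# both series show up on the same "trains" graph as two variants of the metric
logger.report_scalar(title="trains", series="accuracy/day", value=0.91, iteration=1)
logger.report_scalar(title="trains", series="accuracy/night", value=0.87, iteration=1)
```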
SubstantialElk6 if you call Task.init with continue_last_task=<task_id> it will automatically add the last_iteration of the previous run to any logging/reporting, so you never overwrite the previous reports 🙂
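A minimal sketch, assuming placeholder project/task names and a previous task ID:
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resumed run",
    continue_last_task="<previous_task_id>",  # placeholder: the Task you want to continue
)
# subsequent scalar/plot reports are offset by the previous run's last_iteration
```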
PompousParrot44
you can always manually store/load models, for example: https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/examples/reporting/model_config.py#L35
Sure, you can patch any framework with something similar to what we do in xgboost; any such PR will be greatly appreciated! https://github.com/allegroai/trains/blob/master/trains/binding/frameworks/xgboost_bind.py
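For reference, a hedged sketch of manually registering and loading models (project/task names, the weights file name, and the model ID are placeholders):
```
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name="examples", task_name="manual models")  # placeholder names

# manually register a locally saved weights file as this task's output model
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")  # placeholder file

# manually load a previously registered model by its ID
input_model = InputModel("<model_id>")  # placeholder ID
local_weights = input_model.get_local_copy()
```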
Hi CooperativeFox72 trains 0.16 is out, did it solve this issue? (btw: you can upgrade trains to 0.16 without upgrading the trains-server)
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC? (I remember a fix just for that)
pip install clearml-agent==1.2.0rc0
- Triton server does not support saving models off to normal RAM for faster loading/unloading
Correct, the enterprise version also does not support RAM caching.
Therefore, currently, we can deploy 100 models when only 5 can be concurrently loaded, but when they are unloaded/loaded (automatically by ClearML), it will take a few seconds because they are being read from the SSD, depending on the size.
Correct, there is also deserializing CPU time (imagine unpickling a 20GB file, this takes ...
How can I ensure that additional tasks aren't created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them?
(Also, how is the git diff warning there with the latest clearml, I think there was some fix related to that)
SlipperyDove40 I just installed a fresh copy of py3.6 and plotly on Ubuntu. The entire venv dir is ~86MB
The second seems like a botocore issue:
https://github.com/boto/botocore/issues/2187
Hi AdventurousWalrus90
Thank you for the kind words! 🙂
/home/usr_338436_ulta_com/.clearml/venvs-builds/3.7/.gitignore
So this is the error on the agent?
JitteryCoyote63 the agent.cuda_version (or CUDA_VERSION env) tells the agent which pytorch wheel to download. The CUDNN library can be included inside any wheel and it will work as long as the cuda / cudart exist on the system; for example, pytorch wheels include the cudnn they use. agent.cudnn_version should actually be deprecated, and is not actually used.
For future reference, dependency order:
- Nvidia drivers
- CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so)
- CUDNN ...
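If you need to force the CUDA version the agent resolves wheels for, a sketch of the relevant clearml.conf setting (the version value is only an example):
```
agent {
    # force the CUDA version used to pick the matching pytorch wheel
    cuda_version: 11.2
}
```
The same can be done with the CUDA_VERSION environment variable when spinning the agent.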
Thanks MortifiedDove27 ! Let me see if I can reproduce it, if I understand the difference, it's the Task.init in a nested function, is that it?
BTW what's the hydra version? Python, and OS?
Hi JollyChimpanzee19
What are the versions (clearml, TF, PT)? Also, could you add one more line from the stack (i.e., which call triggered the exception)?
No worries, just found it. Thanks!
I'll make sure to follow up on the GitHub issue for better visibility 🙂
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
This one should work:
```
path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params)
```
However I'm quite confident that plots and scalars are not visible online only when the 'git diff too large to store' warning appears.
These should be unrelated, are you seeing console outputs ?