Hi PanickyMoth78
Hmm yes, I think the StorageManager (i.e. the Google Storage Python client) also needs a json file with the credentials.
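A minimal sketch of what I mean, assuming the service-account JSON is referenced from the google.storage section of clearml.conf (or via the standard GOOGLE_APPLICATION_CREDENTIALS environment variable); the bucket and object path below are placeholders:
```python
from clearml import StorageManager

# Credentials are picked up from clearml.conf / GOOGLE_APPLICATION_CREDENTIALS,
# they are not passed here; bucket and object path are placeholders.
local_path = StorageManager.get_local_copy(remote_url="gs://my-bucket/models/model.pt")
print(local_path)
```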
Let me check something
Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But I am using local files, I mean I clone the trains and trains-agent repos and install them. Their versions are 0.15.2rc0
I see, that's why we get the git ref, not package version.
StaleButterfly40 are you sure you are getting the correct image on your TB (toy255) ?
Hover near the edge of the plot, then you should get a "bar" you can click on to resize
VivaciousPenguin66 I have the feeling it is the first space in the URI that breaks the credentials lookup.
Let's test it:
```python
from clearml import StorageManager

uri = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'

# original
StorageManager.get_local_copy(uri)

# quoted
StorageManager.get_local_copy(uri.replace(' ', '%20'))
```
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
Which will make sure the "torch" version (the one you specified in the "installed packages") will be installed last.
Hey JitteryCoyote63 I think I need to better explain the config feature:
agent.package_manager.post_packages = ["PyJWT"]
Basically this means that IF you have pyjwt in the installed packages, it will be installed after everything else is installed.
This doesn't mean it will always be installed.
Think, for example, of "horovod": it has to be installed after you have TF / PyTorch installed.
(The same goes for "pre_package" and Cython)
QuaintJellyfish58 Notice it tries to access AWS, not your minio.
This seems like a bug?! Can you quickly verify with the previous version?
Also notice you have to provide the minio section in the clearml.conf so it knows how to access the endpoint:
https://github.com/allegroai/clearml/blob/bd53d44d71fb85435f6ce8677fbfe74bf81f7b18/docs/clearml.conf#L113
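For example, a rough sketch assuming clearml.conf already has an aws.s3.credentials entry whose host matches the minio endpoint (host, bucket and path below are placeholders):
```python
from clearml import StorageManager

# The "s3://host:port/bucket/..." form routes the request to the matching
# minio credentials entry in clearml.conf instead of AWS itself.
local_path = StorageManager.get_local_copy("s3://my-minio:9000/bucket/artifacts/data.csv")
print(local_path)
```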
To automate the process, we could use a pipeline, but first we need to understand the manual workflow
You need to use tf.summary.image and not summary_ops_v2.image
Fixed on main branch (see github issue), RC later today
Image needs to be in range [0, 1] and not [0, 255] (matplotlib and tensorboard can handle either one)
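Something along these lines (a minimal TF2 sketch; the log directory and the "toy255" tag are just placeholders matching the thread):
```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/images")
with writer.as_default():
    # shape [batch, height, width, channels], float values already scaled to [0, 1]
    images = tf.random.uniform([4, 64, 64, 3], minval=0.0, maxval=1.0)
    tf.summary.image("toy255", images, step=0)
```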
Is there a code to reproduce ?
Hi OddAlligator72
for instance - remove all the metrics from some step onward?
(I think that as long as the Task is not published you could do such a thing directly with the REST API (a.k.a. APIClient from Python))
What's the use case?
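A rough sketch of what I mean with the APIClient (this assumes the events.delete_for_task endpoint; note it clears all logged events for the task, not just from a given step onward):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Removes all reported scalar/plot/log events for this (unpublished) task
client.events.delete_for_task(task="<task_id>")
```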
JitteryCoyote63 did you use the new implementation:
https://github.com/allegroai/trains/blob/master/trains/automation/aws_auto_scaler.py
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)
Hi VexedCat68
txt file or pkl file?
If this is a string, it just stores it (not as a file; this is considered a "link")
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/clearml/binding/artifacts.py#L521
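A small sketch of the difference as I understand it (project/task names and paths are placeholders):
```python
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact demo")

# A plain string (e.g. a remote URL) is stored as-is, i.e. registered as a "link" artifact
task.upload_artifact("reference", artifact_object="s3://bucket/path/data.pkl")

# A local file path (Path object) uploads the actual file content
task.upload_artifact("training_data", artifact_object=Path("data.pkl"))
```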
Hi RipeGoose2
So the http://app.community.clear.ml already contains it.
Next release of the standalone server (a.k.a clearml-server) will include it as well.
I think the ETA is end of the year (i.e. 2 weeks), but I'm not sure on the exact timeframe.
Sounds good ?
you can run md5 on the file as stored in the remote storage (nfs or s3)
s3 is implementation specific (i.e. minio, weka, Wasabi etc. might not support it) and I'm actually not sure regarding nfs (I mean you can run it, but it actually means you are reading the data; that said, nfs by definition I'm assuming is relatively fast access)
wdyt?
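For example, a minimal sketch that streams the file and computes the md5 (the mount path is hypothetical):
```python
import hashlib

def file_md5(path, chunk_size=8 * 1024 * 1024):
    # Stream in chunks so we never hold the whole file in memory
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# e.g. against the copy sitting on the NFS mount
print(file_md5("/mnt/nfs/datasets/train.pkl"))
```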
Yeah, the docstring is always the most up to date 🙂
TrickySheep9
you are absolutely correct 🙂
Hi OddShrimp85
If you pass output_uri=True to Task.init, it will upload the model automatically, or, as you said, manually with the OutputModel class
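A short sketch of both options (project/task names and the weights path are placeholders):
```python
from clearml import Task, OutputModel

# Automatic: framework checkpoints saved during the run are uploaded for you
task = Task.init(project_name="examples", task_name="model upload", output_uri=True)

# Manual alternative: register and upload a weights file yourself
OutputModel(task=task).update_weights(weights_filename="checkpoints/best_model.pt")
```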
Hi SmoggyGoat53
There is a storage limit on the file server (basically a 2GB per-file limit); this is the cause of the error.
You can upload the 10GB to any S3-like solution (or a shared folder). Just set the "output_uri" on the Task (either at Task.init or with Task.output_uri = "s3://bucket")
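For example (the bucket name is a placeholder; credentials for it still come from clearml.conf):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="large artifacts",
    output_uri="s3://my-bucket/clearml",  # artifacts/models go here instead of the file server
)

# or, on an existing task object:
task.output_uri = "s3://my-bucket/clearml"
```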
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
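i.e. something like this should already fetch/extract the dataset chunks in parallel, per the linked code (dataset project/name are placeholders):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
local_dir = ds.get_local_copy()  # chunks are downloaded/extracted by a worker pool internally
print(local_dir)
```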
ReassuredTiger98 no, but I might be missing something.
How do you mean project-specific?
I can verify the behavior, I think it has to do with the way the subparser was setup.
This was the only way for me to get it to run:
script.py test blah1 blah2 blah3 42
When I passed specific arguments (for example --steps) it ignored them...
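For reference, a minimal reproduction of the kind of subparser setup I tested against (all names are made up):
```python
import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command")

test = subparsers.add_parser("test")
test.add_argument("values", nargs="*")          # e.g. blah1 blah2 blah3 42
test.add_argument("--steps", type=int, default=10)

print(parser.parse_args())
```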
Done 🙂
Test it on your local setup (I would hate to push a broken fix)
Is that possible?
So can you verify it can download the model ?
Sure. JitteryCoyote63 so what was the problem? can we fix something?