Hmm, if this is the case, you can add some prints in here:
the service/action will tell you what you are sending
wdyt?
Hi PungentLouse55,
I think I can see how these magic lines solved it, and I think you are onto something.
Any chance what happened is multiple workers were trying to simultaneously save/load the same Model ?
I can add files to the data set, even after I finish the experiment?
Correct
https://clear.ml/docs/latest/docs/clearml_data#creating-a-dataset
https://clear.ml/docs/latest/docs/guides/data%20management/data_man_cifar_classification
https://github.com/allegroai/clearml/blob/master/docs/datasets.md#create-dataset-from-code
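A minimal sketch of adding files as a new dataset version with the clearml SDK (the project/dataset names and the parent id below are hypothetical placeholders):

from clearml import Dataset

# create a new version on top of an existing dataset
ds = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="datasets",
    parent_datasets=["<existing_dataset_id>"],
)
ds.add_files(path="/path/to/new/files")  # register the additional files
ds.upload()                              # upload the new content
ds.finalize()                            # close this dataset version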
Anyhow, from your response, is it safe to assume that mixing in code with the core ML task code has not occurred to you as something problematic to start with?
Correct 🙂 Actually we believe it makes it easier, as in the worst case you can always run clearml in "offline" mode without the need for the backend, and later, if needed, import that run.
That said, regarding (3), the "mid" interaction is always the challenge, clearml will do the auto tracking/upload of the mod...
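A minimal sketch of the "offline" flow mentioned above, assuming the clearml SDK (project/task names and the session path are hypothetical):

from clearml import Task

# run completely without a backend; everything is written to a local session folder
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
task.get_logger().report_scalar(title="loss", series="train", value=0.5, iteration=0)
task.close()  # the console output includes the path of the offline session zip

# later, when a server is available, the run can be imported:
# Task.import_offline_session(session_folder_zip="/path/to/offline_session.zip")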
UI for some anomalous file,
Notice the metrics are not files/artifacts, just scalars/plots/console
we concluded that we don't want to run it through ClearML after all, so we ran it standalone
out of curiosity, what was the conclusion and why?
Thanks BroadSeaturtle49
I think I was able to locate the issue: "!=" breaks the pytorch lookup.
I will make sure we fix asap and release an RC.
BTW: how come 0.13.x has no linux x64 support? And the same for 0.12.x
https://download.pytorch.org/whl/cu111/torch_stable.html
CooperativeSealion8 let me know if you managed to solve the issue, and feel free to send the entire trains-server log. I'm assuming one of the docker containers failed to boot...
What's the OS running the server?
Seems like something is not working with the server, i.e. it cannot connect to one of the docker containers.
May I suggest carefully going through all the steps here and making sure nothing was missed:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
Especially number (4)
CooperativeSealion8
when it first asks me to enter my full name
Where? In the web UI?
You can do:
task = Task.get_task(task_id='uuid_of_experiment')
task.get_logger().report_scalar(...)
Now the only question is who will create the initial Task, so that the others can report to it. Do you have like a "master" process ?
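A rough sketch of that pattern, assuming a "master" process creates the Task and passes its id to the workers (project/task/series names are hypothetical):

from clearml import Task

# "master" process: create the Task once and share its id (e.g. via an env var or a queue)
master_task = Task.init(project_name="examples", task_name="distributed run")
shared_task_id = master_task.id

# worker process: attach to that existing Task and report into it
worker_task = Task.get_task(task_id=shared_task_id)
worker_task.get_logger().report_scalar(
    title="loss", series="worker_0", value=0.42, iteration=1
)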
Yes, as long as the client is served from http://app.something.com it will look for the api server at http://api.something.com
We should probably make sure it is properly stated in the documentation...
Yes, actually the first step would be a toggle button for regexp in the search, the second will be even more advanced search.
May I suggest you post it on the UI suggestion issue https://github.com/allegroai/trains/issues/81 ?
If you want each "main" process to be a single experiment, just don't call Task.init in the scheduler.
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
You can always click on the name of the series and remove it from the display.
Why would you need three graphs?
I'm sorry, wrong line reference:
I'm assuming the error is due to a missing ulimit:
try adding 16777216 to both the soft and hard ulimit
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to /etc/security/limits.conf
Yep, seems like an elastic memory issue, but I think the helm chart takes care of it.
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
Hi AdventurousRabbit79
Try:
"extra_clearml_conf" : "aws { s3 {key: A, secret : B, region: C, }} ",
Generally speaking, no need for quotes around the secret/key.
You also need the comma to separate between the keys.
You can test if it is working by adding the same string to your local clearml.conf and importing the clearml package
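One way to sanity-check it locally, after adding the same aws.s3 section to your own clearml.conf (the bucket/object path below is a hypothetical placeholder):

from clearml import StorageManager

# if the credentials in clearml.conf are picked up, this should fetch the object
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/some/object.bin")
print(local_path)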
Hi ShallowArcticwolf27
First of all:
If the answer to number 2 is no, I'd loveee to write a plugin.
Always appreciated ❤
Now actually answering the Q:
Any torch.save (or any other framework save) will either register or automatically upload the file (or folder) in the system. If this is a folder it will be zipped and uploaded, if a file it is just uploaded to the assigned storage output (the clearml-server, any object storage service, or a shared folder). I'm not actually sure I...
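For example, a minimal sketch of that auto-capture (project/task names and the output_uri bucket are hypothetical):

import torch
from clearml import Task

# output_uri controls where captured models are uploaded
# (output_uri=True would use the clearml-server fileserver instead)
task = Task.init(
    project_name="examples",
    task_name="model auto upload",
    output_uri="s3://my-bucket/models",
)

model = torch.nn.Linear(10, 2)
# this torch.save call is hooked automatically and the file is registered as an output model
torch.save(model.state_dict(), "model.pt")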
I assume now it downloads "more" data as this is running in parallel (and yes I assume that before it deleted the files it did not need)
But actually, at least from a first glance, I do not think it should download it at all...
Could it be that the "run_model_path" is a "complex" object of a sort, and it needs to test the values inside ?
FrustratingWalrus87 Unfortunately TB's TSNE is not automatically captured by ClearML (scalars, histograms etc. are).
That said, matplotlib is automatically captured, so you can run your own PCA/t-SNE and use matplotlib to visualize it (ClearML will capture the figure).
The same applies for plotly.
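Something along these lines should be enough (project/task names are hypothetical, and the embeddings here are just random stand-ins):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from clearml import Task

task = Task.init(project_name="examples", task_name="tsne plot")

embeddings = np.random.rand(200, 64)            # stand-in for your real embeddings
points = TSNE(n_components=2).fit_transform(embeddings)

plt.scatter(points[:, 0], points[:, 1], s=5)
plt.title("t-SNE projection")
plt.show()  # the displayed figure is captured and logged as a plot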
What do you think?