
Hi SlimyRat21:
Tool that will help me track and manage the different configs and simulation logs across different runs and versions of the simulation.
Definitely covered by Trains; it does that as well, with very few code changes (if any) to your current code base
Tool that will help me gather and compare the results from specific simulation runs
Same as above 🙂
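For reference, a minimal sketch of what "very few code changes" means in practice (the project, task and config names below are just placeholders):

from trains import Task

# one extra call in the simulation entry point registers the run
task = Task.init(project_name='simulations', task_name='run 42')

# connect() logs the config with the run so it can be compared across versions
config = {'time_step': 0.01, 'iterations': 10000}
config = task.connect(config)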
Do you have any experience or tips on using trains for non-ML before investing time into this and seeing...
I think your use case is the original idea behind the "use_current_task" option, it was basically designed to connect the code that creates the Dataset together with the dataset itself.
I think the only caveat in the current implementation is that it should "move" the current Task into the dataset project / set the name. wdyt?
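Something along these lines is what I have in mind, assuming the use_current_task flag in Dataset.create (names and paths below are placeholders):

from clearml import Task, Dataset

task = Task.init(project_name='data', task_name='build dataset')

# reuse the current Task as the Dataset's backing task
dataset = Dataset.create(
    dataset_name='my dataset',
    dataset_project='data',
    use_current_task=True,
)
dataset.add_files('/path/to/simulation/output')  # placeholder folder
dataset.upload()
dataset.finalize()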
Hi Team, I'm currently trying to install ClearML-Server on a Powerpc server with RedHat7.
You are a brave man LividCrab90 !
Are there Dockerfiles for the ClearML-Server stack somewhere ?
The main issue is replacing the DB containers, do you have elastic/mongo/redis for powerpc ?
Hi ShallowArcticwolf27
First of all:
If the answer to number 2 is no, I'd loveee to write a plugin.
Always appreciated ❤
Now actually answering the Q:
Any torch.save (or any other framework save) will either register or automatically upload the file (or folder) in the system. If this is a folder it will be zipped and uploaded, if a file it is just uploaded to the assigned storage output (the clearml-server, any object storage service, or a shared folder). I'm not actually sure I...
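Roughly, a minimal sketch of the auto-upload behavior (the output_uri value below is just a placeholder for your storage target):

import torch
from clearml import Task

# with output_uri set, framework save calls are registered and uploaded automatically
task = Task.init(
    project_name='examples',
    task_name='model upload',
    output_uri='s3://my-bucket/models',
)

model = torch.nn.Linear(10, 2)
# this save is picked up by clearml and the file uploaded to the assigned storage output
torch.save(model.state_dict(), 'model.pt')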
Notice you have to configure the shared drive for the docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
Can you clone the git with the .ssh credentials on the host machine ?
If so, can you do the same manually inside a docker (i.e. spin up a docker with the mount -v /home/hostuser/.ssh:/root/.ssh) ?
Thanks @<1569496075083976704:profile|SweetShells3> ! let me see if I can reproduce the issue
WickedGoat98 I suspect the main difference is that with GitHub you are cloning with https (i.e. no credentials needed), but with GitLab you are using SSH authentication to clone the repository. If on the machine running the trains-agent
you can "git clone" your repository (i.e. from the command line), the trains-agent should be able to do the same (basically make sure you have the SSH keys in your ~/.ssh folder).
Are you testing the trains-agent service (i.e. from the docker compose) o...
Hi JitteryCoyote63 ,
The easiest would probably be to list the experiment folder, and delete its content.
I might be missing a few things but the general gist should be:

from trains.storage import StorageHelper

# get a helper for the bucket (credentials are taken from your trains.conf)
h = StorageHelper.get('s3://my_bucket')
files = h.list(prefix='s3://my_bucket/task_project/task_name.task_id')
for f in files:
    h.delete(f)
Obviously you should have the right credentials 🙂
HealthyStarfish45 the pycharm plugin is mainly for remote debugging, you can of course use it for local debugging but the value is just to be able to configure your user credentials and trains-server.
In remote debugging, it will make sure the correct git repo/diff are stored alongside the experiment (this is due to the fact that pycharm will not sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
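If so, one option is to report the scalar with "images seen" as the iteration value, something like the following (the variable values below are placeholders for what your training loop produces):

from trains import Task, Logger

task = Task.init(project_name='examples', task_name='loss vs images seen')

# placeholder values - in practice these come from the training loop
images_seen = 1024
loss_value = 0.05

# reporting with iteration=images_seen puts both experiments on the same x-axis
Logger.current_logger().report_scalar(
    title='loss', series='train', value=loss_value, iteration=images_seen)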
Hm, one of the issues I have with this change is that now every dataset that doesn't have a semantic version cannot be loaded anymore
Okay we definitely need to solve that.
Any chance I can ask to open a github issue (just so we do not forget).
I will pass it quickly along so that we can maybe offer a fix in the next RC
I had no idea it was going to do that and sent your servers over 1.4M API hits unintentionally
Yeah, that is way too much, I think it relates to the frequency at which it updates the console 🙂
EnviousStarfish54
it seems that if I don't use plt.show() it won't show up in Allegro, is this a must?
Yes, at plt.show / plt.savefig Trains will capture the plot and send it to the backend.
BTW: when you hover over the empty plot area, do you see the plotly objects, or is it all blank ?
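For reference, a minimal example of what gets captured (project/task names below are placeholders):

import matplotlib.pyplot as plt
from trains import Task

task = Task.init(project_name='examples', task_name='matplotlib capture')

plt.plot([1, 2, 3], [4, 5, 6])
plt.title('my plot')
# Trains hooks this call and reports the figure to the server at this point
plt.show()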
Correct (basically pip freeze results)
Hi @<1572395181150310400:profile|DeterminedHare56>
Yes, Slack is not the best for knowledge sharing, but it is the easiest for users to communicate over, and the easiest to set up and scale.
Specifically you can find the historical log of the Slack channel here: None
We hoped Google would index it, but it seems this is still not working as expected; if you have any input, it would be great to improve it
Hi SpicyOtter88
plt.plot([0, 1], [0, 1], 'r--', label='')
It cannot have a legend without a label, so it gives it an "anonymous" label, I think it should just get "unlabeled 0" wdyt?
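On your side, btw, giving the plot an explicit label avoids the anonymous entry altogether, e.g.:

import matplotlib.pyplot as plt

# with an explicit label the legend entry gets a proper name in the UI
plt.plot([0, 1], [0, 1], 'r--', label='baseline')
plt.legend()
plt.show()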
CleanPigeon16 , just making sure, docker is installed and configured on the host machine (i.e. Azure machine)?
Hi CooperativeFox72
I think the upload reporting (files over 5MB) was added after version 0.17, hence the log.
The default upload chunk reporting threshold is 5MB, but it is not configurable, maybe we should add it to the clearml.conf ? wdyt?
@<1523701868901961728:profile|ReassuredTiger98> thank you so much for testing it!
Oh!
I see, this is using the Colab as a remote agent (i.e. to launch jobs on it),
[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <main.ColabKernelApp object at 0x7fa41b29e5c0> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported
Can you send the full log?
FrothyShark37 what was different in your script ?
Thanks FrothyShark37
I just verified, this would work as well. I suspect what was missing is the plt.show() call, this is the actual call that triggers clearml
Hi JitteryCoyote63
The easiest is to inherit the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as auto_resource_monitoring
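A rough sketch of what I mean (the import path and the report_frequency_sec argument are from memory, please double-check against your installed clearml version):

from clearml import Task
from clearml.utilities.resource_monitor import ResourceMonitor

class SlowResourceMonitor(ResourceMonitor):
    def __init__(self, task, **kwargs):
        # report resource metrics every 5 minutes instead of the default
        kwargs['report_frequency_sec'] = 300
        super(SlowResourceMonitor, self).__init__(task, **kwargs)

task = Task.init(
    project_name='examples',
    task_name='custom resource monitoring',
    auto_resource_monitoring=SlowResourceMonitor,  # pass the class, not an instance
)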
Can you post the actual line here? Seems like we can fix it to also support this scenario (if we could test it)
Hi FrothyShark37
Can you verify with the latest version?
pip install -U clearml