Parent makes sense if you are changing the data of the parent version but some of it is preserved, which lets the delta-based storage store only the diff.
If everything is different and you call sync, for example, it will not reference any previous "snapshot", so there will be no redundancy in storage, but you still get a pointer to the "parent" version.
Make sense?
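For reference, a minimal sketch of what I mean, assuming the clearml Dataset API (the project/name/path values are placeholders):
```python
from clearml import Dataset

# child version that points to a parent version; unchanged files are referenced, not duplicated
child = Dataset.create(
    dataset_project='my_project',
    dataset_name='my_dataset_v2',
    parent_datasets=['<parent_dataset_id>'],  # placeholder parent id
)
# sync_folder() detects what changed vs. the parent, so only the delta is stored
child.sync_folder('/path/to/local/dataset')
child.upload()
child.finalize()
```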
Hi PunyGoose16 ,
next release includes it (eta after this weekend 😉 )
SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task ?
(This is what the agent uses to set up the environment inside the docker, running as a pod)
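If it is not there, a small sketch of one way to add it programmatically, assuming the clearml Task API (the version pin is just an example):
```python
from clearml import Task

# must be called before Task.init() so it is recorded in the "installed packages" section
Task.add_requirements('torch')  # or pin a version, e.g. Task.add_requirements('torch', '1.13.1')
task = Task.init(project_name='examples', task_name='requirements example')
```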
HappyLion37 did you check the https://github.com/allegroai/trains/tree/master/examples/services/hyper-parameter-optimization ?
You can very quickly get it distributed as well
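Roughly along these lines, as a sketch, assuming the current clearml naming (in older trains releases the module is trains.automation); the task id, metric names and queue are placeholders:
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id='<base_task_id>',  # the Task to clone and optimize
    hyper_parameters=[UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1)],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    execution_queue='default',           # any agent listening on this queue runs the trials
    max_number_of_concurrent_tasks=4,    # raise this (and add agents) to distribute the search
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```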
Hi @<1569496075083976704:profile|SweetShells3>
Are you using the standard docker-compose? Are you using the default elastic container?
What exactly changed ?
BTW: I think it was fixed in the latest trains package as well as the clearml package
Hi SubstantialElk6
We can't seem to find a way for the end user to pass in their git credentials when they run their codes in both agent and non-agent scenarios. Any advice here?
The bottom line is the agent needs to have read-only access to all the repositories so it can launch any Task. I would recommend creating an agent git user with read-only credentials and configuring the agent to use it. wdyt?
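Something along these lines in the agent's clearml.conf (trains.conf for older versions); the user/token values are placeholders, and as far as I remember the CLEARML_AGENT_GIT_USER / CLEARML_AGENT_GIT_PASS environment variables work as well:
```
agent {
    # read-only user dedicated to the agent
    git_user: "clearml-readonly-bot"
    git_pass: "personal-access-token-here"
}
```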
So could it be that pip install --no-deps . is the missing piece?
what happens if you add to the installed packages "/opt/keras-hannd" ?
I guess this is doable:
You can get the entire set of scalars like as pandas DF: https://www.tensorflow.org/tensorboard/dataframe_api
(another example: https://stackoverflow.com/a/45899735 )
Then iterate over the different runs and create a task + report the scalars, for example:
```python
# not real code, just an example sketch
from clearml import Task
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

for run in runs:  # each run is a TensorBoard event-file directory
    event_acc = EventAccumulator(run)
    event_acc.Reload()
    task = Task.init(project_name='tb_import', task_name=run, reuse_last_task_id=False)
    logger = task.get_logger()
    w_times, step_nums, vals = zip(*event_acc.Scalars('Accuracy'))
    for step, val in zip(step_nums, vals):
        logger.report_scalar(title='Accuracy', series='Accuracy', value=val, iteration=step)
    task.close()
```
If I log 20 scalars every 2000 training steps and train for 1 million steps (which is not that big an experiment), that's already 10k API calls...
They are batched together, so at least in theory, if this is fast you should not get to 10K that quickly. But a very good point
Oh nice! Is that for all logged values? How will that count against the API call budget?
Basically this is the "auto flush": it will flush (and batch) all the logs in a 30-sec period, and yes this is for all the logs (...
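If you want tighter control over it, a small sketch, assuming the Logger/Task API (the 30-second value just mirrors the default mentioned above):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='flush example')
logger = task.get_logger()
# how often (in seconds) buffered reports are flushed to the server
logger.set_flush_period(30.0)
# force an immediate flush of everything reported so far
task.flush()
```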
So you want to launch the second step before the task that is uploading the artifact is completed, but after the artifacts are uploaded?
PlainSquid19 yes, the link is only available in the actual paid product 😞
I don't think they have the documentation open yet...
My recommendation is to fill the contact us form, you'll get a free online tour as well 😉
If this is GitHub/GitLab/Bitbucket what I'm thinking is just a link opening an iframe / tab with the exact entry point script / commit.
What do you think?
Hi WickedGoat98
A few background notions:
Dockers do not store their state: if you install something inside a docker, the moment you leave it is gone, and the next time you start the same docker you start from the same initial setup. (This is a great feature of Dockers.)
It seems the docker you are using is missing wget. You could build a new docker (see the Docker website for more details on how to use a Dockerfile, and the sketch below).
The way trains-agent works in dockers is it installs everything you ne...
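For example, a minimal Dockerfile sketch (the base image is just a placeholder, use whatever image you are running today):
```
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
# add the tools missing from the base image
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget git && \
    rm -rf /var/lib/apt/lists/*
```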
EnviousStarfish54 what's your matplotlib version ?
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/values in a section called "new section"
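Putting the two together, a short sketch (the dict contents are made up):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='config sections')

# stored as a configuration object under the "my_section_name" tab
my_dict_here = {'backbone': 'resnet50', 'augment': {'flip': True, 'crop': 224}}
task.connect_configuration(my_dict_here, name='my_section_name')

# flat dict connected as hyper-parameters under a section called "new section"
a_flat_dict = {'lr': 0.001, 'batch_size': 32}
task.connect(a_flat_dict, name='new section')
```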
Thanks EnviousStarfish54 we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under trains, at least until they are moved (hopefully end of Sept).
Hi ThankfulOwl72, check out the TrainsJob object. It should essentially do what you need:
https://github.com/allegroai/trains/blob/master/trains/automation/job.py#L14
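Rough usage sketch, assuming the TrainsJob interface in that file (task id, parameter names and queue are placeholders, double-check the exact signatures in the source):
```python
from trains.automation.job import TrainsJob

job = TrainsJob(
    base_task_id='<base_task_id>',            # the Task to clone
    parameter_override={'General/lr': 0.01},  # hyper-parameters to override in the clone
)
job.launch(queue_name='default')  # enqueue the cloned Task for an agent to execute
job.wait()                        # block until it finishes
print(job.status())
```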
HandsomeCrow5 check the latest RC, I just ran the same code and it worked 🙂
Okay verified, it won't work with the demo server. Give me a minute 🙂
That said, it might be a different backend, I'll test with the demo server
metric=image is the name in the dropdown of the debug images
QuaintJellyfish58 this is very odd, and the "undefined" is always marked as example?
Sure GiddyTurkey39 , Checkout the cleanup service:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
Hi YummyMoth34 they will keep on trying to send reports.
I think they try for at least several hours.
The upload itself is in the background.
It should not take long to prepare the plot for sending. Are you experiencing a major delay ?
Hi @<1546303293918023680:profile|MiniatureRobin9>
I'm not sure I understand the difference between a worker and an agent.
hmm we should probably make that clearer 🙂
agent = the clearml-agent instance running on the machine
worker is the system term representing the instance of the agent
You can have one machine with multiple agents (i.e. multiple workers) running on it.
Does that make sense ?
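For example (assuming the clearml-agent CLI; older trains-agent versions use the same flags), each daemon you start on the machine registers as its own worker:
```
clearml-agent daemon --queue default --detached
clearml-agent daemon --queue gpu_queue --gpus 0 --detached
```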
JitteryCoyote63 There is a basic elastic license that should always be there. If for some reason it was deleted/expired then the following command should fix it:
curl -XPOST 'http://localhost:9200/_xpack/license/start_basic'