
Nope - confirmed to be running on the OS's Python environment,
okay so bare metal root is definitely not recommended.
I'm not sure how/why it gets stuck though 🙂
Any chance you can run the agent as non-root?
Also, docker mode might be preferable, since it is easier for you to control the environment of the Task
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; basically it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section can contain only the parts that are important (like cache location, credentials, etc.)
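For example, an override-only conf can be as small as this (a minimal sketch; the key names are taken from the default conf shipped with the package, adjust to your setup):
sdk {
    storage {
        cache {
            # override only what matters to you, e.g. the cache location
            default_base_dir: "~/.trains/cache"
        }
    }
}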
And where is the ClearmlLogger coming from?
Hi @<1724960464275771392:profile|DepravedBee82>
After
Starting Task Execution:
It will literally start the process running your code,
Can you send the full log of the Task? what is the code doing? which system is running the agent (i.e. Windows/Mac/Linux docker etc)
CharmingStarfish14 can you check something from code, just to see if this would solve the issue?
Yeah I think this kind of makes sense to me, any chance you can open a GH issue on this feature request?
StraightDog31 can you elaborate? where are the parameters stored? who is trying to access them, and maybe for what purpose?
What I'm trying to do is to filter between two datetimes... Is that possible?
Could you expand?
You mean I can do Epoch001/ and Epoch002/ to split them into groups and make 100 limit per group?
yes then the 100 limit is per "Epoch001" and another 100 limit for "Epoch002" etc. 🙂
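For example, something along these lines (a minimal sketch; the random arrays just stand in for real images):
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="debug image groups")
logger = task.get_logger()

for i in range(300):
    # each title ("Epoch001", "Epoch002", ...) keeps its own rolling history
    img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    logger.report_image(title="Epoch001", series="sample", iteration=i, image=img)
    logger.report_image(title="Epoch002", series="sample", iteration=i, image=img)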
Assuming the git repo looks something like:
.git
readme.txt
module
 |
 +---- script.py
The working directory should be "."
The script path should be: "-m module.script"
And under the Configuration/Args, you should have:
args1 = value
args2 = another_value
Make sense?
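For reference, module/script.py could look roughly like this (a minimal sketch; args1/args2 are just the example names from above):
import argparse
from clearml import Task

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--args1", default="value")
    parser.add_argument("--args2", default="another_value")
    # Task.init auto-connects the argparse arguments, so they show up
    # (and can be overridden) under Configuration/Args in the UI
    task = Task.init(project_name="examples", task_name="module entry point")
    args = parser.parse_args()
    print(args.args1, args.args2)

if __name__ == "__main__":
    main()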
in
issues a delete command to the ClearML API server, ...
almost, it issues the boto S3 delete commands (directly to the S3 server, not through the clearml-server)
And that I need to enter an AWS key/secret in the profile page of the web app here?
correct
What is the difference toΒ
file_history_size
Number of unique files per title/series combination (aka how many images to store in the history, when the iteration is constantly increasing)
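If I remember correctly it can be tuned in clearml.conf, something like this (a hedged sketch, please verify the key against the default conf in your setup):
sdk {
    metrics {
        # how many debug-sample files to keep per title/series combination
        file_history_size: 100
    }
}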
well cudnn is actually missing from the base image...
I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init
continue_last_task='task_id'
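i.e. something like this (a minimal sketch; 'task_id' is a placeholder for the actual ID of the task you want to continue):
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resumed run",
    continue_last_task="task_id",  # placeholder task ID
)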
might it be related to the docker socket not being mounted to the agent daemon running inside a docker container?
Oh yes, if the daemon is running inside a docker container then you need both --privileged and mounting of the docker socket to get it to work
Hi ReassuredTiger98
So let's assume we call:
logger.report_image(title='training', series='sample_1', iteration=1, ...)
And we report every iteration (keeping the same title.series names). Then in the UI we could iterate back on the last 100 images (back in time) for this title / series.
We could also report a second image with:
logger.report_image(title='training', series='sample_2', iteration=1, ...)
which means that for each one we will have 100 past images to review (i.e. same ti...
Yey! Okay, let me make sure we add this feature to the Task.init arguments so one can control it from code 🙂
How do you currently report images, with the Logger or Tensorboard or Matplotlib ?
GrievingTurkey78 MagnificentSeaurchin79 do you guys want to start a PR branch we can all work on?
Not yet 🙂
It should not be complex to implement.
The actual AWS auto scaler class implements just two functions:
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
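So a custom scaler could look roughly like this (a hedged sketch; only the method signatures come from the linked base class, the bodies are placeholders):
from clearml.automation.auto_scaler import AutoScaler

class MyAutoScaler(AutoScaler):
    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # launch an instance of type `resource`, start an agent on it
        # listening on `queue_name`, named with `worker_id_prefix`
        pass

    def spin_down_worker(self, instance_id):
        # terminate the instance identified by `instance_id`
        pass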
. Would you have any suggestions about where I could look to debug? Maybe the docker logs of the web server?
Let me check, we had the same issue reported today. Let me double-check with the front-end people and get back to you
With the warning?
I was able to reproduce it on the old versions, but it seems fixed on the latest from GitHub.
- yes they will! This is exactly the idea :)
- yes it will store it as a text file (as-is, raw text). Notice the return value is the file you should open. This is because when running via the agent, the returned file will contain the conf file from the UI. Make sense?
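For example (a minimal sketch; 'config.yaml' is a placeholder path), always open the returned path rather than the local one:
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")
# returns the path of the file to open; when running via the agent
# this file holds the configuration text taken from the UI
config_path = task.connect_configuration("config.yaml")
with open(config_path) as f:
    raw_text = f.read()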
Hmm let me check first when it is going to be upgraded and if there is a workaround
Thanks MagnificentPig49 !
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
This should have fixed it, no?
WittyOwl57 what about vm.max_map_count?
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac
Seems like a credentials error
Do you have everything set up correctly in your ~/clearml.conf ?
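For reference, the relevant part of ~/clearml.conf looks roughly like this (a hedged sketch; the server URLs and keys are placeholders for your own values):
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        access_key: "YOUR_ACCESS_KEY"
        secret_key: "YOUR_SECRET_KEY"
    }
}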