Please send the full log, I just tested it here, and it seems to be working
Hmm what do you have here?
os.system("cat /var/log/studio/kernel_gateway.log")
@<1541954607595393024:profile|BattyCrocodile47> first let me say I ❤ the dark theme you have going on there, we should definitely add that
When I run
python set_triggers.py; python basic_task.py
, they seem to execute, b
Seems like you forgot to start the trigger, i.e.
None
(this will cause the entire script of the trigger inc...
Hi @<1523701797800120320:profile|SteadySeagull18>
...the job -> requeue it from the GUI, then a different environment is installed
The way that it works is, in the "originating" (i.e. first manual) execution only the directly imported packages are listed (no derivative packages that are required by the original packages)
But when the agent is reproducing the job, it creates a whole clean venv for the experiment, installs the required packages, then pip resolves the derivatives, and ...
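To make the direct vs. derivative distinction concrete, here is a toy sketch. The package names and the dependency map are hypothetical, and this is not how pip or the agent resolves anything internally; it only shows how a short list of direct imports expands into a larger set once derivatives are pulled in. Since the derivatives are not pinned by the first run, their versions are decided when the clean venv is built.

```python
# Hypothetical dependency map, for illustration only (not real package metadata).
DEPENDS_ON = {
    "pkg_a": ["pkg_b", "pkg_c"],
    "pkg_b": ["pkg_d"],
    "pkg_c": [],
    "pkg_d": [],
}


def resolve(direct, depends_on):
    """Return the direct packages plus every derivative package, sorted by name."""
    resolved = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in resolved:
            continue
        resolved.add(name)
        # Queue the packages that this one requires.
        stack.extend(depends_on.get(name, []))
    return sorted(resolved)


direct_imports = ["pkg_a"]  # what the first manual run would list
full_env = resolve(direct_imports, DEPENDS_ON)  # what a clean venv ends up containing
print(full_env)
```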
GiddyTurkey39 can you ping the server-address
(just making sure, this should be the IP of the server not 'localhost')
Hi @<1544128915683938304:profile|DepravedBee6>
You mean like backup the entire instance and restore it on another machine? Or are you referring to specific data you want to migrate?
BTW if you are upgrading old versions of the server I would recommend upgrading to every version in the middle (there are some migration scripts that need to be run in a few of them)
Should work, follow the backup process, and restore into a new machine:
None
to get all the image metrics: client.events.get_task_metrics(tasks=['6adb929f66d14731bc76e3493ab89d80'], event_type='training_debug_image')
metric=image is the name in the dropdown of the debug images
FYI: ssh -R 8080:localhost:8080 -R 8008:localhost:8008 -R 8081:localhost:8081 replace_with_username@ubuntu_ip_here
solved the issue
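If you need to script this, a minimal sketch that builds the same command from a list of ports could look like the following. The helper name build_reverse_tunnel_cmd is made up for illustration, and the user and host values are the same placeholders as in the command above.

```python
import shlex


def build_reverse_tunnel_cmd(user, host, ports=(8080, 8008, 8081)):
    """Build an ssh argument list with one -R reverse forward per port."""
    args = ["ssh"]
    for port in ports:
        # Forward <port> on the remote machine back to <port> on this machine.
        args += ["-R", f"{port}:localhost:{port}"]
    args.append(f"{user}@{host}")
    return args


cmd = build_reverse_tunnel_cmd("replace_with_username", "ubuntu_ip_here")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run(cmd) once the placeholders are replaced with real values.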
Hi UptightBeetle98
The hyperparameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which is where they are now, i.e. pending), set up the environment for the jobs (venv or docker+venv) and execute the job with the specific arguments the optimizer chose.
Make sense ?
Hi @<1544853695869489152:profile|NonchalantOx99>
I would assume the clearml-server configuration / access key is misconfigured in your copy of example.env
JitteryCoyote63 you mean from code?
HandsomeCrow5 check the latest RC, I just ran the same code and it worked
That said, it might be a different backend, I'll test with the demo server
JitteryCoyote63 s3 should work, you can go to your profile page, see if you do not have some old credentials already there, maybe this is the issue.
Hi ApprehensiveFox95
You mean from code remove the argparse arguments ?
Or post execution in the UI?
Sure: task = Task.init(..., auto_connect_arg_parser={'arg_not_to_log': False})
This will cause all argparse arguments to automatically be logged (and later editable) with the exception of the argument arg_not_to_log
Notice that if you have --arg-something, to exclude it add 'arg_something': False to the dict
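A small sketch of that name mapping, using only argparse. The helper flag_to_dest is hypothetical and simply mirrors how argparse derives the attribute name from a long flag; the Task.init call is left as a comment since it is the same call as in the message above.

```python
import argparse


def flag_to_dest(flag):
    """Turn a long flag such as '--arg-something' into its argparse attribute name."""
    return flag.lstrip("-").replace("-", "_")


parser = argparse.ArgumentParser()
parser.add_argument("--arg-something", default="x")
parser.add_argument("--lr", type=float, default=0.1)
args = parser.parse_args([])

# The dict key is the attribute name (underscores), not the command line flag (dashes).
auto_connect_arg_parser = {flag_to_dest("--arg-something"): False}
print(auto_connect_arg_parser)

# task = Task.init(..., auto_connect_arg_parser=auto_connect_arg_parser)
```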
now it stopped working locally as well
At least this is consistent
How so ? Is the "main" Task still running ?
because comparing experiments using graphs is very useful. I think it is a nice to have feature.
So currently when you compare the graphs you can select the specific scalars to compare, and it updates in real time!
You can also bookmark the actual URL and it is fully reproducible (i.e. full state is stored)
You can also add custom columns to the experiment table (with the metrics) and sort / filter based on them, and create a summary dashboard (again like all pages in the web app, URL is...
YEY
it should be fairly easy to write such a daemon
from datetime import datetime
from time import time

from clearml.backend_api.session.client import APIClient

client = APIClient()
timestamp = time() - 60 * 60 * 2  # last 2 hours
# list the ids of running tasks created after the timestamp, most recently updated first
tasks = client.tasks.get_all(
    status=["in_progress"],
    only_fields=["id"],
    order_by=["-last_update"],
    page_size=100,
    page=0,
    created=[">{}".format(datetime.utcfromtimestamp(timestamp))],
)
...
references:
[None](https://clear.ml/...
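For the daemon part itself, a rough sketch of the polling loop around that query might look like this. The names created_filter, run_daemon, fetch_tasks and handle_task are all hypothetical: fetch_tasks stands in for a wrapper around the client.tasks.get_all(...) call above, and handle_task for whatever you want to do with each task.

```python
from datetime import datetime, timezone
from time import sleep, time


def created_filter(hours, now=None):
    """Build the 'created' filter for tasks created in the last `hours` hours (UTC)."""
    now = time() if now is None else now
    cutoff = datetime.fromtimestamp(now - 60 * 60 * hours, tz=timezone.utc)
    # Drop the tzinfo so the string has the same format as the naive UTC datetime above.
    return [">{}".format(cutoff.replace(tzinfo=None))]


def run_daemon(fetch_tasks, handle_task, hours=2, interval_sec=60,
               max_iterations=None, sleep_fn=sleep):
    """Poll forever (or max_iterations times), passing every fetched task to handle_task."""
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        for task in fetch_tasks(created_filter(hours)):
            handle_task(task)
        iteration += 1
        sleep_fn(interval_sec)


# Dry run with a fake fetcher, so no server is needed to see the flow.
seen = []
run_daemon(lambda created: ["task_a", "task_b"], seen.append,
           max_iterations=2, sleep_fn=lambda seconds: None)
print(seen)
```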
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script
which means the next time the instance is booted you will have the bash script executed,
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
This is very odd, can you also put here the file names? maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
Ohh... so would it make sense to add "helper_functions" so that a function will be available in the step's context ?
Or maybe we need to support a new "standalone" decorator?! Currently to actually "launch" the function step, you have to call it from the "pipeline" main logic function, but, at least in theory, one could do without the Pipeline itself.....
What's the difference between the example pipeline and this code ?
Could it be the "parents" argument ? what is it?
(torchvision vs. cuda compatibility, will work on that),
The agent will pull the correct torch based on the cuda version that is available at runtime (or configured via the clearml.conf)