
Sorry for the late reply, AgitatedDove14.
The code that initializes the Task is inside the first node. https://github.com/noklam/allegro_test/blob/6be26323c7d4f3d7e510e19601b34cde220beb90/src/allegro_test/pipelines/data_engineering/nodes.py#L51-L52
repo: https://github.com/noklam/allegro_test
commit: https://github.com/noklam/allegro_test/commit/6be26323c7d4f3d7e51...
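Roughly, it looks like this (a minimal sketch only; the function, project, and task names here are placeholders, the real code is in the linked file):
`
from clearml import Task

def first_node(df):
    # Task.init is called from inside the first Kedro node rather than at pipeline start-up
    task = Task.init(project_name="allegro_test", task_name="data_engineering")
    return df
`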
Now my problem is that clearml-agent picks up the job but fails to run the Docker container.
Not sure why my Elasticsearch & MongoDB crashed. I had to remove and recreate all the Docker containers; after that clearml-agent works fine too.
Hmmm... you mentioned that plt.show() or plt.savefig() will both trigger Trains to log the figure.
plt.savefig() does not trigger logging for me; only plt.show() does. If you run plt.show() in a Python script, it pops up a new window for the matplotlib figure and blocks the entire program unless you manually close it.
(On a Windows machine, at least.)
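For now I am thinking of reporting the figure explicitly instead of relying on automatic capture, something like this sketch (assuming the clearml Logger.report_matplotlib_figure API; project and plot names are placeholders):
`
import matplotlib
matplotlib.use("Agg")  # headless backend: nothing pops up, nothing blocks
import matplotlib.pyplot as plt

from clearml import Task, Logger

task = Task.init(project_name="demo", task_name="matplotlib-logging")

fig = plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])

# Report the figure explicitly instead of relying on plt.show()/plt.savefig() capture
Logger.current_logger().report_matplotlib_figure(
    title="my plot", series="example", iteration=0, figure=fig
)
`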
I have tried adding the line to the conf but it does not seem to work either... are you able to run it with proper logging?
Also, I am unclear about the difference between StorageManager and StorageHelper; is there an example that integrates them with model training?
I went through the docs and it seems they don't mention downloading an artifact programmatically?
For my most common workflow, I may have some CSV files which are updated from time to time.
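For context, this is the kind of programmatic download I had in mind (a sketch using the artifact get_local_copy and StorageManager APIs; the task ID, artifact name, and URL are placeholders):
`
from clearml import StorageManager, Task

# Download an artifact that another task uploaded, e.g. a CSV that is refreshed over time
source_task = Task.get_task(task_id="<source task id>")
csv_path = source_task.artifacts["dataset_csv"].get_local_copy()

# Or pull an arbitrary file from remote storage into the local cache
local_csv = StorageManager.get_local_copy(remote_url="s3://my-bucket/path/data.csv")
`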
The "incremental" config does not seem to work well if I add handlers in the config. This snippet fails with the incremental flag.
`
import logging
import logging.config

from clearml import Task

conf_logging = {
    "version": 1,
    "incremental": True,
    "formatters": {
        "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        }
    },
}

# Fails with incremental=True: the "console" handler does not exist yet and an
# incremental config cannot create new handlers or formatters
logging.config.dictConfig(conf_logging)
`
I can confirm this seems to fix the issue, and I have reported it to the Kedro team to see what their view on it is. So it seems like it did remove the TaskHandler from the _handler_lists.
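To check, I just printed the root logger's handlers before and after applying the config (plain stdlib logging; the handler I mean is the one ClearML attaches after Task.init):
`
import logging

# The handler ClearML attaches after Task.init should show up here;
# if the dictConfig call removed it, it will be missing from this list.
print([type(h).__name__ for h in logging.getLogger().handlers])
`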
I am running on a Windows 10 machine, is this not compatible?
Yes, I did use foreground.
I tested with an older "trains" server; it shows a log like this if no job is picked up, while my new "clearml-agent" shows nothing:
No tasks in queue bb1bb1673f224fc98bbc8f86779be802
No tasks in Queues, sleeping for 5.0 seconds
AgitatedDove14 Thanks! This seems to be a more elegant solution
I mean, once I add the environment variables, can trains.conf override them? I am guessing the environment variables take higher precedence.
What I want to achieve is: block users from accessing the public server by default. If they configure trains.conf themselves, then it's fine.
import os
os.environ["TRAINS_API_HOST"] = "YOUR API HOST"
os.environ["TRAINS_WEB_HOST"] = "YOUR WEB HOST"
os.environ["TRAINS_FILES_HOST"] = "YOUR FILES HOST"
I need this as I want to write a wrapper for internal use.
I need to block the default behavior that automatically links to the public server when the user has no configuration file.
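Something like this wrapper is what I have in mind (a sketch only; the internal host URLs and the function name are made up):
`
import os

from trains import Task

INTERNAL_API_HOST = "http://trains-api.internal:8008"
INTERNAL_WEB_HOST = "http://trains-web.internal:8080"
INTERNAL_FILES_HOST = "http://trains-files.internal:8081"

def init_internal_task(project_name, task_name):
    # Point the SDK at the internal server before Task.init, so a user without
    # trains.conf never falls back to the public server. setdefault keeps any
    # value the user already exported themselves.
    os.environ.setdefault("TRAINS_API_HOST", INTERNAL_API_HOST)
    os.environ.setdefault("TRAINS_WEB_HOST", INTERNAL_WEB_HOST)
    os.environ.setdefault("TRAINS_FILES_HOST", INTERNAL_FILES_HOST)
    return Task.init(project_name=project_name, task_name=task_name)
`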
I don't think it is running in a subprocess; stdout/stderr are output in the terminal. If I use print() it actually gets logged, but the logger info is missing.
SuccessfulKoala55 Where can I find the related documentation? I was not aware that I can configure this; I would like to create users myself.
Digest: sha256:407714e5459e82157f7c64e95bf2d6ececa751cca983fdc94cb797d9adccbb2f
Status: Downloaded newer image for nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I am abusing the "hyperparameters" to hold a "summary" dictionary that stores my key metrics, because of the nicer diff-ing behaviour across experiments.
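Concretely, something like this (a sketch; the project, task, and metric names are made up):
`
from clearml import Task

task = Task.init(project_name="demo", task_name="summary-as-hparams")

# Connect a plain dict so the key metrics show up under the task's
# hyperparameters and can be diff-ed across experiments in the UI
summary = {"val_accuracy": 0.93, "val_loss": 0.21}
task.connect(summary, name="summary")
`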
I am not sure what those example/1/2/3 are; I only have one chart.
This will cause a redundant Trains session, I guess.