Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
...the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company and applied at execution time.
What would be the value you need to override, and what is the use case?
MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+
So what is the mechanism by which you "automagically" pick things up? (For information, I don't think this is relevant to our use case.)
If you use joblib.dump (which is like pickle but safer/faster) it will be auto-logged:
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28
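For reference, a minimal sketch of what that looks like (project/task names here are just placeholders):
from clearml import Task
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

task = Task.init(project_name="examples", task_name="joblib auto-logging")  # Task.init hooks joblib
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.pkl")  # picked up and registered as an output model automatically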
Hi PompousParrot44
You can check the cleanup service example.
It sleeps for 24 hours then spins up and does its thing.
You can always launch these service tasks on the services queue; its purpose is to run such services on the trains-server machine as additional CPU services. They will also be registered as service nodes, so you have visibility into which services are running.
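For example, a rough sketch of cloning such a service task and enqueuing it on the services queue (the task id is a placeholder):
from trains import Task

cleanup = Task.clone(source_task="<cleanup-task-id>", name="cleanup service")
Task.enqueue(cleanup, queue_name="services")  # a trains-agent listening on "services" will pick it up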
In order to clone a task and wait for its completion, use the TrainsJob:
https://github.com/allegroai/trains/blob/65a4a...
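A rough sketch of the idea (task id and queue name are placeholders; double-check the exact method names against the linked source):
from time import sleep
from trains.automation.job import TrainsJob

job = TrainsJob(base_task_id="<task-id>")  # clones the base task under the hood
job.launch(queue_name="default")           # enqueue the clone for an agent to execute
while not job.is_stopped():                # poll until the clone completes/fails/aborts
    sleep(30)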
Optional[Sequence[Union[str, Dataset]]]
None, a list of strings, or a list of Dataset objects
(each one is a parent, supporting multiple parents)
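Assuming this is the parent_datasets argument of Dataset.create, a minimal sketch (ids/names are placeholders):
from clearml import Dataset

parent = Dataset.get(dataset_id="<parent-dataset-id>")
child = Dataset.create(
    dataset_name="merged",
    dataset_project="examples",
    parent_datasets=[parent, "<another-parent-id>"],  # Dataset objects and id strings can be mixed
)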
ReassuredTiger98 in theory it should work, do you know what is actually stored? (I mean re-encoding it means you have to have opencv/ffmpeg, which might be too much to ask)
Martin I told you I can't access the resources in the cluster unfortunately
so it seems there is some misconfiguration of the k8s glue, because we can see it can "talk" to the clearml-server, but it seems it fails to actually create the k8s pod/job. I would start with debugging the k8s glue (not the services agents). Regardless, I think the next step is to get a log of the k8s glue pod, and better understand the issue.
wdyt?
Maybe you should make naming_function a public variable in the SearchStrategy class, or allow changing it in the HyperParameterOptimizer class?
I like this idea, let's do that
Just making sure, you hit the 1024-character limit on the S3 path?
If this is the case we should also fix the "artifact naming" to take that into account (it already does and has a limit, see here:
https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379...
Hi UnevenDolphin73
Does ClearML somehow remove any loggers from the logging module? We suddenly noticed that we have some handlers missing when running in ClearML
I believe it adds a logger; it should not remove any loggers.
What's the clearml version you are using?
We suddenly have a need to set up our logging after every task.close()
Hmm, that gives me a handle on things, any chance it is easily reproducible?
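If it helps narrow it down, a quick sketch for checking what happens to the root logger's handlers around Task.init / task.close() (project/task names are placeholders):
import logging
from clearml import Task

print(logging.getLogger().handlers)  # handlers before ClearML
task = Task.init(project_name="examples", task_name="logger check")
print(logging.getLogger().handlers)  # a ClearML handler should be added, none removed
task.close()
print(logging.getLogger().handlers)  # compare after close() to see which handlers went missing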
Hi @<1598487094601191424:profile|MysteriousCow84>
You should put it in the dedicated section:
None
The log is missing, but the Kedro logger is printing to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
ElegantCoyote26 what you are after is:
docker run -v ~/clearml.conf:/root/clearml.conf -p 9501:8080
Notice the internal port (i.e. inside the docker it is 8080, but the external one is changed to 9501)
ngrok to connect to the remote server at the office?
That makes sense, I guess this is the equivalent of using a VPN, from that point onward clearml-session can directly access the remote machine, right?
ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so there's nowhere to take the data from...
In short, yes you have to have TB :)
ElegantCoyote26 point me to where Keras stores the data 🙂
If in the process of integration you had to add a logger/callback to your Keras code, that is the equivalent of using the TB.
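For instance, a toy sketch of attaching a TensorBoard callback (the model/data are just illustrative); with the callback in place, ClearML picks up the scalars automatically:
import numpy as np
from tensorflow import keras

x = np.random.rand(100, 4)
y = np.random.randint(2, size=100)
model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=2,
          callbacks=[keras.callbacks.TensorBoard(log_dir="./tb_logs")])  # TB logs are auto-captured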
ElegantCoyote26
parser = get_parser()
args_ = vars(parser.parse_args())
task.connect(args_)
There is no need to connect args_; Task.init will automatically catch the argparser.
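A minimal sketch (argument names and project/task names are illustrative):
from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse auto-capture")
parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)
args = parser.parse_args()  # no task.connect() needed, the parsed arguments are logged automatically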
Hi ElegantCoyote26
sometimes the agents load an earlier version of one of my libraries.
I'm assuming some internal package that is installed from a wheel file, not a direct git repo+commit link?
great 🙂
two things:
1. I'm not sure argparse supports dict as a type (I mean it will take anything, but I'm not sure it will parse your arguments as a dict)
2. I know there was an issue with argparsing, but I think it was solved
btw: Basically the way clearml-agent works, it does not actually pass the arguments on the command line but directly to the argparser at runtime
What happens if you clone the Task (the one with Args showing, and without the explicit task.connect(args_)) and send it to the agent...
PreciousParrot26 I think this is really a matter of the CI process having very limited resources. Just to be clear, you are correct and the steps themselves are not executed inside the CI environment, but it seems that even running the pipeline logic is somehow "too much" for the limited resources... Make sense?
OSError: [Errno 28] No space left on device
Hi PreciousParrot26
I think this says it all 🙂 there is no more storage left to run all those subprocesses
btw: I am curious about why a ThreadPool of 16 threads is gathered
This is the maximum number of simultaneous jobs it will try to launch (it will launch more once the launching is done; notice this is the launching, not the actual execution), but this is just a way to limit it.
controller_object.start_locally(). Only the PipelineController should be running locally, right?
Correct. Do notice that if you are using the pipeline decorator and calling run_locally(), the actual pipeline steps are also executed locally.
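To illustrate the decorator case, a minimal sketch (names are placeholders); with run_locally() the steps below run on the local machine as well, not just the controller logic:
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def add_one(x):
    return x + 1

@PipelineDecorator.pipeline(name="demo", project="examples", version="1.0")
def pipeline_logic():
    print(add_one(1))

PipelineDecorator.run_locally()  # must be set before invoking the pipeline function
pipeline_logic()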
which of the two are you using (Tasks as steps, or functions as steps with decorator)?
Hi TrickyFox41
Hey since Hydra does not work with clearml-task
It should, shouldn't it? What does not work?
Hi DrabCockroach54
I think the Kubernetes integration (k8s glue) is not part of the open-source features, and is only available as an enterprise feature 🙂
Xeon E3-1240: 4-5 hours!
wow... yes definitely worth upgrading 🙂
This is odd, how are you spinning up clearml-serving?
You can also do it synchronously:
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
Hi LovelyHamster1
Could you think of a toy code that reproduces this issue?
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi-core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
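For context, a minimal usage sketch (the dataset id is a placeholder); the linked code path is where the download/extraction is parallelized internally:
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")
local_path = ds.get_local_copy()  # download + extraction should use multiple workers under the hood
print(local_path)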